A capsule network provides an effective way to model part-to-whole relationships between entities and allows the network to learn viewpoint-invariant representations. Through this improved representation learning, capsule networks achieve strong performance in multiple domains while using drastically fewer parameters. Recently, capsule networks have shown state-of-the-art results for human action localization in videos, object segmentation in medical images, and text classification. This tutorial provides a basic understanding of capsule networks and discusses their use in a variety of computer vision tasks, such as image classification, object segmentation, and activity detection.