
Final Oral Examination for Doctor of Philosophy (Computer Science)

Ishan Dave

Wednesday, October 30, 2024
1:00PM – 2:00PM
HEC 101A

Dissertation

Video understanding involves tasks such as action recognition, video retrieval, and human pose propagation, which are crucial for applications like surveillance, surgical video analysis, sports analysis, and content recommendation. Progress in this domain has been driven largely by advances in deep learning, which rely on large-scale labeled datasets. However, labeling these video datasets requires a tremendous amount of human effort and time. Another significant issue in video understanding is the leakage of private attributes, such as skin color, race, gender, and clothing information. This leakage is undesirable when the goal is only to recognize a person's actions.

This dissertation addresses these fundamental challenges in video understanding through several contributions. We introduce self-supervised frameworks, based on contrastive learning and on temporal pretext tasks, that learn from unlabeled video datasets. To make use of smaller labeled datasets, we also propose semi-supervised learning frameworks for both coarse-grained and fine-grained action recognition. Additionally, we tackle privacy preservation in action recognition by introducing a self-supervised privacy removal objective.