The CRCV team ‘knights’ won first place in the Visual Inductive Priors for Data-Efficient Action Recognition challenge at ICCV 2021. The team includes graduate students Ishan Dave, Brandon Clark, and Rohit Gupta, and undergraduate intern Naman Biyani from the Indian Institute of Technology Kanpur. https://vipriors.github.io/challenges/#action-recognition
Data is fueling deep learning, yet it is costly to gather and to annotate. Training on massive datasets consumes a huge amount of energy, adding to our carbon footprint. In addition, only a select few deep learning behemoths have billions of data points and thousands of expensive GPUs at their disposal. This challenge focuses on how to pre-wire deep networks with generic visual inductive priors, that is, innate knowledge structures that allow a network to incorporate hard-won existing generic knowledge. Visual inductive priors are data-efficient: what is built in no longer has to be learned, saving valuable training data.
Our proposed solution achieves 73% on the Kinetics400ViPriors test set, the best result among all entries. The approach has three main components: state-of-the-art TCLR self-supervised pretraining, video transformer models, and an optical flow modality.