
In a collaborative effort with Microsoft Research and SRI International, our team has developed and released a series of benchmarks to evaluate and improve the robustness of visual perception models. These benchmarks target critical challenges such as distribution shifts, perturbations, occlusions, and low-resolution conditions, offering insight into how video action recognition and video-language models perform in real-world scenarios.

Notable contributions include the UCF101-DS dataset, which captures real-world distribution shifts across user-generated videos, and the perturbation-focused benchmarks HMDB51-P, UCF101-P, and Kinetics400-P, which introduce 90 distinct perturbations to test model resilience under diverse conditions. Occlusion benchmarks such as O-UCF, Real-OUCF, and K400-O provide both controlled and realistic scenarios for evaluating the impact of visual obstructions, while the TinyVIRAT dataset examines low-resolution video action recognition in challenging surveillance-style settings. These efforts, presented at venues including CVPR and NeurIPS, aim to set new standards for understanding and addressing robustness in visual perception.
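To make the evaluation protocol concrete, here is a minimal sketch of how a perturbation benchmark of this kind is typically consumed: a clean clip is corrupted at a chosen severity level, and the model's accuracy under the perturbation is compared against its clean accuracy. The perturbation function, sigma schedule, and robustness ratio below are illustrative assumptions, not the released benchmark code.

```python
import numpy as np

def gaussian_noise(frames: np.ndarray, severity: int = 3) -> np.ndarray:
    """Corrupt a video clip with Gaussian noise at a given severity.

    frames: array of shape (T, H, W, C) with values in [0, 1].
    severity: 1 (mild) to 5 (strong); the sigma schedule here is an
    illustrative assumption, not the benchmark's actual values.
    """
    sigma = [0.02, 0.04, 0.06, 0.08, 0.10][severity - 1]
    noisy = frames + np.random.normal(0.0, sigma, size=frames.shape)
    return np.clip(noisy, 0.0, 1.0)

def robustness_ratio(clean_acc: float, perturbed_acc: float) -> float:
    """Fraction of accuracy retained under perturbation: 1.0 = fully robust."""
    return perturbed_acc / clean_acc

# Example: corrupt a 16-frame clip and compare hypothetical accuracies.
clip = np.random.rand(16, 224, 224, 3)        # stand-in for a real video clip
corrupted = gaussian_noise(clip, severity=4)
print(robustness_ratio(clean_acc=0.85, perturbed_acc=0.62))  # ~0.73
```

In practice, a loop of this form is repeated over every perturbation type and severity level, and the retained-accuracy scores are aggregated to summarize a model's overall robustness.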

For more details, visit the project pages or access the datasets, code, and publications linked here: https://www.crcv.ucf.edu/robustness-benchmarking-datasets.