
UCF101-DS: Action Recognition for Real-World Distribution Shifts



Schiappa, Madeline C and Biyani, Naman and Kamtam, Prudvi and Vyas, Shruti and Palangi, Hamid and Vineet, Vibhav and Rawat, Yogesh. Large-scale Robustness Analysis of Video Action Recognition Models. The IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023).

Dataset Overview

Existing benchmark datasets for real-world distribution shifts are generally generated synthetically, using augmentations that simulate shifts such as weather and camera rotation. The UCF101-DS dataset instead consists of real-world distribution shifts drawn from user-generated videos, without synthetic augmentation. It contains videos for 47 UCF-101 classes with 63 different distribution shifts, which can be grouped into 15 categories. In total, 536 unique videos are split into 4,708 clips, and each clip is 7 to 10 seconds long.

Figure 1. Number of clips per activity class (classes taken from the UCF101 dataset).

Distribution Shifts

The distribution shifts fall into the following categories:

  • Actor: An atypical actor, such as a dog on a balance beam or a cat at a computer.
  • Behavioral: A change in typical behavior, such as in pranks and reaction videos.
  • Crowd: The activity is partially occluded by a large crowd performing or watching it.
  • Ethnicity: A change in ethnicity from the original distribution in UCF101.
  • Indoor Scenery: An activity typically performed outdoors is performed indoors, or an indoor activity is performed in an atypical indoor location.
  • Outdoor Scenery: An activity typically performed indoors is performed outdoors, or an outdoor activity is performed in an atypical outdoor location.
  • Occluded: The activity is only partially visible because an object blocks the view.
  • Point-of-View: The point of view (POV) differs from that of the UCF101 videos, for example watching the activity on TV or from the actor's perspective.
  • Speed: The playback speed is altered by the user who created the video.
  • Style: The visual style of the video differs, for example through animation, on-screen text, filters, or color changes.
  • Weather: Weather conditions, such as fog or rain, that may occlude the activity.
Figure 2. Number of clips per higher-level category of distribution shifts.
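Counts like those in Figures 1 and 2 can, in principle, be recomputed from the annotation CSV (its columns are described under Annotation Format). The sketch below is a minimal example, assuming the CSV has a header row naming the `label` and `category` columns; the function name `count_clips` and the sample rows are hypothetical and are not taken from the released files.

```python
import csv
import io
from collections import Counter


def count_clips(csv_text: str) -> tuple[Counter, Counter]:
    """Tally clips per activity class and per shift category.

    Assumes (hypothetically) that the annotation CSV starts with a header
    row containing the `label` and `category` column names.
    """
    per_class: Counter = Counter()
    per_category: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        per_class[row["label"]] += 1        # one row == one clip
        per_category[row["category"]] += 1
    return per_class, per_category


if __name__ == "__main__":
    # Toy rows for illustration only; all values are hypothetical.
    sample = (
        "label,video_id,filename,shift,category\n"
        "BalanceBeam,v001,clip_0.mp4,dog,Actor\n"
        "BalanceBeam,v001,clip_1.mp4,dog,Actor\n"
        "Typing,v002,clip_0.mp4,cat,Actor\n"
    )
    classes, categories = count_clips(sample)
    print(classes.most_common())
    print(categories.most_common())
```

Because each row describes one clip, a plain `Counter` over a column is enough; grouping by `video_id` instead would count unique source videos rather than clips.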

Annotation Format

All annotations are provided in .csv format, with the following columns:

  • label: The activity class based on UCF101 action classes.
  • video_id: The ID of the unique source video from which the clip was derived.
  • filename: The filename of the specific clip. To get the full file path, append “<label>/<filename>” to the dataset root directory.
  • shift: The distribution shift present in the clip, based on the keywords used to search for the video.
  • category: The high-level category the distribution shift belongs to.
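A minimal sketch of reading these columns and building each clip's full path as `<root>/<label>/<filename>`. It assumes the CSV has a header row with exactly these column names; the `Clip` class, `read_clips` function, the `/mnt/image1` root, and the sample values (including the `Weather` category string and `.mp4` extension) are illustrative assumptions, not part of the release.

```python
import csv
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Clip:
    """One annotated clip (hypothetical container for a CSV row)."""
    label: str
    video_id: str
    filename: str
    shift: str
    category: str

    def path(self, root: Path) -> Path:
        # Full path = <dataset root>/<label>/<filename>
        return root / self.label / self.filename


def read_clips(lines: Iterable[str]) -> Iterator[Clip]:
    """Yield one Clip per CSV row; assumes a header row is present."""
    for row in csv.DictReader(lines):
        yield Clip(
            label=row["label"],
            video_id=row["video_id"],
            filename=row["filename"],
            shift=row["shift"],
            category=row["category"],
        )


if __name__ == "__main__":
    # Hypothetical rows; real files may use different values.
    rows = [
        "label,video_id,filename,shift,category",
        "Typing,v002,clip_0.mp4,cat,Actor",
        "Rafting,v003,clip_4.mp4,fog,Weather",
    ]
    root = Path("/mnt/image1")  # assumed location of the extracted data
    for clip in read_clips(rows):
        if clip.category == "Weather":
            print(clip.path(root))
```

`read_clips` accepts any iterable of lines, so an opened file (`open(path, newline="")`) works the same way as the list used here.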

Release Information

We release the data both as zip archives and as a single squashfs image, which can be downloaded from our server and used directly without extracting separate zips.

On Windows, squashfs images can be unpacked with 7-Zip. The main 7-Zip release does not support lz4 compression, but this fork does.

On Linux, the image can be mounted directly or with squashfuse, using either of the following:

  • mount directly with the following command (no additional software needed; the mount point must exist):

    sudo mkdir -p /mnt/image1 && sudo mount -o loop,ro image1.sq /mnt/image1

  • mount with squashfuse:

    mkdir image1; squashfuse image1.sq image1

The .sq file is a squashfs image and contains the same data as the .zip file (2.8 GB).
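After mounting or extracting, a quick sanity check is to confirm that every clip listed in the annotation CSV exists on disk. This sketch assumes the data under the root follows the `<label>/<filename>` layout from the Annotation Format section and that the CSV has a header row; `missing_clips` is a hypothetical helper, and `root` would be, for example, the `/mnt/image1` mount point used in the mount command.

```python
import csv
from pathlib import Path


def missing_clips(csv_path: Path, root: Path) -> list[Path]:
    """Return annotated clip paths that are absent under `root`.

    Assumes <root>/<label>/<filename> layout and a CSV header row
    containing `label` and `filename` (both assumptions).
    """
    missing = []
    with csv_path.open(newline="") as f:
        for row in csv.DictReader(f):
            clip = root / row["label"] / row["filename"]
            if not clip.is_file():
                missing.append(clip)
    return missing
```

An empty result suggests the download and mount are complete; a non-empty list points either to an incomplete download or to a different directory layout than assumed here.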


The data is released for non-commercial research purposes only. The videos we downloaded are published under public Creative Commons copyright licenses.

Contact Information

In case of any queries, please contact one of:

  • Dr. Yogesh Singh Rawat
  • Prudvi Kamtam
  • Madeline Chantry

If you use this dataset for your research, please cite the following papers:

Schiappa, Madeline C and Biyani, Naman and Kamtam, Prudvi and Vyas, Shruti and Palangi, Hamid and Vineet, Vibhav and Rawat, Yogesh. Large-scale Robustness Analysis of Video Action Recognition Models. The IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023).

Soomro, Khurram and Zamir, Amir Roshan and Shah, Mubarak. UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402 (2012).