Center for Research in Computer Vision



Data Sets

UCF50 - Action Recognition Data Set


Click here to check the published results on UCF50 (updated September 12, 2012)

UCF50 is an action recognition data set with 50 action categories, consisting of realistic videos taken from YouTube. It is an extension of the YouTube Action data set (UCF11), which has 11 action categories.

Most available action recognition data sets are not realistic and are staged by actors. Our primary aim is to provide the computer vision community with an action recognition data set of realistic videos taken from YouTube. The data set is very challenging due to large variations in camera motion, object appearance and pose, object scale, viewpoint, cluttered background, illumination conditions, etc. For each of the 50 categories, the videos are grouped into 25 groups, where each group consists of more than 4 action clips. The video clips in the same group may share some common features, such as the same person, similar background, similar viewpoint, and so on.

The 50 action categories of UCF50, collected from YouTube, are: Baseball Pitch, Basketball Shooting, Bench Press, Biking, Billiards Shot, Breaststroke, Clean and Jerk, Diving, Drumming, Fencing, Golf Swing, Playing Guitar, High Jump, Horse Race, Horse Riding, Hula Hoop, Javelin Throw, Juggling Balls, Jump Rope, Jumping Jack, Kayaking, Lunges, Military Parade, Mixing Batter, Nunchucks, Playing Piano, Pizza Tossing, Pole Vault, Pommel Horse, Pull Ups, Punch, Push Ups, Rock Climbing Indoor, Rope Climbing, Rowing, Salsa Spins, Skate Boarding, Skiing, Skijet, Soccer Juggling, Swing, Playing Tabla, TaiChi, Tennis Swing, Trampoline Jumping, Playing Violin, Volleyball Spiking, Walking with a Dog, and Yo Yo.

The data set can be downloaded by clicking here.

If you use this data set, please refer to the following paper:
Kishore K. Reddy and Mubarak Shah, Recognizing 50 Human Action Categories of Web Videos, Machine Vision and Applications Journal (MVAP), September 2012.

For questions regarding this data set, please contact Kishore Reddy.



Results on UCF50

If you happen to use UCF50, send us an email with the following details and we will update our webpage with your results.

Performance | Experimental Setup | Paper
76.90% | Leave One Group Out Cross-validation (25 cross-validations) | Reddy and Shah (MVAP), 2012
57.90% | 5-fold group-wise cross-validation | Sadanand and Corso (CVPR), 2012
76.40%* | Video-wise cross-validation (*Since videos belonging to a group are obtained from a single long video, similar videos can end up in both training and testing in "video-wise cross-validation", leading to high performance) | Sadanand and Corso (CVPR), 2012
81.03%* | 2/3 training and 1/3 testing for each class (*From the details given in the paper, we are not sure whether videos belonging to the same group are kept separate in the training and testing sets, and the paper does not give details on the number of cross-validations) | Todorovic (ECCV), 2012
73.70% | Leave One Group Out Cross-validation (25 cross-validations) | Solmaz, et al. (MVAP), 2012
72.60% | Leave One Group Out Cross-validation (25 cross-validations) | Kliper-Gross, et al. (ECCV), 2012

Note: It is very important to keep the videos belonging to the same group separate in training and testing. Since the videos in a group are obtained from a single long video, sharing videos from the same group between the training and testing sets would give artificially high performance.
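
The group-wise separation described in the note can be sketched in a few lines of Python. This is only an illustrative sketch, not the evaluation code used for any of the results above: the function name leave_one_group_out and the toy clip list are hypothetical, and group IDs are assumed to be available as one integer per clip. Each fold holds out every clip of one group for testing and trains on the rest, so with 25 groups this yields 25 cross-validations.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) once per distinct group.

    `groups` is a sequence with one group ID per clip (hypothetical input format).
    """
    by_group = defaultdict(list)
    for idx, group_id in enumerate(groups):
        by_group[group_id].append(idx)

    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        # Every clip whose group differs from the held-out group goes to training.
        train_idx = sorted(
            i for g, idxs in by_group.items() if g != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Hypothetical toy data: (clip name, group ID). Real data would have 25 groups.
clips = [("clip_a", 1), ("clip_b", 1), ("clip_c", 2), ("clip_d", 2), ("clip_e", 3)]
groups = [group_id for _, group_id in clips]

splits = list(leave_one_group_out(groups))
for held_out, train_idx, test_idx in splits:
    train_groups = {groups[i] for i in train_idx}
    test_groups = {groups[i] for i in test_idx}
    # No group may appear on both sides of a split.
    assert not (train_groups & test_groups)
    print(f"held-out group {held_out}: train={train_idx} test={test_idx}")
```

Splitting by clip index instead of by group would be the "video-wise" setup flagged in the table, where clips cut from the same long video can land in both the training and testing sets.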

Related Publication

Kishore K. Reddy and Mubarak Shah, Recognizing 50 Human Action Categories of Web Videos, Machine Vision and Applications Journal (MVAP), September 2012.