Dataset
All data are for research purposes only, unless stated otherwise. Please cite the authors properly when using the data.
Video Anomaly Detection Dataset
The UCF-Crime dataset is a large-scale, first-of-its-kind dataset of 128 hours of video. It consists of 1,900 long, untrimmed
real-world surveillance videos covering 13 realistic anomalies: Abuse, Arrest, Arson, Assault, Road Accident, Burglary,
Explosion, Fighting, Robbery, Shooting, Stealing, Shoplifting, and Vandalism. These anomalies were selected because they have a significant impact on public safety.
The dataset can be used for two tasks: first, general anomaly detection, with all anomalies in one group and all normal activities in another;
second, recognition of each of the 13 anomalous activities.
Real-world Anomaly Detection in Surveillance Videos
Waqas Sultani,
Chen Chen,
Mubarak Shah
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018
[Paper]
[Video Presentation]
[Project Website]
[Note]
[Code]
[Download the dataset: copy this URL: www.crcv.ucf.edu/data1/chenchen/UCF_Crimes.zip]
(Note: The "Anomaly_Train.txt" file in the zip file is corrupted; please download it here: Anomaly_Train.txt)
Option 2: Download the dataset from Dropbox (multiple files): Link
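As a minimal illustration of the two tasks described above, the sketch below derives a binary (anomaly vs. normal) label and a 13-way activity label from the train split file. It assumes each line of Anomaly_Train.txt is a relative video path whose top-level folder names the class, and that normal videos sit under a folder whose name contains "Normal"; verify these assumptions against the downloaded archive.

```python
# Hedged sketch: parse the train split into labels for the two tasks.
# Assumption: each line looks like "Arson/Arson001_x264.mp4", and normal
# videos live under a folder whose name contains "Normal".
from pathlib import Path

def load_split(split_file="Anomaly_Train.txt"):
    samples = []
    for line in Path(split_file).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        folder = line.split("/")[0]
        is_anomaly = "Normal" not in folder          # Task 1: binary label
        activity = folder if is_anomaly else None    # Task 2: 13-way label
        samples.append({"path": line, "is_anomaly": is_anomaly, "activity": activity})
    return samples

if __name__ == "__main__":
    train = load_split()
    print(len(train), "training videos,",
          sum(s["is_anomaly"] for s in train), "anomalous")
```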
VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval
Cross-view image geo-localization aims to determine the locations of street-view query images by matching them with GPS-tagged reference images from the aerial view. Recent works have achieved surprisingly high retrieval accuracy on city-scale datasets. However, these results rely on the assumption that there exists a reference image exactly centered at the location of any query image, which is not applicable for practical scenarios. In this paper, we redefine this problem with a more realistic assumption that the query image can be arbitrary in the area of interest and the reference images are captured before the queries emerge. This assumption breaks the one-to-one retrieval setting of existing datasets, as the queries and reference images are not perfectly aligned pairs and there may be multiple reference images covering one query location. To bridge the gap between this realistic setting and existing datasets, we propose a new large-scale benchmark, VIGOR, for cross-View Image Geo-localization beyond One-to-one Retrieval. We benchmark existing state-of-the-art methods and propose a novel end-to-end framework to localize the query in a coarse-to-fine manner. Apart from the image-level retrieval accuracy, we also evaluate the localization accuracy in terms of the actual distance (meters) using the raw GPS data.
VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval
Sijie Zhu, Taojiannan Yang, Chen Chen
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021
[Paper]
[Dataset and Code]
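Since the paper evaluates localization accuracy in meters using raw GPS data, the following sketch shows one standard way to turn a predicted and a ground-truth latitude/longitude pair into a ground distance: the haversine formula. It is an illustrative snippet, not the authors' evaluation code.

```python
# Sketch: meter-level error between a predicted and a ground-truth GPS point
# via the haversine (great-circle) formula. Illustrative only.
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Example: error between a predicted location and a query's true GPS tag.
print(round(haversine_m(40.4406, -79.9959, 40.4410, -79.9952), 1), "m")
```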
Cross-View Geolocalization Dataset
The UCF cross-view geolocalization dataset was created for the geo-localization task using cross-view image matching.
The dataset has street-view and bird's-eye-view image pairs around downtown Pittsburgh,
Orlando, and part of Manhattan.
There are 1,586, 1,324, and 5,941 GPS locations in Pittsburgh, Orlando, and Manhattan, respectively.
We use DualMaps to generate side-by-side street-view and bird's-eye-view images at each GPS location with the same heading direction.
The street-view images are from Google, and the overhead 45-degree bird's-eye-view images are from Bing.
For each GPS location, four image pairs are generated with camera heading directions of 0, 90, 180, and 270 degrees.
To learn the deep network for building matching, we annotate corresponding buildings in every street-view and bird's-eye-view image pair.
Cross-View Image Matching for Geo-localization in Urban Environments
Yicong Tian, Chen Chen, Mubarak Shah
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
[Paper]
[Project (Download Cross-view dataset and code)]
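A small sketch of the bookkeeping implied by the description above: each GPS location yields four street-view / bird's-eye-view pairs, one per heading. The identifier format below is hypothetical and only illustrates the enumeration.

```python
# Sketch: enumerate (city, location, heading) combinations that define the
# street-view / bird's-eye-view pairs. Identifier naming is hypothetical.
from itertools import product

GPS_LOCATIONS = {"Pittsburgh": 1586, "Orlando": 1324, "Manhattan": 5941}
HEADINGS = (0, 90, 180, 270)  # camera heading in degrees, shared by both views

def pair_ids():
    for city, n_locations in GPS_LOCATIONS.items():
        for loc, heading in product(range(n_locations), HEADINGS):
            yield f"{city}_{loc:05d}_{heading:03d}"

total_pairs = sum(n * len(HEADINGS) for n in GPS_LOCATIONS.values())
print(total_pairs, "image pairs in total")  # (1,586 + 1,324 + 5,941) * 4 = 35,404
```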
UTD-MHAD Dataset
The UTD-MHAD dataset was collected as part of our research on human action recognition using fusion of depth and inertial sensor data.
The objective of this research has been to develop algorithms for more robust human action recognition by fusing data from sensors of differing modalities.
The UTD-MHAD dataset consists of 27 different actions: (1) right arm swipe to the left, (2) right arm swipe to the right, (3) right hand wave, (4) two hand front clap, (5) right arm throw, (6) cross arms in the chest, (7) basketball shoot, (8) right hand draw x, (9) right hand draw circle (clockwise), (10) right hand draw circle (counter clockwise), (11) draw triangle, (12) bowling (right hand), (13) front boxing, (14) baseball swing from right, (15) tennis right hand forehand swing, (16) arm curl (two arms), (17) tennis serve, (18) two hand push, (19) right hand knock on door,
(20) right hand catch an object, (21) right hand pick up and throw, (22) jogging in place, (23) walking in place, (24) sit to stand, (25) stand to sit, (26) forward lunge (left foot forward), (27) squat (two arms stretch out).
UTD-MHAD: A Multimodal Dataset for Human Action Recognition Utilizing a Depth Camera and a Wearable Inertial Sensor
Chen Chen, Roozbeh Jafari, Nasser Kehtarnavaz
IEEE International Conference on Image Processing (ICIP), 2015
[Paper]
[UTD Multimodal Human Action Dataset Website]
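A hedged loading sketch for the dataset above: shorthand labels for the 27 action classes (index i corresponds to action i + 1 in the list), plus a loader for one inertial-sensor recording. The shorthand labels, the file naming pattern, and the .mat variable name are assumptions about the released files; check the dataset website before relying on them.

```python
# Hedged sketch: shorthand labels for the 27 actions and a loader for one
# inertial recording. The file layout ("aA_sS_tT_inertial.mat") and the
# "d_iner" variable name are assumptions; verify against the released data.
from scipy.io import loadmat

ACTIONS = [
    "swipe_left", "swipe_right", "wave", "clap", "throw", "arm_cross",
    "basketball_shoot", "draw_x", "draw_circle_cw", "draw_circle_ccw",
    "draw_triangle", "bowling", "boxing", "baseball_swing", "tennis_swing",
    "arm_curl", "tennis_serve", "push", "knock", "catch", "pickup_throw",
    "jog", "walk", "sit_to_stand", "stand_to_sit", "lunge", "squat",
]  # ACTIONS[i] is shorthand for action (i + 1) in the list above

def load_inertial(action_id, subject, trial, root="UTD-MHAD/Inertial"):
    """Load one accelerometer/gyroscope recording (assumed file layout)."""
    path = f"{root}/a{action_id}_s{subject}_t{trial}_inertial.mat"
    return loadmat(path)["d_iner"]  # assumed key holding the 6-axis signal

print(len(ACTIONS), "action classes")
```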