Florida Wildlife Camera Trap Dataset



Crystal Gagne, Jyoti Kini, Daniel Smith, Mubarak Shah, Florida Wildlife Camera Trap Dataset, IEEE Conference on Computer Vision and Pattern Recognition, CV4Animals: Computer Vision for Animal Behavior Tracking and Modeling Workshop, 2021.


Trail camera imagery has increasingly gained popularity amongst biologists for conservation and ecological research. Minimal human interference required to operate camera traps allows capturing unbiased species activities. Several studies – based on human and wildlife interactions, migratory patterns of various species, risk of extinction in endangered populations – are limited by the lack of rich data and the time-consuming nature of manually annotating trail camera imagery. We introduce a challenging wildlife camera trap classification dataset collected from two different locations in Southwestern Florida, consisting of 104,495 images featuring visually similar species, varying illumination conditions, skewed class distribution, and including samples of endangered species, i.e. Florida panthers. Experimental evaluations with ResNet-50 architecture indicate that this image classification-based dataset can further push the advancements in wildlife statistical modeling.

Problem & Motivation

Trail camera imagery has a wide variety of biological and ecological applications in conservation. Wildlife populations can be monitored via camera trap studies, yet processing the data generated in these studies is an enormous task. Recent studies have demonstrated that deep neural networks can automatically identify images of different wildlife species with accuracy on par with human volunteers, saving more than 8.4 years of volunteer labeling effort (at 40 hours per week). AI can dramatically improve the speed and efficiency of species identification and data management. However, contemporary models used to classify trail camera images do not perform well on images from a location other than the one represented in the training data, which implies that the models do not transfer across locations.

  • We propose a challenging real-world dataset to promote development in this area, featuring visually similar species, varying illumination conditions, and a skewed class distribution.
  • The dataset includes samples of endangered Florida panthers and exhibits the potential to be invaluable to the conservation of this rare species.
  • We train a preliminary model on the dataset to illustrate the usability of the data to the research community.

Dataset Details

Our dataset is composed of 104,495 images from our two trail camera study locations combined: Corkscrew Swamp (Corkscrew) and Okaloacoochee Slough State Forest (OKSSF). The dataset contains over 2,500 images of Florida panthers. The survey locations were chosen for their proximity to major roads and the relatively high level of panther activity in these areas. Figure 1 displays images from the dataset. The first two columns show samples from the Corkscrew location, while the next set of columns contains samples from the OKSSF location.


  • Table 1: A comprehensive list providing the number of images collected per species from two different locations in our dataset.
  • Figure 2: Location-based distribution of images per species.
  • Figure 3: Specimen images depicting the highlighted challenges associated with the dataset: (1) Motion blur, (2) Occlusion, (3) Camouflage, (4) Illumination issues, (5) Excessive proximity of the animals to the camera traps making it impossible to identify salient features, (6) Imbalance in the dataset due to substantial difference in the numbers of images for classes such as cattle/deer, (7/8) Multiple images with hardly any difference captured by continuous camera triggers, and (9) View-point based deception resulting in bobcats appearing similar to panthers.

Proposed Baseline

We train a baseline model using the ResNet-50 architecture. The dataset of 104,495 samples is split into train, validation, and test sets in a 70%-10%-20% proportion. We also employ random cropping, horizontal flipping, and rotation for data augmentation. Images resized to 224 × 224 px are processed through the model with a batch size of 64 and a learning rate of 1e-3. Our ResNet-based model achieves a baseline test accuracy of 78.75%.
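
To make the 70%-10%-20% split concrete, the following is a minimal sketch of how 104,495 sample indices could be partitioned. It assumes a plain random shuffle with a fixed seed; the source does not state whether the actual split was random, stratified by species, or grouped by camera location, and the function name `split_indices` and its parameters are hypothetical.

```python
import random


def split_indices(n_samples, train_pct=70, val_pct=10, seed=0):
    """Shuffle indices 0..n_samples-1 and cut them into train/val/test lists.

    Assumption: a simple random split; the test set receives whatever
    remains after the train and validation cuts (roughly 20%).
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility

    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = n_samples * train_pct // 100
    n_val = n_samples * val_pct // 100

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(104_495)
print(len(train_idx), len(val_idx), len(test_idx))  # 73146 10449 20900
```

Because 104,495 is not divisible by 10, the three parts cannot be exactly 70%, 10%, and 20%; in this sketch the remainder falls into the test set.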

  • Figure 4: Confusion matrices illustrating results on the test data for 22 classes.
  • Figure 5: t-SNE visualization of randomly initialized features (left) and learned embeddings (right) for test images in the dataset.
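
As an illustration of how a confusion matrix such as the one in Figure 4 relates to the reported test accuracy, here is a small sketch in plain Python. The toy labels and the three-class setup are invented for the example and are not taken from the dataset; per-class recall is included as an assumption about what is useful to inspect when the class distribution is skewed, not as a metric reported by the authors.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def per_class_recall(matrix):
    """Recall per true class; None for classes with no samples."""
    recalls = []
    for i, row in enumerate(matrix):
        support = sum(row)
        recalls.append(row[i] / support if support else None)
    return recalls


# Toy labels for three hypothetical classes (not real dataset labels).
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 0, 2, 2, 2, 1]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)                    # [[3, 1, 0], [1, 1, 0], [0, 1, 3]]
print(overall_accuracy(cm))  # 0.7
print(per_class_recall(cm))  # [0.75, 0.5, 0.75]
```

With imbalanced classes, overall accuracy is dominated by the most frequent species, so a rare class can have low recall even when the overall figure looks reasonable.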