
CRCV | Center for Research in Computer Vision
University of Central Florida
4328 Scorpius St.
HEC 221
Orlando, FL 32816-2365
Phone: 407-823-1047
E-mail: chen.chen@crcv.ucf.edu
Website: https://www.crcv.ucf.edu/chenchen/
[Special Issue][Call for Papers] IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IEEE-JSTARS) is organizing a special issue on “Semantic Extraction and Fusion of Multimodal Remote Sensing Data: Algorithms and Applications”. Submission window: January 1, 2021 – June 30, 2021.
[Special Issue][Call for Papers] IEEE Journal of Biomedical and Health Informatics (IEEE-JBHI) is organizing a special issue on “Emerging IoT-driven Smart Health: from Cloud to Edge”. Submission deadline: March 1, 2021.
- Dr. Chen Chen is currently an Assistant Professor at the Center for Research in Computer Vision (CRCV), University of Central Florida. His main research interests are in the areas of computer vision, image and video processing, and machine learning.
- He was an Assistant Professor in the Department of Electrical and Computer Engineering at the University of North Carolina at Charlotte from August 2018 to June 2021.
- He held a Postdoctoral Research Associate position at the Center for Research in Computer Vision (CRCV), University of Central Florida, from July 2016 to June 2018.
- He received the Ph.D. degree in Electrical Engineering from The University of Texas at Dallas in May 2016. His advisor was Dr. Nasser Kehtarnavaz, and his co-advisor was Dr. Roozbeh Jafari at Texas A&M University. He received the M.S. degree in Electrical Engineering from Mississippi State University in 2012, where his thesis advisor was Dr. James E. Fowler.
- He received the David Daniel Fellowship Award (Best Doctoral Dissertation Award for ECS) from The University of Texas at Dallas in 2016.
- He is an Associate Editor for the following journals:
- IEEE Journal on Miniaturization for Air and Space Systems
- Journal of Real-Time Image Processing
- Signal, Image and Video Processing
- Sensors Journal
Principal Investigator
Chen Chen, PhD, Assistant Professor
Current Students
I am grateful for the opportunity to work with a group of exceptional students.
- Sijie Zhu
- Taojiannan Yang
- Ce Zheng
- Matias Mendieta
- Wenhan Wu
- Fatema Jannat
Former Students
Thesis -- "Object Detection in Aerial Imagery"
Thesis -- "Efficient Unsupervised Monocular Depth Estimation Using Attention Guided Generative Adversarial Network"
Project title -- "Recognizing Exercises and Counting Repetitions in Real Time"
Visiting Students/Scholars
Current/Admitted Students
If you are a graduate/undergraduate student at UCF, please feel free to email me (include your CV) to set up a meeting and explore the possibility of joining my group or doing a research project with me.
Prospective Students
I am always looking for highly motivated students to join my lab. Please apply directly to the CS graduate program and indicate my name in your application form and research statement. You will find PhD application information here. General graduate admission information can be found here.
Please contact me with your CV (PDF format) if you have a background in one or more of the following areas: computer vision, image and video processing, and deep learning. I will review your CV and possibly contact you to set up an appointment to discuss research and funding opportunities.
Research Sponsors
Current Research Projects
NSF: CNS Core: UbiVision: Ubiquitous Machine Vision with Adaptive Wireless Networking and Edge Computing
(Funding Agency: NSF; Award #: 1910844; Amount: $419,794; PI: Tao Han, Co-PI: Chen Chen; 10/2019 - 09/2022)
This project aims to realize an ambitious goal: ubiquitous machine vision (UbiVision), whose ultimate objective is to provide a platform that enables people from all over the world to share their smart cameras; it can be thought of as the Uber, Airbnb, or Mobike of smart cameras. For example, a person in New York City can “see” what is happening in Los Angeles via a wearable camera shared by another person located in Los Angeles. However, directly sharing the scenes captured by cameras would raise serious privacy issues. Moreover, the raw visual data may result in excessive traffic loads that congest the network and degrade system performance. To preserve privacy and reduce traffic loads, UbiVision performs visual data analysis on smart cameras and edge servers, which allows its customers to share only information extracted from camera scenes (e.g., how many people are queuing outside an Apple store for a new iPhone) or selected objects in the scene (e.g., a vagrant husky, for lost-and-found purposes). This project studies the enabling technologies for realizing UbiVision. The UbiVision framework consists of three main research tasks. In this framework, smart cameras, radio access networks, and edge servers are treated as infrastructure that can support multiple machine vision services through adaptive end-to-end, multi-domain resource orchestration. The PIs envision that a machine vision service provider (MVSP) will own and manage a virtual network consisting of a radio access network and edge servers, and will have access to ubiquitous cameras via camera-sharing agreements with camera owners. Under this scenario, MVSPs are challenged to dynamically manage highly coupled resources and functions across multiple technology domains: 1) camera functions such as image preprocessing and embedded machine vision; 2) network resources in the radio access network; and 3) computation resources and machine vision on the edge servers. To solve this problem, the PIs propose an interdisciplinary research project that integrates techniques and perspectives from wireless networking, computer vision, and edge computing in designing and optimizing UbiVision.
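To make the privacy-preserving data flow concrete, here is a minimal Python sketch of the idea (not the project's actual implementation): the device analyzes frames locally, and only a derived report, such as a person count, ever leaves it. The `detect_people` stub and `SharedReport` type are hypothetical names.

```python
# Minimal sketch: raw frames stay on the camera; only derived,
# privacy-safe information is shared with the service.
from dataclasses import dataclass

@dataclass
class SharedReport:
    camera_id: str
    person_count: int  # derived information that is safe to share

def detect_people(frame):
    """Hypothetical on-device detector; returns one bounding box per person."""
    raise NotImplementedError  # e.g., an embedded detector on the smart camera

def process_frame(camera_id: str, frame) -> SharedReport:
    boxes = detect_people(frame)                # raw pixels never leave the device
    return SharedReport(camera_id, len(boxes))  # only the count is transmitted
```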
Papers:
MLWiNS: Democratizing AI through Multi-Hop Federated Learning Over-the-Air
(Funding Agency: NSF & Intel Corporation; Award #: 2003198; Amount: $446,667 (NSF) + $223,333 (Intel); PI: Pu Wang, Co-PIs: Chen Chen, Minwoo Lee, Mohsen Dorodchi; 07/2020 - 06/2023)
Federated learning (FL) has emerged as a key technology for enabling next-generation privacy-preserving AI at scale, where a large number of edge devices, e.g., mobile phones, collaboratively learn a shared global model while keeping their data local to prevent privacy leakage. Enabling FL over wireless multi-hop networks, such as wireless community mesh networks and wireless Internet over satellite constellations, can not only augment AI experiences for urban mobile users, but also democratize AI and make it accessible in a low-cost manner to everyone, including people in low-income communities, rural areas, under-developed regions, and disaster areas. The overall objective of this project is to develop a novel wireless multi-hop FL system with guaranteed stability, high accuracy, and fast convergence. This project is expected to advance the design of distributed deep learning (DL) systems, to promote the understanding of the strong synergy between distributed computing and distributed networking, and to bridge the gap between the theoretical foundations of distributed DL and its real-life applications. The project will also provide unique interdisciplinary training opportunities for graduate and undergraduate students through both research work and related courses that the PIs will develop and offer.
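For readers unfamiliar with FL, the sketch below shows the standard FedAvg aggregation step that such systems build on. It is a generic NumPy illustration under our own naming, not the multi-hop over-the-air algorithm developed in this project.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Data-size-weighted averaging of client models (the FedAvg step).

    client_weights: one list of parameter arrays per client.
    client_sizes: number of local training samples per client.
    """
    total = float(sum(client_sizes))
    return [
        sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        for layer in range(len(client_weights[0]))
    ]
```

In each communication round, clients train locally on their own data, and the server (here, potentially reached over multiple wireless hops) aggregates their updates with this rule.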
Papers:
Cross-View Image Geo-localization
Cross-view image geo-localization aims to determine the locations of street-view query images by matching them against GPS-tagged reference images from the aerial view. In scenarios where the GPS signal is noisy, image geo-localization can provide additional information to achieve fine-grained localization. Street-to-aerial geo-localization has also proven effective for city-scale street navigation. These practical applications make cross-view image geo-localization an important and attractive research problem in the computer vision community. Existing works assume that 1) the alignment between street and aerial views is available, and 2) each query ground-view image has one corresponding reference aerial-view image whose center is exactly aligned at the location of the query image. In practice, however, these assumptions may not hold. This project studies three problems: 1) how the alignment information affects the retrieval model's performance; 2) how to effectively improve retrieval performance without assuming that the inference image pairs are aligned; and 3) how to effectively perform geo-localization in a more realistic setting that breaks the one-to-one correspondence.
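As background, here is a minimal sketch of the retrieval step this line of work shares: embed the street-view query and the GPS-tagged aerial references (typically with a two-branch network, assumed given here), then return the GPS tag of the most similar reference. All names are ours.

```python
import numpy as np

def geolocalize(query_emb, ref_embs, ref_gps):
    """Nearest-neighbor retrieval by cosine similarity.

    query_emb: (d,) embedding of the street-view query image.
    ref_embs: (N, d) embeddings of the aerial reference images.
    ref_gps: list of N GPS tags, one per reference image.
    """
    q = query_emb / np.linalg.norm(query_emb)
    r = ref_embs / np.linalg.norm(ref_embs, axis=1, keepdims=True)
    return ref_gps[int(np.argmax(r @ q))]  # GPS tag of the best match
```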
Papers:
Human Motion Analysis (Action Recognition, Detection, Segmentation, and Prediction; 2D/3D Pose Estimation)
Visual analysis of human motion is one of the most active research topics in computer vision. This strong interest is driven by a wide spectrum of promising applications in areas such as smart surveillance, human-computer interaction, augmented reality (AR), and virtual reality (VR). Human motion analysis concerns the detection, tracking, and recognition of people and, more generally, the understanding of human behaviors from sensor data (e.g., images and videos) involving humans. We aim to develop novel AI algorithms that analyze human motion at the levels of actions, intentions, and skills, in order to study augmented human abilities. Papers:
COPYRIGHT: The copyright of the following materials belongs to the corresponding publishers. They are provided only for research and educational use that does not conflict with the publishers' interests.
The h5-index is the h-index for articles published in the last 5 complete years. According to Google Scholar Metrics, several conferences in computer vision and machine learning are ranked in the top 100 in the h5-index rankings:
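As a quick illustration of the definition, here is a minimal h-index computation in Python (the function name and example numbers are ours); for the h5-index, pass only the citation counts of articles from the last five complete years.

```python
def h_index(citations):
    """Largest h such that h articles have at least h citations each."""
    counts = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(counts, start=1) if c >= rank)

# Example: five articles cited [10, 8, 5, 4, 3] times -> h-index of 4.
assert h_index([10, 8, 5, 4, 3]) == 4
```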
Preprint
- MutualNet: Adaptive ConvNet via Mutual Learning from Different Model Configurations
- 3D Human Pose Estimation with Spatial and Temporal Transformers
- Consistency-based Active Learning for Object Detection
- Degrade is Upgrade: Learning Degradation for Low-light Image Enhancement
- A Dataset and Benchmark for Malaria Life-Cycle Classification in Thin Blood Smear Images
- Deep Learning-Based Human Pose Estimation: A Survey
- A3D: Adaptive 3D Networks for Video Action Recognition
2021
- Visual Explanation for Deep Metric Learning
- TransBTS: Multimodal Brain Tumor Segmentation Using Transformer
- VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval
- Learning Normal Dynamics in Videos with Meta Prototype Network
- BDANet: Multiscale Convolutional Neural Network with Cross-directional Attention for Building Damage Assessment from Satellite Images
- ArcNet: Series AC Arc Fault Detection Based on Raw Current and Convolutional Neural Network
- Enhanced 3D Human Pose Estimation from Videos by Using Attention-Based Neural Network with Dilated Convolutions
- Efficient Unsupervised Monocular Depth Estimation Using Attention Guided Generative Adversarial Network
- Bilateral Attention Decoder: A Lightweight Decoder for Real-time Semantic Segmentation
- Multi-level Memory Compensation Network for Rain Removal via Divide-and-Conquer Strategy
- Revisiting Street-to-Aerial View Image Geo-localization and Orientation Estimation
- Towards Resolving the Challenge of Long-tail Distribution in UAV Images for Object Detection
2020
- Decomposition Makes Better Rain Removal: An Improved Attention-guided Deraining Network
- GradAug: A New Regularization Method for Deep Neural Networks
- Cross-directional Feature Fusion Network for Building Damage Assessment from Satellite Imagery
- Detecting Plant Invasion in Urban Parks with Aerial Image Time Series and Residual Neural Network
- NDVI-Net: A Fusion Network for Generating High-Resolution Normalized Difference Vegetation Index in Remote Sensing
- Efficient Deep Learning of Non-local Features for Hyperspectral Image Classification
- MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution
- FedAir: Towards Multi-hop Federated Learning Over-the-Air
- Video Anomaly Detection for Smart Surveillance
- Action Recognition in Real-World Videos
- Pan-GAN: An Unsupervised Learning Method for Pan-sharpening Using a Generative Adversarial Network
- CP-NAS: Child-Parent Neural Architecture Search for 1-bit CNNs
- Density Map Guided Object Detection in Aerial Image
- Attention Mechanism Exploits Temporal Contexts: Real-time 3D Human Pose Reconstruction
- Multi-Scale Progressive Fusion Network for Single Image Deraining
- Automated Monitoring for Security Camera Networks: Promise from Computer Vision Labs
2019
- Real-Time Dense Semantic Labeling with Dual-Path Framework for High-Resolution Remote Sensing Image
- Infrared and Visible Image Fusion via Detail Preserving Adversarial Learning
- SmokeNet: Satellite Smoke Scene Detection Using Convolutional Neural Network with Spatial and Channel-Wise Attention
- 3D Dilated Multi-Fiber Network for Real-time Brain Tumor Segmentation in MRI
- An Efficient 3D CNN for Action/Object Segmentation in Video
- Deep Manifold Structure Transfer for Action Recognition
- S3D: Scalable Pedestrian Detection via Score Scale Surface Discrimination
- GEOCAPSNET: Ground to Aerial View Image Geo-localization Using Capsule Network
- Joint Dynamic Pose Image and Space Time Reversal for Human Action Recognition from Videos
- Calibrated Stochastic Gradient Descent for Convolutional Neural Networks
- Semi-Supervised Discriminant Multi-Manifold Analysis for Action Recognition
2018
- Hyperspectral Image Classification in the Presence of Noisy Labels
- JRTIP Special Issue Editorial: Special Issue on Advances in Real-Time Image Processing for Remote Sensing
- An End-to-end 3D Convolutional Neural Network for Action Detection and Segmentation in Videos
- SLR: Semi-coupled Locality Constrained Representation for Very Low Resolution Face Recognition and Super Resolution
- Memory Attention Networks for Skeleton-based Action Recognition
- Real-world Anomaly Detection in Surveillance Videos
- Gabor Convolutional Networks
- One-two-one Networks for Compression Artifacts Reduction in Remote Sensing
- Multispectral Satellite Image Denoising via Adaptive Cuckoo Search-Based Wiener Filter
- SuperPCA: A Superpixelwise Principal Component Analysis Approach for Unsupervised Feature Extraction of Hyperspectral Imagery
2017
- Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos
- Cross-View Image Matching for Geo-localization in Urban Environments
- Binary Coding for Partial Action Analysis with Limited Observation Ratios
- Manifold Constrained Low-Rank Decomposition
- Multi-Temporal Depth Motion Maps-Based Local Binary Patterns for 3-D Human Action Recognition
- Robust 3D Action Recognition through Sampling Local Appearances and Global Distributions
- Latent Constrained Correlation Filter
- Person Re-identification via Discrepancy Matrix and Matrix Metric
- Image Reconstruction via Manifold Constrained Convolutional Sparse Coding for Image Sets
- Fusing Local and Global Features for High-Resolution Scene Classification
- Enhanced Skeleton Visualization for View Invariant Human Action Recognition
- Action Recognition Using 3D Histograms of Texture and A Multi-class Boosting Classifier
- 3D Action Recognition Using Multi-scale Energy-based Global Ternary Image
- Spatial-Aware Collaborative Representation for Hyperspectral Remote Sensing Image Classification
- Single Image Super-Resolution via Locally Regularized Anchored Neighborhood Regression and Nonlocal Means
- Noise Robust Face Image Super-resolution through Smooth Sparse Representation
- Output Constraint Transfer for Kernelized Correlation Filter in Tracking
- SRLSP: A Face Image Super-Resolution Algorithm Using Smooth Regression with Local Structure Prior
- Action Recognition from Depth Sequences Using Weighted Fusion of 2D and 3D Auto-Correlation of Gradients Features
- A Survey of Depth and Inertial Sensor Fusion for Human Action Recognition
- Real-time Continuous Action Detection and Recognition Using Depth Images and Inertial Signals
- Weighted Fusion of Depth and Inertial Data to Improve View Invariance for Real-time Human Action Recognition
2016
- Remote Sensing Image Scene Classification Using Multi-scale Completed Local Binary Patterns and Fisher Vectors
- Scene Classification Using Local and Global Features with Collaborative Representation Fusion
- A Real-Time Human Action Recognition System Using Depth and Inertial Sensor Fusion
- Land-Use Scene Classification Using Multi-Scale Completed Local Binary Patterns
- Real-Time Human Action Recognition Based on Depth Motion Maps
- 3D Action Recognition Using Multi-temporal Depth Motion Maps and Fisher Vector
- Energy-Based Global Ternary Image for Action Recognition Using Sole Depth Sequences
- A Computationally Efficient Denoising and Hole-filling Method for Depth Image Enhancement
- Fusion of Depth, Skeleton, and Inertial Data for Human Action Recognition
- Hyperspectral Image Classification Using Set-to-Set Distance
- L1-L1 Norms for Face Super-Resolution with Mixed Gaussian-Impulse Noise
2015
- Local Binary Patterns and Extreme Learning Machine for Hyperspectral Imagery Classification
- Improving Human Action Recognition Using Fusion of Depth Camera and Inertial Sensors
- Gradient Local Auto-Correlations and Extreme Learning Machine for Depth-Based Activity Recognition
- UTD-MHAD: A Multimodal Dataset for Human Action Recognition Utilizing a Depth Camera and a Wearable Inertial Sensor
- Gabor-Filtering-Based Completed Local Binary Patterns for Land-Use Scene Classification
- Action Recognition from Depth Sequences Using Depth Motion Maps-based Local Binary Patterns
2014 and Before
- Fusion of Inertial and Depth Sensor Data for Robust Hand Gesture Recognition
- Spectral-Spatial Classification of Hyperspectral Image Based on Kernel Extreme Learning Machine
- Spectral-Spatial Preprocessing Using Multihypothesis Prediction for Noise-Robust Hyperspectral Image Classification
- Reconstruction of Hyperspectral Imagery from Random Projections Using Multihypothesis Prediction
- Multi-HMM Classification for Hand Gesture Recognition Using Two Differing Modality Sensors
- Home-based Senior Fitness Test Measurement System Using Collaborative Inertial and Depth Sensors
- A Medication Adherence Monitoring System for Pill Bottles Based on a Wearable Inertial Sensor
- Compressed-Sensing Recovery of Images and Video Using Multihypothesis Predictions
- Wearable Medication Adherence Monitoring
- Fusion of Inertial and Depth Sensors for Movement Measurements and Recognition
Video Anomaly Detection Dataset
The UCF-Crime dataset is a new large-scale dataset, the first of its kind, containing 128 hours of video. It consists of 1,900 long, untrimmed real-world surveillance videos with 13 realistic anomalies: Abuse, Arrest, Arson, Assault, Road Accident, Burglary, Explosion, Fighting, Robbery, Shooting, Stealing, Shoplifting, and Vandalism. These anomalies were selected because they have a significant impact on public safety. The dataset can be used for two tasks: first, general anomaly detection, treating all anomalies as one group and all normal activities as another; second, recognizing each of the 13 anomalous activities.
Real-world Anomaly Detection in Surveillance Videos. Waqas Sultani, Chen Chen, Mubarak Shah. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. [Paper] [Video Presentation] [Project Website] [Note] [Code] [Download the dataset] (Note: the "Anomaly_Train.txt" file in the zip file is corrupted; please download it here instead: Anomaly_Train.txt.) Option 2: download the dataset from Dropbox (multiple files): Link
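For the first task (binary anomaly detection), here is a hedged Python sketch of deriving labels from the 13 anomaly classes listed above; the file-naming convention assumed here is illustrative, not the dataset's official loader.

```python
# Map the 13 anomaly classes to label 1 and normal videos to label 0.
# Assumption (hypothetical): anomalous video files are named after their class.
ANOMALY_CLASSES = {
    "Abuse", "Arrest", "Arson", "Assault", "RoadAccident", "Burglary",
    "Explosion", "Fighting", "Robbery", "Shooting", "Stealing",
    "Shoplifting", "Vandalism",
}

def binary_label(video_name: str) -> int:
    """1 if the file name starts with an anomaly class name, else 0 (normal)."""
    return int(any(video_name.startswith(cls) for cls in ANOMALY_CLASSES))
```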
Satellite Smoke Scene Detection Dataset
One important challenge in detecting fire smoke in satellite imagery is distinguishing it from similar-looking disasters and varied land covers. Commonly used smoke detection methods mainly focus on discriminating smoke from a few specific classes, which limits their applicability across regions with diverse land-cover classes. In addition, no satellite remote sensing smoke detection dataset has been available so far. To this end, we constructed the USTC_SmokeRS dataset, which integrates additional smoke-like aerosol classes and land covers, for example cloud, dust, haze, bright surfaces, lakes, seaside, and vegetation. The USTC_SmokeRS dataset contains a total of 6,225 RGB images from six classes: cloud, dust, haze, land, seaside, and smoke. Each image is saved in ".tif" format at a size of 256 × 256 pixels.
SmokeNet: Satellite Smoke Scene Detection Using Convolutional Neural Network with Spatial and Channel-Wise Attention. Rui Ba, Chen Chen, Jiang Yuan, Weiguo Song, Siuming Lo. Remote Sensing, 2019. [Paper] [Project Website] [Download Dataset from Google Drive] [Download Dataset from OneDrive] [Download Dataset from Baidu Pan (download password: 5dlk)]
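A minimal loading sketch, assuming one subdirectory per class under the dataset root; the actual archive layout may differ.

```python
# Iterate over the six USTC_SmokeRS classes and yield (image, label) pairs.
from pathlib import Path
from PIL import Image

CLASSES = ["cloud", "dust", "haze", "land", "seaside", "smoke"]

def load_smokers(root):
    for label, cls in enumerate(CLASSES):
        for tif in sorted(Path(root, cls).glob("*.tif")):
            yield Image.open(tif).convert("RGB"), label  # 256 x 256 RGB image
```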
Cross-View Geolocalization Dataset
The UCF cross-view geolocalization dataset was created for the geo-localization task using cross-view image matching. The dataset contains street-view and bird's-eye-view image pairs around downtown Pittsburgh, Orlando, and part of Manhattan. There are 1,586, 1,324, and 5,941 GPS locations in Pittsburgh, Orlando, and Manhattan, respectively. We used DualMaps to generate side-by-side street-view and bird's-eye-view images at each GPS location with the same heading direction. The street-view images are from Google, and the overhead 45-degree bird's-eye-view images are from Bing. For each GPS location, four image pairs were generated with camera heading directions of 0, 90, 180, and 270 degrees. In order to learn a deep network for building matching, we annotated the corresponding buildings in every street-view and bird's-eye-view image pair.
Cross-View Image Matching for Geo-localization in Urban Environments. Yicong Tian, Chen Chen, Mubarak Shah. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. [Paper] [Project (Download Cross-view dataset and code)]
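To illustrate the dataset's structure, here is a small sketch enumerating the four aligned pairs per GPS location; the file-naming scheme shown is hypothetical.

```python
# Each GPS location yields four street/bird's-eye pairs, one per heading.
HEADINGS = (0, 90, 180, 270)  # camera heading directions in degrees

def image_pairs(location_id: str):
    for heading in HEADINGS:
        street = f"{location_id}_street_{heading}.jpg"    # Google street view
        aerial = f"{location_id}_birdseye_{heading}.jpg"  # Bing 45-degree view
        yield street, aerial
```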
UTD-MHAD Dataset
The UTD-MHAD dataset was collected as part of our research on human action recognition using fusion of depth and inertial sensor data. The objective of this research has been to develop algorithms for more robust human action recognition using fusion of data from differing-modality sensors. The dataset consists of 27 different actions: (1) right arm swipe to the left, (2) right arm swipe to the right, (3) right hand wave, (4) two hand front clap, (5) right arm throw, (6) cross arms in the chest, (7) basketball shoot, (8) right hand draw x, (9) right hand draw circle (clockwise), (10) right hand draw circle (counter clockwise), (11) draw triangle, (12) bowling (right hand), (13) front boxing, (14) baseball swing from right, (15) tennis right hand forehand swing, (16) arm curl (two arms), (17) tennis serve, (18) two hand push, (19) right hand knock on door, (20) right hand catch an object, (21) right hand pick up and throw, (22) jogging in place, (23) walking in place, (24) sit to stand, (25) stand to sit, (26) forward lunge (left foot forward), (27) squat (two arms stretch out).
UTD-MHAD: A Multimodal Dataset for Human Action Recognition Utilizing a Depth Camera and a Wearable Inertial Sensor. Chen Chen, Roozbeh Jafari, Nasser Kehtarnavaz. IEEE International Conference on Image Processing (ICIP), 2015. [Paper] [UTD Multimodal Human Action Dataset Website]
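For convenience, here are the 27 action labels above as a Python mapping (taken verbatim from the list; useful, e.g., for converting predicted class indices to action names).

```python
# The 27 UTD-MHAD action labels, indexed as in the dataset description.
UTD_MHAD_ACTIONS = {
    1: "right arm swipe to the left", 2: "right arm swipe to the right",
    3: "right hand wave", 4: "two hand front clap", 5: "right arm throw",
    6: "cross arms in the chest", 7: "basketball shoot",
    8: "right hand draw x", 9: "right hand draw circle (clockwise)",
    10: "right hand draw circle (counter clockwise)", 11: "draw triangle",
    12: "bowling (right hand)", 13: "front boxing",
    14: "baseball swing from right", 15: "tennis right hand forehand swing",
    16: "arm curl (two arms)", 17: "tennis serve", 18: "two hand push",
    19: "right hand knock on door", 20: "right hand catch an object",
    21: "right hand pick up and throw", 22: "jogging in place",
    23: "walking in place", 24: "sit to stand", 25: "stand to sit",
    26: "forward lunge (left foot forward)", 27: "squat (two arms stretch out)",
}
```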
University of Central Florida (UCF)
- CAP4453 - Robot Vision (44 students, online course)
UNC-Charlotte
- ECGR 6090/8090 - Special Topics: Deep Learning in Computer Vision
- ECGR 3090/4090 - Introduction to Machine Learning (new course developed at UNCC)
- ECGR 6119/8119 - Applied Artificial Intelligence
- ECGR 4124/5124 - Digital Signal Processing
- ECGR 6090/8090 - Special Topics: Deep Learning in Computer Vision
- ECGR 6119/8119 - Applied Artificial Intelligence
- ECGR 6090/8090 - Special Topics: Deep Learning in Computer Vision (new course developed at UNCC)