Fully Convolutional Deep Neural Networks for Persistent Multi-Frame Multi-Object Detection in Wide Area Aerial Videos
Multiple object detection in wide area aerial videos, has drawn the attention of the computer vision research community for a number of years. A novel framework is proposed in this paper using a fully convolutional deep neural network, which is able to detect all objects simultaneously for a given region of interest. The network is designed to accept multiple video frames at a time as the input and yields detection results for all objects in the temporally center frame. This multi-frame approach yield far better results than its single frame counterpart. Additionally, the proposed method can detect vehicles which are slowing, stopped, and/or partially or fully occluded during some frames, which cannot be handled by nearly all stateof- the-art methods. To the best of our knowledge, this is the first use of a multiple-frame, fully convolutional deep model for detecting multiple small objects and the only framework which can detect stopped and temporarily occluded vehicles, for aerial videos. The proposed network exceeds stateof- the-art results significantly on WPAFB 2009 dataset.
Rodney LaLonde, Dong Zhang, Mubarak Shah, Fully Convolutional Deep Neural Networks for Persistent Multi-Frame Multi-Object Detection in Wide Area Aerial Videos, Cornell University Library, arXiv:1704.02694v1 [cs.CV], [v1] Mon, 10 Apr 2017.