Multi-target Tracking in Multiple Non-overlapping Cameras Using Fast-Constrained Dominant Sets
Tesfaye, Yonatan Tariku, Eyasu Zemene, Andrea Prati, Marcello Pelillo, and Mubarak Shah. “Multi-target Tracking in Multiple Non-overlapping Cameras Using Fast-Constrained Dominant Sets.” [BibTeX]
In this paper, we propose a unified three-layer hierarchical approach to the problem of tracking multiple targets across multiple non-overlapping cameras. Given a video and a set of detections (obtained by any person detector), we first solve within-camera tracking using the first two layers of our framework; in the third layer, we solve across-camera tracking by associating tracks of the same person across all cameras simultaneously. To best serve our purpose, we propose Fast-Constrained Dominant Set Clustering (FCDSC), a novel method that is several orders of magnitude faster (close to real time) than existing methods. FCDSC is based on a parameterized family of quadratic programs that generalizes the standard quadratic optimization problem.
In our method, we first build a graph where nodes of the graph represent short-tracklets, tracklets and tracks in the first, second and third layer of the framework, respectively. The edge weights reflect the similarity between nodes. FCDSC takes as input a constrained set, a subset of nodes from the graph which need to be included in the extracted cluster. Given a constrained set, FCDSC generates compact clusters by selecting nodes from the graph which are highly similar to each other and with elements in the constrained set.
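The constrained cluster extraction described above can be sketched with replicator dynamics on a parameterized affinity matrix: nodes outside the constrained set are penalized on the diagonal so that every extracted cluster must contain the constrained set. This is a minimal illustration of the constrained dominant-set idea, not the paper's optimized FCDSC implementation; the function name, the toy affinities, and the choice of the penalty α are ours.

```python
import numpy as np

def constrained_cluster(A, constraint, alpha=None, iters=1000, tol=1e-8):
    """Extract a cluster that must contain the nodes in `constraint`.

    A: symmetric nonnegative affinity matrix with zero diagonal.
    constraint: indices of the constrained set (must end up in the cluster).
    Returns the characteristic vector x over the simplex; the cluster is
    the support of x. Illustrative sketch, not the paper's FCDSC code.
    """
    n = A.shape[0]
    S = np.zeros(n, dtype=bool)
    S[list(constraint)] = True
    if alpha is None:
        # alpha must exceed the largest eigenvalue of A restricted to V \ S
        sub = A[np.ix_(~S, ~S)]
        alpha = np.linalg.eigvalsh(sub).max() + 1.0 if sub.size else 1.0
    # parameterized payoff matrix: B = A - alpha * I_{V \ S}
    B = A - alpha * np.diag((~S).astype(float))
    # shifting all payoffs by a constant keeps the simplex maximizers,
    # but makes every entry nonnegative for the replicator iteration
    B = B + alpha
    x = np.full(n, 1.0 / n)
    for _ in range(iters):
        y = x * (B @ x)
        y = y / y.sum()
        if np.abs(y - x).sum() < tol:
            x = y
            break
        x = y
    return x

# toy graph: two groups {0, 1, 2} and {3, 4, 5}; constrain the cluster
# to contain node 0, so the first group should be extracted
A = np.full((6, 6), 0.1)
A[:3, :3] = 0.9
A[3:, 3:] = 0.9
np.fill_diagonal(A, 0.0)
x = constrained_cluster(A, constraint=[0])
cluster = np.where(x > 1e-4)[0]
```

Nodes that are highly similar to each other and to the constrained set receive positive weight; all other nodes are driven to zero by the dynamics.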
We have tested this approach on a very large and challenging dataset (namely, MOTchallenge DukeMTMC) and show that the proposed framework outperforms state-of-the-art approaches. Although the main focus of this paper is multi-target tracking in non-overlapping cameras, the proposed approach can also be applied to the video-based person re-identification problem. We show that when re-identification is formulated as a clustering problem, FCDSC can be used in conjunction with state-of-the-art video-based re-identification algorithms to further increase their already good performance. Our experiments demonstrate the general applicability of the proposed framework to both multi-target multi-camera tracking and person re-identification.
- Within-Camera Tracking
The figure shows within-camera tracking: short-tracklets (s) from different segments are the input to the first layer of tracking. The resulting tracklets (t) from the first layer are the input to the second layer, which determines the track (T) of each person. The three dark green short-tracklets (s12, s110, s17), shown by the dotted ellipse in the first layer, form a cluster that results in tracklet t12 in the second layer, as indicated by the black arrow. In the second layer, each cluster (shown in purple, green and dark red) forms the track of a different target; as can be seen in the top row, tracklets and tracks with the same color indicate the same target. The two green clusters (with two and three tracklets, respectively) represent the tracks of the person going into and out of the building (tracks Tp1 and T21, respectively).
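Once a layer's clusters are extracted, each cluster's members are merged into the next layer's node (short-tracklets into a tracklet, tracklets into a track) by concatenating their detections in temporal order. A minimal sketch, assuming a detection is a dict with `frame` and `box` keys (this representation is ours, chosen for illustration):

```python
def merge_cluster(short_tracklets):
    """Merge the members of one cluster into a single tracklet by
    concatenating their detections in temporal order. The dict-based
    detection format is an illustrative assumption."""
    detections = [d for st in short_tracklets for d in st]
    return sorted(detections, key=lambda d: d["frame"])

# two short-tracklets of the same person from consecutive segments
s_a = [{"frame": 1, "box": (10, 10, 40, 80)}, {"frame": 2, "box": (12, 10, 40, 80)}]
s_b = [{"frame": 3, "box": (14, 11, 40, 80)}, {"frame": 4, "box": (16, 11, 40, 80)}]
tracklet = merge_cluster([s_b, s_a])
```

The same merge is applied again in the second layer to turn clusters of tracklets into per-person tracks.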
- Across-Camera Tracking
Given the tracks Tij of the different cameras from the previous step, we build a graph G'(V', E', w'), where nodes represent tracks and edge weights depict the similarity between tracks.
Exemplar graph of tracks from three cameras. Tij represents the ith track of camera j. Black and colored edges represent within- and across-camera relations between tracks, respectively. Node colors depict track IDs: nodes with the same color represent tracks of the same person, and thick lines show both within- and across-camera associations.
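The construction of G'(V', E', w') can be sketched as building an affinity matrix with one node per track and edge weights given by a similarity between track descriptors. Cosine similarity over per-track appearance vectors is used here purely for illustration; the paper's actual track descriptors and similarity measure may differ.

```python
import numpy as np

def build_track_graph(tracks):
    """Affinity matrix W of graph G'(V', E', w'): one node per track,
    w'(i, j) = cosine similarity of the tracks' descriptors.
    Illustrative sketch; not the paper's similarity function."""
    feats = np.stack([f / np.linalg.norm(f) for _, f in tracks])
    W = feats @ feats.T           # cosine similarity as edge weight
    np.fill_diagonal(W, 0.0)      # no self-loops
    return W

# hypothetical tracks: (camera id, appearance descriptor)
tracks = [
    ("cam1", np.array([1.0, 0.0, 0.0])),  # person A in camera 1
    ("cam1", np.array([0.0, 1.0, 0.0])),  # person B in camera 1
    ("cam2", np.array([0.9, 0.1, 0.0])),  # person A re-appears in camera 2
    ("cam3", np.array([0.0, 0.9, 0.2])),  # person B re-appears in camera 3
]
W = build_track_graph(tracks)
```

On this graph, tracks of the same person in different cameras are connected by high-weight edges, which is what the across-camera FCDSC step exploits.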
- Across-Camera Qualitative Results
- Within-Camera Quantitative Results
The results below show the average performance of our approach and of state-of-the-art approaches [34, 27] on the Test-Easy sequence of the DukeMTMC dataset.
The results below show the average performance of our approach and of state-of-the-art approaches [34, 27] on the Test-Hard sequence of the DukeMTMC dataset.
- Across-Camera Quantitative Results
Multi-camera performance of our approach and of state-of-the-art approaches [34, 27] on the Test-Easy and Test-Hard sequences of the DukeMTMC dataset.
- Person Re-identification Results
The table below shows the comparison (based on rank-1 accuracy) of our approach with state-of-the-art approaches on the MARS dataset.
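Rank-1 accuracy, the metric used in the table, is the fraction of queries whose single closest gallery entry carries the same identity. A minimal sketch of the standard computation (this is the conventional metric definition, not code from the paper):

```python
import numpy as np

def rank1_accuracy(dist, query_ids, gallery_ids):
    """Fraction of queries whose nearest gallery entry (smallest
    distance) has the matching identity."""
    nearest = dist.argmin(axis=1)
    matches = np.asarray(gallery_ids)[nearest] == np.asarray(query_ids)
    return float(np.mean(matches))

# toy distance matrix: 2 queries vs 2 gallery entries
dist = np.array([[0.2, 0.9],   # query 0 is closest to gallery 0 (correct)
                 [0.7, 0.3]])  # query 1 is closest to gallery 1 (correct)
acc = rank1_accuracy(dist, query_ids=[1, 2], gallery_ids=[1, 2])
```

Higher ranks (the full CMC curve) are computed analogously by checking whether a correct match appears among the k nearest gallery entries.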
Afshin Dehghan, Shayan Modiri Assari and Mubarak Shah, "GMMCP-Tracker: Globally Optimal Generalized Maximum Multi Clique Problem for Multiple Object Tracking", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. [PDF] [BibTeX]
Ergys Ristani and Carlo Tomasi, "Tracking Multiple People Online and in Real Time", Asian Conference on Computer Vision (ACCV), 2014.
Amir Roshan Zamir, Afshin Dehghan and Mubarak Shah, "GMCP-Tracker: Global Multi-object Tracking Using Generalized Minimum Clique Graphs", European Conference on Computer Vision (ECCV), 2012. [PDF] [BibTeX]
Afshin Dehghan, Yicong Tian, Philip H. S. Torr and Mubarak Shah, "Target Identity-aware Network Flow for Online Multiple Target Tracking", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. [PDF] [BibTeX]
Afshin Dehghan, Haroon Idrees, Amir Roshan Zamir and Mubarak Shah (in alphabetical order), Keynote: "Automatic Detection and Tracking of Pedestrians in Videos with Various Crowd Densities", In Proceedings of PED, June 2012. [PDF] [BibTeX]
Note: the references cited in the quantitative analysis tables above (e.g., [34, 27]) can be found in the paper.