Target Identity-aware Network Flow for Online Multiple Target Tracking
- Introduction
- Pipeline
- Full Presentation
- Tracking Output and Groundtruth
- Code
- Data
- Related Publications
Introduction

In this paper we show that multiple object tracking (MOT) can be formulated in a framework where detection and data association are performed simultaneously. Our method overcomes a limitation of data-association-based MOT approaches, whose performance depends on the object detection results provided as input. At the core of our method lies structured learning, which learns a model for each target and infers the best locations of all targets simultaneously in a video clip. Inference in our structured learning is done through a new Target Identity-aware Network Flow (TINF), where each node in the network encodes the probability of each target identity belonging to that node. The proposed Lagrangian relaxation optimization finds a high-quality solution to the network. During optimization, a soft spatial constraint is enforced between the nodes of the graph, which helps reduce the ambiguity caused by nearby targets with similar appearance in crowded scenarios. We show that automatically detecting and tracking targets in a single framework can help resolve the ambiguities due to frequent occlusion and heavy articulation of targets. Our experiments involve challenging yet distinct datasets and show that our method can achieve results better than the state of the art.
Pipeline

- Given annotated samples in multiple frames, structured learning is used to train an appearance model for each target.
- Inference during learning (TINF) is used to find the most-violated constraints.
- The same inference technique is used to find tracks in the next batch of frames.
- Models are updated through the passive-aggressive algorithm if necessary.
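The last pipeline step can be illustrated with a generic passive-aggressive (PA-I) update for a linear structured model. This is a minimal sketch and not the authors' code: the function name, the feature vectors, the loss value, and the cap `C` are all assumptions introduced for illustration, and the page does not state which PA variant is used.

```python
import numpy as np

def passive_aggressive_update(w, feat_true, feat_pred, loss, C=1.0):
    """One PA-I step for a linear structured model (hypothetical helper).

    w         : current weight vector of one target's appearance model
    feat_true : feature vector of the trusted (e.g. annotated) location
    feat_pred : feature vector of the most-violated prediction
    loss      : task loss of the prediction (assumed, e.g. 1 - overlap)
    C         : cap on the step size (assumed hyperparameter)
    """
    delta = feat_true - feat_pred
    # Structured hinge loss: how far the margin constraint is violated.
    hinge = max(0.0, loss - float(w @ delta))
    norm_sq = float(delta @ delta)
    if hinge == 0.0 or norm_sq == 0.0:
        return w  # passive: the constraint already holds, no update needed
    tau = min(C, hinge / norm_sq)  # aggressive: smallest step that fixes it, capped by C
    return w + tau * delta

# Toy usage with made-up two-dimensional features.
w0 = np.zeros(2)
w1 = passive_aggressive_update(w0, np.array([1.0, 0.0]), np.array([0.0, 1.0]), loss=1.0)
```

The "if necessary" in the pipeline corresponds to the passive branch: when the current model already separates the trusted location from the prediction by the required margin, the weights are left unchanged.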
TINF vs Network flow

- TINF has multiple sources/sinks: one for each identity.
- TINF includes K observation edges for each candidate window.
- The observation edges encode the probability of assigning different labels to that candidate window.
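The structural differences listed above can be sketched as a toy edge list. This is only an illustration under stated assumptions: the function name, the node naming (`"in"`/`"out"` halves of a window), the zero-cost source, sink, and transition edges, and the way sources and sinks attach only to the first and last frames are all hypothetical simplifications, not the construction from the paper.

```python
from itertools import product

def build_tinf_edges(windows_per_frame, scores):
    """Build an edge list for a toy identity-aware flow graph (hypothetical).

    windows_per_frame : list of lists of candidate-window ids, one list per frame
    scores            : dict mapping (window_id, k) to the cost of giving that
                        window identity k (e.g. a negative log-probability)
    Returns a list of (tail, head, identity, cost) tuples.
    """
    K = 1 + max(k for _, k in scores)
    edges = []
    # One source and one sink per identity (simplified attachment).
    for k in range(K):
        for w in windows_per_frame[0]:
            edges.append((("source", k), ("in", w), k, 0.0))
        for w in windows_per_frame[-1]:
            edges.append((("out", w), ("sink", k), k, 0.0))
    # K observation edges per candidate window, one per identity.
    for frame in windows_per_frame:
        for w in frame:
            for k in range(K):
                edges.append((("in", w), ("out", w), k, scores[(w, k)]))
    # Transition edges between consecutive frames (costs left at zero here).
    for prev, nxt in zip(windows_per_frame, windows_per_frame[1:]):
        for u, v in product(prev, nxt):
            for k in range(K):
                edges.append((("out", u), ("in", v), k, 0.0))
    return edges

# Toy usage: two frames, three candidate windows, two identities.
frames = [["a", "b"], ["c"]]
costs = {(w, k): 0.5 for w in "abc" for k in range(2)}
edges = build_tinf_edges(frames, costs)
```

In a standard network-flow tracker each candidate window would carry a single observation edge and the graph would have a single source and sink; here every window carries K observation edges, so the identity label is part of the flow itself.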
Full Presentation
20-minute presentation of the full paper by Afshin Dehghan
Tracking Output and Groundtruth
Since the bounding box size is set manually and might be inaccurate, we used a 30% overlap threshold in all of our experiments.
Download all
Parking Lot 1:
Tracking Output
Groundtruth
Parking Lot 2:
Tracking Output
Groundtruth
Parking Lot Pizza:
Tracking Output
Groundtruth
Running:
Tracking Output
Groundtruth
Dancing:
Tracking Output
Groundtruth
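For readers comparing the tracking output files against the groundtruth, the 30% overlap threshold mentioned above can be applied with a check like the following. This is a sketch under assumptions: the page does not define how overlap is measured, so intersection-over-union is assumed here, and the `(x, y, width, height)` box layout and function names are hypothetical rather than taken from the released files.

```python
def overlap_ratio(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, width, height).

    The box layout and the use of IoU are assumptions for illustration.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the intersection rectangle (zero if disjoint).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def is_match(tracked_box, groundtruth_box, threshold=0.3):
    """True when the tracked box overlaps the groundtruth by at least `threshold`."""
    return overlap_ratio(tracked_box, groundtruth_box) >= threshold

# Toy usage with made-up boxes.
matched = is_match((0, 0, 10, 10), (5, 0, 10, 10))
```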
Code
Please contact the author.
Data
Dancing
Running
PowerPoint
Click here
Poster
Click here
Related Publications
Afshin Dehghan, Yicong Tian, Philip H. S. Torr, and Mubarak Shah, Target Identity-aware Network Flow for Online Multiple Target Tracking, IEEE International Conference on Computer Vision and Pattern Recognition, 2015. [PDF], [BibTeX]
Afshin Dehghan, Shayan Modiri Assari, and Mubarak Shah, GMMCP-Tracker: Globally Optimal Generalized Maximum Multi Clique Problem for Multiple Object Tracking, IEEE International Conference on Computer Vision and Pattern Recognition, 2015. [PDF], [BibTeX]
Amir Roshan Zamir, Afshin Dehghan, and Mubarak Shah, GMCP-Tracker: Global Multi-object Tracking Using Generalized Minimum Clique Graphs, European Conference on Computer Vision (ECCV), 2012. [PDF], [BibTeX]
Afshin Dehghan, Haroon Idrees, Amir Roshan Zamir, and Mubarak Shah (in alphabetical order), Keynote: Automatic Detection and Tracking of Pedestrians in Videos with Various Crowd Densities, In Proceedings of PED, June 2012. [PDF], [BibTeX]