
Deep Constrained Dominant Sets for Person Re-Identification


Publication

Leulseged Tesfaye Alemu, Mubarak Shah and Marcello Pelillo. "Deep Constrained Dominant Sets for Person Re-Identification." 2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019), Seoul, Korea (South), October 27 – November 2, 2019. [Supplementary]

Overview

In this work, we propose an end-to-end constrained clustering scheme to tackle the person re-identification (re-id) problem. Deep neural networks (DNNs) have recently proven effective for person re-identification. In particular, rather than relying solely on a probe-gallery similarity, diffusing the similarities among the gallery images in an end-to-end manner has proven effective in yielding a robust probe-gallery affinity. However, existing methods do not use the probe image as a constraint and are prone to noise propagation during the similarity diffusion process. To overcome this, we propose a scheme that casts the person-image retrieval problem as a constrained clustering optimization problem, called deep constrained dominant sets (DCDS).

Given a probe image and a set of gallery images, we reformulate the person re-id problem as finding a constrained cluster, where the probe image is taken as a constraint (seed) and each cluster corresponds to a set of images of the same person. By optimizing the constrained clustering in an end-to-end manner, we naturally leverage the contextual knowledge of the whole set of images corresponding to a given person. We further enhance performance by integrating an auxiliary net alongside DCDS, which employs a multi-scale ResNet.
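For background, constrained dominant sets [2] casts seeded clustering as a parameterized quadratic program over the standard simplex. A sketch of the formulation we build on, with notation adapted from [2] rather than copied from this paper:

    % Constrained dominant sets: A is the M x M affinity matrix, S the
    % constraint set (here, the probe), and \hat{I}_S the diagonal matrix
    % with ones at the vertices outside S.
    \max_{x} \; f_S^{\alpha}(x) = x^{\top} (A - \alpha \hat{I}_S)\, x
    \quad \text{s.t.} \quad x \in \Delta = \{ x \in \mathbb{R}^{M} : x \ge 0,\ \textstyle\sum_i x_i = 1 \}

For a sufficiently large penalty α, the support of every local maximizer contains elements of S, so the extracted cluster is forced to include the probe, and the entries of the solution x* act as cluster-membership scores.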

Panel (f) shows the big picture of the proposed method, where P indicates the constraint (probe image) and the images in the constrained cluster (the enclosed area) indicate the positive samples for the probe image.

The above figure shows the workflow of the proposed DCDS. Given n gallery images G and a probe image P, we first extract their ResNet101 features right before the global average pooling (GAP) layer, which are then fed to the CDS-Net (upper stream) and V-Net (lower stream) branches. In the CDS branch, after applying GAP, we compute the similarity between the M² pairs of probe-gallery image features, f_p and f_Gi, using their dot products, f_p · f_Giᵀ, where ᵀ denotes a transpose. Thereby, we obtain an M × M affinity matrix. Then, we run CDS taking the probe image as a constraint to find the solution x* ∈ ℝ^(M × 1) (similarity); the dissimilarity x_d* is computed as the additive inverse of the similarity x*. Likewise, in the lower stream (V-Net) we apply elementwise subtraction on the M pairs of probe-gallery features. This is followed by GAP, batch normalization (BN), and a fully connected (FC) layer to obtain a probe-gallery similarity score, R ∈ ℝ^(M × 1), and a probe-gallery dissimilarity score, D ∈ ℝ^(M × 1). Afterwards, we elementwise multiply x* and R, and x_d* and D, to find the final similarity and dissimilarity scores, F_s and F_d, respectively. Finally, to find the prediction loss of our model, we apply a cross-entropy loss; the ground truth G_t is given as G_t ∈ ℝ^(M × 1).
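To make the fusion concrete, here is a minimal PyTorch sketch. It is not the authors' code: the replicator-dynamics CDS solver, the tensor names, the reading of the "additive inverse" as 1 − x*, and the two-way cross-entropy formulation are all assumptions.

    import torch
    import torch.nn.functional as F

    def cds_replicator(A, constraint_idx, alpha=1.0, iters=200, eps=1e-12):
        """Approximate constrained dominant sets with replicator dynamics.

        Following [2], a penalty -alpha is placed on the diagonal of every
        vertex NOT in the constraint set, which drives the support of the
        solution to contain the constraint (the probe image)."""
        M = A.size(0)
        mask = torch.ones(M, device=A.device)
        mask[constraint_idx] = 0.0                 # no penalty on the constraint
        B = A - alpha * torch.diag(mask)
        if B.min() < 0:                            # shift payoffs to be nonnegative;
            B = B - B.min()                        # maximizers over the simplex are unchanged
        x = torch.full((M,), 1.0 / M, device=A.device)  # start at the barycenter
        for _ in range(iters):
            Bx = (B @ x).clamp(min=eps)
            x = x * Bx / (x @ Bx)                  # replicator update stays on the simplex
        return x

    def dcds_fusion(feats, v_sim, v_dis, gt, probe_idx=0):
        """Illustrative fusion of the CDS-Net and V-Net streams.

        feats : (M, d) GAP features of the probe and gallery images
        v_sim, v_dis : (M,) V-Net similarity (R) / dissimilarity (D) scores
        gt : (M,) ground-truth same-person indicator (0/1)."""
        A = feats @ feats.t()                      # M x M dot-product affinity
        x = cds_replicator(A, constraint_idx=probe_idx)
        x_d = 1.0 - x                              # one reading of the "additive inverse"
        F_s = x * v_sim                            # elementwise fusion
        F_d = x_d * v_dis
        logits = torch.stack([F_d, F_s], dim=1)    # per-image two-way prediction
        return F.cross_entropy(logits, gt.long())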

Experimental Setup

Auxiliary Net

The above figure illustrates the auxiliary net, which consists of two jointly trained branches. We first take features from different layers, S1, S2, and S3, and feed each to max-pooling (MP), Conv, BN, ReLU, and FC layers for further encoding. We then compute triplet losses on the features of the lower three streams after the ReLU, shown by the yellow, blue, and red circles. Next, after the final FC layer, we compute a cross-entropy loss (CE1, …, CE6) for each of the six outputs, O_i, of the upper and lower streams, shown by the distinctly colored boxes. Note that even though the upper and lower streams apply the same operations to S1, S2, and S3, they do not share weights, so the encodings differ. We compute the final loss as the sum of the average triplet loss and the average cross-entropy loss (a sketch follows below).
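A minimal sketch of this loss combination, assuming precomputed per-stream triplet embeddings and the six classification outputs; the function name, argument layout, and margin value are illustrative, not the authors' code:

    import torch
    import torch.nn.functional as F

    def auxiliary_loss(stream_triplets, logits_list, labels, margin=0.3):
        """Final auxiliary-net loss: sum of the average triplet loss (three
        lower streams) and the average cross-entropy loss (six outputs O1..O6).

        stream_triplets : list of 3 (anchor, positive, negative) tuples
        logits_list     : list of 6 (batch, num_identities) logit tensors
        labels          : (batch,) identity labels."""
        triplet = torch.nn.TripletMarginLoss(margin=margin)
        t_losses = [triplet(a, p, n) for (a, p, n) in stream_triplets]
        ce_losses = [F.cross_entropy(lg, labels) for lg in logits_list]
        return torch.stack(t_losses).mean() + torch.stack(ce_losses).mean()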

Testing Phase

During testing, given a probe image and the gallery images, we extract the DCDS and auxiliary features and concatenate them into a single vector per image. Afterward, we build an M × M affinity matrix and run CDS with a constraint-expansion mechanism (sketched after the next step) to find the final probe-gallery similarity ranking.

Constraint Expansion During Testing

a) Given a constraint (probe image) P_j, we first collect its k nearest neighbors (k-NN). Subsequently, we run CDS on the graph of the k-NN set and choose the cluster member with the highest membership score, I_i.
b) We re-run CDS, considering both P_j and I_i as constraints, over the graph of the whole image set. Afterward, we take the resulting solution as our final ranking.
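A minimal sketch of this two-stage expansion, reusing the hypothetical cds_replicator from the earlier sketch; the value of k, the dot-product affinity, and the assumption of (near-)normalized features are illustrative:

    import torch

    def constraint_expansion_rank(feats, probe_idx=0, k=10):
        """Two-stage ranking: (a) run CDS on the probe's k-NN subgraph and
        pick the member with the highest membership score; (b) re-run CDS on
        the full graph with both the probe and that member as constraints."""
        A = feats @ feats.t()                        # full dot-product affinity
        # (a) probe's k-NN subgraph; with (near-)normalized features the
        # probe is its own top match, so topk(k + 1) keeps probe + k neighbors
        nn_idx = A[probe_idx].topk(k + 1).indices
        sub = A[nn_idx][:, nn_idx]
        local_probe = (nn_idx == probe_idx).nonzero().item()
        x_local = cds_replicator(sub, constraint_idx=local_probe)
        x_local[local_probe] = -1.0                  # exclude the probe itself
        helper = nn_idx[x_local.argmax()].item()     # highest-membership member I_i
        # (b) CDS on the whole graph with the expanded constraint set {P_j, I_i}
        both = torch.tensor([probe_idx, helper])
        x_final = cds_replicator(A, constraint_idx=both)
        return x_final.argsort(descending=True)      # final probe-gallery ranking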

Evaluations

To validate the effectiveness of our method, we present experiments on several benchmark datasets and show that the proposed method outperforms state-of-the-art methods.

Related Publications

[1] Massimiliano Pavan and Marcello Pelillo. Dominant Sets and Pairwise Clustering. IEEE Trans. Pattern Anal. Mach. Intell., 2007. [BibTeX]

[2] Eyasu Zemene and Marcello Pelillo. Interactive Image Segmentation Using Constrained Dominant Sets. 14th European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands, October 11–14, 2016. [BibTeX]

[3] Yantao Shen et al. Deep Group-Shuffling Random Walk for Person Re-Identification. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. [BibTeX]

Note: Please refer to the paper for the references in the result tables.