Large-scale Image Geo-Localization Using Dominant Sets
Eyasu Zemene, Yonatan Tariku Tesfaye, Haroon Idrees, Andrea Prati, Marcello Pelillo and Mubarak Shah. “Large-scale Image Geo-Localization Using Dominant Sets.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 41, pp. 148–161, 2018.
This paper presents a new approach for the challenging problem of geo-localization using image matching in a structured database of city-wide reference images with known GPS coordinates. We cast geo-localization as a clustering problem of local image features. Akin to existing approaches to the problem, our framework builds on low-level features which allow local matching between images. For each local feature in the query image, we find its approximate nearest neighbors in the reference set. Next, we cluster the features from reference images using Dominant Set clustering, which affords several advantages over existing approaches. First, it permits a variable number of nodes in the cluster, which we use to dynamically select the number of nearest neighbors for each query feature based on its discrimination value. Second, this approach is several orders of magnitude faster than existing approaches. Thus, we obtain multiple clusters (different local maximizers) and obtain a robust final solution to the problem using multiple weak solutions through constrained Dominant Set clustering on global image features, where we enforce the constraint that the query image must be included in the cluster. This second level of clustering also bypasses heuristic approaches to voting and selecting the reference image that matches the query. We evaluate the proposed framework on an existing dataset of 102k street view images as well as a new larger dataset of 300k images and show that it outperforms the state-of-the-art by 20% and 7%, respectively, on the two datasets.
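To make the clustering step concrete, here is a minimal sketch of extracting one dominant set from a pairwise affinity matrix with discrete replicator dynamics, a standard way to find local maximizers of x^T A x on the simplex. It is an illustration under assumptions, not the authors' implementation: the function name `dominant_set`, its parameters, and the toy affinity values are all hypothetical, and the real pipeline builds its graph from nearest-neighbor local features rather than from a hand-written matrix.

```python
import numpy as np


def dominant_set(A, max_iter=1000, tol=1e-8, support_thresh=1e-5):
    """Extract one dominant set from a symmetric, nonnegative affinity
    matrix A with zero diagonal (hypothetical helper for illustration).

    Returns (members, x): the indices whose weight exceeds support_thresh,
    and the final weight vector x on the simplex.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    x = np.full(n, 1.0 / n)  # start from the barycenter of the simplex
    for _ in range(max_iter):
        Ax = A @ x
        total = x @ Ax  # current cluster cohesiveness x^T A x
        if total <= 0:
            break
        x_new = x * Ax / total  # replicator update, stays on the simplex
        converged = np.abs(x_new - x).sum() < tol
        x = x_new
        if converged:
            break
    return np.flatnonzero(x > support_thresh), x


# Toy affinities (made up): nodes 0-2 are mutually very similar,
# nodes 3-4 are moderately similar, and the two groups barely match.
A = np.array([
    [0.00, 0.90, 0.90, 0.05, 0.05],
    [0.90, 0.00, 0.90, 0.05, 0.05],
    [0.90, 0.90, 0.00, 0.05, 0.05],
    [0.05, 0.05, 0.05, 0.00, 0.50],
    [0.05, 0.05, 0.05, 0.50, 0.00],
])
members, weights = dominant_set(A)
print(members, np.round(weights, 3))
```

The size of the returned cluster is not fixed in advance; it falls out of the optimization, which is the property the abstract refers to when it says the cluster may contain a variable number of nodes. Further clusters can be obtained by removing the extracted members and running the procedure again.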
- New Dataset:
For each query feature, we dynamically select its nearest neighbors (NNs) and discard query features that are less informative. We then build a fully connected graph over the selected NNs and extract candidate matching reference images using dominant set clustering (DSC). Finally, we apply post-processing based on constrained dominant sets (CDS) to select the best matching reference image, and we approximate the location of the query from the location of that image.
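The constrained post-processing step can be sketched as follows. This assumes the formulation commonly given for constrained dominant sets: maximize x^T (A - alpha * I_hat) x on the simplex, where I_hat is a diagonal matrix with ones on the nodes outside the constraint set and alpha exceeds the largest eigenvalue of the affinity submatrix on those nodes. The function name, the parameters, the small margin added to alpha, and the toy affinities are hypothetical; the global image features used in the paper are not modeled here.

```python
import numpy as np


def constrained_dominant_set(A, constraint, max_iter=2000, tol=1e-10,
                             support_thresh=1e-5):
    """Sketch of a constrained dominant set that must contain the nodes
    listed in `constraint` (for example, the query image).

    A is a symmetric, nonnegative affinity matrix with zero diagonal.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    outside = np.setdiff1d(np.arange(n), constraint)
    # Assumed rule: alpha above the largest eigenvalue of A restricted
    # to the unconstrained nodes (the 1e-3 margin is arbitrary).
    alpha = 0.0
    if outside.size:
        alpha = np.linalg.eigvalsh(A[np.ix_(outside, outside)]).max() + 1e-3
    penalty = np.zeros(n)
    penalty[outside] = alpha
    # Adding the constant alpha to every entry keeps the matrix nonnegative
    # and only shifts the objective by a constant on the simplex.
    B = A - np.diag(penalty) + alpha
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        Bx = B @ x
        x_new = x * Bx / (x @ Bx)  # replicator update on the shifted matrix
        converged = np.abs(x_new - x).sum() < tol
        x = x_new
        if converged:
            break
    return np.flatnonzero(x > support_thresh), x


# Toy affinities (made up): node 0 is the query, nodes 1-2 resemble it,
# and nodes 3-5 form a tight group that is unrelated to the query.
A = np.full((6, 6), 0.1)
A[0, 1] = A[1, 0] = 0.8
A[0, 2] = A[2, 0] = 0.7
A[1, 2] = A[2, 1] = 0.6
for i in (3, 4, 5):
    for j in (3, 4, 5):
        A[i, j] = 0.9
np.fill_diagonal(A, 0.0)

members, weights = constrained_dominant_set(A, constraint=[0])
print(members, np.round(weights, 3))
```

In this toy setup the unrelated group has the highest internal affinity, so an unconstrained clustering would be drawn to it; forcing the query into the cluster steers the result toward the candidates that actually resemble the query. The reference image with the largest weight in the returned cluster would then be a natural choice for the best match, though that selection rule is an assumption here.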
The WorldCities dataset is a new high-resolution reference dataset of 300k street view images that covers 14 different cities from different parts of the world: Europe (Amsterdam, Frankfurt, Rome, Milan and Paris), Australia (Sydney and Melbourne), and the USA (Las Vegas, Los Angeles, Phoenix, Houston, San Diego, Dallas and Chicago). Buildings around the world resemble one another in their wall designs, edges, shapes, colors, etc., which makes this dataset more challenging than the 102k dataset. For the test sets, we use 644 and 500 GPS-tagged user-uploaded images downloaded from Picasa, Flickr and Panoramio for the 102k Google street view dataset and the WorldCities dataset, respectively. Throughout our experiments, we search all the reference images from around the world for the best match with the query image, not only those from the ground-truth city.
The top four rows are sample street view images from eight different places in the WorldCities dataset. The bottom two rows are sample user-uploaded images from the test set.
Comparison of our baseline (without post processing) and final method with state-of-the-art approaches on the first dataset (102k Google street view images).
Comparison of overall geo-localization results using DSC with and without post processing and state-of-the-art approaches on the WorldCities dataset.
Sample qualitative results from the Pittsburgh area. Green locations indicate the ground truth, while yellow locations indicate our localization results.
Amir Roshan Zamir, Shervin Ardeshir and Mubarak Shah, “GPS-Tag Refinement using Random Walks with an Adaptive Damping Factor”, In Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
Amir Roshan Zamir, Afshin Dehghan and Mubarak Shah, “Visual Business Recognition – A Multimodal Approach”, In Proceedings of ACM International Conf. on Multimedia (ACM MM), 2013.