
“Where Am I?!” Accurate Image Localization Based on Google Maps Street View

In this project, we propose a new system for image localization and location recognition. It estimates the Longitude and Latitude of a query image with an accuracy comparable to handheld GPS devices. Our geolocation system is based on Google Maps Street View.

Google Maps Street View Reference Dataset

The reference dataset is based on Google Maps Street View. It contains ~100k images collected from Pittsburgh, PA and Orlando, FL. About 50k of the reference images were collected automatically from the Street View website, and the rest were provided by Google.

Reference Image Place marks in Green and Query Images in Red

Note: the dataset provided by Google and the automatically captured images overlap in location but were captured at different times.

There are 4 side views and 5 top views per place mark. The following figure shows the images of 3 sample place marks in Pittsburgh, PA.

Sample Reference Image and Their Location on the Map

To preprocess the reference dataset, SIFT descriptors are computed at the SIFT interest points of every reference image. The descriptors are then stored, along with their GPS tags, in a k-means tree built with FLANN.

Single Image Localization

The following block diagram shows each step of geolocating a query image. The input to the system is an image, and the output is the estimated GPS location in terms of Longitude and Latitude.

First Row: Single Image Localization Block Diagram, Second Row: The result of each step for a sample query.

The Geospatial Pruning and Smoothing steps are explained in more detail in the following sections.

3.1 Geospatial Pruning

Geospatial Pruning is an essential step in the proposed geolocation system. It helps in two situations: when reference images overlap in scene (i.e. one object is in the view of several reference images), and when the scene contains repeated structures, as man-made structures in urban areas often do (e.g. the windows of a skyscraper are almost identical). As the following figures illustrate, Lowe's pruning method removes most of the detected interest points and their descriptors in such cases, which results in a vote distribution too sparse for reliable geolocation. The proposed geospatial pruning method instead incorporates the GPS location of each reference descriptor. It removes the incorrectly matched descriptors, while repeated urban structures and overlap between reference images do not adversely affect the pruned results.

Geospatial Pruning: Two sample correctly-matched descriptors

Geospatial Pruning Equation

The figures above show two sample correctly-matched descriptors. Under Lowe's pruning method, both interest points would be removed from voting, because the descriptor ratio is computed between the 1st and 2nd nearest neighbors (NN). The proposed method retains them, because the descriptor ratio is computed between the 1st NN and the 4th NN.
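The pruning rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the 0.8 ratio threshold (the value commonly used with Lowe's ratio test), the `min_dist_m` radius, and the sample coordinates are all hypothetical. Each query descriptor comes with its nearest neighbors as (descriptor distance, latitude, longitude) tuples sorted by descriptor distance. The geospatial variant compares the 1st NN against the first later NN whose GPS tag lies farther than `min_dist_m` from the 1st NN's tag.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def lowe_ratio_test(neighbors, ratio=0.8):
    """Standard ratio test: compare the 1st NN against the 2nd NN."""
    if len(neighbors) < 2:
        return False
    d1, d2 = neighbors[0][0], neighbors[1][0]
    return d2 > 0 and d1 / d2 < ratio

def geospatial_ratio_test(neighbors, ratio=0.8, min_dist_m=50.0):
    """Compare the 1st NN against the first NN tagged far away from it.

    neighbors: list of (desc_dist, lat, lon), sorted by desc_dist ascending.
    ratio and min_dist_m are assumed values, not taken from the paper.
    """
    if not neighbors:
        return False
    d1, lat1, lon1 = neighbors[0]
    for dj, latj, lonj in neighbors[1:]:
        if haversine_m(lat1, lon1, latj, lonj) > min_dist_m:
            return dj > 0 and d1 / dj < ratio
    # Assumption: if every neighbor is tagged near the 1st NN, no other
    # location competes for this descriptor, so the vote is kept.
    return True

# Hypothetical neighbors: three near-identical matches from adjacent
# place marks, then a much weaker match roughly a kilometer away.
neighbors = [
    (100.0, 40.44060, -79.99590),
    (105.0, 40.44065, -79.99585),
    (110.0, 40.44070, -79.99580),
    (300.0, 40.45000, -79.98000),
]
kept_by_lowe = lowe_ratio_test(neighbors)
kept_by_geospatial = geospatial_ratio_test(neighbors)
```

In this toy case the standard test discards the descriptor (ratio 100/105), while the geospatial test keeps it, since the first geographically distinct competitor is the 4th NN (ratio 100/300).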

3.2 Smoothing

Since Street View place marks are about 12 meters apart, one object in a query image might be in the view of several reference images. In the vote function, this produces several short, closely spaced peaks around the correct location instead of one tall peak. There might also be some solitary peaks in the vote function that are due to incorrectly-matched descriptors. To amplify groups of close peaks and attenuate solitary peaks, the vote function is smoothed with a Gaussian, as shown in the following figure.
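The effect of the smoothing step can be illustrated with a small sketch. It is only an approximation of the idea, under assumptions: place marks are given in local metric coordinates, the kernel is an isotropic Gaussian, and `sigma_m` is a hypothetical value chosen to match the ~12 meter spacing. The page does not state the parameters actually used.

```python
import math

def smooth_votes(placemarks, votes, sigma_m=12.0):
    """Smooth per-place-mark votes with an isotropic Gaussian kernel.

    placemarks: list of (x_m, y_m) positions in meters (local coordinates).
    votes: list of vote counts, one per place mark.
    sigma_m: kernel standard deviation in meters (assumed value).
    """
    smoothed = []
    for xi, yi in placemarks:
        total = 0.0
        for (xj, yj), vj in zip(placemarks, votes):
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            total += vj * math.exp(-d2 / (2.0 * sigma_m ** 2))
        smoothed.append(total)
    return smoothed

# Toy data: three adjacent place marks share the votes for the correct
# location, while a solitary place mark 500 m away has the tallest raw peak.
placemarks = [(0.0, 0.0), (12.0, 0.0), (24.0, 0.0), (500.0, 0.0)]
votes = [4.0, 5.0, 4.0, 6.0]

smoothed = smooth_votes(placemarks, votes)
best_raw = max(range(len(votes)), key=votes.__getitem__)
best_smoothed = max(range(len(smoothed)), key=smoothed.__getitem__)
```

Before smoothing, the solitary peak wins (`best_raw` is index 3). After smoothing, the three neighboring peaks reinforce each other and the middle place mark wins (`best_smoothed` is index 1).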

Smoothing By Gaussian

3.3 Confidence of Localization (CoL)

We propose a parameter called Confidence of Localization (CoL), which represents the reliability of localizing a query image. After Gaussian smoothing, the vote distribution can be normalized and treated as a Probability Distribution Function (PDF) whose random variables are Longitude and Latitude. The proposed parameter is based on the Kurtosis (normalized fourth central moment) of the vote distribution. The reasoning is that a more peaked vote distribution corresponds to a more reliable localization, and the Kurtosis is a measure of how peaked a PDF is. The following figure shows how CoL changes with the shape of the vote distribution.

Confidence of Localization (CoL)

The CoL value is unbounded, since there is no upper limit to the Kurtosis of a PDF, so CoL is most meaningful when used on a comparative basis. For instance, to find the correct city for a query image among two candidate cities, the query image can be geolocated within each one, and the city with the higher associated CoL value should be selected as the correct one.
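A kurtosis-style confidence score of this kind can be sketched as follows. The exact CoL formula is not reproduced on this page, so the code makes an assumption: it uses a radial kurtosis, the fourth moment of the distance from the vote-weighted mean divided by the squared second moment. The function name and the toy vote values are hypothetical.

```python
def radial_kurtosis(points, weights):
    """Kurtosis-like peakedness score of a weighted 2D vote distribution.

    points: list of (x, y) place mark positions.
    weights: non-negative smoothed votes, one per point.
    Assumption: radial form E[r^4] / E[r^2]^2, with r the distance from the
    weighted mean. This is not necessarily the paper's exact CoL formula.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("vote distribution is empty")
    p = [w / total for w in weights]  # normalize votes into a PDF
    mx = sum(pi * x for pi, (x, _) in zip(p, points))
    my = sum(pi * y for pi, (_, y) in zip(p, points))
    r2 = [(x - mx) ** 2 + (y - my) ** 2 for x, y in points]
    m2 = sum(pi * r for pi, r in zip(p, r2))
    m4 = sum(pi * r * r for pi, r in zip(p, r2))
    if m2 == 0:
        return float("inf")  # all mass on a single location
    return m4 / (m2 * m2)

# Nine place marks spaced 12 m apart along a street (toy data).
points = [(12.0 * k, 0.0) for k in range(9)]
flat_votes = [1.0] * 9                         # ambiguous: no clear peak
peaked_votes = [0.1] * 4 + [10.0] + [0.1] * 4  # one dominant peak

col_flat = radial_kurtosis(points, flat_votes)
col_peaked = radial_kurtosis(points, peaked_votes)

# Comparative use: pick the candidate with the higher score.
best_candidate = max(
    [("flat", col_flat), ("peaked", col_peaked)], key=lambda item: item[1]
)[0]
```

The peaked distribution scores higher than the flat one, which is the behavior the comparative use of CoL relies on.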

Image Group Localization

We propose a method for geolocating a group of query images jointly instead of individually. The method leverages the proximity of the query images to one another: it assumes that all query images in a group were taken within a limited distance of each other (e.g. 300 meters). The following figure shows the steps of the proposed method for a group of 3 query images. First, each query image is geolocated individually. Then, for each location found, the other query images are geolocated within the neighborhood of that location. The neighborhood with the highest CoLgroup value is selected as the correct one, and the locations found within it are assigned to the query images.
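The steps above can be sketched as a small search loop. This is a hypothetical outline, not the authors' implementation: `localize` stands for any single-image localizer that returns a location and a CoL value, and CoLgroup is assumed here to be the mean of the individual CoL values, since this page does not define it. The toy localizer and its coordinates are made up for illustration.

```python
import math

def localize_group(queries, localize, radius_m=300.0):
    """Pick the neighborhood with the highest group confidence.

    localize(query, center, radius_m) -> (location, col). Passing center=None
    means "search the whole reference set". Both the callback signature and
    the mean-based group score are assumptions made for this sketch.
    """
    best_score, best_locations = None, None
    for i, anchor in enumerate(queries):
        # Step 1: geolocate the anchor image individually.
        anchor_loc, anchor_col = localize(anchor, None, None)
        locations, cols = [], []
        for j, query in enumerate(queries):
            if j == i:
                locations.append(anchor_loc)
                cols.append(anchor_col)
            else:
                # Step 2: geolocate the others near the anchor's location.
                loc, col = localize(query, anchor_loc, radius_m)
                locations.append(loc)
                cols.append(col)
        score = sum(cols) / len(cols)  # assumed definition of CoLgroup
        if best_score is None or score > best_score:
            best_score, best_locations = score, locations
    return best_score, best_locations

# Toy localizer in local metric coordinates. Image "c" is mislocalized when
# searched on its own, but is found correctly near the other two images.
TRUE_LOC = {"a": (0.0, 0.0), "b": (100.0, 0.0), "c": (50.0, 80.0)}
WRONG_GLOBAL = {"c": (5000.0, 5000.0)}

def toy_localize(query, center, radius_m):
    if center is None:
        if query in WRONG_GLOBAL:
            return WRONG_GLOBAL[query], 2.0
        return TRUE_LOC[query], 8.0
    tx, ty = TRUE_LOC[query]
    if math.hypot(tx - center[0], ty - center[1]) <= radius_m:
        return TRUE_LOC[query], 9.0
    return center, 1.5  # nothing matches well in this neighborhood

group_score, group_locations = localize_group(["a", "b", "c"], toy_localize)
```

In this toy run, the neighborhood anchored at image "c" scores poorly, so the group result comes from a neighborhood anchored at a correctly localized image, and "c" is placed at its true location.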

Image Group Localization


We test the proposed method on a test set of 521 GPS-tagged user-uploaded images downloaded from Flickr, Panoramio, Picasa, etc. Since the GPS tags of user-uploaded images are usually very noisy and inaccurate, we manually double-checked and adjusted the GPS locations of the test set images.

The following figures show the results of geolocating the test set images using the proposed methods. The vertical axis shows the percentage of the test set images geolocated within the distance threshold (horizontal axis) of the ground truth.
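The curve described above (percentage of images within a distance threshold of the ground truth) can be computed as in the sketch below. The haversine great-circle distance is an assumption, since the page does not say how distances were measured, and the sample coordinates are made up rather than taken from the test set.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def percent_within(predicted, ground_truth, thresholds_m):
    """Percentage of images localized within each distance threshold.

    predicted, ground_truth: lists of (lat, lon) in degrees, same order.
    thresholds_m: distance thresholds in meters (the horizontal axis).
    """
    errors = [haversine_m(*p, *g) for p, g in zip(predicted, ground_truth)]
    return [
        100.0 * sum(e <= t for e in errors) / len(errors)
        for t in thresholds_m
    ]

# Made-up example: four queries with errors of roughly 0 m, 11 m,
# 111 m and 1.1 km (latitude offsets only).
ground_truth = [(40.4406, -79.9959)] * 4
predicted = [
    (40.4406, -79.9959),
    (40.4407, -79.9959),
    (40.4416, -79.9959),
    (40.4506, -79.9959),
]
curve = percent_within(predicted, ground_truth, [50.0, 250.0, 2000.0])
```

For this toy data, `curve` is `[50.0, 75.0, 100.0]`: half of the queries fall within 50 m, three quarters within 250 m, and all within 2 km.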

Single Localization Results (left) and Image Group Localization Results (right)

To examine the performance of the proposed Confidence of Localization parameter, the single image localization results for the test set are grouped into 8 bins based on their CoL value. The following figure shows the mean error of each bin (vertical axis) versus the mean CoL value of the bin (horizontal axis). As the figure shows, higher CoL values correspond to lower error, meaning the localization is more reliable.

Confidence of Localization vs. Geolocation Error (m)

(Since theoretically the value of the Kurtosis is not limited, we normalized the CoL values and showed them ranging from 0 to 1 on the horizontal axis of the plot.)

The following figures show more localization examples from the test set using the proposed methods. Each example shows one query image, the retrieved image, their GPS locations, and the error in meters, along with the vote distribution at each step of localization.


Related Papers

  1. Amir Roshan Zamir, Mubarak Shah, “Accurate Image Localization Based on Google Maps Street View”, European Conference on Computer Vision (ECCV), 2010 (Winner of ECCV’10 Travel Grant)
  2. Gonzalo Vaca, Amir Roshan Zamir, Mubarak Shah, “City Scale Geo-spatial Trajectory Estimation of a Moving Camera”, 25th IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2012


Data set: Please contact us if you are interested in obtaining an expanded version of our Street View dataset and test set.

Code: Please email us your name and affiliation in order to obtain the code.

Presentation: The PowerPoint presentation of the paper is available here. The poster is available here.

Note: This version contains minor typographical corrections over the version published in the ECCV10 proceedings.