Accession Number:



Learning Deep Representations for Ground to Aerial Geolocalization (Open Access)

Descriptive Note:

Conference Paper

Corporate Author:

Cornell Tech New York United States

Report Date:


Pagination or Media Count:



The recent availability of geo-tagged images and rich geospatial data has inspired a number of algorithms for image-based geolocalization. Most approaches predict the location of a query image by matching it to ground-level images with known locations (e.g., street-view data). However, most of the Earth does not have ground-level reference photos available. Fortunately, more complete coverage is provided by oblique aerial or bird's-eye imagery. In this work, we localize a ground-level query image by matching it to a reference database of aerial imagery. We use publicly available data to build a dataset of 78K aligned cross-view image pairs. The primary challenge for this task is that traditional computer vision approaches cannot handle the wide baseline and appearance variation of these cross-view pairs. We use our dataset to learn a feature representation in which matching views are near one another and mismatched views are far apart. Our proposed approach, Where-CNN, is inspired by the success of deep learning in face verification and achieves significant improvements over traditional hand-crafted features and existing deep features learned from other large-scale databases. We show the effectiveness of Where-CNN in finding matches between street-view and aerial-view imagery and demonstrate the ability of our learned features to generalize to novel locations.
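
The idea of a representation in which matching views are near one another and mismatched views are far apart can be illustrated with a contrastive loss, a formulation commonly used in face verification. The sketch below is only an illustration under stated assumptions: the abstract does not say which loss Where-CNN uses, and the function names, the `margin` parameter, and the use of Euclidean distance are all hypothetical choices made here. Features are assumed to have been extracted already, so no network is shown.

```python
import numpy as np


def contrastive_loss(ground_feats, aerial_feats, is_match, margin=1.0):
    """Mean contrastive loss over a batch of ground/aerial feature pairs.

    ground_feats, aerial_feats: arrays of shape (n_pairs, dim).
    is_match: array of shape (n_pairs,), 1 for a matching pair, 0 otherwise.
    margin: hypothetical hyperparameter (an assumption, not from the source);
            mismatched pairs farther apart than this add no loss.
    """
    ground_feats = np.asarray(ground_feats, dtype=float)
    aerial_feats = np.asarray(aerial_feats, dtype=float)
    is_match = np.asarray(is_match, dtype=float)

    # Euclidean distance between the two views of each pair.
    dist = np.linalg.norm(ground_feats - aerial_feats, axis=1)

    # Matching pairs are penalized for being far apart.
    pull = is_match * dist ** 2
    # Mismatched pairs are penalized only while closer than the margin.
    push = (1.0 - is_match) * np.maximum(margin - dist, 0.0) ** 2
    return float(np.mean(pull + push))


def rank_aerial_candidates(query_feat, aerial_db):
    """Return aerial database indices ordered from nearest to farthest."""
    aerial_db = np.asarray(aerial_db, dtype=float)
    query_feat = np.asarray(query_feat, dtype=float)
    dists = np.linalg.norm(aerial_db - query_feat, axis=1)
    return np.argsort(dists)
```

With features learned this way, localizing a ground-level query reduces to a nearest-neighbor search over the aerial reference features, which is what `rank_aerial_candidates` sketches.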

Subject Categories:

  • Cybernetics
  • Cartography and Aerial Photography

Distribution Statement: