Posted by André Araujo and Tobias Weyand, Software Engineers, Google Research
Image classification technology has shown remarkable improvement over the past few years, exemplified in part by the Imagenet classification challenge, where error rates continue to drop substantially every year. In order to continue advancing the state of the art in computer vision, many researchers are now putting more focus on fine-grained and instance-level recognition problems – instead of recognizing general entities such as buildings, mountains and (of course) cats, many are designing machine learning algorithms capable of identifying the Eiffel Tower, Mount Fuji or Persian cats. However, a significant obstacle for research in this area has been the lack of large annotated datasets.
Today, we are excited to advance instance-level recognition by releasing Google-Landmarks, the largest worldwide dataset for recognition of human-made and natural landmarks. Google-Landmarks is being released as part of the Landmark Recognition and Landmark Retrieval Kaggle challenges, which will be the focus of the CVPR’18 Landmarks workshop. The dataset contains more than 2 million images depicting 30 thousand unique landmarks from across the world (their geographic distribution is presented below), a number of classes that is ~30x larger than what is available in commonly used datasets. Additionally, to spur research in this field, we are open-sourcing Deep Local Features (DELF), an attentive local feature descriptor that we believe is especially suited for this kind of task.
|Geographic distribution of landmarks in our dataset.|
Landmark recognition presents some noteworthy differences from other problems. For example, even within a large annotated dataset, there might not be much training data available for some of the less popular landmarks. Additionally, since landmarks are generally rigid objects which do not move, the intra-class variation is very small (in other words, a landmark’s appearance does not change that much across different images of it). As a result, variations only arise due to image capture conditions, such as occlusions, different viewpoints, weather and illumination, making this distinct from other image recognition datasets where images of a particular class (such as a dog) can vary much more. These characteristics are also shared with other instance-level recognition problems, such as artwork recognition — so we hope the new dataset can benefit research for other image recognition problems as well.
The two Kaggle challenges provide access to annotated data to help researchers address these problems. The recognition track challenge is to build models that recognize the correct landmark in a dataset of challenging test images, while the retrieval track challenges participants to retrieve images containing the same landmark.
|A few examples of images from the Google-Landmarks dataset, including landmarks such as Big Ben, Sacre Coeur Basilica, the rock sculpture of Decebalus and the Megyeri Bridge, among others.|
If you plan to be at CVPR this year, we hope you’ll attend the CVPR’18 Landmarks workshop. However, everyone is able to participate in the challenge, and access to the new dataset is available via the Kaggle website. We hope this resource is valuable to your research and we can’t wait to see the ideas you will come up with for recognizing landmarks!
Jack Sim, Will Cukierski, Maggie Demkin, Hartwig Adam, Bohyung Han, Shih-Fu Chang, Ondrej Chum, Torsten Sattler, Giorgos Tolias, Xu Zhang, Fernando Brucher, Marco Andreetto, Gursheesh Kour.