Accession Number : ADA602439


Title : Large Scale Hierarchical K-Means Based Image Retrieval With MapReduce


Descriptive Note : Master's thesis


Corporate Author : AIR FORCE INSTITUTE OF TECHNOLOGY WRIGHT-PATTERSON AFB OH GRADUATE SCHOOL OF ENGINEERING AND MANAGEMENT


Personal Author(s) : Murphy, William E


Full Text : https://apps.dtic.mil/dtic/tr/fulltext/u2/a602439.pdf


Report Date : 27 Mar 2014


Pagination or Media Count : 87


Abstract : Image retrieval remains one of the most heavily researched areas in Computer Vision. Image retrieval methods have been used in autonomous vehicle localization research, object recognition applications, and commercially in projects such as Google Glass. Current methods for image retrieval become problematic when implemented on image datasets that can easily reach billions of images. In order to process these growing datasets, we distribute the necessary computation for image retrieval among a cluster of machines using Apache Hadoop. While there are many techniques for image retrieval, we focus on systems that use Hierarchical K-Means Trees. Successful image retrieval systems based on Hierarchical K-Means Trees have been built using the tree as a Visual Vocabulary to build an Inverted File Index and implementing a Bag of Words retrieval approach, or by building the tree as a Full Representation of every image in the database and implementing a K-Nearest Neighbor voting scheme for retrieval. Both approaches involve different levels of approximation, and each has strengths and weaknesses that must be weighed in accordance with the needs of the application. Both approaches are implemented with MapReduce, for the first time, and compared in terms of image retrieval precision, index creation run-time, and image retrieval throughput. Experiments that include up to 2 million images running on 20 virtual machines are shown.


Descriptors : *COMPUTER VISION, *HIERARCHIES, *INFORMATION RETRIEVAL, COMPUTATIONS, DATA BASES, IMAGES, INDEXES, RECOGNITION, VEHICLES, VOCABULARY


Subject Categories : Information Science
      Computer Programming and Software
      Computer Systems Management and Standards


Distribution Statement : APPROVED FOR PUBLIC RELEASE