Accession Number:

ADA528612

Title:

Tractable Algorithms for Proximity Search on Large Graphs

Descriptive Note:

Doctoral thesis

Corporate Author:

CARNEGIE-MELLON UNIV PITTSBURGH PA MACHINE LEARNING DEPT

Personal Author(s):

Report Date:

2010-07-01

Pagination or Media Count:

195

Abstract:

Identifying the nearest neighbors of a node in a graph is a key ingredient in a diverse set of ranking problems, e.g. friend suggestion in social networks, keyword search in databases, and web-spam detection. Finding these near neighbors requires graph-theoretic measures of similarity or proximity. Most popular graph-based similarity measures, e.g. the length of the shortest path or the number of common neighbors, look at the paths between two nodes in a graph. One such class of similarity measures arises from random walks. In the context of using these measures, we identify and address two important problems. First, we note that, while random-walk-based measures are useful, they are often hard to compute. Hence we focus on designing tractable algorithms for faster and better ranking using random-walk-based proximity measures in large graphs. Second, we theoretically justify why path-based similarity measures work so well in practice. For the first problem, we focus on improving the quality and speed of nearest neighbor search in real-world graphs. This work consists of three main components. First, we present an algorithmic framework for computing nearest neighbors in truncated hitting and commute times, which are proximity measures based on short-term random walks. Second, we improve upon this ranking by incorporating user feedback, which can counteract ambiguities in queries and data. Third, we address the problem of nearest neighbor search when the underlying graph is too large to fit in main memory. We also prove a number of interesting theoretical properties of these measures, which have been key to designing most of the algorithms in this thesis. We address the second problem by bringing together a well-known generative model for link formation and geometric intuitions. As a measure of the quality of ranking, we examine link prediction, which has been

Subject Categories:

  • Numerical Mathematics
  • Statistics and Probability

Distribution Statement:

APPROVED FOR PUBLIC RELEASE