University of California, Los Angeles Los Angeles United States
We present a very efficient semi-supervised graph-based algorithm for classification of high-dimensional data that is motivated by the MBO method of Garcia-Cardona 2014 and derived using the similarity graph. Our procedure is an elegant combination of heat kernel page rank and the MBO method applied to study semi-supervised problems. The timing of our algorithm is highly dependent on how quickly the page rank can be computed we use two different yet very efficient approaches to calculate the page rank, one of which proceeds by simulating random walks of bounded length. Overall, our method is advantageous for very big, sparse data, in which the graph has few edges, and it produces good accuracy even if the number of labeled instances is very small. In fact, the accuracy of the procedure is comparable with or better than that of state-of-the-art methods and is demonstrated on benchmark data sets. In addition to experimental results, we include a thorough comparison of our algorithm to that of Garcia-Cardona 2014 and describe the advantages of both methods.