Accession Number:

ADA616935

Title:

A Large-scale Distributed Indexed Learning Framework for Data that Cannot Fit into Memory

Descriptive Note:

Final rept. 27 Mar 2013-26 Mar 2015

Corporate Author:

NATIONAL TAIWAN UNIV TAIPEI

Personal Author(s):

Report Date:

2015-03-27

Pagination or Media Count:

12

Abstract:

This project addresses three major problems in distributed learning for big data. (1) Learning a classifier when the data contain many samples that do not improve model quality yet cost substantial I/O and memory to process. Block coordinate descent combined with approximate nearest neighbor (ANN) search to select active samples in dual mode was shown to outperform the state of the art. (2) Complex query search, in which sending the query to every local machine is very costly. Decomposing the reference patterns into multiple resolutions solved distributed kNN/kFN pattern matching very efficiently. (3) Distributed learning from an unlimited unlabeled data stream that many clients must send to a server to learn a classifier. Integrating three learning techniques (online, semi-supervised, and active learning) with selective sampling that minimizes communication between the server and the clients solved this problem.

Subject Categories:

  • Information Science
  • Computer Systems
  • Computer Systems Management and Standards

Distribution Statement:

APPROVED FOR PUBLIC RELEASE