Accession Number:
ADA616935
Title:
A Large-scale Distributed Indexed Learning Framework for Data that Cannot Fit into Memory
Descriptive Note:
Final rept. 27 Mar 2013-26 Mar 2015
Corporate Author:
NATIONAL TAIWAN UNIV TAIPEI
Report Date:
2015-03-27
Pagination or Media Count:
12
Abstract:
This project addresses three major problems in distributed learning for big data. (1) Learning a classifier when the data contain many samples that do not improve model quality yet incur heavy I/O and memory costs. Block coordinate descent combined with approximate nearest neighbor (ANN) search to select active samples in dual mode was shown to outperform the state of the art. (2) Complex query search, in which sending the query to all local machines is very costly. Decomposing the reference patterns into multiple resolutions solved the distributed kNN/kFN pattern matching problem very efficiently. (3) Distributed learning from unlimited unlabeled data streams that many clients would otherwise need to send to a server to learn a classifier. Integrating three learning techniques (online, semi-supervised, and active learning) with selective sampling that minimizes communication between the server and the clients solved this problem.
Distribution Statement:
APPROVED FOR PUBLIC RELEASE
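
Illustrative note on problem (3) of the abstract: the idea of selective sampling with limited client-to-server communication can be sketched as below. This is not the report's algorithm, which the abstract does not specify. It is a generic margin-based selective sampler paired with a perceptron update, and every name in it (`Server`, `client_should_send`, `run_stream`, `threshold`) is hypothetical. It also assumes, for simplicity, that the client can always read the server's current weights and that a label can be obtained for any sample the client chooses to send.

```python
import random


def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


class Server:
    """Hypothetical server holding a linear model, updated perceptron-style."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.received = 0  # number of samples actually transmitted

    def update(self, x, y):
        self.received += 1
        # Perceptron rule: change the weights only on a mistake.
        if y * dot(self.w, x) <= 0:
            self.w = [wi + y * xi for wi, xi in zip(self.w, x)]


def client_should_send(w, x, threshold):
    """Send a sample only when the current model is uncertain about it."""
    return abs(dot(w, x)) <= threshold


def run_stream(n_samples=2000, dim=5, threshold=1.0, seed=0):
    """Simulate one client streaming synthetic samples to the server."""
    rng = random.Random(seed)
    true_w = [rng.uniform(-1, 1) for _ in range(dim)]  # synthetic ground truth
    server = Server(dim)
    for _ in range(n_samples):
        x = [rng.uniform(-1, 1) for _ in range(dim)]
        # Assumption: the client sees the server's current weights directly.
        if client_should_send(server.w, x, threshold):
            # The label is requested only for samples that are transmitted.
            y = 1 if dot(true_w, x) >= 0 else -1
            server.update(x, y)
    return server, true_w


if __name__ == "__main__":
    server, _ = run_stream()
    print("samples sent to server:", server.received)
```

Samples on which the model is already confident are never transmitted or labeled, which is how a scheme of this kind trades a small amount of information for lower communication and labeling cost. A real deployment would also need a policy for how often the server pushes updated weights back to the clients.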