Accession Number:

ADA595011

Title:

A Submodularity Framework for Data Subset Selection

Descriptive Note:

Final report, 30 Jul 2012-15 Sep 2013

Corporate Author:

WASHINGTON UNIV SEATTLE DEPT OF ELECTRICAL ENGINEERING

Report Date:

2013-09-01

Pagination or Media Count:

57

Abstract:

This report describes the outcome of the project "A Submodularity Framework for Data Subset Selection." The goal of the project was to develop and evaluate novel submodular functions for selecting subsets of large acoustic and text data sets. The selected subsets were used to train acoustic models for automatic speech recognition and translation models for machine translation, respectively. The submodular selection techniques were evaluated against random data selection and the best comparable data selection technique previously reported in the literature. Our results demonstrate that submodular data selection outperforms all baseline techniques: for a fixed data subset size, systems trained on submodularly selected data performed better than systems trained on data chosen by the baselines. Additionally, submodular selection was applied to the problem of feature selection, where it outperformed standard modular feature selection techniques.
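
As an illustration of what selecting a fixed-size subset with a submodular function can look like, the following is a minimal sketch and not the method used in the report. It assumes a facility-location objective, a greedy maximizer, and a toy similarity over one-dimensional points; the abstract specifies none of these, and the names `coverage`, `greedy_select`, and `points` are hypothetical.

```python
from itertools import combinations
import math


def coverage(sim, subset):
    """Facility-location value: credit each item with its best match in subset."""
    if not subset:
        return 0.0
    return sum(max(row[j] for j in subset) for row in sim)


def greedy_select(sim, k):
    """Pick up to k items, each time adding the one with the largest marginal gain.

    Assumes sim is a square matrix of nonnegative similarities.
    """
    n = len(sim)
    best = [0.0] * n  # best similarity of each item to anything selected so far
    selected = []
    for _ in range(min(k, n)):
        top_j, top_gain = None, -1.0
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain of adding j: total improvement in per-item coverage.
            gain = sum(max(sim[i][j] - best[i], 0.0) for i in range(n))
            if gain > top_gain:
                top_j, top_gain = j, gain
        selected.append(top_j)
        best = [max(best[i], sim[i][top_j]) for i in range(n)]
    return selected


# Toy data (assumed for illustration): two tight clusters and one outlier.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
sim = [[1.0 / (1.0 + abs(x - y)) for y in points] for x in points]

chosen = greedy_select(sim, 3)

# Brute-force optimum over all size-3 subsets, feasible only at toy scale.
optimum = max(coverage(sim, list(c)) for c in combinations(range(len(points)), 3))
greedy_value = coverage(sim, chosen)
```

For a monotone submodular objective such as this one, greedy selection under a size limit is known to reach at least a (1 - 1/e) fraction of the optimal value, which is why the sketch compares `greedy_value` with the brute-force `optimum`.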

Subject Categories:

  • Linguistics
  • Cybernetics
  • Voice Communications

Distribution Statement:

APPROVED FOR PUBLIC RELEASE