Accession Number:

ADA460254

Title:

Semantic Lexicon Construction: Learning from Unlabeled Data via Spectral Analysis

Descriptive Note:

Corporate Author:

IBM THOMAS J WATSON RESEARCH CENTER YORKTOWN HEIGHTS NY

Personal Author(s):

Report Date:

2004-01-01

Pagination or Media Count:

9

Abstract:

This paper considers the task of automatically collecting words together with their entity class labels, starting from a small number of labeled examples (seed words). We show that spectral analysis can compensate for the paucity of labeled examples by learning from unlabeled data. The proposed method significantly outperforms several methods based on techniques such as EM and co-training. Furthermore, when trained with 300 labeled examples and unlabeled data, it rivals Naive Bayes classifiers trained with 7500 labeled examples.
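The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea under stated assumptions, not the report's method. It assumes word-by-context co-occurrence counts M gathered from unlabeled text, a truncated eigendecomposition of M Mᵀ (equivalent to taking the left singular vectors of M) computed by power iteration with deflation, and nearest-centroid labeling from the seed words. The toy counts, the CITY/PERSON labels, and all names (`COUNTS`, `gram_matrix`, `top_eigenpairs`, `classify`) are hypothetical. Only the Python standard library is used.

```python
import math

# Hypothetical toy co-occurrence counts (word -> context -> count); illustrative only.
COUNTS = {
    "paris":  {"in": 2, "capital": 1},
    "london": {"in": 1, "visited": 1},
    "tokyo":  {"visited": 1},
    "smith":  {"mr": 1, "said": 1},
    "jones":  {"said": 1, "met": 1},
    "miller": {"met": 1},
}


def gram_matrix(words, counts):
    """Word-by-word matrix G = M M^T, where M is the word-by-context count matrix."""
    contexts = sorted({c for w in words for c in counts[w]})
    rows = [[counts[w].get(c, 0) for c in contexts] for w in words]
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


def top_eigenpairs(g, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    n = len(g)
    g = [[float(x) for x in row] for row in g]  # work on a float copy
    pairs = []
    for _ in range(k):
        v = [1.0] * n
        lam = 0.0
        for _ in range(iters):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
            lam = norm  # ||G v|| for unit v converges to the top eigenvalue
        pairs.append((lam, v))
        # Deflate: remove the found component so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                g[i][j] -= lam * v[i] * v[j]
    return pairs


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def classify(counts, seeds, k=2):
    """Label non-seed words by cosine similarity to seed centroids in the k-dim space."""
    words = sorted(counts)
    pairs = top_eigenpairs(gram_matrix(words, counts), k)
    # Row i of U * Sigma: sqrt(eigenvalue of M M^T) is the singular value of M.
    emb = {w: [math.sqrt(lam) * vec[i] for lam, vec in pairs] for i, w in enumerate(words)}
    centroids = {}
    for label in set(seeds.values()):
        members = [emb[w] for w, lab in seeds.items() if lab == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return {
        w: max(centroids, key=lambda lab: cosine(emb[w], centroids[lab]))
        for w in words
        if w not in seeds
    }


print(classify(COUNTS, {"paris": "CITY", "smith": "PERSON"}))
# → {'jones': 'PERSON', 'london': 'CITY', 'miller': 'PERSON', 'tokyo': 'CITY'}
```

In the raw counts, "tokyo" shares no context with either seed, so direct similarity to "paris" or "smith" cannot label it. The rank-2 projection links it to "paris" through "london", which co-occurs with both. This is a small-scale illustration of how unlabeled co-occurrence structure can stand in for missing labels.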

Subject Categories:

  • Information Science

Distribution Statement:

APPROVED FOR PUBLIC RELEASE