Accession Number:

ADA439688

Title:

Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification

Descriptive Note:

Technical report

Corporate Author:

MINNESOTA UNIV MINNEAPOLIS DEPT OF COMPUTER SCIENCE

Report Date:

1999-05-17

Pagination or Media Count:

11

Abstract:

Categorization of documents is challenging, as the number of discriminating words can be very large. The authors present a nearest neighbor classification scheme for text categorization in which the importance of discriminating words is learned using mutual information and weight adjustment techniques. The nearest neighbors for a particular document are then computed based on the matching words and their weights. They evaluate their scheme on both synthetic and real-world documents. Experiments with synthetic data sets show that this scheme is robust under different emulated conditions. Empirical results on real-world documents demonstrate that this scheme outperforms state-of-the-art classification algorithms such as C4.5, RIPPER, Rainbow, and PEBLS.
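
Illustrative sketch (not taken from the report): the abstract describes nearest neighbors computed from matching words and their learned weights. The Python below is one minimal way such a scheme might look, under several assumptions: documents are word-count dictionaries, per-word weights are set to the mutual information between word presence and class label, similarity is a weighted cosine, and neighbors vote by summed similarity. The report's weight adjustment step is not reproduced here, and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def mutual_information(docs, labels):
    """Mutual information (in bits) between binary word presence and class label."""
    n = len(docs)
    class_counts = Counter(labels)
    vocab = {w for d in docs for w in d}
    mi = {}
    for w in vocab:
        # Number of documents of each class that contain the word.
        present = Counter(lab for d, lab in zip(docs, labels) if w in d)
        n_w = sum(present.values())
        total = 0.0
        for c, n_c in class_counts.items():
            # Two cells per class: word present, word absent.
            for joint, marg_w in ((present[c], n_w), (n_c - present[c], n - n_w)):
                if joint > 0:
                    total += (joint / n) * math.log2(joint * n / (marg_w * n_c))
        mi[w] = total
    return mi


def weighted_cosine(a, b, weights):
    """Cosine similarity after scaling each word count by its weight."""
    def norm(d):
        return math.sqrt(sum((v * weights.get(w, 0.0)) ** 2 for w, v in d.items()))

    dot = sum(v * b[w] * weights.get(w, 0.0) ** 2 for w, v in a.items() if w in b)
    denom = norm(a) * norm(b)
    return dot / denom if denom > 0 else 0.0


def classify(query, docs, labels, weights, k=3):
    """Label the query by similarity-weighted vote among its k nearest neighbors."""
    scored = sorted(
        ((weighted_cosine(query, d, weights), lab) for d, lab in zip(docs, labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for sim, lab in scored:
        votes[lab] += sim
    return max(votes, key=votes.get)


# Toy training set (hypothetical): "the" appears everywhere and should get zero weight.
docs = [
    {"ball": 2, "team": 1, "the": 3},
    {"goal": 1, "team": 2, "the": 2},
    {"vote": 2, "law": 1, "the": 3},
    {"law": 2, "senate": 1, "the": 2},
]
labels = ["sports", "sports", "politics", "politics"]

weights = mutual_information(docs, labels)
predicted = classify({"team": 1, "the": 5}, docs, labels, weights, k=3)
print(predicted)
```

In this toy set the non-discriminating word "the" receives a weight of zero, so it does not affect which training documents count as nearest neighbors, even though it dominates the raw word counts of the query.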

Subject Categories:

  • Information Science
  • Cybernetics

Distribution Statement:

APPROVED FOR PUBLIC RELEASE