Accession Number:
ADA439688
Title:
Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification
Descriptive Note:
Technical rept.
Corporate Author:
MINNESOTA UNIV MINNEAPOLIS DEPT OF COMPUTER SCIENCE
Personal Author(s):
Report Date:
1999-05-17
Pagination or Media Count:
11.0
Abstract:
Categorization of documents is challenging, as the number of discriminating words can be very large. The authors present a nearest neighbor classification scheme for text categorization in which the importance of discriminating words is learned using mutual information and weight adjustment techniques. The nearest neighbors of a given document are then computed from the words it shares with each training document and the learned weights of those words. They evaluate their scheme on both synthetic and real-world documents. Experiments with synthetic data sets show that this scheme is robust under different emulated conditions. Empirical results on real-world documents demonstrate that this scheme outperforms state-of-the-art classification algorithms such as C4.5, RIPPER, Rainbow, and PEBLS.
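The abstract names three ingredients: per-word weights initialized with mutual information, a weight adjustment step, and a nearest neighbor search over weighted word matches. The following is a minimal sketch of how those pieces could fit together, not the report's actual algorithm. It assumes documents are term-frequency dictionaries, uses a weighted cosine similarity, and substitutes a simple greedy perturbation scored by leave-one-out accuracy for the report's weight adjustment procedure. All function names, parameters (`k`, `factors`, `rounds`), and the toy corpus are hypothetical.

```python
import math
from collections import Counter

# Assumption: documents are dicts mapping a term to its frequency.


def mutual_information(docs, labels, vocab):
    """Mutual information between binary term presence and the class label."""
    n = len(docs)
    classes = sorted(set(labels))
    mi = {}
    for t in vocab:
        score = 0.0
        for present in (True, False):
            idx = [i for i, d in enumerate(docs) if (d.get(t, 0) > 0) == present]
            p_t = len(idx) / n
            if p_t == 0:
                continue
            for c in classes:
                p_c = labels.count(c) / n
                p_tc = sum(1 for i in idx if labels[i] == c) / n
                if p_tc > 0:
                    score += p_tc * math.log(p_tc / (p_t * p_c))
        mi[t] = score
    return mi


def weighted_cosine(a, b, w):
    """Cosine similarity after scaling each term frequency by its weight."""
    dot = sum(a[t] * b[t] * w.get(t, 0.0) ** 2 for t in a if t in b)
    na = math.sqrt(sum((v * w.get(t, 0.0)) ** 2 for t, v in a.items()))
    nb = math.sqrt(sum((v * w.get(t, 0.0)) ** 2 for t, v in b.items()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(query, docs, labels, w, k=3, skip=None):
    """Similarity-weighted vote among the k most similar training documents."""
    scored = [(weighted_cosine(query, d, w), labels[i])
              for i, d in enumerate(docs) if i != skip]
    scored.sort(key=lambda s: s[0], reverse=True)
    votes = Counter()
    for sim, label in scored[:k]:
        votes[label] += sim
    return votes.most_common(1)[0][0] if votes else None


def loo_accuracy(docs, labels, w, k):
    """Leave-one-out accuracy: classify each document against all the others."""
    hits = sum(knn_predict(d, docs, labels, w, k, skip=i) == labels[i]
               for i, d in enumerate(docs))
    return hits / len(docs)


def adjust_weights(docs, labels, w, k=3, factors=(0.5, 2.0), rounds=2):
    """Hypothetical greedy adjustment: keep a per-term rescaling only if it
    raises leave-one-out accuracy."""
    w = dict(w)
    best = loo_accuracy(docs, labels, w, k)
    for _ in range(rounds):
        for t in list(w):
            for f in factors:
                trial = dict(w)
                trial[t] = w[t] * f
                acc = loo_accuracy(docs, labels, trial, k)
                if acc > best:
                    best, w = acc, trial
                    break
    return w, best


# Toy corpus, made up for illustration only.
docs = [
    {"goal": 2, "match": 1, "team": 1},
    {"team": 2, "score": 1, "goal": 1},
    {"match": 1, "score": 2},
    {"vote": 2, "party": 1, "election": 1},
    {"party": 2, "policy": 1},
    {"election": 1, "vote": 1, "policy": 1},
]
labels = ["sport"] * 3 + ["politics"] * 3
vocab = sorted({t for d in docs for t in d})

weights = mutual_information(docs, labels, vocab)
weights, acc = adjust_weights(docs, labels, weights, k=1)
prediction = knn_predict({"goal": 1, "team": 1}, docs, labels, weights, k=3)
print(acc, prediction)  # prints: 1.0 sport
```

Because cosine similarity is unchanged when every weight is multiplied by the same constant, only the relative weights matter. That is why the sketch adjusts one term's weight at a time rather than rescaling all of them. On this toy corpus every term already separates the classes perfectly, so the greedy step finds nothing to improve.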
Descriptors:
Subject Categories:
- Information Science
- Cybernetics