Accession Number:

ADA518358

Title:

Investigation into Text Classification With Kernel Based Schemes

Descriptive Note:

Master's thesis

Corporate Author:

NAVAL POSTGRADUATE SCHOOL MONTEREY CA

Personal Author(s):

Report Date:

2010-03-01

Pagination or Media Count:

168

Abstract:

The development of the Internet has resulted in a rapid explosion of information available on the Web. In addition, the speed and anonymity of Internet media publishing make this medium ideal for the rapid dissemination of various kinds of content. As a result, there is a strong need for automated text analysis and mining tools that can identify the main topics of texts, chat room discussions, Web postings, etc. This thesis investigates whether a nonlinear kernel-based feature vector selection (FVS) approach is beneficial for categorizing unstructured text documents. Results obtained with nonlinear kernel-based classification are compared to results obtained with the Latent Semantic Analysis (LSA) approach commonly used in text categorization applications. The nonlinear kernel-based scheme considered in this work applies FVS followed by Linear Discriminant Analysis (LDA). Titles and abstracts of IEEE journal articles published between 1990 and 1999 and selected by specific key terms were used to construct the data set for classification. Overall, taking into account both classification performance and timing, the results showed FVS-LDA with a polynomial kernel of degree 1 and an added constant of 1 to be the best classifier for the database considered.
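The best-performing configuration named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis code: it assumes the polynomial kernel k(x, y) = (x · y + c)^d with d = 1 and c = 1, and a common greedy FVS criterion that repeatedly adds the training vector which most increases the global fitness J_S = (1/M) Σ_i K_{S,i}^T K_{S,S}^{-1} K_{S,i} / k(x_i, x_i), i.e. the mean normalized quality with which each mapped sample φ(x_i) is reconstructed from the span of the selected vectors. The function names poly_kernel, fvs and fvs_project, the stopping tolerance, and the toy data are hypothetical. The projected coordinates it returns are the kind of features that would then be passed to LDA.

```python
import numpy as np


def poly_kernel(X, Y, degree=1, c=1.0):
    """Polynomial kernel k(x, y) = (x . y + c) ** degree for all row pairs."""
    return (X @ Y.T + c) ** degree


def fvs(X, max_vectors, tol=1e-6, degree=1, c=1.0):
    """Greedy feature vector selection (hypothetical sketch).

    Returns the indices of the selected training vectors and the final
    global fitness: the mean over samples of the normalized squared norm
    of phi(x_i) projected onto the span of the selected phi(x_s).
    """
    K = poly_kernel(X, X, degree, c)
    diag = np.diag(K)
    selected, fitness = [], 0.0
    for _ in range(max_vectors):
        if fitness >= 1.0 - tol:
            break  # every sample is already (almost) exactly reconstructed
        best_j, best_fit = None, fitness
        for j in range(X.shape[0]):
            if j in selected:
                continue
            S = selected + [j]
            K_SS = K[np.ix_(S, S)]
            K_SX = K[S, :]
            # K_Si^T K_SS^{-1} K_Si for every sample i, computed column-wise
            proj = np.einsum("ij,ij->j", K_SX, np.linalg.pinv(K_SS) @ K_SX)
            fit = np.mean(proj / diag)
            if fit > best_fit:
                best_j, best_fit = j, fit
        if best_j is None or best_fit - fitness < tol:
            break  # no remaining candidate improves the fitness enough
        selected.append(best_j)
        fitness = best_fit
    return selected, fitness


def fvs_project(X_new, X_train, selected, degree=1, c=1.0):
    """Coordinates beta = K_SS^{-1} k_S(x) of each projected phi(x)."""
    S = X_train[selected]
    K_SS = poly_kernel(S, S, degree, c)
    K_SX = poly_kernel(S, X_new, degree, c)
    return np.linalg.solve(K_SS, K_SX).T


# Toy usage on random 2-D data (hypothetical, not the IEEE data set).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
selected, fitness = fvs(X, max_vectors=10)
Z = fvs_project(X, X, selected)  # one row of FVS features per sample
print(len(selected), round(fitness, 6))
```

With d = 1 and c = 1 the kernel equals the ordinary inner product of the augmented vectors [x, 1] and [y, 1], so for n-dimensional inputs at most n + 1 vectors can be selected before the fitness reaches 1.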

Subject Categories:

  • Information Science
  • Numerical Mathematics

Distribution Statement:

APPROVED FOR PUBLIC RELEASE