Accession Number:

ADA595830

Title:

Text Classification for Intelligent Portfolio Management

Descriptive Note:

Corporate Author:

CARNEGIE-MELLON UNIV PITTSBURGH PA ROBOTICS INST

Report Date:

2002-05-01

Pagination or Media Count:

22

Abstract:

In the application domain of stock portfolio management, software agents that evaluate the risks associated with the individual companies in a portfolio should be able to read electronic news articles written to give investors an indication of a company's financial outlook. There is a positive correlation between news reports on a company's financial outlook and the company's attractiveness as an investment. However, because of the volume of such reports, it is impossible for financial analysts or investors to track and read every one. It would therefore be very helpful to have a system that automatically classifies news reports as reflecting positively or negatively on a company's financial outlook. To accomplish this task, we treat the analysis of news articles as a text classification problem. We developed a text classification algorithm that classifies financial news articles using a combination of a reduced but highly informative word feature set and a variant of the weighted majority algorithm. By clustering words, represented in a latent semantic vector space by latent semantic analysis (LSA), into groups with similar concepts, we are able to find semantically coherent word groups. A learning method for unlabeled data, Self-Confident sampling, was proposed to address the high cost of labeling data. Vote entropy serves as the information-theoretic criterion for assigning a label to an unlabeled document. Compared with naive Bayes classification boosted by Expectation Maximization (EM), the proposed method achieved better accuracy. Two criteria are used to evaluate the methods: (1) how well they improve their performance with unlabeled data after being initially trained on a small number of human-labeled articles, and (2) how well they classify the latest financial news articles, most of which are not seen during training.
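The abstract names two voting mechanisms: a weighted majority vote over word-based classifiers, and vote entropy as the criterion for labeling unlabeled articles. The Python sketch below illustrates both under stated assumptions. It uses the standard weighted majority algorithm of Littlestone and Warmuth, not the report's own variant, and it reads Self-Confident sampling as "assign the majority label to unlabeled documents whose vote entropy falls below a threshold", which is an assumption. The names `keyword_expert`, `WeightedMajority`, `self_confident_labels`, the penalty factor `beta`, and the threshold `max_entropy` are hypothetical and do not come from the report.

```python
import math
from collections import Counter
from typing import Callable, FrozenSet, List, Sequence, Tuple

Doc = FrozenSet[str]           # assumption: a document is a set of word features
Expert = Callable[[Doc], int]  # an expert votes +1 (positive) or -1 (negative)


def keyword_expert(word: str, polarity: int) -> Expert:
    """Hypothetical expert: votes `polarity` if `word` occurs, else the opposite."""
    return lambda doc: polarity if word in doc else -polarity


class WeightedMajority:
    """Standard weighted majority algorithm, not the report's variant.

    Every expert starts with weight 1; after each labeled example, the
    weight of each expert that voted wrongly is multiplied by beta.
    """

    def __init__(self, experts: Sequence[Expert], beta: float = 0.5) -> None:
        self.experts = list(experts)
        self.weights = [1.0] * len(self.experts)
        self.beta = beta

    def predict(self, doc: Doc) -> int:
        # Weighted vote; a tied score goes to the positive class.
        score = sum(w * e(doc) for w, e in zip(self.weights, self.experts))
        return 1 if score >= 0 else -1

    def update(self, doc: Doc, label: int) -> None:
        # Penalize only the experts that disagreed with the true label.
        for i, expert in enumerate(self.experts):
            if expert(doc) != label:
                self.weights[i] *= self.beta


def vote_entropy(votes: Sequence[int]) -> float:
    """-sum_c (V(c)/k) * log2(V(c)/k), where V(c) counts votes for class c."""
    k = len(votes)
    return sum(-(n / k) * math.log2(n / k) for n in Counter(votes).values())


def self_confident_labels(
    experts: Sequence[Expert], unlabeled: Sequence[Doc], max_entropy: float
) -> List[Tuple[Doc, int]]:
    """Assumed selection rule: label only documents the committee agrees on."""
    selected = []
    for doc in unlabeled:
        votes = [expert(doc) for expert in experts]
        if vote_entropy(votes) <= max_entropy:
            majority = Counter(votes).most_common(1)[0][0]
            selected.append((doc, majority))
    return selected
```

With a two-class committee, vote entropy reaches its maximum of 1.0 only when the votes are split evenly, so any `max_entropy` below 1.0 selects only documents with a well-defined majority label. The selected pairs could then be added to the training set, which is one plausible way a small human-labeled set might be extended with unlabeled articles.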

Subject Categories:

  • Information Science
  • Test Facilities, Equipment and Methods

Distribution Statement:

APPROVED FOR PUBLIC RELEASE