Adding a Capability to Extract Sentiment from Text Using HanDles
OHIO STATE UNIV COLUMBUS DEPT OF PSYCHOLOGY
HanDles is a document visualization tool developed by Ohio State University for DRDC Toronto. One aspect of documents that may interest analysts is the extent to which they express positive or negative opinion, or sentiment, toward some issue or group. In this report, we describe how HanDles was extended to classify documents as containing predominantly positive or negative sentiment. The capability was added so that the tool could be used in Influence Operations contexts. As a test case, we trained HanDles to distinguish positive from negative film reviews, and then ran three tests of its classification performance. The first test used reviews of the Amazon Kindle. The second used text segments drawn from the original training set of movie reviews. The third used a set of movie reviews that HanDles had not seen before. HanDles did a poor job of detecting the sentiment in the Amazon Kindle reviews. We attribute this to the fact that movie reviews and product reviews discuss different issues, so there is limited similarity between the two classes of document. Not surprisingly, HanDles did a good job of classifying text segments from the original training set. This result also demonstrates that, unlike many sentiment analysis tools that classify text only at the whole-document level, HanDles can be used effectively to extract the issues discussed within a document and assign sentiment to each of them. For example, a film review might be classified as negative overall, while HanDles determines that the acting was good but the directing was poor. Finally, when tested on the new set of movie reviews, HanDles achieved 93.3% accuracy. The results of our trial suggest that the documents used during training must be similar to those encountered in the operational context for HanDles to work properly.
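The train-then-test procedure described above, and the effect of a mismatch between training and test domains, can be illustrated with a minimal sketch. It is not the HanDles method, which this abstract does not specify: it swaps in a small word-count Naive Bayes classifier, and every function name and review text below is invented for illustration.

```python
import math
from collections import Counter

# Toy stand-in for a sentiment classifier. This is NOT the HanDles
# algorithm; it is a small multinomial Naive Bayes over word counts,
# and all review texts below are invented for illustration.

def train(docs):
    """Count words per label from a list of (text, label) pairs."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, doc_counts

def classify(model, text):
    """Return the label with the highest smoothed log-probability."""
    word_counts, doc_counts = model
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(word_counts):
        counts = word_counts[label]
        n_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, docs):
    """Fraction of (text, label) pairs that are classified correctly."""
    return sum(classify(model, text) == label for text, label in docs) / len(docs)

# Hypothetical movie-review training set.
movie_train = [
    ("the acting was wonderful and the plot was gripping", "pos"),
    ("a dull plot and terrible acting", "neg"),
    ("wonderful directing and a gripping story", "pos"),
    ("terrible directing and a dull story", "neg"),
]
# Hypothetical unseen movie reviews (same domain as training).
movie_test = [
    ("gripping acting and a wonderful story", "pos"),
    ("a terrible and dull film", "neg"),
]
# Hypothetical product reviews (different domain from training).
product_test = [
    ("the battery lasts for weeks and the screen is sharp", "pos"),
    ("the screen cracked and the battery died", "neg"),
]

model = train(movie_train)
in_domain_acc = accuracy(model, movie_test)
cross_domain_acc = accuracy(model, product_test)
print(f"unseen movie reviews: {in_domain_acc:.2f}")
print(f"product reviews:      {cross_domain_acc:.2f}")
```

On this toy data the classifier scores higher on the unseen movie reviews than on the product reviews. The product reviews share almost no sentiment-bearing vocabulary with the training set, so the decision rests on function words such as "the" and "and". That matches the explanation given above for the poor Kindle result, though with four training documents it shows the mechanism and nothing more.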
- Information Science