A Study of Topic and Topic Change in Conversational Threads
NAVAL POSTGRADUATE SCHOOL MONTEREY CA DEPT OF COMPUTER SCIENCE
Pagination or Media Count:
This thesis applies Latent Dirichlet Allocation LDA to the problem of topic and topic change in conversational threads using e-mail. We demonstrate that LDA can be used to successfully classify raw e-mail messages with threads to which they belong, and compare the results with those for processed threads, where quoted and reply text have been removed. Raw thread classification performs better, but processed threads show promise. We then present two new, unsupervised techniques for identifying topic change in e-mail. The first is a keyword clustering approach using LDA and DBSCAN to identify clusters of topics, and transition points between them. The second is a sliding window technique which assesses the current topic for every window, identifying transition points. The keyword clustering performs better than the sliding window approach. Both can be used as a baseline for future work.
- Information Science
- Computer Programming and Software
- Command, Control and Communications Systems