Topic Time Series Analysis of Microblogs
CALIFORNIA UNIV LOS ANGELES DEPT OF MATHEMATICS
Social media data tends to cluster in time and space around events, such as sports competitions and local newsworthy phenomena. However, transforming raw, free-form, real-time text into meaningful information remains a challenging task. Confounding factors include the massive volume of posted data, the lack of reliable event information, hidden temporal trends, and the vastly diverse nature of content. In the present work, we examine spatio-temporal topic distributions and self-exciting time series models as applied to social media microblog data. We apply topic modeling using non-negative matrix factorization with sparsity constraints to discover prevalent topics as well as latent thematic word associations within topics. We then present two methods for mining interesting spatio-temporal dynamics and relations among topics: one that compares the topic distributions directly, and another that models topics over time as temporal or spatio-temporal Hawkes processes with exponential trigger functions. This second method allows identification of self-exciting topics and reveals unique temporal and spatial relationships among them.
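As a rough illustration of the second method, the following Python sketch evaluates a purely temporal Hawkes process with an exponential trigger function. The abstract does not give the parameterization, so the form used here, lambda(t) = mu + sum over past events of alpha * beta * exp(-beta * (t - t_i)), is an assumption, as are the names `mu` (background rate), `alpha` (expected number of follow-on posts triggered per post), and `beta` (decay rate). It is a minimal sketch, not the authors' implementation.

```python
import math


def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity at time t given earlier event times.

    Assumed form: mu + sum_{t_i < t} alpha * beta * exp(-beta * (t - t_i)).
    """
    return mu + sum(
        alpha * beta * math.exp(-beta * (t - ti)) for ti in events if ti < t
    )


def hawkes_log_likelihood(events, horizon, mu, alpha, beta):
    """Log-likelihood of sorted event times observed on [0, horizon].

    Uses the standard recursion for exponential kernels, so the cost is
    linear in the number of events rather than quadratic.
    """
    log_sum = 0.0
    decay = 0.0  # running sum of exp(-beta * (t_k - t_i)) over i < k
    prev = None
    for t in events:
        if prev is not None:
            decay = math.exp(-beta * (t - prev)) * (decay + 1.0)
        log_sum += math.log(mu + alpha * beta * decay)
        prev = t
    # Integral of the intensity over [0, horizon] (the compensator).
    compensator = mu * horizon + sum(
        alpha * (1.0 - math.exp(-beta * (horizon - t))) for t in events
    )
    return log_sum - compensator
```

Under this parameterization, a topic whose fitted `alpha` is clearly above zero would be read as self-exciting: each post on the topic temporarily raises the rate of further posts. Setting `alpha` to zero reduces the model to a homogeneous Poisson process with rate `mu`. Fitting would typically maximize `hawkes_log_likelihood` over the three parameters for each topic's sequence of post times.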
- Numerical Mathematics