Authorship Attribution of Short Messages Using Multimodal Features
NAVAL POSTGRADUATE SCHOOL MONTEREY CA
In this thesis, we develop a multimodal classifier for authorship attribution of short messages. Standard natural language processing authorship attribution techniques are applied to a Twitter text corpus. Using character n-gram features and a Naïve Bayes classifier, we build statistical models of the set of authors. The social network of the selected Twitter users is analyzed using the screen names referenced in their messages. The timestamps of the messages are used to generate a pattern-of-life model. We analyze the physical layer of a network by measuring modulation characteristics of GSM cell phones. A statistical model of each cell phone is created using a Naïve Bayes classifier. Each phone is assigned to a Twitter user, and the probability outputs of the individual classifiers are combined to show that the combination of natural-language and network-feature classifiers identifies a user-to-phone binding better than the individual classifiers do when used independently.
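The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: a character n-gram Naive Bayes model over short messages, plus one possible way to fuse the outputs of two classifiers. The class and function names, the add-one smoothing, the trigram size, and the product-rule fusion (which assumes the classifiers are conditionally independent) are all assumptions made for illustration, not the method used in the thesis.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a message."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def normalize(log_scores):
    """Turn unnormalized log scores into log probabilities (log-sum-exp)."""
    peak = max(log_scores.values())
    log_total = peak + math.log(sum(math.exp(s - peak) for s in log_scores.values()))
    return {label: s - log_total for label, s in log_scores.items()}


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams (illustrative only)."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n                          # n-gram length (assumed value)
        self.alpha = alpha                  # additive smoothing constant (assumed)
        self.counts = defaultdict(Counter)  # author -> n-gram counts
        self.docs = Counter()               # author -> number of messages
        self.vocab = set()

    def fit(self, messages, authors):
        for text, author in zip(messages, authors):
            grams = char_ngrams(text, self.n)
            self.counts[author].update(grams)
            self.docs[author] += 1
            self.vocab.update(grams)
        return self

    def log_posteriors(self, text):
        """Return log P(author | text) for every known author."""
        total_docs = sum(self.docs.values())
        vocab_size = len(self.vocab) + 1    # one extra slot for unseen n-grams
        scores = {}
        for author in self.docs:
            total = sum(self.counts[author].values())
            score = math.log(self.docs[author] / total_docs)  # class prior
            for gram in char_ngrams(text, self.n):
                count = self.counts[author][gram]
                score += math.log((count + self.alpha) / (total + self.alpha * vocab_size))
            scores[author] = score
        return normalize(scores)


def combine(*log_posteriors):
    """Fuse classifier outputs with a product rule (assumes independence)."""
    labels = log_posteriors[0].keys()
    return normalize({label: sum(lp[label] for lp in log_posteriors) for label in labels})


# Toy data, invented for this sketch.
model = NgramNaiveBayes().fit(
    ["lol omg lol that is so funny lol", "indeed, I quite agree with that"],
    ["alice", "bob"],
)
text_lp = model.log_posteriors("lol omg")

# Hypothetical output of a second classifier (for example, a device model).
device_lp = {"alice": math.log(0.7), "bob": math.log(0.3)}
fused = combine(text_lp, device_lp)
best = max(fused, key=fused.get)
```

Working in log space keeps the product of many small n-gram probabilities from underflowing, and renormalizing after fusion keeps the combined scores interpretable as probabilities over the same set of candidate authors.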
- Computer Systems