Automatic Author Profiling of Online Chat Logs
NAVAL POSTGRADUATE SCHOOL MONTEREY CA
Now that the Internet has become easily accessible and more affordable, more people spend more time in front of a computer. Some spend so much time on the Internet that they develop virtual friendships and relationships with people they contact regularly only through a computer screen. While most of the dialogue exchanged online is neither harmful nor illegal, some people online have dishonest intentions. Such people may break the law by seducing a minor online or even by going as far as meeting a minor in person. Terrorists can also use the Internet to facilitate communication and plan attacks. Because e-mail is one of the original means of communication on the Internet, methods for determining the author of an e-mail have already been studied. So far, however, there has been no significant experimentation with online chat logs. The first part of this study consists of generating an unbiased, random, and broad corpus of online chat logs. A general corpus covering a wide range of topics allows the results of this research to be applied in the most general case. Because developing a complete solution to the authorship attribution problem for chat logs is difficult, we limit our scope to predicting an author's gender and age. The ultimate goal of this work, then, is to help law enforcement officers track down criminals who attempt to use the Internet as a hiding place.