Constructing and Classifying Email Networks from Raw Forensic Images
Naval Postgraduate School Monterey United States
Email addresses extracted from secondary storage devices are important to a forensic analyst conducting an investigation. They can provide insight into the user's social network and help identify other potential persons of interest. However, a large portion of the email addresses on any given device are artifacts of installed software and are of no interest to the analyst. We propose a method for discovering relevant email addresses by creating graphs of extracted email addresses together with their byte-offset locations in storage. We compute certain global attributes of these graphs to construct feature vectors, which we use to classify graphs into useful and not useful categories. This process filters out the majority of uninteresting email addresses. We show that, using network topological measures on the dataset tested, Naïve Bayes and SVM identified 100% and 95.5%, respectively, of all graphs that contained useful email addresses, both with areas under the curve above 97% and with F1 scores of 80% and 90% for Naïve Bayes and SVM, respectively. Our results show that using network science metrics as attributes to classify graphs of email addresses based on each graph's topology could be an effective and efficient tool for automatically delivering evidence to an analyst.
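The pipeline described above (addresses with byte offsets, then a graph, then a vector of global attributes) can be illustrated with a minimal Python sketch. The abstract does not state how edges are formed or which attributes are computed, so everything specific here is an assumption made for illustration: the names build_graph and global_features, the rule that two distinct addresses are linked when their offsets fall within a fixed window of 256 bytes, and the choice of node count, edge count, density, mean degree, and number of connected components as features.

```python
def build_graph(hits, window=256):
    """Build an undirected graph from (byte_offset, email) pairs.

    Assumption (not from the source): two distinct addresses are linked
    when their byte offsets lie within `window` bytes of each other.
    """
    hits = sorted(hits)
    nodes = {addr for _, addr in hits}
    edges = set()
    for i, (off_i, a) in enumerate(hits):
        for off_j, b in hits[i + 1:]:
            if off_j - off_i > window:
                break  # hits are sorted, so later ones are even farther away
            if a != b:
                edges.add(frozenset((a, b)))
    return nodes, edges


def global_features(nodes, edges):
    """Return [node count, edge count, density, mean degree, components]."""
    n, m = len(nodes), len(edges)
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    mean_degree = 2 * m / n if n else 0.0

    # Adjacency lists for a depth-first count of connected components.
    adjacency = {v: set() for v in nodes}
    for edge in edges:
        a, b = tuple(edge)
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, components = set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            stack.extend(adjacency[v] - seen)

    return [n, m, density, mean_degree, components]
```

A feature vector of this kind, computed per graph, is the sort of input a Naïve Bayes or SVM classifier would take; the training step itself is omitted because the abstract gives no details of it.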
- Information Science