Accession Number:
AD1043714
Title:
Source-Code Stylometry Improvements in Python
Descriptive Note:
Technical Report, 15 Sep 2017, 31 Oct 2017
Corporate Author:
ARMY RESEARCH LAB ABERDEEN PROVING GROUND MD ABERDEEN PROVING GROUND United States
Report Date:
2017-12-14
Pagination or Media Count:
18
Abstract:
This technical note covers the rewriting of existing source-code stylometry software in Python and describes improvements to performance and maintainability, along with validation of results. Source-code stylometry is the process of attributing the authorship of source-code samples by applying machine-learning techniques, specifically random forest classifiers, to lexical, layout, and syntactic features extracted from the code. The original work was conducted as part of a collaboration between the US Army Research Laboratory and Drexel University.
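To illustrate the three feature families named in the abstract, the following is a minimal sketch of a feature extractor, not the report's implementation. The function name `extract_features`, the specific features chosen, and the use of Python's standard `ast` module to parse Python samples are all assumptions made for illustration; the abstract does not state the actual feature set or the language of the analyzed samples.

```python
import ast
from collections import Counter
from typing import Dict


def _max_depth(node: ast.AST) -> int:
    """Return the depth of the deepest path in the syntax tree rooted at node."""
    return 1 + max((_max_depth(c) for c in ast.iter_child_nodes(node)), default=0)


def extract_features(source: str) -> Dict[str, float]:
    """Hypothetical extractor: a few layout, lexical, and syntactic features."""
    lines = source.splitlines()
    n_lines = max(len(lines), 1)  # avoid division by zero on empty input

    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    node_counts = Counter(type(n).__name__ for n in nodes)
    identifiers = [n.id for n in nodes if isinstance(n, ast.Name)]

    return {
        # Layout features: how the author arranges whitespace and lines.
        "blank_line_ratio": sum(1 for l in lines if not l.strip()) / n_lines,
        "tab_indent_ratio": sum(1 for l in lines if l.startswith("\t")) / n_lines,
        "mean_line_length": sum(len(l) for l in lines) / n_lines,
        # Lexical features: naming and commenting habits.
        "comment_line_ratio": sum(
            1 for l in lines if l.lstrip().startswith("#")
        ) / n_lines,
        "mean_identifier_length": sum(map(len, identifiers))
        / max(len(identifiers), 1),
        # Syntactic features: properties of the abstract syntax tree.
        "for_node_ratio": node_counts["For"] / len(nodes),
        "while_node_ratio": node_counts["While"] / len(nodes),
        "max_ast_depth": float(_max_depth(tree)),
    }
```

Each sample would yield one such feature vector, labeled with its author. Those vectors could then be used to train a random forest classifier (for example, scikit-learn's `RandomForestClassifier`), which is the classifier family the abstract names.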
Distribution Statement:
APPROVED FOR PUBLIC RELEASE