Accession Number : AD1043714


Title : Source-Code Stylometry Improvements in Python


Descriptive Note : Technical Report, 15 Sep 2017, 31 Oct 2017


Corporate Author : ARMY RESEARCH LAB ABERDEEN PROVING GROUND MD ABERDEEN PROVING GROUND United States


Personal Author(s) : Shearer, Gregory ; Nelson, Frederica


Full Text : https://apps.dtic.mil/dtic/tr/fulltext/u2/1043714.pdf


Report Date : 14 Dec 2017


Pagination or Media Count : 18


Abstract : This technical note covers the rewriting of existing source-code stylometry software in Python and describes improvements to its performance and maintainability, along with validation of its results. Source-code stylometry is the process of attributing the authorship of source-code samples by applying machine-learning techniques, specifically random forest classifiers, to lexical, layout, and syntactic features extracted from the code. The original work was conducted as part of a collaboration between the US Army Research Laboratory and Drexel University.
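
Illustrative sketch (not taken from the report): the abstract names three feature families, lexical, layout, and syntactic. The Python below shows one way such features might be pulled from a single code sample using only the standard library. The function name, the feature names, and the choice of Python as the language of the analyzed samples are all assumptions made for illustration; the report's actual feature set and target language are not stated in this record. In a full pipeline, vectors like these would be collected per author and passed to a random forest classifier, a step omitted here.

```python
import ast
import io
import tokenize
from collections import Counter


def extract_features(source: str) -> dict:
    """Return a small, illustrative feature vector for one Python source sample.

    Hypothetical helper: the feature names below are assumptions, not the
    feature set used in the report.
    """
    lines = source.splitlines()
    n_lines = max(len(lines), 1)

    # Layout features: how the author uses whitespace.
    blank_ratio = sum(1 for ln in lines if not ln.strip()) / n_lines
    tab_indent_ratio = sum(1 for ln in lines if ln.startswith("\t")) / n_lines

    # Lexical features: token-level habits such as naming and commenting.
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    names = [t.string for t in tokens if t.type == tokenize.NAME]
    comments = sum(1 for t in tokens if t.type == tokenize.COMMENT)
    avg_name_len = sum(map(len, names)) / max(len(names), 1)

    # Syntactic features: relative frequency of selected AST node types.
    node_counts = Counter(type(n).__name__ for n in ast.walk(ast.parse(source)))
    total_nodes = sum(node_counts.values())

    features = {
        "layout.blank_ratio": blank_ratio,
        "layout.tab_indent_ratio": tab_indent_ratio,
        "lexical.avg_name_len": avg_name_len,
        "lexical.comments_per_line": comments / n_lines,
    }
    for node_type in ("For", "While", "If", "FunctionDef", "ListComp"):
        features[f"syntactic.{node_type}"] = node_counts[node_type] / total_nodes
    return features


# A made-up sample; real use would loop over many samples per author.
sample = (
    "def total(xs):\n"
    "    # sum the values\n"
    "    s = 0\n"
    "\n"
    "    for x in xs:\n"
    "        s += x\n"
    "    return s\n"
)
features = extract_features(sample)
```

Each sample becomes a fixed-length dictionary of numbers, so samples from different authors can be stacked into a matrix with one row per sample and one column per feature before classification.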


Descriptors : PYTHON PROGRAMMING LANGUAGE , COMPUTER PROGRAMS , MACHINE LEARNING , DATA PROCESSING


Subject Categories : Computer Programming and Software


Distribution Statement : APPROVED FOR PUBLIC RELEASE