File Carving and Malware Identification Algorithms Applied to Firmware Reverse Engineering
AIR FORCE INST OF TECH WRIGHT-PATTERSON AFB OH GRADUATE SCHOOL OF ENGINEERING AND MANAGEMENT
Modern society depends on critical infrastructure (CI) managed by Programmable Logic Controllers (PLCs). PLCs depend on firmware, though firmware security vulnerabilities and contents remain largely unexplored. Attackers are acquiring the knowledge required to construct and install malicious firmware on CI. To the defender, firmware reverse engineering is a critical, but tedious, process. This thesis applies machine learning algorithms, from the file carving and malware identification fields, to firmware reverse engineering, and it characterizes the algorithms' performance. This research describes and characterizes a process to speed and simplify PLC firmware analysis. The system partitions binary firmwares into segments, labels each segment with a file type, determines the target architecture of code segments, then disassembles and performs rudimentary analysis on the code segments. The research discusses the system's accuracy on a set of pseudo-firmwares. Of the algorithms this research considers, a combination of a byte-value frequency file carving algorithm and a support vector machine (SVM) algorithm using information gain (IG) for feature selection achieves the best performance. That combination correctly identifies the file types of 57.4% of non-code bytes, and the architectures of 85.3% of code bytes. This research applies the Firmware Disassembly System to a real-world firmware and discusses the contents.
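The two building blocks named in the abstract, byte-value frequency features and information gain for feature selection, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (`byte_frequency`, `entropy`, `information_gain`) and the choice to treat each feature as a boolean "present/absent" flag are assumptions made only for illustration, and the SVM training step is omitted.

```python
import math
from collections import Counter


def byte_frequency(segment: bytes) -> list[float]:
    """Return a normalized 256-bin histogram of byte values in a segment.

    Hypothetical helper: a histogram like this is the kind of feature
    vector a byte-value frequency classifier could compare across types.
    """
    if not segment:
        return [0.0] * 256
    counts = Counter(segment)  # iterating bytes yields ints in 0..255
    total = len(segment)
    return [counts.get(value, 0) / total for value in range(256)]


def entropy(labels: list[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_present: list[bool], labels: list[str]) -> float:
    """Information gain of a boolean feature with respect to class labels.

    IG = H(labels) - H(labels | feature). Features with higher IG separate
    the classes better and would be kept during feature selection.
    """
    total = len(labels)
    if total == 0:
        return 0.0
    with_feature = [l for f, l in zip(feature_present, labels) if f]
    without_feature = [l for f, l in zip(feature_present, labels) if not f]
    conditional = (
        (len(with_feature) / total) * entropy(with_feature)
        + (len(without_feature) / total) * entropy(without_feature)
    )
    return entropy(labels) - conditional
```

In a pipeline of the kind the abstract describes, one would presumably rank candidate features (for example, byte n-grams found in code segments) by `information_gain`, keep the highest-ranked ones, and pass the resulting vectors to an SVM; the labels used here, such as architecture names, are placeholders.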
- Operations Research
- Computer Programming and Software