LAVA: Large-scale Automated Vulnerability Addition
MIT Lincoln Laboratory, Lexington, United States
Work on automating vulnerability discovery has long been hampered by a shortage of ground-truth corpora with which to evaluate tools and techniques. To begin to address this, we present LAVA, a system for automatically and quickly injecting large numbers of realistic bugs into program source code. LAVA employs a pair of taint-based measures to identify program quantities that depend on specific input bytes in a simple way yet do not overly influence control flow. These DUAs (Dead, Uncomplicated and Available data) are employed, via source-to-source transformation, to perturb program quantities at later program points in ways that are likely to cause vulnerabilities. Every LAVA vulnerability is accompanied by an input that triggers it, whereas normal inputs are extremely unlikely to do so. Further, every injected bug is validated, so every working bug comes with both a proof-of-concept input and a known manifestation point. These vulnerabilities are synthetic but, we argue, still realistic, in the sense that they are embedded deep within programs and are triggered by real inputs. For an automated tool to discover them, it would have to reason correctly and precisely about all the code executed up to the DUA. Using LAVA, we have injected thousands of bugs into popular programs such as file, readelf, bash, and tshark. We believe LAVA can form the basis of an approach for generating extremely high-quality ground-truth vulnerability corpora on demand.
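The injection idea described in the abstract can be illustrated with a toy model. The sketch below is not LAVA's implementation (LAVA transforms real program source); it is a minimal Python analogue written for illustration only. The parser, the table, the input layout, and the trigger value MAGIC are all hypothetical. It shows a value copied directly from input bytes and otherwise unused (a DUA candidate) being used to perturb an index at a later point, so that only a specially crafted input causes an out-of-bounds access.

```python
import struct

# Hypothetical trigger value (ASCII "lava"); an assumption of this sketch.
MAGIC = 0x6C617661

# Stand-in for a fixed-size buffer in the target program.
TABLE = [10, 20, 30, 40]


def parse_original(data: bytes) -> int:
    """Toy parser before injection: byte 0 selects a table entry."""
    kind = data[0] % len(TABLE)
    # Bytes 4..7 are read but never influence behaviour afterwards:
    # dead, a direct copy of input bytes, and available here (a DUA candidate).
    _reserved = struct.unpack_from("<I", data, 4)[0]
    return TABLE[kind]


def parse_injected(data: bytes) -> int:
    """The same toy parser after a LAVA-style transformation (illustrative)."""
    kind = data[0] % len(TABLE)
    # Capture the DUA at the point where it is available.
    dua = struct.unpack_from("<I", data, 4)[0]
    # Attack point: the index is perturbed only when the DUA equals MAGIC.
    # For any other input the added term is zero and behaviour is unchanged.
    return TABLE[kind + (dua == MAGIC) * dua]


# A normal input and a crafted triggering input (both hypothetical).
benign = bytes([2, 0, 0, 0]) + struct.pack("<I", 7)
trigger = bytes([2, 0, 0, 0]) + struct.pack("<I", MAGIC)
```

In this model, parse_injected behaves exactly like parse_original on the benign input, while the triggering input pushes the index far outside TABLE and raises IndexError, which plays the role of the known manifestation point. In a C program the analogous access would be an out-of-bounds memory read or write rather than an exception.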
- Computer Programming and Software
- Computer Systems Management and Standards