Accession Number:
AD1208823
Title:
COBRA-GCN: Contrastive Learning to Optimize Binary Representation Analysis with Graph Convolutional Networks
Report Date:
2022-06-24
Abstract:
The ability to quickly identify whether two binaries are similar is critical for many security applications, with use cases ranging from triaging millions of novel malware samples to identifying whether a binary contains a known exploitable bug. There have been many program analysis approaches to this problem; however, most machine learning approaches in the last 5 years have focused on function similarity, and no released technique can perform robust many-to-many comparisons of full programs. In this paper, we present the first machine learning approach capable of learning a robust representation of programs based on their similarity, using a combination of supervised natural language processing and graph learning. We name our prototype COBRA: Contrastive Learning to Optimize Binary Representation Analysis. We evaluate our model on several different metrics for program similarity, such as compiler optimizations, code obfuscations, and different pieces of semantically similar source code. Our approach outperforms current techniques for full binary diffing, achieving an F1 score 0.6 higher and an AUC 0.12 higher than BinDiff, while also being able to perform many-to-many comparisons.
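The abstract combines graph learning with a contrastive objective so that whole programs map to comparable vectors. The minimal sketch below illustrates that general idea under stated assumptions. It is not COBRA's architecture, because the record does not give the layer count, node features, readout, or loss. Here one graph-convolution layer with symmetric normalization runs over a toy program graph, node states are mean-pooled into a unit-length program embedding, and a cosine-similarity matrix provides the many-to-many comparison. InfoNCE is swapped in as a common contrastive loss. All function names, the toy graphs, and the weights are hypothetical.

```python
import math


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    in_dim, out_dim = len(weight), len(weight[0])
    out = []
    for i in range(n):
        # Symmetrically normalized sum over neighbours (including self).
        agg = [0.0] * in_dim
        for j in range(n):
            if a_hat[i][j]:
                norm = a_hat[i][j] / math.sqrt(deg[i] * deg[j])
                for k in range(in_dim):
                    agg[k] += norm * feats[j][k]
        # Linear projection followed by ReLU.
        out.append([max(0.0, sum(agg[k] * weight[k][m] for k in range(in_dim)))
                    for m in range(out_dim)])
    return out


def embed_program(adj, feats, weight):
    """Mean-pool node states into one unit-length program embedding."""
    h = gcn_layer(adj, feats, weight)
    dim = len(h[0])
    pooled = [sum(row[m] for row in h) / len(h) for m in range(dim)]
    length = math.sqrt(sum(v * v for v in pooled)) or 1.0
    return [v / length for v in pooled]


def similarity_matrix(a, b):
    """Many-to-many cosine similarity between two lists of unit vectors."""
    return [[sum(x * y for x, y in zip(u, v)) for v in b] for u in a]


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss (hypothetical choice): anchor i should match positive i."""
    sims = similarity_matrix(anchors, positives)
    total = 0.0
    for i, row in enumerate(sims):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(sims)


# Usage with made-up data: two 3-node "programs" with 2-d node features.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, for illustration only
prog_a = embed_program(adj, [[1.0, 0.0]] * 3, weight)
prog_b = embed_program(adj, [[0.0, 1.0]] * 3, weight)
matched = info_nce([prog_a, prog_b], [prog_a, prog_b])
swapped = info_nce([prog_a, prog_b], [prog_b, prog_a])
```

Because every program becomes a single vector, comparing N programs against M programs costs one N x M similarity matrix rather than N x M pairwise graph alignments. This illustrates why an embedding approach lends itself to many-to-many comparison. In the usage lines, the loss is small when each anchor is paired with its true counterpart (`matched`) and large when the pairs are swapped (`swapped`).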
Document Type:
Conference:
Journal:
Pages:
20
File Size:
1.32MB
Contracts:
FA8702-15-D-0001 (FA870215D0001)
Grants:
Distribution Statement:
Approved For Public Release