COBRA-GCN: Contrastive Learning to Optimize Binary Representation Analysis with Graph Convolutional Networks


The ability to quickly identify whether two binaries are similar is critical for many security applications, with use cases ranging from triaging millions of novel malware samples to identifying whether a binary contains a known exploitable bug. There have been many program analysis approaches to this problem; however, most machine learning approaches in the last 5 years have focused on function similarity, and no techniques have been released that can perform robust many-to-many comparisons of full programs. In this paper, we present the first machine learning approach capable of learning a robust representation of programs based on their similarity, using a combination of supervised natural language processing and graph learning. We name our prototype COBRA: Contrastive Learning to Optimize Binary Representation Analysis. We evaluate our model on several different measures of program similarity, such as similarity across compiler optimizations, code obfuscations, and different pieces of semantically similar source code. Our approach outperforms current techniques for full binary diffing, achieving an F1 score and AUC 0.6 and 0.12 higher, respectively, than BinDiff, while also being able to perform many-to-many comparisons.
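The many-to-many capability follows from learning a per-program representation. Pairwise diffing tools like BinDiff compare two binaries at a time. The sketch below is minimal and rests on several assumptions. Each program is assumed to have already been mapped to a fixed-length embedding vector. The vectors and program names are hypothetical placeholders, not COBRA output. Cosine similarity is an assumed choice of comparison, not necessarily the one the authors use. Under these assumptions, all-pairs comparison reduces to scoring every pair of vectors:

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(embeddings):
    """Many-to-many comparison: score every ordered pair of program embeddings."""
    names = list(embeddings)
    return {(a, b): cosine(embeddings[a], embeddings[b]) for a in names for b in names}


def similar_pairs(embeddings, threshold):
    """Unordered pairs of distinct programs whose similarity meets the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(embeddings), 2)
        if cosine(embeddings[a], embeddings[b]) >= threshold
    ]


# Hypothetical embeddings: the same program at two optimization levels,
# plus an unrelated program. Real vectors would come from a trained model.
embeddings = {
    "prog_O0": [1.0, 0.0, 0.2],
    "prog_O3": [0.9, 0.1, 0.25],
    "unrelated": [0.0, 1.0, 0.0],
}

print(similar_pairs(embeddings, threshold=0.9))  # → [('prog_O0', 'prog_O3')]
```

Each binary is embedded only once. Comparing a new sample against a corpus then costs one embedding plus cheap vector comparisons, instead of a full diff for every pair.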



Distribution Statement:

Approved For Public Release
