
Transformer and Graph Neural Networks Based Vulnerability Detection in Binary Code

dc.contributor.author: Naderi, Amir
dc.contributor.supervisor: Shirani, Paria
dc.date.accessioned: 2025-12-02T21:49:21Z
dc.date.available: 2025-12-02T21:49:21Z
dc.date.issued: 2025-12-02
dc.description.abstract: The growth of the Internet of Things (IoT) has heightened the need for effective binary-level N-day vulnerability detection, particularly when source code is unavailable. The binaries deployed on IoT devices are often compiled under varying conditions, such as different compilers and optimization levels, and, in cross-platform settings, for multiple CPU architectures. Because of this variability, even functionally identical code can have vastly different binary representations, which poses significant challenges to conventional code analysis techniques (e.g., simple syntactic or signature-based methods). Existing ML-based methods typically rely either on structural graph representations of binary code or on its instruction sequences, each capturing only a partial view of a binary's semantics. This thesis introduces a hybrid analysis framework that integrates both structural and semantic perspectives using the Code Property Graph with Natural Code Sequence (CPG-NCS), with the goal of improving binary vulnerability detection across diverse compilation settings. This graph-based representation encodes syntax, control flow, and data dependencies while preserving the natural order of instructions. To enhance semantic understanding within individual nodes of the CPG-NCS graph, we incorporate CodeBERT, a transformer-based language model pretrained on code. These enriched node embeddings are processed by a Gated Graph Neural Network (GGNN) trained in a Siamese architecture to identify functional similarity between binary functions. Our method achieves significant improvements in vulnerability detection performance under diverse compilation settings. Notably, it yields an average increase of 5% in F1-score and up to 8% in AUC compared with state-of-the-art solutions. We also test our approach on a variety of obfuscated code built using different obfuscators.
Additionally, we introduce a formulation for estimating the number of GGNN layers based on graph feature metrics and validate its effectiveness through extensive experiments. The practical utility of our approach is further demonstrated by successfully identifying known vulnerabilities in real-world IoT firmware images (e.g., an IoT operating system).
dc.identifier.uri: http://hdl.handle.net/10393/51137
dc.identifier.uri: https://doi.org/10.20381/ruor-31586
dc.language.iso: en
dc.publisher: Université d'Ottawa / University of Ottawa
dc.subject: Vulnerability Detection
dc.subject: Internet of Things
dc.subject: Binary Code Similarity Detection
dc.title: Transformer and Graph Neural Networks Based Vulnerability Detection in Binary Code
dc.type: Thesis (en)
thesis.degree.discipline: Génie / Engineering
thesis.degree.level: Masters
thesis.degree.name: MCS
uottawa.department: Science informatique et génie électrique / Electrical Engineering and Computer Science
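
The abstract describes node embeddings that pass through a gated graph network and are then compared in a Siamese setup. The following is only a minimal sketch of that general idea, not the thesis implementation. It assumes toy 2-dimensional node vectors in place of CodeBERT embeddings, a single simplified update gate in place of a full GRU cell, mean pooling for the graph embedding, and cosine similarity as the comparison. All function names, weights, and graphs are hypothetical.

```python
import math


def mat_vec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_step(node_states, edges, w_msg, w_gate):
    """One simplified gated propagation step (assumption: not a full GRU).

    Each node sums transformed messages from its incoming edges, then
    blends its old state with tanh(message) using a per-dimension gate.
    """
    dim = len(node_states[0])
    incoming = [[0.0] * dim for _ in node_states]
    for src, dst in edges:
        msg = mat_vec(w_msg, node_states[src])
        incoming[dst] = [a + b for a, b in zip(incoming[dst], msg)]

    new_states = []
    for h, m in zip(node_states, incoming):
        gate = [sigmoid(g) for g in mat_vec(w_gate, m)]
        cand = [math.tanh(v) for v in m]
        new_states.append(
            [(1.0 - z) * hv + z * c for z, hv, c in zip(gate, h, cand)]
        )
    return new_states


def embed_graph(node_states, edges, w_msg, w_gate, num_layers):
    """Run num_layers propagation steps, then mean-pool the node states."""
    for _ in range(num_layers):
        node_states = gated_step(node_states, edges, w_msg, w_gate)
    count = len(node_states)
    return [sum(col) / count for col in zip(*node_states)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def siamese_similarity(graph_a, graph_b, w_msg, w_gate, num_layers=2):
    """Embed both graphs with the SAME weights and compare the embeddings."""
    emb_a = embed_graph(graph_a[0], graph_a[1], w_msg, w_gate, num_layers)
    emb_b = embed_graph(graph_b[0], graph_b[1], w_msg, w_gate, num_layers)
    return cosine_similarity(emb_a, emb_b)


# Hypothetical toy inputs: (node feature vectors, directed edges).
W_MSG = [[0.5, -0.2], [0.1, 0.3]]
W_GATE = [[0.4, 0.0], [0.0, 0.4]]
func_a = ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [(0, 1), (1, 2)])
func_b = ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [(0, 1), (1, 2), (0, 2)])

score = siamese_similarity(func_a, func_b, W_MSG, W_GATE)
print(f"similarity: {score:.4f}")
```

The defining property of the Siamese arrangement is that both inputs share one set of weights, so the score depends only on the two graphs. In a trained model those weights would be learned from pairs of similar and dissimilar functions; here they are fixed placeholders, and `num_layers` is the quantity the abstract says can be estimated from graph feature metrics.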

Files