Transformer and Graph Neural Networks Based Vulnerability Detection in Binary Code

Naderi, Amir

dc.contributor.author	Naderi, Amir
dc.contributor.supervisor	Shirani, Paria
dc.date.accessioned	2025-12-02T21:49:21Z
dc.date.available	2025-12-02T21:49:21Z
dc.date.issued	2025-12-02
dc.description.abstract	The growth of the Internet of Things (IoT) has heightened the need for effective binary-level N-day vulnerability detection, particularly when source code is unavailable. The binaries deployed on IoT devices are often compiled under varying conditions, such as different compilers and optimization levels, and, in cross-platform settings, for multiple CPU architectures. Because of this variability, even functionally identical code can have vastly different binary representations, which poses significant challenges to conventional code analysis techniques (e.g., simple syntactic or signature-based methods). Existing ML-based methods typically rely either on structural graph representations of binary code or on its instruction sequences, each of which captures only a partial view of a binary's semantics. This thesis introduces a hybrid analysis framework that integrates both structural and semantic perspectives using the Code Property Graph with Natural Code Sequence (CPG-NCS), with the goal of improving binary vulnerability detection across diverse compilation settings. This graph-based representation encodes syntax, control flow, and data dependencies while preserving the natural order of instructions. To enhance semantic understanding within individual nodes of the CPG-NCS graph, we incorporate CodeBERT, a transformer-based language model pretrained on code. These enriched node embeddings are processed by a Gated Graph Neural Network (GGNN) trained in a Siamese architecture to identify functional similarity across binary functions. Our method achieves significant improvements in vulnerability detection performance under diverse compilation settings. Notably, it yields an average increase of 5% in F1-score and up to 8% in AUC compared with state-of-the-art solutions. We also evaluate our approach on obfuscated code produced by several different obfuscators.
Additionally, we introduce a formulation for estimating the number of GGNN layers based on graph feature metrics and validate its effectiveness through extensive experiments. The practical utility of our approach is further demonstrated by successfully identifying known vulnerabilities in real-world IoT firmware images (e.g., an IoT operating system).
dc.identifier.uri	http://hdl.handle.net/10393/51137
dc.identifier.uri	https://doi.org/10.20381/ruor-31586
dc.language.iso	en
dc.publisher	Université d'Ottawa / University of Ottawa
dc.subject	Vulnerability Detection
dc.subject	Internet of Things
dc.subject	Binary Code Similarity Detection
dc.title	Transformer and Graph Neural Networks Based Vulnerability Detection in Binary Code
dc.type	Thesis	en
thesis.degree.discipline	Génie / Engineering
thesis.degree.level	Masters
thesis.degree.name	MCS
uottawa.department	Science informatique et génie électrique / Electrical Engineering and Computer Science

Collections

Thèses - Embargo // Theses - Embargo
