Transformer and Graph Neural Networks Based Vulnerability Detection in Binary Code
| dc.contributor.author | Naderi, Amir | |
| dc.contributor.supervisor | Shirani, Paria | |
| dc.date.accessioned | 2025-12-02T21:49:21Z | |
| dc.date.available | 2025-12-02T21:49:21Z | |
| dc.date.issued | 2025-12-02 | |
| dc.description.abstract | The growth of the Internet of Things (IoT) has heightened the need for effective binary-level N-day vulnerability detection, particularly when source code is unavailable. The binaries deployed on IoT devices are often compiled under varying conditions, such as different compilers and optimization levels, and, in cross-platform settings, across multiple CPU architectures. This high degree of variability means that even functionally identical code can have vastly different binary representations, posing significant challenges to conventional code analysis techniques (e.g., simple syntactic or signature-based analysis methods). Existing ML-based methods typically rely either on structural graph representations of binary code or on its instruction sequences, each capturing only a partial view of a binary's semantics. This thesis introduces a hybrid analysis framework that integrates both structural and semantic perspectives using the Code Property Graph with Natural Code Sequence (CPG-NCS), with the goal of improving binary vulnerability detection across diverse compilation settings. This graph-based representation encodes syntax, control flow, and data dependencies, while preserving the natural order of instructions. To enhance semantic understanding within individual nodes of the CPG-NCS graph, we incorporate CodeBERT, a transformer-based language model pretrained on code. These enriched node embeddings are processed by a Gated Graph Neural Network (GGNN) trained in a Siamese architecture to identify functional similarity across binary functions. Our method achieves significant improvements in vulnerability detection performance under diverse compilation settings. Notably, it yields an average increase of 5% in F1-score and up to 8% in AUC compared with state-of-the-art solutions. We also test our approach on obfuscated code produced by different obfuscators. Additionally, we introduce a formulation for estimating the number of GGNN layers based on graph feature metrics and validate its effectiveness through extensive experiments. The practical utility of our approach is further demonstrated by successfully identifying known vulnerabilities in real-world IoT firmware images (e.g., an IoT operating system). | |
| dc.identifier.uri | http://hdl.handle.net/10393/51137 | |
| dc.identifier.uri | https://doi.org/10.20381/ruor-31586 | |
| dc.language.iso | en | |
| dc.publisher | Université d'Ottawa / University of Ottawa | |
| dc.subject | Vulnerability Detection | |
| dc.subject | Internet of Things | |
| dc.subject | Binary Code Similarity Detection | |
| dc.title | Transformer and Graph Neural Networks Based Vulnerability Detection in Binary Code | |
| dc.type | Thesis | en |
| thesis.degree.discipline | Génie / Engineering | |
| thesis.degree.level | Masters | |
| thesis.degree.name | MCS | |
| uottawa.department | Science informatique et génie électrique / Electrical Engineering and Computer Science | |
