Tian, Runzhi
2025-05-22
http://hdl.handle.net/10393/50508
https://doi.org/10.20381/ruor-31141

Deep neural networks (DNNs) have demonstrated remarkable success across various machine learning tasks but remain highly vulnerable to adversarial perturbations. Adversarial training (AT) and its variants aim to enhance robustness by incorporating adversarial examples into training. However, AT often suffers from both standard and robust generalization issues, whose causes remain largely elusive due to the complex learning dynamics involved. This thesis investigates the learning behavior of AT by analyzing the evolution of perturbation-induced data distributions. Our findings reveal a surprising phenomenon: the distribution induced by adversarial perturbations during AT becomes progressively more difficult to learn. We establish a theoretical explanation for this behavior by deriving a generalization bound that attributes it to the increasing local dispersion of the perturbation operator. Experimental results validate this explanation and further link this deteriorating behavior of the induced distributions to robust overfitting in AT. To advance the understanding of generalization in adversarial settings, we propose a unified framework for analyzing perturbation-induced loss functions. Within this framework, we introduce a novel stability analysis of AT and derive generalization upper bounds based on the expansiveness properties of adversarial perturbations. These expansiveness parameters appear to govern not only the vanishing rate of the generalization error but also its scaling constant. Our analysis attributes robust overfitting in Projected Gradient Descent (PGD)-based AT to the sign function used in PGD attacks, which results in poor expansiveness properties. We further show that similar issues extend to a broader class of PGD-like iterative attack algorithms, highlighting an intrinsic challenge in adversarial training.
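As a minimal illustration (not code from the thesis), the sketch below implements a standard l-infinity PGD step on a toy loss in NumPy and shows the behavior the abstract points to: because the update uses only the sign of the gradient, two nearby inputs whose gradients straddle a sign flip are driven to far-apart perturbations. The toy loss and all parameter values are hypothetical choices for demonstration.

```python
import numpy as np

def pgd_attack(x, grad_fn, eps=0.1, alpha=0.03, steps=5):
    """l_inf PGD sketch: ascend the loss along the SIGN of the gradient,
    then project the iterate back onto the eps-ball around the clean x."""
    x_adv = x.copy()
    for _ in range(steps):
        g = grad_fn(x_adv)
        x_adv = x_adv + alpha * np.sign(g)        # sign step used by PGD
        x_adv = np.clip(x_adv, x - eps, x + eps)  # projection onto the eps-ball
    return x_adv

# Toy loss L(x) = ||x - c||^2 / 2, so grad = x - c; PGD pushes x away from c
# until it hits the boundary of the eps-ball.
c = np.array([1.0, -1.0])
x0 = np.array([0.0, 0.0])
x_adv = pgd_attack(x0, lambda x: x - c)  # converges to the corner [-0.1, 0.1]

# Poor expansiveness of the sign map: two inputs 0.002 apart, straddling a
# gradient sign flip (grad = x for L(x) = x^2/2), end up about 0.2 apart.
g = lambda x: x
a = pgd_attack(np.array([0.001]), g, eps=0.1, alpha=0.05, steps=2)
b = pgd_attack(np.array([-0.001]), g, eps=0.1, alpha=0.05, steps=2)
```

Here `a` and `b` differ by roughly the full diameter of the perturbation ball even though the clean inputs are nearly identical, which is the kind of discontinuity the thesis links to poor expansiveness of PGD-like attacks.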
By providing theoretical insights and empirical validation, this thesis deepens our understanding of the learning behavior of AT and paves the way for more principled approaches to improving robust generalization.

Language: en
Keywords: Adversarial Robustness; Deep Learning; Generalization Theory
Title: Adversarial Robustness of Deep Learning Models
Type: Thesis