
Adversarial Robustness of Deep Learning Models



Publisher

Université d'Ottawa | University of Ottawa

Abstract

Deep neural networks (DNNs) have demonstrated remarkable success across various machine learning tasks but remain highly vulnerable to adversarial perturbations. Adversarial training (AT) and its variants aim to enhance robustness by incorporating adversarial examples into training. However, AT often leads to both standard and robust generalization issues, the causes of which remain largely elusive due to the complex learning dynamics involved. This thesis investigates the learning behavior of AT by analyzing the evolution of perturbation-induced data distributions. Our findings reveal a surprising phenomenon: the distribution induced by adversarial perturbations during AT becomes progressively more difficult to learn. We establish a theoretical explanation for this behavior by deriving a generalization bound that attributes it to the increasing local dispersion of the perturbation operator. Experimental results validate this explanation and further link this deteriorating behavior of the induced distributions to robust overfitting in AT. To advance the understanding of generalization in adversarial settings, we propose a unified framework for analyzing perturbation-induced loss functions. Within this framework, we introduce a novel stability analysis of AT and derive generalization upper bounds based on the expansiveness properties of adversarial perturbations. These expansiveness parameters appear to govern not only the vanishing rate of the generalization error but also its scaling constant. Our analysis attributes robust overfitting in Projected Gradient Descent (PGD)-based AT to the sign function used in PGD attacks, which results in poor expansiveness properties. We further show that similar issues extend to a broader class of PGD-like iterative attack algorithms, highlighting an intrinsic challenge in adversarial training.
By providing theoretical insights and empirical validations, this thesis deepens our understanding of the learning behavior of AT and paves the way for more principled approaches to improving robust generalization.
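To make the role of the sign function concrete, the following is a minimal, self-contained sketch of an $\ell_\infty$ PGD attack step on a toy differentiable loss. It is illustrative only and not the thesis's implementation; the toy quadratic loss, the `grad_fn` interface, and all parameter values are assumptions chosen for clarity. The `np.sign` call is the component the abstract identifies as the source of poor expansiveness.

```python
import numpy as np

def pgd_attack(x, grad_fn, eps=0.3, alpha=0.1, steps=5):
    """Minimal L-infinity PGD sketch: repeatedly step in the direction of
    the sign of the loss gradient, then project back onto the eps-ball
    around the clean input x."""
    x_adv = x.copy()
    for _ in range(steps):
        g = grad_fn(x_adv)
        # The sign step highlighted in the abstract: it discards gradient
        # magnitude and keeps only the per-coordinate direction.
        x_adv = x_adv + alpha * np.sign(g)
        # Projection onto the L-infinity ball of radius eps around x.
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

# Toy example (hypothetical): ascend loss(x) = 0.5 * ||x - t||^2,
# whose gradient is x - t, so PGD pushes x away from the target t.
t = np.array([1.0, -1.0])
x0 = np.zeros(2)
x_adv = pgd_attack(x0, grad_fn=lambda x: x - t, eps=0.3, alpha=0.1, steps=5)
# The perturbation saturates at the corner of the eps-ball: [-0.3, 0.3].
```

Because every coordinate of the sign vector has unit magnitude, nearby inputs can be mapped to sharply different corners of the $\varepsilon$-ball, which is the kind of non-expansive behavior the thesis's stability analysis examines.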

Keywords

Adversarial Robustness, Deep Learning, Generalization Theory
