
Generalization in Machine Learning Through Information-Theoretic Lens


Publisher

Université d'Ottawa | University of Ottawa

Creative Commons

Attribution-NonCommercial-NoDerivatives 4.0 International

Abstract

In this thesis, we utilize an information-theoretic framework to investigate generalization theory in machine learning, a critical area of research today. Specifically, we develop novel information-theoretic generalization bounds for machine learning algorithms. First, we apply information-theoretic analysis to models trained with stochastic gradient descent (SGD), invoking an auxiliary weight process and approximating SGD by stochastic differential equations (SDEs). Our analysis reveals intriguing phenomena, such as epoch-wise double descent of gradient dispersion when models are trained with noisy labels. We also use our bounds to design new regularization techniques, including dynamic gradient clipping and Gaussian model perturbation, that improve generalization performance. Furthermore, our framework is not limited to SGD-based algorithms: we also derive new information-theoretic bounds for arbitrary black-box learning algorithms, which are tighter than previous results obtained in the same settings. In addition, we apply our analysis to unsupervised domain adaptation (UDA), obtaining generalization bounds for two notions of the generalization error. Our algorithm-dependent bounds enable us to design new regularization techniques that boost the performance of domain adaptation algorithms. Finally, we combine stability-based generalization analysis with our information-theoretic analysis to derive novel generalization bounds, which explain generalization in cases where previous information-theoretic bounds have fallen short.
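For background (this is the standard starting point for such analyses, not a result specific to this thesis): the canonical information-theoretic generalization bound of Xu and Raginsky (2017) relates the expected generalization error to the mutual information between the training sample S of n points and the learned weights W. Assuming the loss is sigma-sub-Gaussian under the data distribution, it reads

\[
\bigl|\mathbb{E}[\operatorname{gen}(W,S)]\bigr| \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W;S)}.
\]

The bounds developed in the thesis can be read as tighter, algorithm-dependent refinements of results of this form.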

Keywords

Machine Learning, Information Theory, Statistical Learning Theory, Generalization
