
Generalization in Machine Learning Through Information-Theoretic Lens


Publisher

Université d'Ottawa | University of Ottawa

Creative Commons

Attribution-NonCommercial-NoDerivatives 4.0 International

Abstract

In this thesis, we utilize an information-theoretic framework to investigate generalization theory in machine learning, a critical area of research today. Specifically, we develop novel information-theoretic generalization bounds for machine learning algorithms. First, we apply information-theoretic analysis to models trained with stochastic gradient descent (SGD), invoking an auxiliary weight process and approximating SGD by stochastic differential equations (SDEs). Our analysis reveals intriguing phenomena, such as epoch-wise double descent of gradient dispersion when models are trained with noisy labels. We also use our bounds to design new regularization techniques, including dynamic gradient clipping and Gaussian model perturbation, that improve generalization performance. Furthermore, our framework is not limited to SGD-based algorithms: we also derive new information-theoretic bounds for arbitrary black-box learning algorithms, which are tighter than previous results obtained in the same settings. In addition, we apply our analysis to unsupervised domain adaptation (UDA), obtaining generalization bounds for two notions of the generalization error. Our algorithm-dependent bounds enable us to design new regularization techniques that boost the performance of domain adaptation algorithms. Finally, we combine stability-based generalization analysis with our information-theoretic analysis to derive novel generalization bounds, which explain generalization in cases where previous information-theoretic bounds have fallen short.
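For background (this is the standard starting point for such analyses, not a result specific to this thesis): the canonical information-theoretic generalization bound of Xu and Raginsky (2017) relates the expected generalization error to the mutual information between the training sample S of n points and the learned weights W. Assuming the loss is sigma-sub-Gaussian under the data distribution, it reads

\[
\bigl|\mathbb{E}[\operatorname{gen}(W,S)]\bigr| \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W;S)}.
\]

The bounds developed in the thesis can be read as tighter, algorithm-dependent refinements of results of this form.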

Keywords

Machine Learning, Information Theory, Statistical Learning Theory, Generalization
