
Generalization in Machine Learning Through Information-Theoretic Lens

dc.contributor.author: Wang, Ziqiao
dc.contributor.supervisor: Mao, Yongyi
dc.date.accessioned: 2024-03-22T19:24:38Z
dc.date.available: 2024-03-22T19:24:38Z
dc.date.issued: 2024-03-22
dc.description.abstract: In this thesis, we use an information-theoretic framework to investigate generalization theory in machine learning, a critical area of research today. Specifically, we develop novel information-theoretic generalization bounds for machine learning algorithms. First, we apply information-theoretic analysis to models trained using stochastic gradient descent (SGD). We do so by invoking an auxiliary weight process and by approximating SGD using stochastic differential equations (SDE). Our analysis reveals intriguing phenomena, such as epoch-wise double descent of gradient dispersion when models are trained with noisy labels. We also use our bounds to design new regularization techniques, including dynamic gradient clipping and Gaussian model perturbation, that can improve generalization performance. Furthermore, our framework is not limited to SGD-based algorithms; we also derive new information-theoretic bounds for any black-box learning algorithm, which are tighter than previous results under the same settings. In addition, we apply our analysis to unsupervised domain adaptation (UDA), obtaining generalization bounds for two notions of the generalization error. Our algorithm-dependent bounds enable us to design new regularization techniques that can boost the performance of domain adaptation algorithms. Finally, we combine stability-based generalization analysis with our information-theoretic analysis to derive novel generalization bounds, which explain generalization in cases where previous information-theoretic bounds have fallen short.
dc.identifier.uri: http://hdl.handle.net/10393/46050
dc.identifier.uri: https://doi.org/10.20381/ruor-30225
dc.language.iso: en
dc.publisher: Université d'Ottawa | University of Ottawa
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Machine Learning
dc.subject: Information Theory
dc.subject: Statistical Learning Theory
dc.subject: Generalization
dc.title: Generalization in Machine Learning Through Information-Theoretic Lens
dc.type: Thesis
thesis.degree.discipline: Génie / Engineering
thesis.degree.level: Doctoral
thesis.degree.name: PhD
uottawa.department: Science informatique et génie électrique / Electrical Engineering and Computer Science

Files

Original bundle

Name: Wang_Ziqiao_2024_thesis.pdf
Size: 43.95 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.65 KB
Format: Item-specific license agreed upon to submission