Author: Manjiyani, Alim
Date: 2024-04-12
Handle: http://hdl.handle.net/10393/46095
DOI: https://doi.org/10.20381/ruor-30259

Abstract:
Deep ensembles, composed of multiple independently trained deep neural networks (DNNs), have demonstrated strong predictive power and reliability in various machine learning applications such as classification [6], uncertainty estimation [21], anomaly detection [11], and medical diagnostics [1]. This thesis delves into the intricate interplay between confidence, agreement, and calibration within deep ensembles, shedding light on their relationships and combined properties. We begin by providing a brief yet insightful background on calibration, its various notions, its evaluation methods, and the existing techniques for achieving it. We extend the Generalization Disagreement Equality (GDE) [16], a natural phenomenon of deep ensembles which states that the agreement and the accuracy of a deep ensemble are equal in expectation over the population. We introduce a more general version of this phenomenon, which we call Generalized GDE (GGDE). Similar to [16], we provide empirical evidence for GGDE and give sufficient calibration conditions for it to hold. We further study the interaction between agreement and the true confidence (accuracy) and establish bounds on their absolute difference. As a result, we shed light on the possibility of instance-wise GGDE and provide a suitable calibration condition for it. In the second half of the thesis, we formulate the notion of "Confidence Calibration" and emphasize its theoretical importance. We study it and unveil several of its interesting properties. Remarkably, our work suggests that Confidence Calibration can serve as a theoretical foundation for understanding the effectiveness of Label Smoothing [34], thus establishing a connection between calibration and regularization.
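To make the GDE statement above concrete, here is a minimal illustrative sketch (not taken from the thesis): for two independently trained ensemble members, one compares the fraction of test points on which they agree with their average accuracy. The predictions and labels below are hypothetical placeholders for a toy test set.

```python
# Illustrative sketch of the quantities behind GDE [16]: the pairwise
# agreement rate of two ensemble members versus their average accuracy.
# GDE states these match in expectation over the population; on a tiny
# toy set they will generally only be close, not equal.

def agreement(preds_a, preds_b):
    """Fraction of test points on which two ensemble members agree."""
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)

def accuracy(preds, labels):
    """Fraction of test points a single member classifies correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(preds)

# Hypothetical predicted labels from two independently trained networks,
# and ground-truth labels for a toy 8-point test set.
labels  = [0, 1, 1, 0, 2, 2, 1, 0]
preds_a = [0, 1, 1, 0, 2, 1, 1, 0]
preds_b = [0, 1, 0, 0, 2, 2, 1, 0]

print(agreement(preds_a, preds_b))                       # 0.75
print((accuracy(preds_a, labels)
       + accuracy(preds_b, labels)) / 2)                 # 0.875
```

In this toy example the two quantities differ (0.75 vs. 0.875); GDE is a population-level statement, and the thesis's GGDE analysis concerns the calibration conditions under which such equalities hold.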
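For reference, standard Label Smoothing [34], which the thesis connects to Confidence Calibration, replaces a one-hot training target with a mixture of the one-hot vector and the uniform distribution. A minimal sketch, with illustrative values for the smoothing coefficient `alpha` and the class count:

```python
# Standard label smoothing [34] (not the thesis's AGLS algorithm):
# the target for the true class becomes (1 - alpha) + alpha/K, and
# every other class receives alpha/K, where K is the number of classes.

def smooth_label(true_class, num_classes, alpha=0.1):
    """Return the label-smoothed target distribution for one example."""
    uniform = alpha / num_classes
    return [uniform + (1.0 - alpha) * (k == true_class)
            for k in range(num_classes)]

# Illustrative example: 4 classes, true class 2, alpha = 0.1,
# giving mass 0.925 on the true class and 0.025 on each other class.
target = smooth_label(true_class=2, num_classes=4, alpha=0.1)
print(target)
```

The thesis's AGLS algorithm builds on this mechanism by guiding the smoothing with ensemble agreement; its exact form is given in the thesis itself.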
We end the thesis by combining the properties of GGDE, Label Smoothing, and Confidence Calibration into a novel training algorithm termed Agreement Guided Label Smoothing (AGLS). We provide a theoretical justification for its functioning and demonstrate its effectiveness empirically. By advancing our understanding of Confidence Calibration, this work contributes to the broader goal of enhancing the trustworthiness and applicability of deep learning models across domains.

Language: en
Keywords: Calibration; Confidence; Agreement; Label Smoothing; Deep Learning; Generalization Disagreement Equality (GDE)
Title: On Confidence, Agreement and Calibration of Deep Ensembles
Type: Thesis