Contributions to Statistical Theory of Data Privacy

Author: Qu, Chang
Institution: Université d'Ottawa | University of Ottawa
Date: 2025-01-14
Type: Thesis
Language: en
Subjects: Data privacy; Synthetic data generation
Handle: http://hdl.handle.net/10393/50095
DOI: https://doi.org/10.20381/ruor-30858
License: Attribution-NonCommercial-ShareAlike 4.0 International (http://creativecommons.org/licenses/by-nc-sa/4.0/)

Abstract: This thesis explores key challenges and methodologies in the statistical theory of data privacy, focusing on disclosure risk assessment and synthetic data generation. The research reviews established privacy frameworks, such as k-anonymity, ℓ-diversity, t-closeness, and differential privacy, and highlights their practical limitations. To address these gaps, a new approach to Correct Attribution Probability (CAP) is proposed, utilizing equivalence classes to enhance applicability and interpretability. The thesis also provides a detailed analysis of synthetic data generation methods, assessing their utility and privacy implications, and thoroughly examines the Synthpop package. Several improvements to Synthpop are proposed, including better handling of data dependencies, the incorporation of privacy metrics like differential privacy, and more robust utility evaluation methods. These contributions aim to improve the balance between data privacy and utility.