The Re-identification Risk of Canadians from Longitudinal Demographics
Abstract
Background: The public is less willing to allow their personal health information to be disclosed for research
purposes if they do not trust researchers or the way researchers manage their data. However, the public is more
comfortable with their data being used for research if the risk of re-identification is low. There are few studies on
the risk of re-identification of Canadians from their basic demographics, and no studies on their risk from their
longitudinal data. Our objective was to estimate the risk of re-identification from the basic cross-sectional and
longitudinal demographics of Canadians.
Methods: Uniqueness is a common measure of re-identification risk. Demographic data on a 25% random sample
of the population of Montreal were analyzed to estimate population uniqueness on postal code, date of birth, and
gender as well as their generalizations, for periods ranging from 1 year to 11 years.
Results: Almost 98% of the population was unique on full postal code, date of birth and gender: these three
variables are effectively a unique identifier for Montrealers. Uniqueness increased for longitudinal data. Considerable
generalization was required to reach acceptably low uniqueness levels, especially for longitudinal data. Detailed
guidelines and disclosure policies on how to ensure that the re-identification risk is low are provided.
Conclusions: A large percentage of Montreal residents are unique on basic demographics. For non-longitudinal data
sets, the three-character postal code, gender, and month/year of birth represent sufficiently low re-identification risk.
Data custodians need to generalize their demographic information further for longitudinal data sets.
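As a rough illustration of the uniqueness measure described in the Methods, the sketch below computes the share of records whose combination of postal code, date of birth, and gender occurs exactly once, before and after generalization. It is not the authors' implementation. The records are made up, and the field names (`postal`, `birth`, `gender`) and the helpers `uniqueness` and `generalize` are assumptions introduced only for this example. The last part treats a longitudinal record as a tuple of per-year postal codes, which is one plausible reading of how longitudinal data could raise uniqueness.

```python
from collections import Counter

def uniqueness(records, keys):
    """Fraction of records whose combination of `keys` values occurs exactly once."""
    combos = [tuple(r[k] for k in keys) for r in records]
    if not combos:
        return 0.0
    counts = Counter(combos)
    return sum(1 for c in combos if counts[c] == 1) / len(combos)

def generalize(record):
    """Hypothetical generalization: 3-character postal code, month/year of birth."""
    return {
        "postal": record["postal"][:3],   # 'H2X 1Y4' -> 'H2X'
        "birth": record["birth"][:7],     # 'YYYY-MM-DD' -> 'YYYY-MM'
        "gender": record["gender"],
    }

# Made-up toy records, not data from the study.
people = [
    {"postal": "H2X 1Y4", "birth": "1980-03-14", "gender": "F"},
    {"postal": "H2X 3K2", "birth": "1980-03-02", "gender": "F"},
    {"postal": "H3A 0G4", "birth": "1975-11-30", "gender": "M"},
    {"postal": "H3A 2T5", "birth": "1975-11-09", "gender": "M"},
    {"postal": "H4B 1R6", "birth": "1990-07-21", "gender": "F"},
]
keys = ("postal", "birth", "gender")

full = uniqueness(people, keys)                             # full detail
coarse = uniqueness([generalize(p) for p in people], keys)  # generalized

# Longitudinal toy case: 'postal' holds one 3-character code per year.
movers = [
    {"postal": ("H2X", "H2X"), "birth": "1980-03", "gender": "F"},
    {"postal": ("H2X", "H3A"), "birth": "1980-03", "gender": "F"},
]
one_year = uniqueness([{**m, "postal": m["postal"][:1]} for m in movers], keys)
two_years = uniqueness(movers, keys)

print(full, coarse, one_year, two_years)
```

In this toy data, every record is unique at full detail, generalization merges most of them into shared groups, and adding a second year of postal codes separates two people who were indistinguishable in the first year.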
Keywords
longitudinal demographics, Re-identification of health data, demographic identifiers, Montreal unique demographic identifiers, Canadian health information
