A linear programming model for preserving privacy when disclosing patient spatial information for secondary purposes

Description
Title: A linear programming model for preserving privacy when disclosing patient spatial information for secondary purposes
Authors: Jung, Ho-Won
El Emam, Khaled
Date: 2014-05-29
Abstract:
Background: A linear programming (LP) model was proposed to create de-identified data sets that maximally include spatial detail (e.g., geocodes such as ZIP or postal codes, census blocks, and locations on maps) while complying with the HIPAA Privacy Rule’s Expert Determination method, i.e., ensuring that the risk of re-identification is very small. The LP model determines the transition probability from a patient's original location to a new randomized location. However, it is limited in areas with small populations (e.g., a median of 10 people in a ZIP code).
Methods: We extend the previous LP model to accommodate locations with smaller populations, while creating de-identified patient spatial data sets that ensure the risk of re-identification is very small.
Results: Our LP model was applied to a data set of 11,740 postal codes in the City of Ottawa, Canada. On this data set we demonstrated the limitations of the previous LP model, in that it produces improbable results, and showed how our extensions to deal with small areas allow the de-identification of the whole data set.
Conclusions: The LP model described in this study can be used to de-identify geospatial information for areas with small populations with minimal distortion to postal codes. Our LP model can be extended to include other information, such as age and gender.
URL: http://dx.doi.org/10.1186/1476-072X-13-16
http://hdl.handle.net/10393/33714
Collection: Libre accès - Publications // Open Access - Publications