Privacy-Preserving Data Mining

Title: Privacy-Preserving Data Mining
Authors: Samet, Saeed
Date: 2010
Abstract: Although data is very valuable in every organization, it must be processed in order to be useful. Data mining is a collection of techniques which find patterns and associations in raw data, and classify or cluster the items according to their attributes. Related data is normally distributed among two or more parties in different configurations, and mining can be done in an accurate and useful way for all parties involved, on all data collections. However, security and privacy often represent crucial requirements in different scenarios as organizations and parties involved may not want to disclose their own private information to each other. Assuring adequate and verifiable security and privacy in these scenarios faces various challenges. One such challenge is whether proposed protocols can be used over public channels, such as the internet. There are many applications running on the world wide web among multiple parties in which different subsets of the parties, in the intermediate steps, need to exchange private information. However, over public channels, other parties will also receive that information, which can then compromise the privacy of the whole protocol. Another possible issue is whether the complete final result of a privacy-preserving protocol can be broadcasted to, or be received by all parties involved, which in return may cause leakage of private data belonging to each party. For instance, if the final decision tree is known by all parties, some private patterns owned by a party could be disclosed to the others. Collusion attacks can pose another security challenge in multi-party settings. For example, it may be possible that in a given protocol, two or more conspiring parties can learn about private data owned by one or more other parties, even if they correctly follow the protocol steps. Finally, it may be desirable to design incremental versions of privacy-preserving protocols to improve both security and efficiency. Such algorithms can decrease the amount of data which is exchanged through the intermediate steps of the protocols and can also efficiently use some previous results to update and prepare final aggregate knowledge using new incoming data. These will increase the levels of security and efficiency of the protocols, while decreasing the amount of memory needed to store previous data for future use. To address these and related problems, in this thesis we have designed new secure building blocks and privacy-preserving data mining protocols, while considering their performance in terms of security and efficiency. Our building blocks, and the resulting protocols which take advantage of these blocks such as decision trees, k-means clustering, and association rule mining, can be implemented over public channels, have a balanced distribution of the final results, and are resistant to collusion attacks. We have also used these blocks to design novel privacy-preserving protocols for some popular learning and data mining techniques, such as Bayesian networks and neural networks, in particular perceptron, back-propagation and K 2 algorithms. We have also proposed incremental learning versions of some of these protocols. Detailed complexity and security analyses for the proposed protocols are also provided along with different experimental results for the secure building blocks and the main protocols to show their performance and applicability to various real world applications.
CollectionTh├Ęses, 1910 - 2010 // Theses, 1910 - 2010
NR66263.PDF6.28 MBAdobe PDFOpen