A Toolbox for privacy preserving distributed data mining
Kaya, Selim Volkan (2007) A Toolbox for privacy preserving distributed data mining. [Thesis]
Because individual data is distributed across multiple holders, these holders must analyze the collective database collaboratively to obtain better data mining results. However, each site has to ensure the privacy of its individual data, meaning that no information about individual values is revealed. Privacy preserving distributed data mining serves that purpose. In this study, we try to draw more attention to privacy preserving data mining by presenting a model that is realistic for data mining and allows for very efficient protocols. We give two protocols that are useful tools in data mining: a protocol for Yao's millionaires' problem and a protocol for numerical distance. Our solution to Yao's millionaires' problem is of independent interest, since it improves on known protocols with respect to both computation complexity and communication overhead. This protocol can be used for different purposes in privacy preserving data mining algorithms, such as comparison and equality testing of data records. Our numerical distance protocol is also applicable to a variety of algorithms. In this study, we apply it in a privacy preserving distributed clustering protocol for horizontally partitioned data. We show how the protocol applies to different attribute types: interval-scaled, binary, nominal, ordinal, ratio-scaled, and alphanumeric. We present a proof of security for our protocol and analyze its communication and computation complexity in detail.
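For orientation, the per-attribute dissimilarities that a distance protocol of this kind would need to evaluate can be sketched in plaintext. The sketch below is not the thesis's protocol and provides no privacy: it only illustrates the kind of values two sites might want to compute jointly. All function names are hypothetical, and the specific measures (absolute difference, 0/1 mismatch, normalized rank difference, edit distance) are common textbook choices assumed here, not definitions taken from the thesis.

```python
def interval_distance(x, y):
    # Interval-scaled attributes: plain absolute difference (assumed measure).
    return abs(x - y)


def nominal_distance(x, y):
    # Binary and nominal attributes: 0 if the values match, 1 otherwise.
    return 0 if x == y else 1


def ordinal_distance(x, y, levels):
    # Ordinal attributes: map each value to its rank in `levels`,
    # scale ranks to [0, 1], then take the absolute difference.
    # Requires at least two levels.
    scale = len(levels) - 1
    return abs(levels.index(x) - levels.index(y)) / scale


def edit_distance(s, t):
    # Alphanumeric attributes: Levenshtein distance via dynamic programming,
    # keeping only the previous row of the table.
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (a != b)))  # substitution
        prev = cur
    return prev[-1]
```

In a privacy preserving setting, each input pair would be split across two sites, and the goal would be to obtain these outputs without either site learning the other's value.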