Privacy preserving data publishing with multiple sensitive attributes
Abdalaal, Ahmed Fuad (2012) Privacy preserving data publishing with multiple sensitive attributes. [Thesis]
Official URL: http://192.168.1.20/record=b1479292 (Table of Contents)
Data mining is the process of extracting hidden predictive information from large databases, it has a great potential to help governments, researchers and companies focus on the most significant information in their data warehouses. High quality data and effective data publishing are needed to gain a high impact from data mining process. However there is a clear need to preserve individual privacy in the released data. Privacy-preserving data publishing is a research topic of eliminating privacy threats. At the same time it provides useful information in the released data. Normally datasets include many sensitive attributes; it may contain static data or dynamic data. Datasets may need to publish multiple updated releases with different time stamps. As a concrete example, public opinions include highly sensitive information about an individual and may reflect a person's perspective, understanding, particular feelings, way of life, and desires. On one hand, public opinion is often collected through a central server which keeps a user profile for each participant and needs to publish this data for researchers to deeply analyze. On the other hand, new privacy concerns arise and user's privacy can be at risk. The user's opinion is sensitive information and it must be protected before and after data publishing. Opinions are about a few issues, while the total number of issues is huge. In this case we will deal with multiple sensitive attributes in order to develop an efficient model. Furthermore, opinions are gathered and published periodically, correlations between sensitive attributes in different releases may occur. Thus the anonymization technique must care about previous releases as well as the dependencies between released issues. This dissertation identifies a new privacy problem of public opinions. In addition it presents two probabilistic anonymization algorithms based on the concepts of k-anonymity [1, 2] and l-diversity [3, 4] to solve the problem of both publishing datasets with multiple sensitive attributes and publishing dynamic datasets. Proposed algorithms provide a heuristic solution for multidimensional quasi-identifier and multidimensional sensitive attributes using probabilistic l-diverse definition. Experimental results show that these algorithms clearly outperform the existing algorithms in term of anonymization accuracy.
Repository Staff Only: item control page