Privacy preserving publishing of hierarchical data
Özalp, İsmet (2017) Privacy preserving publishing of hierarchical data. [Thesis]
Many applications today rely on the storage and management of semi-structured information, e.g., XML databases and document-oriented databases. This data often has to be shared with untrusted third parties, which makes individuals' privacy a fundamental problem. In this thesis, we propose anonymization techniques for privacy-preserving publishing of hierarchical data. We show that the problem of anonymizing hierarchical data poses unique challenges that cannot be readily solved by existing mechanisms. We address these challenges using two major privacy techniques: generalization and anatomization.

Data generalization encapsulates data by mapping low-level values (e.g., influenza) to higher-level concepts (e.g., respiratory system diseases). Using generalization and suppression of data values, we revise two standards for privacy protection: k-anonymity, which hides individuals within groups of k members, and ℓ-diversity, which bounds the probability of linking sensitive values with individuals. We then apply these standards to hierarchical data and present utility-aware algorithms that enforce them. To evaluate our algorithms and their heuristics, we experiment on synthetic and real datasets obtained from two universities. Our experiments show that we significantly outperform related methods that provide comparable privacy guarantees.

Data anatomization masks the link between identifying attributes and sensitive attributes. This mechanism removes the need for generalization and opens up the possibility of higher utility. However, anatomization has not previously been proposed for hierarchical data, where utility is a serious concern due to high dimensionality. In this thesis, we show how to perform the non-trivial task of defining anatomization in the context of hierarchical data.
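As a rough illustration of the generalization idea described above (a minimal sketch on a flat table, not the thesis's hierarchical algorithm; the hierarchy and records are hypothetical):

```python
from collections import Counter

# Hypothetical generalization hierarchy: low-level value -> broader concept.
HIERARCHY = {
    "influenza": "respiratory system disease",
    "asthma": "respiratory system disease",
    "gastritis": "digestive system disease",
}

def generalize(record):
    """Replace the diagnosis with its higher-level concept, if one exists."""
    return {**record, "diagnosis": HIERARCHY.get(record["diagnosis"], record["diagnosis"])}

def is_k_anonymous(records, quasi_ids, k):
    """k-anonymity: every combination of quasi-identifier values occurs >= k times."""
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(c >= k for c in counts.values())

records = [
    {"age": "30-40", "diagnosis": "influenza"},
    {"age": "30-40", "diagnosis": "asthma"},
]
# Raw records are unique on (age, diagnosis); after generalization they fall
# into one group of size 2, so the table becomes 2-anonymous.
generalized = [generalize(r) for r in records]
print(is_k_anonymous(records, ["age", "diagnosis"], 2))      # False
print(is_k_anonymous(generalized, ["age", "diagnosis"], 2))  # True
```

The trade-off the thesis's utility-aware algorithms navigate is visible even here: generalizing to coarser concepts makes groups larger (more privacy) but discards detail (less utility).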
Moreover, we extend the definition of classical ℓ-diversity and introduce (p,m)-privacy, which bounds by p the probability of an individual being linked to more than m occurrences of any sensitive value. Again, our experiments show that our method performs well even under these stricter privacy conditions.
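One plausible reading of the (p,m)-privacy condition can be sketched for a single anatomized group (this is our interpretation of the abstract, not the thesis's definition; the group data is hypothetical). If each candidate record in a group is equally likely to belong to the target individual, the linking probability for a sensitive value s is the fraction of records containing s more than m times:

```python
from collections import Counter

def satisfies_pm_privacy(group, p, m):
    """Check (p, m)-privacy for one group.

    group: list of records, each record a list of its sensitive values.
    Returns True if, for every sensitive value s, the fraction of records
    containing s more than m times is at most p.
    """
    n = len(group)
    sensitive_values = {v for record in group for v in record}
    for s in sensitive_values:
        exceeding = sum(1 for record in group if Counter(record)[s] > m)
        if exceeding / n > p:
            return False
    return True

group = [
    ["influenza", "influenza", "gastritis"],
    ["asthma"],
    ["gastritis"],
    ["influenza"],
]
# Only 1 of 4 records contains "influenza" more than once, so the linking
# probability is 0.25: fine for p = 0.5, a violation for p = 0.2.
print(satisfies_pm_privacy(group, p=0.5, m=1))  # True
print(satisfies_pm_privacy(group, p=0.2, m=1))  # False
```

Setting m = 0 recovers an ℓ-diversity-style bound on being linked to any occurrence of a sensitive value, which matches the abstract's framing of (p,m)-privacy as an extension of classical ℓ-diversity.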