Privacy preserving publishing of hierarchical data

Özalp, İsmet (2017) Privacy preserving publishing of hierarchical data. [Thesis]

Abstract

Many applications today rely on the storage and management of semi-structured information, e.g., XML databases and document-oriented databases. This data often has to be shared with untrusted third parties, which makes individuals' privacy a fundamental problem. In this thesis, we propose anonymization techniques for privacy-preserving publishing of hierarchical data. We show that the problem of anonymizing hierarchical data poses unique challenges that cannot be readily solved by existing mechanisms. We address these challenges by utilizing two major privacy techniques: generalization and anatomization. Data generalization encapsulates data by mapping low-level values (e.g., influenza) to higher-level concepts (e.g., respiratory system diseases). Using generalization and suppression of data values, we revise two standards for privacy protection: k-anonymity, which hides individuals within groups of k members, and ℓ-diversity, which bounds the probability of linking sensitive values with individuals. We then apply these standards to hierarchical data and present utility-aware algorithms that enforce them. To evaluate our algorithms and their heuristics, we experiment on synthetic and real datasets obtained from two universities. Our experiments show that we significantly outperform related methods that provide comparable privacy guarantees.
Data anatomization masks the link between identifying attributes and sensitive attributes. This mechanism removes the need for generalization and opens up the possibility of higher utility. Nevertheless, anatomization has not previously been proposed for hierarchical data, where utility is a serious concern due to high dimensionality. In this thesis, we show how one can perform the non-trivial task of defining anatomization in the context of hierarchical data. Moreover, we extend the definition of classical ℓ-diversity and introduce (p,m)-privacy, which bounds by p the probability of being linked to more than m occurrences of any sensitive value. Again, our experiments show that our method performs well even under stricter privacy conditions.
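The concepts named in the abstract can be illustrated on a small flat table. The following is a minimal sketch only, not the thesis's algorithms, which operate on hierarchical data: the records, the attribute names (`age`, `zip`, `diagnosis`), the helper functions, the distinct-values variant of the diversity check, and the naive consecutive grouping in `anatomize` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

QI = ("age", "zip")        # quasi-identifiers (assumed for this example)
SENSITIVE = "diagnosis"    # sensitive attribute (assumed)

# Toy records, invented for illustration.
RAW = [
    {"age": 23, "zip": "34956", "diagnosis": "influenza"},
    {"age": 27, "zip": "34912", "diagnosis": "bronchitis"},
    {"age": 31, "zip": "34710", "diagnosis": "gastritis"},
    {"age": 38, "zip": "34722", "diagnosis": "gastritis"},
]

def generalize(record, age_width=10, zip_prefix=3):
    """Map low-level quasi-identifier values to coarser ones."""
    low = (record["age"] // age_width) * age_width
    masked = "*" * (len(record["zip"]) - zip_prefix)
    return {
        "age": f"{low}-{low + age_width - 1}",
        "zip": record["zip"][:zip_prefix] + masked,
        SENSITIVE: record[SENSITIVE],
    }

def equivalence_classes(records):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[k] for k in QI)].append(r)
    return groups

def is_k_anonymous(records, k):
    """Every group must contain at least k records."""
    return all(len(g) >= k for g in equivalence_classes(records).values())

def is_distinct_l_diverse(records, l):
    """Every group must contain at least l distinct sensitive values."""
    return all(len({r[SENSITIVE] for r in g}) >= l
               for g in equivalence_classes(records).values())

def anatomize(records, group_size):
    """Release exact quasi-identifiers and sensitive values in two tables
    linked only by a group id. Groups are naive consecutive chunks here;
    a real method would choose groups to satisfy a privacy standard."""
    qi_table, counts = [], Counter()
    for i, r in enumerate(records):
        gid = i // group_size
        qi_table.append({**{k: r[k] for k in QI}, "group": gid})
        counts[(gid, r[SENSITIVE])] += 1
    sensitive_table = [{"group": g, SENSITIVE: s, "count": c}
                       for (g, s), c in sorted(counts.items())]
    return qi_table, sensitive_table

PUBLISHED = [generalize(r) for r in RAW]
QI_TABLE, SENSITIVE_TABLE = anatomize(RAW, group_size=2)
```

On this toy data, `is_k_anonymous(PUBLISHED, 2)` holds while `is_distinct_l_diverse(PUBLISHED, 2)` does not, because the second group contains a single diagnosis. The anatomized release keeps the exact ages and zip codes in `QI_TABLE` and exposes diagnoses only as per-group counts in `SENSITIVE_TABLE`.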
Item Type: Thesis
Additional Information: Yükseköğretim Kurulu Tez Merkezi (Council of Higher Education Thesis Center), Thesis No: 478660.
Uncontrolled Keywords: Privacy. -- Data publishing. -- Hierarchical data. -- k-anonymity. -- l-diversity. -- Anatomization. -- Mahremiyet. -- Veri yayınlanması. -- Hiyerarşik veri. -- k-anonim. -- l-çeşitlilik. -- Anatomlama.
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK7800-8360 Electronics > TK7885-7895 Computer engineering. Computer hardware
Divisions: Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.
Faculty of Engineering and Natural Sciences
Depositing User: IC-Cataloging
Date Deposited: 09 Apr 2018 13:10
Last Modified: 26 Apr 2022 10:15
URI: https://research.sabanciuniv.edu/id/eprint/34388
