Suppressing microdata to prevent classification based inference

Azgın Hintoğlu, Ayça (2011) Suppressing microdata to prevent classification based inference. [Thesis]

Abstract

The Internet revolution, together with advances in computer technology, has made it easy for institutions to collect unprecedented amounts of personal data. This pervasive data collection, coupled with the growing need to share the collected data, has raised many concerns about privacy. The widespread use of data mining techniques, which enable institutions to extract previously unknown and strategically useful information from huge data collections and thus gain competitive advantages, has added to these concerns. One method of ensuring privacy during disclosure is to selectively hide or generalize the confidential information. However, data mining techniques now make it possible for an adversary to predict hidden or generalized confidential information from the rest of the disclosed data set. We concentrate on one such threat, classification, a data mining technique widely used for prediction, and propose algorithms that modify a given microdata set, either by inserting unknown values (i.e., deletion) or by generalizing the original values, to prevent both probabilistic and decision tree classification based inference. To evaluate the proposed algorithms, we experiment with real-life data sets. The results show that the proposed algorithms successfully suppress microdata and prevent both probabilistic and decision tree classification based inference. The hybrid versions of the algorithms, which aim to suppress a confidential data value against both classification models, block the inference channels with substantially fewer side effects. Similarly, the enhanced versions of the algorithms, which aim to suppress multiple confidential data values, reduce the side effects by nearly 50%.
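
The idea of suppressing additional values so that a hidden confidential value can no longer be inferred can be illustrated with a small sketch. It is not the thesis's algorithm: the naive Bayes adversary, the greedy selection rule, the confidence threshold, and all names (`posterior`, `suppress`, the toy `rows`) are assumptions made only for illustration.

```python
from collections import Counter

UNKNOWN = None  # marker for a suppressed (deleted) value


def posterior(train, target, record):
    """Naive Bayes posterior over the values of `target` for `record`.

    Attributes of `record` that are UNKNOWN are skipped. Laplace smoothing
    is used. This models a hypothetical adversary, not the thesis's model.
    """
    train = [r for r in train if r[target] is not UNKNOWN]
    classes = Counter(r[target] for r in train)
    scores = {}
    for c, n_c in classes.items():
        p = n_c / len(train)
        for attr, value in record.items():
            if attr == target or value is UNKNOWN:
                continue
            domain = {r[attr] for r in train if r[attr] is not UNKNOWN}
            match = sum(1 for r in train if r[target] == c and r[attr] == value)
            p *= (match + 1) / (n_c + len(domain))
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()} if total else {}


def suppress(rows, index, target, threshold):
    """Hide rows[index][target], then greedily suppress more values of that
    record until the adversary's confidence in the true value is at most
    `threshold` (an assumed stopping rule). Returns a modified copy."""
    data = [dict(r) for r in rows]
    record = data[index]
    truth = record[target]
    record[target] = UNKNOWN
    others = data[:index] + data[index + 1:]

    def confidence(rec):
        return posterior(others, target, rec).get(truth, 0.0)

    while confidence(record) > threshold:
        candidates = [a for a, v in record.items()
                      if a != target and v is not UNKNOWN]
        if not candidates:
            break  # nothing left to suppress in this record
        # Suppress the attribute whose removal lowers the confidence most.
        best = min(candidates, key=lambda a: confidence({**record, a: UNKNOWN}))
        record[best] = UNKNOWN
    return data


# Toy microdata, invented for this example.
rows = [
    {"age": "young", "job": "clerk", "disease": "flu"},
    {"age": "young", "job": "clerk", "disease": "flu"},
    {"age": "young", "job": "nurse", "disease": "flu"},
    {"age": "old", "job": "nurse", "disease": "cancer"},
    {"age": "old", "job": "clerk", "disease": "cancer"},
    {"age": "old", "job": "nurse", "disease": "cancer"},
]
released = suppress(rows, 0, "disease", 0.6)
print(released[0])
```

In this toy data, hiding only the `disease` value of the first record still lets the assumed adversary recover it from `age` and `job`; the sketch therefore suppresses one more value of that record. The additional suppressed values correspond to what the abstract calls side effects.
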
Item Type: Thesis
Uncontrolled Keywords: Privacy. -- Data disclosure protection. -- Data suppression. -- Data perturbation. -- Data generalization. -- Data mining. -- Mahremiyet. -- Verinin ifşa edilirken korunması. -- Veri bastırma. -- Veri karıştırma. -- Veri genelleme. -- Veri madenciliği.
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK7800-8360 Electronics > TK7885-7895 Computer engineering. Computer hardware
Divisions: Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.
Faculty of Engineering and Natural Sciences
Depositing User: IC-Cataloging
Date Deposited: 25 Sep 2014 14:41
Last Modified: 26 Apr 2022 10:02
URI: https://research.sabanciuniv.edu/id/eprint/24560
