Suppressing microdata to prevent classification based inference


Azgın Hintoğlu, Ayça (2011) Suppressing microdata to prevent classification based inference. [Thesis]




The Internet revolution, together with advances in computer technology, makes it easy for institutions to collect an unprecedented amount of personal data. This pervasive data collection, coupled with the increasing need to share the collected data, has raised many concerns about privacy. The widespread use of data mining techniques has also contributed to these privacy fears. These techniques enable institutions to extract previously unknown and strategically useful information from huge collections of data and thus gain competitive advantages. One way to ensure privacy during disclosure is to selectively hide or generalize the confidential information. However, with data mining techniques, an adversary can now predict hidden or generalized confidential information from the rest of the disclosed data set. We concentrate on one such threat: classification, a data mining technique widely used for prediction. We propose algorithms that modify a given microdata set, either by inserting unknown values (i.e., deletion) or by generalizing the original values, to prevent both probabilistic and decision tree classification-based inference. To evaluate the proposed algorithms, we experiment with real-life data sets. The results show that the proposed algorithms successfully suppress microdata and prevent both probabilistic and decision tree classification-based inference. The hybrid versions of the algorithms aim to suppress a confidential data value against both classification models, and they block the inference channels with substantially fewer side effects. Similarly, the enhanced versions of the algorithms aim to suppress multiple confidential data values, and they reduce the side effects by nearly 50%.
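To make the idea of deletion-based suppression concrete, here is a minimal sketch under stated assumptions. It is not the algorithm from the thesis. A categorical naive Bayes classifier stands in for the "probabilistic" classification model. The adversary is assumed to train it on the other released records, then use it to predict one record's hidden confidential value. A greedy heuristic then replaces that record's attribute values with an unknown marker until the prediction no longer matches the true value. All of the following are hypothetical illustrations: the `UNKNOWN` marker, the `NaiveBayes` and `suppress` names, the greedy selection rule and the toy data. The sketch also modifies only the target record. It does not handle decision trees, generalization, or the hybrid and enhanced variants described above.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional

UNKNOWN = "?"  # hypothetical marker for a suppressed (deleted) value
Record = Dict[str, str]


class NaiveBayes:
    """Categorical naive Bayes with Laplace smoothing; UNKNOWN values are ignored."""

    def __init__(self, rows: List[Record], target: str) -> None:
        self.target = target
        self.n = len(rows)
        self.class_counts = Counter(r[target] for r in rows)
        self.counts: Dict[str, Counter] = defaultdict(Counter)  # attr -> (value, class) -> count
        self.values: Dict[str, set] = defaultdict(set)          # attr -> observed values
        for r in rows:
            for attr, val in r.items():
                if attr != target and val != UNKNOWN:
                    self.counts[attr][(val, r[target])] += 1
                    self.values[attr].add(val)

    def scores(self, record: Record) -> Dict[str, float]:
        """Unnormalised posterior P(c) * prod_a P(a = v | c) for each class c."""
        out = {}
        for cls, c_count in self.class_counts.items():
            p = c_count / self.n
            for attr, val in record.items():
                if attr == self.target or val == UNKNOWN:
                    continue  # a suppressed value contributes no evidence
                k = len(self.values[attr]) or 1
                p *= (self.counts[attr][(val, cls)] + 1) / (c_count + k)
            out[cls] = p
        return out

    def predict(self, record: Record) -> str:
        s = self.scores(record)
        return max(s, key=s.get)


def suppress(rows: List[Record], idx: int, target: str) -> Optional[Record]:
    """Greedily replace values of rows[idx] with UNKNOWN until a naive Bayes model
    trained on the other rows no longer predicts its hidden confidential value.
    Returns the released record, or None if full deletion still leaks the value."""
    model = NaiveBayes([r for i, r in enumerate(rows) if i != idx], target)
    secret = rows[idx][target]
    query = {a: v for a, v in rows[idx].items() if a != target}

    def secret_share(rec: Record) -> float:
        s = model.scores(rec)
        return s[secret] / sum(s.values())

    while model.predict(query) == secret:
        candidates = [a for a, v in query.items() if v != UNKNOWN]
        if not candidates:
            return None  # even the class prior alone points to the secret
        # delete the value whose removal most lowers the secret class's posterior share
        best = min(candidates, key=lambda a: secret_share({**query, a: UNKNOWN}))
        query[best] = UNKNOWN
    return {**query, target: UNKNOWN}  # the confidential value itself is hidden too


rows = [
    {"age": "young", "zip": "A", "disease": "flu"},
    {"age": "young", "zip": "A", "disease": "flu"},
    {"age": "old", "zip": "B", "disease": "cold"},
    {"age": "old", "zip": "A", "disease": "cold"},
    {"age": "old", "zip": "A", "disease": "cold"},
    {"age": "young", "zip": "A", "disease": "flu"},  # record whose disease is confidential
]
print(suppress(rows, 5, "disease"))  # {'age': '?', 'zip': 'A', 'disease': '?'}
```

In this toy data set, deleting only `age` is enough to flip the adversary's prediction from `flu` to `cold`, so `zip` can be released unchanged. This illustrates the trade-off the abstract refers to: each deleted value blocks inference but is a side effect on the data's utility. The greedy rule is a simple assumption chosen for readability. It deletes the value that most reduces the secret class's posterior share, and it makes no attempt to minimize the total number of deletions.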

Item Type: Thesis
Uncontrolled Keywords: Privacy. -- Data disclosure protection. -- Data suppression. -- Data perturbation. -- Data generalization. -- Data mining.
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK7800-8360 Electronics > TK7885-7895 Computer engineering. Computer hardware
ID Code: 24560
Deposited By: IC-Cataloging
Deposited On: 25 Sep 2014 14:41
Last Modified: 25 Mar 2019 17:09
