Private search over big data leveraging distributed file system and parallel processing

Selçuk, Ayşe (2015) Private search over big data leveraging distributed file system and parallel processing. [Thesis]

[thumbnail of AyseSelcuk_10064660.pdf] PDF
AyseSelcuk_10064660.pdf

Download (1MB)

Abstract

As the new technologies recently became widespread, enormous amount of data started to be generated in very high speeds and stored in untrusted servers. The big data concept covers not only the exceptional size of the datasets, but also high data generation rate and large variety of data types. Although the Big Data provides very tempting benefits, the security issues are still an open problem. In this thesis, we identify security and privacy problems associated with a certain big data application, namely secure keyword-based search over encrypted cloud data and emphasize the actual challenges and technical difficulties in the big data setting. More specifically, we provide definitions from which privacy requirements can be derived. In addition, we adapt an existing work on privacy-preserving keyword-based search method, which is one of the fundamental operations that can be performed over encrypted data, to the big data setting, in which, not only data is huge but also changing and accumulating very fast. Therefore, in the big data setting, a secure index that allows search over encrypted data should be constructed and updated very fast in addition to an efficient and effective keyword-based search operation method. Our proposal is scalable in the sense that it can leverage distributed file systems and parallel programming techniques such as the Hadoop Distributed File System (HDFS) and the MapReduce programming model to work with very large datasets. We also propose a lazy idf-updating method that can efficiently handle the relevancy scores of the documents in dynamically changing and large datasets. We empirically show the efficiency and accuracy of the method through extensive set of experiments on real data
Item Type: Thesis
Subjects: Q Science > QA Mathematics > QA076 Computer software
Divisions: Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.
Faculty of Engineering and Natural Sciences
Depositing User: IC-Cataloging
Date Deposited: 15 May 2017 16:29
Last Modified: 26 Apr 2022 10:09
URI: https://research.sabanciuniv.edu/id/eprint/31351

Actions (login required)

View Item
View Item