Private search over big data leveraging distributed file system and parallel processing

Selçuk, Ayşe (2015) Private search over big data leveraging distributed file system and parallel processing. [Thesis]

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader

Official URL: http://risc01.sabanciuniv.edu/record=b1606648 (Table of Contents)


As the new technologies recently became widespread, enormous amount of data started to be generated in very high speeds and stored in untrusted servers. The big data concept covers not only the exceptional size of the datasets, but also high data generation rate and large variety of data types. Although the Big Data provides very tempting benefits, the security issues are still an open problem. In this thesis, we identify security and privacy problems associated with a certain big data application, namely secure keyword-based search over encrypted cloud data and emphasize the actual challenges and technical difficulties in the big data setting. More specifically, we provide definitions from which privacy requirements can be derived. In addition, we adapt an existing work on privacy-preserving keyword-based search method, which is one of the fundamental operations that can be performed over encrypted data, to the big data setting, in which, not only data is huge but also changing and accumulating very fast. Therefore, in the big data setting, a secure index that allows search over encrypted data should be constructed and updated very fast in addition to an efficient and effective keyword-based search operation method. Our proposal is scalable in the sense that it can leverage distributed file systems and parallel programming techniques such as the Hadoop Distributed File System (HDFS) and the MapReduce programming model to work with very large datasets. We also propose a lazy idf-updating method that can efficiently handle the relevancy scores of the documents in dynamically changing and large datasets. We empirically show the efficiency and accuracy of the method through extensive set of experiments on real data

Item Type:Thesis
Subjects:Q Science > QA Mathematics > QA076 Computer software
ID Code:31351
Deposited By:IC-Cataloging
Deposited On:15 May 2017 16:29
Last Modified:25 Mar 2019 17:17

Repository Staff Only: item control page