A high performance CPU-GPU database for streaming data analysis

Warning The system is temporarily closed to updates for reporting purpose.

Abdennebi, Anes (2020) A high performance CPU-GPU database for streaming data analysis. [Thesis]

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader

Official URL: https://risc01.sabanciuniv.edu/record=b2486355 _ (Table of contents)


The outstanding spread of database management system architectures in the last decade, plus the increasing growth, volume, and velocity of the data, which is known nowadays as “Big Data”, are continuously urging researchers, businessmen and companies to build robust and scalable database management systems (DBMS) and improve them in a way they adjust smoothly with the evolution of data. On the other hand, there is a tendency to support the conventional processing units (PUs), which are the Central Processing Units (CPUs), with additional computing power like the emerging Graphical Processing Units (GPUs). The research community has accepted the potential of vigorous computing power for data-intensive applications. Several research studies were conducted in the last years that ended up in building remarkable DBMSs by integrating GPUs and using them according to different workload distribution algorithms and query optimization protocols. Thus, we try to address a new approach by building a hybrid columnar-based highperformance database management system calling it DOLAP which adopts the Online Analytical Processing (OLAP) infrastructure. Distinctively from previous hybrid DBMSs, our database, DOLAP, depends on Bloom filters while performing different operations on data (ingesting, checking, modifying, and deleting). We implement this probabilistic data structure in DOLAP to prevent unnecessary memory accesses while checking the database’s data records. This method is proved to be useful by reducing the total running times by 35%. Moreover, since there exist two main PUs with different characteristics, the CPU and GPU, a workload distribution model that effectively decides the query’s executing unit at a time T should be defined to improve the efficiency of our system. Therefore, we suggested 3 load balancing models, the Random-based, Algorithm-based and the Improved Algorithmbased models. We run our tests on the Chicago Taxi Driver dataset taken from Kaggle and among the 3 load balancing models, the improved algorithm-based model demonstrates its effectiveness in well distributing the query load between the CPUs and GPUs where it outperforms the other models in nearly all the test runs

Item Type:Thesis
Uncontrolled Keywords:OLAP Databases. -- CPU, GPU. -- Big Data. -- Bloom Filter. -- OLAP veritabanları. -- Büyük Veri. -- Bloom Filre.
Subjects:T Technology > TK Electrical engineering. Electronics Nuclear engineering > TK7800-8360 Electronics > TK7885-7895 Computer engineering. Computer hardware
ID Code:41177
Deposited By:IC-Cataloging
Deposited On:23 Oct 2020 22:34
Last Modified:23 Oct 2020 22:34

Repository Staff Only: item control page