A high performance CPU-GPU database for streaming data analysis

Abdennebi, Anes (2020) A high performance CPU-GPU database for streaming data analysis. [Thesis]


Official URL: https://risc01.sabanciuniv.edu/record=b2486355 (Table of contents)

Abstract

The rapid spread of database management system architectures over the last decade, together with the growing volume and velocity of data, now known as "Big Data", continually pushes researchers, businesspeople and companies to build robust and scalable database management systems (DBMSs) and to improve them so that they adapt smoothly as data evolves. At the same time, there is a trend toward supporting the conventional processing units (PUs), namely Central Processing Units (CPUs), with additional computing power from emerging Graphics Processing Units (GPUs). The research community has recognized the potential of this computing power for data-intensive applications, and several studies in recent years have produced notable DBMSs that integrate GPUs and use them according to different workload distribution algorithms and query optimization protocols. We take a new approach by building a hybrid, columnar, high-performance database management system called DOLAP, which adopts the Online Analytical Processing (OLAP) infrastructure. Unlike previous hybrid DBMSs, DOLAP relies on Bloom filters when performing operations on data (ingesting, checking, modifying, and deleting). We use this probabilistic data structure to avoid unnecessary memory accesses when checking the database's records; this method proved useful, reducing total running times by 35%. Moreover, since the two main PUs, the CPU and the GPU, have different characteristics, the system needs a workload distribution model that decides which unit executes a query at a given time T in order to improve efficiency. We therefore propose three load balancing models: the Random-based, Algorithm-based, and Improved Algorithm-based models.
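The abstract does not describe how DOLAP implements its Bloom filter. The following is a minimal sketch, assuming string record keys, of how a Bloom filter check can skip a memory access for keys that are definitely absent. `ColumnStore`, the `trip-…` keys, and the `num_bits`/`num_hashes` values are hypothetical illustrations, not names or parameters from the thesis.

```python
import hashlib
from typing import Iterator, Optional


class BloomFilter:
    """Minimal Bloom filter: k bit positions per key over an m-bit array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits        # m (illustrative value)
        self.num_hashes = num_hashes    # k (illustrative value)
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Double hashing: derive k positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


class ColumnStore:
    """Hypothetical record store that consults the filter before touching row data."""

    def __init__(self) -> None:
        self.rows: dict = {}
        self.bloom = BloomFilter()
        self.lookups = 0  # counts accesses that actually reach self.rows

    def insert(self, key: str, row: dict) -> None:
        self.rows[key] = row
        self.bloom.add(key)

    def get(self, key: str) -> Optional[dict]:
        if not self.bloom.might_contain(key):
            return None  # definite miss: the row data is never accessed
        self.lookups += 1
        return self.rows.get(key)  # may still be None on a false positive

    def delete(self, key: str) -> None:
        # A plain Bloom filter cannot clear bits, so the key stays "possibly present";
        # get() still returns None because the row itself is gone.
        self.rows.pop(key, None)
```

For example, after `store.insert("trip-1", {"fare": 12.5})`, `store.get("trip-1")` returns the row, while most lookups of absent keys return `None` without incrementing `store.lookups`. Because a standard Bloom filter cannot remove keys, deletions in this sketch only add false positives; a counting Bloom filter or periodic rebuild is one common way to handle frequent deletes.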
We ran our tests on the Chicago Taxi Driver dataset from Kaggle. Among the three load balancing models, the Improved Algorithm-based model distributed the query load between the CPUs and GPUs most effectively, outperforming the other models in nearly all test runs.
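The abstract names the load balancing models but does not say how the Algorithm-based or Improved Algorithm-based models decide. The sketch below is therefore only a hypothetical comparison between a Random-based policy and a generic greedy "earliest estimated finish time" policy; it is not the thesis's algorithm. The `Unit` class, throughput figures, and per-query overheads are made-up assumptions chosen to mimic a CPU with low launch cost and a GPU with high throughput but higher fixed cost.

```python
import random
from dataclasses import dataclass


@dataclass
class Unit:
    name: str
    throughput: float      # hypothetical rows processed per ms
    overhead_ms: float     # hypothetical fixed launch/transfer cost per query
    busy_until: float = 0.0

    def estimated_finish(self, now: float, rows: int) -> float:
        # A query starts when it arrives or when the unit frees up, whichever is later.
        start = max(now, self.busy_until)
        return start + self.overhead_ms + rows / self.throughput


def random_dispatch(units: list, rng: random.Random) -> Unit:
    # Random-based policy: choose the executing unit uniformly at random.
    return rng.choice(units)


def greedy_dispatch(units: list, now: float, rows: int) -> Unit:
    # Hypothetical cost-based policy: pick the unit that would finish this query first.
    return min(units, key=lambda u: u.estimated_finish(now, rows))


def run(policy: str, queries: list, seed: int = 0) -> float:
    """Dispatch (arrival_ms, rows) queries in order; return the makespan in ms."""
    rng = random.Random(seed)
    units = [Unit("CPU", throughput=50.0, overhead_ms=0.1),
             Unit("GPU", throughput=500.0, overhead_ms=5.0)]
    for arrival, rows in queries:
        if policy == "random":
            unit = random_dispatch(units, rng)
        else:
            unit = greedy_dispatch(units, arrival, rows)
        unit.busy_until = unit.estimated_finish(arrival, rows)
    return max(u.busy_until for u in units)
```

With `queries = [(0.0, 10), (0.0, 100000), (1.0, 20)]`, the greedy policy sends the two small queries to the CPU and the large one to the GPU, finishing at 205 ms, whereas a random assignment that sends the large query to the CPU takes over 2000 ms. A real model would need measured cost estimates rather than these fixed constants.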

Item Type:	Thesis
Uncontrolled Keywords:	OLAP Databases. -- CPU, GPU. -- Big Data. -- Bloom Filter. -- OLAP veritabanları (OLAP databases). -- Büyük Veri (Big Data). -- Bloom Filtre (Bloom filter).
Subjects:	T Technology > TK Electrical engineering. Electronics Nuclear engineering > TK7800-8360 Electronics > TK7885-7895 Computer engineering. Computer hardware
Divisions:	Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.; Faculty of Engineering and Natural Sciences
Depositing User:	IC-Cataloging
Date Deposited:	23 Oct 2020 22:34
Last Modified:	26 Apr 2022 10:34
URI:	https://research.sabanciuniv.edu/id/eprint/41177
