Enhancing retrieval-augmented generation for data science: a comprehensive framework for academic literature navigation

Aytar, Ahmet Yasin (2024) Enhancing retrieval-augmented generation for data science: a comprehensive framework for academic literature navigation. [Thesis]

Abstract

In the rapidly evolving field of data science, efficiently navigating the expansive body of academic literature is crucial for informed decision-making and innovation. This thesis presents an enhanced Retrieval-Augmented Generation (RAG) application designed to assist data scientists in accessing precise and contextually relevant academic resources. The application integrates advanced techniques, including GeneRation Of BIbliographic Data (GROBID), fine-tuning of the embedding model, semantic chunking, and an abstract-first retrieval method, to significantly improve the relevance and accuracy of the retrieved information. A comprehensive evaluation using the Retrieval-Augmented Generation Assessment System (RAGAS) framework demonstrates substantial improvements in key metrics, particularly Context Relevance, underscoring the system’s effectiveness in reducing information overload and enhancing decision-making processes. Our findings highlight the potential of this enhanced RAG system to transform academic exploration within data science, providing a valuable tool for researchers and practitioners alike.
Item Type: Thesis
Uncontrolled Keywords: Retrieval-Augmented Generation (RAG), Data Science, Literature Retrieval, Academic Insights, Large Language Models (LLM).
Divisions: Faculty of Engineering and Natural Sciences
Depositing User: Dila Günay
Date Deposited: 21 Apr 2025 22:54
Last Modified: 21 Apr 2025 22:54
URI: https://research.sabanciuniv.edu/id/eprint/51768
