Aytar, Ahmet Yasin (2024) Enhancing retrieval-augmented generation for data science: a comprehensive framework for academic literature navigation. [Thesis]

Abstract
In the rapidly evolving field of data science, efficiently navigating the expansive body of academic literature is crucial for informed decision-making and innovation. This thesis presents an enhanced Retrieval-Augmented Generation (RAG) application designed to assist data scientists in accessing precise and contextually relevant academic resources. The application integrates advanced techniques, including GeneRation Of BIbliographic Data (GROBID), embedding model fine-tuning, semantic chunking, and an abstract-first retrieval method, to significantly improve the relevance and accuracy of the retrieved information. A comprehensive evaluation using the Retrieval-Augmented Generation Assessment System (RAGAS) framework demonstrates substantial improvements in key metrics, particularly Context Relevance, underscoring the system’s effectiveness in reducing information overload and enhancing decision-making processes. Our findings highlight the potential of this enhanced RAG system to transform academic exploration within data science, providing a valuable tool for researchers and practitioners alike.
Item Type: | Thesis |
---|---|
Uncontrolled Keywords: | Retrieval-Augmented Generation (RAG), Data Science, Literature Retrieval, Academic Insights, Large Language Models (LLM). -- Retrieval-Augmented Generation (RAG), Veri Bilimi, Literatür Tarama, Akademik İçgörüler, Büyük Dil Modelleri (LLM). |
Divisions: | Faculty of Engineering and Natural Sciences |
Depositing User: | Dila Günay |
Date Deposited: | 21 Apr 2025 22:54 |
Last Modified: | 21 Apr 2025 22:54 |
URI: | https://research.sabanciuniv.edu/id/eprint/51768 |