Odacı, Berke (2025) LLM-Assisted Onboarding Via Retrieval-Augmented Interactive Computational Notebooks. [Thesis]
Abstract
Recent advancements in large language models (LLMs) have significantly improved their ability to understand programming workflows and generate functional code. While these models are widely used for code-related tasks such as generation and completion, they often fall short in providing sufficient explanation or contextual understanding, both of which are essential for effectively working with existing projects. This challenge is particularly evident in Visual Analytics workflows, where interactive computational notebooks (e.g., Jupyter Notebooks) are commonly used to prototype and document complex visualizations, data transformations, and machine learning pipelines. These notebooks are accessed not only by developers but also by domain experts such as economists, analysts, or researchers who interact with the outputs, interpret the findings, or request changes. For both groups, onboarding into an unfamiliar project can be time-consuming and error-prone due to missing documentation, implicit logic, and the complexity of the code-output relationship. To address this, we present a tool that supports the onboarding process by leveraging LLMs to analyze, explain, and edit interactive computational notebooks. The system parses the notebook into a directed graph of cells, generates natural language explanations for each cell, and stores them in a retrieval-augmented vector store. Users interact with the notebook through a web-based interface, where they can ask natural language questions, select specific cells for focused explanations, and even request code modifications, all with the ability to revert changes if needed. We evaluate the tool with both software developers and domain experts through a mixed-method study, including task-based interactions and post-task surveys. Results show that the tool improves users’ understanding of unfamiliar notebooks, increases their confidence in continuing the project, and is highly valued as a future onboarding aid. The tool demonstrates the potential of LLMs to bridge the gap between code and interpretation in data-driven environments, supporting more efficient collaboration and knowledge transfer acros
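
As a rough illustration of the "directed graph of cells" step mentioned in the abstract, the sketch below links code cells by variable definition and use. The abstract does not say how the thesis defines edges, so the definition-use rule here is an assumption, and the names `names_defined_and_used`, `build_cell_graph`, and `example_notebook` are hypothetical. It assumes the standard Jupyter notebook JSON layout (a `cells` list whose entries carry `cell_type` and `source`).

```python
import ast


def names_defined_and_used(source):
    """Return (defined, used) sets of top-level names for one cell's source."""
    defined, used = set(), set()
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Cells with IPython magics (e.g. %matplotlib) are not plain Python;
        # this sketch simply treats them as defining and using nothing.
        return defined, used
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                used.add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
    return defined, used


def build_cell_graph(notebook):
    """Return sorted (producer, consumer) edges between code-cell indices.

    Assumption: an edge runs from the most recent earlier cell that defined
    a name to each later cell that reads it.
    """
    edges = set()
    last_definer = {}
    code_cells = [c for c in notebook["cells"] if c["cell_type"] == "code"]
    for index, cell in enumerate(code_cells):
        source = cell["source"]
        if isinstance(source, list):  # notebook files may store lines as a list
            source = "".join(source)
        defined, used = names_defined_and_used(source)
        for name in used:
            if name in last_definer:
                edges.add((last_definer[name], index))
        for name in defined:
            last_definer[name] = index
    return sorted(edges)


# Hypothetical three-code-cell notebook (the markdown cell is skipped).
example_notebook = {
    "cells": [
        {"cell_type": "code",
         "source": "import pandas as pd\ndf = pd.DataFrame({'a': [1, 2]})"},
        {"cell_type": "markdown", "source": "Compute the total."},
        {"cell_type": "code", "source": "total = df['a'].sum()"},
        {"cell_type": "code", "source": ["print(total)"]},
    ]
}

print(build_cell_graph(example_notebook))
```

Indices refer to positions among code cells only. A real system would likely also need to handle execution order, magics, and names mutated in place, none of which this sketch attempts.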
| Item Type: | Thesis |
|---|---|
| Uncontrolled Keywords: | Large Language Models, Interactive Computational Notebooks, Onboarding Support, Code Comprehension, Visual Analytics. -- Büyük Dil Modelleri, Etkileşimli Hesaplama Defterleri, Uyum Süreci, Kod Anlayışı, Görsel Analitik. |
| Subjects: | T Technology > TK Electrical engineering. Electronics Nuclear engineering > TK7800-8360 Electronics > TK7885-7895 Computer engineering. Computer hardware |
| Divisions: | Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng. Faculty of Engineering and Natural Sciences |
| Depositing User: | Dila Günay |
| Date Deposited: | 15 Jan 2026 16:10 |
| Last Modified: | 15 Jan 2026 16:10 |
| URI: | https://research.sabanciuniv.edu/id/eprint/53623 |

