Generating multiple-choice knowledge questions with interpretable difficulty estimation using knowledge graphs and large language models

Şakiroğlu, Mehmet Can and Güvenir, Halil Altay and Kaya, Kamer (2026) Generating multiple-choice knowledge questions with interpretable difficulty estimation using knowledge graphs and large language models. Machine Learning and Knowledge Extraction, 8 (5). ISSN 2504-4990

PDF (Open Access (© 2026 by the authors))
Available under License Creative Commons Attribution.

Abstract

Generating multiple-choice questions (MCQs) with difficulty estimation remains challenging for automated MCQ-generation systems used in adaptive, AI-assisted education. This study proposes a methodology for generating MCQs with difficulty estimates from input documents using knowledge graphs (KGs) and large language models (LLMs). An LLM first constructs a KG from the input documents, and MCQs are then generated systematically from that graph. Each MCQ is generated by selecting a node from the KG as the key, sampling a related triple or quintuple (optionally augmented with an extra triple), and prompting an LLM to generate a corresponding stem from these graph components. Distractors are then selected from the KG. For each MCQ, nine difficulty signals are computed and combined into a unified difficulty score using a data-driven approach. On a 150-MCQ proof-of-concept dataset built from Wikipedia, the proposed signals show interpretable associations with empirical incorrect-answer rates, in line with human performance. The results support the feasibility of the proposed pipeline, although a larger-scale human study may be required to establish deployment-scale validity. The approach advances automated MCQ generation by integrating structured knowledge representations with LLMs and a data-driven difficulty estimation model.
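
The key, triple, and distractor steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the `make_mcq` function, and the fixed stem template (which stands in for the LLM prompt) are all hypothetical. The nine difficulty signals are not named in the abstract, so they are not modelled here.

```python
import random

# Toy knowledge graph as (subject, relation, object) triples.
# The contents are illustrative only and do not come from the paper.
TRIPLES = [
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
    ("Madrid", "capital_of", "Spain"),
    ("Rome", "capital_of", "Italy"),
    ("France", "member_of", "European Union"),
]


def make_mcq(triples, key, n_distractors=3, seed=0):
    """Build one MCQ whose correct answer is `key`, a node of the graph."""
    rng = random.Random(seed)
    # Sample one triple in which the key appears as the subject.
    candidates = [t for t in triples if t[0] == key]
    if not candidates:
        raise ValueError(f"no triple has {key!r} as its subject")
    _, relation, obj = rng.choice(candidates)
    # Assumption: a fixed template replaces the LLM-generated stem.
    stem = f"Which entity has the relation '{relation}' to '{obj}'?"
    # Every subject that would also be a correct answer must be excluded.
    correct = {s for s, r, o in triples if r == relation and o == obj}
    # Distractors are other subjects of the same relation, so they are
    # plausible but wrong for this particular object.
    pool = sorted({s for s, r, _ in triples if r == relation} - correct)
    distractors = rng.sample(pool, min(n_distractors, len(pool)))
    options = distractors + [key]
    rng.shuffle(options)
    return {"stem": stem, "options": options, "answer": key}


mcq = make_mcq(TRIPLES, key="Paris")
```

Drawing distractors from subjects that share the sampled relation keeps them the same kind of entity as the key, which is one simple way to make wrong options plausible. The paper's actual selection criteria may differ.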
Item Type: Article
Additional Information: This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Uncontrolled Keywords: difficulty estimation; interpretability; knowledge graph; large language models; multiple-choice question generation
Divisions: Center of Excellence in Data Analytics; Faculty of Engineering and Natural Sciences
Depositing User: Kamer Kaya
Date Deposited: 10 Jun 2026 11:28
Last Modified: 10 Jun 2026 11:28
URI: https://research.sabanciuniv.edu/id/eprint/54154
