Generating multiple-choice knowledge questions with interpretable difficulty estimation using knowledge graphs and large language models

Şakiroğlu, Mehmet Can and Güvenir, Halil Altay and Kaya, Kamer (2026) Generating multiple-choice knowledge questions with interpretable difficulty estimation using knowledge graphs and large language models. Machine Learning and Knowledge Extraction, 8 (5). ISSN 2504-4990

PDF (Open Access (© 2026 by the authors))
Available under License Creative Commons Attribution.

Official URL: https://dx.doi.org/10.3390/make8050137

Abstract

Generating multiple-choice questions (MCQs) with difficulty estimation remains challenging in automated MCQ-generation systems used in adaptive, AI-assisted education. This study proposes a novel methodology for generating MCQs with difficulty estimation from input documents by utilizing knowledge graphs (KGs) and large language models (LLMs). Our approach uses an LLM to construct a KG from input documents, from which MCQs are then systematically generated. Each MCQ is generated by selecting a node from the KG as the key, sampling a related triple or quintuple (optionally augmented with an extra triple), and prompting an LLM to generate a corresponding stem from these graph components. Distractors are then selected from the KG. For each MCQ, nine difficulty signals are computed and combined into a unified difficulty score using a data-driven approach. On a 150-MCQ proof-of-concept dataset from Wikipedia, the proposed signals show interpretable associations with empirical incorrect-answer rates, in line with human performance. The results support the feasibility of the proposed pipeline, yet a larger-scale human study may be required to establish deployment-scale validity. Our approach improves automated MCQ generation by integrating structured knowledge representations with LLMs and a data-driven difficulty estimation model.
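
The generation steps named in the abstract (pick a key node, sample a triple, produce a stem, pick distractors from the graph) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the toy graph, the function name build_mcq, the fixed template that stands in for the LLM-generated stem, and the rule used to pick distractors. Quintuples, the optional extra triple, and the difficulty score are left out because the abstract does not specify them.

```python
import random

# Toy knowledge graph as (subject, relation, object) triples.
# The contents are illustrative, not data from the paper.
TOY_KG = [
    ("Ankara", "capital_of", "Turkey"),
    ("Paris", "capital_of", "France"),
    ("Rome", "capital_of", "Italy"),
    ("Berlin", "capital_of", "Germany"),
    ("Istanbul", "located_in", "Turkey"),
]


def build_mcq(kg, key, rng, n_distractors=3):
    """Build one MCQ whose correct answer is the node `key` (hypothetical helper)."""
    # 1. Sample a triple in which the key appears as the subject.
    candidates = [t for t in kg if t[0] == key]
    if not candidates:
        raise ValueError(f"no triple has {key!r} as its subject")
    _, rel, obj = rng.choice(candidates)

    # 2. Placeholder for the LLM step: a fixed template stands in for
    #    the prompt-generated stem described in the abstract.
    stem = f"Which entity has the relation '{rel}' to {obj}?"

    # 3. Pick distractors from the graph: other subjects of the same
    #    relation that are not themselves correct for this object.
    #    This selection rule is an assumption, not the paper's method.
    correct = {s for s, r, o in kg if r == rel and o == obj}
    pool = sorted({s for s, r, o in kg if r == rel} - correct)
    distractors = rng.sample(pool, min(n_distractors, len(pool)))

    # 4. Shuffle the key in among the distractors.
    options = distractors + [key]
    rng.shuffle(options)
    return {"stem": stem, "options": options, "answer": key}


mcq = build_mcq(TOY_KG, "Ankara", random.Random(0))
```

Drawing distractors from nodes that share the relation keeps them the same type as the key (all capitals here), which is one plausible reason to select them from the graph instead of generating them freely.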

Item Type:	Article
Additional Information:	This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Uncontrolled Keywords:	difficulty estimation; interpretability; knowledge graph; large language models; multiple-choice question generation
Divisions:	Center of Excellence in Data Analytics; Faculty of Engineering and Natural Sciences
Depositing User:	Kamer Kaya
Date Deposited:	10 Jun 2026 11:28
Last Modified:	10 Jun 2026 11:28
URI:	https://research.sabanciuniv.edu/id/eprint/54154
