Şakiroğlu, Mehmet Can and Güvenir, Halil Altay and Kaya, Kamer (2026) Generating multiple-choice knowledge questions with interpretable difficulty estimation using knowledge graphs and large language models. Machine Learning and Knowledge Extraction, 8 (5). ISSN 2504-4990
Available under License Creative Commons Attribution.
Official URL: https://dx.doi.org/10.3390/make8050137
Abstract
Generating multiple-choice questions (MCQs) with difficulty estimation remains challenging in automated MCQ-generation systems used in adaptive, AI-assisted education. This study proposes a novel methodology for generating MCQs with difficulty estimation from input documents by utilizing knowledge graphs (KGs) and large language models (LLMs). Our approach uses an LLM to construct a KG from input documents, from which MCQs are then systematically generated. Each MCQ is generated by selecting a node from the KG as the key, sampling a related triple or quintuple (optionally augmented with an extra triple), and prompting an LLM to generate a corresponding stem from these graph components. Distractors are then selected from the KG. For each MCQ, nine difficulty signals are computed and combined into a unified difficulty score using a data-driven approach. On a 150-MCQ proof-of-concept dataset built from Wikipedia, the proposed signals show interpretable associations with empirical incorrect-answer rates, consistent with observed human performance. The results support the feasibility of the proposed pipeline, yet a larger-scale human study may be required to establish deployment-scale validity. Our approach improves automated MCQ generation by integrating structured knowledge representations with LLMs and a data-driven difficulty estimation model.
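The generation step described in the abstract (pick a key node, sample a related triple, select distractors from the KG) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the toy graph, the `Triple` type, the `build_mcq` function, and the rule that distractors are other head entities sharing the sampled relation. The paper prompts an LLM to write the stem; the template string below is only a placeholder for that call, and the nine difficulty signals are not modeled.

```python
import random
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


# Hypothetical toy knowledge graph; in the paper the KG is built by an LLM
# from input documents.
TRIPLES = [
    Triple("Ankara", "capital_of", "Turkey"),
    Triple("Paris", "capital_of", "France"),
    Triple("Berlin", "capital_of", "Germany"),
    Triple("Rome", "capital_of", "Italy"),
    Triple("Turkey", "located_in", "Asia"),
]


def build_mcq(triples, key, n_distractors=3, seed=0):
    """Assemble one MCQ whose correct answer is the node `key`."""
    rng = random.Random(seed)

    # Sample a triple in which the key node appears as the head.
    candidates = [t for t in triples if t.head == key]
    if not candidates:
        raise ValueError(f"no triple has head {key!r}")
    fact = rng.choice(candidates)

    # Assumed distractor rule: other heads that share the same relation,
    # so every option is plausible for the stem.
    pool = sorted(
        {t.head for t in triples if t.relation == fact.relation and t.head != key}
    )
    if len(pool) < n_distractors:
        raise ValueError("not enough distractor candidates in the graph")
    distractors = rng.sample(pool, n_distractors)

    options = distractors + [key]
    rng.shuffle(options)

    # Placeholder stem; the described pipeline would obtain this from an LLM.
    stem = f"Which entity has the relation '{fact.relation}' to '{fact.tail}'?"
    return {"stem": stem, "options": options, "key": key, "triple": fact}


mcq = build_mcq(TRIPLES, "Ankara")
print(mcq["stem"])
print(mcq["options"])
```

Restricting distractors to nodes that share the sampled relation is one simple way to keep the options type-consistent; the selection criteria actually used in the paper may differ.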
| Item Type: | Article |
|---|---|
| Additional Information: | This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. |
| Uncontrolled Keywords: | difficulty estimation; interpretability; knowledge graph; large language models; multiple-choice question generation |
| Divisions: | Center of Excellence in Data Analytics; Faculty of Engineering and Natural Sciences |
| Depositing User: | Kamer Kaya |
| Date Deposited: | 10 Jun 2026 11:28 |
| Last Modified: | 10 Jun 2026 11:28 |
| URI: | https://research.sabanciuniv.edu/id/eprint/54154 |

