Evaluation of features for predicting document difficulty

Erdal, Büşra (2022) Evaluation of features for predicting document difficulty. [Thesis]


Knowing the difficulty of a text document, particularly of learning material, has many benefits, such as enabling recommendations tailored to a specific target group so that readers understand the recommended documents as well as possible. Several factors affect document difficulty, each capturing a different aspect of it. One is readability, which captures syntactic and lexical text properties and relates to linguistic difficulty. Another is the background knowledge readers need to understand a given document, because the concepts it contains may be more or less complex. Although both factors have been analyzed in isolation, their interplay is unknown. Similarly, the relative importance of the two factors has not been examined, although addressing either problem could improve the understanding of document difficulty and thus pave the way towards more reliable models for predicting it. Hence, this work investigates both problems by proposing a supervised model that extracts 20 features related to the background knowledge and readability of a document to predict its difficulty. This model serves as the basis for analyzing the importance of these features and the interplay between background knowledge and readability in estimating document difficulty. We find that linguistic difficulty is more important than background knowledge across all datasets. To the best of our knowledge, no datasets for predicting document difficulty are available in the educational domain, so we created one about biological concepts. We release this dataset to the research community in the hope of stimulating more research and providing more data for assessing the reliability of methods for predicting document difficulty across different domains.
Item Type: Thesis
Uncontrolled Keywords: document difficulty; machine learning; explainable AI; conceptual complexity; readability assessment
Subjects: T Technology > T Technology (General) > T055.4-60.8 Industrial engineering. Management engineering > T58.5 Information technology
Divisions: Faculty of Engineering and Natural Sciences
Depositing User: Dila Günay
Date Deposited: 25 Apr 2023 14:37
Last Modified: 25 Apr 2023 14:37
URI: https://research.sabanciuniv.edu/id/eprint/47161
