Context-based extraction of concepts from unstructured textual documents

Gul, Saima and Rabiger, Stefan and Saygın, Yücel (2022) Context-based extraction of concepts from unstructured textual documents. Information Sciences, 588. pp. 248-264. ISSN 0020-0255 (Print) 1872-6291 (Online)

Full text not available from this repository.

Official URL: https://dx.doi.org/10.1016/j.ins.2021.12.056

Abstract

Summarizing a collection of unstructured textual documents, e.g., lecture slides or book chapters, by extracting the most relevant concepts helps learners realize connections among these concepts. However, existing methods neglect the context in which concepts are extracted, even though a concept might be irrelevant in one context but relevant in another. To that end, we propose a novel unsupervised method for extracting the relevant concepts from a collection of unstructured textual documents, assuming that the documents are related to a certain topic. Our two-step method first identifies candidate concepts from the textual documents, then infers the context information for the input documents and finally ranks the candidates with respect to the inferred context. In the second step, this context information is enriched with more abstract information to improve the ranking process. In the experiments, we demonstrate that our method outperforms seven supervised and unsupervised approaches on five datasets and is competitive on the other two. Furthermore, we release three new benchmark datasets that were created from books in the educational domain. Our code and datasets are available at: https://github.com/gulsaima/COBEC.

Item Type:	Article
Uncontrolled Keywords:	Concept extraction; Keyword extraction; Knowledge base; Unsupervised learning
Divisions:	Faculty of Engineering and Natural Sciences
Depositing User:	Yücel Saygın
Date Deposited:	26 Aug 2022 20:47
Last Modified:	26 Aug 2022 20:47
URI:	https://research.sabanciuniv.edu/id/eprint/43916
