Özdemir, Anıl (2021) Data driven exploration of document collection to understand underlying social fabric using graph representation learning. [Thesis]
PDF
10411555.pdf
Download (2MB)
Abstract
An enormous collection of cultural heritage (CH) documents is digitally available as text, images, and other representations. The availability of such extensive data creates a need for approaches that help users and archivists understand latent relationships in collections. However, one of the biggest challenges with cultural heritage documents is that analyzing and processing them is difficult and time-consuming for archivists. Because this process is manual, the persons, places, and events mentioned in the documents may not be expressed in the same terms, or the documents may contain ambiguous concepts that are difficult to understand; as a result, it is challenging to uncover these relationships without careful examination by a professional. Therefore, an archivist is needed to re-analyze these terms in order to capture similar events, persons, and places across documents and thus reveal the latent relationships. To fill this gap, we propose a system that combines various NLP algorithms and graph representation learning methods, using only the textual summaries of the documents and the documents’ metadata. The system automatically extracts substantial terms from the documents, then produces embeddings for both the documents and these terms. Finally, the proposed system is used to explore the document collection and to recommend documents based on the computed document embeddings. We evaluated the proposed work and compared its performance with alternative methods in an experiment conducted with archive experts.
Item Type: | Thesis |
---|---|
Uncontrolled Keywords: | Graph Theory. -- Natural Language Processing. -- Machine Learning. -- Graph Representation Learning. -- Recommendation Systems. -- Heterogeneous Information Networks. |
Subjects: | T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK7800-8360 Electronics > TK7885-7895 Computer engineering. Computer hardware |
Divisions: | Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.; Faculty of Engineering and Natural Sciences |
Depositing User: | IC-Cataloging |
Date Deposited: | 20 Oct 2021 10:24 |
Last Modified: | 26 Apr 2022 10:39 |
URI: | https://research.sabanciuniv.edu/id/eprint/42502 |