Enhancing cultural heritage archive analysis via automated entity extraction and graph-based representation learning

Özdemir, Anıl and Odacı, Berke and Tanatar Baruh, Lorans and Varol, Onur and Balcısoy, Selim (2025) Enhancing cultural heritage archive analysis via automated entity extraction and graph-based representation learning. Journal on Computing and Cultural Heritage, 18 (4). ISSN 1556-4673 (Print) 1556-4711 (Online)



Abstract

Recent efforts to digitize textual, visual, and physical forms of cultural heritage require advanced tools for preservation and analysis. The availability of extensive online data creates a need for intelligent systems to help users and archivists understand latent relationships in these collections. A major challenge in cultural heritage studies is the labor-intensive process of analyzing these materials. Inconsistent linguistic terms and ambiguous concepts in digital documents make it difficult to uncover relationships without expert supervision. Moreover, while advanced models based on large-scale pretraining demonstrate strong performance in extracting semantic relationships, they depend on extensive pretraining on large external datasets, limiting their applicability for smaller or specialized collections. We propose a system that combines natural language processing for entity extraction with graph representation learning to model relationships among documents, categories, and n-grams, resulting in a fully connected network representation. Unlike methods requiring large-scale pretraining, our approach operates effectively using only the information available in the dataset itself, making it particularly suited for smaller cultural heritage document collections. The system extracts significant terms from document metadata, produces embeddings for each document, and uses these embeddings to build a recommendation system for entity discovery. We tested the system on a collection of early 20th-century documents from Crete, evaluating its performance against alternative methods in collaboration with experts from the archival research organization SALT. This approach not only facilitates deeper insights into smaller, specialized collections but also reduces dependency on vast external training resources, enhancing its practical utility in cultural heritage studies.
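
The pipeline outlined in the abstract (n-gram extraction from metadata, a graph linking documents, categories, and n-grams, and similarity-based recommendation) can be illustrated with a minimal sketch. This is not the authors' implementation: the records in `DOCS` are invented toy data rather than items from the Crete collection, the stopword list and the function names are hypothetical, and personalized PageRank scores stand in for the learned graph embeddings, which the abstract does not specify.

```python
import re
from collections import defaultdict

# Toy metadata records, invented for illustration (not from the archive).
DOCS = {
    "d1": {"text": "Harbour tax register of Chania", "category": "finance"},
    "d2": {"text": "Chania harbour customs ledger", "category": "finance"},
    "d3": {"text": "Chania school enrolment list", "category": "education"},
    "d4": {"text": "Heraklion school teacher salaries", "category": "education"},
}
STOPWORDS = {"of", "the", "and", "in"}  # assumed, deliberately tiny


def ngrams(text, n_max=2):
    """Return the set of word n-grams (n = 1..n_max) after stopword removal."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    found = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            found.add(" ".join(tokens[i:i + n]))
    return found


def build_graph(docs):
    """Link each document node to its category node and its n-gram nodes."""
    graph = defaultdict(set)
    for doc_id, meta in docs.items():
        doc = ("doc", doc_id)
        neighbours = [("cat", meta["category"])]
        neighbours += [("ngram", g) for g in ngrams(meta["text"])]
        for node in neighbours:
            graph[doc].add(node)
            graph[node].add(doc)
    return graph


def personalized_pagerank(graph, source, alpha=0.15, iters=100):
    """Power iteration for random walks that restart at `source`."""
    scores = {node: 0.0 for node in graph}
    scores[source] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in graph}
        for node, mass in scores.items():
            share = (1 - alpha) * mass / len(graph[node])
            for nb in graph[node]:
                nxt[nb] += share
        nxt[source] += alpha  # restart probability returns to the source
        scores = nxt
    return scores


def recommend(graph, doc_id, k=2):
    """Rank the other documents by their proximity score to `doc_id`."""
    source = ("doc", doc_id)
    scores = personalized_pagerank(graph, source)
    others = [n for n in graph if n[0] == "doc" and n != source]
    others.sort(key=lambda n: scores[n], reverse=True)
    return [n[1] for n in others[:k]]


if __name__ == "__main__":
    print(recommend(build_graph(DOCS), "d1", k=2))
```

In this toy graph, `d2` ranks first for `d1` because the two records share a category and two unigrams, while `d3` is reachable only through the shared term "chania". A learned embedding method would replace `personalized_pagerank`; the graph construction and the ranking step would stay the same in spirit.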
Item Type: Article
Uncontrolled Keywords: Graph Representation Learning; Machine Learning; Natural Language Processing; Recommendation Systems
Divisions: Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.; Faculty of Engineering and Natural Sciences
Depositing User: Onur Varol
Date Deposited: 23 Mar 2026 11:03
Last Modified: 23 Mar 2026 11:03
URI: https://research.sabanciuniv.edu/id/eprint/53604
