Enhancing cultural heritage archive analysis via automated entity extraction and graph-based representation learning

Özdemir, Anıl and Odacı, Berke and Baruh, Lorans and Varol, Onur and Balcısoy, Selim (2025) Enhancing cultural heritage archive analysis via automated entity extraction and graph-based representation learning. Journal on Computing and Cultural Heritage. ISSN 1556-4673 (Print) 1556-4711 (Online) Published Online First http://dx.doi.org/10.1145/3746658

Abstract

Recent efforts to digitize textual, visual, and physical forms of cultural heritage require advanced tools for preservation and analysis. The availability of extensive online data creates a need for intelligent systems to help users and archivists understand latent relationships in these collections. A major challenge in cultural heritage studies is the labor-intensive process of analyzing these materials. Inconsistent linguistic terms and ambiguous concepts in digital documents make it difficult to uncover relationships without expert supervision. Moreover, while advanced models based on large-scale pretraining demonstrate strong performance in extracting semantic relationships, they depend on extensive pretraining on large external datasets, limiting their applicability for smaller or specialized collections. We propose a system that combines natural language processing for entity extraction with graph representation learning to model relationships among documents, categories, and n-grams, resulting in a fully-connected network representation. Unlike methods requiring large-scale pretraining, our approach operates effectively using only the information available in the dataset itself, making it particularly suited for smaller cultural heritage document collections. The system extracts significant terms from document metadata, produces embeddings for each document, and uses these embeddings to build a recommendation system for entity discovery. We tested the system on a collection of early 20th-century documents from Crete, evaluating its performance against alternative methods in collaboration with experts from the archival research organization SALT. This approach not only facilitates deeper insights into smaller, specialized collections but also reduces dependency on vast external training resources, enhancing its practical utility in cultural heritage studies.
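
The pipeline outlined in the abstract (term extraction from metadata, a graph linking documents, categories, and n-grams, and embedding-based recommendation) can be illustrated with a minimal sketch. This is not the authors' implementation: the records in `DOCS`, the function names, and the choice of word uni- and bigrams are all hypothetical. A plain neighbor-overlap cosine similarity stands in for the learned graph embeddings, the graph here is a simple document-to-feature graph rather than the fully-connected representation the abstract mentions, and no filtering for "significant" terms is attempted.

```python
import math
from collections import defaultdict

# Hypothetical toy metadata records (illustrative only, not from the paper's dataset).
DOCS = {
    "doc1": {"category": "letters", "text": "harbor trade report from Chania"},
    "doc2": {"category": "letters", "text": "trade report on olive oil"},
    "doc3": {"category": "photographs", "text": "view of the harbor at Chania"},
}


def ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a metadata string."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def build_graph(docs):
    """Link each document node to its category node and its n-gram nodes."""
    graph = defaultdict(set)
    for doc_id, record in docs.items():
        doc_node = ("doc", doc_id)
        linked = [("category", record["category"])]
        linked += [("ngram", g) for g in ngrams(record["text"])]
        for node in linked:
            graph[doc_node].add(node)
            graph[node].add(doc_node)
    return graph


def similarity(graph, a, b):
    """Cosine similarity of two documents' neighbor sets.

    Assumption: this stands in for comparing learned embeddings.
    """
    na, nb = graph[("doc", a)], graph[("doc", b)]
    if not na or not nb:
        return 0.0
    return len(na & nb) / math.sqrt(len(na) * len(nb))


def recommend(graph, doc_id, k=2):
    """Return up to k (document, score) pairs most similar to doc_id."""
    others = [n[1] for n in graph if n[0] == "doc" and n[1] != doc_id]
    scored = [(other, similarity(graph, doc_id, other)) for other in others]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


GRAPH = build_graph(DOCS)
print(recommend(GRAPH, "doc1"))
```

In this toy setup, documents that share a category and several n-grams rank above documents that share only isolated terms. A learned embedding, as described in the abstract, would additionally capture indirect relationships that direct neighbor overlap misses.
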
Item Type: Article
Divisions: Faculty of Engineering and Natural Sciences
Depositing User: Selim Balcısoy
Date Deposited: 26 Sep 2025 15:51
Last Modified: 26 Sep 2025 15:51
URI: https://research.sabanciuniv.edu/id/eprint/52362
