Özdemir, Anıl and Odacı, Berke and Tanatar Baruh, Lorans and Varol, Onur and Balcısoy, Selim (2025) Enhancing cultural heritage archive analysis via automated entity extraction and graph-based representation learning. Journal on Computing and Cultural Heritage, 18 (4). ISSN 1556-4673 (Print) 1556-4711 (Online)
This is the latest version of this item.
Official URL: https://dx.doi.org/10.1145/3746658
Abstract
Recent efforts to digitize textual, visual, and physical forms of cultural heritage require advanced tools for preservation and analysis. The availability of extensive online data creates a need for intelligent systems to help users and archivists understand latent relationships in these collections. A major challenge in cultural heritage studies is the labor-intensive process of analyzing these materials. Inconsistent linguistic terms and ambiguous concepts in digital documents make it difficult to uncover relationships without expert supervision. Moreover, while advanced models based on large-scale pretraining demonstrate strong performance in extracting semantic relationships, they depend on extensive pretraining on large external datasets, limiting their applicability for smaller or specialized collections. We propose a system that combines natural language processing for entity extraction with graph representation learning to model relationships among documents, categories, and n-grams, resulting in a fully connected network representation. Unlike methods requiring large-scale pretraining, our approach operates effectively using only the information available in the dataset itself, making it particularly suited for smaller cultural heritage document collections. The system extracts significant terms from document metadata, produces embeddings for each document, and uses these embeddings to build a recommendation system for entity discovery. We tested the system on a collection of early 20th-century documents from Crete, evaluating its performance against alternative methods in collaboration with experts from the archival research organization SALT. This approach not only facilitates deeper insights into smaller, specialized collections but also reduces dependency on vast external training resources, enhancing its practical utility in cultural heritage studies.
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | Graph Representation Learning; Machine Learning; Natural Language Processing; Recommendation Systems |
| Divisions: | Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.; Faculty of Engineering and Natural Sciences |
| Depositing User: | Onur Varol |
| Date Deposited: | 23 Mar 2026 11:03 |
| Last Modified: | 23 Mar 2026 11:03 |
| URI: | https://research.sabanciuniv.edu/id/eprint/53604 |
Available Versions of this Item
- Enhancing cultural heritage archive analysis via automated entity extraction and graph-based representation learning. (deposited 26 Sep 2025 15:51)
- Enhancing cultural heritage archive analysis via automated entity extraction and graph-based representation learning. (deposited 23 Mar 2026 11:03) [Currently Displayed]

