Domain Generalized Remote Sensing Scene Captioning Via Country-Level Geographic Information

Aydın, Kerem (2025) Domain Generalized Remote Sensing Scene Captioning Via Country-Level Geographic Information. [Thesis]

Abstract

This thesis investigates the impact of incorporating country-level, text-based geographical information into a large-scale vision-language model fine-tuned for captioning optical remote sensing imagery. We hypothesize that enriching visual inputs with corresponding geographical context can enhance model performance, particularly in generalizing to images from previously unseen countries or continents. To test this, we fine-tune the Large Language and Vision Assistant (LLaVA) on optical satellite images from European countries, augmenting them with textual geographical descriptions, and evaluate its performance on images from other global regions. Experiments conducted across 175 countries using the newly released SkyScript dataset reveal that even lightweight geographical context, extracted from Wikipedia, can mitigate cross-country domain shifts, leading to notable improvements in captioning accuracy. These findings highlight the potential of multimodal approaches in enhancing the geographic generalization capabilities of vision-language models.

Item Type:	Thesis
Uncontrolled Keywords:	Scene Captioning, Domain Generalization, Remote Sensing, and Open-Vocabulary Classification. -- Sahne Altyazılama, Alan Genellemesi, Uzaktan Algılama ve Açık Sözlük Sınıflandırılması.
Divisions:	Faculty of Engineering and Natural Sciences
Depositing User:	Dila Günay
Date Deposited:	30 Dec 2025 15:54
Last Modified:	30 Dec 2025 15:54
URI:	https://research.sabanciuniv.edu/id/eprint/53568
