Aydın, Kerem (2025) Domain Generalized Remote Sensing Scene Captionıng Via Country-Level Geographic Information. [Thesis]
10732315.pdf
Download (21MB)
Abstract
This thesis investigates the impact of incorporating country-level, text-based geographicalinformation into a large-scale vision-language model fine-tuned for captioningoptical remote sensing imagery. We hypothesize that enriching visual inputs withcorresponding geographical context can enhance model performance, particularly ingeneralizing to images from previously unseen countries or continents. To test this,we fine-tune the Large Language and Vision Assistant (LLaVA) on optical satelliteimages from European countries, augmenting them with textual geographicaldescriptions, and evaluate its performance on images from other global regions. Experimentsconducted across 175 countries using the newly released SkyScript datasetreveal that even lightweight geographical context—extracted from Wikipedia—canmitigate cross-country domain shifts, leading to notable improvements in captioningaccuracy. These findings highlight the potential of multimodal approaches inenhancing the geographic generalization capabilities of vision-language models.
| Item Type: | Thesis |
|---|---|
| Uncontrolled Keywords: | Scene Captioning, Domain Generalization, Remote Sensing and OpenVocabulary Classification. -- Sahne Altyazılama, Alan Genellemesi, Uzaktan Algılama veAçık Sözlük Sınıflandırılması. |
| Divisions: | Faculty of Engineering and Natural Sciences |
| Depositing User: | Dila Günay |
| Date Deposited: | 30 Dec 2025 15:54 |
| Last Modified: | 30 Dec 2025 15:54 |
| URI: | https://research.sabanciuniv.edu/id/eprint/53568 |


