Slide 24
● Soni, S., et al. (2025). EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
arXiv:2412.15190. https://arxiv.org/abs/2412.15190
● Dang, Y., et al. (2025). FUSE_RSVLM: Feature Fusion Vision-Language Model for Remote Sensing.
arXiv:2512.24022. https://arxiv.org/pdf/2512.24022
● Klemmer, K., et al. (2023). SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery.
arXiv:2311.17179. https://arxiv.org/pdf/2311.17179
● Radford, A., et al. (2021). Learning Transferable Visual Models From Natural Language Supervision.
Proceedings of the International Conference on Machine Learning (ICML).
arXiv:2103.00020. https://arxiv.org/pdf/2103.00020
References