[Kaneda+, RA-L/IROS24], RREx-BoT [Sigurdsson+, IROS23], RelaX-Former [Yashima+, RA-L25]
Foundation models trained on large-scale image-text pairs: CLIP [Radford+, ICML21], Long-CLIP [Zhang+, ECCV24], SigLIP [Zhai+, ICCV23], BLIP-2 [Li+, ICML23], BEiT-3 [Wang+, CVPR23]
Graph-based representations of real-world environments: Embodied-RAG [Xie+, 24], MoMa-LLM [Honerkamp+, RA-L24], ConceptGraphs [Gu+, ICRA24], HOV-SG [Werby+, RSS24]
(Figure labels: RelaX-Former, Embodied-RAG)