Slide 38
References
[Wang+ ’24] T. Wang et al.: EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI, In CVPR, 2024.
[Lyu+ ’24] R. Lyu et al.: MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations, In NeurIPS, 2024.
[Liu+ ’24] Y. Liu et al.: Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI, arXiv preprint arXiv:2407.06886, 2024.
[Brazil+ ’23] G. Brazil et al.: Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild, In CVPR, 2023.
[Dai+ ’17] A. Dai et al.: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes, In CVPR, 2017.
[Cao+ ’22] A.-Q. Cao and R. de Charette: MonoScene: Monocular 3D Semantic Scene Completion, In CVPR, 2022.
[Wald+ ’19] J. Wald et al.: RIO: 3D Object Instance Re-Localization in Changing Indoor Environments, In ICCV, 2019.
[Ding+ ’23] R. Ding et al.: PLA: Language-Driven Open-Vocabulary 3D Scene Understanding, In CVPR, 2023.
[Chen+ ’20] D. Z. Chen et al.: ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language, In ECCV, 2020.
[Achlioptas+ ’20] P. Achlioptas et al.: ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes, In ECCV, 2020.
[Rukhovich+ ’22] D. Rukhovich et al.: FCAF3D: Fully Convolutional Anchor-Free 3D Object Detection, In ECCV, 2022.
[Wang+ ’23] W. Wang et al.: CogVLM: Visual Expert for Pretrained Language Models, arXiv preprint arXiv:2311.03079, 2023.
[Chen+ ’24] Z. Chen et al.: How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites, arXiv preprint arXiv:2404.16821, 2024.