Slide 108
References (6/6)
[Kamikubo+, CHI'25] Kamikubo, R., Kayukawa, S., Kaniwa, Y., Wang, A., Kacorri, H., Takagi, H., & Asakawa, C. (2025, April). Beyond Omakase: Designing Shared Control for Navigation Robots with Blind People. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (pp. 1-17).
[Stanescu+, ISMAR'23] Stanescu, A., Mohr, P., Kozinski, M., Mori, S., Schmalstieg, D., & Kalkofen, D. (2023, October). State-Aware Configuration Detection for Augmented Reality Step-by-Step Tutorials. In 2023 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (pp. 157-166). IEEE.
[Yagi+, IJCV'25] Yagi, T., Ohashi, M., Huang, Y., Furuta, R., Adachi, S., Mitsuyama, T., & Sato, Y. (2025). FineBio: A Fine-grained Video Dataset of Biological Experiments with Hierarchical Annotation. International Journal of Computer Vision, 1-16.
[Nair+, CoRL'22] Nair, S., Rajeswaran, A., Kumar, V., Finn, C., & Gupta, A. (2022, August). R3M: A Universal Visual Representation for Robot Manipulation. In 6th Annual Conference on Robot Learning.
[Kareer+, ArXiv'24] Kareer, S., Patel, D., Punamiya, R., Mathur, P., Cheng, S., Wang, C., ... & Xu, D. (2024). EgoMimic: Scaling Imitation Learning via Egocentric Video. arXiv preprint arXiv:2410.24221.
[Shi+, ICRA'25] Shi, J., Zhao, Z., Wang, T., Pedroza, I., Luo, A., Wang, J., ... & Jayaraman, D. (2025). ZeroMimic: Distilling Robotic Manipulation Skills from Web Videos. In 2025 IEEE International Conference on Robotics and Automation (ICRA).
[Yang+, ArXiv'25] Yang, R., Yu, Q., Wu, Y., Yan, R., Li, B., Cheng, A. C., ... & Wang, X. (2025). EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos. arXiv preprint arXiv:2507.12440.
[Luo+, ArXiv'25] Luo, H., Feng, Y., Zhang, W., Zheng, S., Wang, Y., Yuan, H., ... & Lu, Z. (2025). Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos. arXiv preprint arXiv:2507.15597.
[Bahl+, CVPR'23] Bahl, S., Mendonca, R., Chen, L., Jain, U., & Pathak, D. (2023). Affordances from Human Videos as a Versatile Representation for Robotics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 13778-13790).
[Hoque+, ArXiv'25] Hoque, R., Huang, P., Yoon, D. J., Sivapurapu, M., & Zhang, J. (2025). EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video. arXiv preprint arXiv:2505.11709.
[Singh+, WACV'16] Singh, K. K., Fatahalian, K., & Efros, A. A. (2016, March). KrishnaCam: Using a Longitudinal, Single-Person, Egocentric Dataset for Scene Understanding Tasks. In 2016 IEEE Winter Conference on Applications of Computer Vision (WACV) (pp. 1-9). IEEE.
[Yang+, CVPR'25] Yang, J., Liu, S., Guo, H., Dong, Y., Zhang, X., Zhang, S., ... & Liu, Z. (2025). EgoLife: Towards Egocentric Life Assistant. In Proceedings of the Computer Vision and Pattern Recognition Conference (pp. 28885-28900).
[Chatterjee+, ICCV'25] Chatterjee, D., Remelli, E., Song, Y., Tekin, B., Mittal, A., Bhatnagar, B., ... & Sener, F. (2025). Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding. In IEEE/CVF International Conference on Computer Vision (ICCV).