A., Kacorri, H., Takagi, H., & Asakawa, C. (2025, April). Beyond Omakase: Designing Shared Control for Navigation Robots with Blind People. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (pp. 1-17).
[Stanescu+, ISMAR'23] Stanescu, A., Mohr, P., Kozinski, M., Mori, S., Schmalstieg, D., & Kalkofen, D. (2023, October). State-Aware Configuration Detection for Augmented Reality Step-by-Step Tutorials. In 2023 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (pp. 157-166). IEEE.
[Yagi+, IJCV'25] Yagi, T., Ohashi, M., Huang, Y., Furuta, R., Adachi, S., Mitsuyama, T., & Sato, Y. (2025). FineBio: A fine-grained video dataset of biological experiments with hierarchical annotation. International Journal of Computer Vision, 1-16.
[Nair+, CoRL'22] Nair, S., Rajeswaran, A., Kumar, V., Finn, C., & Gupta, A. (2022, August). R3M: A Universal Visual Representation for Robot Manipulation. In 6th Annual Conference on Robot Learning.
[Kareer+, ArXiv'24] Kareer, S., Patel, D., Punamiya, R., Mathur, P., Cheng, S., Wang, C., ... & Xu, D. (2024). EgoMimic: Scaling imitation learning via egocentric video. arXiv preprint arXiv:2410.24221.
[Shi+, ICRA'25] Shi, J., Zhao, Z., Wang, T., Pedroza, I., Luo, A., Wang, J., ... & Jayaraman, D. (2025). ZeroMimic: Distilling Robotic Manipulation Skills from Web Videos. ICRA.
[Yang+, ArXiv'25] Yang, R., Yu, Q., Wu, Y., Yan, R., Li, B., Cheng, A. C., ... & Wang, X. (2025). EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos. arXiv preprint arXiv:2507.12440.
[Luo+, ArXiv'25] Luo, H., Feng, Y., Zhang, W., Zheng, S., Wang, Y., Yuan, H., ... & Lu, Z. (2025). Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos. arXiv preprint arXiv:2507.15597.
[Bahl+, CVPR'23] Bahl, S., Mendonca, R., Chen, L., Jain, U., & Pathak, D. (2023). Affordances from human videos as a versatile representation for robotics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 13778-13790).
[Hoque+, ArXiv'25] Hoque, R., Huang, P., Yoon, D. J., Sivapurapu, M., & Zhang, J. (2025). EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video. arXiv preprint arXiv:2505.11709.
[Singh+, WACV'16] Singh, K. K., Fatahalian, K., & Efros, A. A. (2016, March). KrishnaCam: Using a longitudinal, single-person, egocentric dataset for scene understanding tasks. In 2016 IEEE Winter Conference on Applications of Computer Vision (WACV) (pp. 1-9). IEEE.
[Yang+, CVPR'25] Yang, J., Liu, S., Guo, H., Dong, Y., Zhang, X., Zhang, S., ... & Liu, Z. (2025). EgoLife: Towards egocentric life assistant. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 28885-28900).
[Chatterjee+, ICCV'25] Chatterjee, D., Remelli, E., Song, Y., Tekin, B., Mittal, A., Bhatnagar, B., ... & Sener, F. (2025). Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding. ICCV.