Slide 79
References
[Grauman+, CVPR’22] Grauman, K., Westbury, A., Byrne, E., Chavis, Z., Furnari, A., Girdhar, R., ... & Malik, J. (2022). Ego4d: Around the world in 3,000 hours of egocentric video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 18995-19012).
[Zhang+, UIST’17] Zhang, X., Sugano, Y., & Bulling, A. (2017, October). Everyday eye contact detection using unsupervised gaze target discovery. In Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology (pp. 193-203).
[Cheng+, NeurIPS’23] Cheng, T., Shan, D., Hassen, A., Higgins, R., & Fouhey, D. (2023). Towards a richer 2D understanding of hands at scale. Advances in Neural Information Processing Systems, 36, 30453-30465.
[Yagi+, IUI’21] Yagi, T., Nishiyasu, T., Kawasaki, K., Matsuki, M., & Sato, Y. (2021, April). GO-Finder: A registration-free wearable system for assisting users in finding lost objects via hand-held object discovery. In 26th International Conference on Intelligent User Interfaces (pp. 139-149).
[Goyal+, CVPR’22] Goyal, M., Modi, S., Goyal, R., & Gupta, S. (2022). Human hands as probes for interactive object understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 3293-3303).
[Bansal+, ECCV’22] Bansal, S., Arora, C., & Jawahar, C. V. (2022, October). My view is the best view: Procedure learning from egocentric videos. In European Conference on Computer Vision (pp. 657-675). Cham: Springer Nature Switzerland.
[Yagi+, ArXiv’24] Yagi, T., Ohashi, M., Huang, Y., Furuta, R., Adachi, S., Mitsuyama, T., & Sato, Y. (2024). FineBio: A fine-grained video dataset of biological experiments with hierarchical annotation. arXiv preprint arXiv:2402.00293.
[Damen+, ECCV’18] Damen, D., Doughty, H., Farinella, G. M., Fidler, S., Furnari, A., Kazakos, E., ... & Wray, M. (2018). Scaling egocentric vision: The EPIC-KITCHENS dataset. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 720-736).
[Damen+, IJCV’22] Damen, D., Doughty, H., Farinella, G. M., Furnari, A., Kazakos, E., Ma, J., ... & Wray, M. (2022). Rescaling egocentric vision: Collection, pipeline and challenges for EPIC-KITCHENS-100. International Journal of Computer Vision, 1-23.
[Grauman+, CVPR’24] Grauman, K., Westbury, A., Torresani, L., Kitani, K., Malik, J., Afouras, T., ... & Wray, M. (2024). Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[Darkhalil+, NeurIPS’22] Darkhalil, A., Shan, D., Zhu, B., Ma, J., Kar, A., Higgins, R., ... & Damen, D. (2022). EPIC-KITCHENS VISOR benchmark: Video segmentations and object relations. Advances in Neural Information Processing Systems, 35, 13745-13758.
[Huh+, ICASSP’23] Huh, J., Chalk, J., Kazakos, E., Damen, D., & Zisserman, A. (2023, June). EPIC-SOUNDS: A large-scale dataset of actions that sound. In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1-5). IEEE.
[Dunnhofer+, IJCV’23] Dunnhofer, M., Furnari, A., Farinella, G. M., & Micheloni, C. (2023). Visual object tracking in first person vision. International Journal of Computer Vision, 131(1), 259-283.
[Zhao+, CVPR’23] Zhao, Y., Misra, I., Krähenbühl, P., & Girdhar, R. (2023). Learning video representations from large language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 6586-6597).
[Zhu+, ICCV’23] Zhu, C., Xiao, F., Alvarado, A., Babaei, Y., Hu, J., El-Mohri, H., ... & Yan, Z. (2023). EgoObjects: A large-scale egocentric dataset for fine-grained object understanding. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 20110-20120).