References
P.10
[Dosovitskiy+, 2021] Dosovitskiy et al. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. ICLR 2021. https://openreview.net/forum?id=YicbFdNTTy
P.15
[Radford+, 2021] Radford et al. Learning Transferable Visual Models From Natural Language Supervision. ICML 2021.
[Rombach+, 2022] Rombach et al. High-Resolution Image Synthesis with Latent Diffusion Models. CVPR 2022.
P.16
[He+, 2022] He et al. Masked Autoencoders Are Scalable Vision Learners. CVPR 2022.
P.28
[Ba+, 2016] Ba et al. Layer Normalization. arXiv 2016.
P.29
[He+, 2016] He et al. Deep Residual Learning for Image Recognition. CVPR 2016.
P.36
[Tolstikhin+, 2021] Tolstikhin et al. MLP-Mixer: An All-MLP Architecture for Vision. NeurIPS 2021.