Image Recognition at Scale
  https://arxiv.org/abs/2010.11929
• google-research/vision_transformer
  https://github.com/google-research/vision_transformer
• emla2805/vision-transformer: Tensorflow implementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)
  https://github.com/emla2805/vision-transformer
• lucidrains/vit-pytorch: Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
  https://github.com/lucidrains/vit-pytorch
• A Revolution in Image Recognition: Explaining "Vision Transformer", Which Is Taking the AI World by Storm! - Qiita (in Japanese)
  https://qiita.com/omiita/items/0049ade809c4817670d7
• Trying Image Recognition with Transformer ~ Vision Transformer ~ | GMO Internet Next Generation System Laboratory (in Japanese)
  https://recruit.gmo.jp/engineer/jisedai/blog/vision_transformer/
• End-to-End Object Detection with Transformers
  https://arxiv.org/abs/2005.12872
• Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation
  https://arxiv.org/abs/2003.07853
• Image Transformer
  https://arxiv.org/abs/1802.05751
• VideoBERT: A Joint Model for Video and Language Representation Learning
  https://arxiv.org/abs/1904.01766
• Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks
  https://arxiv.org/abs/1810.00825