Vision Transformer / pyml-niigata-20210220-vision-transformer

kasacchiful
February 20, 2021

Slides presented at the Python Machine Learning Study Group in Niigata on 2021/02/20.

Transcript

  1. Hiroshi Kasahara (@kasacchiful) • Software Developer
     • Community: JAWS-UG Niigata / Python ML in Niigata (New!!) / JaSST Niigata / ASTER / SWANII / etc.

  2. Transformer
     • A machine translation model built from Attention
     • Attention can be interpreted as a dictionary object (Query, Key, Value)
     • Given a Query, you obtain the location to look up (Key) and the value stored at that location (Value)
     • Since Key and Value are derived from prior knowledge, they correspond to a Memory
     • Self-Attention: captures the relationships between words within a sentence; Query, Key, and Value are all generated from the same words
     • Source-Target Attention: captures the correspondence between two sequences; the Query comes from the decoder side, while Key and Value come from the encoder side
     • Vision Transformer uses a modified version of the Transformer's Encoder
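
To make the Query/Key/Value picture in slide 2 concrete, here is a minimal NumPy sketch of scaled dot-product self-attention. It is not code from the talk: the toy dimensions and the random projection matrices W_q, W_k, W_v are placeholders chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, W_q, W_k, W_v):
    """Scaled dot-product self-attention over one sequence.

    Q, K, and V are all derived from the same tokens x, which is
    the "generated from the same words" case described in slide 2.
    """
    q = x @ W_q                          # Query: what each token is looking for
    k = x @ W_k                          # Key:   where to look
    v = x @ W_v                          # Value: what is stored at that location
    d_k = k.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)      # similarity of every query to every key
    weights = softmax(scores, axis=-1)   # each row is a distribution over positions
    return weights @ v                   # dictionary-style lookup: weighted sum of values

# Toy example: a "sentence" of 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, W_q, W_k, W_v).shape)  # (4, 8)
```

In Source-Target Attention the only change is that the Queries are computed from the decoder sequence while the Keys and Values come from the encoder sequence.
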
  3. How to use Vision Transformer well?
     • How do you prepare the large amount of data it needs?
     • If a model pre-trained on a large-scale dataset has been published, fine-tune that model for your task
     • Otherwise, prepare or generate the data yourself
     • Research on self-supervised learning is progressing, so delegating part of the labeling work to self-supervised learning is another option to consider
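
As a concrete illustration of the "fine-tune a published pre-trained model" route in slide 3, here is a minimal PyTorch sketch. It assumes the timm library and its ImageNet-pretrained vit_base_patch16_224 weights, which are not mentioned in the slides; the 10-class head, learning rate, and train_loader are placeholders.

```python
import timm
from torch import nn, optim

# Load a ViT pre-trained on a large-scale dataset and swap in a new
# classification head for a hypothetical 10-class downstream task.
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=10)

# Optionally freeze the pre-trained backbone and train only the new head at first.
for name, param in model.named_parameters():
    if "head" not in name:
        param.requires_grad = False

optimizer = optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_one_epoch(model, train_loader):
    """One pass over a loader yielding (image, label) batches of
    224x224 RGB tensors normalized as the pre-trained model expects."""
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

The references below also list the original google-research/vision_transformer code and TensorFlow/PyTorch implementations that can be fine-tuned in the same spirit.
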
  4. Transformer applications beyond image classification
     Examples of the Transformer applied to various tasks:
     • DETR: Transformer for object detection
     • Axial-Attention: Transformer for segmentation
     • Image Transformer: Transformer for image generation
     • VideoBERT: Transformer for video understanding
     • Set Transformer: Transformer for clustering

  5. Summary
     • "Vision Transformer", which applies the Transformer to image classification, has arrived
     • It delivers high accuracy while needing relatively little compute for training
     • However, it requires a very large dataset
     • To use Vision Transformer, either fine-tune a model pre-trained on a large-scale dataset or prepare the data yourself
     • If you prepare the data yourself, it is also worth considering labeling it with self-supervised learning
     • Transformers are being applied beyond vision as well, so this is a trend worth keeping an eye on

  6. References
     • An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
       https://arxiv.org/abs/2010.11929
     • google-research/vision_transformer
       https://github.com/google-research/vision_transformer
     • emla2805/vision-transformer: Tensorflow implementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)
       https://github.com/emla2805/vision-transformer
     • lucidrains/vit-pytorch: Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
       https://github.com/lucidrains/vit-pytorch
     • "The great revolution in image recognition: explaining 'Vision Transformer', the explosively hot topic in the AI world" - Qiita
       https://qiita.com/omiita/items/0049ade809c4817670d7
     • "Trying image recognition with a Transformer ~ Vision Transformer ~" | GMO Internet Next-Generation Systems Research Lab
       https://recruit.gmo.jp/engineer/jisedai/blog/vision_transformer/

  7. References
     • Attention Is All You Need
       https://arxiv.org/abs/1706.03762
     • End-to-End Object Detection with Transformers
       https://arxiv.org/abs/2005.12872
     • Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation
       https://arxiv.org/abs/2003.07853
     • Image Transformer
       https://arxiv.org/abs/1802.05751
     • VideoBERT: A Joint Model for Video and Language Representation Learning
       https://arxiv.org/abs/1904.01766
     • Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks
       https://arxiv.org/abs/1810.00825