
The Frontline of Transformers: Beyond Convolutional Neural Networks

Slides used for a tutorial lecture at SSII 2022 on June 8, 2022.

The Transformer, proposed in 2017 for machine translation, is a neural network that discards conventional convolution and recurrence in favor of self-attention. Since around 2019 it has also spread rapidly through computer vision, establishing itself as a more flexible and more accurate network architecture. This tutorial surveys the frontline of the Transformer and related network architectures, centered on applications to computer vision.


Yoshitaka Ushiku

June 08, 2022


Transcript

  1. The Frontline of Transformers: Beyond Convolutional Neural Networks. Yoshitaka Ushiku (losnuevetoros), OMRON SINIC X

  2. The Frontline of Transformers: Beyond Convolutional Neural Networks. Yoshitaka Ushiku (OMRON SINIC X). Title slide for SSII 2022.

  3. About me: 2013.6–2013.8 Microsoft Research intern; 2014.3 Ph.D. (Information Science and Technology), The University of Tokyo; 2014.4–2016.3 researcher, NTT CS Labs; 2016.4–2018.9 lecturer, The University of Tokyo; 2016.9– collaborating researcher, AIST; 2016.12–2018.9 joint researcher, NINJAL; 2018.10– Principal Investigator, OMRON SINIC X Corp.; 2019.1– director and Chief Research Officer, Ridge-i Inc.; 2020.4– part-time lecturer, Tsuda University; 2021.7– part-time lecturer, Tohoku University. Member of ACM, IEEE, IEICE, IPSJ, RSJ, JSAI, and other societies. Work includes image captioning [Ushiku+, ACMMM 2012][Ushiku+, ICCV 2015] and cross-retrieval of video segments and captions [Yamaguchi+, ICCV 2017]. Example captions: "A guy is skiing with no shirt on and yellow snow pants." "A yellow train on the tracks near a train station."
  4. About this talk: an overview from the basics of the Transformer through its range of applications to the recent MLP-family networks. • What is a Transformer anyway? • The Transformer whirlwind • Is the Transformer obsolete?! • Transformer know-how. (The latest version of these slides is also posted online.)
  5. For those who have heard bits and pieces and are still confused: • A Transformer just repeats two modules over tokens (words, image patches, etc.), joined by residual connections: – a Token Mixer that mixes the tokens – an MLP that transforms each token. • The much-discussed MLP-family models actually share the same basic structure! • ...beyond that there is a lot of know-how, so stay alert. [Figure: tokens flow through "normalize → transform the vectors together → normalize → transform the vectors individually", with the dotted part repeated several times.]
  6. What is a Transformer anyway?

  7. The figure you always see in Transformer papers

  8. The figure you always see in Transformer papers. It's a famous figure and it looks easy to follow, but it's actually hard to understand. (Personal opinion.)

  9. What the Transformer is: a network that transforms an arbitrary number of vectors into an arbitrary number of vectors. (Ignore the shaded part for now.)

  10. What the Transformer is: a network that transforms an arbitrary number of vectors into an arbitrary number of vectors. (Still ignoring part of the figure.) Input: an arbitrary number of vectors. Output: an arbitrary number of vectors. Positional Encoding?
  11. Positional Encoder: a vector representing the position of each input and output vector. • For word embeddings: "which word in the sentence?" • For image features: "which coordinates in the image?" • For time-series features: "information from which time step?" RNNs and CNNs already know where each feature sits: • "Both RNNs and CNNs know the relative position of each element" [Shaw+, NAACL-HLT'18][Dai+, ACL'19] • "CNNs learn the absolute position of each element" [Islam+, ICLR'20]: in saliency estimation, feeding the whole image vs. a crop shows in each case that "the center of the image" is highly salient. In [Vaswani+, NIPS'17] the word position is encoded as a vector of sines and cosines (that is what the wave mark in the figure means).
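The sine/cosine encoding from [Vaswani+, NIPS'17] can be sketched as follows (a minimal pure-Python illustration; the sizes are made up for the example):

```python
import math

def positional_encoding(num_positions, d_model):
    """Sine/cosine positional encoding from the original Transformer:
    pe[pos][2i]   = sin(pos / 10000^(2i/d_model))
    pe[pos][2i+1] = cos(pos / 10000^(2i/d_model))
    """
    pe = [[0.0] * d_model for _ in range(num_positions)]
    for pos in range(num_positions):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

# One d_model-dimensional vector per position, added to the token vectors.
pe = positional_encoding(num_positions=50, d_model=16)
```

Each frequency pair lets the model read off both absolute position and, via trigonometric identities, relative offsets.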
  12. Autoregressive vs. non-autoregressive: how the output vectors are produced. • Autoregressive = predict them one at a time – each new output vector is produced from the input vectors and the vectors already output – repeat until a token meaning "end" is emitted – the original Transformer [Vaswani+, NIPS'17] is autoregressive – accurate but slow. • Non-autoregressive = predict them all at once – prepare, by some means, placeholder "seed" vectors for the outputs and feed them in [Gu+, ICLR'18][Guo+, AAAI'19] – fast because everything is computed in one parallel pass, but accuracy drops – the one-shot prediction can be iterated to adjust the number of words [Ghazvininejad+, EMNLP'19][Gu+, NeurIPS'19]. [Figure: m output vectors so far producing the (m+1)-th output; n "seed" vectors producing n outputs.]
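The autoregressive loop above can be sketched as follows (a toy illustration; `step_fn` and the counting "decoder" are hypothetical stand-ins for a real Transformer decoder attending over the encoder output):

```python
def greedy_autoregressive_decode(step_fn, eos_token, max_len=10):
    """Autoregressive decoding: emit one token at a time, feeding the
    tokens generated so far back in, until an end-of-sequence token.
    `step_fn(prefix) -> next_token` stands in for the decoder."""
    out = []
    for _ in range(max_len):
        tok = step_fn(out)
        if tok == eos_token:
            break
        out.append(tok)
    return out

# Toy "decoder": counts up and stops after three tokens.
toy = lambda prefix: len(prefix) if len(prefix) < 3 else "<eos>"
print(greedy_autoregressive_decode(toy, "<eos>"))  # [0, 1, 2]
```

A non-autoregressive model would instead call the network once on all the seed vectors in parallel.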
  13. Surely we're fine up to here. (Still ignoring that part.)

  14. Now for the part we were ignoring: this figure explains the Multi-Head Attention block here. This other part is • a residual connection • an MLP applied to each vector, which is equivalent to a 1x1 convolution.
  15. Now for the part we were ignoring: this figure explains the Multi-Head Attention block here, and this one explains the Scaled Dot-Product Attention inside it. This other part is • a residual connection • an MLP applied to each vector, which is equivalent to a 1x1 convolution.
  16. Multi-head Attention. Recap: what we handle is an arbitrary number of vectors. • In natural language processing: the sequence of word embeddings, plus position information. • In image recognition: a height x width grid of channel-dimensional vectors (local features), plus their vertical/horizontal positions. [Figure: the token sequence "Ougai, you look like the cat that ate the canary."]
  17. Multi-head Attention (the same recap slide, shown again). [Figure: the token sequence "Ougai, you look like the cat that ate the canary."]
  18. Focus on one vector x. 1. Compute the query q = W_Q x used to "search" the other vectors.
  19. Focus on one vector x. 2. Compute the key k = W_K x of every vector, the targets of the "similarity search".
  20. Focus on one vector x. 3. Compute the "similarities" a = softmax(q^T k / sqrt(d)): the dot products of the query with the keys, normalized by the dimension d and a softmax.
  21. Focus on one vector x. 4. Compute the value v = W_V x of every vector.
  22. Focus on one vector x. 5. Compute the similarity-weighted sum Σ a v and add it to the vector x (there is a residual connection), giving the updated vector.
  23. Scaled Dot-Product Attention: running steps 1-5 for every vector. This is what the figure of that name depicts.
  24. Multi-head Attention: prepare h sets of matrices W_Q, W_K, W_V and run steps 1-5 for every vector with each set. This is what the figure of that name depicts.
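Steps 1-5 and the multi-head variant can be sketched in NumPy as follows (a minimal illustration: the residual addition and the output projection W_O of the real layer are omitted, and all shapes and weights are made up for the example):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(X, Wq, Wk, Wv):
    """Steps 1-5 above, for all n vectors at once (X is n x d_in)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv             # steps 1, 2, 4: q, k, v
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # step 3: similarities a
    return A @ V                                  # step 5: weighted sum

def multi_head_attention(X, heads):
    """heads = list of (Wq, Wk, Wv); head outputs are concatenated."""
    return np.concatenate(
        [scaled_dot_product_attention(X, *h) for h in heads], axis=-1)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                       # 5 tokens, 8 dims
heads = [tuple(rng.normal(size=(8, 4)) for _ in range(3)) for _ in range(2)]
out = multi_head_attention(X, heads)              # (5, 8): 2 heads x 4 dims
```

Each row of A sums to 1, so every output vector is a convex combination of the value vectors.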
  25. Transformer, summarized: a technique for transforming an arbitrary number of vectors in this form. Encoder-decoder, repeated N times. For example, GPT-3 • repeats the block 96 times • with 96-head attention (128 dimensions each) per block → 175B parameters. (PaLM, released by Google in April 2022, has 540B [Chowdhery+, 2022].)
  26. How it differs from CNNs and RNNs (without attention). The Transformer computes attention from all n vectors to all n vectors: O(n^2), but information propagates globally in one step without loss. A CNN slides a convolution over only a few vectors at a time (say 3 out of all of them): O(n), but information only travels between neighbors. An RNN scans the vectors one by one while keeping state (memory) in its internal cell: O(n), but it struggles with long sequences. [Figure: schematic comparison of Transformer, CNN, and RNN connectivity.]
  27. The Transformer whirlwind

  28. Beyond computer vision. • Natural language processing – translation, the original task [Vaswani+, NIPS'17], and countless others!!! – language models: Google's BERT [Devlin+, NAACL-HLT'19] and PaLM [Chowdhery+, 2022], DeepMind's Gopher [Rae+, 2022], OpenAI's GPT-2/3 [Radford+, 2019][Brown+, NeurIPS'20], among many others – retrieval over a database of 2 trillion tokens [Borgeaud+, 2021]. • Audio and signal processing – representation learning: HuBERT [Hsu+, TASLP'21], SSAST [Gong+, AAAI'22] – speech recognition [Lüscher+, INTERSPEECH'19] – music generation [Huang+, ICLR'19][Choi+, ICML'20] – time-series forecasting [Li+, NeurIPS'19][Wu+, NeurIPS'20]. • Tabular data – FT-Transformer [Gorishniy+, NeurIPS'22] (though for tables, gradient boosting is still strong). • Bio/chem-informatics – molecular structure analysis [Fuchs+, NeurIPS'20][Rong+, NeurIPS'20]. • Agents and robotics – multi-agent communication [Inala+, NeurIPS'20] – one-shot imitation learning [Dasari+Gupta, CoRL'20] – reinforcement learning over task sequences: Scene Memory Transformer [Fang+, CVPR'19], Decision Transformer [Chen+, NeurIPS'21], Trajectory Transformer [Janner+, NeurIPS'21], Gato [Reed+, 2022].
  29. Vision & Language. Representation learning → foundation models: VideoBERT [Sun+, ICCV'19], LXMERT [Tan+Bansal, EMNLP'19], ViLBERT [Lu+, NeurIPS'19], VL-BERT [Su+, ICLR'20], UNITER [Chen+, ECCV'20], OSCAR [Li+, ECCV'20], Voken [Tan+Bansal, EMNLP'20], COOT [Ging+, NeurIPS'20], Perceiver [Jaegle+, ICML'21], PolyViT [Likhosherstov+, 2021], Flamingo [Alayrac+, 2022]. Caption generation [Zhou+, CVPR'18][Li+, ICCV'19][Cornia+, CVPR'20]. TextVQA [Kant+, ECCV'20]. Video retrieval [Gabeur+, ECCV'20]. Referring expression comprehension: MDETR [Kamath+, ICCV'21]. Example caption: "He starts his motorbike. Then walks away."
  30. Pose estimation. Sign language recognition [Saunders+, ECCV'20]; hand pose [Huang+, ECCV'20].

  31. Segmentation. Panoptic segmentation [Kirillov+, ECCV'20]; works for both semantic segmentation and instance segmentation [Zhang+, ECCV'20].
  32. Note: there are a lot of "xxxx Segmentation" tasks??? [Kirillov+, CVPR 2019]

  33. Video understanding. Action recognition [Girdhar+, ECCV'20]; trajectory prediction [Yu+, ECCV'20].

  34. Other examples. Sketch-based retrieval [Ribeiro+, CVPR'20]; instance-level retrieval and fine-grained recognition [Kim+, ECCV'20]; image generation models: super-resolution and inpainting [Parmar+, ICML'18], super-resolution [Yang+, CVPR'20].
  35. Object detection (DETR). Object detection (left) and panoptic segmentation (right) [Carion+, ECCV'20]. Transformer-based detection is also proposed in [Chi+, NeurIPS'20].

  36. Vision Transformer [Dosovitskiy+, ICLR'21]. • Training on images with a Transformer alone gives – roughly the accuracy of a 152-layer ResNet at about 25% of the training time – but requires the huge JFT-300M dataset – DeiT, which adds knowledge distillation and beats EfficientNet when trained only on the 1000-class ImageNet data, is also well known [Touvron, ICML'21]. • It is an encoder-only Transformer – actually simpler in structure, and easier to understand, than the original Transformer.
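The patch-to-token step ViT starts from can be sketched as follows (a minimal NumPy illustration of the splitting only; the linear projection, class token, and position embeddings that ViT adds afterwards are omitted):

```python
import numpy as np

def image_to_patch_tokens(img, patch):
    """Split an H x W x C image into non-overlapping patch x patch
    squares and flatten each into one vector ("token"), as in ViT.
    Returns an array of shape (num_patches, patch*patch*C)."""
    H, W, C = img.shape
    assert H % patch == 0 and W % patch == 0
    tokens = []
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            tokens.append(img[i:i+patch, j:j+patch, :].reshape(-1))
    return np.stack(tokens)

img = np.arange(32 * 32 * 3, dtype=float).reshape(32, 32, 3)
tokens = image_to_patch_tokens(img, patch=16)  # 4 tokens of dim 768
```

From here on, the encoder treats the image exactly like a sentence of tokens.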
  37. MobileViT [Mehta+Rastegari, ICLR'22]. • In the central block: – unfold each patch horizontally (Unfold) – apply a Transformer to each vertically sliced set of vectors • in other words, apply the Transformer P times, each time over a set of N d-dimensional vectors (P = wh for an h x w patch) – then fold each patch back into a square (Fold). MV2: MobileNet v2 block; ↓2: downsampling.
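The Unfold/Fold rearrangement can be sketched as follows (a minimal NumPy illustration of the reshaping only, assuming the patch size divides the feature map; no Transformer is applied in between):

```python
import numpy as np

def unfold(x, h, w):
    """MobileViT-style Unfold: split an H x W x d map into h x w patches,
    then group together the vectors sitting at the same position *within*
    each patch. Result: h*w groups of (H/h)*(W/w) d-dim vectors each."""
    H, W, d = x.shape
    t = x.reshape(H // h, h, W // w, w, d).transpose(1, 3, 0, 2, 4)
    return t.reshape(h * w, (H // h) * (W // w), d)

def fold(t, H, W, h, w):
    """Inverse of unfold: put every vector back in its spatial place."""
    d = t.shape[-1]
    t = t.reshape(h, w, H // h, W // w, d).transpose(2, 0, 3, 1, 4)
    return t.reshape(H, W, d)

x = np.arange(8 * 8 * 2, dtype=float).reshape(8, 8, 2)
groups = unfold(x, 2, 2)  # P = 4 groups; attention would run per group
assert np.array_equal(fold(groups, 8, 8, 2, 2), x)
```

Running attention only within each group is what keeps the cost mobile-friendly.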
  38. MobileViT. • Trained only on the 1000-class ImageNet data [Mehta+Rastegari, ICLR'22]: higher accuracy than other mobile networks; fewer parameters and higher accuracy than other ViT-family networks.

  39. Doing object detection or segmentation with Vision Transformers [Dosovitskiy+, ICLR'21]. • With only coarse patches like ViT's, the accuracy of fine bounding-box localization and segmentation drops. • But making many fine patches blows up the attention computation (quadratic in the number of patches).
  40. Swin Transformer [Liu+, ICCV'21]. Best paper! • Brings in the pyramid structure familiar from CNNs – attention is restricted to local windows to cut computation – these layers are interleaved with layers whose window grid is shifted slightly → the receptive field grows beyond a single local window – evaluated on object detection (for segmentation see Swin-Unet [Cao+, 2021]).
  41. Comparing the structure of Vision Transformers and CNNs [Raghu+, NeurIPS'21]. • Internal representations – ViT has more uniform representations, with higher similarity between lower and upper layers – the two models differ markedly. • Use of local vs. global information – ViT takes in more global information than ResNet even in its lower layers – but taking in local information in the lower layers still matters, which is why large-scale pretraining data is needed. • Network structure – ViT's skip connections are even more influential than in ResNets, strongly affecting performance and representation similarity. • Position information – ViT preserves spatial position information better.
  42. Is the Transformer obsolete?!

  43. MLP-Mixer [Tolstikhin+, NeurIPS’21]

  44. gMLP [Liu+, NeurIPS’21]

  45. Image classification accuracy with gMLP: gMLP is more accurate, or uses fewer parameters, than almost every CNN classifier, than the Transformer classifiers, and than the other MLP-family models.
  46. दলथऎॊ૎୳ऋ Transformerढथड़ড॥থदमآء

  47. To state the conclusion first: Transformer, MLP-Mixer, and gMLP are not doing essentially different things. It simply turned out that attention was not all you need.

  48. Transformer [Vaswani+, NIPS’17]

  49. Vision Transformer [Dosovitskiy+, ICLR’21]

  50. In short: [Figure: normalize → transform the vectors together → normalize → transform the vectors individually, with the dotted part repeated several times.]
  51. MLP-Mixer [Tolstikhin+, NeurIPS’21]

  52. In short: [Figure: normalize → transform the vectors together → normalize → transform the vectors individually, with the dotted part repeated several times.]
  53. gMLP [Liu+, NeurIPS’21]

  54. In short: [Figure: the same normalize / transform-the-vectors-together / transform-the-vectors-individually skeleton, with the dotted part repeated several times.]
  55. What stayed the same vs. what changed. • The same in all of them: – a module that transforms a set of vectors is applied repeatedly – each transformation either transforms the vectors together or transforms each vector individually, one of exactly two kinds – the per-vector transformation is an MLP – normalization (Layer Normalization) to prevent vanishing or exploding gradients – skip connections, introduced for the same purpose. • What changed from the Transformer: – the "transform the vectors together" step went from attention to a matrix product – position information moved from being carried by the vectors themselves (Transformer) to being held by the network per vector index (MLP family).
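The shared skeleton, with the "transform the vectors together" step as the swappable part, can be sketched as follows (a minimal illustration; the fixed mixing matrix and the pooling-style mixer are made-up stand-ins, loosely in the spirit of MLP-Mixer and pooling-based token mixers):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def block(x, token_mixer, channel_mlp):
    """The shared skeleton: normalize -> transform the vectors *together*
    (token mixer) -> residual; then normalize -> transform each vector
    *individually* (MLP) -> residual."""
    x = x + token_mixer(layer_norm(x))
    x = x + channel_mlp(layer_norm(x))
    return x

n, d = 6, 4
rng = np.random.default_rng(0)
W_mix = rng.normal(size=(n, n))                 # mixes across the n tokens
W1 = rng.normal(size=(d, d))
mixer_matmul = lambda x: W_mix @ x              # matrix-product mixer
mixer_pool = lambda x: np.tile(x.mean(axis=0), (n, 1)) - x  # pooling mixer
mlp = lambda x: np.maximum(x @ W1, 0.0)         # per-vector MLP

x = rng.normal(size=(n, d))
y1 = block(x, mixer_matmul, mlp)
y2 = block(x, mixer_pool, mlp)
```

Swapping in attention as `token_mixer` recovers the Transformer block; everything else stays identical.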
  56. What stayed the same vs. what changed (repeated), plus a report from the gMLP paper: "the network is almost all MLPs, but a variant with just a little attention added (aMLP) performs even better."
  57. MetaFormer [Yu+, CVPR'22]. And just as we had been saying since around 2021... • The difference between MLP-like models and Transformers is only the Token Mixer part → exactly the "transform the vectors together" step. • Wouldn't simple pooling do for the mixing? • Trained on the 1000-class ImageNet data, it is more accurate than the ViT-family and MLP-family methods.
  58. HyperMixer [Mai+, 2022]. And just as we had been saying since around 2021... • The difference between MLP-Mixer and the Transformer is only the token-mixing part → exactly the "transform the vectors together" step. • Unlike MLP-Mixer, they build HyperMixer, whose token mixing is position-invariant. • It achieves good accuracy across NLP tasks.
  59. To state the conclusion once more: Transformer, MLP-Mixer, and gMLP are not doing essentially different things. It simply turned out that attention was not all you need.

  60. Transformer know-how

  61. Brave attempts at improving the Transformer's basic performance: • ELU [Clevert+, ICLR 2016] • GeLU [Hendrycks+Gimpel, 2016] • Swish [Ramachandran+, ICLR WS 2018] • SELU [Klambauer+, NIPS 2017] • GLU [Dauphin+, ICML 2017] • RMS [Zhang+Sennrich, NeurIPS 2019] • ReZero [Bachlechner+, 2020] • Fixup [Zhang+, ICLR 2019] • Adaptive Softmax [Joulin+, ICML 2017] • Mixture of Softmaxes [Yang+, ICLR 2018]
  62. ...and the list keeps growing endlessly in recent years?! • Transparent Attention [Bapna+, EMNLP 2018] • Evolved Transformer [So+, ICML 2019] • Synthesizer variants [Tay+, 2020] • Funnel Transformer [Dai+, NeurIPS 2020] • Lightweight and Dynamic convolution [Wu+, ICLR 2019] • Mixture of Experts Transformer [Shazeer+, NeurIPS 2018][Lepikhin+, ICLR 2021] • Switch Transformer [Fedus+, 2021] • Product Key Memory [Lample+, NeurIPS 2019] • Universal Transformer [Dehghani+, ICLR 2019]
  63. About the source of this critique: • 16 people inside Google Research • evaluated more than 30 Transformer improvements • in more than 50 variations • from 9 perspectives, and the upshot is that most of the methods were not much better than the original Transformer.
  64. Classifying the improvements: 1. activation functions 2. normalization 3. depth 4. embeddings 5. parameter sharing 6. softmax refinements 7. overall architecture. (Note: not all of these methods were originally proposed as Transformer improvements.)
  65. 1. A history of activation-function improvements. • ELU [Clevert+, ICLR 2016] – only the negative part is exponential, asymptoting to -1; basically a smooth shape. Over 3000 citations! • GeLU [Hendrycks+Gimpel, 2016] – shaped like ReLU but with a smooth derivative. Used in both BERT and GPT-3, yet rejected from ICLR. Poor thing. • Swish [Ramachandran+, ICLR WS 2018] – shaped like ReLU but with a smooth derivative. Wait, that is the same pitch as GeLU. • SELU [Klambauer+, NIPS 2017] – part of the paper proposing Self-Normalizing Neural Networks; ELU scaled by constants on both the positive and negative parts. SELU itself was never evaluated in isolation; the reviewers pointed this out, so how did it get accepted? • GLU [Dauphin+, ICML 2017] – an activation borrowed from the LSTM gate: the elementwise product of a gate in [0, 1] (a linear map passed through a sigmoid activation) and a separately computed linear map.
  66. 1. A history of activation-function improvements (the same list, now with plots). In short: ELU and SELU dip below zero on the negative side; GeLU and Swish are basically ReLU made smooth.
  67. 1. A history of activation-function improvements (the same list once more). And frankly... accuracy barely improves.
  68. First, the experimental setup. • Transfer-learning tasks: Text-to-Text Transfer Transformer (T5) – pretraining for multiple text-in, text-out tasks – uses the Colossal Clean Crawled Corpus (C4), 6.1TB of text filtered down to about 750GB – this paper adopts three downstream tasks: • composite QA/reasoning tasks: SuperGLUE [Wang+, NeurIPS 2019] • summarization: XSum [Narayan+, EMNLP 2018] • question answering: WebQuestions [Berant+, EMNLP 2013]. • Machine translation task: the WMT'14 English-German translation task.
  69. First, the experimental setup: a disclaimer. The experiments in this paper are NLP tasks.

  70. Experimental results for activation functions. • Models are adjusted so that parameter counts and compute match. – Final loss = language-model performance; only here is lower better – SGLUE = composite tasks – XSum = summarization – WebQ = question answering – WMT EnDe = machine translation. • Result: no method improves performance across the board. – (The paper marks improvements in bold, but the bolding is frequently wrong, so beware.)
  71. Some activation functions did improve accuracy: the GLU extensions [Shazeer, 2020]. – GLU is the elementwise product of a linear map and a gate (linear map + activation) – the original GLU used a sigmoid as the gate activation. • The following variants were tried: – GeGLU: the activation is GeLU (the smooth ReLU-like one) – ReGLU: the activation is ReLU – SwiGLU: the activation is Swish (the other smooth ReLU-like one) • also used in PaLM, the giant 540-billion-parameter language model [Chowdhery+, 2022] – LiGLU: no activation (a bilinear form) – wait, why not also try combining with ELU or SELU...?
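The GLU family can be sketched as follows (a minimal NumPy illustration; bias terms are omitted, and the tanh form of GeLU is an approximation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def glu_variant(x, W, V, activation):
    """GLU family: elementwise product of a gate, activation(xW),
    with a separately computed linear map xV."""
    return activation(x @ W) * (x @ V)

relu = lambda z: np.maximum(z, 0.0)
swish = lambda z: z * sigmoid(z)  # SiLU, i.e. Swish with beta = 1
gelu = lambda z: 0.5 * z * (1.0 + np.tanh(
    np.sqrt(2.0 / np.pi) * (z + 0.044715 * z ** 3)))  # tanh approximation

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8))
W, V = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))

out_glu    = glu_variant(x, W, V, sigmoid)       # original GLU
out_reglu  = glu_variant(x, W, V, relu)          # ReGLU
out_geglu  = glu_variant(x, W, V, gelu)          # GeGLU
out_swiglu = glu_variant(x, W, V, swish)         # SwiGLU
out_liglu  = glu_variant(x, W, V, lambda z: z)   # LiGLU (bilinear)
```

Only the gate's activation changes between variants; the gated structure is shared.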
  72. Evaluating the GLU variants. • The evaluation protocol is as before. – Final loss = language-model performance; only here is lower better – SGLUE = composite tasks – XSum = summarization – WebQ = question answering – WMT EnDe = machine translation. • GLU and LiGLU occasionally score a bit lower, but... the other variants are consistently effective.
  73. 2. A history of normalization improvements. • Vanilla Transformer: LayerNorm [NIPS DLS 2016] – normalize by the per-vector mean and variance of the elements. • RMS [Zhang+Sennrich, NeurIPS 2019] – LayerNorm is slow, so normalize without subtracting the mean. • ReZero [Bachlechner+, 2020] – the title opens with "ReZero is All You Need"... here we go, another "X is All You Need" paper. – multiplies the transformation part of the original Transformer (left) by a learnable parameter α initialized to zero (right). • Fixup [Zhang+, ICLR 2019] – with no normalization at all, just setting the residual blocks' initial values to 0 or 1 achieves accuracy close to BatchNorm or LayerNorm.
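The LayerNorm vs. RMSNorm difference can be sketched as follows (a minimal illustration without the learnable gain and bias):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """LayerNorm: normalize each vector by its mean and variance."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def rms_norm(x, eps=1e-6):
    """RMSNorm [Zhang+Sennrich, NeurIPS 2019]: skip the mean
    subtraction and divide by the root mean square only."""
    rms = np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)
    return x / rms

x = np.array([[1.0, 2.0, 3.0, 4.0]])
ln, rn = layer_norm(x), rms_norm(x)
```

Dropping the mean-centering is what makes RMSNorm cheaper per step.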
  74. Experimental results for normalization. • The contenders: – RMS Norm: "I'll make LayerNorm's computation faster." – ReZero: "I'll multiply the residual block's transformation by α (initialized to 0)." – Fixup: "Choose the initialization properly and you don't need normalization." Which of these was actually effective?
  75. Experimental results for normalization: the verdict. • Only RMS Norm is effective. – In fact, ReZero is in serious trouble.
  76. 3. Examining depth. • The Feed Forward part in the figure on the right: – linear map #1 → ReLU → linear map #2 – we want to probe the parameter-count trade-off between • the total number of layers (N in the figure) • the dimension of the middle part, d_ff • the number of attention heads, H. • Findings: – vanilla is 12 layers, d_ff = 3072, H = 12 – deeper models look slightly more accurate, but the steps computed per second drop.
  77. 4. Examining embedding methods. • The input and output embeddings are vocabulary x embedding-dimension parameter matrices. – In NLP this has a big impact on the parameter count. • Matrix factorization (from ALBERT [Lan+, ICLR 2020]) – vocabulary x embedding → (vocabulary x inner dimension) times (inner dimension x embedding). • Embeddings at the encoder input/output [Chung+, ICLR 2021] – shared (tied) or not (untied). • Frequency-dependent embedding sizes [Baevski+Auli, ICLR 2019] – rare words get lower-dimensional embeddings. • Result: sharing the decoder's input and output embeddings (untied from the encoder) works well.
  78. 5. Examining parameter sharing. • From ALBERT [Lan+, ICLR 2020]: – share all parameters across layers – also try it combined with the embedding factorization above – share only in the encoder / only in the decoder. • Result: mostly no good. – Caveat: ALBERT also uses a sentence-order loss, which this paper does not include, so this is not a comparison with ALBERT itself.
  79. 6. Softmax. • Adaptive Softmax [Joulin+, ICML 2017] – cluster the vocabulary by word frequency → hierarchical classification for speed – rare words are additionally projected down for a lighter, faster model. • Mixture of Softmaxes [Yang+, ICLR 2018] – compute the softmax K times and form the posterior as a weighted sum. • Results: – Mixture of Softmaxes improves performance on some tasks, but computation is 40% slower.
  80. 7. A history of whole-architecture improvements. • Transparent Attention [Bapna+, EMNLP 2018] • Evolved Transformer [So+, ICML 2019] • Synthesizer variants [Tay+, 2020] • Funnel Transformer [Dai+, NeurIPS 2020] • Lightweight and Dynamic convolution [Wu+, ICLR 2019] • Mixture of Experts Transformer [Shazeer+, NeurIPS 2018][Lepikhin+, ICLR 2021] • Switch Transformer [Fedus+, 2021] • Product Key Memory [Lample+, NeurIPS 2019] • Universal Transformer [Dehghani+, ICLR 2019]
  81. Experimental results. At this point there are so many that it's hard to make sense of them.

  82. 7. A history of whole-architecture improvements (the same list as before). • Transparent Attention [Bapna+, EMNLP 2018] • Evolved Transformer [So+, ICML 2019] • Synthesizer variants [Tay+, 2020] • Funnel Transformer [Dai+, NeurIPS 2020] • Lightweight and Dynamic convolution [Wu+, ICLR 2019] • Mixture of Experts Transformer [Shazeer+, NeurIPS 2018][Lepikhin+, ICLR 2021] • Switch Transformer [Fedus+, 2021] • Product Key Memory [Lample+, NeurIPS 2019] • Universal Transformer [Dehghani+, ICLR 2019]
  83. Which got more accurate / less accurate: • Transparent Attention [Bapna+, EMNLP 2018] • Evolved Transformer [So+, ICML 2019] • Synthesizer variants [Tay+, 2020] • Funnel Transformer [Dai+, NeurIPS 2020] • Lightweight and Dynamic convolution [Wu+, ICLR 2019] • Mixture of Experts Transformer [Shazeer+, NeurIPS 2018][Lepikhin+, ICLR 2021] • Switch Transformer [Fedus+, 2021] • Product Key Memory [Lample+, NeurIPS 2019] • Universal Transformer [Dehghani+, ICLR 2019]
  84. Product Key Memory and the Synthesizer variants. • Synthesizer [Tay+, 2020] – replaces the attention matrix QK^T by • a linear map + ReLU + linear map of the input X (Dense) • or just random numbers (Random). – Performer [Choromanski+, ICLR'21] instead approximates attention with kernels of q and k. • Product Key Memory [Lample+, NeurIPS 2019] – PKM learns a large set of keys separately while doing something multi-head-attention-like – replacing the FFN of some layers with PKM gives better accuracy and speed with a smaller network – as an aside, among the papers above only this one and [Wu+, ICLR 2019] are from Facebook (all the others are Google-affiliated).
  85. Switch Transformer and the Mixture of Experts Transformer [Shazeer+, NeurIPS 2018][Lepikhin+, ICLR 2021][Fedus+, 2021]. • Switch Transformer: 1.6 trillion parameters – which sounds enormous, but there are simply multiple FFNs (a Mixture of Experts) – Switch Transformer selects just one of these FFNs at a time – so not all parameters are used in every training or inference step. • An aside: – this whole line of papers is from Google – in particular Noam Shazeer is the second author of the original Transformer paper and appears as an author on these too.
  86. Summary. • Most recent Transformer improvements are not much better than the original Transformer. – They don't generalize across multiple tasks, and the source code barely changes. – Words of wisdom for when you devise a new improvement: "build on multiple implementations", "evaluate on multiple tasks, including CV", "match the hyperparameters", "report mean and variance, not the best value". The authors: "One possible explanation for this is that the originally-proposed Transformer architecture was near-perfect, and there wasn't much that could be done to improve it."
  87. Still: methods whose effect was confirmed. • Some tricks are in fact already inside "vanilla" Transformers: – LayerNorm after → LayerNorm first [Baevski+Auli 2019][Xiong+, 2020] – absolute position embeddings → relative position embeddings [Raffel+, 2019]. • Activation: GLU combined with GeLU/Swish. • Normalization: RMS Norm. • Sharing the decoder's input/output embeddings. • Architectural tricks: – Mixture of Experts Transformer – Switch Transformer – Product Key Memory – Synthesizer variants.
  88. Other Transformer know-how

  89. The "LayerNorm after or before?" problem. • Post-LN: higher performance, but training is unstable → the problem is gradients vanishing partway down. • Pre-LN: training is stable, but performance is lower → the problem is that input and output barely differ. • Bottom-to-Top (B2T) connection [Takase+, 2022] – combines Post-LN's transformation power with training stability better than Pre-LN's. • DeepNorm [Wang+, 2022] – Post-LN based; the residual connection, normally added as-is (x1), is scaled up by a constant – (part of) the parameter initialization is divided by a constant to shrink it – made even 1000-layer Transformers trainable.
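The Post-LN vs. Pre-LN placement can be sketched as follows (a minimal illustration; `f` is a made-up stand-in for the attention or FFN sublayer, and the norm has no learnable parameters):

```python
import numpy as np

def norm(x):
    mu = x.mean(axis=-1, keepdims=True)
    sd = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sd + 1e-6)

def post_ln_block(x, sublayer):
    """Post-LN (original Transformer): normalize AFTER the residual
    add. Powerful, but gradients must pass through every norm."""
    return norm(x + sublayer(x))

def pre_ln_block(x, sublayer):
    """Pre-LN: normalize BEFORE the sublayer. The residual path is a
    clean identity, so training is stable, but each block changes x
    less."""
    return x + sublayer(norm(x))

f = lambda x: np.maximum(x, 0.0)  # stand-in sublayer
x = np.random.default_rng(0).normal(size=(4, 8))
y_post, y_pre = post_ln_block(x, f), pre_ln_block(x, f)
```

B2T and DeepNorm can be read as attempts to keep Post-LN's expressiveness while restoring a stable gradient path.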
  90. Improving the Positional Encoder, and warmup. Positional Encoder: • originally an absolute-position embedding • relative-position embeddings [Shaw+, NAACL'18][Raffel+, 2019] • shift invariance added to absolute-position embeddings [Kiyono+, EMNLP'21] • RoPE (Rotary Position Embedding) [Su+, 2021]. Learning-rate warmup: • at first the rate grows linearly in step_num • once step_num reaches warmup_steps, it decays toward zero in inverse proportion to the square root of step_num.
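The warmup schedule can be sketched with the formula from [Vaswani+, NIPS'17], lr = d_model^(-0.5) * min(step^(-0.5), step * warmup_steps^(-1.5)):

```python
def transformer_lr(step, d_model=512, warmup_steps=4000):
    """Warmup schedule from the original Transformer: linear in `step`
    until step == warmup_steps, then decaying like 1/sqrt(step)."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# The peak is reached exactly at warmup_steps.
peak = transformer_lr(4000)
```

The two branches of the `min` cross exactly at `warmup_steps`, which is where the schedule turns over.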
  91. Remembering the power laws. • The law observed with OpenAI's GPT-3 [Henighan+, 2020]: • multiply the compute budget, dataset size, or parameter count by t, and the loss shrinks by a factor a t^b (a, b constants). In other words... whoever has the most resources wins.
  92. Remembering the power laws, continued. • With Google's PaLM... • performance jumps discontinuously as the parameters grow 8B → 62B → 540B. In other words... whoever has the most resources wins, even more so.

  93. A pessimistic view that may yet surface: Exploring the Limits of Large Scale Pre-training [Abnar+, ICLR'22]. – Accuracy when JFT-300M is learned as the upstream task (horizontal axis) – vs. accuracy when each downstream task is then learned (vertical axis). So the winners are those who can train a giant foundation model on a huge dataset and then learn the downstream tasks... except that even as upstream accuracy keeps rising, downstream accuracy saturates partway.
  94. Closing. We surveyed the Transformer from its basic operation through its range of applications to the recent MLP-family networks: • What is a Transformer anyway? • The Transformer whirlwind • Is the Transformer obsolete?! • Transformer know-how. The trade-off between inductive bias and the required data and compute. • Transform the vectors individually • transform the vectors together → attention is effective, but it is not everything. [Figure: CNN, RNN, and Transformer/MLP-family connectivity revisited.]
  95. For those who have heard bits and pieces and are still confused (reprise): • A Transformer just repeats two modules over tokens (words, image patches, etc.), joined by residual connections: – a Token Mixer that mixes the tokens – an MLP that transforms each token. • The much-discussed MLP-family models actually share the same basic structure! • ...beyond that there is a lot of know-how, so stay alert. [Figure: normalize → transform the vectors together → normalize → transform the vectors individually, with the dotted part repeated several times.]