[Journal club] Pay Attention to MLPs
Semantic Machine Intelligence Lab., Keio Univ.
July 19, 2021
Transcript
Hanxiao Liu, Zihang Dai, David R. So, Quoc V. Le
(Google Research, Brain Team)
Pay Attention to MLPs
Liu, Hanxiao, et al. "Pay Attention to MLPs." arXiv preprint arXiv:2105.08050 (2021).
Presenter: Shumpei Hatanaka, Komei Sugiura Laboratory, Keio University
2 ⢠Attention-free transformer ã®äžçš®ã§ãããgMLPãã®ææ¡ â¢ Transformer ã®æ§é ã倿Žãã gate ä»ã
Multi Layer Perceptron ( MLP ) ã®ã¿ã§èšèš æŠèŠ â gMLP ãçŸåšã® Transformer ãããåªããæ§èœããŸãã¯åçã®æ§èœãçºæ® â self-attention ããã»ã©éèŠãªèŠçŽ ã§ã¯ãªã
3 ⢠Transformer [Vaswani+, NIPS2017] ã®åºçŸ â Natural Language Processing
( NLP ) ⊠LSTMã»RNN â BERT [Devlin+, NAACL2018] â Computer Vision ( CV ) ⊠CNN â ViT [Dosovitskiy+, ICLR2020] ⢠self-attention ã®ç¹åŸŽ 1. ååž°çã§ã¯ãªã ( = 䞊ååŠç ) 2. token éã®ç©ºéæ å ±ãååŸ èæ¯ïŒTransformer ã®åºçŸã«ãã NLP, CV ã®çºå±
⢠Attention æ©æ§ã®ã¡ãªãã [Bahdanau+, ICLR2015] â å ¥åããŒã¿ã®è¡šçŸã«åºã¥ããåçãªãã©ã¡ãŒã¿ã® 決å®ã«ãããããæå¹ãªåž°çŽãã€ã¢ã¹ãå°å ¥å¯èœ â åž°çŽãã€ã¢ã¹ãé¡èã«æå¹ãã©ããæªè§£æ±º
⢠MLP ã®ã¡ãªãã[Hornik+, 1989] â éçãªãã©ã¡ãŒã¿ã§ä»»æã®é¢æ°ãè¿äŒŒå¯èœ åé¡æèµ·ïŒMLP ã§ã®ä»£æ¿æ¡ã»Attention ã®å¿ èŠæ§ ⢠MLP ã§ self-attention ã®ç¹åŸŽã衚çŸå¯èœã ⢠self-attention ãçšããå¿ èŠæ§ã®æ¯é 4
5 ⢠2021幎以éã« MLPs ãåè©äŸ¡ãããŠãã â ãã ããMLP ããã Transformer ã®ã»ããäŸç¶ãšããŠç²ŸåºŠããã
æ¢åææ³ïŒMLPs ãåè©äŸ¡ãããŠãã ( 2021幎以é ) æ¢åææ³ ç¹åŸŽ Transformer BERT [Devlin+, NAACL2018] ⢠Transformerã®Encoderã䜿ã£ãã¢ã㫠⢠äºååŠç¿ãšããŠMLMãšNSPãåŠç¿ ViT [Dosovitskiy, ICLR2020] DeiT [Hugo+, 2020] ⢠ç»åããããåèªã®ããã«æ±ã ⢠DeiT 㯠ViT ã®åŠç¿ããŒã¿ããã©ã¡ãŒã¿ãæžãããã¢ãã« MLP MLP-Mixer [Tolstikhin+, 2021] ⢠ç»åãããããã£ã³ãã«æ¹åããã³ç©ºéæ¹åã«é¢ã㊠MLPã§å€æ ResMLP [Touvron+, 2021] ⢠ç»åããããMLPã®ã¿ã§ã§ããæ®å·®ãããã¯ã«è€æ°å éããŠãåé¡ãããã«å ¥å
⢠åäžãµã€ãºã® ð¿ åã® gMLP block ã§æ§æ ⢠Spatial Gating
Unit ( SGU ) â token éã®ç©ºéçžäºäœçšãååŸ â¢ å ¥å ( åºå ) 圢åŒïŒTransformer ã«æºãã â å ¥å â¶ sequence length ð à dimension ðdim ⢠Position embedding ã¯äžèŠ â SGU ãç©ºéæ å ±ãååŸãããã ææ¡ææ³ ( 1/4 )ïŒOverview gMLP Model 6 â â¡ â¢
Slide 7: Proposed method (2/4): ① Input Embedding to Activation
1. The Input Embedding layer converts the input into X of dimension n × d_dim
   - NLP: words are vectorized
   - CV: the image is split into patches, which are then vectorized
2. Normalization layer
3. Channel Projection layer: U
   - Linear projection along the channel direction (d_dim → d_ffn)
4. Activation layer (activation function): σ(·)
   - GeLU is used here
5. This yields Z of dimension n × d_ffn

Z = σ(XU)

[Figure: X (n × d_dim) → Norm → Channel Proj U → GeLU σ(·) → Z (n × d_ffn)]
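Steps 2 to 5 above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the sizes, the random initialization, the parameter-free layer normalization, and the tanh approximation of GeLU are all assumptions made for the example.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    """Normalize each token (row) over its channels; no learned scale or shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def gelu(x):
    """tanh approximation of GeLU (an assumed variant; the slide only says GeLU)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


rng = np.random.default_rng(0)
n, d_dim, d_ffn = 6, 8, 32                          # illustrative sizes
X = rng.normal(size=(n, d_dim))                     # output of the input embedding
U = rng.normal(size=(d_dim, d_ffn)) / np.sqrt(d_dim)  # channel projection

# Z = sigma(XU), applied to the normalized input; shape (n, d_ffn)
Z = gelu(layer_norm(X) @ U)
```

The projection acts on the channel axis only, so each token is transformed independently at this stage; mixing across tokens happens later in the SGU.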
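Slide 8: Proposed method (3/4): ② Spatial Gating Unit (SGU)
6. Split Z into Z1 and Z2
   - Split method: two halves along the channel direction
7. Normalize Z2
   - To prevent overfitting
8. Spatial Projection layer: f_{W,b}(Z2) = W Z2 + b
   - Linear projection along the spatial direction
   - W: weight matrix (n × n)
   - b: bias term
9. Combine Z1 and f_{W,b}(Z2): s(·)

s(Z) = Z1 ⊙ f_{W,b}(Z2)

[Figure: Z is split into Z1 and Z2; Z2 → Norm → Spatial Proj f_{W,b}(·); the result is multiplied element-wise with Z1 to give s(·)]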
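A minimal NumPy sketch of steps 6 to 9 follows. The function name `spatial_gating_unit` and the sizes are hypothetical, the bias is assumed to be one value per token broadcast over channels, and the initialization (W at zero, b at one, so that the unit initially passes Z1 through unchanged) follows the paper's description rather than anything stated on the slide.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    """Normalize each token (row) over its channels; no learned scale or shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def spatial_gating_unit(Z, W, b):
    """s(Z) = Z1 * (W @ norm(Z2) + b), with Z of shape (n, d_ffn)."""
    Z1, Z2 = np.split(Z, 2, axis=-1)          # each half is (n, d_ffn / 2)
    gate = W @ layer_norm(Z2) + b[:, None]    # mixes information across the n tokens
    return Z1 * gate                          # element-wise gating


rng = np.random.default_rng(0)
n, d_ffn = 6, 32                              # illustrative sizes
Z = rng.normal(size=(n, d_ffn))

# Assumed initialization: gate == 1 everywhere, so s(Z) == Z1 at the start.
W_init, b_init = np.zeros((n, n)), np.ones(n)
S_init = spatial_gating_unit(Z, W_init, b_init)

# With a non-zero W, every output token depends on all input tokens.
W_rand = rng.normal(size=(n, n)) * 0.1
S_rand = spatial_gating_unit(Z, W_rand, b_init)
```

Because W has shape (n, n), it is tied to a fixed sequence length, which is one practical difference from self-attention.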
Slide 9: Proposed method (4/4): ③ gMLP block output
10. Linearly project s(Z): V
    - Projects along the channel direction to obtain Y of dimension n × d_dim
    - Y = s(Z)V
11. Add the output X of the Input Embedding layer
    - The output X_out has dimension n × d_dim
12. The gMLP block is then repeated for L layers

X_out = X ⊕ Y

[Figure: Input Embedding → X (n, d_dim) → Channel Proj → SGU → s(Z) (n, d_ffn) → V → Y, added to X to give X_out (n, d_dim)]
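Putting steps 2 to 12 together, one block and a stack of L blocks might look like the sketch below. It is an assumption-laden illustration: `gmlp_block` and `init_block` are hypothetical names, the sizes and initialization are arbitrary, and V is given shape (d_ffn / 2, d_dim) because the channel split in the SGU halves the width.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def gelu(x):
    # tanh approximation of GeLU (assumed variant)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def gmlp_block(X, U, V, W, b):
    """One gMLP block: X_out = X + s(gelu(norm(X) U)) V."""
    Z = gelu(layer_norm(X) @ U)                   # (n, d_ffn)
    Z1, Z2 = np.split(Z, 2, axis=-1)              # (n, d_ffn / 2) each
    S = Z1 * (W @ layer_norm(Z2) + b[:, None])    # spatial gating unit
    Y = S @ V                                     # back to (n, d_dim)
    return X + Y                                  # residual connection


def init_block(rng, n, d_dim, d_ffn):
    """Hypothetical initializer for one block's parameters."""
    return {
        "U": rng.normal(size=(d_dim, d_ffn)) / np.sqrt(d_dim),
        "V": rng.normal(size=(d_ffn // 2, d_dim)) / np.sqrt(d_ffn // 2),
        "W": rng.normal(size=(n, n)) * 1e-3,      # near zero
        "b": np.ones(n),
    }


rng = np.random.default_rng(0)
n, d_dim, d_ffn, L = 6, 8, 32, 3                  # illustrative sizes
X = rng.normal(size=(n, d_dim))
blocks = [init_block(rng, n, d_dim, d_ffn) for _ in range(L)]

X_out = X
for params in blocks:                             # repeat the block L times
    X_out = gmlp_block(X_out, **params)
```

Since each block maps (n, d_dim) to (n, d_dim), blocks of identical size can be stacked without any reshaping.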
⢠ImageNet ã§ç»ååé¡ã¿ã¹ã¯ã®æ§èœæ¯èŒ â gMLP ã¯3ã€ã®ãµã€ãºã®ã¢ãã«ãçšæ â DeiT ãšåçãªæ§èœ â
self-attention 㯠CV ã«ãããŠäžèŠ â ä»ã® MLP ç³»åã®äžã§æã粟床ãé«ã â SGU ã®æå¹æ§ã®ç¢ºèª â CNN ã¢ãã«ã®ã»ãããã粟床 â EfficientNet [Tan+, ICML2019] â NFNet [Brock+, 2021] çµæ ( 1/3 )ïŒç»ååé¡ã¿ã¹ã¯ã§ DeiT ãšåçãªæ§èœãç²åŸ 10 Model ImageNet Top1 ( % ) ConvNets EfficientNet-B0 77.1 EfficientNet-B3 81.6 EfficientNet-B7 84.3 NFNet-F0 84.3 Transformer DeiT-Ti 72.2 DeiT-S 79.8 DeiT-B 81.8 MLP-like Mixer-B/16 76.4 Mixer-L/16 71.8 ResMLP-12 76.6 ResMLP-24 79.4 ResMLP-36 79.7 gMLP-Ti ( ours ) 72.0 gMLP-S ( ours ) 79.4 gMLP-B ( ours ) 81.6
⢠Masked Language Modeling ( MLM ) ã§äºååŠç¿ãã2çš®é¡ã® NLP åé¡ã¿ã¹ã¯ãšãã®ç²ŸåºŠãæ¯èŒ
MNLI-m â self-attention 㯠scalability ãéæããèŠå ã§ã¯ãªã â MNLI ã®ãããªè€æ°æã®ã¿ã¹ã¯ãæ±ãå ŽåãTransformer ã®ã»ããæ§èœãé«ã â self-attention 㯠cross-sentence alignment ã«é¢äžããŠãããšä»®å® çµæ ( 2/3 )ïŒself-attention 㯠scalability ã«é¢äžããªã ã©ã¡ããã°ã©ã ã®åŸããåç 11 MNLIã¿ã¹ã¯ã§ã¯ 粟床ã§äžåã SST-2 Finetuning accuracy Finetuning accuracy Params ( log scale ) Params ( log scale )
⢠Attention æ©æ§ã¯ç¹å®ã®ã¿ã¹ã¯ ( ex. MNLI ) ã« æå¹ã§ããå¯èœæ§ â¢
Attention æ©æ§ã SGU ã«çµã¿èŸŒã â size 64 ã® single head-attention â gmlp à self-attention ïŒaMLP ⢠aMLP ãæã¬ãã«éã®é¢ä¿æ§ããšããç¢ºèª AblationïŒ gMLP ã« Attention æ©æ§ãå ãã ( aMLP ) 12
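One way to picture the aMLP variant is sketched below in NumPy. The slide only states that a size-64 single-head attention is incorporated into the SGU; this sketch assumes, following the pseudo-code in the paper, that the attention output is added to the spatial projection before gating. The names `tiny_attention`, `sgu_with_tiny_attention`, `Wqkv`, and `Wo` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-6):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def tiny_attention(X, Wqkv, Wo):
    """Single-head attention with a small head size; returns shape (n, d_out)."""
    d_attn = Wqkv.shape[1] // 3
    Q, K, Vals = np.split(X @ Wqkv, 3, axis=-1)
    A = softmax(Q @ K.T / np.sqrt(d_attn))        # (n, n), depends on the input
    return (A @ Vals) @ Wo


def sgu_with_tiny_attention(Z, X, W, b, Wqkv, Wo):
    """SGU whose gate is the static spatial projection plus an attention term."""
    Z1, Z2 = np.split(Z, 2, axis=-1)
    gate = W @ layer_norm(Z2) + b[:, None]        # static part, as in gMLP
    gate = gate + tiny_attention(X, Wqkv, Wo)     # dynamic part (assumed placement)
    return Z1 * gate


rng = np.random.default_rng(0)
n, d_dim, d_ffn, d_attn = 6, 8, 32, 64            # d_attn = 64 as on the slide
X = layer_norm(rng.normal(size=(n, d_dim)))       # normalized block input
Z = rng.normal(size=(n, d_ffn))
W, b = np.zeros((n, n)), np.ones(n)
Wqkv = rng.normal(size=(d_dim, 3 * d_attn)) / np.sqrt(d_dim)
Wo = rng.normal(size=(d_attn, d_ffn // 2)) / np.sqrt(d_attn)

S = sgu_with_tiny_attention(Z, X, W, b, Wqkv, Wo)
```

With `Wo` set to zero the unit reduces to the plain SGU, which makes it easy to see the attention term as an optional add-on rather than a replacement.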
Slide 13: Results (3/3): aMLP surpasses the Transformer in accuracy
• As before, accuracy is compared on the two NLP classification tasks (Transformer vs aMLP): MNLI-m and SST-2
→ aMLP overcomes the weakness of gMLP
→ Self-attention contributes to sentence-level relationships
[Figure: finetuning accuracy vs. number of parameters (log scale), one panel each for MNLI-m and SST-2. Annotation: aMLP surpasses the Transformer on the MNLI task as well]
14 ⢠timm ã©ã€ãã©ãªã® ImageNet åŠç¿æžã¿ã¢ãã«ãäœ¿çš â¢ gMLP ãæ£ããèªèã»EfficientNet ã誀èªèããäŸ
â gMLP ã¯å±æçãªèªèã Efficient Net ã¯å šäœçãªèªèãããŠãã gMLP ãš EfficientNet ãå®éã«åãããŠã¿ãâ ( 宿§ççµæ ) https://github.com/rwightman/pytorch-image-models.git æ£è§£ïŒã¢ã€ã¹ãã£ã³ã㣠æ£è§£ïŒç¶ã®ãã£ãã EfficientNet ã®èª€èªèäŸ Ã ã¶ãªã¬ã à 氎ç EfficientNet ã®èª€èªèäŸ Ã ãã©ã㯠à èåãæ©
15 ⢠gMLP ã誀èªèã»EfficientNet ãæ£ããèªèããäŸ â gMLP 㯠ç©äœãèŠåããŠãã or
端ã«äœçœ®ããŠãã ç»åã«å¯ŸããŠèªèã匱ã â CV åéã«ã aMLP ãçšããããšã§ãç»åãããéã®é¢ä¿æ§ã«æ³šç®ãæ¹åãããã®ã§ã¯ãªãã gMLP ãš EfficientNet ãå®éã«åãããŠã¿ãâ¡ ( 宿§ççµæ ) https://github.com/rwightman/pytorch-image-models.git æ£è§£ïŒå ¬åã®ãã³ã æ£è§£ïŒããã¹ããŒã« gMLP ã®èª€èªèäŸ Ã æ§ Ã ã«ã¿ãã 㪠gMLP ã®èª€èªèäŸ Ã ã¿ã©ã³ãã¥ã© à 泚å°åš
16 ⢠Attention-free transformer ã®äžçš®ã§ãããgMLPãã®ææ¡ â¢ self-attention æ©æ§ã¯ CV ã§ã¯ã»ãŒå¿ èŠæ§ããªã
⢠NLP ã§ãç¹å®ã®ã¿ã¹ã¯ä»¥å€ã§ã¯å¿ èŠæ§ãäœã ãŸãšã SLIDE 16 â self-attention ã¯ææšªæç㪠alignment ãå¿ èŠãšããã¿ã¹ã¯ã«æå¹ â gMLP ã®ã¢ãã«ã倧ãããã or aMLP ã§Transformer ãšã®å·®ãçž®å°