Scatter Lab Inc.
May 08, 2020

# SYNTHESIZER: Rethinking Self-Attention in Transformer Models

## Transcript

2. ### • EPUQSPEVDUTFMG"UUFOUJPOݫ஠פ્਷5SBOTGPSNFSݽ؛ࢿמ੄೨ब੸ੋݽٕ۽ঌ۰ઉ੓਺ • ೞ૑݅ ੉࠺ऴো࢑੉੿݈۽5SBOTGPSNFSࢿמীѾ੿੸ੌө ী੄ޙਸы਺ • ࠄ֤ޙীࢲחEPUQSPEVDUBUUFOUJPOэ਷FYQMJDJUೠUPLFOUPLFOBUUFOUJPOҳઑܳߡܻҊ  4ZOUIFUJD"UUFOUJPOҳઑܳыח4ZOUIFTJ[FSݽ؛ਸઁউ •

/.5 -.١׮নೠకझ௼ীࢲ405"5SBOTGPSNFSݽ؛ী࠺١ೠࢿמਸࠁ੐ • ژೠ SBOEPNMFBSOBCMFBMJHONFOUNBUSJDFT۽بજ਷ࢿמਸյࣻ੓׮חࢎपҗ  ӝઓ5SBOTGPSNFSীࢲ੄UPLFOUPLFOEFQFOEFODJFTоજ਷ࢿמਸղחؘ  ߈٘द೙ਃೞ૓ঋ׮חࢎपਸഛੋ ѐਃ
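3. ### SYNTHESIZER Model

• The basic skeleton follows the Transformer architecture, but the Self-Attention Modules are replaced with Synthetic Attention Modules.
• The query and key ($q$, $k$) of standard self-attention are removed, and the alignment matrix is synthesized directly.
• For reference, standard self-attention computes $Y = \mathrm{Softmax}(QK^T)V$, where $\mathrm{Softmax}(QK^T)$ is the alignment matrix.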
4. ### %FOTF4ZOUIFTJ[FS Bi = F(Xi ) ӡ੉ ରਗ ੋੑ۱ ী؀೧  ߣ૩ష௾

ܳ ରਗীࢲ ରਗਵ۽ࢎ৔ೞחೣࣻ ܳ੿੄ೞৈ Y ରਗੋBMJHONFOUNBUSJY ܳҳࢿ l d X ∈ ℝl×d i Xi d l F l l B F(X) = W(σR (W(X) + b)) + b ੉ೣࣻחইې৬э਷க'FFE'PSXBSE/FUXPSL۽ҳഅ ਤ੄BMJHONFOUNBUSJYܳ੉ਊೞৈ5SBOTGPSNFS৬زੌೞѱ"UUFOUJPOো࢑ Y = Softmax(B)G(X) חӝઓ5SBOTGPSNFS੄7BMVF৬زੌ G(X)
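The Dense Synthesizer above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the helper names (`matmul`, `softmax_rows`, `dense_synthesizer`), the row-vector convention, and the value projection `wg` standing in for $G(X)$ are all assumptions made here.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    """Apply a numerically stable softmax to every row of a matrix."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def dense_synthesizer(x, w1, b1, w2, b2, wg):
    """Y = Softmax(B) G(X) with B_i = W2 ReLU(W1 X_i + b1) + b2.

    Shapes (assumed): x is l x d, w1 is d x d, w2 is d x l, wg is d x d.
    """
    # First layer plus ReLU, applied to every token independently.
    hidden = [[max(0.0, v + b) for v, b in zip(row, b1)] for row in matmul(x, w1)]
    # Second layer maps each token to l scores, giving the l x l matrix B.
    scores = [[v + b for v, b in zip(row, b2)] for row in matmul(hidden, w2)]
    # No query-key dot product is involved: B is used directly as the alignment.
    return matmul(softmax_rows(scores), matmul(x, wg))

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]      # l = 3 tokens, d = 2
identity = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]       # zero weights give uniform attention
y = dense_synthesizer(x, identity, [0.0, 0.0], w2, [0.0, 0.0, 0.0], identity)
```

With zero second-layer weights every row of $B$ is constant, so each output row is simply the average of the value rows. In a trained model the weights would be learned and the rows of $B$ would differ per token.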
5. ### 3BOEPN4ZOUIFTJ[FS ؊рױೞѱBMJHONFOUNBUSJYܳੑ۱ ী੄ઓೞ૑ঋחSBOEPNJOJUJBMJ[FENBUSJY ۽ҳࢿ X R Y = Softmax(R)G(X) •

೨बই੉٣যחUPLFOCZUPLFOJOUFSBDUJPO੉աпUPLFO੄੿ࠁܳഝਊೞחѱইפۄ  ౠ੿కझ௼ী੸೤ೠBMJHONFOUܳ೟णೞѷ׮חѪ • ী؀೧ о૑׮Ѩૐ R Trainable, Fixed
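6. ### Factorization

• Synthesizer models generally use fewer parameters than the standard Transformer, but in some cases (when the sequence length $l$ becomes large relative to the dimension $d$) they can end up with more parameters than the Transformer.
• Factorization is therefore used to reduce the parameter count.
• Besides saving parameters, it also helps prevent overfitting.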
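7. ### Factorized Dense Synthesizer

$A, B = F_A(X_i), F_B(X_i)$

$F_A$ and $F_B$ are functions that project $X_i$ to $a$ and $b$ dimensions respectively, where $a \times b = l$.

$C = H_A(A) * H_B(B)$

$Y = \mathrm{Softmax}(C)G(X)$

$H$ is a simple tiling function that duplicates its input $k$ times. In other words, the original function $F$, which projects the $d$-dimensional input $X_i$ to $l$ dimensions, is factorized into the two functions $F_A$ and $F_B$.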
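The tiling step can be illustrated for a single token. The slide only says that $H$ duplicates its input, so the exact layout used here (repeating the whole vector end to end) is an assumption, as are the function names.

```python
def tile(vector, times):
    """Repeat a whole vector `times` times: [1, 2] -> [1, 2, 1, 2, ...]."""
    return list(vector) * times

def factorized_row(a_vec, b_vec):
    """C_i = H_A(A_i) * H_B(B_i), elementwise, with len(a_vec) * len(b_vec) = l."""
    a, b = len(a_vec), len(b_vec)
    tiled_a = tile(a_vec, b)   # length a * b = l
    tiled_b = tile(b_vec, a)   # length b * a = l
    return [p * q for p, q in zip(tiled_a, tiled_b)]

# a = 2 and b = 3 give one row of length l = 6 in the alignment matrix C.
row = factorized_row([1.0, 2.0], [10.0, 20.0, 30.0])
print(row)  # [10.0, 40.0, 30.0, 20.0, 20.0, 60.0]
```

Each token then needs only $a + b$ projected values instead of $l$, which is where the parameter saving comes from.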
8. ### 'BDUPSJ[FE3BOEPN4ZOUIFTJ[FS Y = Softmax(R1 RT 2 )G(X) SBOEPNBMJHONFOUNBUSJY ਸ ௼ӝܳыח

۽'BDUPSJ[BUJPO   R l × k R1 , R2 k l ౵ۄ޷ఠѐࣻо ীࢲ ۽хࣗ QSBDUJDBMೞѱח ਸࢎਊ l2 2kl k = 8
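The saving from $l^2$ to $2kl$ is easy to check with a short calculation. The sequence lengths below are arbitrary illustrative values, and the function names are made up for this sketch.

```python
def random_params(length):
    """Parameters of a full l x l random alignment matrix R."""
    return length * length

def factorized_random_params(length, k=8):
    """Parameters of R1 and R2, each of shape l x k."""
    return 2 * k * length

for length in (32, 128, 512):
    print(length, random_params(length), factorized_random_params(length))
# 32 1024 512
# 128 16384 2048
# 512 262144 8192
```

With $k = 8$ the factorized form is smaller only when $l > 16$, and the gap widens quickly as the sequence length grows.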
9. ### .JYUVSFPG4ZOUIFTJ[FST Y = Softmax(α1 S1 (X) + . . .

+ αN SN (X))G(X) ࠂࣻѐ੄4ZOUIFTJ[JOHGVODUJPOਸഒ೤೧ࢲࢎਊ חখীࢲࢸݺೠ%FOTF഑਷3BOEPN4ZOUIFTJ[FS  ח೟णоמೠ౵ۄ޷ఠ S α ∑ α = 1
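The mixing step can be sketched as a weighted sum of score matrices. The slide only states that the $\alpha$ are learnable and sum to 1; obtaining them from a softmax over unconstrained logits is an assumption made here, as are the function names.

```python
import math

def mixing_weights(logits):
    """Softmax over logits so the weights are positive and sum to 1 (assumed scheme)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mix(score_matrices, alphas):
    """Compute alpha_1 * S_1 + ... + alpha_N * S_N elementwise."""
    rows = len(score_matrices[0])
    cols = len(score_matrices[0][0])
    return [
        [sum(a * s[i][j] for a, s in zip(alphas, score_matrices)) for j in range(cols)]
        for i in range(rows)
    ]

dense_scores = [[0.0, 2.0], [4.0, 6.0]]    # stands in for a Dense Synthesizer output
random_scores = [[2.0, 0.0], [0.0, 2.0]]   # stands in for a Random Synthesizer matrix
alphas = mixing_weights([0.0, 0.0])        # equal logits give equal weights
mixed = mix([dense_scores, random_scores], alphas)
print(mixed)  # [[1.0, 1.0], [2.0, 4.0]]
```

The mixed matrix would then go through the row-wise softmax and be multiplied with $G(X)$, exactly as in the single-synthesizer case.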
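10. ### Discussion

• Synthesizer can be seen as a more general version of the standard Transformer. Under this view, dot-product self-attention corresponds to the synthesizing function $S(X) = F_Q(X)F_K(X)^T$.
• Condition On: what the synthesizing function for the $i$-th token depends on.
• Sample: whether the function is Global (shared across all input samples) or Local (taking different values for each sample).
• Interact: whether there is interaction between tokens.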

Conclusion