SYNTHESIZER: Rethinking Self-Attention in Transformer Models


Scatter Lab Inc.

May 08, 2020

Transcript

  1. SYNTHESIZER: Rethinking Self-Attention in Transformer Models. ML Research Scientist Jung Dawoon. arXiv (Google Research)

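  2. Overview
     • The dot-product self-attention mechanism is known as the core module behind the performance of Transformer models.
     • The paper asks whether this expensive operation is really decisive for Transformer performance.
     • It discards explicit token-token attention such as dot-product attention and proposes the Synthesizer model, which is built on a Synthetic Attention structure.
     • On a range of tasks such as NMT and LM, it performs comparably to SOTA Transformer models.
     • It also finds that random learnable alignment matrices can perform well, and that the token-token dependencies of the standard Transformer are not strictly necessary for good performance.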
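  3. SYNTHESIZER model
     • The basic skeleton follows the Transformer architecture, with the Self-Attention Modules replaced by Synthetic Attention Modules.
     • The query (q) and key (k) concepts of conventional self-attention are removed, and the alignment matrix is synthesized directly.
     • Conventional self-attention: $Y = \mathrm{Softmax}(QK^\top)V$, where $QK^\top$ is the alignment matrix.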
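
As a point of reference for the slides that follow, the conventional formula $Y = \mathrm{Softmax}(QK^\top)V$ can be sketched in NumPy. This is a minimal single-head illustration, not the authors' code: the names `dot_product_attention`, `W_q`, `W_k`, `W_v` and the toy sizes are assumptions, and scaling and masking are left out because the slide omits them.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(X, W_q, W_k, W_v):
    # Y = Softmax(Q K^T) V, single head, no scaling or masking.
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    alignment = Q @ K.T              # (l, l): one score per token pair
    return softmax(alignment) @ V    # (l, d)

rng = np.random.default_rng(0)
l, d = 6, 4                          # hypothetical toy sizes
X = rng.normal(size=(l, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
Y = dot_product_attention(X, W_q, W_k, W_v)
print(Y.shape)
```

The `alignment` matrix is the only part that the Synthesizer variants replace; the value path `X @ W_v` stays the same.
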
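  4. Dense Synthesizer
     • Input: $X \in \mathbb{R}^{l \times d}$, with length $l$ and dimension $d$.
     • For the $i$-th token $X_i$, define a function $F$ that projects from $d$ dimensions to $l$ dimensions, $B_i = F(X_i)$; together the $B_i$ form an $l \times l$ alignment matrix $B$.
     • $F$ is implemented as a two-layer feed-forward network: $F(X) = W(\sigma_R(W(X) + b)) + b$, where $\sigma_R$ is the ReLU activation and each layer has its own $W$ and $b$.
     • With this alignment matrix, attention is computed as in the Transformer: $Y = \mathrm{Softmax}(B)G(X)$, where $G(X)$ is the same as the Value of the standard Transformer.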
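
The Dense Synthesizer described above can be sketched as follows. This is an illustrative NumPy version under stated assumptions: the hidden width of the two-layer network is taken to be $d$, $G(X)$ is taken to be a single linear projection `X @ W_v`, and all function and parameter names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dense_alignment(X, W1, b1, W2, b2):
    # B_i = F(X_i) = W2 * ReLU(W1 * X_i + b1) + b2, applied to every token.
    hidden = np.maximum(X @ W1 + b1, 0.0)   # (l, d)
    return hidden @ W2 + b2                 # (l, l)

def dense_synthesizer(X, W1, b1, W2, b2, W_v):
    # Y = Softmax(B) G(X), with G(X) assumed to be a linear value projection.
    B = dense_alignment(X, W1, b1, W2, b2)
    return softmax(B) @ (X @ W_v)           # (l, d)

rng = np.random.default_rng(0)
l, d = 6, 4                                 # hypothetical toy sizes
X = rng.normal(size=(l, d))
W1, b1 = rng.normal(size=(d, d)), np.zeros(d)
W2, b2 = rng.normal(size=(d, l)), np.zeros(l)
W_v = rng.normal(size=(d, d))
Y = dense_synthesizer(X, W1, b1, W2, b2, W_v)
print(Y.shape)
```

Row $i$ of `B` is computed from $X_i$ alone, so changing one token changes only that token's row of alignment scores. No query-key product between tokens is involved.
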
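  5. Random Synthesizer
     • More simply, the alignment matrix is a randomly initialized matrix $R$ that does not depend on the input $X$: $Y = \mathrm{Softmax}(R)G(X)$.
     • The key idea is to learn an alignment suited to the specific task, rather than to rely on token-by-token interaction or on the information in each token.
     • Two variants of $R$ are evaluated: Trainable and Fixed.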
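
A minimal NumPy sketch of the Random Synthesizer, assuming $G(X)$ is a single linear projection and using hypothetical names and toy sizes. NumPy has no notion of trainability, so the Trainable and Fixed variants differ only in whether a training loop (not shown) would update `R`.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def random_synthesizer(X, R, W_v):
    # Y = Softmax(R) G(X); R never looks at X.
    return softmax(R) @ (X @ W_v)

rng = np.random.default_rng(0)
l, d = 6, 4                                  # hypothetical toy sizes
R = rng.normal(size=(l, l))                  # randomly initialized alignment matrix
W_v = rng.normal(size=(d, d))
X_a = rng.normal(size=(l, d))
X_b = rng.normal(size=(l, d))
Y_a = random_synthesizer(X_a, R, W_v)
Y_b = random_synthesizer(X_b, R, W_v)
print(Y_a.shape, Y_b.shape)
```

Both inputs are mixed with exactly the same attention weights `softmax(R)`, which is what "does not depend on the input" means in practice.
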
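  6. Factorization
     • SYNTHESIZER models generally use fewer parameters than the standard Transformer, but when the sequence length $l$ grows large relative to $d$, the parameter count can exceed the Transformer's.
     • Factorization is therefore used to save parameters.
     • Besides saving parameters, it also helps prevent overfitting.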
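
The effect of sequence length on parameter count can be checked with simple arithmetic. The sketch below counts only the weights that produce the alignment matrix and ignores biases, value projections and output projections. The dense count assumes a two-layer network of shape $d \to d \to l$. The function name and the sizes tried are illustrative assumptions.

```python
def alignment_param_counts(l, d):
    # Weights needed to produce one (l, l) alignment matrix.
    return {
        "dot_product": 2 * d * d,   # query and key projections
        "dense": d * d + d * l,     # two-layer F: d -> d, then d -> l
        "random": l * l,            # a single l x l matrix R
    }

for l in (64, 512, 2048):
    print(l, alignment_param_counts(l, d=512))
```

Under this counting, the dense variant overtakes dot-product attention once $l > d$, and the random variant grows quadratically in $l$, which is the case factorization is meant to address.
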
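  7. Factorized Dense Synthesizer
     • The function $F$ that projects a $d$-dimensional input $X_i$ to $l$ dimensions is factorized into two functions, $F_A$ and $F_B$: $A, B = F_A(X_i), F_B(X_i)$.
     • $F_A$ and $F_B$ project $X_i$ to $a$ and $b$ dimensions respectively, where $a \times b = l$.
     • $C = H_A(A) * H_B(B)$ and $Y = \mathrm{Softmax}(C)G(X)$, where $H$ is a simple tiling function that duplicates its input $k$ times.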
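
One way to read the factorized dense construction is sketched below. It assumes $F_A$ and $F_B$ are single linear projections, that $H_A$ repeats $A$ $b$ times and $H_B$ repeats $B$ $a$ times along the last axis (via `np.tile`), and that `*` is an element-wise product. All names and sizes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def factorized_dense_alignment(X, W_a, W_b):
    A = X @ W_a                                   # (l, a)
    B = X @ W_b                                   # (l, b)
    a, b = A.shape[1], B.shape[1]
    if a * b != X.shape[0]:
        raise ValueError("a * b must equal the sequence length l")
    # H_A tiles A b times, H_B tiles B a times; both become (l, l).
    return np.tile(A, (1, b)) * np.tile(B, (1, a))

def factorized_dense_synthesizer(X, W_a, W_b, W_v):
    C = factorized_dense_alignment(X, W_a, W_b)
    return softmax(C) @ (X @ W_v)                 # (l, d)

rng = np.random.default_rng(0)
l, d, a, b = 6, 4, 2, 3                           # toy sizes with a * b == l
X = rng.normal(size=(l, d))
W_a, W_b = rng.normal(size=(d, a)), rng.normal(size=(d, b))
W_v = rng.normal(size=(d, d))
Y = factorized_dense_synthesizer(X, W_a, W_b, W_v)
print(Y.shape)
```

With this reading, each token needs only $a + b$ projected values instead of $l$ to fill its row of the alignment matrix.
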
  8. Factorized Random Synthesizer
     • $Y = \mathrm{Softmax}(R_1 R_2^\top)G(X)$
     • The random alignment matrix $R$ is factorized into $R_1, R_2$, each of size $l \times k$.
     • The parameter count drops from $l^2$ to $2kl$ (for $k \ll l$); in practice, $k = 8$ is used.
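
The low-rank form and its parameter saving can be sketched in NumPy. The value $k = 8$ comes from the slide. The sequence length, model dimension, function name and the linear form of $G(X)$ are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def factorized_random_synthesizer(X, R1, R2, W_v):
    # Y = Softmax(R1 R2^T) G(X), with R1 and R2 both of shape (l, k).
    R = R1 @ R2.T                                # (l, l), rank at most k
    return softmax(R) @ (X @ W_v)

rng = np.random.default_rng(0)
l, d, k = 64, 16, 8                              # l and d are hypothetical
R1, R2 = rng.normal(size=(l, k)), rng.normal(size=(l, k))
X = rng.normal(size=(l, d))
W_v = rng.normal(size=(d, d))
Y = factorized_random_synthesizer(X, R1, R2, W_v)

full_params = l * l                              # unfactorized R
factorized_params = R1.size + R2.size            # 2 * k * l
print(full_params, factorized_params)            # 4096 1024
```

For these toy sizes the factorized version stores a quarter of the parameters, and the gap widens as $l$ grows because $2kl$ is linear in $l$.
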
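  9. Mixture of Synthesizers
     • $Y = \mathrm{Softmax}(\alpha_1 S_1(X) + \dots + \alpha_N S_N(X))G(X)$
     • Several synthesizing functions are mixed and used together.
     • Each $S$ is one of the Dense or Random Synthesizers described above; the $\alpha$ are learnable parameters with $\sum \alpha = 1$.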
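
A mixture of one Random and one Dense synthesizing function could look like the sketch below. The slide only states that the $\alpha$ are learnable and sum to 1, so obtaining them from a softmax over free logits is an assumption, as are the names, the toy sizes and the linear form of $G(X)$.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mixture_of_synthesizers(X, alpha_logits, R, W1, b1, W2, b2, W_v):
    alpha = softmax(alpha_logits)                        # weights that sum to 1
    S_random = R                                         # (l, l), input-independent
    S_dense = np.maximum(X @ W1 + b1, 0.0) @ W2 + b2     # (l, l), per-token
    mixed = alpha[0] * S_random + alpha[1] * S_dense
    return softmax(mixed) @ (X @ W_v)                    # (l, d)

rng = np.random.default_rng(0)
l, d = 6, 4                                              # hypothetical toy sizes
X = rng.normal(size=(l, d))
R = rng.normal(size=(l, l))
W1, b1 = rng.normal(size=(d, d)), np.zeros(d)
W2, b2 = rng.normal(size=(d, l)), np.zeros(l)
W_v = rng.normal(size=(d, d))
alpha_logits = np.zeros(2)                               # equal mixing weights
Y = mixture_of_synthesizers(X, alpha_logits, R, W1, b1, W2, b2, W_v)
print(Y.shape)
```

The mixing happens before the softmax, so the components are combined as raw alignment scores rather than as separate attention distributions.
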
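  10. Discussion
      • The Synthesizer can be seen as a more general version of the standard Transformer, whose synthesizing function is $S(X) = F_Q(X)F_K(X)^\top$.
      • Condition On: whether the synthesizing function depends on the $i$-th token.
      • Sample: whether it is a Global function shared by all input samples or a Local function whose value differs per sample.
      • Interact: whether interaction between tokens exists.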
  11. Experiments

  12. Machine Translation, Language Modeling

  13. Text Generation

  14. Multi-Task NLU
      • T5 is trained with the Synthesizer architecture and evaluated on the GLUE and SuperGLUE sets.

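  15. Analysis
      • The attention weight distributions of each encoder and decoder layer were examined.
      • Synthesizer models tend to show larger variance than Transformer models.
      • The Dense Synthesizer has more small values than the Random Synthesizer.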

  16. Analysis
      • Change in MT performance with the number of heads in multi-head attention.

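  17. Conclusion
      • Proposes the Synthesizer architecture, which uses Synthetic Attention in place of dot-product attention.
      • Shows performance competitive with the standard Transformer across a variety of tasks.
      • Shows better performance on the dialogue generation task.
      • The Synthetic Attention proposed here is strictly a strategy for replacing self-attention.
      • For cross-attention between source and target sequences, no simpler structure that still performs well was found.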
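  18. Thank you ✌ If you have further questions or anything you are curious about, feel free to get in touch via the contact details below. Jung Dawoon, Email: da@navercom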