Pruning Basics on Multi Head Attention-based Models

Scatter Lab Inc.

May 18, 2020

Transcript

  1. Are All Attention Heads Important?

    • Two experiments are run: masking one head at a time, and masking every head except one
    • Two models are used
    • WMT
      • Large Transformer of Vaswani et al. (6 layers, 16 heads per layer)
      • Pretrained model of Ott et al. (fairseq), evaluated by BLEU score on newstest2013
      • Uses three attention mechanisms (Enc-Enc, Enc-Dec, Dec-Dec)
    • BERT
      • Devlin et al. (12 layers, 12 attention heads)
      • Base model fine-tuned on the MultiNLI Matched dataset, compared by accuracy
      • Uses one attention mechanism (self-attention in each layer)
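  2. Ablating All Heads but One

    • In practice, the single most important head was enough to reach adequate performance
    • Cross-task evaluation showed similar results
    • A very encouraging result
      • Performance is preserved using only 1/16 or 1/12 of the full parameters
    • However, a few layers did need multiple heads
      • Encoder-decoder attention degraded further when cut down to one head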
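
The two masking setups (ablate one head, or keep only one head) can be sketched as a 0/1 mask over the heads of a layer. This is a minimal illustration, not the authors' code: `head_mask` and `apply_mask` are hypothetical helper names, and head outputs are represented as plain lists of floats.

```python
def head_mask(num_heads, ablate=None, keep_only=None):
    """Build a 0/1 mask over heads (hypothetical helper).

    ablate=i    -> mask only head i (first experiment)
    keep_only=i -> mask every head except i (second experiment)
    """
    if (ablate is None) == (keep_only is None):
        raise ValueError("pass exactly one of ablate or keep_only")
    if keep_only is not None:
        return [1.0 if i == keep_only else 0.0 for i in range(num_heads)]
    return [0.0 if i == ablate else 1.0 for i in range(num_heads)]


def apply_mask(head_outputs, mask):
    """Zero out the output vector of every masked head."""
    if len(head_outputs) != len(mask):
        raise ValueError("one mask entry is needed per head")
    return [[m * x for x in out] for out, m in zip(head_outputs, mask)]


# Toy usage: two heads with 2-dimensional outputs, keeping only head 0.
masked = apply_mask([[1.0, 2.0], [3.0, 4.0]], head_mask(2, keep_only=0))
```

In a real model the mask would multiply each head's output before the output projection; evaluating the task metric once per mask gives the per-head ablation results.
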
  3. Are Important Heads the Same Across Datasets?

    • The results so far depend on one specific test set
    • Do they generalize to other datasets?
    • Are heads universally important?
    • Each model is re-evaluated on an out-of-domain test suite that plays a completely contrasting role
      • WMT: MTNT English-to-French test set
      • BERT: MNLI Mismatched dataset validation set
  4. Layer-wise Relevance Propagation (LRP)

    • First presented in Ding et al.
    • A method that uses weight ratios to compute the relative contribution between specific neurons in a network
    • Proposed here as a way to compute how much each head contributes to the top-1 logit predicted by the model
    • Head importance is evaluated based on this value
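  5. Syntactic Heads - Methodology

    • Dependency structures are built with CoreNLP (Manning et al.)
    • Attention weights are extracted and compared against those dependency structures
    • Goal: check whether dependencies pointing at the grammatical elements named earlier show up in the heads
    • Accuracy is computed for the case where a specific dependency relation points in a specific direction
    • A head is defined as syntactic if its accuracy is at least 10% higher than the positional-relation baseline for that grammatical element
      • Positional relations are compared per grammatical element (the heads are confirmed to carry positional tendencies as well)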
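
The accuracy test for a syntactic head can be sketched as follows. This is an illustration under assumptions, not the paper's evaluation code: the function names are hypothetical, a head's "prediction" for a token is taken to be the position with the largest attention weight, and the margin is treated as an absolute difference and exposed as a parameter.

```python
def argmax_positions(attention):
    """For each token (row), return the position that receives the most attention."""
    return [max(range(len(row)), key=row.__getitem__) for row in attention]


def relation_accuracy(attention, gold_pairs):
    """Fraction of gold (token, related token) pairs the head points at.

    gold_pairs holds (source index, target index) tuples for one dependency
    relation in one direction, e.g. taken from a dependency parse.
    """
    if not gold_pairs:
        raise ValueError("need at least one gold pair")
    predicted = argmax_positions(attention)
    hits = sum(1 for src, tgt in gold_pairs if predicted[src] == tgt)
    return hits / len(gold_pairs)


def is_syntactic(head_accuracy, baseline_accuracy, margin=0.10):
    """Assumed criterion: head beats the positional baseline by `margin`."""
    return head_accuracy >= baseline_accuracy + margin


# Toy usage: a 3-token sentence and two gold pairs for one relation.
attention = [[0.1, 0.9, 0.0], [0.8, 0.1, 0.1], [0.2, 0.2, 0.6]]
accuracy = relation_accuracy(attention, [(0, 1), (1, 2)])
```
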
  6. Pruning Attention Heads

    • Multi-head attention is rewritten with the following formula
      • $\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}_i(g_i \cdot \mathrm{head}_i) W^O$
      • $g_i$ is a gate assigned separately to each head
      • $g_i$ is independent of the input sentence
    • The aim is to disable heads outright, not merely to down-weight them
    • $L_0$ regularization is applied to the gates $g_i$
      • $L_0(g_1, \ldots, g_h) = \sum_{i=1}^{h} (1 - [[g_i = 0]])$
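
A small sketch of the gated formulation may help: each head output is scaled by its gate before concatenation, and the L0 term simply counts gates that are not exactly zero. The function names are hypothetical, head outputs are plain lists, and the output projection W^O is deliberately left out to keep the example short.

```python
def gated_concat(head_outputs, gates):
    """Concat_i(g_i * head_i): scale each head by its gate, then concatenate.

    The multiplication by W^O that follows in the formula is omitted here.
    """
    if len(head_outputs) != len(gates):
        raise ValueError("one gate is needed per head")
    concatenated = []
    for output, gate in zip(head_outputs, gates):
        concatenated.extend(gate * x for x in output)
    return concatenated


def l0_norm(gates):
    """sum_i (1 - [[g_i = 0]]): the number of heads that are still active."""
    return sum(1 for gate in gates if gate != 0)


# Toy usage: the second of two heads is switched off by a zero gate.
mixed = gated_concat([[1.0, 2.0], [3.0, 4.0]], [1.0, 0.0])
active_heads = l0_norm([1.0, 0.0])
```

A gate of exactly zero removes the head's contribution entirely, which is why the penalty counts non-zero gates instead of summing their magnitudes.
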
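  7. Pruning Attention Heads

    • However, $L_0$ regularization is not differentiable, so a stochastic relaxation is used
    • Each $g_i$ is treated as a random variable, drawn independently from a head-specific distribution
    • The Hard Concrete distribution is used (a binary version of Concrete)
    • Relaxation of the $L_0$ norm
      • $L_0(g_1, \ldots, g_h) = \sum_{i=1}^{h} (1 - [[g_i = 0]])$ is replaced by $L_C(\phi) = \sum_{i=1}^{h} (1 - P(g_i = 0 \mid \phi_i))$
    • Final loss function
      • $L(\theta, \phi) = L_{xent}(\theta, \phi) + \lambda L_C(\phi)$
      • $L_{xent}$: cross-entropy of the translation model
      • $\lambda$: regularization coefficient, which controls how many heads remain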
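
The relaxed penalty can be sketched in a few lines. The slides do not give the Hard Concrete parameterization, so the version below is an assumption: it follows the commonly cited stretch-and-clip form with stretch limits `GAMMA` and `ZETA` and temperature `BETA`, and treats each head's parameter as a single `log_alpha`. All names are illustrative.

```python
import math
import random

# Assumed hyperparameters (not stated in the slides).
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_gate(log_alpha, rng):
    """Draw one gate: Concrete sample, stretched to (GAMMA, ZETA), clipped to [0, 1]."""
    u = rng.uniform(1e-6, 1.0 - 1e-6)
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / BETA)
    return min(1.0, max(0.0, s * (ZETA - GAMMA) + GAMMA))


def prob_nonzero(log_alpha):
    """1 - P(g = 0 | phi) for one head under the assumed parameterization."""
    return sigmoid(log_alpha - BETA * math.log(-GAMMA / ZETA))


def l_c(log_alphas):
    """L_C(phi): expected number of heads whose gate is not zero."""
    return sum(prob_nonzero(a) for a in log_alphas)


def total_loss(xent, log_alphas, lam):
    """L = L_xent + lambda * L_C."""
    return xent + lam * l_c(log_alphas)


rng = random.Random(0)
gates = [sample_gate(0.0, rng) for _ in range(100)]
```

Because the clipping puts real probability mass on exactly 0 and exactly 1, sampled gates can switch a head off completely while `l_c` stays differentiable in `log_alpha`.
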
  8. Pruning at Inference Time - Selecting Layers to Prune

    • Three strategies for choosing which layers to prune
    • Every Other
      • Given a rate $p$, prune the layers at depth $d$ such that $d \equiv 0 \pmod{\lfloor 1/p \rfloor}$
      • The most intuitive method, and it produces balanced networks; works well in all cases
    • Search on Valid
      • Compute several combinations of layers and check each one on the validation set
      • Straightforward, but computationally expensive and liable to overfit the validation set
    • Data Driven Pruning
      • Learn a drop rate $p_d$ for each layer while keeping the average rate over layers equal to $p$
      • $p_d$ is learned from the layer's activation, followed by a softmax
      • At inference time, the top-k softmax outputs are selected and the corresponding layers are used
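
The Every Other rule is easy to state in code. The sketch below assumes layer depths are numbered from 1; the slide does not say whether indexing starts at 0 or 1, so that choice and the function name are illustrative.

```python
import math


def every_other_pruned(num_layers, p):
    """Depths d in 1..num_layers with d = 0 (mod floor(1/p)).

    Assumes 1-based layer depths; p is the target pruning rate.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    step = math.floor(1.0 / p)
    return [d for d in range(1, num_layers + 1) if d % step == 0]


# Toy usage: pruning half of a 6-layer stack drops every second layer.
pruned = every_other_pruned(6, 0.5)
```

The pruned layers are spread evenly through the stack, which is what makes the resulting network balanced.
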
  9. Pruning at Inference Time - Setting the drop rate for optimal pruning

    • With $N$ groups and a fixed drop ratio $p$, the average number of groups in use is $N(1 - p)$, so to prune down to $r$ groups the optimal drop rate $p^*$ is
      • $p^* = 1 - \frac{r}{N}$
    • Higher pruning rates give better performance on smaller models
    • LayerDrop used $p = 0.2$, but $p = 0.5$ is recommended for small models
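
The drop-rate arithmetic can be checked with a short sketch. The function names are hypothetical; `target_groups` plays the role of r, the number of groups kept after pruning.

```python
def expected_groups_used(num_groups, p):
    """Average number of groups active during training: N * (1 - p)."""
    return num_groups * (1.0 - p)


def optimal_drop_rate(num_groups, target_groups):
    """p* = 1 - r / N for a target of r remaining groups out of N."""
    if num_groups <= 0 or not 0 <= target_groups <= num_groups:
        raise ValueError("need 0 <= target_groups <= num_groups and num_groups > 0")
    return 1.0 - target_groups / num_groups


# Toy usage: keeping 8 of 16 layers suggests training with p* = 0.5.
p_star = optimal_drop_rate(16, 8)
```

Setting the training drop rate to p* makes the average training-time depth match the pruned depth used at inference.
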