LSPC deep-people for music processing #05 CNN

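These are the materials from a study group on deep learning × music data,
held at the University of Tsukuba's 人と音の情報学研究室 (laboratory for the informatics of people and sound).
They may contain errors. If you find any, please point them out.

#05 Convolutional Neural Networks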

Yuya Yamamoto

May 05, 2022

Transcript

  1. Convolutional Neural Network (CNN) Deep-people #5

  2. Review of the previous session
     • Deep learning recap
     • Perceptron -> how neural networks work
     • Activation functions
     • Training (gradient descent, backpropagation, etc.)
     • PyTorch basics
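  3. Today's topic
     • Convolutional neural networks (hereafter, CNN)
     • The neural network architecture that found success in image processing
     Reference: Gradient-Based Learning Applied to Document Recognition, Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner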
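  4. Plain NNs are inefficient for sensor data such as images and audio
     • Image and audio data are signals, and they are high-dimensional
       • Image: 256 * 256
       • Audio: a 3-second mel spectrogram is 128 * 300
     • Applying an NN (linear combination + activation function) makes the number of parameters very large
       • With a hidden layer of size 256, a 256x256 color image needs about 50 million parameters
     • Left as is, the model will obviously overfit
     • Can we find a good way to reduce the number of parameters?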
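The 50 million figure can be checked with a little arithmetic. The sketch below assumes the input is flattened into a single vector and fed to one fully connected hidden layer of size 256; `dense_param_count` is a helper introduced here for illustration, not something from the slides.

```python
def dense_param_count(input_size: int, hidden_size: int, bias: bool = True) -> int:
    """Number of weights (and optional biases) in one fully connected layer."""
    return input_size * hidden_size + (hidden_size if bias else 0)

# 256x256 RGB image flattened to a vector (3 color channels)
image_inputs = 256 * 256 * 3
# 3-second mel spectrogram: 128 mel bins x 300 frames
spec_inputs = 128 * 300

image_params = dense_param_count(image_inputs, 256)
spec_params = dense_param_count(spec_inputs, 256)
print(image_params)  # 50331904
print(spec_params)   # 9830656
```

Even the much smaller spectrogram input needs close to 10 million parameters for a single hidden layer.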
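  5. Focus on the structure that images and audio have
     • Locality: the target object is spatially compact
       • The region containing a bird is not scattered across the whole image
       • A drum sound is concentrated in a specific time span and frequency band
     • Position invariance: the target is independent of position; wherever it is, the target itself does not change
       • A bird is recognizable as a bird wherever it appears
       • The acoustic properties of a sound are the same no matter when it is played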
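  6. Focus strongly on local information!
     • Apply the linear operation only to a fixed region of interest (the receptive field)
     • → Making the weights invariant to position (shared) reduces the parameters even further!
     • This turns out to be equivalent to the convolution operation!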
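To see how much each of the two ideas saves, here is a rough count under simplifying assumptions that are not in the slides: a single-channel 256x256 input, one output unit per input position, a 3x3 receptive field, and no bias terms.

```python
height, width = 256, 256      # single-channel input (assumption for illustration)
k = 3                         # receptive field is k x k
positions = height * width    # one output unit per input position

fully_connected = positions * positions   # every output sees every input
locally_connected = positions * k * k     # every output sees only a k x k patch
shared_weights = k * k                    # the same k x k weights reused at every position

print(fully_connected)    # 4294967296
print(locally_connected)  # 589824
print(shared_weights)     # 9
```

Restricting each unit to its receptive field removes most of the weights, and sharing them across positions leaves only the kernel itself.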

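  7. The components of a CNN
     • Convolution layer
       • Convolution computation
       • Stride
       • Padding
     • Pooling layer
       • Downsampling methods
     • Fully connected layer
     • CNN -> feature extraction; the fully connected layer serves as the classifier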
  9. Convolutional layer
     • Convolves a "kernel" with the input data and outputs a feature map
     • The kernel captures local information
     • Multiply each pixel value by the kernel -> sum the products -> shift the kernel ... and repeat
     Figure (input, kernel, output) from https://medium.datadriveninvestor.com/convolutional-neural-networks-3b241a5da51e
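The multiply, sum, shift loop can be written out directly. This is a minimal sketch in plain Python, assuming stride 1 and no padding; like most deep learning frameworks it does not flip the kernel (strictly speaking, cross-correlation). The input and kernel values are made up for illustration.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (both lists of rows); stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):              # shift the kernel down
        row = []
        for j in range(out_w):          # shift the kernel right
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    # multiply each pixel by the matching kernel weight and accumulate
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [
    [2, 0, 1],
    [0, 1, 2],
    [1, 0, 2],
]
feature_map = conv2d(image, kernel)
print(feature_map)  # [[15, 16], [6, 15]]
```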
  10. Spatial feature extraction by convolution
     • Different combinations of the kernel weights w capture different features
     • Top: Laplacian filter (edges), https://algorithm.joho.info/image-processing/laplacian-filter/
     • Bottom: Gaussian filter (blur), https://algorithm.joho.info/image-processing/gaussian-filter/
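As a small illustration of how the weights decide which feature is extracted, the sketch below applies two hand-written kernels to a toy image with a vertical edge. The specific kernel values (a 4-neighbor Laplacian and a 3x3 binomial approximation of a Gaussian) are common textbook choices assumed here; the kernels in the slide's figure are not recoverable from the transcript.

```python
def conv2d(image, kernel):
    """Stride-1, no-padding 2D convolution (cross-correlation form)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]

laplacian = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]
gaussian = [[1 / 16, 2 / 16, 1 / 16],
            [2 / 16, 4 / 16, 2 / 16],
            [1 / 16, 2 / 16, 1 / 16]]

# 5x6 toy image: dark left half (0) and bright right half (10), i.e. a vertical edge
image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]

edges = conv2d(image, laplacian)
blurred = conv2d(image, gaussian)
print(edges[0])    # [0, 10, -10, 0]
print(blurred[0])  # [0.0, 2.5, 7.5, 10.0]
```

The Laplacian responds only around the edge, while the Gaussian smooths the step into a gradual ramp.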
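  11. In a CNN
     • Use multiple kernels to obtain multiple features (feature maps)
     • Learn the kernel values, so that feature extraction adapts to the data
     Figure (input data, kernels, feature maps) from https://ml4a.github.io/ml4a/jp/convnets/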

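  12. Inputs and outputs of a convolutional layer
     • For 2D convolution, the input has 3 dimensions: Channel, Height, Width
     • Number of channels: 3 (RGB) for a color image, 1 for a spectrogram
     • With mini-batch training, the input has 4 dimensions: (Batch, Channel, Height, Width)
     • The number of output channels equals the number of kernels applied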
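The shape bookkeeping can be sketched as follows. The two helper functions are hypothetical and assume stride 1 and no padding; the weight layout (out_channels, in_channels, kH, kW) is the one PyTorch's `nn.Conv2d` uses, and the batch size of 8 and the 16 kernels are arbitrary.

```python
def conv2d_weight_shape(in_channels, num_kernels, kernel_size):
    """Weight tensor layout (out_channels, in_channels, kH, kW)."""
    kh, kw = kernel_size
    return (num_kernels, in_channels, kh, kw)

def conv2d_output_shape(input_shape, num_kernels, kernel_size):
    """Output shape for a (Batch, Channel, Height, Width) input; stride 1, no padding."""
    batch, _, height, width = input_shape
    kh, kw = kernel_size
    return (batch, num_kernels, height - kh + 1, width - kw + 1)

rgb_batch = (8, 3, 256, 256)    # 8 color images
spec_batch = (8, 1, 128, 300)   # 8 mel spectrograms

print(conv2d_weight_shape(3, 16, (3, 3)))            # (16, 3, 3, 3)
print(conv2d_output_shape(rgb_batch, 16, (3, 3)))    # (8, 16, 254, 254)
print(conv2d_output_shape(spec_batch, 16, (3, 3)))   # (8, 16, 126, 298)
```

Whatever the number of input channels, 16 kernels produce 16 output channels.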
  13. Stride and padding
     • Stride: the step by which the kernel is shifted
     • Padding: surround the input with new cells of value 0 to enlarge the input size
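Both settings change the output size, which for input size n, kernel size k, stride s and padding p is floor((n + 2p - k) / s) + 1. A minimal sketch, with helper names chosen here for illustration:

```python
def zero_pad(image, pad):
    """Surround a 2D list with `pad` rings of zeros."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))
    return padded

def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """floor((n + 2p - k) / s) + 1 along one axis."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

image = [[1, 2], [3, 4]]
print(zero_pad(image, 1))
# [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]

print(conv_output_size(256, 3))                       # 254
print(conv_output_size(256, 3, padding=1))            # 256
print(conv_output_size(256, 3, stride=2, padding=1))  # 128
```

With a 3x3 kernel, padding of 1 keeps the output the same size as the input, and a stride of 2 roughly halves it.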

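  15. Pooling
     • Downsamples the feature maps obtained by convolution
     • Acts as dimensionality reduction → considered important for acquiring position invariance
     • Max pooling: takes the maximum value of each region
     • Average pooling: takes the mean value of each region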
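A minimal sketch of both pooling variants, assuming non-overlapping 2x2 windows (window size equal to stride) and using the arithmetic mean for average pooling. The feature map values are made up.

```python
def pool2d(feature_map, size, reduce_fn):
    """Apply reduce_fn to non-overlapping size x size windows."""
    out = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(reduce_fn(window))
        out.append(row)
    return out

def mean(values):
    return sum(values) / len(values)

feature_map = [
    [1, 3, 2, 0],
    [5, 7, 4, 2],
    [0, 2, 9, 1],
    [4, 6, 3, 3],
]
max_pooled = pool2d(feature_map, 2, max)
avg_pooled = pool2d(feature_map, 2, mean)
print(max_pooled)  # [[7, 4], [6, 9]]
print(avg_pooled)  # [[4.0, 2.0], [3.0, 4.0]]
```

Moving the 9 anywhere within its 2x2 window would leave the max-pooled output unchanged, which is the small-shift invariance pooling provides.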
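  17. Fully connected layer
     • A layer in which every cell is connected to every cell of the next layer
     • This is the layer that performs the affine transformation you learned last time
       • With an activation function, of course
     • Also called a Dense layer
     • It is the layer that converts features into the output
     • The output dimension is set to the dimension of the desired output
       • Classification: number of classes (values are class probabilities)
       • Regression: 1 (the value is the regression estimate), etc.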
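As a sketch of a classification head, the code below applies one affine transformation followed by softmax to a flattened feature vector. The feature values, weights, and the choice of 3 classes are all hypothetical.

```python
import math

def affine(x, weights, bias):
    """y = W x + b, with W given as one row of weights per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Turn raw scores into class probabilities that sum to 1."""
    peak = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

features = [0.5, -1.0, 2.0, 0.0]   # hypothetical flattened CNN features
# hypothetical weights: 3 output classes x 4 input features
weights = [
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [-1.0, 0.0, 0.0, 0.5],
]
bias = [0.0, 0.1, -0.1]

logits = affine(features, weights, bias)
probs = softmax(logits)
print(logits)
print(probs)
```

The output has one value per class; for regression the same layer would have a single output and no softmax.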
  18. Models used in practice

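  19. Well-known models in the image domain, part 1
     • AlexNet: 2012. Improved performance by making the network deeper; won the ImageNet image-classification challenge by a wide margin over the runner-up
     • VGG: 2015. Improved accuracy by using small (3x3) kernels and going deeper still; widely used in research as the simplest model
     • GoogLeNet: 2015. Computes efficiently with fewer parameters. Introduced the Inception module, which applies small convolutions or pooling to part of the input to cut parameters
     • ResNet: Introduced the skip connection (adding the input to a layer's output) so that gradients do not vanish even when the network is deep
     References:
     • Deep Residual Learning for Image Recognition, Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, CVPR
     • Going Deeper with Convolutions, Szegedy et al., CVPR
     • Very Deep Convolutional Networks for Large-Scale Image Recognition, Karen Simonyan, Andrew Zisserman, ICLR
     • ImageNet classification with deep convolutional neural networks, Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, NIPS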
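The skip connection can be illustrated in a few lines. This is a heavily simplified sketch: the residual branch F is a single linear map on a short vector, whereas ResNet uses stacks of convolutional layers, and the function names are chosen here for illustration.

```python
def relu(values):
    return [max(0.0, v) for v in values]

def linear(x, weights):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def residual_block(x, weights):
    """output = relu(F(x) + x), where F is a single linear map in this sketch."""
    fx = linear(x, weights)
    return relu([f + xi for f, xi in zip(fx, x)])

x = [1.0, 2.0, 3.0]
zero_weights = [[0.0] * 3 for _ in range(3)]

# With F(x) = 0 the block passes its (non-negative) input straight through
passthrough = residual_block(x, zero_weights)
print(passthrough)  # [1.0, 2.0, 3.0]
```

Because the input is added back unchanged, the block only has to learn a correction to the identity, and the addition gives gradients a direct path to earlier layers.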
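  20. Well-known models in the image domain, part 2
     • DenseNet: 2017. Inspired by ResNet; concatenates (Concat) the intermediate outputs of all layers and reduces parameters
     • Squeeze-and-Excitation (SENet): 2018. Improved performance by adding a gating mechanism that weights the more useful feature maps
     • EfficientNet: 2019. How should a CNN be designed for true efficiency? Tunes the layer depth, the width (number of kernels), the input resolution, and so on; when scaling up, each factor can simply be multiplied by a fixed constant
     • Deformable Convolution: 2017. Lets the kernel's sampling positions move apart freely to handle the variety of object shapes
     References:
     • Densely Connected Convolutional Networks, Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger, CVPR
     • EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, M. Tan et al., ICML
     • Squeeze-and-Excitation Networks, Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Enhua Wu, CVPR
     • Deformable Convolutional Networks, J. Dai et al., ICCV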
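  21. How CNNs are applied to audio data
     • The CNN is used for the feature-extraction step of an audio classification pipeline
     • Figure labels: Front-End, Back-End
     • Stages and typical choices:
       • Preprocessing such as filter banks: mel spectrogram, CQT
       • Local feature extraction: CNN
       • Feature summarization and classification: Global Average Pooling, RNN/MLP, Attention
     • Example: apply the CNN locally over a few frames at a time, then summarize the results by pooling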
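The summarization step can be sketched with global average pooling over time. The feature values are made up and stand in for local features that a CNN front end would produce (2 channels per time step here).

```python
def global_average_pooling(feature_sequence):
    """Average (time, channels) features over time into one vector of length channels."""
    num_frames = len(feature_sequence)
    num_channels = len(feature_sequence[0])
    return [sum(frame[c] for frame in feature_sequence) / num_frames
            for c in range(num_channels)]

# hypothetical local features: one list of channel values per time step
short_clip = [[1.0, 0.0], [3.0, 2.0]]
long_clip = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0], [2.0, 2.0]]

short_summary = global_average_pooling(short_clip)
long_summary = global_average_pooling(long_clip)
print(short_summary)  # [2.0, 1.0]
print(long_summary)   # [2.0, 2.0]
```

Clips of different lengths are reduced to vectors of the same size, so the classifier that follows can have a fixed input dimension.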
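  22. Audio model (1): waveform + 1D CNN
     • Feed the raw signal waveform into a one-dimensional (1D) CNN
     • With frame-level kernel sizes (256, 512, 1024), performance was lower than with mel spectrograms
     • Dropping to sample-level kernel sizes (3, 9) and making the network deeper improves performance (SampleCNN)
     • Waveform input becomes more effective the more data is available
     References:
     • Sample-level Deep Convolutional Neural Networks for Music Auto-Tagging Using Raw Waveforms, Jongpil Lee, Jiyoung Park, Keunhyoung Luke Kim, Juhan Nam, SMC
     • End-to-end learning for Music Audio, Sander Dieleman, Benjamin Schrauwen, ICASSP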
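To make the difference between frame-level and sample-level kernels concrete, here is a minimal 1D convolution and an output-length count. The 16 kHz sampling rate, the stride equal to the kernel size, and the fixed summing kernel are assumptions for illustration; in a CNN the kernel weights would be learned.

```python
def conv1d(signal, kernel, stride=1):
    """1D convolution without padding (cross-correlation form) with a given stride."""
    k = len(kernel)
    return [sum(signal[start + i] * kernel[i] for i in range(k))
            for start in range(0, len(signal) - k + 1, stride)]

def num_outputs(num_samples, kernel_size, stride):
    return (num_samples - kernel_size) // stride + 1

signal = [0, 1, 2, 3, 4, 5, 6, 7, 8]
summing_kernel = [1, 1, 1]          # hypothetical fixed kernel
strided = conv1d(signal, summing_kernel, stride=3)
print(strided)  # [3, 12, 21]

# One second of audio at an assumed 16 kHz sampling rate
print(num_outputs(16000, 512, 512))  # frame-level kernel: 31
print(num_outputs(16000, 3, 3))      # sample-level kernel: 5333
```

A sample-level first layer keeps far more time steps, which is why such models stack many layers to cover the same span of audio.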
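  23. Audio model (2): 2D CNN + spectrogram
     • Applies the image-style CNN with 2D kernels (2D CNN) explained so far
     • The spectrogram is treated as a 2D array and processed as a 1-channel image
     • In most cases this performs better than a 1D CNN
     • The vertical axis (H) is frequency and the horizontal axis (W) is time
     • Building on this structure, CNNs with non-square kernels have also been proposed
     References:
     • Deep Learning for Audio-Based Music Classification and Tagging, Juhan Nam, Keunwoo Choi, Jongpil Lee, Szu-Yu Chou, and Yi-Hsuan Yang, IEEE SPM
     • Experimenting with musically motivated convolutional neural networks, Jordi Pons, Thomas Lidy, Xavier Serra, CBMI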
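  24. Summary
     • This session explained CNNs
     • A CNN is an NN whose parameters are reduced, for high-dimensional sensor data, on the basis of locality and position invariance
     • It consists of convolutional layers, pooling layers, and fully connected layers
     • Many models have been proposed for image processing
     • For audio data, spectrogram + 2D CNN is the most widely used combination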
  25. Appendix

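  26. How to build the output layer
     • Representing the ground truth
       • Single-label classification: one-hot representation (for 5 classes, [0,0,1,0,0])
       • Multi-label classification: multi-hot representation (for 5 classes, [1,0,1,0,1])
     • Representing the output layer
       • Single-label classification: Softmax, which scales the outputs so that they sum to 1 across classes
       • Multi-label classification: Sigmoid, which outputs an independent ON/OFF probability for each class; the range is [0,1]
     • Single-label tasks: music classification, singer identification, etc. Multi-label tasks: music tagging, instrument recognition from mixtures, etc.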
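The two target encodings and the two output activations can be sketched side by side. The logits are hypothetical network outputs for 5 classes, and the 0.5 decision threshold for the multi-label case is a common convention assumed here.

```python
import math

def softmax(logits):
    peak = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    return [1 / (1 + math.exp(-v)) for v in logits]

def one_hot(index, num_classes):
    return [1 if i == index else 0 for i in range(num_classes)]

def multi_hot(indices, num_classes):
    return [1 if i in indices else 0 for i in range(num_classes)]

print(one_hot(2, 5))            # [0, 0, 1, 0, 0]
print(multi_hot({0, 2, 4}, 5))  # [1, 0, 1, 0, 1]

logits = [2.0, -1.0, 0.5, 0.0, -2.0]   # hypothetical outputs for 5 classes
single_label = softmax(logits)   # probabilities that sum to 1 across classes
multi_label = sigmoid(logits)    # one independent probability per class
predicted_class = single_label.index(max(single_label))
active_tags = [p > 0.5 for p in multi_label]
print(predicted_class)  # 0
print(active_tags)      # [True, False, True, False, False]
```

Softmax forces the classes to compete for a single label, while sigmoid lets any number of tags be active at once.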
  27. Resources
     • Well-known pretrained models
       • https://github.com/minzwon/sota-music-tagging-models
       • https://github.com/jordipons/musicnn
       • https://github.com/tensorflow/models/tree/master/research/audioset/vggish
       • https://github.com/marl/openl3
     • Datasets for music classification
       • GTZAN: http://marsyas.info/downloads/datasets.html
       • Free Music Archive: https://github.com/mdeff/fma
       • MagnaTagaTune: https://mirg.city.ac.uk/codeapps/the-magnatagatunedataset
       • Million Song Dataset: http://millionsongdataset.com/