LSPC deep-people for music processing #05 CNN
Yuya Yamamoto
May 05, 2022
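These are the materials from a study group on deep learning × music data held at the University of Tsukuba's laboratory for the informatics of people and sound (人と音の情報学研究室).
They may contain errors. If you find any, please point them out.
#05 Convolutional neural networks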
Transcript
Convolutional Neural Network (CNN) Deep-people #5
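
Recap of the previous session
• Deep learning overview
• Perceptron -> how neural networks work
• Activation functions
• Training (gradient descent, backpropagation, etc.)
• PyTorch basics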

Today's topic
• Convolutional neural networks (hereafter, CNN)
• A neural network that achieved success in image processing
Reference: Gradient-Based Learning Applied to Document Recognition, Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner
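
NNs are inefficient for sensor data such as image and audio data
• Image and audio data are signals, and they have high dimensionality
  • Image: 256 * 256
  • Audio: 128 * 300 for a 3-second mel-spectrogram
• Applying an NN (linear combination + activation function) leads to a very large number of parameters
  • With a hidden layer of size 256, a 256x256 color image needs about 50 million parameters
• Left as is, overfitting is all but certain
• Can the number of parameters be reduced in some sensible way?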
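
The 50-million figure can be checked with a few lines of arithmetic. The sketch below assumes a single fully-connected layer that maps the flattened input to 256 hidden units and counts weights only (biases are left out). The function name `dense_weight_count` is made up for this illustration.

```python
def dense_weight_count(height, width, channels, hidden_units):
    """Number of weights in one fully-connected layer on a flattened input."""
    input_size = height * width * channels  # every pixel value is one input
    return input_size * hidden_units        # one weight per (input, hidden unit) pair

# 256x256 color image (3 channels), hidden layer of size 256
image_params = dense_weight_count(256, 256, 3, 256)
# 3-second mel-spectrogram of 128 x 300 bins, treated as a 1-channel input
audio_params = dense_weight_count(128, 300, 1, 256)

print(image_params)  # 50331648, roughly 50 million
print(audio_params)  # 9830400
```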
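
Focus on the structure that images and sound have
Locality: the target object is grouped together in space
• The region containing a bird is not scattered across the whole image
• A drum sound forms a cluster in a specific time span and frequency band
Position invariance: the target is independent of position, so the target itself does not change wherever it is
• A bird can be recognized as a bird wherever it appears
• The acoustic properties do not change whenever the sound is played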
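
Focus strongly on local information!
• Apply the linear operation only to a fixed region of interest (the receptive field)
• -> If the weights are also made position-invariant (shared), the number of parameters drops even further!
• This is equivalent to the convolution operation!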
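
A rough count shows how much each of the two ideas saves. The sketch below assumes a 256x256 RGB input, a 3x3 receptive field, and one output unit per pixel position. These sizes are chosen for illustration and are not taken from the slides.

```python
height, width, channels = 256, 256, 3
k = 3  # assumed receptive field of 3x3 pixels

# A fully-connected unit is wired to every value of the image.
weights_per_unit_dense = height * width * channels
# A unit restricted to its receptive field sees only a k x k patch.
weights_per_unit_local = k * k * channels

# One unit per position: without sharing, every position has its own weights.
units_per_map = height * width
weights_local_unshared = units_per_map * weights_per_unit_local
# With sharing, the same k x k x channels weights are reused at every position.
weights_shared = weights_per_unit_local

print(weights_per_unit_dense)   # 196608
print(weights_local_unshared)   # 1769472
print(weights_shared)           # 27
```

With shared weights the count no longer depends on the image size, only on the receptive field and the number of channels.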
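
Mechanisms of a CNN
• Convolution layer
  • Convolution computation
  • Stride
  • Padding
• Pooling layer
  • Downsampling methods
• Fully-connected layer
• CNN -> feature extraction, with the fully-connected layers as the classifier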

Convolutional layer
• Convolves a "kernel" over the input data and outputs a feature map
• The kernel captures local information
• Multiply each pixel by the kernel -> sum the products -> shift the kernel ... and repeat
Figure (input, kernel, output) from https://medium.datadriveninvestor.com/convolutional-neural-networks-3b241a5da51e
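
The multiply, sum, and shift loop can be written out directly. This is a minimal sketch in plain Python, assuming stride 1 and no padding. The input and kernel values are made up for illustration. Like most CNN libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (lists of lists) with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):          # shift the kernel down
        row = []
        for j in range(out_w):      # shift the kernel right
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    # multiply each covered pixel by the matching kernel weight
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)       # the sum of products is one output cell
        output.append(row)
    return output

image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [
    [2, 0, 1],
    [0, 1, 2],
    [1, 0, 2],
]
feature_map = conv2d(image, kernel)
print(feature_map)  # [[15, 16], [6, 15]]
```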
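
Spatial feature extraction by convolution
• Depending on the combination of kernel weights w, a variety of features can be captured
• Top: Laplacian filter (outlines), from https://algorithm.joho.info/image-processing/laplacian-filter/
• Bottom: Gaussian filter (blur), from https://algorithm.joho.info/image-processing/gaussian-filter/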
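
To make the two filters concrete, the sketch below applies a 4-neighbor Laplacian kernel and a 3x3 Gaussian kernel to a toy image. The kernel values are common textbook choices and the image is invented for this example. Neither is taken from the linked pages.

```python
def conv2d(image, kernel):
    """Stride-1, no-padding sliding sum of products (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]

# 4-neighbor Laplacian kernel: responds where intensity changes (outlines)
laplacian = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]
# 3x3 Gaussian kernel (weights sum to 1): local weighted average (blur)
gaussian = [[1 / 16, 2 / 16, 1 / 16],
            [2 / 16, 4 / 16, 2 / 16],
            [1 / 16, 2 / 16, 1 / 16]]

# Toy image: a bright 3x3 square on a dark background
image = [[0, 0, 0, 0, 0],
         [0, 9, 9, 9, 0],
         [0, 9, 9, 9, 0],
         [0, 9, 9, 9, 0],
         [0, 0, 0, 0, 0]]

edges = conv2d(image, laplacian)
blurred = conv2d(image, gaussian)
print(edges)                          # [[-18, -9, -18], [-9, 0, -9], [-18, -9, -18]]
print(blurred[1][1], blurred[0][0])   # 9.0 5.0625
```

The Laplacian output is zero in the flat interior of the square and non-zero along its border, while the Gaussian output softens the corner of the square from 9 to about 5.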
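
In a CNN
• Use multiple kernels to obtain multiple features (feature maps)
• The kernel values are learned, which makes the feature extraction adaptive
Figure (input data, kernel, feature map) from https://ml4a.github.io/ml4a/jp/convnets/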
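
Input and output of a convolutional layer
• For 2D convolution, the input is 3-dimensional: Channel, Height, Width
• Channel: 3 (RGB) for a color image, 1 for a spectrogram
• With mini-batch training, the input is 4-dimensional: (Batch, Channel, Height, Width)
• The number of output channels equals the number of kernels applied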

Stride and padding
• Stride: the step by which the kernel is shifted
• Padding: surround the input with new cells of value 0 to enlarge the input size
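
How kernel count, stride, and padding determine the output shape can be sketched with the usual formula, out = floor((in + 2 * padding - kernel) / stride) + 1. The helper below is a hypothetical illustration, not a library function. It assumes the same padding on every side, and the batch size of 8 and 16 kernels are arbitrary.

```python
def conv_output_shape(in_shape, num_kernels, kernel_size, stride=1, padding=0):
    """Output shape of a 2D convolution for a (Batch, Channel, Height, Width) input."""
    batch, _, height, width = in_shape  # input channels do not affect the output shape
    kh, kw = kernel_size
    out_h = (height + 2 * padding - kh) // stride + 1
    out_w = (width + 2 * padding - kw) // stride + 1
    # The output has one channel per kernel.
    return (batch, num_kernels, out_h, out_w)

# Mini-batch of 8 one-channel spectrograms of size 128 x 300, 16 kernels of 3x3
spec_batch = (8, 1, 128, 300)
no_padding = conv_output_shape(spec_batch, 16, (3, 3))
same_size = conv_output_shape(spec_batch, 16, (3, 3), stride=1, padding=1)
strided = conv_output_shape(spec_batch, 16, (3, 3), stride=2, padding=1)

print(no_padding)  # (8, 16, 126, 298)
print(same_size)   # (8, 16, 128, 300)
print(strided)     # (8, 16, 64, 150)
```

Padding of 1 with a 3x3 kernel keeps height and width unchanged, while a stride of 2 roughly halves them.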
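
Pooling layer
• Downsamples the feature map obtained by convolution
• Serves as dimensionality reduction -> considered important for acquiring position invariance
• Max pooling: takes the maximum value of the region
• Average pooling: uses the average of the region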
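
Both pooling variants differ only in how each window is reduced. This is a minimal sketch assuming non-overlapping square windows (stride equal to the window size). The feature map values are made up.

```python
def pool2d(feature_map, size, reduce_fn):
    """Reduce each non-overlapping size x size window with `reduce_fn`."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled

def mean(values):
    return sum(values) / len(values)

feature_map = [
    [1, 2, 1, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 4, 0, 1],
]
max_pooled = pool2d(feature_map, 2, max)
avg_pooled = pool2d(feature_map, 2, mean)
print(max_pooled)  # [[2, 3], [4, 2]]
print(avg_pooled)  # [[1.0, 1.5], [2.25, 1.0]]
```

The 4x4 map shrinks to 2x2 in both cases, which is the downsampling described above.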
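
Fully-connected layer
• A layer in which every cell is connected to every cell of the next layer
• The layer that performs the affine transformation covered in the previous session
• With an activation function, of course
• Also called a Dense layer
• The layer used to convert to the output layer
• The output dimension is matched to the desired number of output dimensions
  • Classification: the number of classes (class probabilities)
  • Regression: 1 (the estimated regression value), etc.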
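
The affine transformation followed by an activation can be sketched as below. ReLU is used only as an example activation (the slides do not name one), and the weights, biases, and inputs are arbitrary values.

```python
def relu(z):
    return max(0.0, z)

def dense(inputs, weights, biases, activation=None):
    """Affine transform: one weighted sum plus bias per output unit."""
    outputs = []
    for weight_row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(weight_row, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs

features = [1.0, 2.0, 3.0]          # e.g. flattened CNN features
weights = [[1.0, 0.0, -1.0],        # one row of weights per output unit
           [0.5, 0.5, 0.5]]
biases = [0.0, 1.0]

linear_out = dense(features, weights, biases)
activated_out = dense(features, weights, biases, activation=relu)
print(linear_out)     # [-2.0, 4.0]
print(activated_out)  # [0.0, 4.0]
```

The number of weight rows sets the output dimension, so a 5-class classifier would use 5 rows and a scalar regressor would use 1.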

Actual models
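
Well-known image models, part 1
• AlexNet (2012): improved performance by making the network deeper. Won the ImageNet challenge for image classification by a wide margin over second place and below
• VGG (2015): improved accuracy by making the kernels small (3x3) and the network even deeper. Often used in many studies as the simplest model
• GoogLeNet (2015): computes efficiently with fewer parameters. Introduces the Inception mechanism, which applies small convolutions or pooling to part of the input to reduce parameters
• ResNet: introduces skip connections (adding the input to a layer's output) so that gradients do not vanish even when the network is made deeper
References
• Deep Residual Learning for Image Recognition, Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, CVPR
• Going Deeper with Convolutions, Szegedy et al., CVPR
• Very Deep Convolutional Networks for Large-Scale Image Recognition, Karen Simonyan, Andrew Zisserman, ICLR
• ImageNet classification with deep convolutional neural networks, Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, NIPS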
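
Well-known image models, part 2
• DenseNet (2017): inspired by ResNet. Concatenates the intermediate outputs of all layers and reduces parameters
• Squeeze-and-Excitation (SENet) (2018): improves performance by adding a gating mechanism that weights the useful feature maps
• EfficientNet (2019): asks how a CNN should be designed for true efficiency. Tunes layer depth, kernel width, and input resolution. When scaling up, each factor is multiplied by a fixed constant
• Deformable Convolution (2017): lets the kernel sampling positions spread out irregularly to handle the variety of object shapes
References
• Densely Connected Convolutional Networks, Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger, CVPR
• EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, M. Tan et al., ICML
• Squeeze-and-Excitation Networks, Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Enhua Wu, CVPR
• Deformable Convolutional Networks, J. Dai et al., ICCV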
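
Applying CNNs to audio data
• Used for feature extraction within the steps of audio classification
Figure: pipeline from front-end to back-end: filterbank preprocessing -> local feature extraction -> feature summarization -> classification. Components shown: mel-spectrogram, CQT, CNN, Global Average Pooling, RNN/MLP, Attention
• e.g. apply a CNN locally to each frame, then aggregate the results with a pooling layer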
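
The summarization step can be illustrated with global average pooling over frames. In this sketch the per-frame feature vectors stand in for the output of a CNN applied to each frame. Their values and the two-dimensional feature size are invented for the example.

```python
def global_average_pooling(frame_features):
    """Average each feature dimension over all frames (the time axis)."""
    num_frames = len(frame_features)
    num_dims = len(frame_features[0])
    return [
        sum(frame[d] for frame in frame_features) / num_frames
        for d in range(num_dims)
    ]

# Hypothetical CNN outputs: one 2-dimensional feature vector per frame
short_clip = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
long_clip = short_clip + [[2.0, 2.0], [2.0, 2.0]]

short_summary = global_average_pooling(short_clip)
long_summary = global_average_pooling(long_clip)
print(short_summary)  # [2.0, 2.0]
print(len(short_summary) == len(long_summary))  # True
```

Clips of different lengths are reduced to vectors of the same size, which is what allows a fixed-size classifier to follow.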
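
Audio model 1: waveform + 1D CNN
• Feed the signal waveform into a one-dimensional (1D) CNN
• With frame-level kernel sizes (256, 512, 1024), performance was lower than with mel-spectrograms
• Going down to the sample level (3, 9) and making the network deeper improves performance (SampleCNN)
• The more data there is, the more effective waveform input becomes
References
• Sample-level Deep Convolutional Neural Networks for Music Auto-Tagging Using Raw Waveforms, Jongpil Lee, Jiyoung Park, Keunhyoung Luke Kim, Juhan Nam, SMC
• End-to-end learning for Music Audio, Sander Dieleman, Benjamin Schrauwen, ICASSP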
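
The difference between frame-level and sample-level kernels is easiest to see in how many samples one kernel covers. The sketch below is a plain 1D convolution with the stride set equal to the kernel length. That stride, the toy waveform, and the kernel values are assumptions for illustration, not details from the slides.

```python
def conv1d(signal, kernel, stride=1):
    """Slide a 1D kernel over the signal, taking a sum of products at each step."""
    k = len(kernel)
    return [
        sum(signal[start + i] * kernel[i] for i in range(k))
        for start in range(0, len(signal) - k + 1, stride)
    ]

# Sample-level: a kernel covering only 3 samples
waveform = [0, 1, 2, 3, 4, 5, 6, 7, 8]
difference_kernel = [1, 0, -1]
sample_level = conv1d(waveform, difference_kernel, stride=3)
print(sample_level)  # [-2, -2, -2]

# Output lengths for 1024 samples with frame-level vs sample-level kernels
long_waveform = [0.0] * 1024
frame_level_len = len(conv1d(long_waveform, [1.0] * 256, stride=256))
sample_level_len = len(conv1d(long_waveform, [1.0] * 3, stride=3))
print(frame_level_len, sample_level_len)  # 4 341
```

A sample-level first layer keeps far more time steps, so more layers are needed to cover the same span of audio, which matches the deeper networks mentioned above.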
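
Audio model 2: 2D CNN + spectrogram
• Applies the CNN with 2D kernels (2D CNN) explained so far for images
• The spectrogram is regarded as a 2D array and processed as a 1-channel image
• In many cases this performs better than a 1D CNN
• The vertical axis (H) is frequency and the horizontal axis (W) is time
• CNNs with non-square kernels that exploit this structure have also been proposed
References
• Deep Learning for Audio-Based Music Classification and Tagging, Juhan Nam, Keunwoo Choi, Jongpil Lee, Szu-Yu Chou, and Yi-Hsuan Yang, IEEE SPM
• Experimenting with musically motivated convolutional neural networks, Jordi Pons, Thomas Lidy, Xavier Serra, CBMI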
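
Summary
• This session explained CNNs
• A CNN is an NN for high-dimensional sensor data whose parameters are reduced on the basis of locality and position invariance
• It has convolutional layers, pooling layers, and fully-connected layers
• Many models have been proposed in image processing
• For audio data, spectrogram + 2D CNN is the most commonly used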

Supplement
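
How to build the output
• Representation of the ground truth
  • Single-label classification: one-hot representation (for 5 classes, [0,0,1,0,0])
  • Multi-label classification: multi-hot representation (for 5 classes, [1,0,1,0,1])
• Representation of the output layer
  • Single-label classification: Softmax, scaled so that the values sum to 1 across classes
  • Multi-label classification: Sigmoid, which outputs the probability of ON/OFF independently for each class. The range is [0,1]
• Single-label: music genre classification, singer identification. Multi-label: music tagging, instrument recognition from mixed audio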
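
The two output representations can be compared on the same raw scores. This is a minimal sketch in plain Python. The logits are made up, and the 0.5 threshold for turning sigmoid outputs into tags is a common convention assumed here, not something stated in the slides.

```python
import math

def softmax(logits):
    """Scale scores so they are positive and sum to 1 across classes."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Map each score independently to a probability in [0, 1]."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

logits = [2.0, -1.0, 0.5, -0.2, -3.0]  # raw network outputs for 5 classes

# Single-label: pick the class with the highest softmax probability
class_probs = softmax(logits)
predicted_class = class_probs.index(max(class_probs))

# Multi-label: each class is switched ON independently (assumed threshold 0.5)
tag_probs = sigmoid(logits)
predicted_tags = [1 if p >= 0.5 else 0 for p in tag_probs]

print(predicted_class)  # 0
print(predicted_tags)   # [1, 0, 1, 0, 0]
```

The softmax probabilities sum to 1, so only one class can dominate, whereas the sigmoid outputs can mark several tags as present at once.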

Resources
• Well-known pretrained models
  • https://github.com/minzwon/sota-music-tagging-models
  • https://github.com/jordipons/musicnn
  • https://github.com/tensorflow/models/tree/master/research/audioset/vggish
  • https://github.com/marl/openl3
• Datasets for music classification
  • GTZAN: http://marsyas.info/downloads/datasets.html
  • Free Music Archive: https://github.com/mdeff/fma
  • MagnaTagaTune: https://mirg.city.ac.uk/codeapps/the-magnatagatunedataset
  • Million Song Dataset: http://millionsongdataset.com/