muana vol.11: Pre-trained Models for Music Recognition
Yuya Yamamoto
October 28, 2023
Muana vol.11 WIP.
Transcript
An Overview of Pre-trained Models That Look Useful for Music Recognition
2023/10/28 Music x Analytics meetup vol.11
Yuya Yamamoto (yamathcy)
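Self-introduction
• Yuya Yamamoto (yamathcy)
• Third-year doctoral student, Graduate School, University of Tsukuba
• Specialty: music and acoustic information processing
• Analyzing "singing techniques", the distinctive ways in which singers sing
• Recently solving singing voice analysis tasks with SSL models
• Current musical obsessions: 4'33" Tsukuba Remix, and listening to pieces for the tabla, an Indian percussion instrument
• Special skill: "Jagariko fail"
github.io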
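A quick plug, if I may m(_ _)m
• I am organizing a paper-reading session for ISMIR2023, the international conference on music information processing!
• Online, Wednesday 22 November 2023, 18:00-
• The proceedings are not yet public, so the date may be rescheduled 🙇 (an announcement will follow later)
• (If rescheduled, it is planned for early December)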
Main topic
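Pre-trained models for music recognition
A quick run through the available models and how to apply them to your own task.
This talk may lean toward an intermediate-or-above audience.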
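Music recognition
• A task that takes a music audio signal (e.g. a wav file) as input and outputs its constituent elements
• Examples of outputs: genre, mood, instrumentation, score, lyrics, etc…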
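Why does music recognition need pre-trained models in the first place?
• Reason 1: lack of datasets
• Music has overwhelmingly too little data
• Annotations are even scarcer, and they are costly to create
• Reason 2: the difficulty of the task itself
• Analyzing music is hard to begin with
• Using DNNs gives better performance
https://yamathcy.github.io/ISMIR2022J-POP/
I (Yamamoto) built a singing voice dataset of about 5 hours, like the one linked above, but it took nearly a year to produce...
The dilemma: we want to bring in the power of DNNs, but there is not enough data.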
How many are there?
Right now, an enormous number.
Tutorial: Self-supervised Representation Learning for Speech Processing
NAACL 2022 https://sites.google.com/view/tutorial-ssl-speech
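A rough categorization (my own subjective take)
• Models based on supervised learning: trained on some task with supervision, then transferred to a similar task
• Models based on self-supervised learning: trained by assigning pseudo-labels to unlabeled data
• Models built for other purposes such as generation, repurposed: intermediate outputs of models not originally meant for recognition are used for recognition
Models shown: Musicnn, CREPE, VGGish, PANNs, AST, Whisper, Wav2Vec2.0, HuBERT, WavLM, CLAP, MapMusic2Vec, MERT, Data2Vec, JukeMIR (uses the features of JukeBox, a generation model), Encodec (a high-performance audio compression model)
Color coding on the slide: music, speech (blue), environmental sound and general audio (black)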
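JukeMIR [Castellon 21]
"Let's just use the intermediate features of the music generation model JukeBox for recognition"
• Uses the output of Jukebox's 36th layer as features for recognition
• Better performance than any of the well-known models on 4 kinds of tasks
• There is also an application to transcribing song melodies by ear [Donahue 22]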
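As a rough illustration of turning one layer's frame-level outputs into a single feature vector for a classifier, here is a minimal sketch. The slide does not say how the frames are aggregated; averaging over time is an assumption made here as one common choice, and `layer_activations` is hypothetical toy data, not real Jukebox output.

```python
def mean_pool_over_time(frame_features):
    """Average a (time x dim) list of frame vectors into one clip-level vector."""
    if not frame_features:
        raise ValueError("need at least one frame")
    n_frames = len(frame_features)
    dim = len(frame_features[0])
    return [sum(frame[d] for frame in frame_features) / n_frames for d in range(dim)]


# Hypothetical activations from one intermediate layer: 3 frames, 2 dimensions.
layer_activations = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
clip_embedding = mean_pool_over_time(layer_activations)
print(clip_embedding)
```

The resulting vector would then be fed to a small downstream classifier while the generative model itself stays untouched.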
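MERT [Li 22]
A BERT-like encoder model for music audio signals
• A pre-trained model specialized for music recognition
• Like BERT, it learns by estimating masked parts of the input
• Adds music-specific training (reconstruction of pitch features)
• Comparable to SoTA on many tasks
I wrote an explanatory article → https://qiita.com/yamathcy/items/f2f27468c5b5c4dc24a9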
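The masked-estimation idea can be sketched as follows. This is only an illustration of BERT-style span masking, not MERT's actual recipe: it assumes the audio has already been turned into a sequence of discrete pseudo-label IDs, and `MASK`, `mask_prob`, and `span` are hypothetical names and values chosen for the example.

```python
import random

MASK = -1  # hypothetical sentinel ID standing in for a learned mask embedding


def mask_spans(frames, mask_prob=0.2, span=3, seed=0):
    """Mask random spans of frames; return the masked sequence and the targets."""
    rng = random.Random(seed)
    positions = set()
    for start in range(len(frames)):
        # Each frame may start a masked span of up to `span` frames.
        if rng.random() < mask_prob:
            positions.update(range(start, min(start + span, len(frames))))
    masked = [MASK if i in positions else f for i, f in enumerate(frames)]
    # The model is trained to predict the original value at masked positions only.
    targets = {i: frames[i] for i in sorted(positions)}
    return masked, targets


frames = list(range(100, 120))  # toy pseudo-label IDs for 20 frames
masked, targets = mask_spans(frames)
print(masked)
print(targets)
```

During pre-training, the loss would be computed only on the entries of `targets`, so the encoder has to infer the hidden frames from their context.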
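Multimodal models (especially music x natural language)
Learning the relationship that connects music and language
Language → music
- MusicLM
- MusicGen
Music → language
- MuLan
- MusCALL
- LPMusicCaps
- MU-LLaMa
- LLark -> New!!
※ There are many more if you count service-only offerings, but they are omitted because they fall outside the scope of this talk.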
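Music x LLM
Models that use language to analyze music
• LP-MusicCaps: generates a descriptive text about the input song [Doh 23]
• Mu-LLaMA: question answering about the content of music, using MERT + LLaMA [Liu 23]
• LLark: generalizes question answering and description generation, with a stronger focus on musical elements (chords / tempo) [Gardner 23]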
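How do we go about using them?
• Libraries
• Hugging face: https://huggingface.co/
• s3prl: https://s3prl.github.io/s3prl/index.html
• A toolkit specialized for self-supervised learning on speech
• Follow the trends in papers and demos
• Progress is not just rapid, it moves by the second. Code and models are often released too, so keeping your antenna up may pay off.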
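How do we go about using them?
Use on the target task (downstream task) = fine-tuning
General ways to make use of a pre-trained model:
• Replace only the final layer
• Use the output probabilities as new features
• Continue training with additional data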
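To make the "replace only the final layer" option concrete, here is a minimal pure-Python sketch: a frozen feature extractor whose parameters are never touched, plus a new logistic-regression head trained on its outputs. `frozen_encoder` is a hypothetical stand-in for a real pre-trained model, and the data, learning rate, and epoch count are toy values assumed for illustration.

```python
import math


def frozen_encoder(x):
    """Hypothetical stand-in for a pre-trained model; it is never updated."""
    return [x[0] + x[1], x[0] - x[1]]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, feature):
    return sigmoid(sum(wi * fi for wi, fi in zip(w, feature)) + b)


def train_head(features, labels, lr=0.5, epochs=200):
    """Fit only the new final layer (logistic regression) with plain SGD."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for feature, y in zip(features, labels):
            grad = predict_proba(w, b, feature) - y  # gradient of the loss w.r.t. the logit
            w = [wi - lr * grad * fi for wi, fi in zip(w, feature)]
            b -= lr * grad
    return w, b


inputs = [[0.0, 1.0], [0.2, 0.9], [1.0, 0.0], [0.9, 0.1]]  # toy downstream data
labels = [0, 0, 1, 1]
features = [frozen_encoder(x) for x in inputs]  # computed once; encoder stays fixed
w, b = train_head(features, labels)
predictions = [int(predict_proba(w, b, f) > 0.5) for f in features]
print(predictions)
```

Because the encoder is frozen, its features can be computed once and cached, which keeps this approach cheap compared with updating the whole model.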
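How do we go about using them?
Tips for recent large models
• Train only the Transformer encoder (🔥 = trained, ❄ = frozen)
• Take a weighted average of the outputs of each layer
• Parameter-efficient fine-tuning such as Adapter, LoRA, and prefix tuning [Chen 23]
• Use only the outputs of a few useful layers; for speech, earlier and later layers carry different information [Chen 22]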
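The weighted average over layer outputs can be sketched like this. It assumes each layer's output has already been reduced to one vector per clip, and that the per-layer scores are turned into weights with a softmax; in practice the scores would be learnable parameters trained together with the downstream head. All names here are illustrative.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(layer_outputs, layer_scores):
    """Combine per-layer vectors (L x D) into one D-dim vector using softmax weights."""
    weights = softmax(layer_scores)
    dim = len(layer_outputs[0])
    return [
        sum(w * layer[d] for w, layer in zip(weights, layer_outputs))
        for d in range(dim)
    ]


layer_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy outputs of 3 layers
uniform = weighted_layer_sum(layer_outputs, [0.0, 0.0, 0.0])   # equal weights
last_only = weighted_layer_sum(layer_outputs, [0.0, 0.0, 50.0])  # nearly all weight on layer 3
print(uniform, last_only)
```

Letting the weights be learned is a lightweight way to let each downstream task pick the layers that carry the most useful information for it.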
Which one should I use?
Check the respective papers and benchmarks
• Speech: SUPERB
• Music recognition: MARBLE
Beyond that, the only option is to try them yourself…
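Which one should I use?
Pay attention to the source domain
• For singing voice, speech models are usable to some extent, and in some cases they are better than music models
• Outside of that, there are cases where things do not work very well
• There is also research that adapts models using a mechanism called Reprogramming
For the task of identifying the singer, speech > music [Yamamoto 23]
For pitch / instrument recognition with Wav2Vec2.0, re-training on music works better than using it as-is with fine-tuning [Legano 23]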
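Conclusion
The use of pre-trained models is thriving in music too!
• Introduced a variety of models
• Both recognition of music itself and multimodal approaches are still developing!
• There is still a great deal left to be done
Thank you!!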
Appendix
HuBERT
• The model that MERT is based on
JukeBOX
• A music generation model from OpenAI, based on VQVAE
• https://openai.com/research/jukebox
(Figure panels: during training / during generation)
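Reprogramming
• Performs a mapping between the task the model was trained on and the task you want to train
• For music recognition tasks, there are examples of applying it from environmental sound and speech models [Hung 23]
https://github.com/ga642381/Speech-Prompts-Adapters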