muana vol.11: Pre-trained models for music recognition
Yuya Yamamoto
October 28, 2023
Muana vol.11 WIP.
Transcript
An overview of pre-trained models that look useful for music recognition
2023/10/28 Music x Analytics meetup vol.11
Yuya Yamamoto (yamathcy)
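Self-introduction
• Yuya Yamamoto (yamathcy)
• Third-year doctoral student, Graduate School, University of Tsukuba
• Field: music and audio information processing
• Analyzes "singing techniques", the distinctive ways singers sing
• Recently solving singing voice analysis tasks with SSL models
• Current musical obsessions: 4'33" Tsukuba Remix; listening to pieces for the tabla, an Indian percussion instrument
• Special skill: Jagariko fails
• github.io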
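A quick plug, if I may m(_ _)m
• I am organizing a paper reading session for ISMIR2023, the international conference on music information processing!
• Online, Wednesday 22 November 2023, from 18:00
• The proceedings are not yet public, so the session may be rescheduled 🙇 (to be announced later)
• (If rescheduled, it is planned for early December)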
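Main part
Pre-trained models for music recognition
A quick run through the models and how to apply them to your own task.
This may lean toward an intermediate-or-above audience.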
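Music recognition
• A task that takes a music audio signal (e.g. wav) as input and outputs its constituent elements
• Examples: genre, mood; instrumentation; score, lyrics, etc…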
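Why does music recognition need pre-trained models in the first place?
• Reason 1: not enough datasets
• Music is overwhelmingly short of data
• Annotations are even scarcer, and they are costly to create
• Reason 2: the tasks themselves are hard
• Analyzing music is difficult to begin with
• Using DNNs gives better performance
https://yamathcy.github.io/ISMIR2022J-POP/
Yamamoto built a singing voice dataset of about 5 hours like the one linked above, and creating it took nearly a year...
The dilemma: we want to bring in the power of DNNs, but data are scarce.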
How many are there?
Right now, a huge number.
Figure source: Tutorial "Self-supervised Representation Learning for Speech Processing", NAACL 2022, https://sites.google.com/view/tutorial-ssl-speech
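Rough categorization (my own subjective view)
• Models based on supervised learning: trained with supervision on one task, then transferred to a similar task
• Models based on self-supervised learning: trained on unlabeled data by assigning pseudo-labels
• Models repurposed from generation or other purposes: intermediate outputs of a model not originally built for recognition are used for recognition
Models shown on the slide: Musicnn, CREPE, VGGish, PANNs, AST, Whisper, Wav2Vec2.0, HuBERT, WavLM, CLAP, MapMusic2Vec, MERT, Data2Vec, JukeMIR (uses the features of JukeBox, a generation model), Encodec (a high-performance audio compression model)
Color legend on the slide: […]: music, blue: speech, black: environmental sound, general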
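JukeMIR [Castellon 21]
"Let's just use the intermediate features of the music generation model JukeBox for recognition"
• Uses the output of Jukebox's 36th layer as features for recognition
• Better performance than any of the well-known models on four kinds of tasks
• Has also been applied to transcribing song melodies by ear [Donahue 22]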
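The recipe on this slide, taking one intermediate layer of a frozen model and turning it into a clip-level feature vector, can be sketched roughly as follows. This is a minimal plain-Python illustration, not the JukeMIR implementation: `layer_outputs` is a hypothetical stand-in for activations a real model would produce, the helper names are invented for this sketch, and mean pooling over time is an assumption. Only the layer index 36 comes from the slide.

```python
from statistics import fmean


def mean_pool(frames):
    """Average a (time x dim) list of frame vectors over the time axis."""
    dim = len(frames[0])
    return [fmean(frame[d] for frame in frames) for d in range(dim)]


def clip_features(layer_outputs, layer_index):
    """Pick one layer's activations and pool them into one vector per clip.

    layer_outputs: dict mapping layer index -> list of frame vectors
    (hypothetical stand-in for activations from a frozen model).
    """
    return mean_pool(layer_outputs[layer_index])


# Toy activations: 3 frames, 2 dimensions, for two layers.
layer_outputs = {
    35: [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
    36: [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
}
features = clip_features(layer_outputs, layer_index=36)
```

In practice the pooled vector would be passed to a small classifier trained on the downstream labels, while the large model stays frozen.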
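MERT [Li 22]
A BERT-like encoder model for music audio signals
• A pre-trained model specialized for music recognition
• Like BERT, it is trained by estimating masked segments
• Adds music-specific training (reconstruction of pitch features)
• Comparable to SoTA on many tasks
I wrote an explainer article → https://qiita.com/yamathcy/items/f2f27468c5b5c4dc24a9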
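Masked-segment estimation, as described on this slide, can be illustrated with a toy sketch: hide spans of frames, then score the model only on the hidden positions. This is a simplified illustration in plain Python and not MERT's training code. The span length, masking probability, and the function names are assumptions made for this example.

```python
import random


def sample_mask(num_frames, span=3, mask_prob=0.2, seed=0):
    """Return a boolean list marking the frames hidden from the encoder.

    Each frame starts a masked span of `span` frames with probability
    `mask_prob` (illustrative values, not MERT's actual settings).
    """
    rng = random.Random(seed)
    mask = [False] * num_frames
    for start in range(num_frames):
        if rng.random() < mask_prob:
            for t in range(start, min(start + span, num_frames)):
                mask[t] = True
    return mask


def masked_accuracy(predicted, target, mask):
    """Score predictions on masked frames only; None if nothing is masked."""
    scored = [(p, t) for p, t, m in zip(predicted, target, mask) if m]
    if not scored:
        return None
    return sum(p == t for p, t in scored) / len(scored)


targets = [0, 1, 1, 2, 2, 2, 0, 1, 3, 3]  # toy per-frame pseudo-labels
mask = sample_mask(len(targets), mask_prob=0.5)
score = masked_accuracy(targets, targets, mask)
```

A real setup would replace the masked frames in the encoder input with a learned mask embedding and train with a classification loss over the pseudo-labels; the accuracy here only shows which positions count.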
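Multimodal models (especially music x natural language)
Learning the relationship that links music and language
• Language → music: MusicLM, MusicGen
• Music → language: MuLan, MusCALL, LPMusicCaps, MU-LLaMa, LLark (New!!)
※ There are many more if service-only offerings are counted, but they are omitted because they fall outside the scope of this talk.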
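Music x LLM
Models that use language to analyze music
• LP-MusicCaps: generates a descriptive caption for the input track [Doh 23]
• Mu-LLaMA: question answering about musical content with MERT + LLaMA [Liu 23]
• LLark: generalizes question answering and caption generation, with more focus on musical elements (chords/tempo) [Gardner 23]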
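How we can put them to use
• Libraries
• Hugging Face: https://huggingface.co/
• s3prl: https://s3prl.github.io/s3prl/index.html (a toolkit specialized for self-supervised learning of speech)
• Follow the latest papers and demos
• Progress here is measured in seconds and minutes rather than days and months. Code and models are often released too, so keeping your antenna up may pay off.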
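How we can put them to use: using and fine-tuning on the target task (downstream task)
General ways to make use of a trained model:
• Replace only the final layer
• Use the output probabilities as new features
• Train further on the new task (fine-tuning)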
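How we can put them to use: tips for recent large models
• Train only the Transformer Encoder (figure: 🔥 🔥 🔥 🔥 ❄)
• Take a weighted average of the outputs of each layer
• Parameter-efficient FT such as Adapter, LoRA, and Prefix tuning [Chen 23]
• Use only the outputs of the few layers that are useful ⇔ for speech, earlier and later layers carry different information [Chen 22]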
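The weighted average of layer outputs mentioned in the tips above can be sketched as below. It is a minimal plain-Python sketch assuming softmax-normalized weights with one scalar score per layer. In a real setup those scores would be learned together with the downstream head, and the function names here are hypothetical.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(layer_features, layer_scores):
    """Combine per-layer feature vectors with softmax-normalized weights.

    layer_features: one feature vector per layer, all of the same length.
    layer_scores: one scalar per layer (learnable in a real setup).
    """
    weights = softmax(layer_scores)
    dim = len(layer_features[0])
    return [
        sum(w * feats[d] for w, feats in zip(weights, layer_features))
        for d in range(dim)
    ]


# Toy features from three layers of a frozen model.
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
uniform = weighted_layer_sum(layers, [0.0, 0.0, 0.0])   # equal weights
first_only = weighted_layer_sum(layers, [100.0, 0.0, 0.0])  # ~layer 0
```

Equal scores give a plain average, while a dominant score lets the downstream task effectively select a single layer, which connects to the tip about using only the useful layers.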
Which one should you use?
Check the respective papers and benchmarks
• Speech: SUPERB
• Music recognition: MARBLE
Beyond that, you just have to try them yourself…
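Which one should you use?
Pay attention to the original domain
• For singing voice, speech models work to some extent, and are sometimes better than music models
• For other targets, there are cases where they do not work well
• There is also research that adapts a model using a mechanism called Reprogramming
For the task of identifying the singer, speech > music [Yamamoto 23]
For pitch/instrument recognition, re-training Wav2Vec2.0 on music works better than fine-tuning it as-is [Legano 23]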
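In closing
Pre-trained models are now widely used for music too!
• Introduced a variety of models
• Both recognition of music itself and multimodal approaches are advancing!
• There is still a great deal left to be done
Thank you!!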
Appendix
HuBERT
• The model that MERT is based on
JukeBOX
• A model from OpenAI that generates music, based on VQ-VAE
• https://openai.com/research/jukebox
Figure panels: at training time, at generation time
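Reprogramming
• Performs a mapping between the task the model was trained on and the task you want to learn
• For music recognition tasks, there is a case where it was applied from environmental sound and speech models [Hung 23]
https://github.com/ga642381/Speech-Prompts-Adapters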
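One piece of the mapping idea can be illustrated with a many-to-one label mapping: the frozen source model keeps its original output classes, and groups of those classes are assigned to the new target labels. This is a hedged sketch in plain Python, not the method of [Hung 23]. The class probabilities, the genre names, and the hand-written `label_map` are all hypothetical, and a full reprogramming setup would typically also learn a transformation of the input.

```python
def map_labels(source_probs, label_map):
    """Sum the source-class probabilities assigned to each target class.

    source_probs: output probabilities of the frozen source model.
    label_map: dict target_label -> list of source class indices
    (a hypothetical, hand-written assignment for illustration).
    """
    return {
        target: sum(source_probs[i] for i in sources)
        for target, sources in label_map.items()
    }


# Toy output of a frozen source model with 6 classes.
source_probs = [0.05, 0.40, 0.10, 0.15, 0.20, 0.10]
label_map = {"rock": [0, 1], "jazz": [2, 3], "classical": [4, 5]}

target_scores = map_labels(source_probs, label_map)
predicted = max(target_scores, key=target_scores.get)
```

Because every source class is assigned to exactly one target label here, the target scores still sum to 1 and can be read as probabilities for the new task.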