muana vol.11: Pre-trained models for music recognition
Yuya Yamamoto
October 28, 2023
Muana vol.11 WIP.
Transcript
An overview of pre-trained models that look useful for music recognition
2023/10/28 Music x Analytics meetup vol.11
Yuya Yamamoto (yamathcy)
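Self-introduction
• Yuya Yamamoto (yamathcy)
• Third-year doctoral student, Graduate School, University of Tsukuba
• Specialty: music and audio information processing
• Analysis of "singing techniques", the distinctive ways in which singers sing
• Recently solving singing voice analysis tasks with SSL models
• Recent musical obsessions
• 4'33" Tsukuba Remix.
• Listening to pieces for the tabla, an Indian percussion instrument
Special skill: "Jagariko fails"
github.io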
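Let me do a little advertising m(_ _)m
• I am organizing a paper reading group for ISMIR2023, the international conference on music information processing!
• Online, Wed. 2023 11/22, 18:00-
• The proceedings are not yet public, so it may be rescheduled 🙇 (to be announced later)
• (If rescheduled, it is planned for early December)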
Main topic
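Pre-trained models for music recognition
I will run quickly through the models and then talk about applying them to your own task.
The content may lean toward intermediate level and above.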
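Music recognition
• Tasks that take a music audio signal (e.g. wav) as input and output its constituent elements
Examples: genre, mood; instrumentation; score, lyrics, etc…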
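Why does music recognition need pre-trained models in the first place?
• Reason 1: dataset scarcity
• Music has overwhelmingly little data
• Annotations are even scarcer, and creating them is costly
• Reason 2: the difficulty of the task itself
• Analyzing music is hard to begin with
• Using DNNs gives better performance
https://yamathcy.github.io/ISMIR2022J-POP/
I (Yamamoto) also built a singing voice dataset of about 5 hours like the one above, but it took nearly a year to make...
The dilemma: we want to harness the power of DNNs, but there is not enough data.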
How many are there?
Right now, an enormous number.
Tutorial: Self-supervised Representation Learning for Speech Processing
NAACL 2022 https://sites.google.com/view/tutorial-ssl-speech
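A rough categorization (my own subjective view)
• Models based on supervised learning: train on some task with supervision, then transfer to similar tasks
• Models based on self-supervised learning: attach pseudo labels to unlabeled data and train on them
• Models built for other purposes, such as generation, and repurposed: use the intermediate outputs of models not originally meant for recognition
Models shown, in slide order: Musicnn, CREPE, VGGish, PANNs, AST, Whisper, Wav2Vec2.0, HuBERT, WavLM, CLAP, MapMusic2Vec, MERT, Data2Vec, JukeMIR (uses features from JukeBox, a generation model), Encodec (a high-performance audio compression model)
Color key on the slide: one color for music, blue for speech, black for environmental sound and general audio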
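JukeMIR [Castellon 21]
"Let's just use the intermediate features of the music generation model JukeBox for recognition"
• Uses the output of Jukebox's 36th layer as features for recognition
• Better performance than any well-known model on four kinds of tasks
• Has also been applied to transcribing the melody of songs by ear [Donahue 22]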
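MERT [Li 22]
A BERT-like encoder model for music audio signals
• A pre-trained model specialized for music recognition
• Like BERT, it is trained by predicting masked segments
• Adds music-specific training (reconstruction of pitch features)
• Rivals SoTA on many tasks
I wrote an explanatory article → https://qiita.com/yamathcy/items/f2f27468c5b5c4dc24a9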
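Multimodal models (especially music x natural language)
Learning the relationship between music and language
Language → music
- MusicLM
- MusicGen
Music → language
- MuLan
- MusCALL
- LPMusicCaps
- MU-LLaMa
- LLark -> New!!
※ There are many more if service-only offerings are counted, but they are outside the scope of this talk and are omitted.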
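Music x LLM
Models that use language to analyze music
LP-MusicCaps: generates a descriptive text about the input song [Doh 23]
Mu-LLaMA: question answering about musical content, using MERT + LLaMA [Liu 23]
LLark: generalizes question answering and description generation, with a stronger focus on musical elements (chords/tempo) [Gardner 23]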
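How we can use them
• Libraries
• Hugging face: https://huggingface.co/
• s3prl: https://s3prl.github.io/s3prl/index.html
• A toolkit specialized for self-supervised learning on speech
• Follow trends in papers and demos
• Progress is not merely day by day but second by second. Code and models are often released as well, so keeping your antenna up may pay off.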
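How we can use them
Use on the target task (downstream task): fine-tuning
Common ways of putting an already trained model to use:
• Replace only the final layer
• Use the output probabilities as new features
• Train the model further on the new task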
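One of the approaches on this slide, keeping the pre-trained model fixed and replacing only the final layer, can be sketched in plain Python as below. This is a toy under stated assumptions: `frozen_encoder` is a hypothetical stand-in for a real pre-trained model, and the data, the logistic-regression head, and the hyperparameters are invented for illustration and do not come from the talk.

```python
import math
import random


def frozen_encoder(x):
    """Hypothetical stand-in for a pre-trained model: a fixed feature map.

    Nothing in this function is ever updated during training.
    """
    return [x[0] + x[1], x[0] - x[1]]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_new_head(inputs, labels, epochs=200, lr=0.5):
    """Train only a new logistic-regression head on top of frozen features."""
    feats = [frozen_encoder(x) for x in inputs]
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            g = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b


def predict(x, w, b):
    f = frozen_encoder(x)
    return sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)


# Toy downstream task (an assumption): the label is 1 when x[0] > x[1].
random.seed(0)
inputs = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(50)]
labels = [1 if x[0] > x[1] else 0 for x in inputs]
w, b = train_new_head(inputs, labels)
```

With a real model the structure stays the same: features come from the frozen network and only the small head is trained, which is why this option is the cheapest of the three.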
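How we can use them
Tips for recent large models
• Train only the Transformer encoder (🔥 🔥 🔥 🔥 ❄)
• Take a weighted average of the outputs of each layer
• Parameter-efficient FT such as Adapter, LoRA, and Prefix tuning [Chen 23]
• Use only the outputs of some useful layers ⇔ in speech, earlier and later layers carry different information [Chen 22]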
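The tip about taking a weighted average of each layer's outputs can be sketched as follows. This is a minimal plain-Python illustration, assuming that the per-layer weights are softmax-normalized learnable scalars (a common choice, but an assumption here); the function names and the toy tensors are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_average(layer_outputs, layer_logits):
    """Mix hidden states from all layers into one feature sequence.

    layer_outputs: list of L layers, each a list of T frames,
                   each frame a list of D floats.
    layer_logits:  L scalars (assumed learnable), one per layer.
    """
    weights = softmax(layer_logits)
    n_frames = len(layer_outputs[0])
    dim = len(layer_outputs[0][0])
    mixed = [[0.0] * dim for _ in range(n_frames)]
    for weight, layer in zip(weights, layer_outputs):
        for t in range(n_frames):
            for d in range(dim):
                mixed[t][d] += weight * layer[t][d]
    return mixed


# Two layers, one frame, two dimensions. Equal logits give a plain mean.
layers = [[[0.0, 0.0]], [[2.0, 2.0]]]
mixed = weighted_layer_average(layers, [0.0, 0.0])  # → [[1.0, 1.0]]
```

In practice the logits would be trained together with the downstream head, so the model can learn which layers matter for the task at hand.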
Which one should you use?
Check each model's paper and the benchmarks
• Speech: SUPERB
• Music recognition: MARBLE
Beyond that, you just have to try them yourself…
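Which one should you use?
Pay attention to the original domain
• For singing voice, speech models are usable to some extent, and are sometimes better than music models
• For other tasks, there are cases where they do not work very well
• There is also research that adapts models using a mechanism called Reprogramming
For the task of identifying the singer, speech > music [Yamamoto 23]
For pitch/instrument recognition, re-training Wav2Vec2.0 on music works better than fine-tuning it as-is [Legano 23]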
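In closing
Pre-trained models are now widely used in music too!
• Introduced various models
• Both recognition of music itself and multimodal work are developing!
• There is still a lot left to be done
Thank you!!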
Appendix
HuBERT
• The model that MERT is based on
JukeBOX
• A model from OpenAI that creates music. Based on VQVAE
• https://openai.com/research/jukebox
Figure panels: at training time, at generation time
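Reprogramming
• Performs a mapping between the task the model was trained on and the task you want to learn
• For music recognition tasks, there is a case study that applied it from environmental sound and speech models [Hung 23]
https://github.com/ga642381/Speech-Prompts-Adapters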
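The idea of mapping between the source task and the target task can be illustrated with the output side alone. The sketch below assumes a fixed many-to-one assignment from source classes to target classes; the class counts, probabilities, and the `label_map` assignment are hypothetical, and actual reprogramming methods typically also learn a transformation of the input, which is not shown.

```python
def map_source_to_target(source_probs, label_map, n_target):
    """Aggregate source-class probabilities into target-class scores.

    source_probs: probabilities output by the frozen source model.
    label_map:    label_map[i] is the target class assigned to source class i
                  (a hypothetical assignment for illustration).
    """
    target_scores = [0.0] * n_target
    for source_index, prob in enumerate(source_probs):
        target_scores[label_map[source_index]] += prob
    return target_scores


# Four source classes from a frozen model, mapped onto two target classes.
source_probs = [0.1, 0.2, 0.3, 0.4]
label_map = [0, 0, 1, 1]
target_scores = map_source_to_target(source_probs, label_map, n_target=2)
predicted_target = max(range(2), key=lambda k: target_scores[k])
```

Because the source model itself is untouched, only the mapping (and, in full methods, the input transformation) has to be fitted to the new task.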