muana vol.11: Pre-trained Models for Music Classification
Yuya Yamamoto
October 28, 2023
Muana vol.11 WIP.
Transcript
A Survey of Pre-trained Models That May Be Useful for Music Classification. 2023/10/28, Music x Analytics meetup vol.11. Yuya Yamamoto (yamathcy)
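Self-introduction • Yuya Yamamoto (yamathcy) • Third-year doctoral student, Graduate School, University of Tsukuba • Specialty: music and audio information processing • Analysis of "singing techniques", the distinctive ways singers use their voices
• Recently: solving singing-analysis tasks with SSL models • My current music obsessions: • 4'33" Tsukuba Remix. • Listening to tabla (Indian percussion) music • Special skill: Jagariko … github.io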
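Please allow me a quick plug m(_ _)m • I am hosting a reading group for ISMIR 2023, the international conference on music information processing! • Online, Wednesday 2023/11/22, from 18:00 •
The proceedings are not public yet, so the date may be rescheduled 🙇 (I will announce details later) • (If rescheduled, it is planned for early December)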
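Today's topic
Pre-trained models for music classification: a quick tour of the models, plus how I applied them to my own tasks. The talk may lean a little toward intermediate-level listeners and above.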
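Music classification • A task that takes a music audio signal (e.g. a WAV file) as input and outputs its constituent elements: genre, mood, instrumentation, score, lyrics, etc.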
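Why does music classification need pre-trained models in the first place? • Reason 1: not enough datasets • Music is overwhelmingly short of data • Annotations are scarcer still, and expensive to create • Reason 2: the tasks themselves are hard
• Analyzing music is difficult to begin with • DNNs tend to perform better https://yamathcy.github.io/ISMIR2022J-POP/ I built a singing dataset of about 5 hours, like the one linked above, and creating it took nearly a year... The dilemma: we want to rely on the power of DNNs, but we lack data.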
There are now a huge number of them. How many are there? See the tutorial "Self-supervised Representation Learning for Speech Processing",
NAACL 2022 https://sites.google.com/view/tutorial-ssl-speech
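A rough categorization (my own take) • Models based on supervised learning: train on one task with labels, then transfer the model to similar tasks. Musicnn, CREPE, VGGish, PANNs, AST, Whisper • Models based on self-supervised learning: assign pseudo-labels to unlabeled data and train on them. Wav2Vec2.0, HuBERT, WavLM, CLAP, MAP-Music2Vec, MERT, Data2Vec • Models repurposed from other goals: use the intermediate outputs of models that were not originally built for classification. JukeMIR (uses the features of JukeBox, a generative model), Encodec (built for high-performance audio compression)
Slide color key: (…) music, blue speech, black environmental sound / general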
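The self-supervised category relies on pseudo-labels. As one concrete illustration, HuBERT (listed above) obtains them by clustering frame-level features with k-means and treating each frame's cluster ID as its training target. Below is a minimal pure-Python sketch of that clustering step. It assumes tiny toy feature vectors in place of real MFCCs or model features, and `kmeans_pseudo_labels` is a hypothetical helper, not code from any of the listed models.

```python
import random

def kmeans_pseudo_labels(frames, k, iters=20, seed=0):
    """Cluster frame-level feature vectors with k-means and return one cluster ID per frame.

    The cluster IDs serve as pseudo-labels for unlabeled audio (the HuBERT-style idea).
    `frames` is a list of equal-length lists of floats.
    """
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(frames, k)]  # initialise from random frames
    labels = [0] * len(frames)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    for _ in range(iters):
        # Assignment step: each frame goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(f, centroids[j])) for f in frames]
        # Update step: move each centroid to the mean of the frames assigned to it.
        for j in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy "frames": two well-separated groups of 2-D features.
frames = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
print(kmeans_pseudo_labels(frames, k=2))
```

The exact ID assigned to each group depends on the random initialisation. Only the grouping matters, because the IDs are used purely as classification targets for the masked-prediction stage.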
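"Let's just use the intermediate features of the music generation model JukeBox for classification" • Uses the output of JukeBox's 36th layer as features for classification • Performs better than well-known models on four kinds of tasks • Example application: transcribing song melodies by ear [Donahue 22]
JukeMIR [Castellon 21]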
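JukeMIR's recipe boils down to three steps: run audio through a generative model, keep the activations of one intermediate layer, and turn them into a fixed-length vector for a downstream classifier. Here is a minimal sketch of the last step. It assumes the activations are already available as a per-layer list of frame vectors and uses mean pooling over time as the summary. `pool_layer` and the toy numbers are hypothetical.

```python
def pool_layer(hidden_states, layer):
    """Mean-pool one layer's activations over time into a single clip-level vector.

    hidden_states: list indexed by layer; each entry is a list of T frames, each a list of D floats.
    Returns D floats that can be fed to a small classifier.
    """
    frames = hidden_states[layer]
    if not frames:
        raise ValueError("layer has no frames")
    return [sum(values) / len(frames) for values in zip(*frames)]


# Toy activations: 2 layers, 2 frames per layer, 2 dimensions per frame.
hidden_states = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 2.0], [3.0, 4.0]],
]
print(pool_layer(hidden_states, layer=1))  # → [2.0, 3.0]
```

The slide names the 36th layer. Which list index that maps to depends on how a given implementation counts layers, for example whether the input embeddings count as layer 0.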
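A BERT-like encoder model for music audio signals: MERT [Li 22] • A pre-trained model specialized for music classification • Like BERT, it learns by predicting masked portions of the input
• Adds music-specific training objectives (reconstructing pitch features) • Comparable to SoTA on many tasks. I wrote an explanatory article → https://qiita.com/yamathcy/items/f2f27468c5b5c4dc24a9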
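The masked-prediction idea behind MERT (and HuBERT, covered in the appendix) breaks into two pieces: choose which frames to hide, then compute the loss only on the hidden frames. The following is a generic pure-Python sketch, not MERT's training code. The span-masking parameters (`mask_prob`, `span`) are illustrative assumptions, and the logits would come from an encoder that is not shown.

```python
import math
import random

def span_mask(num_frames, mask_prob=0.065, span=10, seed=0):
    """Randomly choose span start frames and mask `span` consecutive frames from each start."""
    rng = random.Random(seed)
    mask = [False] * num_frames
    for start in range(num_frames):
        if rng.random() < mask_prob:
            for t in range(start, min(start + span, num_frames)):
                mask[t] = True
    return mask

def masked_cross_entropy(logits, targets, mask):
    """Average cross-entropy over masked frames only; unmasked frames contribute nothing.

    logits: per-frame lists of class scores from the encoder (not shown here).
    targets: per-frame pseudo-label IDs (e.g. k-means cluster IDs).
    """
    total, count = 0.0, 0
    for frame_logits, target, hidden in zip(logits, targets, mask):
        if not hidden:
            continue
        m = max(frame_logits)  # subtract the max before exp for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in frame_logits))
        total += log_z - frame_logits[target]
        count += 1
    return total / count if count else 0.0


mask = span_mask(50, mask_prob=0.1, span=5, seed=1)
uniform_logits = [[0.0, 0.0, 0.0, 0.0]] * 50
targets = [0] * 50
print(masked_cross_entropy(uniform_logits, targets, mask))  # ≈ 1.386 (= log 4) if any frame is masked
```

Restricting the loss to hidden frames is what forces the encoder to infer missing content from context instead of copying its input.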
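Learning the connection between music and language: multimodal models (especially music x natural language) • Language → music: MusicLM, MusicGen • Music → language:
MuLan, MusCALL, LP-MusicCaps, MU-LLaMA, LLark (new!!) ※ Counting commercial services there are many more, but I omit them because they are outside the scope of this talk.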
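Joint music-text embedding models such as MuLan (and CLAP from the earlier list) are trained contrastively. Matching audio/text pairs are pulled together and mismatched pairs are pushed apart. Below is a minimal pure-Python sketch of a symmetric InfoNCE loss over one batch. It assumes the audio and text encoders have already produced the embeddings; the encoders are not shown, and the temperature value is an illustrative assumption.

```python
import math

def _normalize(v):
    # L2-normalise so that dot products become cosine similarities (assumes a non-zero vector).
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: the i-th audio clip and the i-th text form the positive pair."""
    a = [_normalize(v) for v in audio_emb]
    t = [_normalize(v) for v in text_emb]
    n = len(a)
    sim = [[sum(x * y for x, y in zip(a[i], t[j])) / temperature for j in range(n)]
           for i in range(n)]

    def row_cross_entropy(matrix):
        # Each row is a softmax over candidates; the diagonal entry is the correct one.
        loss = 0.0
        for i, row in enumerate(matrix):
            m = max(row)
            log_z = m + math.log(sum(math.exp(s - m) for s in row))
            loss += log_z - row[i]
        return loss / len(matrix)

    sim_t = [list(col) for col in zip(*sim)]  # text-to-audio direction
    return 0.5 * (row_cross_entropy(sim) + row_cross_entropy(sim_t))


audio = [[1.0, 0.0], [0.0, 1.0]]
print(contrastive_loss(audio, [[1.0, 0.0], [0.0, 1.0]]))  # small: pairs line up
print(contrastive_loss(audio, [[0.0, 1.0], [1.0, 0.0]]))  # large: pairs are swapped
```

Averaging both directions means the embeddings can be used for retrieval either way: text to music and music to text.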
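Models that use language to analyze music: music x LLM • LP-MusicCaps: generates a descriptive caption for an input song [Doh 23] • Mu-LLaMA: combines MERT and LLaMA to
answer questions about a piece's content [Liu 23] • LLark: generalizes question answering and caption generation, with a stronger focus on musical elements (chords/tempo) [Gardner 23]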
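How we can put them to use • Libraries • Hugging Face: https://huggingface.co/ • s3prl: https://s3prl.github.io/s3prl/index.html
• A toolkit specialized for self-supervised learning on speech • Follow new papers and demos • Progress is measured not in days but in seconds. Code and models are often released, so it can pay off to keep your antenna up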
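Using and fine-tuning on the target (downstream) task • Replace only the final layer • Use the output probabilities as new features • Train additional parameters on top (common ways to leverage a pre-trained model)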
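Tips for recent large models (🔥 = trained, ❄ = frozen) • Train only the Transformer encoder
• Take a weighted average of the outputs of every layer • Parameter-efficient fine-tuning (PEFT) such as Adapter, LoRA, and prefix tuning [Chen 23] • Use only the outputs of a few useful layers ※ For speech, earlier and later layers carry different information [Chen 22]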
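Two of these tips are easy to express in code. The pure-Python sketch below uses hypothetical helper names and covers two things. First, it mixes the per-layer outputs of a frozen encoder using softmax-normalized learnable weights, in the style of the weighted-sum setup used by speech benchmarks. Second, it shows the forward pass of a LoRA-adapted linear layer, where the frozen weight `W` is augmented by a trainable low-rank product `B A` scaled by `alpha / r`. Training loops are omitted.

```python
import math

def weighted_layer_sum(layer_outputs, layer_scores):
    """Combine one frame's outputs from every layer, using softmax(layer_scores) as weights.

    layer_outputs: L vectors of D floats (one per layer); layer_scores: L learnable scalars.
    """
    m = max(layer_scores)
    exps = [math.exp(s - m) for s in layer_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * layer[d] for w, layer in zip(weights, layer_outputs))
            for d in range(len(layer_outputs[0]))]

def matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x).

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) would be trained.
    B is commonly initialised to zeros, so the adapted layer starts out identical to W.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [y + (alpha / r) * u for y, u in zip(base, update)]


# Equal scores give a plain average of the layers.
print(weighted_layer_sum([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))  # → [2.0, 3.0]
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]        # rank r = 1
B_zero = [[0.0], [0.0]]
print(lora_forward(W, A, B_zero, [2.0, 3.0], alpha=2.0, r=1))  # → [2.0, 3.0]
```

In both cases the number of trained parameters is tiny compared with the backbone: L scalars for the layer weights, and r * (d_in + d_out) values per adapted matrix for LoRA.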
Which one should you use? Check the papers and benchmarks for each • Speech: SUPERB • Music classification: MARBLE • Beyond that, you just have to try them yourself…
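Watch out for the source domain • For singing, speech models work reasonably well and are sometimes better than music models • In other cases they often do not work so well • There is research that adapts such models using a mechanism called reprogramming
For identifying singers, speech models > music models [Yamamoto 23] • For pitch and instrument classification, retraining Wav2Vec2.0 on music works better than fine-tuning it as-is [Ragano 23]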
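Conclusion: pre-trained models are being used actively in music, too! • Overview of the various models • Both classification of music itself and multimodal approaches are advancing! • There is still a lot that nobody has tackled yet. Thank you!!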
Appendix
HuBERT • The model that MERT is based on
JukeBox • A music-generation model from OpenAI, based on a VQ-VAE • https://openai.com/research/jukebox (figure panels: training, generation)
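The VQ-VAE at the core of JukeBox discretizes the encoder's continuous outputs by snapping each one to its nearest entry in a learned codebook. Below is a pure-Python sketch of just that lookup. It uses a made-up 2-D codebook (real codebooks are learned and far larger), and `quantize` is a hypothetical helper.

```python
def quantize(latents, codebook):
    """Replace each encoder latent with its nearest codebook entry (squared L2 distance).

    Returns the discrete code indices and the corresponding codebook vectors.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    codes = [min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k])) for z in latents]
    return codes, [codebook[k] for k in codes]


codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
latents = [[0.2, 0.1], [4.0, 4.5], [0.9, 1.2]]
codes, vectors = quantize(latents, codebook)
print(codes)  # → [0, 2, 1]
```

The resulting integer codes are what a generative prior models as a token sequence. JukeMIR's features are the activations of that prior, not the raw codes.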
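Reprogramming • Learns a mapping between the task the model was trained on and the task you want to learn • For music classification, there are examples of adapting environmental-sound and speech models this way [Hung 23] https://github.com/ga642381/Speech-Prompts-Adapters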
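Here is a generic sketch of the two ingredients commonly involved in model reprogramming. The first is a trainable perturbation added to the input of a frozen source model. The second is a fixed many-to-one mapping from the source model's classes to the target classes. The code is pure Python and omits the frozen model itself. The averaging rule in `map_labels` is one possible choice, and this is not the exact setup of [Hung 23].

```python
def reprogram_input(x, delta):
    """Add a trainable perturbation `delta` to the input; the frozen source model then sees x + delta."""
    return [xi + di for xi, di in zip(x, delta)]

def map_labels(source_logits, label_map, num_targets):
    """Many-to-one output mapping: each target class scores the mean logit of its assigned source classes.

    label_map: {source_class_index: target_class_index}; unmapped target classes get -inf.
    """
    sums = [0.0] * num_targets
    counts = [0] * num_targets
    for src, tgt in label_map.items():
        sums[tgt] += source_logits[src]
        counts[tgt] += 1
    return [s / c if c else float("-inf") for s, c in zip(sums, counts)]


# Hypothetical example: 4 source classes (e.g. from a speech or sound-event model) mapped onto 2 music genres.
print(reprogram_input([0.1, -0.2], [0.05, 0.05]))
print(map_labels([1.0, 3.0, 5.0, 7.0], {0: 0, 1: 0, 2: 1, 3: 1}, num_targets=2))  # → [2.0, 6.0]
```

Only `delta` (and possibly a small network that produces it) is trained. The source model's weights stay fixed, which is what distinguishes reprogramming from fine-tuning.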