muana vol.11: Pre-trained Models for Music Recognition
Yuya Yamamoto
October 28, 2023
Muana vol.11 WIP.
Transcript
An overview of pre-trained models that look useful for music recognition
2023/10/28, Music x Analytics meetup vol.11
Yuya Yamamoto (yamathcy)
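Self-introduction
• Yuya Yamamoto (yamathcy)
• Third-year doctoral student, Graduate School, University of Tsukuba
• Specialty: music and acoustic information processing
• Analysis of "singing techniques", the distinctive ways in which singers sing
• Recently solving singing voice analysis tasks with SSL (self-supervised learning) models
• Recent personal music obsessions
  • 4'33" Tsukuba Remix
  • Listening to pieces for the tabla, an Indian percussion instrument
• Special skill: "Jagarico failure" (じゃがりこ失敗)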
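Allow me a quick plug m(_ _)m
• I am organizing a paper reading group for ISMIR2023, the international conference on music information processing!
• Online, Wednesday 2023/11/22, 18:00-
• The proceedings are not yet public, so the date may be rescheduled 🙇 (an announcement will follow)
• (If rescheduled, it is planned for early December)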
Main topic
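Pre-trained models for music recognition
A quick run through the models and how to apply them to your own task.
This talk may lean toward an intermediate-or-above audience.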
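Music recognition
• A task that takes a music audio signal (e.g. a WAV file) as input and outputs its constituent elements
• Examples: genre, mood, instrumentation, score, lyrics, etc.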
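Why does music recognition need pre-trained models in the first place?
• Reason 1: dataset shortage
  • Music is overwhelmingly short of data
  • Annotations are even scarcer, and they are costly to create
• Reason 2: the difficulty of the task itself
  • Music analysis is hard to begin with
  • Using DNNs gives better performance
Yamamoto also built a singing voice dataset of about 5 hours (https://yamathcy.github.io/ISMIR2022J-POP/), and creating it took nearly a year...
The dilemma: we want to bring in the power of DNNs, but the data are lacking.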
How many are there?
A huge number these days.
Tutorial "Self-supervised Representation Learning for Speech Processing", NAACL 2022, https://sites.google.com/view/tutorial-ssl-speech
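Rough categorization (my own subjective view)
• Models based on supervised learning: train on one task with supervision, then transfer to a similar task
  • Musicnn, CREPE, VGGish, PANNs, AST, Whisper
• Models based on self-supervised learning: assign pseudo-labels to unlabeled data and train on them
  • Wav2Vec2.0, HuBERT, WavLM, CLAP, MapMusic2Vec, MERT, Data2Vec
• Models repurposed from other objectives such as generation: use the intermediate outputs of models that were not originally built for recognition
  • JukeMIR: uses the features of Jukebox (built for generation)
  • Encodec (built for high-performance audio compression)
Color legend on the slide: [...] = music, blue = speech, black = environmental sound and general audio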
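JukeMIR [Castellon 21]
"Let's just use the intermediate features of the music generation model Jukebox for recognition"
• Uses the output of Jukebox's 36th layer as features for recognition
• Better performance than any of the well-known models on four kinds of tasks
• Has also been applied to transcribing the melody of songs by ear [Donahue 22]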
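MERT [Li 22]
A BERT-like encoder model for music audio signals
• A pre-trained model specialized for music recognition
• Like BERT, it is trained by predicting the masked parts of the input
• Adds a music-specific training objective (reconstruction of pitch features)
• Comparable to the SoTA on many tasks
I wrote an explanatory article → https://qiita.com/yamathcy/items/f2f27468c5b5c4dc24a9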
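The masked-prediction idea described for MERT can be sketched in plain Python. This is only a toy illustration, not MERT's actual training code: the function names, the `mask_prob` and `span` values, and the use of per-frame pseudo-labels are all assumptions made for the example. It shows the two ingredients of BERT-style training on audio frames: masking spans of frames, and computing the loss only on the masked positions.

```python
import math
import random

def make_span_mask(num_frames, mask_prob=0.08, span=5, seed=0):
    """Return a list of booleans; True marks a masked frame.

    mask_prob and span are illustrative values, not taken from any paper.
    """
    rng = random.Random(seed)
    mask = [False] * num_frames
    for start in range(num_frames):
        # Each frame starts a masked span with probability mask_prob.
        if rng.random() < mask_prob:
            for t in range(start, min(start + span, num_frames)):
                mask[t] = True
    return mask

def masked_prediction_loss(log_probs, targets, mask):
    """Average negative log-likelihood over the masked frames only."""
    losses = [-log_probs[t][targets[t]]
              for t in range(len(targets)) if mask[t]]
    return sum(losses) / len(losses) if losses else 0.0

# Toy usage: a "model" that predicts a uniform distribution over 4 classes.
num_classes = 4
uniform = [[math.log(1.0 / num_classes)] * num_classes for _ in range(6)]
pseudo_labels = [0, 1, 2, 3, 0, 1]   # hypothetical per-frame targets
mask = [False, True, True, False, False, True]
loss = masked_prediction_loss(uniform, pseudo_labels, mask)
# With uniform predictions the loss equals log(num_classes).
```

Unmasked frames contribute nothing to the loss, so the model can only lower it by inferring the hidden content from the surrounding context.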
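Multimodal models (especially music x natural language)
Learning the relationship that connects music and language
• Language → music
  - MusicLM
  - MusicGen
• Music → language
  - MuLan
  - MusCALL
  - LP-MusicCaps
  - MU-LLaMA
  - LLark (new!!)
* Many more exist if service-only offerings are counted, but they are omitted here because their purpose differs.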
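Music x LLM
Models that use language to analyze music
• LP-MusicCaps: generates a descriptive caption for the input song [Doh 23]
• MU-LLaMA: MERT + LLaMA, question answering about the content of music [Liu 23]
• LLark: generalizes question answering and caption generation, with more focus on musical elements (chords/tempo) [Gardner 23]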
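Putting them to use
• Libraries
  • Hugging Face: https://huggingface.co/
  • s3prl: https://s3prl.github.io/s3prl/index.html
    • A toolkit specialized in self-supervised learning for speech
• Follow the trends in papers and demos
  • Progress is measured not in days but in seconds. Code and models are often released as well, so keeping your antenna up may pay off.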
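Use on the target task (downstream task): fine-tuning
Common ways to make use of a pre-trained model:
• Replace only the final layer
• Use the output probabilities as new features
• Perform additional training on the new task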
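The first option, swapping in a new final layer while keeping the pre-trained model fixed, can be sketched without any deep learning library. Everything here is hypothetical: `frozen_backbone` is a stand-in for a real pre-trained model, and the new head is a tiny logistic-regression layer trained by stochastic gradient descent on made-up data.

```python
import math

def frozen_backbone(x):
    """Stand-in for a pre-trained model; its parameters are never updated."""
    return [x[0] + x[1], x[0] - x[1]]

def train_head(features, labels, lr=0.5, epochs=200):
    """Fit a new logistic-regression output layer on the frozen features."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # gradient of binary cross-entropy w.r.t. z
            w = [wi - lr * grad * fi for wi, fi in zip(w, f)]
            b -= lr * grad
    return w, b

def predict(w, b, f):
    """Return class 1 when the head's logit is positive, else class 0."""
    z = sum(wi * fi for wi, fi in zip(w, f)) + b
    return 1 if z > 0 else 0

# Toy downstream task with two classes (made-up data).
inputs = [[0.0, 1.0], [0.2, 0.9], [1.0, 0.0], [0.9, 0.1]]
labels = [0, 0, 1, 1]
features = [frozen_backbone(x) for x in inputs]  # extracted once, reused
w, b = train_head(features, labels)
predictions = [predict(w, b, f) for f in features]
```

Because the backbone is frozen, its features can be computed once and cached, which is what makes this option cheap compared with updating the whole model.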
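Tips for recent large models
• Train only the Transformer encoder (slide figure: 🔥 marks the trained blocks, ❄ the frozen block)
• Take a weighted average of the outputs of each layer
• Parameter-efficient fine-tuning (FT) such as Adapter, LoRA, and Prefix tuning [Chen 23]
• Use only the outputs of a few useful layers ⇔ for speech, earlier and later layers carry different information [Chen 22]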
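The weighted average over layer outputs can be sketched as follows. This is a minimal illustration under stated assumptions: the nested-list layout `layer_outputs[layer][frame][dim]`, the function names, and the toy numbers are all invented for the example. Benchmarks such as SUPERB normalize one learnable score per layer with a softmax and train the scores together with the downstream head; here the scores are simply fixed.

```python
import math

def softmax(scores):
    """Turn per-layer scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_layer_average(layer_outputs, layer_scores):
    """Combine layer_outputs[layer][frame][dim] into features[frame][dim]."""
    weights = softmax(layer_scores)
    num_frames = len(layer_outputs[0])
    dim = len(layer_outputs[0][0])
    return [
        [sum(w * layer_outputs[l][t][d] for l, w in enumerate(weights))
         for d in range(dim)]
        for t in range(num_frames)
    ]

# Toy hidden states: 3 layers, 2 frames, 2 dimensions (made-up numbers).
layer_outputs = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
layer_scores = [0.0, 0.0, 0.0]  # equal scores reduce to a plain average
combined = weighted_layer_average(layer_outputs, layer_scores)
```

Raising one layer's score shifts the combined features toward that layer, which is how a downstream task can pick out the depth that carries the information it needs.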
Which one should you use?
Check the respective papers and benchmarks
• Speech: SUPERB
• Music recognition: MARBLE
Beyond that, the only way is to try them yourself…
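Mind the original domain
• For singing voice, speech models are usable to some extent, and are sometimes better than music models
• For other targets, they often do not work well
• Some studies adapt a model to the new domain with a mechanism called reprogramming
Singer identification: speech > music [Yamamoto 23]
For pitch and instrument recognition, re-training Wav2Vec2.0 on music works better than fine-tuning it as-is [Legano 23]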
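In closing
The use of pre-trained models is thriving in music too!
• Introduced a range of models
• Both recognition of music itself and multimodal approaches are advancing!
• There is still a great deal left to be done
Thank you!!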
Appendix
HuBERT
• The model that MERT is based on
Jukebox
• A model from OpenAI that generates music, based on VQ-VAE
• https://openai.com/research/jukebox
Figure panels: at training time, at generation time
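Reprogramming
• Performs a mapping between the task the model was trained on and the task you want to learn
• For music recognition tasks, there is a case of applying it from environmental sound and speech models [Hung 23]
https://github.com/ga642381/Speech-Prompts-Adapters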
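One piece of such a mapping, on the output side, can be sketched as a many-to-one label mapping: several classes of the frozen source model are assigned to each class of the new task, and their probabilities are summed. This is only an assumed illustration, not the method of [Hung 23]; the class names, the `label_map`, and the probabilities are invented, and published reprogramming methods usually also train a transformation on the input side, which is omitted here.

```python
def map_source_to_target(source_probs, label_map):
    """Sum the frozen model's source-class probabilities per target class."""
    target_scores = {}
    for target, source_ids in label_map.items():
        target_scores[target] = sum(source_probs[i] for i in source_ids)
    return target_scores

# Hypothetical output of a frozen source model with 5 classes.
source_probs = [0.05, 0.40, 0.10, 0.25, 0.20]

# Hypothetical assignment of source class indices to target classes.
label_map = {"vocal": [0, 1], "guitar": [2, 3], "drums": [4]}

scores = map_source_to_target(source_probs, label_map)
predicted = max(scores, key=scores.get)  # target class with the top score
```

The source model itself is never retrained in this sketch; only the assignment between its classes and the new classes decides the prediction.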