
muana vol.11: Pre-trained Models for Music Recognition

Yuya Yamamoto
October 28, 2023

Muana vol.11 WIP.


Transcript

 1. An Overview of Pre-trained Models That Look Useful for Music Recognition
  2023/10/28 Music x Analytics meetup vol.11
  Yuya Yamamoto (yamathcy)

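 2. Self-introduction
  • Yuya Yamamoto (yamathcy)
  • Third-year doctoral student, Graduate School, University of Tsukuba
  • Field: music and audio information processing
    • Analysis of "singing techniques", the distinctive ways in which singers sing
    • Recently solving singing-voice analysis tasks with SSL models
  • Current musical obsessions
    • 4’33" Tsukuba Remix.
    • Listening to music for the tabla, an Indian percussion instrument
  Special skill: failing at Jagariko
  github.io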

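 3. A quick plug, if I may m(_ _)m
  • I am organizing a paper reading group for ISMIR2023, the international conference on music information processing!
  • Online, Wednesday 2023 11/22, 18:00-
  • The proceedings are not public yet, so it may be rescheduled 🙇 (an announcement will follow)
  • (If rescheduled, it is planned for early December)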

 4. Pre-trained models for music recognition
  A quick run through the models and how to apply them to your own task
  This talk may lean toward an intermediate or more advanced audience

 5. Music recognition
  • A task that takes a music audio signal (e.g. a wav file) as input and outputs the elements that make it up
  Genre, mood, etc.
  Instrumentation
  Score, lyrics, etc.

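 6. Why does music recognition need pre-trained models in the first place?
  • Reason 1: lack of datasets
    • Music has overwhelmingly too little data
    • Annotations are even scarcer, and creating them is expensive
  • Reason 2: difficulty of the task itself
    • Analyzing music is hard to begin with
    • Using DNNs gives better performance
  https://yamathcy.github.io/ISMIR2022J-POP/
  I (Yamamoto) built a singing dataset of about 5 hours like the one above, and creating it took nearly a year...
  The dilemma: we want to bring in the power of DNNs, but there is not enough data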

 7. By now there are an enormous number of them
  How many are there?
  Tutorial: Self-supervised Representation Learning for Speech Processing, NAACL 2022
  https://sites.google.com/view/tutorial-ssl-speech

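 8. A rough categorization (my own subjective take)
  Color legend on the slide: red = music, blue = speech, black = environmental sound and general audio
  Models based on supervised learning: train on one task with supervision, then transfer to a similar task
  • Musicnn
  • CREPE
  • VGGish
  • PANNs
  • AST
  • Whisper
  Models based on self-supervised learning: assign pseudo-labels to unlabeled data and train on them
  • Wav2Vec2.0
  • HuBERT
  • WavLM
  • CLAP
  • MapMusic2Vec
  • MERT
  • Data2Vec
  Repurposed models built for other purposes such as generation: use the intermediate outputs of a model that was not originally meant for recognition
  • JukeMIR: uses the features of JukeBox (built for generation)
  • Encodec (built for high-performance audio compression)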

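 9. "Let's just use the intermediate features of the music generation model JukeBox for recognition"
  JukeMIR [Castellon 21]
  • Uses the output of Jukebox's 36th layer as features for recognition problems
  • Outperforms every well-known model on four kinds of tasks
  • Has also been applied to transcribing the melody of a song by ear [Donahue 22]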
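  To make "use one layer's output as features" concrete, here is a minimal sketch. It assumes the chosen layer's activations have already been extracted as a list of frames, and it assumes mean pooling over time to get one vector per clip; the slide does not say how the frames are pooled. `mean_pool` and the toy numbers are hypothetical.

```python
from typing import List


def mean_pool(frames: List[List[float]]) -> List[float]:
    """Average frame-level activations over time into one clip-level vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


# Toy stand-in for one layer's activations: 3 time frames x 2 dimensions.
layer_output = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
clip_feature = mean_pool(layer_output)  # [2.0, 2.0]
```

  The resulting clip-level vector is what a small downstream classifier would then be trained on.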

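 10. A BERT-like encoder model for music audio signals
  MERT [Li 22]
  • A pre-trained model specialized for music recognition
  • Trained, like BERT, by predicting the masked parts of the input
  • Adds music-specific training objectives (e.g. reconstructing pitch features)
  • Comparable to SoTA on many tasks
  I wrote an explanatory article: https://qiita.com/yamathcy/items/f2f27468c5b5c4dc24a9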
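  The masked-prediction idea can be sketched as follows. This is an illustrative toy, not MERT's actual training code: the span-masking scheme follows the general style of speech SSL models, and the mask probability, span length, and per-frame discrete targets are all hypothetical values chosen for the example.

```python
import random
from typing import List, Tuple


def mask_spans(num_frames: int, mask_prob: float, span: int,
               rng: random.Random) -> List[bool]:
    """Pick random start frames and mask `span` consecutive frames from each."""
    masked = [False] * num_frames
    for start in range(num_frames):
        if rng.random() < mask_prob:
            for t in range(start, min(start + span, num_frames)):
                masked[t] = True
    return masked


def masked_targets(labels: List[int], masked: List[bool]) -> List[Tuple[int, int]]:
    """Collect (frame index, target) pairs; the loss is computed only on these."""
    return [(t, labels[t]) for t, is_masked in enumerate(masked) if is_masked]


rng = random.Random(0)
# Hypothetical discrete target per frame (e.g. a cluster ID of acoustic features).
frame_labels = [3, 3, 7, 7, 7, 1, 1, 4, 4, 4, 2, 2]
mask = mask_spans(len(frame_labels), mask_prob=0.15, span=3, rng=rng)
targets = masked_targets(frame_labels, mask)
```

  During pre-training the encoder sees the input with the masked frames hidden and is asked to predict the targets at exactly those positions.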

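 11. Learning the relationship between music and language
  Multimodal models (especially music x natural language)
  Language → music
  - MusicLM
  - MusicGen
  Music → language
  - MuLan
  - MusCALL
  - LPMusicCaps
  - MU-LLaMa
  - LLark (new!)
  Note: counting service-only offerings there are many more, but they are omitted because they fall outside the scope of this talk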

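 12. Models that use language to analyze music
  Music x LLM
  LP-MusicCaps: generates a descriptive caption for an input song [Doh 23]
  Mu-LLaMA: MERT + LLaMA; answers questions about the content of the music [Liu 23]
  LLark: generalizes question answering and caption generation, with more focus on musical elements (chords, tempo, etc.) [Gardner 23]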

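 13. How we can put them to use
  • Libraries
    • Hugging Face: https://huggingface.co/
    • s3prl: https://s3prl.github.io/s3prl/index.html
      • A toolkit specialized for self-supervised learning on speech
  • Keep up with papers, demos, and other developments
    • The field moves not by the day or month but by the second and minute. Code and models are often released as well, so keeping your antennae up may pay off.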

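 14. Using and fine-tuning a model on the target task (downstream task)
  How we can put them to use
  Replace only the final layer
  Use the output probabilities as new features
  Run additional training on the new task
  These are the common ways of making use of a pre-trained model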
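  The "replace only the final layer" option can be sketched like this: keep the pre-trained model frozen and train a new small head on its outputs. Everything here is a toy under stated assumptions: `frozen_encoder` is a hypothetical stand-in for a real pre-trained model, the data is made up, and the head is a plain logistic regression trained with SGD.

```python
import math
from typing import List, Tuple


def frozen_encoder(x: List[float]) -> List[float]:
    """Hypothetical stand-in for a pre-trained model; it is never updated."""
    return [x[0] + x[1], x[0] - x[1]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_head(feats: List[List[float]], labels: List[int],
               lr: float = 0.5, epochs: int = 300) -> Tuple[List[float], float]:
    """Train only a new final layer (logistic regression) on frozen features."""
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            grad = p - y  # gradient of the cross-entropy loss w.r.t. the logit
            w = [wi - lr * grad * fi for wi, fi in zip(w, f)]
            b -= lr * grad
    return w, b


def predict(w: List[float], b: float, f: List[float]) -> int:
    return int(sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) > 0.5)


# Toy downstream task with two classes.
inputs = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.0]]
labels = [0, 0, 1, 1]
feats = [frozen_encoder(x) for x in inputs]  # the encoder is only run, not trained
w, b = train_head(feats, labels)
preds = [predict(w, b, f) for f in feats]
```

  Fine-tuning differs only in that the encoder's own weights would also receive gradient updates, which costs more compute and needs more labeled data.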

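 15. Tips for today's very large models
  How we can put them to use
  Train only the Transformer encoder layers
  Take a weighted average of the outputs of all layers
  Parameter-efficient fine-tuning such as Adapter, LoRA, and prefix tuning [Chen 23]
  Use only the outputs of a few useful layers; for audio, the early and late layers carry different information [Chen 22]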
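  A weighted average over layer outputs can be sketched as below. It assumes one learnable scalar per layer, normalized with a softmax so the weights sum to 1; that normalization is a common choice but an assumption here, since the slide does not specify it. The layer outputs are toy numbers.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_average(layer_outputs: List[List[float]],
                           layer_logits: List[float]) -> List[float]:
    """Combine per-layer feature vectors using softmax-normalized weights."""
    weights = softmax(layer_logits)
    dim = len(layer_outputs[0])
    return [sum(w * layer[d] for w, layer in zip(weights, layer_outputs))
            for d in range(dim)]


# Toy example: 3 layers, each giving a 2-dimensional feature for one frame.
layer_outputs = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
# Learned together with the downstream head in practice; equal logits
# reduce to a plain average over layers.
layer_logits = [0.0, 0.0, 0.0]
pooled = weighted_layer_average(layer_outputs, layer_logits)
```

  Because only the per-layer scalars and the downstream head are trained, the downstream task can pick whichever layers are most useful while the large model stays frozen.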

 16. Check each model's paper and the benchmarks
  Which one should you use?
  • Speech: SUPERB
  • Music recognition: MARBLE
  Beyond that, the only option is to try them yourself…

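 17. Mind the domain the model was trained on
  Which one should you use?
  • For singing voice, speech models are usable to some extent, and are sometimes better than music models
  • For anything else, there are cases where they do not work very well
  • Some studies adapt them anyway using a framework called reprogramming
  For singer identification, speech > music [Yamamoto 23]
  For pitch and instrument recognition, Wav2Vec2.0 does better when re-trained on music than when fine-tuned as is [Legano 23]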

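 18. Pre-trained models are flourishing in music too!
  In closing
  • Introduced a range of models
  • Both recognition of music itself and multimodal approaches are still developing!
  • There is still plenty left to be done
  Thank you!!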

 19. HuBERT
  • The model that MERT is based on

 20. JukeBOX
  • A music generation model from OpenAI, based on VQVAE
  • https://openai.com/research/jukebox
  (Figure panels: at training time, at generation time)

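 21. Reprogramming
  • Builds a mapping between the task the model was trained on and the task you want it to learn
  • For music recognition tasks, there is a case of applying it from environmental-sound and speech models [Hung 23]
  https://github.com/ga642381/Speech-Prompts-Adapters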
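  One piece of such a task-to-task mapping can be sketched as a many-to-one label mapping on the output side: several classes of the frozen source model are assigned to each target class and their probabilities are summed. This is an illustrative assumption about how a mapping could look, not a description of the method in [Hung 23]; the class assignment and probabilities are hypothetical, and the input-side transformation is left out.

```python
from typing import Dict, List


def map_labels(source_probs: List[float],
               mapping: Dict[str, List[int]]) -> Dict[str, float]:
    """Sum the frozen model's class probabilities assigned to each target class."""
    return {target: sum(source_probs[i] for i in idxs)
            for target, idxs in mapping.items()}


# Hypothetical output of a frozen source model with 4 classes.
source_probs = [0.1, 0.2, 0.1, 0.6]
# Hypothetical assignment of source classes to two target (music) classes.
mapping = {"rock": [0, 3], "jazz": [1, 2]}
target_scores = map_labels(source_probs, mapping)
predicted = max(target_scores, key=target_scores.get)
```

  The source model itself stays untouched; only the mapping, and in a full setup a small trainable transformation of the input, is adapted to the new task.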