muana vol.11: Pre-trained Models for Music Classification

Yuya Yamamoto
October 28, 2023

Muana vol.11 WIP.

Transcript

  1. An overview of pre-trained models that look useful for music classification
    2023/10/28 Music x Analytics meetup vol.11
    Yuya Yamamoto (yamathcy)

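  2. About me
    • Yuya Yamamoto (yamathcy)
    • Third-year student in the doctoral program, Graduate School, University of Tsukuba
    • Specialty: music and audio information processing
      • Analysis of "singing techniques", the distinctive ways singers sing
      • Lately solving singing-voice analysis tasks with SSL models
    • What I'm into musically these days
      • 4’33" Tsukuba Remix.
      • Listening to pieces for the tabla, an Indian percussion instrument
    Special skill: Jagarico fails github.io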

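  3. Allow me a quick plug m(_ _)m
    • I'm hosting a paper reading session for ISMIR2023, the international conference on music information processing!
    • Online, Wed 22 Nov 2023, 18:00-
    • The Proceedings are not public yet, so it may be rescheduled 🙇 (to be announced later)
      • (If rescheduled, it is planned for early December)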

  4. Pre-trained models for music classification
    A quick tour of the models, plus how to apply them to your own task
    This may lean toward an intermediate-or-above audience

  5. Music classification
    • A task that takes a music audio signal (wav, etc.) as input and outputs its constituent elements
      Genre, mood, etc.
      Instrumentation
      Score, lyrics, etc…

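  6. Why does music classification need pre-trained models in the first place?
    • Reason 1: not enough datasets
      • Music is overwhelmingly short of data
      • Annotations are even scarcer, and costly to create
    • Reason 2: the task itself is hard
      • Analyzing music is difficult to begin with
      • Using a DNN does give better performance
    https://yamathcy.github.io/ISMIR2022J-POP/
    I (Yamamoto) built a singing dataset of about 5 hours like the one above (↑), and it took nearly a year to make...
    The dilemma: we want to bring in the power of DNNs, but the data is lacking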

  7. Today there are a huge number of them
    How many are there?
    Tutorial Self-supervised Representation Learning for Speech Processing NAACL 2022
    https://sites.google.com/view/tutorial-ssl-speech

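  8. A rough categorization (my own take)
    Legend: red = music, blue = speech, black = environmental sound / general
    Models based on supervised learning: train on one task with supervision, then transfer to a similar task
      Musicnn, CREPE, VGGish, PANNs, AST, Whisper
    Models based on self-supervised learning: attach pseudo-labels to unlabeled data and train on them
      Wav2Vec2.0, HuBERT, WavLM, CLAP, MapMusic2Vec, MERT, Data2Vec
    Repurposed models built for other goals such as generation: use the intermediate outputs of a model that was not originally meant for classification
      JukeMIR: uses features from JukeBox (built for generation)
      Encodec (built for high-performance audio compression)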

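  9. "Let's just use the intermediate features of the music generation model JukeBox for classification"
    JukeMIR [Castellon 21]
    • Uses the output of Jukebox's 36th layer as features for classification problems
    • Outperforms every well-known model on 4 kinds of tasks
    • Has also been applied to transcribing song melodies by ear [Donahue 22]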
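
Turning one layer's output into a clip-level feature can be sketched as below. This is a minimal illustration, not JukeMIR's code: `layer_output` is a hypothetical stand-in for frame-level activations taken from a single layer, and averaging over time is an assumed pooling choice.

```python
from statistics import fmean


def mean_pool(frames):
    """Average frame-level activations (T x D) over time into one D-dim vector."""
    if not frames:
        raise ValueError("need at least one frame")
    return [fmean(column) for column in zip(*frames)]


# Hypothetical activations from one intermediate layer: 3 frames, 4 dimensions.
layer_output = [
    [0.0, 1.0, 2.0, 3.0],
    [2.0, 1.0, 0.0, 3.0],
    [4.0, 1.0, 1.0, 3.0],
]

clip_feature = mean_pool(layer_output)
print(clip_feature)
```

The pooled vector would then be passed to a small classifier trained on the downstream labels, while the generative model itself stays untouched.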

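  10. A BERT-like encoder model for music audio signals
    MERT [Li 22]
    • A pre-trained model specialized for music classification
    • Trained by predicting masked segments, like BERT
    • Adds music-specific training objectives (e.g. reconstructing pitch features)
    • On par with SoTA on many tasks
    I wrote an explainer article → https://qiita.com/yamathcy/items/f2f27468c5b5c4dc24a9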
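
The masked-prediction idea can be sketched roughly as follows. This is not MERT's implementation: the span length, the number of spans, and the per-frame token ids are all hypothetical, and the "model" is a stand-in list of predictions. The sketch only shows that spans of frames are hidden and that scoring happens on the hidden frames alone.

```python
import random


def span_mask(num_frames, span, num_spans, rng):
    """Mark `num_spans` contiguous spans of `span` frames as masked (True)."""
    mask = [False] * num_frames
    for _ in range(num_spans):
        start = rng.randrange(num_frames - span + 1)
        for t in range(start, start + span):
            mask[t] = True
    return mask


def masked_accuracy(predictions, targets, mask):
    """Score predictions at masked frames only; visible frames are ignored."""
    scored = [p == y for p, y, m in zip(predictions, targets, mask) if m]
    return sum(scored) / len(scored)


rng = random.Random(0)
targets = [rng.randrange(8) for _ in range(20)]  # hypothetical per-frame token ids
mask = span_mask(num_frames=20, span=4, num_spans=2, rng=rng)

# A stand-in "model" that is right everywhere except the first masked frame.
predictions = list(targets)
first_masked = mask.index(True)
predictions[first_masked] = (targets[first_masked] + 1) % 8

print(sum(mask), masked_accuracy(predictions, targets, mask))
```

Because spans may overlap, the number of masked frames can be smaller than `span * num_spans`.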

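  11. Learning the connections between music and language
    Multimodal models (especially music x natural language)
    Language → music
    - MusicLM
    - MusicGen
    Music → language
    - MuLan
    - MusCALL
    - LPMusicCaps
    - MU-LLaMa
    - LLark->New!!
    ※ Counting service-only offerings there are many more, but they are omitted because their purpose differs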

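  12. Models that use language to analyze music
    Music x LLM
    LP-MusicCaps: generates a descriptive caption for the input track [Doh 23]
    Mu-LLaMA: combines MERT and LLaMA for question answering about the content of music [Liu 23]
    LLark: generalizes question answering and caption generation, with more focus on musical elements (chords, tempo, etc.) [Gardner 23]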

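  13. How we can put them to use
    • Libraries
      • Hugging face: https://huggingface.co/
      • s3prl: https://s3prl.github.io/s3prl/index.html
        • A toolkit specialized for self-supervised learning on speech
    • Follow the trends in papers, demos, and so on
      • The field advances not by the day but by the second. Code and models are often released too, so keeping your antennae up may pay off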

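  14. Using and fine-tuning on the target task (downstream task)
    How we can put them to use
    Common ways to make use of an already trained model:
    Swap out only the final layer
    Use the output probabilities as new features
    Continue training with additional data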
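
As a rough sketch of the first option (swap out only the final layer), the example below keeps a backbone fixed and trains only a new head on its features. Everything here is hypothetical: `frozen_backbone` is a toy linear map standing in for a pre-trained encoder, the data is made up, and in practice this would be done with a deep learning framework rather than plain Python.

```python
import math


def frozen_backbone(x, weights):
    """Stand-in for a pre-trained encoder; `weights` are never updated."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def predict_proba(features, head_w, head_b):
    """New task-specific head: logistic regression on the backbone features."""
    z = sum(w * f for w, f in zip(head_w, features)) + head_b
    return 1.0 / (1.0 + math.exp(-z))


def train_head(features, labels, lr=0.5, epochs=100):
    """Fit only the head with batch gradient descent on the logistic loss."""
    head_w, head_b = [0.0] * len(features[0]), 0.0
    n = len(features)
    for _ in range(epochs):
        errors = [predict_proba(f, head_w, head_b) - y
                  for f, y in zip(features, labels)]
        for d in range(len(head_w)):
            head_w[d] -= lr * sum(e * f[d] for e, f in zip(errors, features)) / n
        head_b -= lr * sum(errors) / n
    return head_w, head_b


# Hypothetical toy data: 2-d inputs with binary labels.
backbone_weights = [[1.0, 1.0], [1.0, -1.0]]
inputs = [[1.0, 1.0], [2.0, 0.5], [-1.0, -1.0], [-0.5, -2.0]]
labels = [1, 1, 0, 0]

features = [frozen_backbone(x, backbone_weights) for x in inputs]  # extracted once
head_w, head_b = train_head(features, labels)
predicted = [int(predict_proba(f, head_w, head_b) > 0.5) for f in features]
print(predicted)
```

Since the backbone never changes, its features can be computed once and cached, which keeps this option cheap compared with updating the whole model.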

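  15. Tips for recent large models
    How we can put them to use
    Train only the Transformer encoder layers
    Take a weighted average of the outputs of each layer
    Parameter-efficient FT such as Adapter, LoRA, and Prefix tuning [Chen 23]
    Use the outputs of only some of the useful layers
      For audio, the earlier and later layers hold different information [Chen 22]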
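
The weighted average over layer outputs can be sketched as below, assuming one learnable scalar per layer that is normalised with a softmax. The layer outputs and logits are hypothetical values; in a real setup the logits would be trained together with the downstream head.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_average(layer_outputs, layer_logits):
    """Blend per-layer feature vectors using softmax-normalised layer weights."""
    weights = softmax(layer_logits)
    dim = len(layer_outputs[0])
    return [
        sum(w * layer[d] for w, layer in zip(weights, layer_outputs))
        for d in range(dim)
    ]


# Hypothetical outputs of three layers for one frame (2 dimensions each).
layer_outputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

uniform = weighted_layer_average(layer_outputs, [0.0, 0.0, 0.0])
last_heavy = weighted_layer_average(layer_outputs, [0.0, 0.0, 5.0])
print(uniform, last_heavy)
```

With equal logits the result is the plain mean of the layers. Raising one logit pulls the blend toward that layer, which lets the downstream task pick the depth that carries the information it needs.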

  16. Check each model's paper and the benchmarks
    Which one should you use?
    • Speech: SUPERB
    • Music classification: MARBLE
    Beyond that, all you can do is try them yourself…

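  17. Mind the original domain
    Which one should you use?
    • For singing voice, speech models are usable to some extent, and sometimes beat music models
    • For anything else, there are cases where they do not work very well
    • There is also research that adapts models using a framework called Reprogramming
    For singer identification, speech > music [Yamamoto 23]
    For pitch/instrument classification, Wav2Vec2.0 works better when re-trained on music than when used as-is with FT [Legano 23]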

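  18. Pre-trained models are thriving in music too!
    In closing
    • Introduced a range of models
    • Both classification of music itself and multimodal work are advancing!
    • There is still plenty that has not been done
    Thank you!!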

  19. HuBERT
    • The model that MERT is based on

  20. JukeBOX
    • A music-generating model from OpenAI, based on VQVAE
    • https://openai.com/research/jukebox
    [Figure panels: during training / during generation]

  21. Reprogramming
    • Maps between the task the model was trained on and the task you want it to learn
    • In music classification, it has been applied starting from environmental-sound and speech models [Hung 23]
    https://github.com/ga642381/Speech-Prompts-Adapters
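
One ingredient of such a mapping can be sketched as an output label mapping. The sketch assumes a fixed many-to-one assignment from source classes to target classes. It leaves out any trainable transformation of the input, and the class names and probabilities are hypothetical.

```python
def map_to_target(source_probs, label_map):
    """Sum the source-class probabilities assigned to each target class."""
    return {
        target: sum(source_probs[i] for i in source_ids)
        for target, source_ids in label_map.items()
    }


# Hypothetical output of a frozen 4-class source model for one clip.
source_probs = [0.5, 0.25, 0.125, 0.125]
# Hypothetical many-to-one assignment of source classes to target genres.
label_map = {"rock": [0, 3], "jazz": [1, 2]}

target_scores = map_to_target(source_probs, label_map)
prediction = max(target_scores, key=target_scores.get)
print(target_scores, prediction)
```

The source model's weights stay as they are. Only the way its outputs are read changes, which is what makes the approach attractive when target-task data is scarce.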