

Hiromu Yakura
November 06, 2024

Human-Informed Machine Learning Models and Interactions

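Machine learning now solves a wide range of real-world tasks, but broadening its applications also requires interaction designs that are attuned to people. This talk presents 1) what gaps exist between patterns of human behavior and thinking and machine learning models, and 2) how those gaps can be incorporated into the design of machine learning models and their interactions, with concrete examples from music and design. It then discusses the potential of exploring new applications by starting from humans, at a time when the range of what a single powerful model can do keeps expanding.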



Transcript

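  1. Self-introduction
    • Hiromu Yakura (矢倉大夢)
    • Postdoctoral researcher, Max Planck Institute for Human Development
    • Previously a Google / Microsoft Research PhD Fellow
    • Works on broadening the applications of machine learning, from design support to support for children with developmental disabilities
    • Since last month, has started a project titled "Creating foundational approaches for understanding societal transformation in the machine learning era" under the JST PRESTO area "Social Transformation Platform"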
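  2. Interaction research for putting machine learning to use
    The relationship between machine learning and interaction: research on machine learning techniques / users / risks and limits / acceptance in society
    • Fill the gap that stands between machine learning and its real-world application
    • How can the new techniques that keep emerging be made easier to use? Verify this through actual software development and derive methodology
    • In particular, tasks are often framed with the human as the starting point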
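  3. Interaction research for putting machine learning to use
    The relationship between machine learning and interaction: research on machine learning techniques / users / risks and limits / acceptance in society
    • Fill the gap that stands between machine learning and its real-world application
    • How can the new techniques that keep emerging be made easier to use? Verify this through actual software development and derive methodology
    • In particular, tasks are often framed with the human as the starting point
    Venues: AAAI, IJCAI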
  4. Self-Supervised Contrastive Learning for Singing Voices
    Hiromu Yakura†‡, Kento Watanabe‡, Masataka Goto‡
    † University of Tsukuba, Japan ‡ AIST, Japan
    IEEE/ACM Transactions on Audio, Speech, and Language Processing 2022
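  5. Dataset constraints in the singer identification task
    • Singer identification: the task of identifying the corresponding singer from a singing voice
    • It can be applied to, for example, retrieving singers from a singing voice
    • Machine learning approaches have been widely used, but their performance depends heavily on the quality of the training dataset
    • Preparing a large-scale dataset in which singer information and singing-voice properties are properly annotated is a high hurdle
    Goal: acquire feature representations through self-supervised learning, and thereby realize a method that does not depend on such datasets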
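  6. Acquiring feature representations with self-supervised contrastive learning
    • Self-supervised contrastive learning [5, 6] can acquire feature representations from unlabeled datasets and is spreading in the image domain
    • A deep learning model is trained so that a pair of mechanically transformed inputs maps to similar latent representations
    [5] Jaiswal, A. et al.: A survey on contrastive self-supervised learning, Technologies (2021).
    [6] Jing, L. and Tian, Y.: Self-supervised visual feature learning with deep neural networks: A survey, IEEE Trans. Pattern Anal. Mach. Intell. (2021).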
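  7. Acquiring feature representations with self-supervised contrastive learning
    • Self-supervised contrastive learning [5, 6] can acquire feature representations from unlabeled datasets and is spreading in the image domain
    • A deep learning model is trained so that a pair of mechanically transformed inputs maps to similar latent representations
    Simply training the model in the same way as for images did not improve accuracy at all
    [5] Jaiswal, A. et al.: A survey on contrastive self-supervised learning, Technologies (2021).
    [6] Jing, L. and Tian, Y.: Self-supervised visual feature learning with deep neural networks: A survey, IEEE Trans. Pattern Anal. Mach. Intell. (2021).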
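  8. The components of a singing voice: voice timbre and singing style
    • The properties of a singing voice can be decomposed into voice timbre and singing style
    • Voice timbre depends on the spectral envelope and formants
    • Singing style shows up in vibrato and articulation
    Diagram: voice timbre (instrument-like timbre, originating from the shape of the vocal tract and so on) × singing style (how much the singer holds back the timing, the amount of vibrato, each singer's own expression) = final singing voice
    Design self-supervised contrastive learning by asking what it means for each of these to be "similar"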
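  9. What does it mean for voice timbre to be similar?
    • To begin with, what does "similar voice timbre" mean?
    • Mechanically changing the pitch makes the voice sound like a completely different timbre
    • As with the tone color of an instrument, voice timbre can be captured as a distribution over frequency bands
    • Even when the same person sings notes of different heights, the rough shape of the "mountain" (the envelope) does not change
    • Mechanically changing the pitch also changes the shape of the mountain
    Figure labels: the spacing of the peaks (vertical bars) is constant; illustration of mechanical pitch shifting
    Source: S. Duvvuru, et al.: The Effect of Timbre, Pitch, and Vibrato on Vocal Pitch-Matching Accuracy. Journal of Voice (2016).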
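  10. What does it mean for singing style to be similar?
    • Then what does "similar singing style" mean?
    • Focus on articulation that occurs within very short time spans
    • The slope of F0 at pitch transitions reflects an individual's singing style [31]
    • Even when the same person sings slowly, the way F0 moves does not change
    • Mechanical time stretching also changes the shape of the fine F0 slopes
    Figure label: illustration of mechanical time stretching
    [31] Kako, T. et al.: Automatic identification for singing style based on sung melodic contour characterized in phase plane, ISMIR (2009).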
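  11. Evaluation on the singer identification task
    • First, self-supervised learning is run on singing-voice data without singer information
    • A music DB of […] songs × […] seconds is used for training after singing voice separation [35]
    • Artist names and other metadata are not included in training
    • Separately from the above, a dataset of […] singers × […] songs is built, and singer identification accuracy is compared as the downstream task
    • An accuracy improvement of about […] pt over the existing method [12] was confirmed
    • The effectiveness of a design that exploits the properties of singing voices was also confirmed
    [35] Hennequin, R. et al.: Spleeter: A fast and efficient music source separation tool with pre-trained models, J. Open Source Softw. (2020).
    [12] Spijkervet, J. and Burgoyne, J. A.: Contrastive learning of musical representations, ISMIR (2021).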
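  12. A feature representation specialized for voice timbre
    Differences in singing style are ignored, and a feature representation that captures differences in voice timbre is obtained
    A time-stretched singing voice is treated as the same; a pitch-shifted singing voice is treated as different
    • The proposed method can also be extended by rearranging how the transformations are used
    • For example, a similarity of voice timbre alone can be computed regardless of singing style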
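
The pairing rule on slides 6 to 12 can be sketched with a small contrastive loss. This is only an illustrative sketch, not the authors' implementation: the InfoNCE-style loss, the temperature, and the three toy embedding vectors are assumptions, and in practice the embeddings would come from a trained encoder applied to real audio.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: small when the anchor is closer to the positive
    than to every negative, large otherwise."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]

# Hypothetical embeddings of one vocal clip and two mechanically edited copies.
emb_original = [1.0, 0.1, 0.0]
emb_time_stretched = [0.9, 0.2, 0.1]  # timbre is preserved, so it should stay close
emb_pitch_shifted = [0.0, 1.0, 0.3]   # timbre is altered, so it should move away

# Timbre-oriented setup from slide 12: time stretch is the positive,
# pitch shift is the negative.
loss_timbre = info_nce(emb_original, emb_time_stretched, [emb_pitch_shifted])

# The same embeddings scored with the roles swapped give a much larger loss.
loss_swapped = info_nce(emb_original, emb_pitch_shifted, [emb_time_stretched])
```

With these toy vectors `loss_timbre` is close to zero while `loss_swapped` is large, which is the signal that would push an encoder to keep time-stretched copies together and pitch-shifted copies apart.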
  13. Tool- and Domain-Agnostic Parameterization of Style Transfer Effects Leveraging Pretrained Perceptual Metrics
    Hiromu Yakura†, Yuki Koyama‡, Masataka Goto‡
    † University of Tsukuba, Japan ‡ AIST, Japan
    IJCAI 2021
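  14. Background: the spread of deep-learning-based style transfer methods
    • Deep learning has enabled content generation that was not possible before
    • Style transfer in particular has seen methods proposed across a wide range of domains
    Examples (original + reference → style transferred): landscape photos [Li et al. 2018], makeup photos [Chang et al. 2018] (GAN)
    Y. Li, et al. A Closed-form Solution to Photorealistic Image Stylization. ECCV 2018.
    H. Chang, et al. PairedCycleGAN: Asymmetric Style Transfer for Applying and Removing Makeup. CVPR 2018.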
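  15. Background: the mismatch between style transfer and our design process
    • Why has style transfer not spread as a creative tool?
    Because the human design process is often exploratory [Talton et al. 2009] and does not fit methods that output only a single result
    • People rarely have a perfect goal image from the start; in most cases design begins from a vague image
    • Through repeated trial and error, whatever serendipitously clicks ends up as the finished work
    • Current style transfer produces an elaborate result in one shot, but leaves no room for people to explore through fine-grained trial and error
    J. Talton, et al. Exploratory modeling with collaborative design spaces. ACM Trans. Graph. 28(5). 2009.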
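  16. Background: the difference between an exploratory process and End-to-End style transfer
    Exploratory process: starting from a rough image of the finished work, the designer adjusts darkness, contrast, and vignette on the original until "this might be good!"; the design space is understood intuitively through trial and error
    End-to-End style transfer: original + style reference → result, with darkness, contrast, and vignette not adjustable (× × ×); it is hard to arrive at the best result in one shot
    These methods aim at how closely they can imitate the reference and are not suited to trial and error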
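  17. Proposal: imitating style transfer with parametric transformations
    • The idea is to teach how to imitate, rather than hand over the imitated result
    • cf. Lao Tzu: "Rather than giving a fish, teach how to fish"
    If the system teaches how to imitate the look inside a familiar tool, the user can freely edit and explore from there
    Status quo: inputs (original, reference) → black box → transferred result; arriving at something the user finds good: ×
    Proposed: inputs (original, reference) → which filters and which parameters reproduce the look → transferred result; arriving at something the user finds good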
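  18. Proposal: imitating style transfer with parametric transformations
    • Then how do we find the way to imitate (the parameters and transformations)?
    • Even when this is framed as an optimization problem, the similarity between the transformed result and the reference cannot be compared directly, so no objective function can be built
    Because the underlying photos differ (transformed result vs. reference), it is hard to judge computationally how similar they are
    • Two key elements are introduced:
    • A perceptual metric based on the latent representation of a GAN
    • Black-box optimization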
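  19. Proposal: a perceptual metric based on the latent representation of a GAN
    • In GAN-based style transfer, the Encoder separates style from content, which makes it possible to transfer the style alone
    • Comparing the style latent representations obtained from the Encoder of a pretrained model yields a measure of how similar two images are (so that similar styles can be recognized as such)
    • Moreover, as long as a model exists, a wide range of targets such as photos and makeup styles can be handled without retraining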
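  20. Proposal: searching for the way to imitate with black-box optimization
    • Black-box optimization is used to automatically search over filters and parameters so that the perceptual metric becomes closer
    • Because black-box optimization places no constraint on the transformations used, the way to imitate can be searched for in various tools, not only Instagram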
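
The search loop described on slides 18 to 20 can be sketched as follows. This is a rough illustration under several assumptions, not the method from the paper: plain random search stands in for whichever black-box optimizer was used, `style_embedding` (mean and spread of pixel values) stands in for the style latent of a pretrained GAN encoder, and the brightness and contrast filter and the pixel values are hypothetical.

```python
import math
import random

def apply_filter(pixels, brightness, contrast):
    """Parametric edit: scale contrast around mid-gray, add brightness, clamp to [0, 1]."""
    return [min(1.0, max(0.0, (p - 0.5) * contrast + 0.5 + brightness)) for p in pixels]

def style_embedding(pixels):
    """Stand-in for a pretrained encoder's style latent (assumption): mean and spread."""
    mean = sum(pixels) / len(pixels)
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    return (mean, std)

def perceptual_distance(a, b):
    """Distance between style embeddings; the two images may differ in content."""
    return math.dist(style_embedding(a), style_embedding(b))

def random_search(original, reference, trials=2000, seed=0):
    """Black-box search: only filter outputs are evaluated, no gradients are needed."""
    rng = random.Random(seed)
    best_params = (0.0, 1.0)  # identity edit: no brightness change, contrast 1
    best_dist = perceptual_distance(apply_filter(original, *best_params), reference)
    for _ in range(trials):
        params = (rng.uniform(-0.5, 0.5), rng.uniform(0.2, 2.0))
        dist = perceptual_distance(apply_filter(original, *params), reference)
        if dist < best_dist:
            best_params, best_dist = params, dist
    return best_params, best_dist

# Toy grayscale "photos" with different content (hypothetical values).
original = [0.2, 0.4, 0.5, 0.6, 0.8]
reference = [0.10, 0.15, 0.30, 0.50, 0.35]  # darker and lower in contrast

start_dist = perceptual_distance(original, reference)
(best_brightness, best_contrast), best_dist = random_search(original, reference)
```

The output is a pair of slider values rather than a finished image, so a user could load them into a familiar editor and keep adjusting from there. Because the loop only calls `apply_filter` and reads back a distance, the filter could be swapped for any tool's transformation without changing the search.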
  21. Empirical evidence of Large Language Model's influence on human spoken communication
    Hiromu Yakura*, Ezequiel Lopez-Lopez*, Levin Brinkmann*, Ignacio Serna, Prateek Gupta, Iyad Rahwan (*: equal contribution)
    Max-Planck Institute for Human Development
    arXiv 2409.01754
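  22. ChatGPT's "delve" bias
    ‣ The phenomenon that ChatGPT, for some reason, tends to use the word "delve" is becoming widely known
    ‣ Other words characteristic of ChatGPT have also been found, and an increase in their frequency in papers and elsewhere has been pointed out
    W. Liang, et al. Mapping the Increasing Use of LLMs in Scientific Papers. Proc. CoLM (2024).
    https://pshapira.net/2024/03/31/delving-into-delve/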
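  23. Looking back at my own experience
    ‣ I also often use ChatGPT to proofread text such as English emails
    ‣ And I have the feeling that the habit of using the word "delve" has rubbed off on me even when I am not using ChatGPT
    Hypothesis: the presence of ChatGPT may be changing human verbal communication
    ‣ Collected […] × 10,000 videos of at least […] minutes from the YouTube channels of academic institutions around the world, transcribed them, and analyzed the change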
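
The kind of analysis described on slide 23 can be sketched as a word-rate comparison across years. This is a minimal sketch and not the study's pipeline: the tokenizer, the word-form pattern, the per-1,000-word normalization, and the two toy transcripts are all assumptions for illustration.

```python
import re
from collections import defaultdict

# Matches the common inflections of "delve" (assumed word forms).
DELVE_FORMS = re.compile(r"delv(e|es|ed|ing)")

def rate_per_thousand(transcripts):
    """transcripts: iterable of (year, text). Returns {year: hits per 1,000 words}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in transcripts:
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(tokens)
        hits[year] += sum(1 for t in tokens if DELVE_FORMS.fullmatch(t))
    return {year: 1000 * hits[year] / totals[year] for year in totals if totals[year]}

# Hypothetical transcripts, one before and one after ChatGPT's release.
toy_transcripts = [
    (2021, "we examine the data and explore the results"),
    (2024, "let us delve into the data and keep delving deeper"),
]

rates = rate_per_thousand(toy_transcripts)
```

Normalizing by the total word count per year keeps the comparison fair when the number or length of videos differs between years.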
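  24. Research on understanding humans in the machine learning era @ interaction research for putting machine learning to use
    The relationship between machine learning and interaction: research on machine learning techniques / users / risks and limits / acceptance in society
    • Precisely now that machine learning techniques have become so powerful, the last-one-mile problem of machine learning (= how to deliver it) becomes important
    • Framing tasks with the human as the starting point may offer a hint
    • There is also a sense that research driven by "scientific curiosity" will become more interesting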