Human-Informed Machine Learning
Models and Interactions
Hiromu Yakura
Max-Planck Institute for Human Development
IBIS 2024
Slide 2
Slide 2 text
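Self-introduction
- Hiromu Yakura
- Postdoctoral researcher, Max Planck Institute for Human Development
- Previously a Google PhD Fellow and a Microsoft Research PhD Fellow
- Works on research that broadens the applications of machine learning, from design support to support for children with developmental disabilities
- Since last month, running a JST PRESTO project in the "social transformation platform" area, titled "Creating foundational approaches for understanding societal change in the age of machine learning"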
Slide 3
Slide 3 text
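The relationship between machine learning and interaction
Diagram labels: research on machine learning techniques; interaction research for putting machine learning to use; users, risks, limitations, acceptance in society
- Bridge the gaps that stand between machine learning and its real-world application
- How can the steady stream of newly emerging techniques be made easier to use? Verify this through actual software development and derive a methodology
- In particular, tasks are often framed with "humans" as the starting point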
Slide 4
Slide 4 text
Same diagram and bullets as Slide 3, with two added labels: AAAI, IJCAI
Slide 5
Slide 5 text
Self-Supervised Contrastive Learning
for Singing Voices
Hiromu Yakura†‡, Kento Watanabe‡, Masataka Goto‡
† University of Tsukuba, Japan
‡ AIST, Japan
IEEE/ACM Transactions on Audio, Speech, and Language Processing 2022
Slide 6
Slide 6 text
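Dataset constraints in the singer identification task
- Singer identification: the task of identifying the corresponding singer from a singing voice
  - It can be applied to, for example, retrieving singers from a singing voice
- Machine learning approaches have been widely used, but their performance depends heavily on the quality of the training dataset
  - Preparing a large-scale dataset in which singer information and voice properties are properly annotated is a high hurdle
Goal: obtain feature representations through self-supervised learning, and thereby realize a method that does not depend on such datasets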
Slide 7
Slide 7 text
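Acquiring feature representations through self-supervised contrastive learning
- Self-supervised contrastive learning [5, 6] can obtain feature representations from unlabeled datasets and is spreading in the image domain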
[5] Jaiswal, A. et al.: A survey on contrastive self-supervised learning, Technologies (2021).
[6] Jing, L. and Tian, Y.: Self-supervised visual feature learning with deep neural networks: A survey, IEEE Trans. Pattern Anal. Mach. Intell. (2021).
- A deep learning model is trained so that a pair of mechanically transformed inputs is mapped to similar latent representations
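As a rough illustration of the objective described on this slide, the sketch below computes an InfoNCE-style contrastive loss in plain Python. The slide does not say which loss is used, so the loss form, the temperature value, and all names and toy vectors here are assumptions for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: small when the anchor is closer to its positive
    than to any of the negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# Toy latent vectors (hypothetical values standing in for encoder outputs).
view_a = [1.0, 0.0]      # one transformed view of an input
view_b = [0.9, 0.1]      # another transformed view of the same input
other = [0.0, 1.0]       # a view of a different input

loss_aligned = info_nce(view_a, view_b, [other])     # correct pairing
loss_misaligned = info_nce(view_a, other, [view_b])  # wrong pairing
```

Minimizing such a loss pushes the two views of the same input toward similar latent representations, which is the behavior the bullet above describes.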
Slide 8
Slide 8 text
Simply training a model in the same way as for images did not improve accuracy at all
Slide 9
Slide 9 text
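The components of a singing voice and their properties
- The properties of a singing voice can be decomposed into voice timbre and singing style
  - Timbre depends on the spectral envelope and the formants
  - Singing style shows up in vibrato and articulation
Diagram: instrument-like timbre (derived from the shape of the vocal tract, etc.) × singing style arising from each singer's expression (how far to lay back, how much vibrato to use) = the final singing voice
Design the self-supervised contrastive learning by considering what it means for each of these to be "similar"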
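What does it mean for timbres to be similar?
- To begin with, what does "similar timbre" mean?
  - Mechanically changing the pitch makes the voice sound like a completely different timbre
- As with the tone color of an instrument, voice timbre can be captured by how energy is distributed across frequency bands
  - When the same person sings notes of different heights, the rough shape of the peaks stays unchanged
  - Mechanically changing the pitch changes the shape of the peaks as well
Figure: illustration of a mechanical pitch transformation; the spacing between the peaks (vertical lines) is constant
Source: S. Duvvuru, et al.: The Effect of Timbre, Pitch, and Vibrato on Vocal Pitch-Matching Accuracy. Journal of Voice (2016).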
Slide 12
Slide 12 text
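What does it mean for singing styles to be similar?
- Then what does "similar singing style" mean?
  - Focus on the articulation that occurs over very short time spans
- The gradient of F0 at note transitions reflects an individual's singing style [31]
  - Even when the same person sings slowly, the way F0 moves does not change
  - Mechanical time stretching changes the shape of the fine F0 gradients as well
Figure: illustration of a mechanical time stretch
[31] Kako, T. et al.: Automatic identification for singing style based on sung melodic contour characterized in phase plane, ISMIR (2009).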
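The claim that mechanical time stretching alters the fine F0 gradients can be checked with a toy calculation. The sketch below works on a hypothetical F0 contour (a short list of Hz values) rather than on audio, and the 10 ms frame length and the interpolation-based stretch are assumptions made only for this illustration.

```python
def slopes(f0_contour, frame_sec):
    """Frame-to-frame F0 gradient in Hz per second."""
    return [(b - a) / frame_sec for a, b in zip(f0_contour, f0_contour[1:])]

def time_stretch(f0_contour, factor):
    """Toy time stretch by an integer factor: linearly interpolate
    extra frames between neighbouring frames."""
    out = []
    for a, b in zip(f0_contour, f0_contour[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(f0_contour[-1])
    return out

frame_sec = 0.01                      # assumed 10 ms analysis frames
f0 = [220.0, 220.0, 240.0, 262.0]     # hypothetical note transition in Hz

original_max = max(slopes(f0, frame_sec))
stretched_max = max(slopes(time_stretch(f0, 2), frame_sec))
```

Stretching the contour to twice its length halves the steepest gradient of the transition, whereas a singer who merely sings more slowly would, according to the slide, keep the transition itself largely unchanged.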
Slide 13
Slide 13 text
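Acquiring feature representations of singing-voice style through self-supervised learning
- Reflecting these observations, a self-supervised contrastive learning method is proposed that obtains feature representations specialized for singing voices
Figure notes: mechanically pitch-shifted or time-stretched singing voices are treated as different from the original; the result is expected to be a feature representation that discriminates differences in timbre and singing style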
Slide 14
Slide 14 text
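Evaluation on the singer identification task
- First, self-supervised learning was run on singing-voice data without singer information
  - A music database of … songs × … seconds was used for training after vocal separation [35]
  - Artist names and other metadata were not included in training
- Separately from the above, a dataset of … singers × … songs was built, and singer identification accuracy was compared as a downstream task
  - An accuracy improvement of roughly … pt over the existing method [12] was confirmed
  - The effectiveness of a design that exploits the properties of singing voices was also confirmed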
[35] Hennequin, R. et al.: Spleeter: A fast and efficient music source separation tool with pre-trained models, J. Open Source Softw. (2020).
[12] Spijkervet, J. and Burgoyne, J. A.: Contrastive learning of musical representations, ISMIR (2021).
Slide 15
Slide 15 text
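Feature representations specialized for timbre
- The proposed method can also be extended by rearranging how the transformations are used
  - For example, a similarity based on timbre alone, regardless of singing style, can be computed
Figure notes: time-stretched singing voices are treated as the same, and pitch-changed singing voices as different; this yields a feature representation that ignores differences in singing style and captures differences in timbre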
Slide 16
Slide 16 text
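Feature representations specialized for singing style
- The proposed method can also be extended by rearranging how the transformations are used
  - Conversely, a similarity based on singing style alone, regardless of timbre, can also be computed
Figure notes: time-stretched singing voices are treated as different, and pitch-changed singing voices as the same; this yields a feature representation that ignores differences in timbre and captures differences in singing style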
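The three ways of using the transformations (Slides 13, 15, and 16) differ only in which transformed copies count as "the same" and which as "different". A minimal sketch of that bookkeeping follows; the variant names, dictionary layout, and string placeholders for clips are illustrative assumptions, not the authors' implementation.

```python
# Role of each transformed copy in each variant described on the slides.
ROLES = {
    # Slide 13: both transformations produce a "different" voice.
    "voice": {"pitch_shift": "negative", "time_stretch": "negative"},
    # Slide 15: ignore singing style, so time-stretched copies are "the same".
    "timbre": {"pitch_shift": "negative", "time_stretch": "positive"},
    # Slide 16: ignore timbre, so pitch-changed copies are "the same".
    "style": {"pitch_shift": "positive", "time_stretch": "negative"},
}

def split_views(variant, views):
    """Split transformed views into positives and negatives for a variant.

    `views` maps a transformation name to the transformed clip (any object).
    """
    positives, negatives = [], []
    for name, clip in views.items():
        target = positives if ROLES[variant][name] == "positive" else negatives
        target.append(clip)
    return positives, negatives

# Placeholder strings stand in for actual transformed audio clips.
views = {"pitch_shift": "clip_pitch_shifted", "time_stretch": "clip_time_stretched"}
timbre_pos, timbre_neg = split_views("timbre", views)
style_pos, style_neg = split_views("style", views)
```

Swapping a single table entry is enough to switch between the timbre-only and style-only representations, which is what makes the extension on these slides inexpensive.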
Slide 17
Slide 17 text
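Summary
- By focusing on the characteristics of human singing voices, a new feature acquisition method based on self-supervised contrastive learning was proposed
  - Using pitch shifting and time stretching effectively was confirmed to improve singer identification accuracy substantially
- The proposed method was also confirmed to be extensible so that it captures similarity in only one of timbre or singing style
  - For example, could it be used to find "diamonds in the rough" whose singing style is still immature but whose timbre is similar…?
Reframing the problem from the user's side extends existing learning methods and creates new uses for them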
Slide 18
Slide 18 text
Tool- and Domain-Agnostic
Parameterization of Style Transfer Effects
Leveraging Pretrained Perceptual Metrics
Hiromu Yakura†, Yuki Koyama‡, Masataka Goto‡
† University of Tsukuba, Japan
‡ AIST, Japan
IJCAI 2021
Slide 19
Slide 19 text
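Background: the spread of deep-learning-based style transfer methods
- Deep learning has enabled kinds of content generation that were not possible before
- Style transfer in particular has seen methods proposed across a wide range of domains
Figure: landscape photos [Li et al. 2018] and makeup photos [Chang et al. 2018] (GAN); columns: original, reference, style transferred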
Y. Li, et al. A Closed-form Solution to Photorealistic Image Stylization. ECCV 2018.
H. Chang, et al. PairedCycleGAN: Asymmetric Style Transfer for Applying and Removing Makeup. CVPR 2018.
Slide 20
Slide 20 text
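Background: style transfer has not spread as a creative tool
- On the other hand, it has not yet come to be widely used as a tool for creative expression
  - Example: a photo-editing app called Prisma became hugely popular, but most users have since gone back to Instagram
Source: https://jp.techcrunch.com/2016/07/20/20160719prismagram/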
Slide 21
Slide 21 text
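Background: the mismatch between style transfer and our design process
- Why has style transfer not spread as a creative tool?
  - Because the human design process is often exploratory [Talton et al. 2009] and does not fit methods that output only a single result
- It is rare to have a perfect goal image from the start; in most cases design begins from a vague image
  - Through repeated trial and error, whatever serendipitously clicks ends up becoming the final work
- Current style transfer produces an elaborate result in one shot, but leaves no room for people to explore through fine-grained trial and error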
J. Talton, et al. Exploratory modeling with collaborative design spaces. ACM Trans. Graph. 28(5). 2009.
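Proposal: imitating style transfer with parametric transformations
- The idea: teach how to imitate, rather than hand over the imitated result
  - cf. the saying attributed to Laozi: "Rather than giving a fish, teach how to fish"
- The user learns which filters to use and which parameter values make the style similar
- Once the way to imitate is known in a familiar tool, it is easy to explore freely from there
Figure labels (two examples): reference, transformed result, original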
Slide 25
Slide 25 text
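Proposal: imitating style transfer with parametric transformations
- Then how can the way to imitate (the parameter transformation) be found?
  - Even when it is framed as an optimization problem, the similarity between the transformed result and the reference cannot be compared directly, so no objective function can be defined
- Two key elements are introduced
  - A perceptual metric based on the latent representations of a GAN
  - Black-box optimization
Figure (transformed result vs. reference): because the underlying photos differ, it is hard to judge computationally how similar they are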
Slide 26
Slide 26 text
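Proposal: a perceptual metric based on GAN latent representations
- In GAN-based style transfer, the Encoder separates style from content, which is what makes it possible to transfer the style alone
- Comparing the style latent representations obtained from the Encoder of a pretrained model gives a measure of how similar two styles are
- Moreover, as long as a model exists, a wide range of targets such as photos and makeup styles can be handled without retraining
Figure note: the metric can tell that two styles are similar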
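A minimal sketch of such a metric is shown below. The `encode_style` function is a hypothetical stand-in for the style branch of a pretrained encoder (here it just summarizes brightness and contrast of a flat pixel list), and the use of Euclidean distance is an assumption; the slide does not specify how the latent representations are compared.

```python
import math

def encode_style(image):
    """Hypothetical stand-in for a pretrained encoder's style branch.

    The 'style' is the mean and standard deviation of pixel intensities,
    so the arrangement of pixels (the content) does not affect it.
    """
    mean = sum(image) / len(image)
    contrast = math.sqrt(sum((p - mean) ** 2 for p in image) / len(image))
    return [mean, contrast]

def style_distance(style_a, style_b):
    """Euclidean distance between two style vectors (smaller = more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(style_a, style_b)))

# Toy "images": flat lists of grayscale intensities in [0, 1].
reference = [0.8, 0.9, 0.7, 0.85]
same_style_other_content = [0.9, 0.7, 0.85, 0.8]   # same pixels, rearranged
different_style = [0.1, 0.2, 0.15, 0.1]

d_close = style_distance(encode_style(reference), encode_style(same_style_other_content))
d_far = style_distance(encode_style(reference), encode_style(different_style))
```

Because only the style vectors are compared, two images with different content can still be judged close, which is the property needed when the original and the reference are different photos.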
Slide 27
Slide 27 text
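Proposal: searching for the way to imitate with black-box optimization
- Black-box optimization is used to search automatically for filter parameters that bring the result closer under the perceptual metric
- Because black-box optimization places no restriction on the transformations used, the way to imitate can be searched for in a variety of tools, not just Instagram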
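The search loop can be sketched as follows. This example uses plain random search as the black-box optimizer purely for brevity (the implementation slide mentions Bayesian optimization instead), and the two filter parameters, the toy `apply_filters` edit, and the `style_of` descriptor are all hypothetical stand-ins for a real editing tool and a pretrained perceptual model.

```python
import random

def apply_filters(pixels, brightness, contrast):
    """Toy parametric edit: scale contrast around mid-gray, then add brightness."""
    return [min(1.0, max(0.0, (p - 0.5) * contrast + 0.5 + brightness)) for p in pixels]

def style_of(pixels):
    """Toy style descriptor (mean and spread of intensities)."""
    mean = sum(pixels) / len(pixels)
    spread = (sum((p - mean) ** 2 for p in pixels) / len(pixels)) ** 0.5
    return (mean, spread)

def objective(params, original, reference):
    """Distance between the style of the edited original and of the reference."""
    edited = style_of(apply_filters(original, *params))
    target = style_of(reference)
    return sum((e - t) ** 2 for e, t in zip(edited, target)) ** 0.5

def random_search(original, reference, n_trials=500, seed=0):
    """Black-box search: only objective values are used, never gradients."""
    rng = random.Random(seed)
    best_params = (0.0, 1.0)                      # identity edit as starting point
    best_score = objective(best_params, original, reference)
    for _ in range(n_trials):
        params = (rng.uniform(-0.5, 0.5), rng.uniform(0.5, 2.0))
        score = objective(params, original, reference)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

original = [0.2, 0.3, 0.4, 0.5]
reference = [0.5, 0.7, 0.6, 0.8]
baseline = objective((0.0, 1.0), original, reference)
best_params, best_score = random_search(original, reference)
```

Since the loop only ever calls `apply_filters` and reads back a score, the same structure applies to any tool whose filters can be driven programmatically, which reflects the tool-agnostic point made on the slide.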
Slide 28
Slide 28 text
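Implementation
- Even apps without an API, such as Instagram and SNOW, are operated automatically using the Android Emulator and a testing library
- A pretrained perceptual model and Bayesian optimization are used to bring the result gradually closer to the reference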
Slide 29
Slide 29 text
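Evaluation results
- The feasibility of "teaching how to imitate, rather than handing over the imitated result" was verified in landscape-photo and makeup-editing scenarios
- Quality was assessed through third-party subjective evaluation by crowd workers
Figure columns (two examples): reference, original, proposed method, participant A, participant B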
Slide 30
Slide 30 text
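Summary
- Building on the characteristics of the human design process, a method that combines style transfer models with existing apps was proposed
  - In particular, a new problem setting was raised by focusing on the gap between typical machine learning tasks and human behavior
- The approach of "teaching how to imitate, rather than handing over the imitated result" was confirmed to be achievable using only pretrained models
Reframing the problem from the user's cognition and behavior creates new use cases for existing trained models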
Slide 31
Slide 31 text
Human Behavior-Informed ML
ML-Informed Human Behavior
Slide 32
Slide 32 text
Empirical evidence of Large Language Model's
influence on human spoken communication
Hiromu Yakura*, Ezequiel Lopez-Lopez*, Levin Brinkmann*,
Ignacio Serna, Prateek Gupta, Iyad Rahwan
*: equal contribution
Max-Planck Institute for Human Development
arXiv 2409.01754
Slide 33
Slide 33 text
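ChatGPT's "delve" bias
‣ It is becoming widely known that ChatGPT, for some reason, tends to use the word "delve"
‣ Other words characteristic of ChatGPT have also been identified, and an increase in their frequency in papers has been pointed out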
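One simple way to quantify such an increase is to compare a word's rate per 1,000 tokens across time periods. The sketch below does this on tiny made-up corpora; the tokenizer, the exact-match counting (which would miss forms such as "delves"), and the sample sentences are assumptions for illustration and are not data from the cited studies.

```python
import re
from collections import Counter

def rate_per_thousand(texts, word):
    """Occurrences of `word` per 1,000 tokens across texts (case-insensitive, exact token match)."""
    tokens = [t for text in texts for t in re.findall(r"[a-z']+", text.lower())]
    if not tokens:
        return 0.0
    return 1000.0 * Counter(tokens)[word] / len(tokens)

# Hypothetical toy corpora, not real data.
corpus_by_year = {
    2021: ["we examine the results in detail", "this paper studies a new method"],
    2024: ["we delve into the results", "this paper studies a new method"],
}
rates = {year: rate_per_thousand(texts, "delve") for year, texts in corpus_by_year.items()}
```

Normalizing by token count rather than comparing raw counts keeps the comparison meaningful when the corpora for different years differ in size.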
W. Liang, et al. Mapping the Increasing Use of LLMs in Scientific Papers. Proc. CoLM (2024).
https://pshapira.net/2024/03/31/delving-into-delve/