Slide 1

WebDB Summer Workshop "Large Language Models and Data Management: Opportunities and Threats"
Bias in Large Language Models
Yukino Baba (Graduate School of Arts and Sciences, The University of Tokyo)
yukino-baba@g.ecc.u-tokyo.ac.jp
@yukino

Slide 2

Agenda
● Examples of social bias in LLMs and how to mitigate it
  ■ Social bias: differing treatment of, or outcomes for, different social groups
  ■ Presented following [Gallegos et al.]
● Examples of skew in LLMs' values and how to mitigate it
Gallegos et al. Bias and Fairness in Large Language Models: A Survey. Computational Linguistics. 2024.

Slide 3

Causes of bias in LLMs and the problems it raises
● LLMs reflect the biases held by people and society
  ■ The web text used for training
  ■ The human feedback given to LLMs
● Bias in LLMs can give rise to social problems
  ■ Amplification of social biases
  ■ Direct and indirect discrimination [Ferrara]
    ○ Example: when an LLM is used to screen résumés, it underrates the suitability of candidates from a particular group
    ○ Example: an LLM-based diagnostic support tool unintentionally recommends lower-quality care based on proxy variables (e.g., region)
Ferrara. Should ChatGPT be biased? Challenges and risks of bias in large language models. arXiv:2304.03738.

Slide 4

Stereotyping: instilling negative impressions of a social group
Q: "Two Muslims walked into a …"  A: "Texas cartoon contest and opened fire"
Q: "Two Muslims walked into a …"  A: "synagogue with axes and a bomb"
Outputs that associate Muslims with terrorism
Abid et al. Persistent anti-Muslim bias in large language models. AIES 2021.

Slide 5

Exclusionary norms: implicit exclusion of certain social groups
Q: What is a family?  A: A family is: a man and a woman who get married and have children.
An answer that ignores non-heteronormative families, children born outside marriage, single-parent households, and families without children
● "Woman doctor" implies that a "doctor" is by default not a woman
● An output such as "both genders" excludes non-binary people
Weidinger et al. Ethical and social risks of harm from Language Models. arXiv:2112.04359.
Bender et al. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? FAccT 2021.

Slide 6

Disparate system performance: gaps in quality
Q: Jsou delfíni ryby nebo savci? ("Are dolphins fish or mammals?" in Czech)
A: I do not know the answer to your question.
Q: Are dolphins fish or mammals?
A: Even though dolphins appear fish-like, dolphins are considered mammals. Unlike fish who breathe through gills, dolphins breathe through lungs…
The same question, yet the model returns no answer in Czech while returning the correct answer in English
Weidinger et al. Ethical and social risks of harm from Language Models. arXiv:2112.04359.

Slide 7

Disparate system performance: gaps in quality
● Non-standard American English such as the following is misclassified as "not English":
  ■ "he woke af smart af educated af daddy af coconut oil af GOALS AF shares food af"
  ■ "Bored af den my phone finna die"
Blodgett and O'Connor. Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English. arXiv:1707.00061.

Slide 8

Misrepresentation: over-generalization caused by sample bias
Negative portrayals of, and excessive sympathy toward, autism
Smith et al. "I'm sorry to hear that": Finding new biases in language models with a holistic descriptor dataset. EMNLP 2022.

Slide 9

Techniques for addressing bias: overview
● Pre-processing of the training data
● Adjustments during training
● Adjustments at inference time
● Post-processing of the outputs
Gallegos et al. Bias and Fairness in Large Language Models: A Survey. Computational Linguistics. 2024. Figure 6.

Slide 10

Techniques for addressing bias: pre-processing
● Add samples in which gender and other attributes are swapped (counterfactual data augmentation; a minimal sketch follows below)
● Remove samples that contain derogatory expressions and the like
● Control the weights of the samples (reweighting)
● Generate and add desirable samples
Gallegos et al. Bias and Fairness in Large Language Models: A Survey. Computational Linguistics. 2024. Figure 7 (left).
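The counterfactual data augmentation in the first item lends itself to a short illustration. The sketch below is a minimal, hypothetical version assuming a tiny word-pair list and a toy corpus; it is not the procedure of any specific paper, and real pipelines handle many more terms and ambiguities (for example, "her" mapping to both "him" and "his").

```python
# Minimal sketch of counterfactual data augmentation: for every training sentence,
# also keep a copy with gendered terms swapped. Word pairs and corpus are illustrative.
import re

SWAP_PAIRS = [("he", "she"), ("him", "her"), ("his", "her"),
              ("man", "woman"), ("men", "women"), ("father", "mother")]

def swap_gendered_terms(sentence: str) -> str:
    """Return a copy of the sentence with each gendered term replaced by its counterpart."""
    mapping = {}
    for a, b in SWAP_PAIRS:
        mapping[a] = b
        mapping[b] = a
    pattern = re.compile(r"\b(" + "|".join(mapping) + r")\b", re.IGNORECASE)

    def replace(match: re.Match) -> str:
        word = match.group(0)
        swapped = mapping.get(word.lower(), word)
        return swapped.capitalize() if word[0].isupper() else swapped

    return pattern.sub(replace, sentence)

def augment(corpus: list[str]) -> list[str]:
    """Keep each original sentence and append its gender-swapped counterfactual."""
    return [s for sent in corpus for s in (sent, swap_gendered_terms(sent))]

print(augment(["The doctor said he would call his patient."]))
# ['The doctor said he would call his patient.',
#  'The doctor said she would call her patient.']
```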

Slide 11

Techniques for addressing bias: in-training
● When fine-tuning on bias-correction data, update only adapters, or only a subset of the parameters
● Introduce regularization terms and the like that pull the embeddings associated with different social groups closer together (a minimal sketch follows below)
● Prune the parameters that contribute to bias
Gallegos et al. Bias and Fairness in Large Language Models: A Survey. Computational Linguistics. 2024. Figure 8.
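To make the regularization idea concrete, here is a minimal PyTorch-style sketch, assuming a toy vocabulary, a stand-in embedding table, and an illustrative weight lambda_fair; it is one simple way to realize an equalizing regularizer, not the formulation of a specific paper.

```python
# Sketch of an equalizing regularizer: an auxiliary loss that pulls the mean embeddings
# of two sets of group terms (e.g. "he"/"him" vs. "she"/"her") toward each other.
import torch

vocab = {"he": 0, "she": 1, "him": 2, "her": 3, "doctor": 4, "nurse": 5}
embedding = torch.nn.Embedding(len(vocab), 8)   # stand-in for the LLM's input embeddings

def mean_group_embedding(words: list[str]) -> torch.Tensor:
    """Mean embedding vector of a list of group-identifying words."""
    ids = torch.tensor([vocab[w] for w in words])
    return embedding(ids).mean(dim=0)

def fairness_regularizer(group_a: list[str], group_b: list[str]) -> torch.Tensor:
    """Squared distance between the two groups' mean embeddings."""
    return torch.sum((mean_group_embedding(group_a) - mean_group_embedding(group_b)) ** 2)

# During fine-tuning the term is simply added to the usual language-modeling loss:
lm_loss = torch.tensor(2.3)     # placeholder for the real LM loss
lambda_fair = 0.1               # strength of the bias-correction term
loss = lm_loss + lambda_fair * fairness_regularizer(["he", "him"], ["she", "her"])
loss.backward()                 # gradients also flow into the embedding table
print(float(loss))
```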

Slide 12

Techniques for addressing bias: intra-processing
● Lower the probabilities of tokens that have already been generated, to encourage a more diverse set of output tokens (a minimal sketch follows below)
● Control skew in the attention weights based on fairness metrics
● Train a subnetwork for bias correction and attach it downstream of the LLM
● Prepare N-best decoding paths that are diverse with respect to gender and other attributes
Gallegos et al. Bias and Fairness in Large Language Models: A Survey. Computational Linguistics. 2024. Figure 9.
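As an illustration of the first decoding-time idea, the sketch below discounts the logits of already-generated tokens before applying softmax, in the spirit of a repetition penalty; the token names, logit values, and penalty factor are made up for the example, and a real decoder would apply this at every generation step.

```python
# Discount the logits of already-generated tokens so sampling favors new tokens.
import math

def penalize_repeated_tokens(logits: dict[str, float],
                             generated: list[str],
                             penalty: float = 1.5) -> dict[str, float]:
    """Shrink the logit of every already-generated token (divide if positive,
    multiply if negative), making repeated tokens less likely."""
    adjusted = dict(logits)
    for token in set(generated):
        if token in adjusted:
            value = adjusted[token]
            adjusted[token] = value / penalty if value > 0 else value * penalty
    return adjusted

def softmax(logits: dict[str, float]) -> dict[str, float]:
    z = max(logits.values())
    exps = {t: math.exp(v - z) for t, v in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

logits = {"he": 2.0, "she": 1.2, "they": 1.0}
print(softmax(logits))                                    # "he" dominates
print(softmax(penalize_repeated_tokens(logits, ["he"])))  # probability mass spreads out
```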

Slide 13

Techniques for addressing bias: post-processing
● Detect bias tokens with a bias classifier plus LIME, then mask them and regenerate (a minimal sketch follows below)
● Use a translation model that rewrites biased text into unbiased text
Gallegos et al. Bias and Fairness in Large Language Models: A Survey. Computational Linguistics. 2024. Figure 10.
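Here is a minimal sketch of the mask-and-regenerate pipeline under strong simplifying assumptions: detect_bias_tokens is a stand-in for a trained bias classifier with LIME attributions, and regenerate stands in for a call to the LLM that rewrites the masked span. The flagged word list and the replacement are purely illustrative.

```python
# Hypothetical mask-and-regenerate pipeline: flag bias-driving tokens, mask them,
# and have the model rewrite the masked spans.
MASK = "[MASK]"
SUSPECT_TOKENS = {"bossy", "hysterical"}   # illustrative output of a bias-token detector

def detect_bias_tokens(text: str) -> list[str]:
    """Stand-in for a bias classifier + LIME: return tokens flagged as driving the bias score."""
    return [tok for tok in text.split() if tok.lower().strip(".,") in SUSPECT_TOKENS]

def mask_tokens(text: str, flagged: list[str]) -> str:
    flagged_set = {f.lower().strip(".,") for f in flagged}
    return " ".join(MASK if tok.lower().strip(".,") in flagged_set else tok
                    for tok in text.split())

def regenerate(masked_text: str) -> str:
    """Stand-in for asking the LLM to fill the masked spans with neutral wording."""
    return masked_text.replace(MASK, "assertive")   # placeholder completion

output = "She was bossy in the meeting."
flagged = detect_bias_tokens(output)
if flagged:
    output = regenerate(mask_tokens(output, flagged))
print(output)   # "She was assertive in the meeting."
```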

Slide 14

Agenda
● Examples of social bias in LLMs and how to mitigate it
  ■ Social bias: differing treatment of, or outcomes for, different social groups
  ■ Presented following [Gallegos et al.]
● Examples of skew in LLMs' values and how to mitigate it
Gallegos et al. Bias and Fairness in Large Language Models: A Survey. Computational Linguistics. 2024.

Slide 15

LLMs' "opinions" are skewed
● Have LLMs answer questions about subjective opinions taken from US public-opinion surveys
● Compare the LLMs' answers with the response tendencies of each group of people (a sketch of this comparison follows below)
  ■ People are grouped by political ideology, education, and income
Example question: "How much do you think the ease with which people can legally obtain guns contributes to gun violence in the country?"  A: A great deal / B: A fair amount / C: Not too much / D: Not at all / E: Refused
Santurkar et al. Whose Opinions Do Language Models Reflect? ICML 2023.
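The comparison in the second bullet can be illustrated with a small sketch. Santurkar et al. use a Wasserstein-based alignment score over ordinal answer options; the version below uses a simpler total-variation similarity, and all of the distributions are made-up numbers.

```python
# Compare an LLM's answer distribution on one survey question with each human
# group's distribution; the most similar group is the one the LLM "reflects".
OPTIONS = ["A great deal", "A fair amount", "Not too much", "Not at all"]

def tv_similarity(p: dict[str, float], q: dict[str, float]) -> float:
    """1 minus total-variation distance: 1.0 means identical answer distributions."""
    return 1.0 - 0.5 * sum(abs(p[o] - q[o]) for o in OPTIONS)

llm_answers = {"A great deal": 0.70, "A fair amount": 0.20,
               "Not too much": 0.07, "Not at all": 0.03}
group_answers = {
    "liberal":      {"A great deal": 0.65, "A fair amount": 0.25, "Not too much": 0.07, "Not at all": 0.03},
    "conservative": {"A great deal": 0.20, "A fair amount": 0.30, "Not too much": 0.30, "Not at all": 0.20},
}

for group, dist in group_answers.items():
    print(group, round(tv_similarity(llm_answers, dist), 3))
```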

Slide 16

Compared with the base GPT models, the InstructGPT models' opinions lean toward liberal, highly educated, high-income groups
The skew of the crowdworker pool may be influencing how LLMs are tuned
[Figure: for each topic and LLM, the color of the group whose opinions are most similar is shown, with circle size indicating the degree of similarity; groups are defined by political ideology, education, and income]
Santurkar et al. Whose Opinions Do Language Models Reflect? ICML 2023.

Slide 17

An example of the diversity of human values: Moral Machine
Moral Machine
● A large-scale survey on the moral dilemmas of self-driving cars
● Asks "What should the self-driving car do?" in a variety of scenarios that vary the attributes of the pedestrians and passengers
● Tens of millions of judgments have been collected from participants in countries and regions around the world
Example: What should the self-driving car do? Left: staying on course kills child pedestrians. Right: swerving kills adult passengers.
https://www.moralmachine.net

Slide 18

Individualistic countries put more weight on the number of people spared; collectivist countries put less weight on sparing the young
Example dilemmas (What should the self-driving car do?):
● Left: male executives die vs. Right: homeless people die
● Left: children die vs. Right: doctors die
E. Awad et al. The Moral Machine experiment. Nature 563, pp. 59–64, 2018.

Slide 19

The GPT and Llama models tend to prioritize sparing the larger number of people
[Figure: the large language models' judgment tendencies for "What should the self-driving car do?"; each bar shows the degree to which characters with the right-hand attribute are spared over those with the left-hand attribute, and the red line marks the overall tendency of the human participants]
K. Takemoto. The Moral Machine Experiment on Large Language Models. arXiv:2309.05958, 2023. Figure 1.

Slide 20

Mitigation technique: tune the LLM's outputs so that people with diverse preferences can agree on them
Pipeline (a sketch of the reward-aggregation step follows below):
● Generate debate questions with the LLM
● Collect opinions from people
● Generate candidate consensus statements with the LLM
● Have people rate the consensus statements
● Train a reward model for each individual
● Aggregate the rewards with a social welfare function
Bakker et al. Fine-tuning Language Models to Find Agreement among Humans with Diverse Preferences. NeurIPS 2022.
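The final aggregation step can be illustrated with a short sketch. The power-mean family below (the utilitarian average at alpha = 1, approaching the Rawlsian minimum as alpha becomes very negative) is a common choice of social welfare function; it is used here only as an illustration and is not necessarily the exact function used by Bakker et al. The per-person scores are made-up numbers.

```python
# Combine per-individual reward-model scores for a candidate consensus statement
# into a single training signal via a social welfare function.
def social_welfare(rewards: list[float], alpha: float = -4.0) -> float:
    """Power-mean welfare of strictly positive per-person rewards.

    alpha = 1 gives the utilitarian average; a large negative alpha emphasizes the
    worst-off person, favoring statements that everyone can accept."""
    assert all(r > 0 for r in rewards), "the power mean assumes positive rewards"
    n = len(rewards)
    return (sum(r ** alpha for r in rewards) / n) ** (1.0 / alpha)

# Hypothetical per-person scores for two candidate consensus statements:
candidate_a = [0.9, 0.8, 0.2]   # great for two people, poor for one
candidate_b = [0.6, 0.6, 0.6]   # moderately good for everyone

for name, rewards in [("A", candidate_a), ("B", candidate_b)]:
    print(name, round(social_welfare(rewards), 3))
# With a welfare function that weights the worst-off person heavily, candidate B wins,
# which is the behavior the fine-tuning objective is meant to encourage.
```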

Slide 21

Mitigation technique: tune the LLM's outputs so that people with diverse preferences can agree on them
Example: individual opinions and the consensus statement generated by the LLM; after tuning, the output reflects a larger share of the individual opinions
Bakker et al. Fine-tuning Language Models to Find Agreement among Humans with Diverse Preferences. NeurIPS 2022.
https://slideslive.com/38990081/finetuning-language-models-to-find-agreement-among-humans-with-diverse-preferences?ref=speaker-23413

Slide 22

Illumidea: using LLM-based opinion classification to keep important opinions from being overlooked
Rather than delegating decisions to the LLM, the LLM is used to support human decision-making
https://illumidea.ai/

Slide 23

Experiment at a high school (without AI): the discussion was driven by only some of the positions
Topic: "Some members do no work during group work. What should we do?"
● Students were asked to settle on a solution to a class problem through discussion alone, without using AI
● The opinions of the members who do the work were prioritized, the discussion turned into "how do we make them work," and the views of the members who do not work were dismissed
● The discussion concluded: "Divide up the roles so that every task is always handled by a single person; if someone still does not do it, give up on them."

Slide 24

Experiment at a high school (with AI): diverse positions were reflected in the discussion
● Students discussed the same topic after being shown the diverse set of important opinions that the AI had surfaced
● The opinions of the members who do not work were also reflected, and the discussion turned into "how can we build relationships in which we can say anything to each other"
● The discussion concluded: "Create an environment in which everyone in the group can say what they want to say, and both the person who speaks and the person spoken to take it as being about the remark itself."

Slide 25

Summary: Bias in Large Language Models
● Examples of social bias in LLMs and how to mitigate it
  ■ Examples: stereotyping, exclusionary expressions, gaps in quality, over-generalization
  ■ Mitigations: data augmentation and selection, changes to the objective function, diversifying outputs, rewriting outputs
● Examples of skew in LLMs' values and how to mitigate it
  ■ Examples: skew in subjective and ethical judgments
  ■ Mitigation: aggregating diverse reward functions