
(Reading) Agreeing to Disagree: Annotating Offensive Language Datasets with Annotators’ Disagreement

ando
January 29, 2022

komachi lab.

Transcript

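  1. Overview
    - A paper on inter-annotator agreement in annotation.
      Low agreement is usually blamed on guideline errors or spammers and taken as a sign of low data quality.
    - Examines this on offensive language detection, a highly subjective task.
    - Main claims:
      • In subjective tasks, low inter-annotator agreement does not mean noise (annotation errors).
      • Subjectivity, bias, textual ambiguity, and similar factors matter and are intrinsic to subjective tasks, so they should not be thrown away.
      • Collecting only high-agreement data makes a dataset too easy; current datasets are easy.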
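  2. Introduction
    - In crowdsourcing, crowd workers are not trained to carry out complex tasks.
      Their output quality is generally measured by agreement.
    - In PoS tagging and parsing, disagreement between annotators often stems from the annotation guidelines → it can be resolved through discussion.
    - In social computing tasks (offensive language detection, etc.), complete agreement among annotators should not be forced → global consensus is not the goal.
    - Findings: even low-agreement instances are useful for training models, which shows they are not random annotations.
      In other words, removing such instances, as has been done so far, is not the right strategy for dealing with disagreement.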
  3. Related Work
    - A growing number of recent papers argue that disagreement is unavoidable because language is inherently ambiguous.
      Lora Aroyo and Chris Welty. Truth is a lie: Crowd truth and the seven myths of human annotation. AI Magazine.
    - Many prior studies on NER, PoS tagging, and similar tasks filter or weight low-agreement instances, or identify unreliable annotators
      → their goal is higher classification accuracy on easy cases.
    - Similar work:
      • Separating stable opinions from noise in crowdsourced datasets.
        Mitchell L. Gordon, Kaitlyn Zhou, Kayur Patel, Tatsunori Hashimoto, and Michael S. Bernstein. The disagreement deconvolution: Bringing machine learning performance metrics in line with reality.
      • Grouping annotators by their biases and treating each group as a separate gold standard.
        Sohail Akhtar, Valerio Basile, and Viviana Patti. Modeling annotator perspective and polarized opinions to improve hate speech detection.
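  4. Data Selection and Annotation
    - Collected tweets on Covid, the US presidential election, and Black Lives Matter (BLM) that contain the domain hashtags and specific keywords (e.g. covid, election, blm).
      Randomly sampled … tweets (about … per domain).
    - For annotation, the dataset should be balanced across classes (not offensive / offensive, and agreement level):
      • Use annotated datasets from prior work.
      • Prepare … BERT-based classifiers trained under different conditions and combine them as an ensemble (each classifier is treated as a virtual annotator).
      • Define the number of majority votes as follows.
      • For each of the ensemble's three agreement classes, select … tweets, for a total of … tweets per domain.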
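  5. Annotation Details
    - Annotation by … native speakers.
    - Ensuring high-quality annotation:
      • … experts annotate; tweets on which they fully agree become the gold standard.
      • If a crowd worker mislabels a gold item, that HIT is discarded.
      • All annotations by workers whose overall accuracy did not reach at least … are removed.
    - Total number of tweets annotated on AMT: …
    - Comment: isn't this indirectly controlling agreement and removing bias?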
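The Introduction slide says the quality of crowdsourced annotation is usually measured by agreement, without naming a coefficient. As an illustration only, the sketch below computes Fleiss' kappa, one common chance-corrected agreement measure for a fixed number of raters per item; choosing Fleiss' kappa, the function name `fleiss_kappa`, and the `OFF`/`NOT` labels are our assumptions, not something the slides specify.

```python
from collections import Counter


def fleiss_kappa(labels_per_item):
    """Fleiss' kappa for items that each received the same number of ratings.

    labels_per_item: list of lists; each inner list holds one label per rater.
    """
    if not labels_per_item:
        raise ValueError("need at least one item")
    n = len(labels_per_item[0])  # raters per item
    if n < 2 or any(len(item) != n for item in labels_per_item):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    num_items = len(labels_per_item)
    categories = sorted({label for item in labels_per_item for label in item})
    counts = [Counter(item) for item in labels_per_item]

    # Observed agreement per item: share of rater pairs that agree.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n) / (n * (n - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / num_items

    # Chance agreement from the overall proportion of each category.
    proportions = [sum(c[cat] for c in counts) / (num_items * n) for cat in categories]
    p_chance = sum(p ** 2 for p in proportions)
    if p_chance == 1.0:
        return float("nan")  # only one category ever used: kappa is undefined
    return (p_observed - p_chance) / (1 - p_chance)


# Two tweets, five raters each: one unanimous, one split 3-2.
print(fleiss_kappa([["OFF"] * 5, ["OFF"] * 3 + ["NOT"] * 2]))
```

Under the paper's argument, a low value from a measure like this should not by itself be read as noise in a subjective task such as offensive language detection.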
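The related-work slide contrasts the paper with prior studies that filter or weight low-agreement instances. A minimal sketch of both strategies, assuming each item comes with a list of annotator labels and that "agreement" means the share of votes won by the majority label (our assumption); the helpers `majority_with_agreement`, `filter_low_agreement`, and `weight_by_agreement` are hypothetical names.

```python
from collections import Counter


def majority_with_agreement(votes):
    """Return (majority label, share of votes it received)."""
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


def filter_low_agreement(items, threshold):
    """Filtering: keep (text, label) only when agreement reaches the threshold."""
    kept = []
    for text, votes in items:
        label, agreement = majority_with_agreement(votes)
        if agreement >= threshold:
            kept.append((text, label))
    return kept


def weight_by_agreement(items):
    """Weighting: keep every item and use its agreement as a training weight."""
    return [(text, *majority_with_agreement(votes)) for text, votes in items]


items = [
    ("tweet a", ["OFF"] * 5),                # unanimous
    ("tweet b", ["OFF"] * 3 + ["NOT"] * 2),  # split 3-2
]
print(filter_low_agreement(items, threshold=0.8))  # [('tweet a', 'OFF')]
print(weight_by_agreement(items))  # [('tweet a', 'OFF', 1.0), ('tweet b', 'OFF', 0.6)]
```

The paper's finding is that items like "tweet b", which the filtering strategy drops, still carry useful training signal.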
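The Data Selection and Annotation slide treats each classifier in a BERT ensemble as a virtual annotator and samples tweets per agreement class. The exact vote definition is missing from the transcript, so this sketch assumes binary labels (`OFF`/`NOT`), an odd number of virtual annotators (five in the example), and classes keyed by (majority label, majority size); the classifiers' votes are taken as given, and `agreement_class`, `balanced_sample`, and `per_class` are hypothetical names.

```python
import random
from collections import Counter, defaultdict


def agreement_class(votes):
    """Return (majority label, number of virtual annotators that chose it)."""
    label, size = Counter(votes).most_common(1)[0]
    return label, size


def balanced_sample(tweet_votes, per_class, seed=0):
    """Group tweets by (majority label, majority size) and sample up to
    per_class tweets from every group.

    tweet_votes: dict mapping tweet id -> labels predicted by each ensemble member.
    """
    groups = defaultdict(list)
    for tweet_id, votes in tweet_votes.items():
        groups[agreement_class(votes)].append(tweet_id)

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = {}
    for key in sorted(groups):
        ids = sorted(groups[key])
        rng.shuffle(ids)
        sample[key] = ids[:per_class]
    return sample


# Five virtual annotators (an assumed ensemble size) voting OFF/NOT per tweet.
votes = {
    "t1": ["OFF"] * 5,
    "t2": ["OFF"] * 4 + ["NOT"],
    "t3": ["NOT", "NOT", "OFF", "NOT", "OFF"],
    "t4": ["NOT"] * 5,
    "t5": ["NOT"] * 5,
}
for key, ids in balanced_sample(votes, per_class=1).items():
    print(key, ids)
```

Keying groups by both the label and the majority size balances the sample over offensive/not offensive and agreement level at the same time; with an odd ensemble and two labels, ties cannot occur.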
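The two quality-control rules on the annotation-details slide (discard a HIT when the worker mislabels a gold item; remove every annotation by a worker below an overall accuracy threshold) could look roughly like this. The data layout, the function name `filter_annotations`, measuring accuracy only on gold items across all of a worker's HITs, and leaving gold items out of the output are all assumptions; the actual threshold is not given in the transcript.

```python
from collections import defaultdict


def filter_annotations(hits, gold, min_accuracy):
    """Apply two gold-based quality-control rules to crowdsourced HITs.

    hits: list of {"worker": worker_id, "labels": {tweet_id: label}}.
    gold: tweet_id -> gold label (tweets on which all experts agreed).
    Returns kept (worker, tweet_id, label) triples for non-gold tweets.
    """
    # Overall accuracy per worker, measured on gold items across all their HITs.
    correct = defaultdict(int)
    checked = defaultdict(int)
    for hit in hits:
        for tweet_id, label in hit["labels"].items():
            if tweet_id in gold:
                checked[hit["worker"]] += 1
                correct[hit["worker"]] += int(label == gold[tweet_id])

    kept = []
    for hit in hits:
        worker, labels = hit["worker"], hit["labels"]
        accuracy = correct[worker] / checked[worker] if checked[worker] else 0.0
        if accuracy < min_accuracy:
            continue  # drop every annotation from a low-accuracy worker
        if any(gold[t] != lab for t, lab in labels.items() if t in gold):
            continue  # a single gold mistake discards the whole HIT
        kept.extend((worker, t, lab) for t, lab in labels.items() if t not in gold)
    return kept


gold = {"g1": "OFF", "g2": "NOT"}
hits = [
    {"worker": "w1", "labels": {"g1": "OFF", "t1": "OFF"}},
    {"worker": "w1", "labels": {"g2": "NOT", "t2": "NOT"}},
    {"worker": "w2", "labels": {"g1": "NOT", "t1": "NOT"}},  # misses gold g1
    {"worker": "w2", "labels": {"g2": "NOT", "t3": "OFF"}},
]
print(filter_annotations(hits, gold, min_accuracy=0.5))
# [('w1', 't1', 'OFF'), ('w1', 't2', 'NOT'), ('w2', 't3', 'OFF')]
```

Here w2 passes the accuracy threshold (one of two gold items correct) but loses the HIT in which it missed `g1`; raising `min_accuracy` to 0.75 would remove all of w2's annotations.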