ando
January 29, 2022
Research
(Reading) Agreeing to Disagree: Annotating Offensive Language Datasets with Annotators' Disagreement
komachi lab.
Transcript
EMNLP 2021
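Overview
- A paper on inter-annotator agreement in annotation. Low agreement is generally attributed to guideline errors or the influence of spammers, and is therefore taken to mean low data quality.
- The claims are tested on offensive language detection, a highly subjective task.
- Main points:
  • In subjective tasks, low inter-annotator agreement is not always noise (annotation error).
  • Subjectivity, bias, and textual ambiguity are important and intrinsic to subjective tasks, so they should not be thrown away.
  • Collecting only high-agreement data makes a dataset too easy, and current datasets are too easy.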
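Introduction
- In crowdsourcing, crowd workers are not trained to carry out complex tasks, so agreement is generally used to measure annotation quality.
- In PoS tagging or parsing, differences between annotators tend to arise from the annotation guidelines. → These can be resolved through discussion.
- In social computing tasks (offensive language detection, etc.), full agreement among annotators should not be forced. → Do not aim for global consensus.
- Findings:
  • Even low-agreement data is useful for model training, and the paper shows that it is not random annotation.
  • In other words, removing instances, as has been done so far, is not the right strategy for resolving disagreement.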
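Related work
- Recently, a growing number of papers argue that disagreement is unavoidable because language is inherently ambiguous.
  Lora Aroyo and Chris Welty. Truth is a lie: Crowd truth and the seven myths of human annotation. AI Magazine.
- For NER, PoS tagging, etc., much prior work filters or weights low-agreement instances, or identifies unreliable annotators.
  → The aim is to raise classification accuracy on easy cases.
- Similar work:
  • Distinguishing stable opinions from noise in crowdsourced datasets.
    Mitchell L. Gordon, Kaitlyn Zhou, Kayur Patel, Tatsunori Hashimoto, and Michael S. Bernstein. The disagreement deconvolution: Bringing machine learning performance metrics in line with reality.
  • Splitting annotators into groups according to their polarization, with a different gold standard for each group.
    Sohail Akhtar, Valerio Basile, and Viviana Patti. Modeling annotator perspective and polarized opinions to improve hate speech detection.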
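Data Selection and Annotation
- Tweets: Covid-19, the US presidential election, and Black Lives Matter (BLM). Tweets containing the hashtags and specific keywords (e.g., covid, election, blm) were collected, and […] tweets ([…] per domain) were sampled at random.
- For annotation, the dataset should be balanced across classes (N, O, and agreement level).
  • Annotated datasets from prior work are used.
  • […] BERT-based classifiers are trained under different conditions and combined as an ensemble (each classifier is treated as a virtual annotator).
  • The number of majority votes is defined as follows: […]
  • For each ensemble agreement class (A++, A+, A0), […] tweets are selected, giving a total of […] tweets per domain.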
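The idea of treating each classifier as a virtual annotator and deriving an agreement class from the majority vote can be sketched as below. The slide's own vote-count definition is not recoverable from the transcript, so this sketch assumes five binary votes and a hypothetical mapping (5/5 agree: A++, 4/5: A+, 3/5: A0). The function name `agreement_class` is illustrative only.

```python
from collections import Counter

def agreement_class(votes):
    """Return (majority_label, agreement_class) for five binary votes.

    Assumption: exactly five votes over two labels, so the majority
    count is always 3, 4, or 5.
    """
    if len(votes) != 5:
        raise ValueError("this sketch assumes exactly five votes")
    if len(set(votes)) > 2:
        raise ValueError("this sketch assumes binary labels")
    label, count = Counter(votes).most_common(1)[0]
    # Assumed mapping from majority size to agreement class.
    return label, {5: "A++", 4: "A+", 3: "A0"}[count]

# "O" = offensive, "N" = not offensive (labels as used on the slide).
print(agreement_class(["O", "O", "O", "N", "N"]))
```

The same function applies unchanged to five crowd annotations per tweet, which is what makes ensemble and crowd agreement comparable.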
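Annotation details
- […] native speakers annotate the data.
- Ensuring high-quality annotation:
  • […] experts are asked to annotate, and tweets on which they agree completely become the gold standard.
  • If a crowd worker gets a gold item wrong, that HIT is discarded.
  • All annotations by workers whose overall accuracy does not reach the minimum of […] are removed.
- Total number of tweets annotated on AMT: […]
Comment: Isn't this indirectly controlling agreement and removing bias?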
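The worker-level quality filter described above can be sketched as follows. The slide's actual accuracy threshold is not recoverable, so `min_accuracy=0.7` is a placeholder, and the data layout (tuples of worker, tweet, label) and the function name are assumptions made for illustration. The per-HIT discard rule is not modelled here.

```python
def filter_annotations(annotations, gold, min_accuracy=0.7):
    """Keep only annotations from workers who are accurate on gold tweets.

    annotations: list of (worker_id, tweet_id, label) tuples.
    gold: dict mapping tweet_id to the expert gold label.
    min_accuracy: hypothetical threshold, not the value used in the paper.
    Workers who never saw a gold tweet are dropped in this sketch.
    """
    seen, correct = {}, {}
    for worker, tweet, label in annotations:
        if tweet in gold:
            seen[worker] = seen.get(worker, 0) + 1
            correct[worker] = correct.get(worker, 0) + (label == gold[tweet])
    trusted = {w for w in seen if correct[w] / seen[w] >= min_accuracy}
    return [a for a in annotations if a[0] in trusted]

annotations = [
    ("w1", "t1", "O"), ("w1", "t2", "N"),
    ("w2", "t1", "N"), ("w2", "t3", "O"),
]
kept = filter_annotations(annotations, gold={"t1": "O"})
print(kept)
```

A filter of this kind removes workers rather than instances, which is why the slide's comment asks whether it still shapes agreement indirectly.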
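Annotation results
- Crowd annotation: offensive tweets […]%, non-offensive tweets […]%. Ensemble annotation: […]% each.
- The agreement classes are uniform for the ensemble, but not for the crowd annotation.
- Pearson's correlation between ensemble agreement and crowd agreement: […]
- Pre-screening with the ensemble may make it possible to identify unambiguous tweets as well as more difficult ones.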
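For readers who want to reproduce the kind of comparison reported here, Pearson's correlation between two agreement measures can be computed as in the sketch below. How agreement is quantified is an assumption (here, the number of votes for the majority label, from 3 to 5), and the toy numbers are hypothetical, not results from the paper.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)

# Hypothetical per-tweet majority sizes (3 = weak, 5 = unanimous).
ensemble_agreement = [5, 5, 4, 3, 3, 4]
crowd_agreement = [5, 4, 4, 3, 4, 5]
print(round(pearson_r(ensemble_agreement, crowd_agreement), 3))
```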
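Phenomena that lead to annotation disagreement
- Nouns
- Generic expressions of anger that do not target a specific person or group
- Questions
→ The disagreement stems not from poor annotation but from differences in how the tweet is interpreted.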
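Impact on classification
- Train and test data
  • Training on all three of A++, A+, and A0 works best for every test set.
  • Low-agreement data does, as expected, lower accuracy.

Does low-agreement data help improve accuracy?
- A0 rand is A0 with its labels replaced by random ones.
  • A0 performs better than A0 rand, so the A0 labels were not assigned by chance. They are difficult, but they carry a signal that is useful to the classifier.
  • Removing instances leads to a loss of information, so it is not a good idea.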
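Investigation of existing datasets
- The dataset used in the shared task (OffensEval) was re-annotated on MTurk.
  • In the existing task, most systems reach an F-score above […] on the binary classification task.
  • The scores are high because there are few difficult instances.
  → Should the number of difficult instances be increased, for example by oversampling? (authors)
- However, the dataset's distribution matches that of large-scale data, so it is natural.
Comment: Which is preferable, keeping the task's natural distribution or oversampling?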
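The oversampling option raised by the authors can be illustrated with a minimal sketch. The record layout (tweet id, agreement class), the `is_hard` predicate, and the duplication factor are all assumptions for illustration. The transcript does not say how, or whether, the authors oversampled.

```python
import random

def oversample(instances, is_hard, factor=2, seed=0):
    """Return a shuffled list in which hard instances appear `factor` times.

    instances: any list of training records.
    is_hard: predicate marking the instances to duplicate (hypothetical).
    factor: how many copies of each hard instance to keep (>= 1).
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    result = []
    for instance in instances:
        copies = factor if is_hard(instance) else 1
        result.extend([instance] * copies)
    random.Random(seed).shuffle(result)  # fixed seed for reproducibility
    return result

data = [("t1", "A++"), ("t2", "A0"), ("t3", "A+")]
train = oversample(data, is_hard=lambda rec: rec[1] == "A0", factor=3)
print(len(train))
```

Duplicating low-agreement tweets changes the class distribution away from the natural one, which is exactly the trade-off the slide's comment raises.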
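Summary
- Low-agreement data lowers accuracy, but lowering accuracy does not make it noise.
- If we want to bring diversity into the data and reduce both the exclusion of minority voices and demographic errors (Hovy and Spruit), differences of opinion should be seen as signal, not noise.
- Impressions:
  • In the end, the low-agreement-rate instances in this study seem to contain both noise and stable opinions.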