(Reading) Agreeing to Disagree: Annotating Offensive Language Datasets with Annotators’ Disagreement
komachi lab.
ando
January 29, 2022
Transcript
EMNLP 2021
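Slide 2: Overview
- A paper on inter-annotator agreement in annotation. Low agreement is generally read as a sign of flawed guidelines or spammers, and therefore of low data quality.
- The paper tests this view on offensive language detection, a highly subjective task.
- Main claims:
  • In subjective tasks, low inter-annotator agreement is not always noise (annotation errors).
  • Subjectivity, bias, and textual ambiguity are important and intrinsic to subjective tasks, so they should not be discarded.
  • Collecting only high-agreement data makes a dataset too easy, and current datasets are too easy.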
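Slide 3: Introduction
- In crowdsourcing, crowd workers are not trained to carry out complex tasks, so annotation quality is usually measured through agreement.
- In PoS tagging and parsing, differences between annotators tend to come from the annotation guidelines and can be resolved through discussion.
- In social computing tasks (such as offensive language detection), full agreement among annotators should not be forced. The aim is not a global consensus.
- Findings: low-agreement instances are useful for model training and are not random annotations. Removing such instances, as has been common practice, is therefore not the right strategy for resolving disagreement.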
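Slide 4: Related Work
- A growing number of recent papers argue that disagreement is unavoidable because language is inherently ambiguous.
  Lora Aroyo and Chris Welty. Truth is a lie: Crowd truth and the seven myths of human annotation. AI Magazine.
- For NER, PoS tagging, and similar tasks, much prior work filters or reweights low-agreement instances or identifies unreliable annotators.
  → The aim there is to raise classification accuracy on the easy cases.
- Similar work:
  • Separating stable opinions from noise in crowdsourced datasets.
    Mitchell L. Gordon, Kaitlyn Zhou, Kayur Patel, Tatsunori Hashimoto, and Michael S. Bernstein. The disagreement deconvolution: Bringing machine learning performance metrics in line with reality.
  • Dividing annotators into groups according to their polarization and building a separate gold standard for each group.
    Sohail Akhtar, Valerio Basile, and Viviana Patti. Modeling annotator perspective and polarized opinions to improve hate speech detection.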
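Slide 5: Data Selection and Annotation
- Tweets: tweets containing hashtags and specific keywords on Covid, the US presidential election, and Black Lives Matter (BLM) were collected (e.g. covid, election, blm). Tens of thousands of these were then sampled at random, with the same quota for each domain.
- The goal is a dataset for annotation that is balanced across classes (N/O and agreement level).
  • Datasets annotated in prior work are used as training data.
  • Several BERT-based classifiers are trained under different conditions and combined as an ensemble, with each classifier treated as a virtual annotator.
  • The ensemble's majority vote defines the agreement classes.
  • For each ensemble agreement class (A++, A+, A0), a fixed number of tweets is selected, giving a fixed total of tweets per domain.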
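A minimal Python sketch of how a majority vote over the ensemble could be turned into an agreement class. The mapping is an assumption rather than something stated in the transcript: a unanimous vote gives A++, a single dissenting vote gives A+, and any weaker majority gives A0. The five-vote examples, the labels "N" and "O", and the function name `agreement_level` are illustrative.

```python
from collections import Counter


def agreement_level(votes):
    """Return (majority_label, agreement_class) for an odd number of binary votes."""
    if len(votes) % 2 == 0:
        raise ValueError("an odd number of votes is needed to guarantee a majority")
    label, count = Counter(votes).most_common(1)[0]
    dissent = len(votes) - count
    # Assumed naming: unanimous -> A++, one dissenting vote -> A+, weaker majority -> A0.
    if dissent == 0:
        level = "A++"
    elif dissent == 1:
        level = "A+"
    else:
        level = "A0"
    return label, level


# Hypothetical votes from five virtual annotators (classifiers).
for votes in (["O"] * 5, ["N", "N", "N", "N", "O"], ["O", "O", "O", "N", "N"]):
    print(votes, agreement_level(votes))
```

Sampling the same number of tweets from each returned class is then what balances the data by agreement level.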
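Slide 6: Annotation Details
- Each tweet is annotated by several native speakers.
- Measures taken to obtain high-quality annotation:
  • Experts were asked to annotate tweets, and the tweets on which they fully agreed serve as the gold standard.
  • If a crowd worker gets a gold item wrong, that HIT is discarded.
  • All annotations by workers whose overall accuracy stayed below a minimum threshold are removed.
- The slide also gives the total number of tweets annotated on AMT.
- Presenter's comment: doesn't this indirectly control agreement and remove bias?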
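The two quality-control rules on this slide can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' pipeline: the HIT record layout, the function name `filter_annotations`, the 0.7 threshold in the example, and the choice to keep workers who never saw a gold item are all hypothetical.

```python
from collections import defaultdict


def filter_annotations(hits, gold, min_accuracy):
    """Drop HITs with a wrong gold answer, then drop every HIT of low-accuracy workers."""
    correct = defaultdict(int)
    seen = defaultdict(int)
    passed = []
    for hit in hits:
        hit_ok = True
        for tweet_id, label in hit["answers"].items():
            if tweet_id in gold:
                seen[hit["worker"]] += 1
                if label == gold[tweet_id]:
                    correct[hit["worker"]] += 1
                else:
                    hit_ok = False  # rule 1: a wrong gold answer invalidates the HIT
        if hit_ok:
            passed.append(hit)

    def accuracy(worker):
        # Assumption: a worker who never saw a gold item is kept.
        return correct[worker] / seen[worker] if seen[worker] else 1.0

    # Rule 2: remove everything from workers below the accuracy threshold.
    return [hit for hit in passed if accuracy(hit["worker"]) >= min_accuracy]


# Hypothetical data: g1 and g2 are gold tweets, t1..t3 are ordinary tweets.
gold = {"g1": "O", "g2": "N"}
hits = [
    {"worker": "w1", "answers": {"t1": "O", "g1": "O"}},
    {"worker": "w1", "answers": {"t2": "N", "g2": "N"}},
    {"worker": "w2", "answers": {"t1": "N", "g1": "N"}},
    {"worker": "w2", "answers": {"t3": "O", "g2": "N"}},
]
kept = filter_annotations(hits, gold, min_accuracy=0.7)
print([hit["worker"] for hit in kept])
```

In this example worker w2 answers one of two gold items correctly, so both of w2's HITs are removed even though the second one passed the per-HIT check.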
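Slide 7: Annotation Results
- The slide reports the share of offensive and non-offensive tweets under crowd annotation, together with the corresponding shares under ensemble annotation.
- The ensemble's agreement classes are uniformly distributed, but the crowd's are not.
- The slide reports the Pearson correlation between ensemble agreement and crowd agreement.
- Pre-screening with the ensemble may make it possible to identify tweets that are unambiguous or that are harder.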
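For reference, a Pearson correlation between two agreement scores can be computed as in the following Python sketch. Encoding agreement as the size of the majority (3, 4, or 5 out of five votes) is an assumption made for illustration, and the two lists are made-up values, not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least two items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical majority sizes per tweet: ensemble votes vs. crowd votes.
ensemble_agreement = [5, 5, 4, 3, 3, 4]
crowd_agreement = [5, 4, 4, 3, 5, 3]
r = pearson(ensemble_agreement, crowd_agreement)
print(round(r, 3))
```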
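Slide 8: Phenomena That Lead to Annotation Disagreement
- Certain nouns
- General expressions of anger that do not target a specific person or group
- Questions
→ The disagreement stems from differing interpretations of the tweet, not from poor annotation.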
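Slide 9: Effect on Classification
- Train / test data
  • Training on the two higher-agreement classes (A++ and A+) gives the best results on every test set.
  • Low-agreement data does, as expected, lower accuracy.

Slide 10: Do Low-Agreement Instances Help Accuracy?
- A0rand is A0 with its labels replaced by random ones.
  • A0 performs better than A0rand, so the A0 labels were not assigned by chance. They are difficult, but they carry a signal that is useful to the classifier.
  • Removing instances is undesirable because it loses information.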
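Slide 11: Analysis of Existing Datasets
- The dataset used in the OffensEval shared task was re-annotated on MTurk.
  • On the existing task, most systems reach a high F1 score on binary classification.
  • The scores are high because the data contains few difficult instances. → Should the number of difficult instances be increased, for example by oversampling? (the authors)
- However, the dataset has the same distribution as large-scale data, so it is natural.
- Presenter's question: which is preferable, keeping the task's natural distribution or oversampling?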
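Slide 12: Summary
- Low-agreement data lowers accuracy, but lowering accuracy does not make it noise.
- If the goal is to bring diversity into the data and to reduce both the exclusion of minority voices and demographic misrepresentation (Hovy and Spruit), disagreement should be seen as signal rather than noise.
- Presenter's impression:
  • The low-agreement instances in this study probably still contain both noise and stable opinions.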