Slide 1

The Myth of Higher-Order Inference in Coreference Resolution
Liyan Xu, Jinho Choi
Emory University

Slide 2

Introduction
• PyTorch implementation of an end-to-end coreference resolution model
• Based on [Lee et al.’17], [Joshi et al.’20]
• Four higher-order inference (HOI) methods
  • Two previous methods from [Lee et al.’18], [Kantor and Globerson’19]
  • Two new methods
• Empirical effectiveness of the four HOI methods

Slide 3

Motivation
• Recent state-of-the-art performance on the CoNLL 2012 shared task
• How much of the gain comes from the encoder vs. HOI?

Slide 4

Approach: Overview
• Local-decision coreference model: c2f-coref by [Lee et al.’18]
• Four HOI methods on top:
  • Span refinement:
    • Attended Antecedent (AA): [Lee et al.’18]
    • Entity Equalization (EE): [Kantor and Globerson’19]
    • Span Clustering (SC)
  • Cluster Merging (CM): inspired by [Wiseman et al.’16]

Slide 5

Approach: End-to-End Coreference Model
• Mention-linking process
• Local decisions between two spans
• Learn a distribution over each span’s antecedents (sketched below)
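A minimal sketch of this local decision, assuming a precomputed pairwise score matrix; the names `antecedent_distribution` and `pairwise_scores` are illustrative, not from the released code. Each span’s distribution is a softmax over its candidate antecedent scores plus a fixed zero score for the dummy “no antecedent”:

```python
import torch
import torch.nn.functional as F

def antecedent_distribution(pairwise_scores: torch.Tensor) -> torch.Tensor:
    """Local mention linking: for each span i, a distribution over its
    candidate antecedents j < i plus a dummy antecedent (no link).
    pairwise_scores: [num_spans, num_candidates], coreference scores s(i, j).
    """
    dummy = pairwise_scores.new_zeros(pairwise_scores.size(0), 1)  # s(i, eps) = 0
    return F.softmax(torch.cat([dummy, pairwise_scores], dim=1), dim=1)
```

Training then maximizes the marginal likelihood of the gold antecedents under this distribution, as in [Lee et al.’17].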

Slide 6

Approach: Span Refinement
• Enrich span representations using the predicted antecedent distribution
• AA / EE / SC: different ways of computing the enrichment
• Re-rank antecedents with the refined representations (see the gating sketch below)
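All three refinement methods plug into the same gated interpolation from [Lee et al.’18]; only the enrichment vector a differs. A sketch, where `gate` is an assumed module name:

```python
import torch
import torch.nn as nn

def gated_refine(g: torch.Tensor, a: torch.Tensor, gate: nn.Linear) -> torch.Tensor:
    """Refinement template shared by AA / EE / SC: interpolate the current
    span representation g with an enrichment vector a through a learned
    per-dimension gate; antecedents are then re-scored with g'.
    g, a: [num_spans, emb]; gate: nn.Linear(2 * emb, emb).
    """
    f = torch.sigmoid(gate(torch.cat([g, a], dim=-1)))  # gate f_i in (0, 1)^emb
    return f * g + (1 - f) * a                          # g'_i = f_i*g_i + (1-f_i)*a_i
```

The following slides only change how a is computed; re-ranking reuses the same pairwise scorer on the refined spans.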

Slide 7

Approach: Attended Antecedent (AA)
• [Lee et al.’18]
• Attended antecedents over the antecedent distribution (sketch below)
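A sketch of the AA enrichment: the expected antecedent representation under the current distribution. Here the dummy mass is assumed to attend back to the span itself, so unlinked spans keep their original representation; tensor names are illustrative.

```python
import torch

def attended_antecedent(g: torch.Tensor, p: torch.Tensor,
                        antecedent_ids: torch.Tensor) -> torch.Tensor:
    """AA enrichment [Lee et al.'18]: expected antecedent representation.
    g: [num_spans, emb]; antecedent_ids: [num_spans, cand] indices into g;
    p: [num_spans, 1 + cand] antecedent distribution, column 0 = dummy.
    """
    cands = torch.cat([g.unsqueeze(1), g[antecedent_ids]], dim=1)  # [n, 1+cand, emb]
    return torch.einsum('nc,nce->ne', p, cands)                    # a_i = E_p[g_y]
```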

Slide 8

Approach: Entity Equalization (EE)
• [Kantor and Globerson’19]
• “Soft” entity from antecedents
• Attended entity representation over the entity distribution (simplified sketch below)
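A simplified sketch of the idea (the exact recursion in [Kantor and Globerson’19] differs in detail): antecedent probabilities are propagated into a soft cluster-membership matrix Q, and each span is enriched with its attended “soft” entity.

```python
import torch

def entity_equalization(g: torch.Tensor, p_ant: torch.Tensor) -> torch.Tensor:
    """EE enrichment, simplified sketch.
    p_ant: [n, n] where p_ant[i, j] = P(antecedent of span i is span j)
    for j < i, and p_ant[i, i] = P(dummy), i.e. span i starts a new entity.
    Q[i, e] is the soft probability that span i belongs to the entity
    opened by span e.
    """
    n = g.size(0)
    Q = torch.zeros(n, n)
    for i in range(n):
        Q[i, i] = p_ant[i, i]                 # opens its own entity
        for j in range(i):                    # inherit antecedent j's membership
            Q[i] += p_ant[i, j] * Q[j]
    weight = Q.sum(dim=0, keepdim=True).clamp(min=1e-6)
    entity_emb = (Q.t() @ g) / weight.t()     # soft average of member spans
    return Q @ entity_emb                     # attended entity per span
```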

Slide 9

Approach: Span Clustering (SC)
• New span refinement method (this work)
• Actual predicted entities decoded from the antecedent distribution, instead of “soft” ones
• Entity representation: attention over the spans in each cluster (sketch below)
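A sketch of SC under stated assumptions: hard clusters are decoded by taking each span’s argmax antecedent (column 0 of p is the dummy), then each entity is represented by attending over the spans in its predicted cluster. `w_attn` is an assumed learned scoring vector.

```python
import torch
import torch.nn.functional as F

def span_clustering(g: torch.Tensor, p: torch.Tensor,
                    antecedent_ids: torch.Tensor, w_attn: torch.Tensor) -> torch.Tensor:
    """SC enrichment, sketched. g: [n, emb]; p: [n, 1 + cand];
    antecedent_ids: [n, cand]; w_attn: [emb]."""
    n = g.size(0)
    cluster_of = list(range(n))                        # singleton clusters
    best = p.argmax(dim=1)
    for i in range(n):
        if best[i] > 0:                                # link to top antecedent
            cluster_of[i] = cluster_of[antecedent_ids[i, best[i] - 1].item()]
    a = torch.empty_like(g)
    for c in set(cluster_of):
        members = [i for i in range(n) if cluster_of[i] == c]
        emb = g[members]                               # [m, emb]
        attn = F.softmax(emb @ w_attn, dim=0)          # attention over spans
        a[members] = (attn.unsqueeze(1) * emb).sum(0)  # shared entity vector
    return a
```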

Slide 10

Approach: Cluster Merging (CM)
• Build up and maintain entity representations through antecedent ranking
• Configuration: ranking order (sequential vs. easy-first)
• Configuration: cluster merging reduction (max vs. average pooling)
• Ranking score: antecedent score + cluster matching score (sketch below)
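A sketch of CM, inspired by [Wiseman et al.’16]: clusters are maintained while spans are ranked, and each candidate is scored by its antecedent score plus a cluster matching score against the candidate’s current cluster. `cluster_score(span_emb, cluster_emb) -> scalar tensor` is an assumed learned module; only the sequential ranking order is shown (easy-first is the alternative).

```python
import torch

def cluster_merging(g, s_ant, antecedent_ids, cluster_score, reduction="max"):
    """CM sketch. g: [n, emb]; s_ant: [n, cand]; antecedent_ids: [n, cand]."""
    n = g.size(0)
    cluster_emb = g.clone()                   # each span opens its own cluster
    cluster_of = list(range(n))
    for i in range(n):                        # sequential ranking order
        cands = antecedent_ids[i].tolist()
        match = torch.stack([cluster_score(g[i], cluster_emb[cluster_of[j]])
                             for j in cands])
        total = s_ant[i] + match              # antecedent + cluster matching
        best = int(total.argmax())
        if total[best] > 0:                   # dummy antecedent has fixed score 0
            c = cluster_of[cands[best]]
            cluster_of[i] = c
            if reduction == "max":            # merging reduction: max pooling
                cluster_emb[c] = torch.max(cluster_emb[c], g[i])
            else:                             # or average pooling
                cluster_emb[c] = (cluster_emb[c] + g[i]) / 2
    return cluster_of, cluster_emb
```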

Slide 11

Experiments
• Transformer-based encoders: BERT and SpanBERT
• Evaluation: CoNLL 2012 shared task

Slide 12

Results

                    Avg. F1   Avg. F1 (mean ± std)
Joshi et al.’19     76.9      -
Joshi et al.’20     79.6      -
BERT (Local)        77.4      77.3 (±0.1)
SpanBERT (Local)    79.9      79.7 (±0.1)
+ AA                79.7      79.4 (±0.2)
+ EE                79.4      78.9 (±0.4)
+ SC                79.7      79.2 (±0.3)
+ CM                80.2      79.9 (±0.2)

Slide 13

Analysis: Direct Impact
• Turn off HOI at evaluation
• Performance drop on the test set: trivial

     F1 drop
AA   -0.02 (±0.06)
EE    0.03 (±0.07)
SC    0.11 (±0.10)
CM    0.04 (±0.04)

Slide 14

Analysis: Coreferent Links
• Examine the change of link correctness on the test set (W: wrong; C: correct)
• HOI effects are two-sided

        W→C            C→W
+ AA    240.8 (1.3%)   241.2 (1.3%)
+ EE    244.1 (1.3%)   245.3 (1.3%)
+ SC    248.2 (1.3%)   262.0 (1.4%)
+ CM    226.4 (1.2%)   235.0 (1.2%)

Slide 15

Analysis: Pronoun Resolution
• Examine coreferent links on pronouns w.r.t. plurality (S: singular; P: plural)

                    S→P   P→S
BERT (Local)        2.3   6.5
SpanBERT (Local)    2.8   6.6
+ AA                1.8   8.8
+ EE                1.8   5.5
+ SC                3.8   7.2
+ CM                3.0   6.6

Slide 16

Analysis: Ambiguous Pronouns
• Long-standing HOI motivation: contamination from ambiguous pronouns
  • e.g., (he, you) and (you, they) → (you, he, they)
• Number of clusters containing ambiguous pronouns: trivial difference

                    # Clusters
BERT (Local)        48.8 (3.5%)
SpanBERT (Local)    43.8 (2.7%)
+ AA                44.8 (2.4%)
+ EE                44.0 (2.5%)
+ SC                45.4 (3.0%)
+ CM                43.8 (2.6%)

Slide 17

Summary
• Current HOI provides marginal benefits over local decisions.
• HOI depends on the quality of the first-round antecedent ranking.
• HOI acts as an implicit regularization (mutually dependent with the local ranking).