Slide 1

Entity-Centric Coreference Resolution with Model Stacking
Kevin Clark and Christopher D. Manning (ACL-IJCNLP 2015)
(Tables are taken from the above-mentioned paper)

Presented by Mamoru Komachi
ACL 2015 Reading Group @ Tokyo Institute of Technology
August 26th, 2015

Slide 2

Entity-level information allows early coreference decisions to inform later ones

- Entity-centric coreference systems build up coreference clusters incrementally (Raghunathan et al., 2010; Stoyanov and Eisner, 2012; Ma et al., 2014)

Example: "Hillary Clinton files for divorce from Bill Clinton ahead of her campaign for presidency for 2016. ... Clinton is confident that her poll numbers will skyrocket once the divorce is final." (Resolving "her" to Hillary Clinton early helps decide whether the later, ambiguous "Clinton" refers to Hillary or Bill.)

Slide 3

Problem: How to build up clusters effectively?

- Model stacking
  - Two mention-pair models: a classification model and a ranking model
  - Their scores generate cluster features for clusters of mentions
- Imitation learning
  - Assigns exact costs to actions based on coreference evaluation metrics
  - Uses the scores of the pairwise models to reduce the search space

Slide 4

Mention Pair Models
Previous approach, using local information

Slide 5

Two models for predicting whether a given pair of mentions belongs to the same coreference cluster

- Classification model: are the two mentions coreferent?
- Ranking model: which candidate antecedent best suits the mention?

Example: "Bill arrived, but nobody saw him. I talked to him on the phone."

Slide 6

Logistic classifiers for the classification model

- M: set of all mentions in the training set
- T(m): set of true antecedents of a mention m
- F(m): set of false antecedents of m
- Considers each pair of mentions independently
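The definitions above suggest the standard binary logistic loss over antecedent candidates: each true pair is pushed toward probability 1 and each false pair toward 0. Below is a minimal Python sketch under that assumption; theta, features, and the sparse dot product are placeholders rather than the paper's implementation.

```python
import math

def dot(theta, feats):
    # sparse dot product; theta and feats are {feature_name: value} dicts
    return sum(theta.get(k, 0.0) * v for k, v in feats.items())

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def classification_loss(theta, M, T, F, features):
    """Negative log-likelihood summed over all labeled mention pairs:
    -log p(t, m) for true antecedents t in T(m), and
    -log(1 - p(f, m)) for false antecedents f in F(m)."""
    loss = 0.0
    for m in M:
        for t in T[m]:
            loss -= math.log(sigmoid(dot(theta, features(t, m))))
        for f in F[m]:
            loss -= math.log(1.0 - sigmoid(dot(theta, features(f, m))))
    return loss
```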

Slide 7

Logistic classifiers for the ranking model

- Considers all candidate antecedents simultaneously
- Max-margin training encourages the model to find the single best antecedent for a mention, but its scores are not robust as input to a downstream clustering model
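Since the slide mentions max-margin training, here is one common margin-based ranking objective as a sketch; the paper's exact loss may differ, and dot() and features are the same placeholders as in the previous sketch.

```python
def dot(theta, feats):
    # sparse dot product, as in the classification sketch
    return sum(theta.get(k, 0.0) * v for k, v in feats.items())

def ranking_loss(theta, M, T, candidates, features, margin=1.0):
    """Margin-based ranking: for each mention, the best-scoring true
    antecedent should outscore every false candidate by at least `margin`."""
    loss = 0.0
    for m in M:
        if not T[m]:
            continue  # skip mentions with no true antecedent
        best_true = max(dot(theta, features(t, m)) for t in T[m])
        for c in candidates[m]:
            if c in T[m]:
                continue
            loss += max(0.0, margin + dot(theta, features(c, m)) - best_true)
    return loss
```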

Slide 8

Features for the mention-pair model

- Distance features: the distance between the two mentions in sentences or in number of mentions
- Syntactic features: number of embedded NPs under a mention; POS tags of the first, last, and head word
- Semantic features: named entity type, speaker identification
- Rule-based features: exact and partial string matching
- Lexical features: the first, last, and head word of the current mention
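To make the feature families concrete, here is a toy extractor over hypothetical mention dicts; the keys and the distance capping are illustrative, not the paper's exact feature set.

```python
def pair_features(ant, m):
    """Toy subset of the feature families above. `ant` and `m` are
    hypothetical mention dicts; real systems use many more features."""
    f = {}
    f["sent_dist=%d" % min(m["sent"] - ant["sent"], 10)] = 1.0       # distance
    f["mention_dist=%d" % min(m["idx"] - ant["idx"], 10)] = 1.0
    f["head_pos=" + m["head_pos"]] = 1.0                              # syntactic
    f["ne_pair=%s_%s" % (ant["ne"], m["ne"])] = 1.0                   # semantic
    f["same_speaker"] = float(ant.get("speaker") == m.get("speaker"))
    f["exact_match"] = float(ant["text"].lower() == m["text"].lower())  # rule-based
    f["head_word=" + m["head"].lower()] = 1.0                         # lexical
    return f
```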

Slide 9

Entity-Centric Coreference Model
Proposed approach, using cluster features

Slide 10

The entity-centric model can exhibit high coherency

- Best-first clustering (Ng and Cardie, 2002)
  - Assigns as a mention's antecedent the most probable preceding mention classified as coreferent with it
  - Relies only on local information
- Entity-centric model (this work)
  - Operates over pairs of clusters instead of pairs of mentions
  - Builds up coreference chains with agglomerative clustering, merging two clusters when it predicts they represent the same entity
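For contrast with the cluster-level approach, a minimal sketch of the best-first baseline, assuming a pair_prob(a, m) score from a trained mention-pair model:

```python
def best_first_clustering(mentions, pair_prob, threshold=0.5):
    """Best-first clustering (Ng and Cardie, 2002): each mention links to
    its most probable preceding antecedent, if any clears the threshold."""
    cluster_of = {}                      # mention -> shared cluster set
    for i, m in enumerate(mentions):
        best, best_p = None, threshold
        for a in mentions[:i]:
            p = pair_prob(a, m)
            if p > best_p:
                best, best_p = a, p
        if best is None:
            cluster_of[m] = {m}          # start a new singleton cluster
        else:
            cluster_of[best].add(m)      # join the antecedent's cluster
            cluster_of[m] = cluster_of[best]
    # deduplicate the shared set objects
    return list({id(c): c for c in cluster_of.values()}.values())
```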

Slide 11

Inference

- Reduces the search space by pruning candidate pairs whose mention-pair score falls below a threshold
- Sorts the candidate pairs P to perform easy-first clustering (most confident merges first)
- A scoring function s makes the binary decision for each merge action
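Putting the three bullets together, a schematic of the inference loop; merge_score stands in for the learned scoring function s, pair_prob for the pairwise model, and the threshold value is illustrative:

```python
def entity_centric_inference(mentions, pair_prob, merge_score, threshold=0.3):
    """Schematic easy-first agglomerative inference: candidate mention pairs
    are pruned with the pairwise model, sorted by confidence, and the
    cluster scorer approves or rejects each merge."""
    cluster_of = {m: frozenset([m]) for m in mentions}
    # P: candidate pairs surviving the threshold, sorted easy-first
    P = [(a, m) for i, m in enumerate(mentions) for a in mentions[:i]
         if pair_prob(a, m) > threshold]
    P.sort(key=lambda pair: pair_prob(*pair), reverse=True)
    for a, m in P:
        ca, cm = cluster_of[a], cluster_of[m]
        if ca is not cm and merge_score(ca, cm) > 0:   # binary merge decision
            merged = ca | cm
            for x in merged:
                cluster_of[x] = merged
    return set(cluster_of.values())
```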

Slide 12

Learning the entity-centric model with imitation learning

- Sequential prediction problem: future observations depend on previous actions
- Imitation learning (here, DAgger; Ross et al., 2011) is well suited to this setting (Argall et al., 2009)
- Training the agent on the gold labels alone assumes all previous decisions were correct; this is problematic in coreference, where the error rate is quite high
- DAgger exposes the system at training time to states similar to the ones it will face at test time

Slide 13

Learning the cluster-merging policy with DAgger (Ross et al., 2011)

- Iterative algorithm that aggregates a dataset D of states and the actions the expert policy takes in those states
- β controls the probability of following the expert policy rather than the current policy (it decays exponentially as the iteration number increases)
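A schematic DAgger loop under these assumptions; the environment interface, training routine, and decay schedule are placeholders, not the paper's exact setup:

```python
import random

def dagger(expert, env, train, iterations=8, decay=0.5):
    """Schematic DAgger (Ross et al., 2011): roll out a mixture of the
    expert and the learned policy, label every visited state with the
    expert's action, aggregate, and retrain."""
    D = []                       # aggregated (state, expert_action) dataset
    policy = expert
    for i in range(iterations):
        beta = decay ** i        # probability of following the expert
        for state in env.initial_states():
            while not env.is_terminal(state):
                exp_a = expert(state)
                D.append((state, exp_a))                       # expert label
                act = exp_a if random.random() < beta else policy(state)
                state = env.step(state, act)                   # mixture rollout
        policy = train(D)        # retrain the classifier on all of D
    return policy
```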

Slide 14

Adding costs to actions: directly tuning to optimize coreference metrics

- Merging clusters influences the score (the order of merge operations also matters)
- How will a particular local decision affect the final score of the coreference system?
- Problem: standard coreference metrics do not decompose over clusters
- Answer: roll out the remaining actions from the current state
- A(s): set of actions that can be taken from state s
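A sketch of the rollout idea, assuming a policy that can complete the clustering from any state and a final_metric such as B^3 F1; all names are illustrative:

```python
def action_costs(state, A, rollout_policy, final_metric):
    """Cost an action by rolling out to a finished clustering and measuring
    how much worse the final metric ends up relative to the best available
    action. A(s) is the action set from the slide."""
    final_scores = {}
    for a in A(state):
        end_state = rollout_policy(state.apply(a))   # complete the clustering
        final_scores[a] = final_metric(end_state)
    best = max(final_scores.values())
    return {a: best - s for a, s in final_scores.items()}  # best action costs 0
```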

Slide 15

Cluster features built from the classification and ranking models

- Between-cluster features:
  - Minimum and maximum probability of coreference
  - Average probability and average log probability of coreference
  - Average probability and log probability of coreference for a particular pair of grammatical types of mentions (pronoun or not)
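These pooled statistics are easy to express in code; a sketch with illustrative feature keys and an assumed is_pronoun attribute on mentions:

```python
import math

def cluster_pair_features(c1, c2, pair_prob):
    """Between-cluster features pooled from a pairwise model's
    probabilities, as listed above. The slide indicates these are
    computed for both the classification and ranking models."""
    ps = [pair_prob(m1, m2) for m1 in c1 for m2 in c2]
    f = {
        "min_p": min(ps),
        "max_p": max(ps),
        "avg_p": sum(ps) / len(ps),
        "avg_log_p": sum(math.log(p) for p in ps) / len(ps),
    }
    # the same averages restricted to one grammatical-type pair, e.g. pron-pron
    pron = [pair_prob(m1, m2) for m1 in c1 for m2 in c2
            if m1.is_pronoun and m2.is_pronoun]
    if pron:
        f["avg_p_pron_pron"] = sum(pron) / len(pron)
        f["avg_log_p_pron_pron"] = sum(math.log(p) for p in pron) / len(pron)
    return f
```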

Slide 16

Only 56 features for the entity-centric model

- State features:
  - Whether a preceding mention pair in the list of mention pairs has the same candidate anaphor as the current one
  - The index of the current mention pair in the list divided by the size of the list (what percentage of the list have we seen so far?)
  - ...
- The entity-centric model does not rely on sparse lexical features; instead, it employs model stacking to exploit strong features (scores learned by the pairwise models)
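A toy rendering of the two state features spelled out above, assuming the clusterer walks a list of (antecedent, anaphor) pairs:

```python
def state_features(i, pairs):
    """The two state features described above, over a list of
    (antecedent, anaphor) pairs. Representation is illustrative."""
    anaphor = pairs[i][1]
    seen_same_anaphor = any(p[1] == anaphor for p in pairs[:i])
    progress = i / len(pairs)   # fraction of the pair list already processed
    return {"seen_same_anaphor": float(seen_same_anaphor),
            "progress": progress}
```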

Slide 17

Results and Discussion
CoNLL 2012 English coreference task

Slide 18

Experimental setup: CoNLL 2012 Shared Task

- English portion of OntoNotes
  - Training: 2,802 documents; development: 343; test: 345
  - Uses the provided preprocessing (parse trees, named entities, etc.)
- Common evaluation metrics
  - MUC, B^3, CEAF_E
  - CoNLL F1 (the unweighted average of the three metrics' F1 scores)
  - CoNLL scorer version 8.01
- Rule-based mention detection (Raghunathan et al., 2010)
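For reference, CoNLL F1 is simply the unweighted mean of the three metrics' F1 scores; the numbers in the usage line are made up for illustration, not results from the paper:

```python
def conll_f1(muc_f1, b3_f1, ceafe_f1):
    """CoNLL F1: unweighted mean of MUC, B^3, and CEAF_E F1 (all in [0, 1])."""
    return (muc_f1 + b3_f1 + ceafe_f1) / 3.0

print(conll_f1(0.72, 0.60, 0.56))   # 0.6266..., illustrative numbers only
```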

Slide 19

Results: the entity-centric model outperforms best-first clustering with both the classification and ranking models

Slide 20

The entity-centric model beats other state-of-the-art coreference models

- This work primarily optimizes for the B^3 metric during training
- State-of-the-art systems use latent antecedents to learn scoring functions over mention pairs, but are trained to maximize global objective functions

Slide 21

The entity-centric model directly learns a coreference model that maximizes an evaluation metric

- Post-processing of mention-pair and ranking models
  - Closest-first clustering (Soon et al., 2001)
  - Best-first clustering (Ng and Cardie, 2002)
- Global inference models
  - Global inference with integer linear programming (Denis and Baldridge, 2007; Finkel and Manning, 2008)
  - Graph partitioning (McCallum and Wellner, 2005; Nicolae and Nicolae, 2006)
  - Correlational clustering (McCallum and Wellner, 2003; Finley and Joachims, 2005)

Slide 22

Previous approaches do not directly tune against coreference metrics

- Non-local entity-level information
  - Cluster models (Luo et al., 2004; Yang et al., 2008; Rahman and Ng, 2011)
  - Joint inference (McCallum and Wellner, 2003; Culotta et al., 2006; Poon and Domingos, 2008; Haghighi and Klein, 2010)
- Learning trajectories of decisions
  - Imitation learning (Daumé et al., 2005; Ma et al., 2014)
  - Structured perceptron (Stoyanov and Eisner, 2012; Fernandes et al., 2012; Bjoerkelund and Kuhn, 2014)

Slide 23

Summary

- Proposed an entity-centric coreference model that uses the scores produced by mention-pair models as features
- Costs for cluster-merging actions are derived from standard coreference metrics
- Imitation learning can be used to learn how to build up coreference chains incrementally
- The proposed model outperforms the commonly used best-first method and the current state-of-the-art