Entity-Centric Coreference Resolution with Model Stacking
Kevin Clark and Christopher D. Manning (ACL-IJCNLP 2015)
(Tables are taken from the above-mentioned paper)
Presented by Mamoru Komachi <[email protected]>
ACL 2015 Reading Group @ Tokyo Institute of Technology, August 26th, 2015
- Entity-centric coreference systems build up coreference clusters incrementally (Raghunathan et al., 2010; Stoyanov and Eisner, 2012; Ma et al., 2014)
  Example: "Hillary Clinton files for divorce from Bill Clinton ahead of her campaign for the 2016 presidency. … Clinton is confident that her poll numbers will skyrocket once the divorce is final." (Does the second "Clinton" refer to Hillary or to Bill?)
- Two mention pair models: a classification model and a ranking model
- Generates cluster features for clusters of mentions
- Imitation learning
  - Assigns exact costs to actions based on coreference evaluation metrics
  - Uses the scores of the pairwise models to reduce the search space
- Mention pair models decide whether two mentions belong to the same coreference cluster
- Are the two mentions coreferent?
  - Classification model
- Which candidate antecedent best suits the mention?
  - Ranking model
  Example: "Bill arrived, but nobody saw him. I talked to him on the phone."
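A minimal sketch (not the paper's implementation) of the two strategies; score_pair is a hypothetical stand-in for any learned model that maps a (mention, candidate antecedent) pair to a coreference probability:

def classification_links(mention, candidates, score_pair, threshold=0.5):
    # Classification model: decide independently for each candidate
    # antecedent whether it is coreferent with the mention.
    return [c for c in candidates if score_pair(mention, c) > threshold]

def ranking_link(mention, candidates, score_pair):
    # Ranking model: pick the single best-suited antecedent,
    # or None if there is no candidate at all.
    if not candidates:
        return None
    return max(candidates, key=lambda c: score_pair(mention, c))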
- Distance features: distance between the two mentions in sentences or in number of intervening mentions
- Syntactic features: number of embedded NPs under a mention; POS tags of the first, last, and head word
- Semantic features: named entity type, speaker identification
- Rule-based features: exact and partial string matching
- Lexical features: the first, last, and head word of the current mention
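To make the feature groups concrete, here is an illustrative sketch; the Mention fields are hypothetical stand-ins for whatever the preprocessing pipeline provides, not the paper's actual feature set:

from dataclasses import dataclass

@dataclass
class Mention:
    text: str
    sent_idx: int    # index of the sentence containing the mention
    head_word: str
    head_pos: str    # POS tag of the head word
    ner_type: str    # named entity type, e.g. "PERSON"

def pair_features(anaphor: Mention, antecedent: Mention) -> dict:
    return {
        "sent_distance": anaphor.sent_idx - antecedent.sent_idx,        # distance
        "head_pos_pair": (anaphor.head_pos, antecedent.head_pos),       # syntactic
        "ner_pair": (anaphor.ner_type, antecedent.ner_type),            # semantic
        "exact_match": anaphor.text.lower() == antecedent.text.lower(), # rule-based
        "head_words": (anaphor.head_word, antecedent.head_word),        # lexical
    }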
- Best-first clustering (Ng and Cardie, 2002)
  - Assigns as antecedent the most probable preceding mention classified as coreferent with the current mention
  - Relies only on local information
- Entity-centric model (this work)
  - Operates over pairs of clusters instead of pairs of mentions
  - Builds up coreference chains with agglomerative clustering, merging two clusters when it predicts they represent the same entity
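A hedged sketch of the agglomerative loop, assuming merge_score is a learned scorer over cluster pairs (the actual system selects merges with a trained policy rather than a fixed threshold):

def agglomerate(clusters, merge_score, threshold=0.5):
    # Greedy agglomerative clustering: repeatedly merge the
    # highest-scoring cluster pair while the model still predicts
    # that some pair of clusters represents the same entity.
    clusters = [frozenset(c) for c in clusters]
    while len(clusters) > 1:
        pairs = [(merge_score(a, b), a, b)
                 for i, a in enumerate(clusters)
                 for b in clusters[i + 1:]]
        score, a, b = max(pairs, key=lambda p: p[0])
        if score <= threshold:
            break  # no remaining pair looks like the same entity
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
    return clusters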
- Coreference resolution is a sequential prediction problem in which future observations depend on previous actions
- Imitation learning (in this work, DAgger; Ross et al., 2011) is useful for this kind of problem (Argall et al., 2009)
- Training the agent on the gold labels alone assumes that all previous decisions were correct; this is problematic in coreference, where the error rate is quite high
- DAgger exposes the system at train time to states similar to the ones it will face at test time
- DAgger is an iterative algorithm that aggregates a dataset D consisting of states and the actions performed by the expert policy in those states
- β controls the probability of following the expert's policy versus the current learned policy (β decays exponentially as the iteration number increases)
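A compact sketch of the DAgger loop described above; expert_action, step, is_terminal, and fit_classifier are assumed interfaces, not the paper's actual API:

import random

def dagger(initial_states, expert_action, step, is_terminal,
           fit_classifier, n_iterations=10, beta_decay=0.5):
    dataset = []   # aggregated dataset D of (state, expert action) pairs
    policy = None
    for it in range(n_iterations):
        beta = beta_decay ** it  # P(follow expert); decays exponentially per iteration
        for state in initial_states:
            while not is_terminal(state):
                expert = expert_action(state)
                dataset.append((state, expert))   # D always records the expert's action
                if policy is None or random.random() < beta:
                    action = expert               # follow the expert policy
                else:
                    action = policy(state)        # follow the current learned policy
                state = step(state, action)
        policy = fit_classifier(dataset)          # retrain on the aggregated dataset
    return policy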
- Merging clusters influences the final score (the order of merge operations also matters)
- How will a particular local decision affect the final score of the coreference system?
- Problem: standard coreference metrics do not decompose over individual clusters
- Answer: roll out the remaining actions from the current state and score the completed clustering
- A(s): the set of actions that can be taken from state s
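Under the assumption that rollout(state) completes the clustering with some reference policy and b_cubed scores a finished clustering against the gold clusters, action costs can be sketched like this (illustrative, not the paper's exact cost definition):

def action_costs(state, actions, step, rollout, b_cubed, gold):
    # Because B^3 does not decompose over individual clusters, each
    # candidate action a in A(s) is judged by the final clustering
    # reached after taking it and rolling out to completion.
    scores = {a: b_cubed(rollout(step(state, a)), gold) for a in actions}
    best = max(scores.values())
    # Cost: how much worse this action's eventual score is than the
    # best available action's eventual score.
    return {a: best - s for a, s in scores.items()}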
- Cluster-pair features built from the pairwise coreference probabilities:
  - Minimum and maximum probability of coreference
  - Average probability and average log probability of coreference
  - Average probability and log probability of coreference restricted to mention pairs of a particular grammatical type (pronoun or not)
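A sketch of computing these aggregates, where pair_prob is the already-trained pairwise model (model stacking); restricting the statistics by grammatical type would simply filter the mention pairs first:

import math

def cluster_pair_features(cluster_a, cluster_b, pair_prob):
    # Aggregate the mention-pair probabilities between two clusters.
    probs = [pair_prob(m1, m2) for m1 in cluster_a for m2 in cluster_b]
    return {
        "min_prob": min(probs),
        "max_prob": max(probs),
        "avg_prob": sum(probs) / len(probs),
        "avg_log_prob": sum(math.log(p) for p in probs) / len(probs),
    }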
- Whether a preceding mention pair in the list of mention pairs has the same candidate anaphor as the current one
- The index of the current mention pair in the list divided by the size of the list (what percentage of the list have we seen so far?)
- …
- The entity-centric model doesn't rely on sparse lexical features; instead, it employs model stacking to exploit strong features (scores learned by the pairwise models)
- Dataset: OntoNotes (CoNLL-2012 shared task)
  - Training: 2,802 documents; development: 343; test: 345
  - Uses the provided pre-processing (parse trees, named entities, etc.)
- Common evaluation metrics
  - MUC, B³, CEAFE
  - CoNLL F1 (the average F1 score of the three metrics)
  - CoNLL scorer version 8.01
- Rule-based mention detection (Raghunathan et al., 2010)
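The CoNLL score is simply the unweighted mean of the three metric F1 scores; a one-liner makes the relationship explicit:

def conll_f1(muc_f1, b_cubed_f1, ceafe_f1):
    # CoNLL F1 = average of MUC, B^3, and CEAFE F1 scores
    return (muc_f1 + b_cubed_f1 + ceafe_f1) / 3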
- This work primarily optimizes for the B³ metric during training
- State-of-the-art systems use latent antecedents to learn scoring functions over mention pairs, but are trained to maximize global objective functions rather than a coreference evaluation metric
- Post-processing of mention pair and ranking models
  - Closest-first clustering (Soon et al., 2001)
  - Best-first clustering (Ng and Cardie, 2002)
- Global inference models
  - Integer linear programming (Denis and Baldridge, 2007; Finkel and Manning, 2008)
  - Graph partitioning (McCallum and Wellner, 2005; Nicolae and Nicolae, 2006)
  - Correlational clustering (McCallum and Wellner, 2003; Finley and Joachims, 2005)
- Non-local entity-level information
  - Cluster models (Luo et al., 2004; Yang et al., 2008; Rahman and Ng, 2011)
  - Joint inference (McCallum and Wellner, 2003; Culotta et al., 2006; Poon and Domingos, 2008; Haghighi and Klein, 2010)
- Learning trajectories of decisions
  - Imitation learning (Daumé et al., 2005; Ma et al., 2014)
  - Structured perceptron (Stoyanov and Eisner, 2012; Fernandes et al., 2012; Björkelund and Kuhn, 2014)
- The entity-centric model uses the scores produced by mention pair models as features (model stacking)
- Action costs are derived from standard coreference evaluation metrics
- Imitation learning can be used to learn how to build up coreference chains incrementally
- The proposed model outperforms the commonly used best-first method and the current state of the art