
Entity Centric Coreference Resolution with Model Stacking


Kevin Clark and Christopher D. Manning, Entity Centric Coreference Resolution with Model Stacking, ACL-IJCNLP 2015.

Presented by Mamoru Komachi at ACL 2015 Reading Group @ Tokyo Institute of Technology.

Mamoru Komachi

August 23, 2015

Transcript

  1. Entity Centric Coreference Resolution with Model Stacking
    Kevin Clark and Christopher D. Manning (ACL-IJCNLP 2015)
    (Tables are taken from the above-mentioned paper)
    Presented by Mamoru Komachi <[email protected]>
    ACL 2015 Reading Group @ Tokyo Institute of Technology, August 26th, 2015
  2. Entity-level information allows early coreference decisions to inform later ones
    - Entity-centric coreference systems build up coreference clusters incrementally (Raghunathan et al., 2010; Stoyanov and Eisner, 2012; Ma et al., 2014)
    - Example: "Hillary Clinton files for divorce from Bill Clinton ahead of her campaign for the presidency in 2016. … Clinton is confident that her poll numbers will skyrocket once the divorce is final."
  3. Problem: How to build up clusters effectively?
    - Model stacking
      - Two mention-pair models: a classification model and a ranking model
      - Their scores generate cluster features for clusters of mentions
    - Imitation learning
      - Assigns exact costs to actions based on coreference evaluation metrics
      - Uses the scores of the pairwise models to reduce the search space
  4. Two models for predicting whether a given pair of mentions belongs to the same coreference cluster
    - Is the pair coreferent? → Classification model
    - Which candidate antecedent best suits the mention? → Ranking model
    - Example: "Bill arrived, but nobody saw him. I talked to him on the phone."
  5. Logistic classifiers for the classification model
    - M: set of all mentions in the training set
    - T(m): set of true antecedents of a mention m
    - F(m): set of false antecedents of m
    - Considers each pair of mentions independently
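
Given these definitions, a minimal reconstruction of the binary logistic objective the slide refers to (regularization omitted; p(a, m) is the predicted probability that a and m corefer — a sketch, not a verbatim copy of the paper's equation):

```latex
\mathcal{L}_{\text{classification}} =
  -\sum_{m \in M} \Bigg[ \sum_{a \in T(m)} \log p(a, m)
  + \sum_{a \in F(m)} \log \big( 1 - p(a, m) \big) \Bigg]
```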
  6. Logistic classifiers for the ranking model
    - Considers all candidate antecedents of a mention simultaneously
    - Max-margin training encourages the model to find the single best antecedent for a mention, but such scores are not robust inputs for a downstream clustering model
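
One common logistic ranking objective of this kind (a sketch of the general technique, not necessarily the paper's exact formulation) marginalizes over the true antecedents of each mention with a softmax over all candidates, where s(a, m) is the pairwise score:

```latex
\mathcal{L}_{\text{ranking}} =
  -\sum_{m \in M} \log
  \sum_{t \in T(m)} \frac{\exp\big(s(t, m)\big)}
  {\sum_{a \in T(m) \cup F(m)} \exp\big(s(a, m)\big)}
```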
  7. Features for the mention-pair models
    - Distance features: the distance between the two mentions, measured in sentences or in number of intervening mentions
    - Syntactic features: number of embedded NPs under a mention; POS tags of the first, last, and head word
    - Semantic features: named entity type, speaker identification
    - Rule-based features: exact and partial string matching
    - Lexical features: the first, last, and head word of the current mention
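
A toy sketch of how such pairwise features might be assembled; the `Mention` fields and feature names here are illustrative placeholders, not the paper's implementation:

```python
from dataclasses import dataclass

@dataclass
class Mention:
    """Toy mention representation (illustrative only)."""
    words: list        # surface tokens of the mention
    pos: list          # POS tag per token
    head_word: str     # syntactic head word
    ner_type: str      # named entity type, e.g. "PERSON"
    speaker: str       # speaker id of the containing utterance
    sent_idx: int      # index of the containing sentence
    mention_idx: int   # index of the mention in document order

def pair_features(antecedent: Mention, anaphor: Mention) -> dict:
    """Assemble a few of the mention-pair features listed on the slide."""
    return {
        # Distance features
        "sent_distance": anaphor.sent_idx - antecedent.sent_idx,
        "mention_distance": anaphor.mention_idx - antecedent.mention_idx,
        # Syntactic / lexical features
        "anaphor_first_pos": anaphor.pos[0],
        "anaphor_last_pos": anaphor.pos[-1],
        "same_head": antecedent.head_word.lower() == anaphor.head_word.lower(),
        # Semantic features
        "same_ner_type": antecedent.ner_type == anaphor.ner_type,
        "same_speaker": antecedent.speaker == anaphor.speaker,
        # Rule-based string matching
        "exact_match": " ".join(antecedent.words).lower() == " ".join(anaphor.words).lower(),
        "partial_match": antecedent.head_word.lower() in [w.lower() for w in anaphor.words],
    }
```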
  8. The entity-centric model can exhibit high coherency
    - Best-first clustering (Ng and Cardie, 2002)
      - Assigns as the antecedent the most probable preceding mention classified as coreferent with the current mention
      - Relies only on local information
    - Entity-centric model (this work)
      - Operates over pairs of clusters instead of pairs of mentions
      - Builds up coreference chains with agglomerative clustering, merging two clusters whenever it predicts they represent the same entity
  9. Inference
    - Reduce the search space by thresholding the mention-pair model scores
    - Sort the surviving candidate pairs P by score to perform easy-first clustering
    - s is a scoring function that makes the binary decision for each merge action (a sketch of the loop follows below)
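
A rough sketch of this easy-first agglomerative loop under stated assumptions: the threshold, the cluster bookkeeping, and the `pair_score` / `merge_score` functions are illustrative placeholders (in the paper the merge decision is made by the learned entity-centric model):

```python
def easy_first_clustering(mentions, pair_score, merge_score, threshold=0.5):
    """Agglomerative, easy-first clustering over mention pairs.

    mentions:    mention ids in document order
    pair_score:  (m1, m2) -> probability of coreference from a mention-pair model
    merge_score: (cluster1, cluster2) -> score for merging two clusters
    """
    # Start with singleton clusters.
    cluster_of = {m: frozenset([m]) for m in mentions}

    # Prune the search space: keep only pairs the mention-pair model finds plausible.
    candidates = [(m1, m2)
                  for i, m1 in enumerate(mentions)
                  for m2 in mentions[i + 1:]
                  if pair_score(m1, m2) > threshold]

    # Easy-first: handle the most confident pairs before the harder ones.
    candidates.sort(key=lambda pair: pair_score(*pair), reverse=True)

    for m1, m2 in candidates:
        c1, c2 = cluster_of[m1], cluster_of[m2]
        if c1 is c2:
            continue                      # already in the same cluster
        if merge_score(c1, c2) > 0:       # binary merge decision made by s
            merged = c1 | c2
            for m in merged:
                cluster_of[m] = merged

    # Distinct clusters that remain after all merge decisions.
    return {id(c): set(c) for c in cluster_of.values()}.values()
```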
  10. Learning the entity-centric model by imitation learning
    - Sequential prediction problem: future observations depend on previous actions
    - Imitation learning (in this work, DAgger; Ross et al., 2011) is well suited to such problems (Argall et al., 2009)
    - Training the agent on the gold labels alone assumes that all previous decisions were correct, which is problematic in coreference, where the error rate is quite high
    - DAgger exposes the system at train time to states similar to the ones it will face at test time
  11. Learning the cluster-merging policy with DAgger (Ross et al., 2011)
    - Iterative algorithm that aggregates a dataset D of visited states paired with the actions the expert policy would take in those states
    - β controls the probability of following the expert policy versus the current learned policy (it decays exponentially as the iteration number increases); see the sketch below
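
A condensed sketch of the generic DAgger loop under stated assumptions; the environment, expert, and classifier-training interfaces are illustrative placeholders rather than the paper's code:

```python
import random

def dagger(env, expert_action, train_classifier,
           n_iters=8, episodes_per_iter=50, decay=0.5):
    """Generic DAgger loop (Ross et al., 2011) sketched for a merge-policy setting.

    env:              reset() -> initial state; step(state, action) -> next state, or None when done
    expert_action:    state -> action the expert policy would take
    train_classifier: list of (state, expert action) pairs -> policy (state -> action)
    """
    dataset = []        # aggregated (state, expert action) pairs across iterations
    policy = None       # learned policy; undefined until the first training round

    for i in range(n_iters):
        beta = decay ** i            # probability of following the expert; decays per iteration
        for _ in range(episodes_per_iter):
            state = env.reset()
            while state is not None:
                # Always record what the expert would do in the visited state.
                dataset.append((state, expert_action(state)))
                # Roll out with a mixture of the expert and the current policy.
                if policy is None or random.random() < beta:
                    action = expert_action(state)
                else:
                    action = policy(state)
                state = env.step(state, action)
        # Train on everything aggregated so far.
        policy = train_classifier(dataset)

    return policy
```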
  12. Adding costs to actions: tune directly to optimize coreference metrics
    - Merging clusters (and the order of merge operations) influences the final score
    - How will a particular local decision affect the final score of the coreference system?
    - Problem: standard coreference metrics do not decompose over clusters
    - Answer: roll out the remaining actions from the current state and score the result
    - A(s): set of actions that can be taken from the state s
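
A sketch of one way such rollout-based costs can be assigned; the rollout policy, state interface, and metric function (e.g. B3 F1) are illustrative placeholders:

```python
def action_costs(state, actions, rollout_policy, step, score_fn, gold_clusters):
    """Cost of each action in A(s) = drop in the final metric relative to the best action."""
    scores = {}
    for action in actions:
        s = step(state, action)                 # take the candidate action
        while not s.is_terminal():
            s = step(s, rollout_policy(s))      # roll out to a complete clustering
        # Score the finished clustering, e.g. with B3 F1.
        scores[action] = score_fn(s.clusters(), gold_clusters)

    best = max(scores.values())
    # An action's cost is how much worse it makes the final metric than the best action.
    return {action: best - scores[action] for action in actions}
```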
  13. Cluster features built from the classification and ranking models
    - Between-cluster features
      - Minimum and maximum probability of coreference
      - Average probability and average log probability of coreference
      - Average probability and average log probability of coreference restricted to particular pairs of mention grammar types (pronoun or not)
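
A sketch of how such between-cluster features could be aggregated from pairwise probabilities; `pair_prob` and `is_pronoun` are illustrative placeholders for the mention-pair model and the grammar-type lookup:

```python
import math
from statistics import mean

def cluster_pair_features(c1, c2, pair_prob, is_pronoun):
    """Aggregate mention-pair probabilities into between-cluster features.

    c1, c2:     collections of mentions in each cluster
    pair_prob:  (m1, m2) -> coreference probability from a mention-pair model (assumed > 0)
    is_pronoun: mention -> bool, standing in for the grammar type of a mention
    """
    probs = [pair_prob(m1, m2) for m1 in c1 for m2 in c2]
    feats = {
        "min_prob": min(probs),
        "max_prob": max(probs),
        "avg_prob": mean(probs),
        "avg_log_prob": mean(math.log(p) for p in probs),
    }
    # The same averages, restricted to particular combinations of grammar types.
    for name, keep in [("pron_pron", lambda a, b: is_pronoun(a) and is_pronoun(b)),
                       ("nonpron_nonpron", lambda a, b: not is_pronoun(a) and not is_pronoun(b))]:
        sub = [pair_prob(m1, m2) for m1 in c1 for m2 in c2 if keep(m1, m2)]
        if sub:
            feats["avg_prob_" + name] = mean(sub)
            feats["avg_log_prob_" + name] = mean(math.log(p) for p in sub)
    return feats
```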
  14. Only 56 features for the entity-centric model
    - State features
      - Whether a preceding mention pair in the list of mention pairs has the same candidate anaphor as the current one
      - The index of the current mention pair in the list divided by the size of the list (what percentage of the list have we seen so far?)
      - …
    - The entity-centric model does not rely on sparse lexical features; instead, it employs model stacking to exploit a few strong features (scores learned by the pairwise models)
  15. Experimental setup: CoNLL 2012 Shared Task
    - English portion of OntoNotes
      - Training: 2,802, development: 343, test: 345 documents
      - Uses the provided pre-processing (parse trees, named entities, etc.)
    - Common evaluation metrics
      - MUC, B3, CEAFe
      - CoNLL F1 (the average F1 score of the three metrics)
      - CoNLL scorer version 8.01
    - Rule-based mention detection (Raghunathan et al., 2010)
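
For reference, the averaged score mentioned above is simply the unweighted mean of the three F1 values:

```latex
\text{CoNLL F1} = \frac{F_1^{\mathrm{MUC}} + F_1^{B^3} + F_1^{\mathrm{CEAF}_e}}{3}
```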
  16. The entity-centric model beats other state-of-the-art coreference models
    - This work primarily optimizes for the B3 metric during training
    - State-of-the-art systems use latent antecedents to learn scoring functions over mention pairs, but are trained to maximize global objective functions
  17. The entity-centric model directly learns a coreference model that maximizes an evaluation metric, in contrast to previous approaches:
    - Post-processing of mention-pair and ranking models
      - Closest-first clustering (Soon et al., 2001)
      - Best-first clustering (Ng and Cardie, 2002)
    - Global inference models
      - Global inference with integer linear programming (Denis and Baldridge, 2007; Finkel and Manning, 2008)
      - Graph partitioning (McCallum and Wellner, 2005; Nicolae and Nicolae, 2006)
      - Correlation clustering (McCallum and Wellner, 2003; Finley and Joachims, 2005)
  18. Previous approaches do not directly tune against coreference metrics
    - Non-local entity-level information
      - Cluster models (Luo et al., 2004; Yang et al., 2008; Rahman and Ng, 2011)
      - Joint inference (McCallum and Wellner, 2003; Culotta et al., 2006; Poon and Domingos, 2008; Haghighi and Klein, 2010)
    - Learning trajectories of decisions
      - Imitation learning (Daumé et al., 2005; Ma et al., 2014)
      - Structured perceptron (Stoyanov and Eisner, 2012; Fernandes et al., 2012; Björkelund and Kuhn, 2014)
  19. Summary
    - Proposed an entity-centric coreference model that uses the scores produced by mention-pair models as features
    - Costs for clustering actions are derived from standard coreference metrics
    - Imitation learning is used to learn how to build up coreference chains incrementally
    - The proposed model outperforms the commonly used best-first method and the current state of the art