
Coreference Resolution - Haghighi and Klein’s Model


Mahak Gupta

July 08, 2016

  1. Unsupervised Coreference Resolution in a Nonparametric Bayesian Model. Authors: Aria Haghighi and Dan Klein (ACL 2007). Presented by: Mahak Gupta, 8th July 2016
  2. Haghighi and Klein’s Model • Nonparametric Bayesian model • Directed graphical model • Enables the use of prior knowledge to put higher probability on hypotheses deemed more likely • Does not commit to a particular set of parameters (does not attempt to compute the single most likely hypothesis) • Uses Dirichlet processes • We will not discuss the mathematical specifics of the Bayesian model in this talk. Adapted from Vincent Ng’s class on coreference resolution at UT Dallas
  3. Agenda 1. Introduction and Background 2. Data 3. Coreference Resolution Models a) Finite/Infinite Mixture Model b) Pronoun Head Model c) Salience Information 4. Experiments and Results 5. Conclusion
  4. Introduction • Reference to an entity in natural language is a two-step process: a) First, speakers introduce new entities into the discourse (with proper or nominal expressions). b) Second, speakers refer back to entities already introduced (with pronouns).
  5. Introduction • In general, a document consists of a set of mentions (usually noun phrases) • A mention is a reference to some entity • There are three types of mentions: 1. Proper (names) 2. Nominal (descriptions) 3. Pronominal (pronouns) • The coreference resolution problem is therefore to partition the mentions according to their referents
  6. Introduction: Example • The Weir Group1, whose2 headquarters3 is in the US4, is a large, specialized corporation5 investing in the area of electricity generation. This power plant6, which7 will be situated in Rudong8, Jiangsu9, has an annual generation capacity of 2.4 million kilowatts.
  7. Introduction: Dirichlet Processes • A Dirichlet process (DP) is a distribution over probability distributions: a draw from a DP is itself a distribution. • It is parameterized by a concentration parameter α and a base distribution β, and can serve as a prior when the number of clusters is not known in advance. [Figure: the DP shown as a black box with inputs α (concentration parameter) and β (distribution); for the document “(Melissa, the red-haired woman)” and the task “find adjective”, it outputs the distribution (Melissa=0.05, The=0.05, red-haired=0.75, woman=0.15). Adapted from Janis Pagel’s talk in Discourse Theory and Computation, University of Stuttgart.]
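One concrete view of the Dirichlet process is the Chinese restaurant process (CRP). The sketch below is not from the slides; it uses the standard CRP metaphor to show how the concentration parameter α controls how readily new clusters (here, tables) are opened:

```python
import random

def crp_assignments(n_customers, alpha, seed=0):
    """Sample cluster assignments from a Chinese Restaurant Process.

    Each customer joins an existing table with probability proportional
    to the table's current size, or opens a new table with probability
    proportional to alpha. Larger alpha -> more clusters.
    """
    rng = random.Random(seed)
    tables = []       # tables[k] = number of customers at table k
    assignments = []  # table index chosen by each customer
    for _ in range(n_customers):
        total = sum(tables) + alpha
        r = rng.uniform(0, total)
        cum = 0.0
        for k, size in enumerate(tables):
            cum += size
            if r < cum:
                tables[k] += 1
                assignments.append(k)
                break
        else:
            tables.append(1)                  # open a new table (new cluster)
            assignments.append(len(tables) - 1)
    return assignments

print(crp_assignments(10, alpha=1.0))
```

The number of tables grows with the data rather than being fixed in advance, which is exactly the property the model needs for entities.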
  8. Background • The primary approach is to treat the problem as a set of pairwise coreference decisions, using discriminative learning with features encoding properties such as distance and environment • However, there are several problems with this approach: – Rich features require a large amount of labelled data → not always available – A greedy approach is generally adopted, which works well only for the pairwise model
  9. Agenda 1. Introduction and Background 2. Data 3. Coreference Resolution Models a) Finite/Infinite Mixture Model b) Pronoun Head Model c) Salience Information 4. Experiments and Results 5. Conclusion
  10. Data • Automatic Content Extraction (ACE) 2004 task – Used English translations of the Arabic and Chinese treebanks – 95 documents, 3,905 mentions – Access restricted (LDC, Linguistic Data Consortium); only training data available • MUC-6 – Data from the 6th Message Understanding Conference – Training, development and test data available – No manual annotation of head and mention type
  11. Agenda 1. Introduction and Background 2. Data 3. Coreference Resolution Models a) Finite/Infinite Mixture Model b) Pronoun Head Model c) Salience Information 4. Experiments and Results 5. Conclusion
  12. Model: Assumptions • The system assumes that the following data is provided as input: – The true mention boundaries – The head words for mentions (given in ACE; for MUC-6 the rightmost token is taken as the head) – The mention types
  13. Finite mixture models • Documents are independent given some global parameters • Each document is a mixture of a fixed number of entities (components), K • Each entity is associated with a multinomial distribution over head words • The head word for each mention is drawn from the associated multinomial. [Plate diagram: I = the document collection; J = one document; Z_i = entity index of mention i; H_i = head word of mention i (observed); each entity’s head-word distribution (a feature vector) is drawn from a Dirichlet prior.]
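The generative story above can be sketched as follows. The parameters are toy values (K = 2 entities, made-up head-word multinomials borrowed from the running example), not the paper's:

```python
import random

# Hypothetical toy parameters: K = 2 entities with head-word multinomials.
entity_prior = [0.6, 0.4]            # P(Z = k), mixture weights over entities
head_dists = [
    {"Weir": 0.7, "Group": 0.3},     # entity 0's distribution over head words
    {"plant": 0.5, "Rudong": 0.5},   # entity 1's distribution over head words
]

def sample_mention(rng):
    """Generate one mention: draw an entity index Z, then a head word H
    from that entity's multinomial over head words."""
    z = rng.choices(range(len(entity_prior)), weights=entity_prior)[0]
    words, probs = zip(*head_dists[z].items())
    h = rng.choices(words, weights=probs)[0]
    return z, h

rng = random.Random(0)
print([sample_mention(rng) for _ in range(5)])
```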
  14. Finite mixture models: Problems • A big problem with this model is that the number of entities, K, must be fixed in advance • In practice it is not possible to fix the number of entities in a document ahead of time • What we want is for the model to select K itself • Solution: replace the finite Dirichlet distribution with the nonparametric Dirichlet process (DP)
  15. Infinite mixture models • The finite Dirichlet prior over entities is replaced with a Dirichlet process, so the number of entities is unbounded. [Plate diagram: as before, Z_i = entity index of mention i and H_i = head word of mention i (observed), but the number of components is now ∞ and the entity parameters (feature vectors) are drawn via the Dirichlet process.]
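In an infinite mixture of this kind, the (unnormalized) probability of assigning a mention's head word to an entity combines the CRP-style prior with the entity's head-word distribution. A minimal sketch, assuming a uniform base distribution and simple additive smoothing (the paper's actual distributions are richer):

```python
def entity_scores(head, entity_sizes, entity_head_counts, alpha, vocab_size, beta=0.1):
    """Unnormalized P(z = k | head) for an infinite mixture model.

    An existing entity k scores n_k * P(head | entity k), using an
    additively smoothed multinomial over head words; opening a new
    entity scores alpha * P0(head), with P0 assumed uniform over the
    vocabulary for illustration.
    """
    scores = []
    for n_k, head_counts in zip(entity_sizes, entity_head_counts):
        p_head = (head_counts.get(head, 0) + beta) / (n_k + beta * vocab_size)
        scores.append(n_k * p_head)
    scores.append(alpha * (1.0 / vocab_size))  # option to open a new entity
    return scores

# Two existing entities; "plant" was seen twice under entity 1.
print(entity_scores("plant",
                    entity_sizes=[3, 2],
                    entity_head_counts=[{"Group": 3}, {"plant": 2}],
                    alpha=1.0, vocab_size=1000))
```

An entity that has already generated the head word dominates the scores, which is what makes repeated proper and nominal heads cluster together.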
  16. Infinite mixture models: Problems • The approach is effective for proper and some nominal mentions, but does not make sense for pronominal mentions • F1 = 54.5 on the development set • The Weir Group1, whose2 headquarters3 is in the US4, is a large, specialized corporation5 investing in the area of electricity generation. This power plant6, which7 will be situated in Rudong8, Jiangsu9, has an annual generation capacity of 2.4 million kilowatts.
  17. Pronoun Head Model • Idea: when generating a head word for a mention, consider more than the entity-specific multinomial distribution over head words; i.e., add more linguistic features to enrich the model • Features added: – Entity type (Person, Location, Organization, Misc.) – Gender (Male, Female, Neuter) – Number (Singular, Plural)
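As an illustration of how these entity-level features constrain pronoun choice, here is a toy lookup. The table and its entries are hypothetical; the actual model defines probability distributions over pronouns given these features rather than hard lists:

```python
# Hypothetical feature-to-pronoun table; the real model is probabilistic.
PRONOUNS = {
    ("Person", "Male", "Singular"):         ["he", "him", "his"],
    ("Person", "Female", "Singular"):       ["she", "her"],
    ("Organization", "Neuter", "Singular"): ["it", "its", "which"],
    ("Person", "Neuter", "Plural"):         ["they", "them"],
}

def compatible_pronouns(entity_type, gender, number):
    """Return pronouns consistent with an entity's type/gender/number features."""
    return PRONOUNS.get((entity_type, gender, number), [])

print(compatible_pronouns("Organization", "Neuter", "Singular"))
```

Because "The Weir Group" is an Organization, the enriched model can rule out aligning "he" or "she" with it, even though the plain head-word multinomial could not.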
  18. Pronoun Head Model: Problems • Substantial improvement: F1 = 64.1 on the development set • The model corrects the systematic errors on pronouns, but there is still no locality preference for pronominal mentions in this model • The Weir Group1, whose1 headquarters2 is in the US3, is a large, specialized corporation4 investing in the area of electricity generation. This power plant5, which6 will be situated in Rudong7, Jiangsu8, has an annual generation capacity of 2.4 million kilowatts. • Solution: introduce salience into the model
  19. Salience Information • Salience models how present an entity is in the current discourse • Idea: add salience weights, so that it becomes more likely to align a pronoun with the most salient entity. Adapted from Janis Pagel’s talk in Discourse Theory and Computation (University of Stuttgart)
  20. Salience Information: Modelling • Each entity/cluster is initially assigned a salience value of 0 • As we process the discourse, the salience value of each entity changes: when we encounter a mention, we multiply every entity’s salience by 0.5 and then add 1 to the mentioned entity’s salience
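The decay-and-boost update described above can be sketched directly (the entity names are illustrative):

```python
def update_salience(salience, current_entity):
    """Apply the update from the slides: halve every entity's salience,
    then add 1 to the entity of the mention just encountered."""
    for e in salience:
        salience[e] *= 0.5
    salience[current_entity] = salience.get(current_entity, 0.0) + 1.0
    return salience

# Process three mentions in order: recently mentioned entities stay salient.
s = {}
for mention_entity in ["WeirGroup", "US", "WeirGroup"]:
    update_salience(s, mention_entity)
print(s)  # WeirGroup ends up more salient than US
```

The multiplicative decay means an entity's salience falls off quickly once the discourse moves on, giving pronouns the local, recency-based preference the plain head model lacked.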
  21. Salience Information: No Problems • Substantial improvement: F1 = 71.5 on the development set • The model now correctly aligns pronouns with the most salient entity mention • The Weir Group1, whose1 headquarters2 is in the US3, is a large, specialized corporation4 investing in the area of electricity generation. This power plant5, which5 will be situated in Rudong6, Jiangsu7, has an annual generation capacity of 2.4 million kilowatts.
  22. Agenda 1. Introduction and Background 2. Data 3. Coreference Resolution Models a) Finite/Infinite Mixture Model b) Pronoun Head Model c) Salience Information 4. Experiments and Results 5. Conclusion
  23. Results Analysis • Difficult to compare because most other systems are supervised • Most comparable supervised system on the MUC-6 test set: F1 = 73.4 (compared to F1 = 63.9 for Haghighi and Klein) • Unsupervised systems often under-perform supervised systems; considering this, the results in this unsupervised setting are reasonable • Higher performance on the Chinese data is due to the lack of prenominal mentions as well as fewer pronouns compared to English
  24. Agenda 1. Introduction and Background 2. Data 3. Coreference Resolution Models a) Finite/Infinite Mixture Model b) Pronoun Head Model c) Salience Information 4. Experiments and Results 5. Conclusion
  25. Conclusion • The paper presents an unsupervised approach to coreference resolution • Adding salience to the pronoun head model yields the best performance • The results of this unsupervised approach are comparable with state-of-the-art supervised approaches to coreference resolution
  26. References • Haghighi, A., & Klein, D. (2007). Unsupervised coreference resolution in a nonparametric Bayesian model. In J. A. Carroll, A. van den Bosch, & A. Zaenen (Eds.), Proceedings of the 45th Annual Meeting of the ACL (pp. 848–855). Association for Computational Linguistics. • Ng, V. (2008). Unsupervised models for coreference resolution. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 640–649). Stroudsburg, PA, USA: Association for Computational Linguistics. • Brandon Norick, talk at the University of Illinois at Urbana-Champaign. • Janis Pagel, talk in the Discourse Theories and Models class (University of Stuttgart).