Mahak Gupta
July 08, 2016

# Coreference Resolution - Haghighi and Klein’s Model


## Transcript

1. ### Unsupervised Coreference Resolution in a Nonparametric Bayesian Model

Authors: Aria Haghighi and Dan Klein (ACL 2007). Presented by: Mahak Gupta. Date: 8th July 2016
2. ### Haghighi and Klein’s Model

• Nonparametric Bayesian model • Directed graphical model • Enables the use of prior knowledge to put a higher probability on hypotheses deemed more likely • We will not discuss the mathematical specifics of the Bayesian model in this talk • Does not commit to a particular set of parameters (does not attempt to compute the single most likely hypothesis) • Dirichlet processes are used in this paper. Adapted from Vincent Ng’s class on coreference resolution at UT Dallas
3. ### Agenda

1. Introduction and Background 2. Data 3. Coreference Resolution Models: a) Finite/Infinite Mixture Model b) Pronoun Head Model c) Salience Information 4. Experiments and Results 5. Conclusion
4. ### Introduction

• Reference to an entity in natural language is a two-step process: a) First, speakers introduce new entities into the discourse (with proper or nominal expressions) b) Second, speakers refer back to entities already introduced (with pronouns)
5. ### Introduction

• In general, a document consists of a set of mentions (usually noun phrases) • A mention is a reference to some entity • There are three types of mentions: 1. Proper (names) 2. Nominal (descriptions) 3. Pronominal (pronouns) • Therefore, the coreference resolution problem is to partition the mentions according to their referents
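This partition view can be illustrated with a small sketch. The mention list and the cluster assignment below are purely illustrative (loosely keyed to the Weir Group example in this talk), not output of the actual model:

```python
# Coreference resolution as partitioning: each mention is assigned
# to the entity (cluster) it refers to.
mentions = ["The Weir Group", "whose", "headquarters",
            "the US", "This power plant", "which"]

# A hypothetical resolved partition: mention index -> entity id.
assignment = {0: 0, 1: 0, 2: 1, 3: 2, 4: 3, 5: 3}

def clusters(assignment):
    """Invert a mention->entity map into entity->list of mention indices."""
    out = {}
    for mention, entity in sorted(assignment.items()):
        out.setdefault(entity, []).append(mention)
    return out

print(clusters(assignment))
# {0: [0, 1], 1: [2], 2: [3], 3: [4, 5]}
```

Each key of the result is one entity; coreferent mentions ("The Weir Group"/"whose", "This power plant"/"which") end up in the same cluster.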
6. ### Introduction: Example

• The Weir Group1, whose2 headquarters3 is in the US4, is a large, specialized corporation5 investing in the area of electricity generation. This power plant6, which7 will be situated in Rudong8, Jiangsu9, has an annual generation capacity of 2.4 million kilowatts.
7. ### Introduction: Dirichlet Processes

• A Dirichlet process is a distribution over probability distributions. • It is specified by a concentration parameter α and a base distribution β, and describes how likely a set of random variables is to fall under a given distribution. [Diagram: the DP as a black box taking α (concentration parameter) and β (base distribution); for the task "find adjective" in a document, the input (Melissa, the red-haired woman) yields the output probabilities (Melissa=0.05, The=0.05, red-haired=0.75, woman=0.15).] Adapted from Janis Pagel’s talk in Discourse Theory and Computation (University of Stuttgart)
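One intuitive view of the Dirichlet process is the Chinese restaurant process: each new mention either joins an existing entity (proportionally to the entity's size) or opens a new one (proportionally to α). The sketch below illustrates only this sequential view; `crp_assignments` and the toy sizes are my own, and this is not the inference procedure used in the paper:

```python
import random

def crp_assignments(n_mentions, alpha, seed=0):
    """Sample entity assignments from a Chinese restaurant process,
    the sequential view of a Dirichlet process: each mention joins an
    existing entity with probability proportional to that entity's
    current size, or starts a new entity with probability
    proportional to the concentration parameter alpha."""
    rng = random.Random(seed)
    counts = []       # counts[k] = number of mentions assigned to entity k
    assignments = []
    for _ in range(n_mentions):
        weights = counts + [alpha]      # existing entities, plus a new one
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(0)            # open a brand-new entity
        counts[k] += 1
        assignments.append(k)
    return assignments

# A larger alpha tends to create more distinct entities.
print(crp_assignments(10, alpha=0.5))
print(crp_assignments(10, alpha=5.0))
```

Note that the number of entities K is never fixed in advance; it grows (or not) with the data, which is exactly the property the paper exploits.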
8. ### Background

• The primary approach is to treat the problem as a set of pairwise coreference decisions, using discriminative learning with features encoding properties such as distance and environment. • However, there are several problems with this approach: rich features require a large amount of data → not always available; and a greedy approach is generally adopted, which works well only for the pairwise model.
9. ### Agenda

1. Introduction and Background 2. Data 3. Coreference Resolution Models: a) Finite/Infinite Mixture Model b) Pronoun Head Model c) Salience Information 4. Experiments and Results 5. Conclusion
10. ### Data

• Automatic Content Extraction (ACE) 2004 task: used English translations of the Arabic and Chinese treebanks; 95 documents, 3,905 mentions; access restricted (LDC: Linguistic Data Consortium), only training data available. • MUC-6: data from the 6th Message Understanding Conference; training, development and test data available; no manual annotation of head and mention type.
11. ### Agenda

1. Introduction and Background 2. Data 3. Coreference Resolution Models: a) Finite/Infinite Mixture Model b) Pronoun Head Model c) Salience Information 4. Experiments and Results 5. Conclusion
12. ### Model: Assumptions

• The system assumes that the following data is provided as input: the true mention boundaries; the head words for mentions (in ACE this is already given; for MUC-6 the rightmost token is taken); and the mention types.
13. ### Finite mixture models

• Documents are independent, except for some global parameters (I: document collection) • Each document is a mixture of a fixed number of components (entities), K • Each entity is associated with a multinomial distribution over head words • The head word for each mention is drawn from the associated multinomial. [Plate diagram legend: Zi = index of entity i; Hi = head of entity i (observed); J = one document; feature vector; Dirichlet process]
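The generative story above can be sketched directly. The entity names, mixture weights, and head distributions below are invented for illustration; the real model learns these quantities rather than fixing them:

```python
import random

def generate_heads(mixture_weights, head_dists, n_mentions, seed=0):
    """Generative story of the finite mixture model: for each mention,
    draw an entity index Z_i from the document's mixture weights, then
    draw the head word H_i from that entity's multinomial over heads."""
    rng = random.Random(seed)
    entities = list(mixture_weights)
    doc = []
    for _ in range(n_mentions):
        z = rng.choices(entities,
                        weights=[mixture_weights[e] for e in entities])[0]
        heads = list(head_dists[z])
        h = rng.choices(heads,
                        weights=[head_dists[z][w] for w in heads])[0]
        doc.append((z, h))
    return doc

# Hypothetical parameters for a document with K = 2 entities.
weights = {"WEIR_GROUP": 0.6, "POWER_PLANT": 0.4}
head_dists = {"WEIR_GROUP": {"group": 0.7, "corporation": 0.3},
              "POWER_PLANT": {"plant": 0.8, "capacity": 0.2}}
print(generate_heads(weights, head_dists, 5))
```

Coreference resolution then amounts to inverting this story: given the observed heads H, infer the hidden entity indices Z.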
14. ### Finite mixture models: Problems

• A big problem with this model is that the number of entities, K, must be fixed in advance. • In real documents it is not possible to fix the number of entities ahead of time. • What we want is for the model to be able to select K itself. • Solution: replace the finite Dirichlet with the nonparametric Dirichlet process (DP)
15. ### Infinite mixture models

[Plate diagram legend: Zi = index of entity i; Hi = head of entity i (observed); feature vector; Dirichlet process; the number of components is now unbounded (∞)]
16. ### Infinite mixture models: Problems

• The approach is effective for proper and some nominal mentions, but does not work for pronominal mentions. • F1 = 54.5 on the development set • The Weir Group1, whose2 headquarters3 is in the US4, is a large, specialized corporation5 investing in the area of electricity generation. This power plant6, which7 will be situated in Rudong8, Jiangsu9, has an annual generation capacity of 2.4 million kilowatts.
17. ### Pronoun Head Model

• Idea: when generating a head word for a mention, we consider more than the entity-specific multinomial distribution over head words. • i.e., add more linguistic features to enrich the model. • Features added: entity type (Person, Location, Organization, Misc.); gender (Male, Female, Neuter); number (Singular, Plural)

19. ### Pronoun Head Model: Problems

• Substantial improvement: F1 = 64.1 on the development set. • The model corrects the systematic mishandling of pronouns, but there is still no local preference for pronominal mentions in this model. • The Weir Group1, whose1 headquarters2 is in the US3, is a large, specialized corporation4 investing in the area of electricity generation. This power plant5, which6 will be situated in Rudong7, Jiangsu8, has an annual generation capacity of 2.4 million kilowatts. • Solution: introduce salience into the model.
20. ### Salience Information

• Salience models how prominent an entity currently is in the discourse. • Idea: add salience weights, so that it becomes more likely to align a pronoun to the most salient entity. Adapted from Janis Pagel’s talk in Discourse Theory and Computation (University of Stuttgart)

22. ### Salience Information - modelling

• Each entity/cluster is initially assigned a salience value of 0 • As we process the discourse, the salience value of each entity changes: when we encounter a mention, we update the salience scores (multiply every entity's salience by 0.5, then add 1 to the current entity's salience)
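A minimal sketch of this decay-and-increment update (the entity names in the toy discourse are illustrative):

```python
def update_salience(salience, current_entity, decay=0.5):
    """Apply the slide's update when a mention is observed: every
    entity's salience is multiplied by the decay factor (0.5), and
    the mentioned entity's salience is then incremented by 1."""
    for entity in salience:
        salience[entity] *= decay
    salience[current_entity] = salience.get(current_entity, 0.0) + 1.0
    return salience

# Toy discourse: two mentions of one entity, then one of another.
salience = {}
for entity in ["WEIR_GROUP", "WEIR_GROUP", "POWER_PLANT"]:
    update_salience(salience, entity)
print(salience)   # {'WEIR_GROUP': 0.75, 'POWER_PLANT': 1.0}
```

After the third mention the most recently mentioned entity has the highest salience, which is exactly what lets the model prefer it when resolving a following pronoun.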
23. ### Salience Information: No Problems

• Substantial improvement: F1 = 71.5 on the development set. • The model now correctly aligns pronouns to the most salient entity mention. • The Weir Group1, whose1 headquarters2 is in the US3, is a large, specialized corporation4 investing in the area of electricity generation. This power plant5, which5 will be situated in Rudong6, Jiangsu7, has an annual generation capacity of 2.4 million kilowatts.
24. ### Agenda

1. Introduction and Background 2. Data 3. Coreference Resolution Models: a) Finite/Infinite Mixture Model b) Pronoun Head Model c) Salience Information 4. Experiments and Results 5. Conclusion

27. ### Results Analysis

• Difficult to compare because most other systems are supervised. • Most comparable supervised system on the MUC-6 test set: F1 = 73.4 (compared to F1 = 63.9 for Haghighi and Klein). • Unsupervised systems often under-perform supervised systems; considering this, the results in the unsupervised setting are reasonable. • Higher performance on the Chinese data is due to the lack of prenominal mentions as well as fewer pronouns compared to English
28. ### Agenda

1. Introduction and Background 2. Data 3. Coreference Resolution Models: a) Finite/Infinite Mixture Model b) Pronoun Head Model c) Salience Information 4. Experiments and Results 5. Conclusion
29. ### Conclusion

• The paper discusses an unsupervised approach to coreference resolution. • Adding salience to the pronoun-entity model gives the best performance. • Results of this unsupervised approach are comparable with state-of-the-art supervised approaches to coreference resolution.
30. ### References

• Haghighi, A., & Klein, D. (2007). Unsupervised coreference resolution in a nonparametric Bayesian model. In J. A. Carroll, A. Bosch, & A. Zaenen (Eds.), Proceedings of the 45th Annual Meeting of the ACL (Vol. 45, pp. 848–855). Association for Computational Linguistics. • Ng, V. (2008). Unsupervised models for coreference resolution. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 640–649). Stroudsburg, PA, USA: Association for Computational Linguistics. • Brandon Norick’s talk at the University of Illinois at Urbana-Champaign • Janis Pagel’s talk in the Discourse Theories and Models class