
Coreference Resolution - Haghighi and Klein’s Model


Mahak Gupta

July 08, 2016

  1. Unsupervised Coreference Resolution in a Nonparametric Bayesian Model. Authors: Aria Haghighi and Dan Klein (ACL 2007). Presented by: Mahak Gupta, 8th July 2016
  2. Haghighi and Klein’s Model • Nonparametric Bayesian model • Directed graphical model • Enables the use of prior knowledge to put higher probability on hypotheses deemed more likely • Does not commit to a particular set of parameters (does not attempt to compute the single most likely hypothesis) • Uses Dirichlet processes • We will not discuss the mathematical specifics of the Bayesian model in this talk. Adapted from Vincent Ng’s class on coreference resolution at UT Dallas
  3. Agenda 1. Introduction and Background 2. Data 3. Coreference Resolution Models a) Finite/Infinite Mixture Model b) Pronoun Head Model c) Salience Information 4. Experiments and Results 5. Conclusion
  4. Introduction • Reference to an entity in natural language is a two-step process: a) First, speakers introduce new entities into the discourse (with proper or nominal expressions). b) Second, speakers refer back to entities already introduced (with pronouns).
  5. Introduction • In general, a document consists of a set of mentions (usually noun phrases) • A mention is a reference to some entity • There are three types of mentions: 1. Proper (names) 2. Nominal (descriptions) 3. Pronominal (pronouns) • The coreference resolution problem is therefore to partition the mentions according to their referents
  6. Introduction: Example • The Weir Group1, whose2 headquarters3 is in the US4, is a large, specialized corporation5 investing in the area of electricity generation. This power plant6, which7 will be situated in Rudong8, Jiangsu9, has an annual generation capacity of 2.4 million kilowatts.
  7. Introduction: Dirichlet Processes • A Dirichlet process (DP) is a distribution over probability distributions: a draw from a DP is itself a distribution. • It is parameterized by a concentration parameter α and a base distribution β, and can serve as a prior when the number of clusters is not known in advance. [Figure: the DP shown as a black box with inputs α (concentration parameter) and β (distribution); for the document “(Melissa, the red-haired woman)” and the task “find adjective”, it outputs the distribution (Melissa=0.05, The=0.05, red-haired=0.75, woman=0.15). Adapted from Janis Pagel’s talk in Discourse Theory and Computation, University of Stuttgart.]
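One concrete view of the Dirichlet process is the Chinese restaurant process (CRP). The sketch below is not from the slides; it uses the standard CRP metaphor to show how the concentration parameter α controls how readily new clusters (here, tables) are opened:

```python
import random

def crp_assignments(n_customers, alpha, seed=0):
    """Sample cluster assignments from a Chinese Restaurant Process.

    Each customer joins an existing table with probability proportional
    to the table's current size, or opens a new table with probability
    proportional to alpha. Larger alpha -> more clusters.
    """
    rng = random.Random(seed)
    tables = []       # tables[k] = number of customers at table k
    assignments = []  # table index chosen by each customer
    for _ in range(n_customers):
        total = sum(tables) + alpha
        r = rng.uniform(0, total)
        cum = 0.0
        for k, size in enumerate(tables):
            cum += size
            if r < cum:
                tables[k] += 1
                assignments.append(k)
                break
        else:
            tables.append(1)                  # open a new table (new cluster)
            assignments.append(len(tables) - 1)
    return assignments

print(crp_assignments(10, alpha=1.0))
```

The number of tables grows with the data rather than being fixed in advance, which is exactly the property the model needs for entities.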
  8. Background • The primary approach is to treat the problem as a set of pairwise coreference decisions, using discriminative learning with features encoding properties such as distance and environment • However, there are several problems with this approach: – Rich features require a large amount of labelled data → not always available – A greedy approach is generally adopted, which works well only for the pairwise model
  9. Agenda 1. Introduction and Background 2. Data 3. Coreference Resolution Models a) Finite/Infinite Mixture Model b) Pronoun Head Model c) Salience Information 4. Experiments and Results 5. Conclusion
  10. Data • Automatic Content Extraction (ACE) 2004 task – Used English translations of the Arabic and Chinese treebanks – 95 documents, 3,905 mentions – Access restricted (LDC, Linguistic Data Consortium); only training data available • MUC-6 – Data from the 6th Message Understanding Conference – Training, development and test data available – No manual annotation of head and mention type
  11. Agenda 1. Introduction and Background 2. Data 3. Coreference Resolution Models a) Finite/Infinite Mixture Model b) Pronoun Head Model c) Salience Information 4. Experiments and Results 5. Conclusion
  12. Model: Assumptions • The system assumes that the following data is provided as input: – The true mention boundaries – The head words for mentions (given in ACE; for MUC-6 the rightmost token is taken as the head) – The mention types
  13. Finite mixture models • Documents are independent given some global parameters • Each document is a mixture of a fixed number of entities (components), K • Each entity is associated with a multinomial distribution over head words • The head word for each mention is drawn from the associated multinomial. [Plate diagram: I = the document collection; J = one document; Z_i = entity index of mention i; H_i = head word of mention i (observed); each entity’s head-word distribution (a feature vector) is drawn from a Dirichlet prior.]
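The generative story above can be sketched as follows. The parameters are toy values (K = 2 entities, made-up head-word multinomials borrowed from the running example), not the paper's:

```python
import random

# Hypothetical toy parameters: K = 2 entities with head-word multinomials.
entity_prior = [0.6, 0.4]            # P(Z = k), mixture weights over entities
head_dists = [
    {"Weir": 0.7, "Group": 0.3},     # entity 0's distribution over head words
    {"plant": 0.5, "Rudong": 0.5},   # entity 1's distribution over head words
]

def sample_mention(rng):
    """Generate one mention: draw an entity index Z, then a head word H
    from that entity's multinomial over head words."""
    z = rng.choices(range(len(entity_prior)), weights=entity_prior)[0]
    words, probs = zip(*head_dists[z].items())
    h = rng.choices(words, weights=probs)[0]
    return z, h

rng = random.Random(0)
print([sample_mention(rng) for _ in range(5)])
```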
  14. Finite mixture models: Problems • A big problem with this model is that the number of entities, K, must be fixed in advance • In practice it is not possible to fix the number of entities in a document ahead of time • What we want is for the model to select K itself • Solution: replace the finite Dirichlet distribution with the nonparametric Dirichlet process (DP)
  15. Infinite mixture models • The finite Dirichlet prior over entities is replaced with a Dirichlet process, so the number of entities is unbounded. [Plate diagram: as before, Z_i = entity index of mention i and H_i = head word of mention i (observed), but the number of components is now ∞ and the entity parameters (feature vectors) are drawn via the Dirichlet process.]
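In an infinite mixture of this kind, the (unnormalized) probability of assigning a mention's head word to an entity combines the CRP-style prior with the entity's head-word distribution. A minimal sketch, assuming a uniform base distribution and simple additive smoothing (the paper's actual distributions are richer):

```python
def entity_scores(head, entity_sizes, entity_head_counts, alpha, vocab_size, beta=0.1):
    """Unnormalized P(z = k | head) for an infinite mixture model.

    An existing entity k scores n_k * P(head | entity k), using an
    additively smoothed multinomial over head words; opening a new
    entity scores alpha * P0(head), with P0 assumed uniform over the
    vocabulary for illustration.
    """
    scores = []
    for n_k, head_counts in zip(entity_sizes, entity_head_counts):
        p_head = (head_counts.get(head, 0) + beta) / (n_k + beta * vocab_size)
        scores.append(n_k * p_head)
    scores.append(alpha * (1.0 / vocab_size))  # option to open a new entity
    return scores

# Two existing entities; "plant" was seen twice under entity 1.
print(entity_scores("plant",
                    entity_sizes=[3, 2],
                    entity_head_counts=[{"Group": 3}, {"plant": 2}],
                    alpha=1.0, vocab_size=1000))
```

An entity that has already generated the head word dominates the scores, which is what makes repeated proper and nominal heads cluster together.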
  16. Infinite mixture models: Problems • The approach is effective for proper and some nominal mentions, but does not make sense for pronominal mentions • F1 = 54.5 on the development set • The Weir Group1, whose2 headquarters3 is in the US4, is a large, specialized corporation5 investing in the area of electricity generation. This power plant6, which7 will be situated in Rudong8, Jiangsu9, has an annual generation capacity of 2.4 million kilowatts.
  17. Pronoun Head Model • Idea: when generating a head word for a mention, consider more than the entity-specific multinomial distribution over head words; i.e., add more linguistic features to enrich the model • Features added: – Entity type (Person, Location, Organization, Misc.) – Gender (Male, Female, Neuter) – Number (Singular, Plural)
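As an illustration of how these entity-level features constrain pronoun choice, here is a toy lookup. The table and its entries are hypothetical; the actual model defines probability distributions over pronouns given these features rather than hard lists:

```python
# Hypothetical feature-to-pronoun table; the real model is probabilistic.
PRONOUNS = {
    ("Person", "Male", "Singular"):         ["he", "him", "his"],
    ("Person", "Female", "Singular"):       ["she", "her"],
    ("Organization", "Neuter", "Singular"): ["it", "its", "which"],
    ("Person", "Neuter", "Plural"):         ["they", "them"],
}

def compatible_pronouns(entity_type, gender, number):
    """Return pronouns consistent with an entity's type/gender/number features."""
    return PRONOUNS.get((entity_type, gender, number), [])

print(compatible_pronouns("Organization", "Neuter", "Singular"))
```

Because "The Weir Group" is an Organization, the enriched model can rule out aligning "he" or "she" with it, even though the plain head-word multinomial could not.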
  18. Pronoun Head Model: Problems • Substantial improvement: F1 = 64.1 on the development set • The model corrects the systematic errors on pronouns, but there is still no locality preference for pronominal mentions in this model • The Weir Group1, whose1 headquarters2 is in the US3, is a large, specialized corporation4 investing in the area of electricity generation. This power plant5, which6 will be situated in Rudong7, Jiangsu8, has an annual generation capacity of 2.4 million kilowatts. • Solution: introduce salience into the model
  19. Salience Information • Salience models how present an entity is in the current discourse • Idea: add salience weights, so that it becomes more likely to align a pronoun with the most salient entity. Adapted from Janis Pagel’s talk in Discourse Theory and Computation (University of Stuttgart)
  20. Salience Information: Modelling • Each entity/cluster is initially assigned a salience value of 0 • As we process the discourse, the salience value of each entity changes: when we encounter a mention, we multiply every entity’s salience by 0.5 and then add 1 to the mentioned entity’s salience
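The decay-and-boost update described above can be sketched directly (the entity names are illustrative):

```python
def update_salience(salience, current_entity):
    """Apply the update from the slides: halve every entity's salience,
    then add 1 to the entity of the mention just encountered."""
    for e in salience:
        salience[e] *= 0.5
    salience[current_entity] = salience.get(current_entity, 0.0) + 1.0
    return salience

# Process three mentions in order: recently mentioned entities stay salient.
s = {}
for mention_entity in ["WeirGroup", "US", "WeirGroup"]:
    update_salience(s, mention_entity)
print(s)  # WeirGroup ends up more salient than US
```

The multiplicative decay means an entity's salience falls off quickly once the discourse moves on, giving pronouns the local, recency-based preference the plain head model lacked.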
  21. Salience Information: No Problems • Substantial improvement: F1 = 71.5 on the development set • The model now correctly aligns pronouns with the most salient entity mention • The Weir Group1, whose1 headquarters2 is in the US3, is a large, specialized corporation4 investing in the area of electricity generation. This power plant5, which5 will be situated in Rudong6, Jiangsu7, has an annual generation capacity of 2.4 million kilowatts.
  22. Agenda 1. Introduction and Background 2. Data 3. Coreference Resolution Models a) Finite/Infinite Mixture Model b) Pronoun Head Model c) Salience Information 4. Experiments and Results 5. Conclusion
  23. Results Analysis • Difficult to compare because most other systems are supervised • Most comparable supervised system on the MUC-6 test set: F1 = 73.4 (compared to F1 = 63.9 for Haghighi and Klein) • Unsupervised systems often under-perform supervised systems; considering this, the results in this unsupervised setting are reasonable • Higher performance on the Chinese data is due to the lack of prenominal mentions as well as fewer pronouns compared to English
  24. Agenda 1. Introduction and Background 2. Data 3. Coreference Resolution Models a) Finite/Infinite Mixture Model b) Pronoun Head Model c) Salience Information 4. Experiments and Results 5. Conclusion
  25. Conclusion • The paper presents an unsupervised approach to coreference resolution • Adding salience to the pronoun head model yields the best performance • The results of this unsupervised approach are comparable with state-of-the-art supervised approaches to coreference resolution
  26. References • Haghighi, A., & Klein, D. (2007). Unsupervised coreference resolution in a nonparametric Bayesian model. In J. A. Carroll, A. van den Bosch, & A. Zaenen (Eds.), Proceedings of the 45th Annual Meeting of the ACL (pp. 848–855). Association for Computational Linguistics. • Ng, V. (2008). Unsupervised models for coreference resolution. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 640–649). Stroudsburg, PA, USA: Association for Computational Linguistics. • Brandon Norick, talk at the University of Illinois at Urbana-Champaign. • Janis Pagel, talk in the Discourse Theories and Models class (University of Stuttgart).