[Literature review] Learning Topic-Sensitive Word Representations

Literature review: Marzieh Fadaee, Arianna Bisazza and Christof Monz. Proceedings
of the 55th Annual Meeting of the Association for Computational Linguistics, Volume 2 (abs/1705.00441), pages 441–447, July 2017. Nagaoka University of Technology VO HUYNH QUOC VIET  Natural Language Processing Laboratory 2018 / 07 / 19 Learning Topic-Sensitive Word Representations

Abstract • Distributed word representations are widely used for modeling
words in NLP tasks. • Most of the existing models generate one representation per word and do not consider different meanings of a word. • This paper introduced two approaches (unsupervised learning) to learn multiple topic- sensitive representations per word by using Hierarchical Dirichlet Process (HDP). ⇨ Able to distinguish between different meanings of a given word. 2 The polysemous word ⬌ Diverse contexts ⬌ Distinct topic distributions

Introduction • Topic Model: Hierarchical Dirichlet Process (HDP) • Applied
to document modeling. • Uses a Dirichlet process to capture the unknow number of topics. • Assumed that topics and senses are interchangeable, and train individual models per word. 3 Example: the word “bat” in two different sentences: While the team at bat is trying to score run, the team in the field is attempting to record outs. The bat wing is membrane strtched acreoss four “extremely” elongated fingers.

Introduction • Extent distributions over word senses can be approximated
by distributions over topics without assuming these concepts to be identical. • The contributions of this paper are: 1/ Proposed three unsupervised, language-independent approaches to approximate senses with topics and learn multiple topic-sensitive embeddings per word. 2/ Showed that in the Lexical Substitution ranking task our models outperform two competitive baselines. 4

Method - Phrase Dependency Parsing 5 Hard Topic-Labeled Representation Hard
Topic-Labeled + Generic Word Representation Soft Topic-Labeled Representation

Topic-Labeled + Generic Word Representation Soft Topic-Labeled Representation (First approach) Hard Topic-Labeled Representations: • Relies on hard labeling by simply considering each word-topic pair as a separate vocabulary entry. • Use: - topic-sensitive representations for target words (input to the network) - unlabeled word representations for context words (output).

Topic-Labeled + Generic Word Representation Soft Topic-Labeled Representation The embedding of a word in context h(wi ) is obtained by extracting the row of the input lookup table (r) corresponding to the HDP-labeled word-topic pair: wi : context words ; wi τ : target word-topic pair shortcoming of the HTLE model is that: • the representations are trained separately and information is not shared between different topic-sensitive representations of the same word. (First approach) Hard Topic-Labeled Representations: • Relies on hard labeling by simply considering each word-topic pair as a separate vocabulary entry. • Use: - topic-sensitive representations for target words (input to the network) - unlabeled word representations for context words (output).

Topic-Labeled + Generic Word Representation Soft Topic-Labeled Representation A model variant that learns multiple topic- sensitive word representations and word representations simultaneously. Obtained by adding the word-topic pair representation (r’) to the representation of the corresponding word (r0 ): (First approach) Hard Topic-Labeled Representations: • Relies on hard labeling by simply considering each word-topic pair as a separate vocabulary entry. • Use: - topic-sensitive representations for target words (input to the network) - unlabeled word representations for context words (output).

Topic-Labeled + Generic Word Representation Soft Topic-Labeled Representation (Second approach) Soft Topic-Labeled Representations: • Directly include the topic distributions estimated by HDP for each document. For each update, use the topic distribution to compute a weighted sum over the word-topic representations (r’’) T : the total number of topics di : the document containing wi p(τk |di ) : the probability assigned to topic τk by HDP in document di

Method - Phrase Dependency Parsing 10 Embeddings for Polysemous Words
• The representations obtained from this models are expected to capture the meaning of a word in different topics. • For comparison, included this method baseline (embeddings learned with Skipgram), Word2Vec (Mikolov et al., 2013b) and GloVe embeddings (Pennington et al., 2014) pre-trained on large data.

Experiments 11 • All word representations are learned on the
English Wikipedia corpus containing 4.8M documents. • Run HDP on the whole corpus to obtain the word-topic labeling (1st approach) and the document-level topic distributions (2nd approach) • Window size c = 10 and different embedding sizes (100, 300, 600) • Compare this models to several baselines: • Skipgram (SGE) • Multisense embeddings model per word type (MSSG) (Neelakantan et al., 2014). (All model are trained on the same training data with the same settings)

Experiments 12 Objective: Lexical Substitution Task • This task requires
one to identify the best replacements for a word in a sentential context. • The presence of many polysemous target words makes this task more suitable for evaluating sense embedding. • Used two evaluation sets: • LS-SE07 (McCarthy and Navigli, 2007) • LS-CIC (Kremer et al., 2014). • The evaluation is performed by computing the Generalized Average Precision (GAP) score

Experiments 13 • Run HDP on the evaluation set and
compute the similarity between target word wt and each substitution w s h(wτ s ) and h(wτ’t ) : the representations for substitution word s with topic τ and target word t with topic τ’ respectively. wc are context words of wt taken from a sliding window of the same size as the embeddings o(wc ) is the context (output) representation of wc, and C is the total number of context words.

Conclusions 14 • Approach to learn topic-sensitive word representations that
exploits the document-level context of words and does not require annotated data. • Obtain statistically significant improvements in the lexical substitution task while not using any syntactic information.

[Literature review] Learning Topic-Sensitive Wo...

[Literature review] Learning Topic-Sensitive Word Representations

vhqviet

More Decks by vhqviet

Featured

Transcript

Literature review: Marzieh Fadaee, Arianna Bisazza and Christof Monz. Proceedings

Abstract • Distributed word representations are widely used for modeling

Introduction • Topic Model: Hierarchical Dirichlet Process (HDP) • Applied

Introduction • Extent distributions over word senses can be approximated

Method - Phrase Dependency Parsing 5 Hard Topic-Labeled Representation Hard

Method - Phrase Dependency Parsing 6 Hard Topic-Labeled Representation Hard

Method - Phrase Dependency Parsing 7 Hard Topic-Labeled Representation Hard

Method - Phrase Dependency Parsing 8 Hard Topic-Labeled Representation Hard

Method - Phrase Dependency Parsing 9 Hard Topic-Labeled Representation Hard

Method - Phrase Dependency Parsing 10 Embeddings for Polysemous Words

Experiments 11 • All word representations are learned on the

Experiments 12 Objective: Lexical Substitution Task • This task requires

Experiments 13 • Run HDP on the evaluation set and

Conclusions 14 • Approach to learn topic-sensitive word representations that