
Towards Learning Word Representation

Magdalena Wiercioch

February 13, 2017

Transcript

  1. Towards Learning Word Representation. Magdalena Wiercioch, Faculty of Mathematics and Computer Science, Jagiellonian University, Poland. E-mail: [email protected]. TFML, Feb. 13, 2017.
  2. Outline: 1. Research motivation, 2. Background, 3. Model, 4. Experiments, 5. Conclusions.
  3. Research motivation. Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov: Enriching Word Vectors with Subword Information. arXiv, 2016. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean: Efficient Estimation of Word Representations in Vector Space. arXiv, 2013.
  4. Background. Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman: Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, 1990. Neural networks: Hinrich Schütze: Dimensions of Meaning. Proceedings of the 1992 ACM/IEEE Conference on Supercomputing, 1992. N. Sakamoto, K. Yamamoto, and S. Nakagawa: Combination of Syllable-Based N-gram Search and Word Search for Spoken Term Detection Through Spoken Queries and IV/OOV Classification. Automatic Speech Recognition and Understanding (ASRU), 2015.
  5. Model. We extended the method introduced by Bojanowski et al. Their model is derived from the continuous Skip-gram (SG) model proposed by Mikolov et al.
  6. Skip-gram model. The goal of the Skip-gram model is to find word representations that are useful for predicting the surrounding words in a corpus. Let us denote the vocabulary of training words by $W = \{w_1, w_2, \ldots, w_S\}$, where $S$ is the size of the vocabulary. The Skip-gram model maximizes the average log probability
     $$l(W) = \sum_{t=1}^{S} \sum_{c \in C_t} \log p(w_c \mid w_t),$$
     where $C_t$ is the context of word $w_t$.
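
A minimal sketch of the (target, context) pair enumeration behind this objective, treating the training sequence as a plain Python list of tokens; the toy corpus and window size are illustrative assumptions, not taken from the slides.

```python
def skipgram_pairs(tokens, window=2):
    """Yield (target, context) pairs: for every position t, each word inside
    the context window C_t contributes one log p(w_c | w_t) term to l(W)."""
    for t, target in enumerate(tokens):
        lo = max(0, t - window)
        hi = min(len(tokens), t + window + 1)
        for c in range(lo, hi):
            if c != t:
                yield target, tokens[c]

# Toy corpus purely for illustration.
corpus = "the cat sat on the mat".split()
for pair in skipgram_pairs(corpus, window=2):
    print(pair)
```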
  7. Skip-gram model (figure).
  8. Skip-gram model. The probability of observing a context word $w_c$ given $w_t$ is parametrized using the word vectors. Given a scoring function $s$, which maps (word, context) pairs to values in $\mathbb{R}$, a possible choice for defining the probability of a context word is the softmax
     $$p(\mathrm{Context} \mid \mathrm{Word}) = y_c = \frac{e^{w_c^\top w_t}}{\sum_{j=1}^{S} e^{w_j^\top w_t}},$$
     where $w_c$, $w_t$, $w_j$ are vector representations of words and $y_c$ is the output of the $c$-th neuron of the output layer. The scoring function is parametrized by the scalar product between word and context embeddings: $s(\mathrm{Word}, \mathrm{Context}) = w_t^\top w_c$.
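
A short NumPy sketch of this softmax parametrization, using separate word and context embedding matrices; the vocabulary size and embedding dimension below are arbitrary assumptions.

```python
import numpy as np

def context_probability(word_vecs, context_vecs, t, c):
    """p(w_c | w_t): softmax over the scores s(w_t, w_j) = w_t . v_j for all j."""
    scores = context_vecs @ word_vecs[t]           # one score per candidate context word
    scores -= scores.max()                         # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()  # softmax
    return probs[c]

rng = np.random.default_rng(0)
S, dim = 1000, 100                                 # assumed vocabulary size and dimension
word_vecs = rng.normal(scale=0.01, size=(S, dim))
context_vecs = rng.normal(scale=0.01, size=(S, dim))
print(context_probability(word_vecs, context_vecs, t=3, c=7))
```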
  9. Subword model by Bojanowski et al. The Skip-gram model ignores the internal structure of words. They introduced a different scoring function
     $$s(w, c) = \sum_{g \in G_w} z_g^\top v_c,$$
     where $G_w \subset \{1, \ldots, G\}$ is the set of letter n-grams which appear in $w$.
  10. Subword model by Bojanowski et al. (continued). Limitation: only n-grams of length greater than or equal to 3 and smaller than or equal to 6 were considered. We claim this may be insufficient for short and rare words.
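
A minimal sketch of letter n-gram extraction under the 3-to-6 length restriction discussed above; the boundary markers "<" and ">" follow the common fastText convention and are an assumption here, not something stated on the slide.

```python
def letter_ngrams(word, n_min=3, n_max=6):
    """Return the set of letter n-grams of length n_min..n_max for a word,
    with assumed boundary markers added at the start and end of the word."""
    marked = f"<{word}>"
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.add(marked[i:i + n])
    return grams

# Short words yield very few n-grams, which illustrates the claimed limitation.
print(len(letter_ngrams("cat")))     # only a handful of n-grams
print(len(letter_ngrams("where")))   # noticeably more
```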
  11. Fragmentation model. Let us denote by $G_w \subset \{1, \ldots, G\}$ the set of letter n-grams which appear in $w$ and by $H_w \subset \{1, \ldots, H\}$ the set of syllable n-grams which appear in $w$.
  12. Fragmentation model (continued). We associate a vector representation $z_g$ with each letter n-gram $g$ and a vector representation $z_h$ with each syllable n-gram $h$.
  13. Fragmentation model (continued). The new word representation is the direct concatenation of the two vector representations of its n-grams (letter and syllable): $z_{\mathrm{new}} = [z_g, z_h]$.
  14. Fragmentation model (continued). The scoring function is
     $$s(w, c) = \sum_{\mathrm{new} \in G_w \cup H_w} z_{\mathrm{new}}^\top v_c.$$
     The upgraded model makes use of n-grams of varied length $n$.
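
A rough sketch of this scoring function under stated assumptions: the syllable splitter below is a naive hypothetical stand-in (the slides do not specify one), individual syllables stand in for syllable n-grams, the letter n-gram length range 1..6 is an illustrative choice, and the embedding tables are random placeholders rather than learned vectors.

```python
import re
import numpy as np

rng = np.random.default_rng(0)
DIM = 100  # assumed embedding dimension

# Placeholder embedding tables for letter n-grams and syllable fragments.
letter_vecs, syllable_vecs = {}, {}

def vec(table, key):
    """Look up (or lazily create) an n-gram vector; a stand-in for learned embeddings."""
    if key not in table:
        table[key] = rng.normal(scale=0.01, size=DIM)
    return table[key]

def naive_syllables(word):
    """Very rough split on vowel groups; purely illustrative, not a real syllabifier."""
    parts = re.findall(r"[^aeiouy]*[aeiouy]+(?:[^aeiouy]+$)?", word)
    return parts if parts else [word]

def score(word, context_vec, n_min=1, n_max=6):
    """s(w, c) = sum over letter and syllable fragments of z_new . v_c."""
    letters = {word[i:i + n] for n in range(n_min, n_max + 1)
               for i in range(len(word) - n + 1)}
    total = sum(vec(letter_vecs, g) @ context_vec for g in letters)
    total += sum(vec(syllable_vecs, h) @ context_vec for h in naive_syllables(word))
    return total

print(score("representation", rng.normal(scale=0.01, size=DIM)))
```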
  15. Datasets. We used benchmarks for three languages: English, German, and Romanian. The data contains word pairs along with human-assigned similarity judgements.
  16. Settings. We compared our approach with 5 baseline representations: a model based on a recurrent neural network (RNNLM) from 2010; a method trained using Noise Contrastive Estimation (NCE); two log-bilinear methods by Mikolov et al., i.e. Continuous Bag of Words (CBoW) and Skip-gram (SG); and the model proposed by Bojanowski et al. (Ft). A context window of 6 words (both left and right) was used.
  17. Spearman's correlation coefficient for the word similarity task.
     dataset             RNNLM   NCE    CBoW   SG     Ft     our
     WS353 (en)          0.42    0.45   0.48   0.47   0.50   0.50
     SimVerb-3500 (en)   0.44    0.46   0.44   0.47   0.47   0.47
     Sim999 (en)         0.44    0.45   0.45   0.46   0.45   0.45
     RG65 (en)           0.39    0.40   0.43   0.46   0.46   0.47
     SGS130 (en)         0.45    0.48   0.50   0.49   0.50   0.50
     YP130 (en)          0.43    0.45   0.44   0.47   0.48   0.48
     Gur30 (ge)          0.45    0.46   0.49   0.51   0.51   0.51
     Gur65 (ge)          0.45    0.47   0.52   0.54   0.54   0.55
     ZG222 (ge)          0.50    0.53   0.53   0.55   0.56   0.56
     RO353 (ro)          0.51    0.55   0.57   0.59   0.59   0.61
     This task assesses how well the given representations capture word similarity. Our method slightly outperformed the baseline models in 3 cases.
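
A brief sketch of how the Spearman correlation for such a word similarity task is typically computed, using scipy; the word pairs, human ratings, and vectors below are made-up placeholders, not data from the benchmarks.

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
words = ["car", "automobile", "cat", "dog", "banana", "stone", "love", "coffee", "tea"]
vecs = {w: rng.normal(size=50) for w in words}   # placeholder word representations

# Hypothetical word pairs with made-up human similarity ratings (e.g. a 0-10 scale).
pairs = [("car", "automobile"), ("cat", "dog"), ("banana", "stone"),
         ("love", "coffee"), ("coffee", "tea")]
human_scores = [9.0, 7.5, 1.1, 2.3, 6.8]

model_scores = [cosine(vecs[a], vecs[b]) for a, b in pairs]
rho, _ = spearmanr(human_scores, model_scores)
print(f"Spearman's rho: {rho:.2f}")
```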
  18. Semantic analogies task results. Accuracy in %.
     dataset             RNNLM   NCE    CBoW   SG     Ft     our
     WS353 (en)          15.3    24.2   23.8   28.0   27.5   27.5
     SimVerb-3500 (en)   20.1    26.7   30.6   34.5   34.5   34.0
     Sim999 (en)         18.3    21.2   29.8   24.3   24.8   24.8
     RG65 (en)           29.7    35.2   39.1   42.0   42.0   42.0
     SGS130 (en)         35.2    41.3   47.0   56.1   56.1   56.1
     YP130 (en)          46.4    42.6   43.6   56.3   56.3   56.3
     Gur30 (ge)          37.2    61.2   38.7   46.7   46.7   46.7
     Gur65 (ge)          39.8    34.2   44.7   46.9   46.0   46.0
     ZG222 (ge)          41.7    36.2   55.3   52.6   52.6   52.2
     RO353 (ro)          43.9    50.0   46.6   60.4   60.1   60.1
  19. Syntactic analogies task results. Accuracy in %.
     dataset             RNNLM   NCE    CBoW   SG     Ft     our
     WS353 (en)          24.7    30.2   33.5   40.9   40.2   40.2
     SimVerb-3500 (en)   31.6    33.9   37.2   52.0   52.0   52.0
     Sim999 (en)         26.0    32.0   55.0   49.8   49.3   49.5
     RG65 (en)           35.6    40.2   40.7   48.9   48.9   48.9
     SGS130 (en)         38.4    59.0   43.2   49.6   49.6   49.6
     YP130 (en)          32.3    37.8   45.8   50.3   50.3   50.3
     Gur30 (ge)          30.1    35.2   40.9   49.3   49.3   49.3
     Gur65 (ge)          24.0    35.7   47.3   62.5   62.5   62.5
     ZG222 (ge)          38.7    45.3   56.9   67.2   67.2   67.1
     RO353 (ro)          30.6    41.7   59.2   53.1   53.1   53.1
  20. Semantic and syntactic analogies task results. Our method did not outperform any competing model. It gave results similar to the other Skip-gram-based approaches. It may be worth exploring the method's performance on denser languages.
  21. Plots of performance versus training epoch for the word similarity task. Dataset: SimVerb-3500. All three models converge quickly to a satisfactory level of performance. Our approach yields more reliable results.
  22. Two-dimensional projections of word representations learned by our method (left) and the Bojanowski-based model (right). We projected the learned word representations into two dimensions using the t-SNE tool. All words were assigned to their groups correctly.
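
A minimal sketch of the t-SNE projection step using scikit-learn and matplotlib; the embedding matrix and word labels here are random placeholders, not the learned representations from the slides.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Placeholder embeddings standing in for learned word representations.
rng = np.random.default_rng(0)
words = ["king", "queen", "cat", "dog", "paris", "berlin"]
embeddings = rng.normal(size=(len(words), 100))

# Project to two dimensions with t-SNE; perplexity must be smaller than the sample count.
coords = TSNE(n_components=2, perplexity=3, random_state=0).fit_transform(embeddings)

plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), w in zip(coords, words):
    plt.annotate(w, (x, y))
plt.title("t-SNE projection of word representations")
plt.show()
```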
  23. Conclusions. We have shown that our method outperforms state-of-the-art approaches on dense languages on tasks such as word similarity ranking and syntactic and semantic analogies. This research indicates that other methods of retrieving subword information should be investigated in depth.
  24. Thank you for your attention.