
An overview of word embeddings

Olivier Grisel
November 19, 2014



  1. Word Embeddings: an overview. Inria Parietal Team Meeting, November 2014

  2. Outline
     • Neural Word Models
     • word2vec & GloVe
     • Computing word analogies with word vectors
     • Applications, implementations & pre-trained models
     • Extensions: RNNs for Machine Translation
  3. Neural Language Models
     • Each word is represented by a fixed-dimensional vector
     • Goal is to predict the target word given a ~5-10 word context from a random sentence in Wikipedia
     • NN-style training optimizes the vector coefficients, typically with a log-likelihood objective (bi-linear, deep or recurrent architectures)
  4. Trend in 2013 / 2014
     • Simple linear models (word2vec) benefit from larger training data (1B+ words) and dimensions (typically 50-300)
     • Some models (GloVe) are closer to matrix factorization than to neural networks
     • Can successfully uncover semantic and syntactic word relationships from unlabeled corpora (Wikipedia, Google News, Common Crawl)
  5. Analogies
     • [king] - [male] + [female] ~= [queen]
     • [Berlin] - [Germany] + [France] ~= [Paris]
     • [eating] - [eat] + [fly] ~= [flying]
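The analogy trick on this slide can be sketched with plain NumPy. The toy 3-dimensional vectors below are hand-made so the example works; real models learn 50-300 dimensional vectors from large corpora:

```python
import numpy as np

# Toy embeddings, made up for illustration only.
vocab = ["man", "woman", "king", "queen", "apple"]
vecs = np.array([
    [1.0, 0.0, 0.0],  # man
    [1.0, 0.0, 1.0],  # woman
    [1.0, 1.0, 0.0],  # king
    [1.0, 1.0, 1.0],  # queen
    [0.0, 0.2, 0.1],  # apple
])
# L2-normalize rows so dot products are cosine similarities.
vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
idx = {w: i for i, w in enumerate(vocab)}

def analogy(a, b, c):
    """Nearest word to vec(a) - vec(b) + vec(c), excluding the query words."""
    target = vecs[idx[a]] - vecs[idx[b]] + vecs[idx[c]]
    target = target / np.linalg.norm(target)
    sims = vecs @ target
    for w in (a, b, c):
        sims[idx[w]] = -np.inf
    return vocab[int(np.argmax(sims))]

print(analogy("king", "man", "woman"))  # -> queen
```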
  6. source: http://nlp.stanford.edu/projects/glove/

  7. source: http://nlp.stanford.edu/projects/glove/

  8. source: Exploiting Similarities among Languages for MT

  9. word2vec: focus on skip-gram with negative sampling

  10. https://code.google.com/p/word2vec/

  11. (image-only slide)
  12. Likelihood objective: summing over the whole vocabulary W is too costly in practice, so an approximation is needed
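A sketch of the objective in question, in standard skip-gram notation (u for output/context vectors, v for input vectors; notation is not taken verbatim from the slides): the probability of an output word o given a center word c is

```latex
p(o \mid c) = \frac{\exp(u_o^\top v_c)}{\sum_{w \in W} \exp(u_w^\top v_c)}
```

The normalization term sums over the entire vocabulary W for every training example, which is the costly part that negative sampling avoids.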
  13. Negative Sampling: in practice use k=15, i.e. 15 negative samples drawn for each positive example
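A minimal NumPy sketch of the negative-sampling loss for one (center, context) pair, with k=15 as on the slide; all names and the random initialization are illustrative, not taken from the reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, k = 1000, 50, 15  # k = 15 negatives per positive pair

# Two embedding tables: input (word) and output (context) vectors.
W_in = rng.normal(scale=0.1, size=(vocab_size, dim))
W_out = rng.normal(scale=0.1, size=(vocab_size, dim))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(center, context, negatives):
    """Skip-gram negative-sampling loss for one (center, context) pair:
    push sigma(u_o . v_c) toward 1 for the observed pair and
    sigma(u_k . v_c) toward 0 for the k sampled negatives, instead of
    normalizing over the whole vocabulary."""
    v = W_in[center]
    pos = np.log(sigmoid(W_out[context] @ v))
    neg = np.log(sigmoid(-(W_out[negatives] @ v))).sum()
    return -(pos + neg)

# Uniform sampling here for brevity; word2vec actually samples from
# the unigram distribution raised to the 3/4 power.
negatives = rng.integers(0, vocab_size, size=k)
loss = sgns_loss(center=42, context=7, negatives=negatives)
```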
  14. GloVe

  15. (image-only slide)
  16. GloVe Objective: Xij = count of word j in the context of word i (zero counts are ignored)
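In the published paper, the GloVe objective is a weighted least-squares fit of dot products to log co-occurrence counts, summed over nonzero Xij only. A NumPy sketch (x_max=100 and alpha=0.75 are the paper's defaults; the demo data is random and purely illustrative):

```python
import numpy as np

def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting f(X_ij): down-weights rare pairs, caps frequent ones at 1."""
    return np.minimum((x / x_max) ** alpha, 1.0)

def glove_objective(W, W_ctx, b, b_ctx, X):
    """J = sum over nonzero X_ij of f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2"""
    i, j = np.nonzero(X)  # zero counts are ignored, as on the slide
    pred = (W[i] * W_ctx[j]).sum(axis=1) + b[i] + b_ctx[j]
    err = pred - np.log(X[i, j])
    return float((glove_weight(X[i, j]) * err ** 2).sum())

# Tiny random demo (6 words, 4 dimensions).
rng = np.random.default_rng(0)
V, d = 6, 4
W, W_ctx = rng.normal(size=(V, d)), rng.normal(size=(V, d))
b, b_ctx = np.zeros(V), np.zeros(V)
X = rng.integers(0, 5, size=(V, V)).astype(float)  # fake co-occurrence counts
J = glove_objective(W, W_ctx, b, b_ctx, X)
```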
  17. Computing analogies

  18. Triplet queries
     • b* is to b what a* is to a
     • queen is to king what female is to male
  19. Evaluation labels https://code.google.com/p/word2vec/source/browse/trunk/questions-words.txt

  20. Evaluation labels https://code.google.com/p/word2vec/source/browse/trunk/questions-words.txt

  21. English Wikipedia models source: GloVe model eval by Yoav Goldberg

  22. Dealing with multi-word expressions (phrases) http://code.google.com/p/word2vec/source/browse/trunk/questions-phrases.txt
     • Score interesting bi-grams from counts and apply a threshold
     • Do several passes to extract 3-gram and 4-gram phrases
     • Then treat the phrases as new “words” to embed along with the unigrams
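The count-and-threshold step can be sketched as follows, using the bigram score from the word2vec paper, score(wi, wj) = (count(wi wj) - delta) / (count(wi) * count(wj)); the delta and threshold values here are illustrative, and repeated passes would yield 3-grams and 4-grams:

```python
from collections import Counter

def find_phrases(sentences, delta=5, threshold=1e-4):
    """Merge adjacent word pairs scoring above `threshold` into a single
    "wi_wj" token; `delta` discounts very rare bigrams."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent):
                pair = (sent[i], sent[i + 1])
                score = (bigrams[pair] - delta) / (
                    unigrams[sent[i]] * unigrams[sent[i + 1]])
                if score > threshold:
                    out.append(sent[i] + "_" + sent[i + 1])
                    i += 2
                    continue
            out.append(sent[i])
            i += 1
        merged.append(out)
    return merged
```

Frequent pairs like ("new", "york") get merged into "new_york", which is then embedded like any other word.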
  23. Example results w/ word2vec on phrases

  24. Implementations

  25. Reference implementations in C with command-line interface
     Word2Vec: https://code.google.com/p/word2vec/
     GloVe: http://nlp.stanford.edu/projects/glove/
  26. word2vec command line:
     ./word2vec -train $CORPUS -size 300 -window 10 -hs 0 \
       -negative 15 -threads 20 -min-count 100 \
       -output word-vecs -dumpcv context-vecs
     source: GloVe model eval by Yoav Goldberg
  27. GloVe command line:
     ./vocab_count -min-count 100 -verbose 2 < $CORPUS > $VOCAB_FILE
     ./cooccur -memory 40 -vocab-file $VOCAB_FILE -verbose 2 \
       -window-size 10 < $CORPUS > $COOCCURRENCE_FILE
     ./shuffle -memory 40 -verbose 2 \
       < $COOCCURRENCE_FILE > $COOCCURRENCE_SHUF_FILE
     ./glove -save-file $SAVE_FILE -threads 8 \
       -input-file $COOCCURRENCE_SHUF_FILE \
       -x-max 100 -iter 15 -vector-size 300 -binary 2 \
       -vocab-file $VOCAB_FILE -verbose 2 -model 0
     source: GloVe model eval by Yoav Goldberg
  28. Loading GloVe vectors in NumPy http://www-nlp.stanford.edu/data/glove.6B.300d.txt.gz
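A minimal loader sketch for the plain-text dump linked above, assuming the usual GloVe format of one word per line followed by its space-separated coefficients:

```python
import numpy as np

def load_glove(path):
    """Parse a plain-text GloVe file into a word list and a float32 matrix."""
    words, rows = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            words.append(parts[0])
            rows.append(np.array(parts[1:], dtype=np.float32))
    return words, np.vstack(rows)
```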

  29. Implementations in Python
     • gensim has most of word2vec (and GloVe support is planned)
     • gensim also has: a Wikipedia corpus loader (markup cleaning), similarity queries and evaluation tools
     • glove-python (work in progress, very active)
     • Both use Cython and multi-threading
  30. Neural Machine Translation

  31. RNN for MT source: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
  32. RNN for MT: a language-independent vector representation of the meaning of any sentence!
  33. Neural MT vs Phrase-based SMT: BLEU scores of NMT & Phrase-SMT models on English / French (Oct. 2014)
  34. Thank you! http://speakerdeck.com/ogrisel

  35. References
     • Word embeddings (see references to the main papers on each project page)
       First gen: http://metaoptimize.com/projects/wordreprs/
       Word2Vec: https://code.google.com/p/word2vec/
       GloVe: http://nlp.stanford.edu/projects/glove/
       Word2Vec & GloVe both provide pre-trained embeddings on English datasets.
     • Relation to sparse and explicit representations:
       “Linguistic Regularities in Sparse and Explicit Word Representations” by Omer Levy and Yoav Goldberg
  36. References • Python implementations • http://radimrehurek.com/gensim/ • https://github.com/maciejkula/glove-python

  37. References • Neural Machine Translation
     Google Brain: http://arxiv.org/abs/1409.3215
     U. of Montreal: http://arxiv.org/abs/1406.1078
     https://github.com/lisa-groundhog/GroundHog
  38. Backup slides

  39. (image-only slide)
  40. 3CosAdd

  41. 3CosMul Puts more emphasis on small similarities.
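The two scoring functions named on these backup slides can be sketched as follows: the predicted b* is the candidate x that maximizes the score. The eps term and the multiplicative form follow Levy & Goldberg's 3CosMul, which assumes non-negative similarities; vector values in the test are illustrative:

```python
import numpy as np

def cos(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def score_3cosadd(x, a, a_star, b):
    # Additive combination: large and small similarity terms
    # contribute on the same scale.
    return cos(x, b) - cos(x, a) + cos(x, a_star)

def score_3cosmul(x, a, a_star, b, eps=0.001):
    # Multiplicative combination: the ratio amplifies small
    # similarities; eps avoids division by zero.
    return cos(x, b) * cos(x, a_star) / (cos(x, a) + eps)
```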

  42. Explicit Sparse Vector Representations
     • Extract contexts with offsets: “The cat sat on the mat.”
       c(sat) = {the_m2, cat_m1, on_p1, the_p2}
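The offset-context extraction above can be sketched directly (m = minus/left offset, p = plus/right offset; a window of 2 matches the slide's example):

```python
def offset_contexts(tokens, i, window=2):
    """Context features for tokens[i], tagged with their position offset."""
    feats = []
    for off in range(-window, window + 1):
        j = i + off
        if off == 0 or j < 0 or j >= len(tokens):
            continue
        tag = ("m" if off < 0 else "p") + str(abs(off))
        feats.append(tokens[j].lower() + "_" + tag)
    return feats

tokens = "The cat sat on the mat .".split()
print(offset_contexts(tokens, 2))  # c(sat) -> ['the_m2', 'cat_m1', 'on_p1', 'the_p2']
```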
  43. (image-only slide)