
An overview of word embeddings

Olivier Grisel
November 19, 2014



  1. Word Embeddings: an overview
    Inria Parietal Team Meeting, November 2014

  2. Outline
    • Neural Word Models
    • word2vec & GloVe
    • Computing word analogies with word vectors
    • Applications, implementations & pre-trained models
    • Extensions: RNNs for Machine Translation
  3. Neural Language Models
    • Each word is represented by a fixed-dimensional vector
    • Goal: predict a target word given a context of ~5-10 words from a random sentence in Wikipedia
    • Use NN-style training to optimize the vector coefficients, typically with a log-likelihood objective (bi-linear, deep or recurrent architectures)
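To make the training setup concrete, here is a minimal sketch (not from the slides) of how (context, target) pairs can be extracted from a tokenized sentence with a symmetric window. The helper name `context_target_pairs` and the window size of 2 are illustrative assumptions; the slide mentions contexts of ~5-10 words.

```python
def context_target_pairs(tokens, window=2):
    """Return (context_words, target_word) pairs using a symmetric window.

    Hypothetical helper: a neural language model is trained to predict
    target_word from context_words.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        pairs.append((left + right, target))
    return pairs


sentence = "the cat sat on the mat".split()
for context, target in context_target_pairs(sentence, window=2):
    print(target, "<-", context)
```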
  4. Trend in 2013 / 2014
    • Simple linear models (word2vec) benefit from larger training data (1B+ words) and higher dimensionality (typically 50-300)
    • Some models (GloVe) are closer to matrix factorization than to neural networks
    • They can successfully uncover semantic and syntactic word relationships from unlabeled corpora (Wikipedia, Google News, Common Crawl)
  5. Analogies
    • [king] - [male] + [female] ~= [queen]
    • [Berlin] - [Germany] + [France] ~= [Paris]
    • [eating] - [eat] + [fly] ~= [flying]
  6. source: http://nlp.stanford.edu/projects/glove/

  7. source: http://nlp.stanford.edu/projects/glove/

  8. source: Exploiting Similarities among Languages for Machine Translation

  9. word2vec: focus on skip-gram with negative sampling

  10. https://code.google.com/p/word2vec/

  12. Likelihood objective
    • The softmax normalization sums over all W words of the vocabulary, which is too costly in practice: an approximation is needed
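For reference, this is the full-softmax skip-gram probability as written in Mikolov et al. (2013), "Distributed Representations of Words and Phrases and their Compositionality", where v_w and v'_w are the input and output vectors of word w and W is the vocabulary size. The denominator runs over all W words, so computing it (and its gradient) for every training pair is what makes exact training expensive:

```latex
p(w_O \mid w_I) = \frac{\exp\left({v'_{w_O}}^{\top} v_{w_I}\right)}
                       {\sum_{w=1}^{W} \exp\left({v'_{w}}^{\top} v_{w_I}\right)}
```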
  13. Negative Sampling
    In practice, use k=15: 15 negative samples drawn for each positive example
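A minimal NumPy sketch of the skip-gram negative-sampling loss for one (input, output) word pair, assuming the noise distribution is the unigram distribution raised to the 3/4 power as in the word2vec paper. The toy vocabulary counts, the dimension and the random vectors are made up for illustration; this computes the objective directly rather than reproducing the reference C implementation's sampling tables and SGD updates.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_counts = np.array([50, 30, 10, 5, 3, 2], dtype=float)  # toy unigram counts (hypothetical)
dim, k = 8, 15  # embedding size and number of negatives per positive pair

# Noise distribution: unigram counts raised to the 3/4 power, renormalized.
noise = vocab_counts ** 0.75
noise /= noise.sum()

W_in = rng.normal(scale=0.1, size=(len(vocab_counts), dim))   # input ("word") vectors
W_out = rng.normal(scale=0.1, size=(len(vocab_counts), dim))  # output ("context") vectors


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)


def sgns_loss(center, context, negatives):
    """Negative of the SGNS objective for one positive pair and its k negatives."""
    v = W_in[center]
    pos = log_sigmoid(W_out[context] @ v)             # pull the true context closer
    neg = log_sigmoid(-(W_out[negatives] @ v)).sum()  # push noise words away
    return -(pos + neg)


negatives = rng.choice(len(vocab_counts), size=k, p=noise)
print(negatives.shape, sgns_loss(center=0, context=1, negatives=negatives))
```

Only k + 1 dot products are needed per training pair, instead of the W dot products of the full softmax.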
  14. GloVe

  16. GloVe Objective
    X_ij = number of times word j occurs in the context of word i (zero counts are ignored)
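In the GloVe paper, word vectors w_i, context vectors w~_j and biases are fitted by weighted least squares on log X_ij, summing only over non-zero counts, with the weighting f(x) = (x / x_max)^alpha capped at 1 (x_max = 100 and alpha = 3/4 in the paper; note the `-x-max 100` flag in the command line further on). The NumPy sketch below only evaluates that loss on a made-up co-occurrence matrix; it does not implement the AdaGrad training used by the reference code.

```python
import numpy as np


def glove_weight(x, x_max=100.0, alpha=0.75):
    # f(x) = (x / x_max)^alpha for x < x_max, and 1 otherwise.
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)


def glove_loss(X, W, W_ctx, b, b_ctx, x_max=100.0, alpha=0.75):
    """Weighted least-squares GloVe objective over the non-zero entries of X."""
    i, j = np.nonzero(X)  # zero counts are ignored
    x = X[i, j]
    pred = np.sum(W[i] * W_ctx[j], axis=1) + b[i] + b_ctx[j]
    return np.sum(glove_weight(x, x_max, alpha) * (pred - np.log(x)) ** 2)


rng = np.random.default_rng(0)
X = np.array([[0, 3, 120], [3, 0, 7], [120, 7, 0]], dtype=float)  # toy counts
V, dim = X.shape[0], 4
W, W_ctx = rng.normal(size=(V, dim)), rng.normal(size=(V, dim))
b, b_ctx = np.zeros(V), np.zeros(V)
print(glove_loss(X, W, W_ctx, b, b_ctx))
```

The cap at x_max keeps very frequent pairs from dominating the loss, while the power law down-weights rare, noisy counts.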
  17. Computing analogies

  18. Triplet queries
    • b* is to b what a* is to a
    • queen is to king what female is to male
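A triplet query can be answered by returning the word whose vector has the highest cosine similarity with b - a + a*, excluding the three query words (a common convention, assumed here). The 2-d vectors below are invented so the example is easy to check by hand; they are not trained embeddings.

```python
import numpy as np

# Hypothetical toy embeddings: dimension 0 ~ "royalty", dimension 1 ~ "gender".
vocab = ["king", "queen", "male", "female", "apple"]
E = np.array([
    [1.0, 1.0],   # king
    [1.0, -1.0],  # queen
    [0.0, 1.0],   # male
    [0.0, -1.0],  # female
    [-1.0, 0.2],  # apple
])
E = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-normalize rows
index = {w: i for i, w in enumerate(vocab)}


def analogy(a, a_star, b):
    """Return b* such that b* is to b what a_star is to a (additive cosine search)."""
    target = E[index[b]] - E[index[a]] + E[index[a_star]]
    target /= np.linalg.norm(target)
    scores = E @ target  # cosine similarities, since rows are unit vectors
    for w in (a, a_star, b):
        scores[index[w]] = -np.inf  # never return a query word
    return vocab[int(np.argmax(scores))]


print(analogy("male", "female", "king"))  # queen
```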
  19. Evaluation labels
    https://code.google.com/p/word2vec/source/browse/trunk/questions-words.txt

  20. Evaluation labels
    https://code.google.com/p/word2vec/source/browse/trunk/questions-words.txt
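The questions-words.txt file groups analogy questions under section lines starting with ":" (e.g. ": capital-common-countries"), each question being four words "a a* b b*" such as "Athens Greece Baghdad Iraq". Below is a hypothetical parser and accuracy loop: `answer` stands for any analogy solver (for instance an additive cosine search), lower-casing is an assumption that should match your vocabulary, and two sample lines in the file's format replace the real file.

```python
from collections import defaultdict


def parse_questions(lines):
    """Parse analogy questions into {section: [(a, a_star, b, b_star), ...]}."""
    sections = defaultdict(list)
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(":"):
            current = line[1:].strip()  # e.g. "capital-common-countries"
            continue
        # Lower-cased here (assumption: the embedding vocabulary is lower-case).
        a, a_star, b, b_star = line.lower().split()
        sections[current].append((a, a_star, b, b_star))
    return sections


def accuracy(questions, answer):
    """Fraction of questions where answer(a, a_star, b) returns b_star."""
    if not questions:
        return 0.0
    correct = sum(answer(a, a_star, b) == b_star for a, a_star, b, b_star in questions)
    return correct / len(questions)


sample = """: capital-common-countries
Athens Greece Baghdad Iraq
Athens Greece Bangkok Thailand
""".splitlines()

sections = parse_questions(sample)
# A dummy solver that always answers "iraq", just to exercise the loop.
print(accuracy(sections["capital-common-countries"], lambda a, a_star, b: "iraq"))  # 0.5
```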

  21. English Wikipedia models
    source: GloVe model eval by Yoav Goldberg

  22. Dealing with multi-word expressions (phrases)
    http://code.google.com/p/word2vec/source/browse/trunk/questions-phrases.txt
    • Score interesting bi-grams from counts and keep those above a threshold
    • Do several passes to extract 3-gram and 4-gram phrases
    • Then treat phrases as new “words” to embed along with the unigrams
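In the word2vec phrase paper (Mikolov et al., 2013), a bigram is scored as score(w_i, w_j) = (count(w_i w_j) - delta) / (count(w_i) * count(w_j)), where the discount delta prevents too many phrases made of very infrequent words; bigrams scoring above a threshold become single tokens. The pure-Python sketch below performs one such pass. The delta, the threshold and the toy corpus are arbitrary illustrative values, and `score_bigrams` / `merge_phrases` are hypothetical helper names.

```python
from collections import Counter


def score_bigrams(tokens, delta=1.0):
    """Score adjacent word pairs with (count(ab) - delta) / (count(a) * count(b))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        (a, b): (n_ab - delta) / (unigrams[a] * unigrams[b])
        for (a, b), n_ab in bigrams.items()
    }


def merge_phrases(tokens, scores, threshold):
    """Replace high-scoring bigrams by a single "a_b" token (one left-to-right pass)."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and scores.get((tokens[i], tokens[i + 1]), 0.0) > threshold:
            out.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


corpus = "new york is big . i love new york . the city is nice".split()
scores = score_bigrams(corpus, delta=1.0)
print(merge_phrases(corpus, scores, threshold=0.2))
```

Running the same pass again on the merged tokens lets bigram tokens such as "new_york" combine with neighbours, which is how 3-gram and 4-gram phrases emerge; the paper runs 2-4 passes with decreasing threshold values.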
  23. Example results w/ word2vec on phrases

  24. Implementations

  25. Reference implementations in C with command line interface
    • Word2Vec: https://code.google.com/p/word2vec/
    • GloVe: http://nlp.stanford.edu/projects/glove/
  26. word2vec command line
    ./word2vec -train $CORPUS -size 300 -window 10 -hs 0 \
        -negative 15 -threads 20 -min-count 100 \
        -output word-vecs -dumpcv context-vecs
    source: GloVe model eval by Yoav Goldberg
  27. GloVe command-line
    ./vocab_count -min-count 100 -verbose 2 < $CORPUS > $VOCAB_FILE
    ./cooccur -memory 40 -vocab-file $VOCAB_FILE -verbose 2 \
        -window-size 10 < $CORPUS > $COOCCURRENCE_FILE
    ./shuffle -memory 40 -verbose 2 \
        < $COOCCURRENCE_FILE > $COOCCURRENCE_SHUF_FILE
    ./glove -save-file $SAVE_FILE -threads 8 \
        -input-file $COOCCURRENCE_SHUF_FILE \
        -x-max 100 -iter 15 -vector-size 300 -binary 2 \
        -vocab-file $VOCAB_FILE -verbose 2 -model 0
    source: GloVe model eval by Yoav Goldberg
  28. Loading GloVe vectors in NumPy
    http://www-nlp.stanford.edu/data/glove.6B.300d.txt.gz
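The GloVe text files store one word per line followed by its vector components, separated by single spaces. The sketch below (a hypothetical `load_glove` helper, not the code shown on the slide) parses such a file, gzip-compressed or not, into a word list, a NumPy matrix and a word-to-row index. It loads the whole matrix into memory.

```python
import gzip

import numpy as np


def load_glove(path, dtype=np.float32):
    """Parse a GloVe text file into (words, vectors, word -> row index)."""
    opener = gzip.open if path.endswith(".gz") else open
    words, rows = [], []
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            words.append(parts[0])                          # first field: the word
            rows.append(np.asarray(parts[1:], dtype=dtype))  # remaining fields: vector
    vectors = np.vstack(rows)
    index = {w: i for i, w in enumerate(words)}
    return words, vectors, index


# Usage (assumes the file has been downloaded locally):
# words, vectors, index = load_glove("glove.6B.300d.txt.gz")
# vectors.shape is then (vocabulary size, 300)
```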

  29. Implementations in Python
    • gensim has most of word2vec (and GloVe is planned)
    • gensim also has:
      • Wikipedia corpus loader (markup cleaning)
      • similarity queries and evaluation tools
    • glove-python (work in progress, very active)
    • Both use Cython and multi-threading
  30. Neural Machine Translation

  31. RNN for MT
    source: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
  32. RNN for MT
    Language-independent vector representation of the meaning of any sentence!
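The fixed-size sentence representation can be illustrated with a plain tanh recurrent encoder that folds a sequence of word vectors of any length into one hidden state of constant size. This NumPy sketch uses random, untrained weights and a simple Elman-style update; the Montreal paper cited on the slide uses gated hidden units instead, so treat it only as an illustration of the fixed-size bottleneck, not of that model.

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim, hidden_dim = 5, 3

# Untrained (random) parameters, for illustration only.
W_xh = rng.normal(scale=0.5, size=(hidden_dim, emb_dim))
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)


def encode(word_vectors):
    """Apply h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h) and return the final state."""
    h = np.zeros(hidden_dim)
    for x in word_vectors:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    return h


short = rng.normal(size=(3, emb_dim))   # a 3-word "sentence" of word vectors
long_ = rng.normal(size=(12, emb_dim))  # a 12-word "sentence"
print(encode(short).shape, encode(long_).shape)  # (3,) (3,)
```

In an encoder-decoder model, a second recurrent network is trained to generate the target sentence from this final state.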
  33. Neural MT vs Phrase-based SMT
    BLEU scores of NMT & Phrase-SMT models on English / French (Oct. 2014)
  34. Thank you! http://speakerdeck.com/ogrisel

  35. References
    • Word embeddings (see references to main papers on each project page)
      First gen: http://metaoptimize.com/projects/wordreprs/
      Word2Vec: https://code.google.com/p/word2vec/
      GloVe: http://nlp.stanford.edu/projects/glove/
      Word2Vec & GloVe both provide pre-trained embeddings on English datasets.
    • Relation to sparse and explicit representations
      Linguistic Regularities in Sparse and Explicit Word Representations by Omer Levy and Yoav Goldberg
  36. References
    • Python implementations
      http://radimrehurek.com/gensim/
      https://github.com/maciejkula/glove-python

  37. References
    • Neural Machine Translation
      Google Brain: http://arxiv.org/abs/1409.3215
      U. of Montreal: http://arxiv.org/abs/1406.1078
      https://github.com/lisa-groundhog/GroundHog
  38. Backup slides

  40. 3CosAdd
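Following Levy and Goldberg (Linguistic Regularities in Sparse and Explicit Word Representations), with unit-normalized vectors the additive analogy objective over the vocabulary V reduces to a sum of cosine similarities, because the norm of b - a + a* does not depend on the candidate b*:

```latex
\arg\max_{b^* \in V} \cos\left(b^*,\, b - a + a^*\right)
  = \arg\max_{b^* \in V} \left( \cos(b^*, b) - \cos(b^*, a) + \cos(b^*, a^*) \right)
```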

  41. 3CosMul
    Puts more emphasis on small similarities.
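Levy and Goldberg's multiplicative variant scores a candidate b* as cos(b*, b) * cos(b*, a*) / (cos(b*, a) + eps), after shifting each cosine from [-1, 1] to [0, 1] via (x + 1) / 2, with eps = 0.001 in their paper. The NumPy sketch below computes both the additive and multiplicative scores for every vocabulary word so they can be compared; the vocabulary and vectors are invented toy data and `analogy_scores` is a hypothetical helper.

```python
import numpy as np


def analogy_scores(E, ia, ia_star, ib, eps=1e-3):
    """Return (3CosAdd, 3CosMul) scores of every row of E as the answer b*.

    E must have unit-norm rows; ia, ia_star, ib are the row indices of a, a*, b.
    """
    cos_a, cos_a_star, cos_b = E @ E[ia], E @ E[ia_star], E @ E[ib]
    add = cos_b - cos_a + cos_a_star
    # Shift cosines from [-1, 1] to [0, 1] so the ratio is well behaved.
    sa, sa_star, sb = (cos_a + 1) / 2, (cos_a_star + 1) / 2, (cos_b + 1) / 2
    mul = sb * sa_star / (sa + eps)
    for i in (ia, ia_star, ib):  # exclude the query words
        add[i] = mul[i] = -np.inf
    return add, mul


vocab = ["king", "queen", "male", "female", "apple"]
E = np.array([[1, 1], [1, -1], [0, 1], [0, -1], [-1, 0.2]], dtype=float)
E /= np.linalg.norm(E, axis=1, keepdims=True)
add, mul = analogy_scores(E, ia=2, ia_star=3, ib=0)  # male : female :: king : ?
print(vocab[int(np.argmax(add))], vocab[int(np.argmax(mul))])  # queen queen
```

In the multiplicative form a small cos(b*, a) in the denominator can boost a candidate strongly, which is why it reacts more to small similarities than the additive form.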

  42. Explicit Sparse Vector Representations
    • Extract contexts with offsets: “The cat sat on the mat.”
      c(sat) = {the_m2, cat_m1, on_p1, the_p2}
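The offset-tagged contexts in this example can be generated with a small helper like the one below (a hypothetical `offset_contexts` function). Lower-casing and dropping punctuation are assumptions made so that, with a window of 2, it reproduces c(sat) from the slide; "_m<k>" marks a word k positions to the left and "_p<k>" a word k positions to the right.

```python
import re


def offset_contexts(sentence, window=2):
    """Return the tokens and, for each token, its contexts tagged with relative offsets."""
    tokens = re.findall(r"\w+", sentence.lower())
    contexts = []
    for i in range(len(tokens)):
        ctx = []
        for k in range(window, 0, -1):   # left side, farthest first
            if i - k >= 0:
                ctx.append(f"{tokens[i - k]}_m{k}")
        for k in range(1, window + 1):   # right side, nearest first
            if i + k < len(tokens):
                ctx.append(f"{tokens[i + k]}_p{k}")
        contexts.append(ctx)
    return tokens, contexts


tokens, contexts = offset_contexts("The cat sat on the mat.")
print(contexts[tokens.index("sat")])  # ['the_m2', 'cat_m1', 'on_p1', 'the_p2']
```

Each tagged context then becomes one dimension of a sparse, explicit word vector.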