Slide 1

Word Embeddings: an overview
Inria Parietal Team Meeting, November 2014

Slide 2

Outline
• Neural Word Models
• word2vec & GloVe
• Computing word analogies with word vectors
• Applications, implementations & pre-trained models
• Extensions: RNNs for Machine Translation

Slide 3

Neural Language Models
• Each word is represented by a fixed-dimensional vector
• The goal is to predict a target word given a context of ~5-10 words from a random sentence in Wikipedia
• Use NN-style training to optimize the vector coefficients, typically with a log-likelihood objective (bi-linear, deep or recurrent architectures)

Slide 4

Trends in 2013 / 2014
• Simple linear models (word2vec) benefit from larger training data (1B+ words) and higher dimensions (typically 50-300)
• Some models (GloVe) are closer to matrix factorization than to neural networks
• They can successfully uncover semantic and syntactic word relationships from unlabeled corpora (Wikipedia, Google News, Common Crawl)

Slide 5

Analogies
• [king] - [male] + [female] ~= [queen]
• [Berlin] - [Germany] + [France] ~= [Paris]
• [eating] - [eat] + [fly] ~= [flying]

Slide 6

source: http://nlp.stanford.edu/projects/glove/

Slide 7

source: http://nlp.stanford.edu/projects/glove/

Slide 8

source: Exploiting Similarities among Languages for MT

Slide 9

word2vec: focus on skip-gram with negative sampling

Slide 10

https://code.google.com/p/word2vec/

Slide 11

No content

Slide 12

Likelihood objective: summing over the whole vocabulary W is too costly in practice, so an approximation is needed.
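For reference, a sketch of the likelihood in question, following the skip-gram formulation of Mikolov et al. (2013); the notation (context size c, input/output vectors v and v') is the paper's, not the slide's:

    \frac{1}{T} \sum_{t=1}^{T} \; \sum_{-c \le j \le c,\; j \neq 0} \log p(w_{t+j} \mid w_t),
    \qquad
    p(w_O \mid w_I) = \frac{\exp\!\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{W} \exp\!\left({v'_{w}}^{\top} v_{w_I}\right)}

The softmax normalization sums over all W words of the vocabulary, which is what makes the exact objective too costly.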

Slide 13

Negative Sampling: in practice use k = 15, i.e. 15 negative samples drawn for each positive example.
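For reference, the negative-sampling objective from the word2vec paper replaces log p(w_O | w_I) above with (P_n(w) is the noise distribution, a unigram distribution raised to the 3/4 power in the paper):

    \log \sigma\!\left({v'_{w_O}}^{\top} v_{w_I}\right)
    + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)} \left[ \log \sigma\!\left(-{v'_{w_i}}^{\top} v_{w_I}\right) \right]

Each positive (word, context) pair is thus contrasted against only k sampled negatives instead of the full vocabulary.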

Slide 14

GloVe

Slide 15

No content

Slide 16

GloVe objective: Xij = count of word j in the context of word i (zero counts are ignored).
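Written out for reference (the slide only shows the definition of Xij), the objective from the GloVe paper (Pennington et al., 2014) is:

    J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
    \qquad
    f(x) = \begin{cases} (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}

Since f(0) = 0, zero counts drop out of the sum, which is why they can be ignored. The paper uses alpha = 3/4 and x_max = 100.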

Slide 17

Computing analogies

Slide 18

Triplet queries
• b* is to b what a* is to a
• queen is to king what female is to male
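A minimal NumPy sketch of answering such a triplet query by vector arithmetic and cosine similarity; the `embeddings` dict (word -> unit-normalized vector) and the function name are hypothetical, not part of the word2vec or GloVe tooling shown later:

    import numpy as np

    def analogy(a, a_star, b, embeddings, topn=1):
        # Return the word(s) b* such that b* is to b what a* is to a.
        # `embeddings` maps words to 1-D unit-normalized NumPy arrays.
        target = embeddings[b] - embeddings[a] + embeddings[a_star]
        target /= np.linalg.norm(target)
        scores = {
            w: float(np.dot(v, target))    # cosine similarity (unit-norm vectors)
            for w, v in embeddings.items()
            if w not in (a, a_star, b)     # exclude the query words themselves
        }
        return sorted(scores, key=scores.get, reverse=True)[:topn]

    # e.g. analogy("male", "female", "king", embeddings) should rank "queen" first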

Slide 19

Evaluation labels: https://code.google.com/p/word2vec/source/browse/trunk/questions-words.txt

Slide 20

Evaluation labels: https://code.google.com/p/word2vec/source/browse/trunk/questions-words.txt

Slide 21

English Wikipedia models (source: GloVe model eval by Yoav Goldberg)

Slide 22

Dealing with multi-word expressions (phrases)
http://code.google.com/p/word2vec/source/browse/trunk/questions-phrases.txt
Do several passes to extract 3-gram and 4-gram phrases, then treat the phrases as new "words" to embed along with the unigrams.
Score interesting bi-grams from counts and threshold (see the formula below):
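The bi-gram score referred to here is, in the word2vec paper's formulation (delta is a discounting coefficient that filters out very rare co-occurrences):

    \text{score}(w_i, w_j) = \frac{\text{count}(w_i w_j) - \delta}{\text{count}(w_i) \times \text{count}(w_j)}

Bi-grams whose score exceeds a chosen threshold are merged into a single token before the next pass.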

Slide 23

Example results w/ word2vec on phrases

Slide 24

Implementations

Slide 25

Reference implementations in C with a command-line interface
word2vec: https://code.google.com/p/word2vec/
GloVe: http://nlp.stanford.edu/projects/glove/

Slide 26

word2vec command line

./word2vec -train $CORPUS -size 300 -window 10 -hs 0 \
    -negative 15 -threads 20 -min-count 100 \
    -output word-vecs -dumpcv context-vecs

source: GloVe model eval by Yoav Goldberg

Slide 27

GloVe command line

./vocab_count -min-count 100 -verbose 2 < $CORPUS > $VOCAB_FILE

./cooccur -memory 40 -vocab-file $VOCAB_FILE -verbose 2 \
    -window-size 10 < $CORPUS > $COOCCURRENCE_FILE

./shuffle -memory 40 -verbose 2 \
    < $COOCCURRENCE_FILE > $COOCCURRENCE_SHUF_FILE

./glove -save-file $SAVE_FILE -threads 8 \
    -input-file $COOCCURRENCE_SHUF_FILE \
    -x-max 100 -iter 15 -vector-size 300 -binary 2 \
    -vocab-file $VOCAB_FILE -verbose 2 -model 0

source: GloVe model eval by Yoav Goldberg

Slide 28

Loading GloVe vectors in NumPy
http://www-nlp.stanford.edu/data/glove.6B.300d.txt.gz
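The loading code itself was shown as an image; a minimal sketch of what it could look like, assuming the text format of the archive above (one word per line followed by its vector components, space-separated):

    import numpy as np

    def load_glove(path):
        # Parse a GloVe text file into a vocabulary list and a (n_words, dim) array.
        words, vectors = [], []
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split(" ")
                words.append(parts[0])
                vectors.append(np.asarray(parts[1:], dtype=np.float32))
        return words, np.vstack(vectors)

    # after decompressing the archive linked above:
    # vocab, embeddings = load_glove("glove.6B.300d.txt")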

Slide 29

Implementations in Python
• gensim has most of word2vec (and GloVe is planned) — see the usage sketch below
• gensim also has:
  • Wikipedia corpus loader (markup cleaning)
  • similarity queries and evaluation tools
• glove-python (work in progress, very active)
• Both use Cython and multi-threading
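A minimal gensim usage sketch mirroring the word2vec command line above; parameter names are those of gensim around the time of the talk (check the current docs), and `sentences` is a placeholder for an iterable of tokenized sentences, e.g. produced by gensim's Wikipedia corpus loader:

    from gensim.models import Word2Vec

    # train skip-gram with negative sampling, roughly matching the C command line
    model = Word2Vec(sentences, size=300, window=10,
                     hs=0, negative=15, min_count=100, workers=4)

    # triplet query with the trained vectors
    print(model.most_similar(positive=["king", "female"], negative=["male"], topn=1))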

Slide 30

Neural Machine Translation

Slide 31

RNN for MT
source: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

Slide 32

RNN for MT: a language-independent vector representation of the meaning of any sentence!

Slide 33

Neural MT vs. Phrase-based SMT
BLEU scores of NMT & phrase-based SMT models on English / French (Oct. 2014)

Slide 34

Thank you! http://speakerdeck.com/ogrisel

Slide 35

References
• Word embeddings (see references to the main papers on each project page)
  First gen: http://metaoptimize.com/projects/wordreprs/
  word2vec: https://code.google.com/p/word2vec/
  GloVe: http://nlp.stanford.edu/projects/glove/
  word2vec & GloVe both provide pre-trained embeddings on English datasets.
• Relation to sparse and explicit representations
  Linguistic Regularities in Sparse and Explicit Word Representations, Omer Levy and Yoav Goldberg

Slide 36

References
• Python implementations
  http://radimrehurek.com/gensim/
  https://github.com/maciejkula/glove-python

Slide 37

References
• Neural Machine Translation
  Google Brain: http://arxiv.org/abs/1409.3215
  U. of Montreal: http://arxiv.org/abs/1406.1078
  https://github.com/lisa-groundhog/GroundHog

Slide 38

Backup slides

Slide 39

No content

Slide 40

3CosAdd
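With the a : a* :: b : b* notation of the triplet-query slide, and assuming unit-normalized vectors, 3CosAdd (Levy & Goldberg, 2014) selects:

    b^{*} = \underset{x \in V}{\arg\max} \; \cos(x,\; b - a + a^{*})
          = \underset{x \in V}{\arg\max} \; \left[ \cos(x, b) - \cos(x, a) + \cos(x, a^{*}) \right]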

Slide 41

3CosMul: puts more emphasis on small similarities.
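Again with the triplet-query notation, 3CosMul replaces the sum of similarities with a product/quotient (epsilon is a small constant that avoids division by zero; the paper shifts cosines to be non-negative):

    b^{*} = \underset{x \in V}{\arg\max} \; \frac{\cos(x, b) \cdot \cos(x, a^{*})}{\cos(x, a) + \varepsilon}

Because the terms combine multiplicatively, differences between small similarity values have a larger effect than in 3CosAdd, hence the emphasis on small similarities mentioned above.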

Slide 42

Explicit Sparse Vector Representations
• Extract contexts with positional offsets, e.g. for "The cat sat on the mat.":
  c(sat) = {the_m2, cat_m1, on_p1, the_p2}
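An illustrative Python sketch (not code from the referenced paper) of extracting such offset-tagged contexts:

    def positional_contexts(tokens, window=2):
        # For each token, list its context words tagged with relative offsets
        # ('m' = minus / left, 'p' = plus / right), as in the example above.
        result = []
        for i, word in enumerate(tokens):
            feats = []
            for offset in range(-window, window + 1):
                j = i + offset
                if offset == 0 or not 0 <= j < len(tokens):
                    continue
                sign = "m" if offset < 0 else "p"
                feats.append(f"{tokens[j].lower()}_{sign}{abs(offset)}")
            result.append((word, feats))
        return result

    # positional_contexts("The cat sat on the mat .".split())[2]
    # -> ('sat', ['the_m2', 'cat_m1', 'on_p1', 'the_p2'])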

Slide 43

No content