An overview of word embeddings

Olivier Grisel
November 19, 2014

Transcript

  1. Outline
     • Neural Word Models
     • word2vec & GloVe
     • Computing word analogies with word vectors
     • Applications, implementations & pre-trained models
     • Extensions: RNNs for Machine Translation
  2. Neural Language Models
     • Each word is represented by a fixed-dimensional vector
     • Goal is to predict the target word given a context of ~5-10 words from a random sentence in Wikipedia
     • Use NN-style training to optimize the vector coefficients, typically with a log-likelihood objective (bi-linear, deep or recurrent architectures)
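The prediction set-up can be sketched in a few lines of numpy. This is a toy illustration only, not the training code of any of the cited models: the vocabulary, dimensions and random initialisation below are made up, and real systems learn both weight matrices by gradient descent on the log-likelihood.

    import numpy as np

    # Toy vocabulary and embedding matrix: one fixed-dimensional vector per word.
    vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
    dim = 50
    rng = np.random.default_rng(0)
    embeddings = rng.normal(scale=0.1, size=(len(vocab), dim))      # input word vectors
    output_weights = rng.normal(scale=0.1, size=(len(vocab), dim))  # output (context) vectors

    def predict_target(context_words):
        """Score every vocabulary word as the target given a window of context words."""
        # Average the context vectors (a simple bi-linear, CBOW-style model).
        h = embeddings[[vocab[w] for w in context_words]].mean(axis=0)
        scores = output_weights @ h
        # Softmax gives the probabilities that the log-likelihood objective maximises.
        exp = np.exp(scores - scores.max())
        return exp / exp.sum()

    print(predict_target(["the", "cat", "on", "the"]))  # probabilities for the middle word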
  3. Trend in 2013 / 2014
     • Simple linear models (word2vec) benefit from larger training data (1B+ words) and dimensions (typically 50-300)
     • Some models (GloVe) are closer to matrix factorization than to neural networks
     • Can successfully uncover semantic and syntactic word relationships from unlabeled corpora (Wikipedia, Google News, Common Crawl)
  4. Analogies
     • [king] - [male] + [female] ~= [queen]
     • [Berlin] - [Germany] + [France] ~= [Paris]
     • [eating] - [eat] + [fly] ~= [flying]
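Assuming a set of trained word vectors is already available (the `vectors` dict below is hypothetical), these analogies reduce to vector arithmetic followed by a nearest-neighbour search under cosine similarity. A minimal sketch:

    import numpy as np

    def most_similar(query, vectors, exclude=()):
        """Return the word whose vector has the highest cosine similarity to `query`."""
        query = query / np.linalg.norm(query)
        best_word, best_sim = None, -1.0
        for word, vec in vectors.items():
            if word in exclude:
                continue
            sim = (vec @ query) / np.linalg.norm(vec)
            if sim > best_sim:
                best_word, best_sim = word, sim
        return best_word

    # [king] - [male] + [female] ~= [queen], with the query words excluded from the answer:
    # most_similar(vectors["king"] - vectors["male"] + vectors["female"],
    #              vectors, exclude={"king", "male", "female"})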
  5. Triplet queries
     • b* is to b what a* is to a
     • queen is to king what female is to male
  6. Dealing with multi-word expressions (phrases)
     http://code.google.com/p/word2vec/source/browse/trunk/questions-phrases.txt
     Do several passes to extract 3-gram and 4-gram phrases, then treat the phrases as new "words" to embed along with the unigrams. Score candidate bi-grams from their counts and apply a threshold.
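A rough sketch of such count-based phrase scoring, loosely following the discounted ratio described in the word2vec paper (score(a, b) = (count(ab) - delta) / (count(a) * count(b))). The delta and threshold values below are arbitrary; real implementations differ in constants and normalisation.

    from collections import Counter

    def score_bigrams(tokens, delta=5, threshold=1e-4):
        """Keep adjacent word pairs whose count-based score exceeds a threshold."""
        unigrams = Counter(tokens)
        bigrams = Counter(zip(tokens, tokens[1:]))
        phrases = {}
        for (a, b), n_ab in bigrams.items():
            # Discounting by delta filters out very rare pairs.
            score = (n_ab - delta) / (unigrams[a] * unigrams[b])
            if score > threshold:
                phrases[(a, b)] = score
        return phrases

    # Re-running the pass with accepted phrases joined into single tokens
    # ("new_york") yields 3-gram and 4-gram phrases such as "new_york_times".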
  7. word2vec command line
     ./word2vec -train $CORPUS -size 300 -window 10 -hs 0 \
       -negative 15 -threads 20 -min-count 100 \
       -output word-vecs -dumpcv context-vecs
     source: GloVe model eval by Yoav Goldberg
  8. GloVe command line
     ./vocab_count -min-count 100 -verbose 2 < $CORPUS > $VOCAB_FILE
     ./cooccur -memory 40 -vocab-file $VOCAB_FILE -verbose 2 \
       -window-size 10 < $CORPUS > $COOCCURRENCE_FILE
     ./shuffle -memory 40 -verbose 2 \
       < $COOCCURRENCE_FILE > $COOCCURRENCE_SHUF_FILE
     ./glove -save-file $SAVE_FILE -threads 8 \
       -input-file $COOCCURRENCE_SHUF_FILE \
       -x-max 100 -iter 15 -vector-size 300 -binary 2 \
       -vocab-file $VOCAB_FILE -verbose 2 -model 0
     source: GloVe model eval by Yoav Goldberg
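Once the pipeline has run, the resulting vectors can be loaded into Python. A minimal sketch, assuming the text output written alongside the binary file (one word followed by its coefficients per line; the actual file name depends on $SAVE_FILE):

    import numpy as np

    def load_glove_text(path):
        """Load GloVe vectors from text output: one word followed by its coefficients per line."""
        vectors = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split(" ")
                vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
        return vectors

    # vectors = load_glove_text("vectors.txt")  # file written by the ./glove step above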
  9. Implementations in Python
     • gensim has most of word2vec (and GloVe is planned)
     • gensim also has:
       • a Wikipedia corpus loader (markup cleaning)
       • similarity queries and evaluation tools
     • glove-python (work in progress, very active)
     • Both use Cython and multi-threading
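As a rough sketch of the gensim word2vec API (parameter names vary between gensim versions, e.g. size vs vector_size; the one-sentence corpus below is only a placeholder for a real iterable of tokenised sentences):

    from gensim.models import Word2Vec

    # Placeholder corpus; in practice this would be an iterable of tokenised sentences,
    # e.g. streamed from the gensim Wikipedia corpus loader.
    sentences = [["the", "cat", "sat", "on", "the", "mat"]]

    # Hyper-parameters roughly mirror the word2vec command line above.
    model = Word2Vec(sentences, vector_size=300, window=10,
                     min_count=1, negative=15, workers=4)

    # Triplet query: queen is to king what female is to male
    # model.wv.most_similar(positive=["king", "female"], negative=["male"])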
  10. RNN for MT
      source: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
  11. Neural MT vs Phrase-based SMT
      BLEU scores of NMT & Phrase-SMT models on English / French (Oct. 2014)
  12. References
      • Word embeddings (see references to the main papers on each project page)
        First gen: http://metaoptimize.com/projects/wordreprs/
        Word2Vec: https://code.google.com/p/word2vec/
        GloVe: http://nlp.stanford.edu/projects/glove/
        Word2Vec & GloVe both provide pre-trained embeddings on English datasets.
      • Relation to sparse and explicit representations
        Linguistic Regularities in Sparse and Explicit Word Representations by Omer Levy and Yoav Goldberg
  13. References
      • Neural Machine Translation
        Google Brain: http://arxiv.org/abs/1409.3215
        U. of Montreal: http://arxiv.org/abs/1406.1078
        https://github.com/lisa-groundhog/GroundHog
  14. Explicit Sparse Vector Representations
      • Extract contexts with offsets:
        "The cat sat on the mat."
        c(sat) = {the_m2, cat_m1, on_p1, the_p2}
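A minimal sketch of how such positional context features can be extracted, assuming the _m / _p suffixes denote minus / plus offsets as in the slide's example (window size and tokenisation are arbitrary choices here):

    from collections import defaultdict

    def extract_contexts(tokens, window=2):
        """Collect positional context features per word, e.g. c(sat) = {the_m2, cat_m1, on_p1, the_p2}."""
        contexts = defaultdict(set)
        for i, word in enumerate(tokens):
            for offset in range(-window, window + 1):
                j = i + offset
                if offset == 0 or not (0 <= j < len(tokens)):
                    continue  # skip the word itself and positions outside the sentence
                suffix = ("m" if offset < 0 else "p") + str(abs(offset))
                contexts[word].add(f"{tokens[j]}_{suffix}")
        return contexts

    print(extract_contexts("the cat sat on the mat".split())["sat"])
    # -> {'the_m2', 'cat_m1', 'on_p1', 'the_p2'}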