
word2vec


Javier Honduvilla Coto

November 24, 2016



Transcript

  1. What’s word2vec?
     • Vector representation of words
     • Uses neural networks (more on the training later)
     • Unsupervised
     • Published in 2013 by Google researchers and engineers
     • A companion C implementation was published with the paper
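The idea of a "vector representation of words" can be illustrated with a minimal sketch. The vectors below are made-up toy values (real word2vec embeddings are learned, typically with 100–300 dimensions); the point is only that dense vectors let us compare words numerically, e.g. with cosine similarity:

```python
import math

# Toy 4-dimensional vectors for a few words. These values are invented for
# illustration; word2vec would learn them from a corpus.
vectors = {
    "king":  [0.8, 0.6, 0.1, 0.9],
    "queen": [0.7, 0.7, 0.1, 0.8],
    "apple": [0.1, 0.0, 0.9, 0.2],
}

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine(vectors["king"], vectors["queen"]))  # high: similar words
print(cosine(vectors["king"], vectors["apple"]))  # much lower
```

With learned vectors, semantically related words end up with high cosine similarity, which is what makes the representation useful downstream.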
  2. Why?
     Images and video already have rich representations: they are typically
     encoded as huge, high-dimensional vectors. Words, by contrast, are
     usually mapped to arbitrary IDs, or simply kept as the word itself.
  3. Previous work
     • Count-based methods: estimate the probability of a word co-occurring
       with its neighbouring words
     • Predictive models: guess a word from the vectors of nearby words
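A count-based method starts from a co-occurrence table. The sketch below (my own minimal illustration, not from the deck) counts how often word pairs appear within a small window of each other; co-occurrence probabilities can then be estimated from these counts:

```python
from collections import Counter

def cooccurrence_counts(tokens, window=2):
    """Count how often each (word, neighbour) pair appears within `window`
    positions of each other in the token sequence."""
    counts = Counter()
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(word, tokens[j])] += 1
    return counts

tokens = "the cat sat on the mat".split()
counts = cooccurrence_counts(tokens, window=1)
# P(neighbour | word) can then be estimated by normalising each word's row.
```

Methods like this capture which words tend to appear together, but the vectors they produce are as large as the vocabulary unless a dimensionality-reduction step is added.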
  4. Cool things about this model
     • Continuous Bag of Words (CBOW): predict a word from its surrounding
       context words (works well with small datasets)
     • Skip-Gram: predict the nearby context words from an input word
       (works well with big datasets)
     • Pretty good performance (100 billion words/day on a single box)
     • Trained on 33 billion words: 72% accuracy
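The difference between the two training modes comes down to how (input, prediction) examples are generated from a sentence. A minimal sketch of both, assuming a simple symmetric context window:

```python
def skipgram_pairs(tokens, window=2):
    """Skip-Gram: each (input word, one nearby context word) is one example."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """CBOW: all context words around a position jointly predict the centre word."""
    examples = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, target))
    return examples

tokens = "the quick brown fox jumps".split()
print(skipgram_pairs(tokens, window=1))
print(cbow_examples(tokens, window=1))
```

Skip-Gram produces many small examples per position, which helps with rare words on large corpora; CBOW averages the context into one prediction per position, which smooths noise on smaller corpora.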