Upgrade to Pro — share decks privately, control downloads, hide ads and more …

word2vec

Sponsored · SiteGround - Reliable hosting with speed, security, and support you can count on.

 word2vec

Avatar for Javier Honduvilla Coto

Javier Honduvilla Coto

November 24, 2016
Tweet

More Decks by Javier Honduvilla Coto

Other Decks in Programming

Transcript

  1. What’s word2vec? • Vector representation of words • Uses neural

    networks (more on the training later) • Unsupervised • Published in 2013 by Google researchers and engineers • A companion C implementation was published with the paper
  2. Why? Image and video representation is pretty rich, usually done

    with humongous vectors – commonly having a high dimensionality. Meanwhile, words are usually mapped to arbitrary IDs such as the word itself.
  3. Previous work • Counting based methods: probability of a word

    happening with some neighbour words • Predictive models: guess using nearby words’ vectors
  4. Cool things of this model • Continuous Bag of Words:

    predict a word using previous words (good in small models) • Skip-Gram: predict words which are close, from the context from an input word (good for big models) => • Pretty good performance (100 billions words/day in a single box) • 33 billions: 72% accuracy