networks (more on the training later)
• Unsupervised
• Published in 2013 by Google researchers and engineers
• A companion C implementation was published with the paper
predict a word using the previous (surrounding) words (good in small models)
• Skip-Gram: predict the words which are close (the context) from an input word (good for big models); see the sketch below this list
• Pretty good performance (about 100 billion words/day on a single machine)
• Trained on 33 billion words: 72% accuracy
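To make the CBOW vs. Skip-Gram contrast concrete, here is a minimal sketch, not taken from the paper or its companion C tool, using the Python gensim library; the toy sentences and all parameter values are made up for illustration, and the `sg` flag is what switches between the two architectures.

```python
# Minimal word2vec sketch with gensim (illustrative only; tiny toy corpus).
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "chased", "the", "cat"],
]

# CBOW (sg=0): predict the current word from its surrounding context words.
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

# Skip-Gram (sg=1): predict the surrounding context words from an input word.
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

# Query the learned vectors, e.g. nearest neighbours of "cat".
print(skipgram.wv.most_similar("cat", topn=3))
```

On a corpus this small the vectors are meaningless; the point is only that the same training call, with `sg` flipped, implements either of the two architectures described above.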