predict a word from its surrounding context words, both previous and following (good in small models)
• Skip-Gram: predict the nearby context words from a single input word (good for big models)
=> Results:
• Pretty good performance (100 billion words/day on a single machine)
• 33 billion words of training data: 72% accuracy (analogy task)
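The two architectures differ mainly in how training examples are built from the same text. Below is a minimal sketch of that difference, assuming a fixed symmetric window. The helper names `cbow_pairs` and `skipgram_pairs` are hypothetical and are not part of the original word2vec tool, which also uses refinements omitted here, such as randomly shrinking the window, subsampling frequent words, and hierarchical softmax or negative sampling for the output layer.

```python
from typing import List, Tuple


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW (hypothetical helper): build (context words, center word) examples.

    The model sees all words within `window` positions of the center
    (before and after it) and is trained to predict the center word.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:
            pairs.append((context, center))
    return pairs


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-Gram (hypothetical helper): build (input word, context word) examples.

    Each center word is used separately to predict every word within
    `window` positions of it, so one position yields several examples.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


sentence = "the quick brown fox jumps".split()
print(cbow_pairs(sentence, window=1))
print(skipgram_pairs(sentence, window=1))
```

With a window of 1, CBOW produces one example per position (e.g. `(['the', 'brown'], 'quick')`), while Skip-Gram produces one example per (word, neighbor) pair (e.g. `('quick', 'the')` and `('quick', 'brown')`). Skip-Gram therefore performs more updates per word of text, which is consistent with it being slower to train.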