Slide 4
Trends in 2013/2014
• Simple linear models (word2vec) benefit from
larger training data (1B+ words) and higher
dimensionality (typically 50-300 dimensions)
• Some models (GloVe) are closer to matrix
factorization than to neural networks (see the
objective sketched below)
• These models can successfully uncover semantic
and syntactic word relationships from unlabeled
corpora (Wikipedia, Google News, Common Crawl);
see the code sketch after this list.
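Why GloVe is "closer to matrix factorization": its objective fits a word vector w_i and a context vector \tilde{w}_j so that their dot product (plus bias terms) approximates the log co-occurrence count X_{ij}, i.e. a weighted least-squares factorization of the log co-occurrence matrix (Pennington et al., 2014):

J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2

where V is the vocabulary size and f is a weighting function that down-weights rare and very frequent co-occurrences.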
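A minimal sketch of probing these word relationships in Python with gensim's pretrained Google News vectors (a rough illustration, assuming gensim and its downloader are installed; the specific model name and analogy examples are our choices, not from the slide):

import gensim.downloader as api

# Load pretrained 300-dimensional word2vec vectors trained on the
# Google News corpus (~100B words; the download is roughly 1.6 GB).
model = api.load("word2vec-google-news-300")

# Semantic relationship: king - man + woman ~ queen
print(model.most_similar(positive=["king", "woman"],
                         negative=["man"], topn=3))

# Syntactic relationship: walking - walk + swim ~ swimming
print(model.most_similar(positive=["walking", "swim"],
                         negative=["walk"], topn=3))

The most_similar call performs exactly the vector arithmetic the analogy suggests: it combines the unit-normalized positive vectors, subtracts the negative ones, and returns the nearest neighbors by cosine similarity.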