• Word representations produced by models like Word2Vec or fastText were long the most powerful ones
• Under Word2Vec, each word has a fixed representation regardless of its context
• BERT produces word representations that are dynamically informed by the surrounding words
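The contrast above can be illustrated with a toy sketch (these are stand-ins, not real Word2Vec or BERT models): a static lookup table assigns "bank" the same vector in every sentence, while a context-aware encoder mixes in neighbor vectors, so the same word gets different vectors in different contexts.

```python
import numpy as np

# Toy static embeddings (a stand-in for Word2Vec): one fixed vector per word.
rng = np.random.default_rng(0)
vocab = ["the", "river", "bank", "money", "at"]
static = {w: rng.normal(size=4) for w in vocab}

def static_embed(tokens):
    # Word2Vec-style: the vector for a word ignores its neighbors entirely.
    return [static[t] for t in tokens]

def contextual_embed(tokens):
    # BERT-like in spirit only: each output vector averages the word's own
    # vector with its immediate neighbors', so context changes the result.
    vecs = [static[t] for t in tokens]
    out = []
    for i in range(len(vecs)):
        window = vecs[max(0, i - 1): i + 2]
        out.append(np.mean(window, axis=0))
    return out

s1 = "the river bank".split()       # "bank" at index 2
s2 = "money at the bank".split()    # "bank" at index 3

# Static: "bank" is identical in both sentences.
same = np.allclose(static_embed(s1)[2], static_embed(s2)[3])
# Contextual: "bank" differs because its neighbors differ.
diff = not np.allclose(contextual_embed(s1)[2], contextual_embed(s2)[3])
print(same, diff)  # True True
```

Real BERT achieves this context sensitivity with self-attention over the whole sentence rather than a fixed averaging window, but the observable effect is the same: one word, many vectors.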
References
• Alammar, Jay (2018). The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning) [Blog post].
• Alammar, Jay (2019). A Visual Guide to Using BERT for the First Time [Blog post].
• McCormick, Chris (2019). BERT Word Embeddings Tutorial [Blog post].