that result from models like Word2Vec or Fasttext were the most powerful ones • Each word has a fixed representation under Word2Vec regardless of the context • BERT produces word representations that are dynamically informed by the words around [source for the picture]