• Word embedding: represents the meaning of a single word with a low-dimensional real-valued vector
  • Static word embeddings (e.g., word2vec [4])
  • Dynamic word embeddings (e.g., BERT [5])
• Word2vec (see the sketch after the references below)
  • Based on the assumption that words used in the same context have similar meanings
  • Supports addition and subtraction of word vectors
  • Supports similarity calculation between words
  • Removes the effects of dynamic context (a single vector per word, regardless of context)

[4] T. Mikolov, K. Chen, G. S. Corrado, and J. Dean, "Efficient Estimation of Word Representations in Vector Space," in Proceedings of the Workshop at ICLR, 2013.
[5] J. Devlin, M. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, MN, USA, pp. 4171-4186, 2019.
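The word2vec operations listed above (vector addition/subtraction and similarity) can be illustrated with a minimal sketch using the gensim library; the pretrained model file name and the example words are assumptions for illustration, not part of the slide.

# Minimal sketch of the word2vec operations listed above, using gensim.
# The model file name is an assumption; any word2vec-format vectors work the same way.
from gensim.models import KeyedVectors

# Load pretrained static word vectors (assumed file name).
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# Similarity: cosine similarity between two word vectors.
print(vectors.similarity("cat", "dog"))

# Addition and subtraction: the classic analogy king - man + woman ≈ queen.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))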