Introduction to Word Embeddings

France is to Paris as Czechia is to _______. If I asked you to fill in the blank, you would answer "Prague" right away, even without a clue such as "the answer is a capital city". Our existing knowledge lets us recognise that France and Paris have a country-capital relationship and that Czechia's capital city is Prague, so that must be the answer.

However, computers don't know that Prague belongs to the same "category" as Paris and other capital cities unless we tell them so. If we want to get computers to understand human language as well as we do, there are way too many things that we need to teach computers explicitly. Is there a better way?

With word embeddings, we represent words as series of numbers. This opens up a whole new world for computers, because now they can understand the context of a word and infer relationships between words using numbers and maths, the language they are proficient in. We'll delve into what word embeddings actually are and why we need them, popular word embedding models, the problems you can solve with word embeddings, and how you can use word embeddings with Python.
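As a toy illustration of how maths can capture such a relationship (the 2-D vectors below are invented purely for this example; real embeddings have tens to hundreds of dimensions), the country-capital analogy becomes simple vector arithmetic:

```python
# Toy 2-D "embeddings", invented for illustration only.
# Dimension 0 loosely encodes "which country", dimension 1 "is a capital city".
embeddings = {
    "France":  [1.0, 0.0],
    "Paris":   [1.0, 1.0],
    "Czechia": [2.0, 0.0],
    "Prague":  [2.0, 1.0],
}

def analogy(a, b, c):
    """Return the word whose vector is closest to b - a + c (a:b :: c:?)."""
    target = [bb - aa + cc for aa, bb, cc in
              zip(embeddings[a], embeddings[b], embeddings[c])]
    def dist(word):
        return sum((x - y) ** 2 for x, y in zip(embeddings[word], target))
    # Exclude the query words so the answer is a genuinely new word.
    return min((w for w in embeddings if w not in (a, b, c)), key=dist)

print(analogy("France", "Paris", "Czechia"))  # -> Prague
```

Here Paris - France + Czechia lands exactly on Prague's vector; with real embeddings the result only lands near it, so we return the nearest word.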

Jupyter Notebook is available here: https://github.com/galuhsahid/intro-to-word-embeddings

Galuh Sahid

June 14, 2019

Transcript

  1. Introduction to Word Embeddings Galuh Sahid @galuhsahid | github.com/galuhsahid

  2. https://unsplash.com/photos/q0AtbGIOb5k @galuhsahid

  3. @galuhsahid

  4. How do we represent words? @galuhsahid

  5. Language is hard @galuhsahid

  6. Lake Forest Ocean River @galuhsahid

  7. Lake Forest Ocean River @galuhsahid

  8. Lake Forest Ocean River Body of water Not body of

    water Body of water Body of water @galuhsahid
  9. Lake Forest Ocean River Doesn’t have trees Has trees Doesn’t

    have trees Doesn’t have trees @galuhsahid
  10. WordNet https://www.nltk.org/images/wordnet-hierarchy.png @galuhsahid

  11. What kind of representation do we want? • Real numbers

    • What do we want to know about a word? Whether words have the same meaning, semantic relationships, etc. • Can we do it without labelling everything manually? • Ideally it’s not too large! @galuhsahid
  12. Word Embeddings to the Rescue Represent words as vectors of

    real numbers with far fewer, and thus denser, dimensions @galuhsahid We’re putting words that are outside any vector space into a vector space - hence, we’re embedding the words into that vector space
  13. Lake [ 0.89254 , 2.3112 , -0.70036 , 0.76679 ,

    -1.0815 , 0.40426 , -1.3462 , 0.71 , 0.90067 , -1.043 , -0.57966 , 0.18669 , 1.0996 , -0.90042 , -0.045962, 0.31492 , 1.4128 , 0.84963 , -1.3389 , -0.32252 , -0.10208 , -0.31783 , 0.33173 , 0.096593, 0.36732 , -1.1466 , 0.3123 , 1.549 , -0.13059 , -0.62003 , 1.774 , -0.62134 , 0.065215, -0.39758 , 0.095832, -0.56289 , -0.39552 , -0.16224 , 1.0035 , 0.39161 , -0.54489 , 0.21744 , 0.10831 , -0.06952 , -1.046 , -0.36096 , -0.48233 , -0.90467 , -0.044913, -0.52132 ] (Spoiler alert) @galuhsahid
  14. Visualization Lake Ocean River Forest Pizza @galuhsahid

  15. Visualization Lake Ocean River Forest Pizza Maybe this dimension represents

    the concept of whether it is a food or not… @galuhsahid
  16. Visualization Lake Ocean River Forest Pizza Maybe this dimension represents

    the concept of whether it is a food or not… Or it could be something not intuitive to us - we actually have no idea, though. It could be anything @galuhsahid
  17. One-Hot Encoding Our vocabulary: lake, forest, ocean, river, pizza (size: |V| = 5)

    Lake 1 0 0 0 0 • Forest 0 1 0 0 0 • Ocean 0 0 1 0 0 • River 0 0 0 1 0 • Pizza 0 0 0 0 1 @galuhsahid
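A minimal sketch of one-hot encoding for this five-word vocabulary (pure Python, no NLP library assumed):

```python
vocabulary = ["lake", "forest", "ocean", "river", "pizza"]
word_to_index = {word: i for i, word in enumerate(vocabulary)}

def one_hot(word):
    """Return a |V|-dimensional vector with a single 1 at the word's index."""
    vector = [0] * len(vocabulary)
    vector[word_to_index[word]] = 1
    return vector

print(one_hot("ocean"))  # -> [0, 0, 1, 0, 0]
```

Every one-hot vector is orthogonal to every other, so this representation tells us nothing about similarity, and the vector length grows with the vocabulary size.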
  18. One-Hot Encoding Our vocabulary: every English word (approx. 171,476 words

    in use) Size: |V| = ~171,476 Aardvark 1 0 0 … 0 … 0 0 • Lake 0 0 0 … 1 … 0 0 • Zyzzogeton 0 0 0 … 0 … 0 1 https://en.oxforddictionaries.com/explore/how-many-words-are-there-in-the-english-language/ @galuhsahid
  19. Distributional Representation “Tell me who your friends are, and I’ll

    tell you who you are.” @galuhsahid
  20. Distributional Representation “Tell me who your friends are, and I’ll

    tell you who you are.” “You shall know a word by the company it keeps.” @galuhsahid (Firth, 1957)
  21. Distributional Representation “Tell me who your friends are, and I’ll

    tell you who you are.” “You shall know a word by the company it keeps.” Distributional Hypothesis (Harris, 1954) Words that occur in similar contexts have similar meaning @galuhsahid (Firth, 1957)
  22. Distributional Representation A lake is a large body of water

    in a body of land. An ocean is a large area of water between continents. A river is a stream of water that flows through a channel in the surface of the ground. A forest is a piece of land with many trees. Pizza is a type of food that was created in Italy. @galuhsahid
  23. Distributional Representation A lake is a large body of water

    in a body of land. An ocean is a large area of water between continents. A river is a stream of water that flows through a channel in the surface of the ground. A forest is a piece of land with many trees. Pizza is a type of food that was created in Italy. @galuhsahid
  24. Distributional Representation A lake is a large body of water

    in a body of land. An ocean is a large area of water between continents. A river is a stream of water that flows through a channel in the surface of the ground. A forest is a piece of land with many trees. Pizza is a type of food that was created in Italy. @galuhsahid
  25. No manual annotation! @galuhsahid

  26. Approaches (Word Embeddings)

    • Count-based methods • Compute how often a word co-occurs with its neighbour words, then map the counts to a small, dense vector @galuhsahid
  27. Count-based (Approaches)

    Neighbour words (window = 4): Lake large body water body land. -> [large, body, water] • Ocean large area water continents. -> [large, area, water] • River stream water flows channel surface ground. -> [stream, water, flows] • Forest piece land many trees. -> [piece, land, many] • Pizza type food created Italy. -> [type, food, created] @galuhsahid
  28. Count-based (Approaches) Neighbour words: Lake [large, body, water] • Ocean [large, area, water] • River [stream, water, flows] • Forest [piece, land, many] • Pizza [type, food, created]

    Counts (columns: Large Body Water Area Stream Flows Piece Land Many Type Food Created): Lake 1 1 1 0 0 0 0 0 0 0 0 0 • Ocean 1 0 1 1 0 0 0 0 0 0 0 0 • River 0 0 1 0 1 1 0 0 0 0 0 0 • Forest 0 0 0 0 0 0 1 1 1 0 0 0 • Pizza 0 0 0 0 0 0 0 0 0 1 1 1 @galuhsahid
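A minimal sketch of the count-based approach using the slides' toy neighbour lists (pure Python; the variable and function names are my own):

```python
from collections import Counter

# Neighbour words per target word, taken from the slides' toy corpus.
neighbours = {
    "lake":   ["large", "body", "water"],
    "ocean":  ["large", "area", "water"],
    "river":  ["stream", "water", "flows"],
    "forest": ["piece", "land", "many"],
    "pizza":  ["type", "food", "created"],
}

# Fix a column order over all context words that ever appear.
context_words = sorted({w for ws in neighbours.values() for w in ws})

def count_vector(word):
    """One row of the co-occurrence matrix: counts per context word."""
    counts = Counter(neighbours[word])
    return [counts[c] for c in context_words]

# "lake" and "ocean" share the contexts "large" and "water", so their rows
# overlap; "pizza" shares no context with either, so its row is disjoint.
```

Count-based methods then shrink these long, sparse rows into short, dense vectors, e.g. with SVD, as the next slide notes.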
  29. Approaches (Word Embeddings)

    • Count-based methods • Compute how often a word co-occurs with its neighbour words, then map the counts to a small, dense vector • Reduce dimensions using Singular Value Decomposition (SVD) or Latent Dirichlet Allocation (LDA) @galuhsahid
  30. Approaches (Word Embeddings)

    • Predictive methods • Try to predict a word from its neighbours in terms of small, dense embedding vectors. Baroni, M., Dinu, G., & Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Vol. 1, pp. 238-247). @galuhsahid
  31. Predictive Methods • Word2Vec • Mikolov, T., Chen, K., Corrado,

    G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. • GloVe • Pennington, J., Socher, R., & Manning, C. (2014). Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532-1543). @galuhsahid
  32. Predictive Methods • FastText • https://research.fb.com/downloads/fasttext/ • ELMo • Peters,

    M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv preprint arXiv:1802.05365. @galuhsahid
  33. Predictive Methods • Word2Vec • Mikolov, T., Chen, K., Corrado,

    G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. • GloVe • Pennington, J., Socher, R., & Manning, C. (2014). Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532-1543). @galuhsahid
  34. Architecture W O R D 2 V E C •

    Continuous Bag-of-Words (CBOW) • Skip-gram @galuhsahid
  35. Skip-gram W O R D 2 V E C -

    S K I P - G R A M pizza [ 0.89254 , 2.3112 , -0.70036 , 0.76679 , -1.0815 , 0.40426 , -1.3462 , 0.71 , 0.90067 , -1.043 , -0.57966 , 0.18669 , 1.0996 , -0.90042 , -0.045962, 0.31492 , 1.4128 , 0.84963 , -1.3389 , -0.32252 , -0.10208 , -0.31783 , 0.33173 , 0.096593, 0.36732 , -1.1466 , 0.3123 , 1.549 , -0.13059 , -0.62003 , 1.774 , -0.62134 , 0.065215, -0.39758 , 0.095832, -0.56289 , -0.39552 , -0.16224 , 1.0035 , 0.39161 , -0.54489 , 0.21744 , 0.10831 , -0.06952 , -1.046 , -0.36096 , -0.48233 , -0.90467 , -0.044913, -0.52132 ] @galuhsahid
  36. Skip-gram W O R D 2 V E C -

    S K I P - G R A M @galuhsahid “I ate the leftover pizza for dinner”
  37. Skip-gram W O R D 2 V E C -

    S K I P - G R A M @galuhsahid “I ate the leftover pizza for dinner” Window size = 5
  38. Skip-gram W O R D 2 V E C -

    S K I P - G R A M @galuhsahid “I ate the leftover pizza for dinner” Window size = 5 Neighbour words: the, leftover, for, dinner
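The neighbour-word extraction can be sketched as below (pure Python; the function name is mine, and window = 5 is read as the centre word plus two words on each side, matching the slide):

```python
def neighbour_words(sentence, target, window=5):
    """Words inside the context window around the target word."""
    words = sentence.split()
    i = words.index(target)
    half = (window - 1) // 2  # words taken on each side of the centre
    return words[max(0, i - half):i] + words[i + 1:i + 1 + half]

print(neighbour_words("I ate the leftover pizza for dinner", "pizza"))
# -> ['the', 'leftover', 'for', 'dinner']
```

Skip-gram then trains the network to predict each of these neighbours given the centre word "pizza".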
  39. Overview W O R D 2 V E C -

    S K I P - G R A M Output leftover for pizza Projection @galuhsahid Input the dinner • “I ate the leftover pizza for dinner”
  40. Skip-gram W O R D 2 V E C -

    S K I P - G R A M @galuhsahid
  41. Overview W O R D 2 V E C -

    S K I P - G R A M Output leftover for pizza Projection @galuhsahid Input the dinner • “I ate the leftover pizza for dinner”
  42. Architecture W O R D 2 V E C -

    S K I P - G R A M pizza Input @galuhsahid 0 0 … 1 0 0 …
  43. Architecture W O R D 2 V E C -

    S K I P - G R A M pizza Input @galuhsahid 0 0 … 1 0 0 … Projection D V
  44. Architecture W O R D 2 V E C -

    S K I P - G R A M pizza Input @galuhsahid 0 0 … 1 0 0 … Projection D V = vocabulary size
  45. Architecture W O R D 2 V E C -

    S K I P - G R A M pizza Input @galuhsahid 0 0 … 1 0 0 … Projection D V = vocabulary size = number of dimensions
  46. Projection Layer W O R D 2 V E C

    - S K I P - G R A M Pizza … 0.23 -0.12 0.27 Aardvark … -0.21 0.35 0.56 Zyzzogeton … … D V … 0.89 2.31 -0.52 @galuhsahid
  47. Projection Layer W O R D 2 V E C

    - S K I P - G R A M Pizza … 0.23 -0.12 0.27 Aardvark … -0.21 0.35 0.56 Zyzzogeton … … D V … 0.89 2.31 -0.52 @galuhsahid
  48. Architecture W O R D 2 V E C -

    S K I P - G R A M pizza Input @galuhsahid 0 0 … 1 0 0 … Projection D V Output leftover for the dinner (Softmax)
  49. Projection Layer W O R D 2 V E C

    - S K I P - G R A M Pizza … 0.23 -0.12 0.27 Aardvark … -0.21 0.35 0.56 Zyzzogeton … … D V … 0.89 2.31 -0.52 @galuhsahid
  50. @galuhsahid

  51. Word Vector W O R D 2 V E C

    - S K I P - G R A M … 0.89 2.31 -0.52 Pizza This is our word vector! @galuhsahid
  52. Word Vector W O R D 2 V E C

    - S K I P - G R A M … 0.89 2.31 -0.52 Pizza This is our word vector! [ 0.89254 , 2.3112 , -0.70036 , 0.76679 , -1.0815 , 0.40426 , -1.3462 , 0.71 , 0.90067 , -1.043 , -0.57966 , 0.18669 , 1.0996 , -0.90042 , -0.045962, 0.31492 , 1.4128 , 0.84963 , -1.3389 , -0.32252 , -0.10208 , -0.31783 , 0.33173 , 0.096593, 0.36732 , -1.1466 , 0.3123 , 1.549 , -0.13059 , -0.62003 , 1.774 , -0.62134 , 0.065215, -0.39758 , 0.095832, -0.56289 , -0.39552 , -0.16224 , 1.0035 , 0.39161 , -0.54489 , 0.21744 , 0.10831 , -0.06952 , -1.046 , -0.36096 , -0.48233 , -0.90467 , -0.044913, -0.52132 ] @galuhsahid
  53. The Intuition W O R D 2 V E C

    - S K I P - G R A M … 0.89 2.31 -0.52 Pizza the leftover for dinner … 0.76 2.01 -0.47 Chicken some leftover recipes for @galuhsahid “I ate the leftover pizza for dinner” “I need some leftover chicken recipes for dinner”
  54. The Intuition W O R D 2 V E C

    - S K I P - G R A M … 0.89 2.31 -0.52 Pizza the leftover for dinner … 0.76 2.01 -0.47 Chicken some leftover recipes for @galuhsahid … 0.32 0.43 -0.21 Prague embassy in is located “I ate the leftover pizza for dinner” “I need some leftover chicken recipes for dinner” “The German embassy in Prague is located in…”
  55. Architecture (Word2Vec - Skip-gram) More details:

    Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. @galuhsahid
  56. Exploring Word Embeddings with Gensim @galuhsahid

  57. Pre-trained Models E X P L O R I N

    G W O R D E M B E D D I N G S • Gensim has an API to download pre-trained word embedding models. The list of available models can be found here. @galuhsahid
  58. Loading the Model (Exploring Word Embeddings)

    from gensim.models import KeyedVectors model_w2v = KeyedVectors.load_word2vec_format( './GoogleNews-vectors-negative300.bin', binary=True) @galuhsahid
  59. Word Vector (Exploring Word Embeddings)

    model_w2v["lake"] array([-8.39843750e-02, 2.02148438e-01, 2.65625000e-01, 1.04980469e-01, -7.95898438e-02, 1.05957031e-01, -5.39550781e-02, 8.11767578e-03, 9.32617188e-02, -7.66601562e-02, 1.56250000e-01, -1.19628906e-01, … -4.15039062e-02, 4.08935547e-03, -2.47070312e-01, -1.78710938e-01, 3.33984375e-01, -1.79687500e-01], dtype=float32) @galuhsahid
  60. Similar Words E X P L O R I N

    G W O R D E M B E D D I N G S model_w2v.most_similar("apple") [('apples', 0.7203598022460938), ('pear', 0.6450696587562561), ('fruit', 0.6410146355628967), ('berry', 0.6302294731140137), ('pears', 0.6133961081504822), ('strawberry', 0.6058261394500732), ('peach', 0.6025873422622681), ('potato', 0.596093475818634), ('grape', 0.5935864448547363), ('blueberry', 0.5866668224334717)] • The distance is calculated using cosine similarity @galuhsahid • Similar words are nearby vectors in a vector space
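Cosine similarity itself is a one-liner; a pure-Python sketch (gensim computes the same quantity, typically over unit-normalised vectors):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Vectors pointing the same way score near 1; orthogonal vectors score 0.
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # -> 0.0
```

Because it measures angle rather than length, two words used in similar contexts score high even if one is far more frequent than the other.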
  61. Get Similarity E X P L O R I N

    G W O R D E M B E D D I N G S model_w2v.similarity("apple", "mango") 0.57518554 @galuhsahid
  62. Odd One Out E X P L O R I

    N G W O R D E M B E D D I N G S model_w2v.doesnt_match(["lake", "forest", "ocean", "river"]) 'forest' @galuhsahid
  63. Analogies E X P L O R I N G

    W O R D E M B E D D I N G S model_w2v.most_similar(positive=["uncle", "woman"], negative=["man"]) • man to uncle is woman to ... @galuhsahid
  64. Analogies E X P L O R I N G

    W O R D E M B E D D I N G S model_w2v.most_similar(positive=["uncle", "woman"], negative=["man"]) [('aunt', 0.8022665977478027), ('mother', 0.7770732045173645), ('niece', 0.768424928188324), ('father', 0.7237852811813354), ('grandmother', 0.722037136554718), ('daughter', 0.7185647487640381), ('sister', 0.7006258368492126), ('husband', 0.6982548236846924), ('granddaughter', 0.6858304738998413), ('nephew', 0.6710714101791382)] • man to uncle is woman to ... Words that are similar to uncle and woman but dissimilar to man uncle + woman - man @galuhsahid
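The positive/negative arithmetic behind most_similar can be mimicked on toy vectors (the 2-D vectors and this tiny family vocabulary are invented for illustration; gensim does the same thing over 300 dimensions):

```python
import math

# Toy vectors: dimension 0 ~ "extended family", dimension 1 ~ "gender".
vectors = {
    "man":   [0.0, 1.0],
    "woman": [0.0, -1.0],
    "uncle": [1.0, 1.0],
    "aunt":  [1.0, -1.0],
}

def most_similar(positive, negative):
    """Rank remaining words by cosine similarity to sum(positive) - sum(negative)."""
    query = [0.0, 0.0]
    for w in positive:
        query = [q + x for q, x in zip(query, vectors[w])]
    for w in negative:
        query = [q - x for q, x in zip(query, vectors[w])]

    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

    # Query words are excluded, just as gensim excludes them.
    candidates = [w for w in vectors if w not in positive + negative]
    return sorted(candidates, key=lambda w: cos(vectors[w], query), reverse=True)

print(most_similar(positive=["uncle", "woman"], negative=["man"]))  # -> ['aunt']
```

uncle + woman - man lands on [1, -1], which is exactly the toy vector for "aunt".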
  65. Analogies (Exploring Word Embeddings)

    model_w2v.most_similar(positive=["Berlin", "France"], negative=["Germany"]) [('Paris', 0.7672388553619385), ('French', 0.6049168109893799), ('Parisian', 0.5810437202453613), ('Colombes', 0.5599985718727112), ('Hopital_Europeen_Georges_Pompidou', 0.555890679359436), ('Melun', 0.551270067691803), ('Dinard', 0.5451847314834595), ('Brussels', 0.5420989990234375), ('Mairie_de', 0.5337448120117188), ('Cagnes_sur_Mer', 0.531246542930603)] • Germany to Berlin is France to... @galuhsahid
  66. Analogies (Exploring Word Embeddings)

    model_w2v.most_similar(positive=["Berlin", "France"], negative=["Germany"]) • Germany to Berlin is France to... @galuhsahid
  67. Analogies E X P L O R I N G

    W O R D E M B E D D I N G S model_w2v.most_similar(positive=["run", "walking"], negative=["running"]) @galuhsahid • Running to run is walking to…
  68. Analogies E X P L O R I N G

    W O R D E M B E D D I N G S model_w2v.most_similar(positive=["run", "walking"], negative=["running"]) [('walk', 0.7163699865341187), ('walks', 0.5965700745582581), ('walked', 0.5833066701889038), ('stroll', 0.5236037969589233), ('pinch_hitter_Yunel_Escobar', 0.4562637209892273), ('Walking', 0.455409437417984), ('Batterymate_Miguel_Olivo', 0.4483090043067932), ('runs', 0.4462803602218628), ('pinch_hitter_Carlos_Guillen', 0.4402925372123718), ('Justin_Speier_relieved', 0.43528205156326294)] @galuhsahid • Running to run is walking to…
  69. Analogies E X P L O R I N G

    W O R D E M B E D D I N G S https://www.tensorflow.org/tutorials/representation/word2vec @galuhsahid
  70. http://bionlp-www.utu.fi/wv_demo/ @galuhsahid
  71. https://indonesian-word-embedding.herokuapp.com http://github.com/galuhsahid/indonesian-word-embedding @galuhsahid
  72. Training Your Own • Sure you can! • When to

    do so? • Specific problem domains • Challenge: training data • Another alternative: continue training a pre-existing word embedding E X P L O R I N G W O R D E M B E D D I N G S @galuhsahid
  73. Search A P P L I C A T I

    O N @galuhsahid
  74. Neural Machine Translation (Application)

    Qi, Y., Sachan, D. S., Felix, M., Padmanabhan, S. J., & Neubig, G. (2018). When and why are pre-trained word embeddings useful for neural machine translation?. arXiv preprint arXiv:1804.06323. @galuhsahid
  75. Recommendation Engine A P P L I C A T

    I O N https://towardsdatascience.com/using-word2vec-for-music-recommendations-bb9649ac2484 @galuhsahid
  76. Out-of-vocabulary Words (Challenge)

    • Word2vec doesn’t handle this • FastText handles this because it trains on character n-grams instead of whole words, breaking down each word into n-grams (min_n = max_n = 3): amazing -> <am, ama, maz, azi, zin, ing, ng> • An unseen word such as amazin -> <am, ama, maz, azi, zin, in> still shares most of its n-grams with known words • ELMo also handles this because it trains the model at the character level @galuhsahid
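A sketch of FastText-style character n-gram extraction (pure Python; < and > mark word boundaries, as in the FastText papers):

```python
def char_ngrams(word, n=3):
    """Character n-grams of a word, padded with boundary markers < and >."""
    padded = "<" + word + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(char_ngrams("amazing"))
# -> ['<am', 'ama', 'maz', 'azi', 'zin', 'ing', 'ng>']
```

An unseen word like "amazin" shares five of its six trigrams with "amazing", which is why FastText can still assemble a sensible vector for it from the n-gram vectors it learned during training.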
  77. Polysemy C H A L L E N G E

    The word “rock” @galuhsahid
  78. Polysemy C H A L L E N G E

    The word “rock” https://unsplash.com/photos/I4zSNSxR8oA https://unsplash.com/photos/xssEs_oCv-A @galuhsahid
  79. Polysemy C H A L L E N G E

    The word “rock” https://unsplash.com/photos/I4zSNSxR8oA https://unsplash.com/photos/xssEs_oCv-A @galuhsahid
  80. Polysemy C H A L L E N G E

    The word “rock” https://unsplash.com/photos/I4zSNSxR8oA https://unsplash.com/photos/xssEs_oCv-A https://www.muscleandfitness.com/workouts/athletecelebrity- workouts/dwayne-rock-johnsons-shoulder-workout @galuhsahid
  81. Polysemy C H A L L E N G E

    He caught a fish at the bank of the river The bank at the end of the street was robbed yesterday @galuhsahid
  82. Polysemy C H A L L E N G E

    He caught a fish at the bank of the river The bank at the end of the street was robbed yesterday Same word, different meaning @galuhsahid
  83. Polysemy C H A L L E N G E

    He caught a fish at the bank of the river The bank at the end of the street was robbed yesterday Same word, different meaning @galuhsahid More recent word models such as ELMo and BERT will assign different word vectors for the word “bank” because they appear in different contexts
  84. Bias C H A L L E N G E

    https://twitter.com/zeynep/status/799662089740681217 @galuhsahid
  85. Bias C H A L L E N G E

    @galuhsahid
  86. Bias (Challenge)

    “Our results indicate that text corpora contain recoverable and accurate imprints of our historic biases, whether morally neutral as towards insects or flowers, problematic as towards race or gender, or even simply veridical, reflecting the status quo distribution of gender with respect to careers or first names.” “Certainly, caution must be used in incorporating modules constructed via unsupervised machine learning into decision-making systems.” Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186. @galuhsahid
  87. Bias (Challenge)

    De-biasing word embeddings: Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in neural information processing systems (pp. 4349-4357). @galuhsahid
  88. Bias (Challenge)

    “We conclude that existing bias removal techniques are insufficient, and should not be trusted for providing gender-neutral modeling.” Gonen, H., & Goldberg, Y. (2019). Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them. arXiv preprint arXiv:1903.03862. @galuhsahid
  89. Bias (Limitation)

    https://www.blog.google/products/translate/reducing-gender-bias-google-translate/ @galuhsahid
  90. Beyond word2vec • Explore other techniques such as GloVe, fasttext,

    ELMo, BERT… • Train your own word embeddings • Use word embeddings for other tasks outside of NLP tasks (song2vec, perhaps?) @galuhsahid • Use word embeddings in NLP tasks (e.g. text classification with doc2vec)
  91. @galuhsahid