Slide 1

Slide 1 text

Introduction to Word Embeddings Galuh Sahid @galuhsahid | github.com/galuhsahid

Slide 2

Slide 2 text

https://unsplash.com/photos/q0AtbGIOb5k @galuhsahid

Slide 3

Slide 3 text

@galuhsahid

Slide 4

Slide 4 text

How do we represent words? @galuhsahid

Slide 5

Slide 5 text

Language is hard @galuhsahid

Slide 6

Slide 6 text

Lake Forest Ocean River @galuhsahid

Slide 7

Slide 7 text

Lake Forest Ocean River @galuhsahid

Slide 8

Slide 8 text

Lake Forest Ocean River Body of water Not body of water Body of water Body of water @galuhsahid

Slide 9

Slide 9 text

Lake Forest Ocean River Doesn’t have trees Has trees Doesn’t have trees Doesn’t have trees @galuhsahid

Slide 10

Slide 10 text

WordNet https://www.nltk.org/images/wordnet-hierarchy.png @galuhsahid

Slide 11

Slide 11 text

What kind of representation do we want? • Real numbers • What do we want to know about a word? Whether two words have the same meaning, their semantic relationships, etc. • Can we build it without labelling everything manually? • Ideally it’s not too large! @galuhsahid

Slide 12

Slide 12 text

Word Embeddings to the Rescue Represent words as vectors of real numbers in a much lower-dimensional, and therefore denser, space @galuhsahid We’re taking words, which live outside any vector space, and putting them into a vector space - hence, we’re embedding the words into that vector space

Slide 13

Slide 13 text

Lake [ 0.89254 , 2.3112 , -0.70036 , 0.76679 , -1.0815 , 0.40426 , -1.3462 , 0.71 , 0.90067 , -1.043 , -0.57966 , 0.18669 , 1.0996 , -0.90042 , -0.045962, 0.31492 , 1.4128 , 0.84963 , -1.3389 , -0.32252 , -0.10208 , -0.31783 , 0.33173 , 0.096593, 0.36732 , -1.1466 , 0.3123 , 1.549 , -0.13059 , -0.62003 , 1.774 , -0.62134 , 0.065215, -0.39758 , 0.095832, -0.56289 , -0.39552 , -0.16224 , 1.0035 , 0.39161 , -0.54489 , 0.21744 , 0.10831 , -0.06952 , -1.046 , -0.36096 , -0.48233 , -0.90467 , -0.044913, -0.52132 ] (Spoiler alert) @galuhsahid

Slide 14

Slide 14 text

Visualization Lake Ocean River Forest Pizza @galuhsahid

Slide 15

Slide 15 text

Visualization Lake Ocean River Forest Pizza Maybe this dimension represents the concept of whether it is a food or not… @galuhsahid

Slide 16

Slide 16 text

Visualization Lake Ocean River Forest Pizza Maybe this dimension represents the concept of whether it is a food or not… Or it could be something not intuitive to us - we actually have no idea, though. It could be anything @galuhsahid

Slide 17

Slide 17 text

One-Hot Encoding Our vocabulary: lake, forest, ocean, river, pizza Size: |V| = 5
Lake   [1, 0, 0, 0, 0]
Forest [0, 1, 0, 0, 0]
Ocean  [0, 0, 1, 0, 0]
River  [0, 0, 0, 1, 0]
Pizza  [0, 0, 0, 0, 1]
@galuhsahid
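A minimal Python sketch of the one-hot idea above, using the five-word toy vocabulary from this slide (the code is illustrative and not part of the original deck):

import numpy as np

vocab = ["lake", "forest", "ocean", "river", "pizza"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    # A |V|-dimensional vector with a single 1 at the word's index.
    vector = np.zeros(len(vocab), dtype=int)
    vector[word_to_index[word]] = 1
    return vector

print(one_hot("lake"))   # [1 0 0 0 0]
print(one_hot("pizza"))  # [0 0 0 0 1]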

Slide 18

Slide 18 text

One-Hot Encoding Our vocabulary: every English word (approx. 171,476 words in use) Size: |V| = ~171,476
Aardvark   [1, 0, 0, …, 0, …, 0, 0]
Lake       [0, 0, 0, …, 1, …, 0, 0]
Zyzzogeton [0, 0, 0, …, 0, …, 0, 1]
https://en.oxforddictionaries.com/explore/how-many-words-are-there-in-the-english-language/ @galuhsahid

Slide 19

Slide 19 text

Distributional Representation “Tell me who your friends are, and I’ll tell you who you are.” @galuhsahid

Slide 20

Slide 20 text

Distributional Representation “Tell me who your friends are, and I’ll tell you who you are.” “You shall know a word by the company it keeps.” @galuhsahid (Firth, 1957)

Slide 21

Slide 21 text

Distributional Representation “Tell me who your friends are, and I’ll tell you who you are.” “You shall know a word by the company it keeps.” Distributional Hypothesis (Harris, 1954) Words that occur in similar contexts have similar meaning @galuhsahid (Firth, 1957)

Slide 22

Slide 22 text

Distributional Representation A lake is a large body of water in a body of land. An ocean is a large area of water between continents. A river is a stream of water that flows through a channel in the surface of the ground. A forest is a piece of land with many trees. Pizza is a type of food that was created in Italy. @galuhsahid

Slide 23

Slide 23 text

Distributional Representation A lake is a large body of water in a body of land. An ocean is a large area of water between continents. A river is a stream of water that flows through a channel in the surface of the ground. A forest is a piece of land with many trees. Pizza is a type of food that was created in Italy. @galuhsahid

Slide 24

Slide 24 text

Distributional Representation A lake is a large body of water in a body of land. An ocean is a large area of water between continents. A river is a stream of water that flows through a channel in the surface of the ground. A forest is a piece of land with many trees. Pizza is a type of food that was created in Italy. @galuhsahid

Slide 25

Slide 25 text

No manual annotation! @galuhsahid

Slide 26

Slide 26 text

Approaches W O R D E M B E D D I N G S • Count-based methods • Count how often a word co-occurs with its neighbour words, then map the counts to a small, dense vector @galuhsahid

Slide 27

Slide 27 text

Count-based Lake large body water body land. Ocean large area water continents. River stream water flows channel surface ground. Forest piece land many trees. Pizza type food created Italy. [large, body, water] [large, area, water] [stream, water, flows] [piece, land, many] [type, food, created] window = 4 Neighbor words A P P R O A C H E S @galuhsahid

Slide 28

Slide 28 text

Count-based A P P R O A C H E S Neighbor words → counts over [Large, Body, Water, Area, Stream, Flows, Piece, Land, Many, Type, Food, Created]
Lake   [large, body, water]   → [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Ocean  [large, area, water]   → [1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
River  [stream, water, flows] → [0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0]
Forest [piece, land, many]    → [0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
Pizza  [type, food, created]  → [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
@galuhsahid
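The counting step can be sketched in Python roughly as follows; the lowercased token lists stand in for the stop-word-filtered sentences on slide 27, and the window value follows that slide:

from collections import Counter, defaultdict

sentences = [
    ["lake", "large", "body", "water", "body", "land"],
    ["ocean", "large", "area", "water", "continents"],
    ["river", "stream", "water", "flows", "channel", "surface", "ground"],
    ["forest", "piece", "land", "many", "trees"],
    ["pizza", "type", "food", "created", "italy"],
]

window = 4
cooccurrence = defaultdict(Counter)

for sentence in sentences:
    for i, word in enumerate(sentence):
        # Count every word that appears within `window` positions of `word`.
        for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
            if i != j:
                cooccurrence[word][sentence[j]] += 1

print(cooccurrence["lake"])  # e.g. Counter({'body': 2, 'large': 1, 'water': 1})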

Slide 29

Slide 29 text

Approaches W O R D E M B E D D I N G S • Count-based methods • Count how often a word co-occurs with its neighbour words, then map the counts to a small, dense vector • Reduce dimensions using Singular Value Decomposition (SVD) or Latent Dirichlet Allocation (LDA) @galuhsahid
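As a rough illustration of the dimensionality-reduction step (not from the original slides), the 5x12 count matrix from slide 28 can be reduced with SVD; numpy is only one possible tool here:

import numpy as np

counts = np.array([
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # lake
    [1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # ocean
    [0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0],  # river
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0],  # forest
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],  # pizza
], dtype=float)

# Keep only the top-k singular values/vectors to get small, dense word vectors.
k = 2
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
dense_vectors = U[:, :k] * S[:k]  # one k-dimensional vector per word

print(dense_vectors.shape)  # (5, 2)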

Slide 30

Slide 30 text

Approaches W O R D E M B E D D I N G S • Predictive methods • Try to predict a word from its neighbours using small, dense embedding vectors Baroni, M., Dinu, G., & Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Vol. 1, pp. 238-247). @galuhsahid

Slide 31

Slide 31 text

Predictive Methods • Word2Vec • Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. • GloVe • Pennington, J., Socher, R., & Manning, C. (2014). Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532-1543). @galuhsahid

Slide 32

Slide 32 text

Predictive Methods • FastText • https://research.fb.com/downloads/fasttext/ • ELMo • Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv preprint arXiv:1802.05365. @galuhsahid

Slide 33

Slide 33 text

Predictive Methods • Word2Vec • Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. • GloVe • Pennington, J., Socher, R., & Manning, C. (2014). Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532-1543). @galuhsahid

Slide 34

Slide 34 text

Architecture W O R D 2 V E C • Continuous Bag-of-Words (CBOW) • Skip-gram @galuhsahid

Slide 35

Slide 35 text

Skip-gram W O R D 2 V E C - S K I P - G R A M pizza [ 0.89254 , 2.3112 , -0.70036 , 0.76679 , -1.0815 , 0.40426 , -1.3462 , 0.71 , 0.90067 , -1.043 , -0.57966 , 0.18669 , 1.0996 , -0.90042 , -0.045962, 0.31492 , 1.4128 , 0.84963 , -1.3389 , -0.32252 , -0.10208 , -0.31783 , 0.33173 , 0.096593, 0.36732 , -1.1466 , 0.3123 , 1.549 , -0.13059 , -0.62003 , 1.774 , -0.62134 , 0.065215, -0.39758 , 0.095832, -0.56289 , -0.39552 , -0.16224 , 1.0035 , 0.39161 , -0.54489 , 0.21744 , 0.10831 , -0.06952 , -1.046 , -0.36096 , -0.48233 , -0.90467 , -0.044913, -0.52132 ] @galuhsahid

Slide 36

Slide 36 text

Skip-gram W O R D 2 V E C - S K I P - G R A M @galuhsahid “I ate the leftover pizza for dinner”

Slide 37

Slide 37 text

Skip-gram W O R D 2 V E C - S K I P - G R A M @galuhsahid “I ate the leftover pizza for dinner” Window size = 5

Slide 38

Slide 38 text

Skip-gram W O R D 2 V E C - S K I P - G R A M @galuhsahid “I ate the leftover pizza for dinner” Window size = 5 Neighbour words: the, leftover, for, dinner
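A small sketch of how (centre, neighbour) training pairs could be generated for this sentence, assuming "window size = 5" means the centre word plus two neighbours on each side, as the slide's list suggests:

sentence = "I ate the leftover pizza for dinner".lower().split()
context_radius = 2  # two neighbours on each side of the centre word

pairs = []
for i, center in enumerate(sentence):
    for j in range(max(0, i - context_radius), min(len(sentence), i + context_radius + 1)):
        if i != j:
            pairs.append((center, sentence[j]))

print([context for (center, context) in pairs if center == "pizza"])
# ['the', 'leftover', 'for', 'dinner']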

Slide 39

Slide 39 text

Overview W O R D 2 V E C - S K I P - G R A M Output leftover for pizza Projection @galuhsahid Input the dinner • “I ate the leftover pizza for dinner”

Slide 40

Slide 40 text

Skip-gram W O R D 2 V E C - S K I P - G R A M @galuhsahid

Slide 41

Slide 41 text

Overview W O R D 2 V E C - S K I P - G R A M Output leftover for pizza Projection @galuhsahid Input the dinner • “I ate the leftover pizza for dinner”

Slide 42

Slide 42 text

Architecture W O R D 2 V E C - S K I P - G R A M pizza Input @galuhsahid 0 0 … 1 0 0 …

Slide 43

Slide 43 text

Architecture W O R D 2 V E C - S K I P - G R A M pizza Input @galuhsahid 0 0 … 1 0 0 … Projection D V

Slide 44

Slide 44 text

Architecture W O R D 2 V E C - S K I P - G R A M pizza Input @galuhsahid 0 0 … 1 0 0 … Projection D V = vocabulary size

Slide 45

Slide 45 text

Architecture W O R D 2 V E C - S K I P - G R A M pizza Input @galuhsahid 0 0 … 1 0 0 … Projection V = vocabulary size, D = number of dimensions

Slide 46

Slide 46 text

Projection Layer W O R D 2 V E C - S K I P - G R A M Pizza … 0.23 -0.12 0.27 Aardvark … -0.21 0.35 0.56 Zyzzogeton … … D V … 0.89 2.31 -0.52 @galuhsahid

Slide 47

Slide 47 text

Projection Layer W O R D 2 V E C - S K I P - G R A M Pizza … 0.23 -0.12 0.27 Aardvark … -0.21 0.35 0.56 Zyzzogeton … … D V … 0.89 2.31 -0.52 @galuhsahid

Slide 48

Slide 48 text

Architecture W O R D 2 V E C - S K I P - G R A M pizza Input @galuhsahid 0 0 … 1 0 0 … Projection D V Output leftover for the dinner (Softmax)
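A toy numpy sketch of the forward pass described by this architecture: a one-hot input selects one row of the V x D projection matrix, and a D x V output matrix followed by a softmax gives a probability for every word in the vocabulary being a neighbour. All sizes and weights below are made up for illustration:

import numpy as np

V, D = 10, 4                    # vocabulary size, number of dimensions
rng = np.random.default_rng(0)
W_in = rng.normal(size=(V, D))   # projection layer (these rows become the word vectors)
W_out = rng.normal(size=(D, V))  # output layer

def forward(word_index):
    one_hot = np.zeros(V)
    one_hot[word_index] = 1.0
    hidden = one_hot @ W_in  # equivalent to simply looking up W_in[word_index]
    scores = hidden @ W_out
    return np.exp(scores) / np.exp(scores).sum()  # softmax over the vocabulary

print(forward(3).round(3))  # probability of each vocabulary word being a neighbour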

Slide 49

Slide 49 text

Projection Layer W O R D 2 V E C - S K I P - G R A M Pizza … 0.23 -0.12 0.27 Aardvark … -0.21 0.35 0.56 Zyzzogeton … … D V … 0.89 2.31 -0.52 @galuhsahid

Slide 50

Slide 50 text

@galuhsahid

Slide 51

Slide 51 text

Word Vector W O R D 2 V E C - S K I P - G R A M … 0.89 2.31 -0.52 Pizza This is our word vector! @galuhsahid

Slide 52

Slide 52 text

Word Vector W O R D 2 V E C - S K I P - G R A M … 0.89 2.31 -0.52 Pizza This is our word vector! [ 0.89254 , 2.3112 , -0.70036 , 0.76679 , -1.0815 , 0.40426 , -1.3462 , 0.71 , 0.90067 , -1.043 , -0.57966 , 0.18669 , 1.0996 , -0.90042 , -0.045962, 0.31492 , 1.4128 , 0.84963 , -1.3389 , -0.32252 , -0.10208 , -0.31783 , 0.33173 , 0.096593, 0.36732 , -1.1466 , 0.3123 , 1.549 , -0.13059 , -0.62003 , 1.774 , -0.62134 , 0.065215, -0.39758 , 0.095832, -0.56289 , -0.39552 , -0.16224 , 1.0035 , 0.39161 , -0.54489 , 0.21744 , 0.10831 , -0.06952 , -1.046 , -0.36096 , -0.48233 , -0.90467 , -0.044913, -0.52132 ] @galuhsahid

Slide 53

Slide 53 text

The Intuition W O R D 2 V E C - S K I P - G R A M … 0.89 2.31 -0.52 Pizza the leftover for dinner … 0.76 2.01 -0.47 Chicken some leftover recipes for @galuhsahid “I ate the leftover pizza for dinner” “I need some leftover chicken recipes for dinner”

Slide 54

Slide 54 text

The Intuition W O R D 2 V E C - S K I P - G R A M … 0.89 2.31 -0.52 Pizza the leftover for dinner … 0.76 2.01 -0.47 Chicken some leftover recipes for @galuhsahid … 0.32 0.43 -0.21 Prague embassy in is located “I ate the leftover pizza for dinner” “I need some leftover chicken recipes for dinner” “The German embassy in Prague is located in…”

Slide 55

Slide 55 text

Architecture W O R D 2 V E C - S K I P - G R A M More details: Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. @galuhsahid

Slide 56

Slide 56 text

Exploring Word Embeddings with Gensim @galuhsahid

Slide 57

Slide 57 text

Pre-trained Models E X P L O R I N G W O R D E M B E D D I N G S • Gensim has an API to download pre-trained word embedding models. The list of available models can be found in the gensim-data repository on GitHub. @galuhsahid
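For example (a sketch; "glove-wiki-gigaword-50" is one of the smaller models in the gensim-data catalogue, and the Google News word2vec vectors used on the next slides are also available there as "word2vec-google-news-300"):

import gensim.downloader as api

print(list(api.info()["models"].keys()))    # list the available pre-trained models
model = api.load("glove-wiki-gigaword-50")  # returns a KeyedVectors object
print(model.most_similar("lake")[:3])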

Slide 58

Slide 58 text

Loading the Model E X P L O R I N G W O R D E M B E D D I N G S
from gensim.models import KeyedVectors

model_w2v = KeyedVectors.load_word2vec_format(
    './GoogleNews-vectors-negative300.bin', binary=True)
@galuhsahid

Slide 59

Slide 59 text

Word Vector E X P L O R I N G W O R D E M B E D D I N G S model_w2v["lake"] array([-8.39843750e-02, 2.02148438e-01, 2.65625000e-01, 1.04980469e-01, -7.95898438e-02, 1.05957031e-01, -5.39550781e-02, 8.11767578e-03, 9.32617188e-02, -7.66601562e-02, 1.56250000e-01, -1.19628906e-01, … -4.15039062e-02, 4.08935547e-03, -2.47070312e-01, -1.78710938e-01, 3.33984375e-01, -1.79687500e-01], dtype=float32) @galuhsahid

Slide 60

Slide 60 text

Similar Words E X P L O R I N G W O R D E M B E D D I N G S model_w2v.most_similar("apple") [('apples', 0.7203598022460938), ('pear', 0.6450696587562561), ('fruit', 0.6410146355628967), ('berry', 0.6302294731140137), ('pears', 0.6133961081504822), ('strawberry', 0.6058261394500732), ('peach', 0.6025873422622681), ('potato', 0.596093475818634), ('grape', 0.5935864448547363), ('blueberry', 0.5866668224334717)] • The distance is calculated using cosine similarity @galuhsahid • Similar words are nearby vectors in a vector space
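What "nearby vectors" means in practice can be checked by hand; this hand-rolled cosine similarity should roughly reproduce the scores gensim reports (e.g. ~0.72 for apple vs. apples above):

import numpy as np

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine_similarity(model_w2v["apple"], model_w2v["apples"]))  # ~0.72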

Slide 61

Slide 61 text

Get Similarity E X P L O R I N G W O R D E M B E D D I N G S model_w2v.similarity("apple", "mango") 0.57518554 @galuhsahid

Slide 62

Slide 62 text

Odd One Out E X P L O R I N G W O R D E M B E D D I N G S model_w2v.doesnt_match(["lake", "forest", "ocean", "river"]) 'forest' @galuhsahid

Slide 63

Slide 63 text

Analogies E X P L O R I N G W O R D E M B E D D I N G S model_w2v.most_similar(positive=["uncle", "woman"], negative=["man"]) • man to uncle is woman to ... @galuhsahid

Slide 64

Slide 64 text

Analogies E X P L O R I N G W O R D E M B E D D I N G S model_w2v.most_similar(positive=["uncle", "woman"], negative=["man"]) [('aunt', 0.8022665977478027), ('mother', 0.7770732045173645), ('niece', 0.768424928188324), ('father', 0.7237852811813354), ('grandmother', 0.722037136554718), ('daughter', 0.7185647487640381), ('sister', 0.7006258368492126), ('husband', 0.6982548236846924), ('granddaughter', 0.6858304738998413), ('nephew', 0.6710714101791382)] • man to uncle is woman to ... Words that are similar to uncle and woman but dissimilar to man uncle + woman - man @galuhsahid
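The arithmetic can also be done by hand (a sketch): build the query vector uncle + woman - man and ask for the words closest to it. Note that gensim's most_similar works with normalised vectors and filters out the query words, while similar_by_vector does not, so 'uncle' and 'woman' themselves will appear near the top here:

query = model_w2v["uncle"] + model_w2v["woman"] - model_w2v["man"]
print(model_w2v.similar_by_vector(query, topn=5))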

Slide 65

Slide 65 text

Analogies E X P L O R I N G W O R D E M B E D D I N G S model_w2v.most_similar(positive=["Berlin", "France"], negative=["Germany"]) [('Paris', 0.7672388553619385), ('French', 0.6049168109893799), ('Parisian', 0.5810437202453613), ('Colombes', 0.5599985718727112), ('Hopital_Europeen_Georges_Pompidou', 0.555890679359436), ('Melun', 0.551270067691803), ('Dinard', 0.5451847314834595), ('Brussels', 0.5420989990234375), ('Mairie_de', 0.5337448120117188), ('Cagnes_sur_Mer', 0.531246542930603)] • Germany to Berlin is France to... @galuhsahid

Slide 66

Slide 66 text

Analogies E X P L O R I N G W O R D E M B E D D I N G S model_w2v.most_similar(positive=["Berlin", "France"], negative=["Germany"]) • Germany to Berlin is France to... @galuhsahid

Slide 67

Slide 67 text

Analogies E X P L O R I N G W O R D E M B E D D I N G S model_w2v.most_similar(positive=["run", "walking"], negative=["running"]) @galuhsahid • Running to run is walking to…

Slide 68

Slide 68 text

Analogies E X P L O R I N G W O R D E M B E D D I N G S model_w2v.most_similar(positive=["run", "walking"], negative=["running"]) [('walk', 0.7163699865341187), ('walks', 0.5965700745582581), ('walked', 0.5833066701889038), ('stroll', 0.5236037969589233), ('pinch_hitter_Yunel_Escobar', 0.4562637209892273), ('Walking', 0.455409437417984), ('Batterymate_Miguel_Olivo', 0.4483090043067932), ('runs', 0.4462803602218628), ('pinch_hitter_Carlos_Guillen', 0.4402925372123718), ('Justin_Speier_relieved', 0.43528205156326294)] @galuhsahid • Running to run is walking to…

Slide 69

Slide 69 text

Analogies E X P L O R I N G W O R D E M B E D D I N G S https://www.tensorflow.org/tutorials/representation/word2vec @galuhsahid

Slide 70

Slide 70 text

http://bionlp-www.utu.fi/wv_demo/ @galuhsahid

Slide 71

Slide 71 text

https://indonesian-word-embedding.herokuapp.com http://github.com/galuhsahid/indonesian-word-embedding @galuhsahid

Slide 72

Slide 72 text

Training Your Own • Sure you can! • When to do so? • Specific problem domains • Challenge: training data • Another alternative: continue training a pre-existing word embedding E X P L O R I N G W O R D E M B E D D I N G S @galuhsahid
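A minimal sketch of training your own word2vec model with gensim; parameter names follow gensim 4.x (older versions use size instead of vector_size), and the two toy sentences stand in for a real tokenised corpus from your problem domain:

from gensim.models import Word2Vec

sentences = [
    ["i", "ate", "the", "leftover", "pizza", "for", "dinner"],
    ["i", "need", "some", "leftover", "chicken", "recipes", "for", "dinner"],
]

model = Word2Vec(
    sentences,
    vector_size=100,  # D: number of dimensions
    window=5,         # context window size
    min_count=1,      # keep every word in this tiny toy corpus
    sg=1,             # 1 = skip-gram, 0 = CBOW
)

print(model.wv["pizza"][:5])     # the learned word vector
model.save("my_word2vec.model")  # can be reloaded later and trained further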

Slide 73

Slide 73 text

Search A P P L I C A T I O N @galuhsahid

Slide 74

Slide 74 text

Neural Machine Translation A P P L I C A T I O N Qi, Y., Sachan, D. S., Felix, M., Padmanabhan, S. J., & Neubig, G. (2018). When and why are pre-trained word embeddings useful for neural machine translation?. arXiv preprint arXiv:1804.06323. @galuhsahid

Slide 75

Slide 75 text

Recommendation Engine A P P L I C A T I O N https://towardsdatascience.com/using-word2vec-for-music-recommendations-bb9649ac2484 @galuhsahid

Slide 76

Slide 76 text

Out-of-vocabulary Words C H A L L E N G E • Word2vec doesn’t handle out-of-vocabulary words • FastText handles them because it trains on character n-grams rather than whole words, breaking each word down into its n-grams: with min_n = max_n = 3, amazing maps to seven trigram vectors (V1…V7) and the unseen word amazin to six, most of them shared (V1…V5), so it still gets a sensible vector - see the sketch below • ELMo also handles them because it models words at the character level @galuhsahid
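A sketch of the character n-gram idea, assuming the usual "<" and ">" boundary markers that fastText adds around each word:

def char_ngrams(word, min_n=3, max_n=3):
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(padded) - n + 1)]

print(char_ngrams("amazing"))  # ['<am', 'ama', 'maz', 'azi', 'zin', 'ing', 'ng>']
print(char_ngrams("amazin"))   # ['<am', 'ama', 'maz', 'azi', 'zin', 'in>']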

Slide 77

Slide 77 text

Polysemy C H A L L E N G E The word “rock” @galuhsahid

Slide 78

Slide 78 text

Polysemy C H A L L E N G E The word “rock” https://unsplash.com/photos/I4zSNSxR8oA https://unsplash.com/photos/xssEs_oCv-A @galuhsahid

Slide 79

Slide 79 text

Polysemy C H A L L E N G E The word “rock” https://unsplash.com/photos/I4zSNSxR8oA https://unsplash.com/photos/xssEs_oCv-A @galuhsahid

Slide 80

Slide 80 text

Polysemy C H A L L E N G E The word “rock” https://unsplash.com/photos/I4zSNSxR8oA https://unsplash.com/photos/xssEs_oCv-A https://www.muscleandfitness.com/workouts/athletecelebrity- workouts/dwayne-rock-johnsons-shoulder-workout @galuhsahid

Slide 81

Slide 81 text

Polysemy C H A L L E N G E He caught a fish at the bank of the river The bank at the end of the street was robbed yesterday @galuhsahid

Slide 82

Slide 82 text

Polysemy C H A L L E N G E He caught a fish at the bank of the river The bank at the end of the street was robbed yesterday Same word, different meaning @galuhsahid

Slide 83

Slide 83 text

Polysemy C H A L L E N G E He caught a fish at the bank of the river The bank at the end of the street was robbed yesterday Same word, different meaning @galuhsahid More recent word models such as ELMo and BERT will assign different word vectors for the word “bank” because they appear in different contexts

Slide 84

Slide 84 text

Bias C H A L L E N G E https://twitter.com/zeynep/status/799662089740681217 @galuhsahid

Slide 85

Slide 85 text

Bias C H A L L E N G E @galuhsahid

Slide 86

Slide 86 text

Bias C H A L L E N G E “Our results indicate that text corpora contain recoverable and accurate imprints of our historic biases, whether morally neutral as towards insects or flowers, problematic as towards race or gender, or even simply veridical, reflecting the status quo distribution of gender with respect to careers or first names.” “Certainly, caution must be used in incorporating modules constructed via unsupervised machine learning into decision-making systems.” Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186. @galuhsahid

Slide 87

Slide 87 text

Bias C H A L L E N G E De-biasing word embeddings: Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in neural information processing systems (pp. 4349-4357). @galuhsahid

Slide 88

Slide 88 text

Bias C H A L L E N G E “We conclude that existing bias removal techniques are insufficient, and should not be trusted for providing gender-neutral modeling.” Gonen, H., & Goldberg, Y. (2019). Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them. arXiv preprint arXiv:1903.03862. @galuhsahid

Slide 89

Slide 89 text

Bias L I M I T A T I O N https://www.blog.google/products/translate/reducing-gender-bias-google-translate/ @galuhsahid

Slide 90

Slide 90 text

Beyond word2vec • Explore other techniques such as GloVe, fastText, ELMo, BERT… • Train your own word embeddings • Use word embeddings in NLP tasks (e.g. text classification with doc2vec) • Use word embeddings for tasks outside of NLP (song2vec, perhaps?) @galuhsahid

Slide 91

Slide 91 text

@galuhsahid