Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

fatml
November 18, 2016

Transcript

  1. computer programmer (kəmˈpjuːtə ˈprəʊɡræmə) n., pl., -s 1. A man who writes programs for the operation of computers, especially as an occupation.
     nurse (ˈnərs) n., pl., -s 1. A woman trained to care for the sick or infirm, especially in a hospital.
  2. [Figure: embedding words: programmer, javascript, python, engineer, computer_science, Mary, John, she, he, softball, football]
     1) Mary Jones: She was valedictorian. Softball team captain. Javascript, python…
     2) John Jones: He was valedictorian. Football team captain. Javascript, python…
  3. The embedding captures gender stereotypes and sexism. (Related: [Schmidt '15])
     [Figure labels: MALE, FEMALE, SEXIST, DEFINITIONAL]
  4. Easier to debias an embedding than to debias a human. (Related: [Schmidt '15])
     [Figure labels: MALE, FEMALE, SEXIST, DEFINITIONAL; plotted words: tote, browsing, tanning, scrimmage, dress, sewing, brilliant, nurse, cocky, genius, homemaker]
  5. Analogies
     Based on word2vec trained on the Google News corpus.
     Parallelograms capture semantics [MikolovYZ 13]:
     • Man:King :: Woman:Queen
     • Paris:France :: Tokyo:Japan
     • He:Blue :: She:Pink
     • He:Brother :: She:Sister
     • He:Doctor :: She:Nurse
     • He:Sausage :: She:Buns
     • He:Realist :: She:Feminist
     • She:Pregnancy :: He:Kidney stone
     • He:Computer programmer :: She:Homemaker
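The parallelogram idea behind these analogies can be sketched with plain vector arithmetic: to complete a:b :: c:d, look for the word closest to b - a + c. The snippet below is a minimal illustration only; the 2-D vectors and the `complete_analogy` helper are invented for this example and are not the talk's word2vec data or code.

```python
import math

# Toy 2-D vectors invented for illustration; real word2vec vectors have hundreds of dimensions.
emb = {
    "man": [1.0, 0.0],
    "woman": [0.0, 1.0],
    "king": [2.0, 1.0],
    "queen": [1.0, 2.0],
    "apple": [-1.0, -1.0],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def complete_analogy(a, b, c, emb):
    """Return the word d that best completes a:b :: c:d, i.e. d close to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

print(complete_analogy("man", "king", "woman", emb))
```

Excluding the three query words from the candidates is a common convention in analogy demos; without it the nearest word is often one of the inputs.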
  6. Metric 1: occupations. 327 gender-neutral occupations. Project onto the she−he direction.
     [Figure: she−he axis; occupations shown: homemaker, nurse, receptionist, maestro, boss, philosopher]
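As a rough sketch of this metric, each occupation vector can be normalized and projected onto the unit she−he direction; positive values lean toward "she", negative toward "he". The vectors and the `gender_projection` helper below are hypothetical stand-ins, not the talk's data or implementation.

```python
import math

# Hypothetical 2-D vectors; the talk uses word2vec trained on Google News.
emb = {
    "she": [0.0, 1.0],
    "he": [1.0, 0.0],
    "homemaker": [0.2, 0.9],
    "philosopher": [0.9, 0.3],
    "boss": [0.6, 0.5],
}

def unit(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def gender_projection(word, emb):
    """Scalar projection of the unit word vector onto the unit she-he direction."""
    direction = unit([s - h for s, h in zip(emb["she"], emb["he"])])
    return sum(a * b for a, b in zip(unit(emb[word]), direction))

occupations = ["homemaker", "philosopher", "boss"]
# Sort so that the most "she"-leaning occupation comes first.
ranked = sorted(occupations, key=lambda w: gender_projection(w, emb), reverse=True)
print(ranked)
```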
  7. Metric 1: occupations. 327 gender-neutral occupations. Project onto the she−he direction.
     Crowdworkers rate each occupation for gender stereotype.
     Corr(projection onto she−he, crowd rating) = [value missing in transcript]
     [Figure: she−he axis; occupations shown: homemaker, nurse, receptionist, maestro, boss, philosopher]
  8. Consistency of embedding stereotype
     [Figure: scatter plot; axis labels: word2vec trained on Google News, GloVe trained on web crawl]
     Each dot is an occupation; Spearman = 0.8.
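For readers unfamiliar with the statistic, Spearman correlation compares the rank order of two lists rather than their raw values. The sketch below uses the textbook tie-free formula on made-up projection values; the numbers are illustrative and are not the data behind the slide's figure.

```python
def ranks(values):
    """Rank values from 1 to n (assumes no ties, to keep the sketch short)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(xs, ys):
    """Spearman rank correlation for tie-free data: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Made-up gender projections of the same five occupations in two embeddings.
w2v_proj = [0.30, 0.25, 0.05, -0.10, -0.20]
glove_proj = [0.28, 0.31, 0.02, -0.15, -0.20]
print(spearman(w2v_proj, glove_proj))
```

A value near 1 means the two embeddings order the occupations almost identically along the gender direction.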
  9. Metric 2: analogies. Automatically generate he : x :: she : y analogies.
     [Figure: she, he, sister, brother; labels: x =, y =]
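  10. Metric 2: analogies. Automatically generate he : x :: she : y analogies.
      $\min \cos(\mathit{he} - \mathit{she},\ x - y)$ such that $\|x - y\| <$ [threshold missing in transcript]
      [Figure: she, he, sister, brother; annotation: small angle; labels: x =, y =]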
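The slide's formula pairs a cosine term between he−she and x−y with a bound on the norm of x−y, but parts of it did not survive in the transcript. The sketch below is therefore an interpretation: it ranks candidate pairs by how well x−y aligns with he−she (reading the "small angle" annotation as the goal) and keeps only pairs closer than a threshold. The name `delta`, its value, and the toy vectors are assumptions made for illustration.

```python
import math
from itertools import permutations

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def cosine(u, v):
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

def analogy_pairs(emb, a="he", b="she", delta=1.0):
    """Rank (x, y) for a : x :: b : y by cos(a - b, x - y), keeping only ||x - y|| < delta."""
    seed = sub(emb[a], emb[b])
    words = [w for w in emb if w not in (a, b)]
    scored = []
    for x, y in permutations(words, 2):
        diff = sub(emb[x], emb[y])
        if 0 < norm(diff) < delta:  # hypothetical closeness threshold
            scored.append((cosine(seed, diff), x, y))
    return sorted(scored, reverse=True)

# Toy 3-D vectors invented for illustration.
emb = {
    "he": [1.0, 0.0, 0.0],
    "she": [0.0, 1.0, 0.0],
    "brother": [0.9, 0.1, 0.5],
    "sister": [0.1, 0.9, 0.5],
    "pizza": [0.5, 0.5, -0.8],
}
best_score, x, y = analogy_pairs(emb, delta=1.2)[0]
print(f"he : {x} :: she : {y}")
```

The distance bound is what keeps unrelated words out: in the toy data, "pizza" is too far from both "brother" and "sister" to form a pair.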
  11. Metric 2: analogies. Automatically generate he : x :: she : y analogies.
      $\min \cos(\mathit{he} - \mathit{she},\ x - y)$ such that $\|x - y\| <$ [threshold missing in transcript]
      [Figure: she, he, sister, brother, homemaker, programmer, cupcake, pizza]
  12. Metric 2: analogies. Automatically generate he : x :: she : y analogies.
      $\min \cos(\mathit{he} - \mathit{she},\ x - y)$ such that $\|x - y\| <$ [threshold missing in transcript]
      [Figure: she, he, sister, brother, cupcake, pizza]
      29/150 analogies rated as gender stereotypic by a majority of crowdworkers.
  13. Metric 3: indirect bias.
      • Gender stereotype could affect the geometry between words that should be gender-neutral.
      • Project occupations onto the softball−football axis.
      [Figure: softball−football axis; occupations shown: pitcher, receptionist, footballer, maestro]
  14. The geometry of gender
      Select pairs of words that reflect gender opposites:
      she / he, her / his, woman / man, female / male, Mary / John
  15. The geometry of gender
      Select pairs of words that reflect gender opposites:
      she / he, her / his, woman / man, female / male, Mary / John
      Principal components
  16. Geometry of gender
      [Figure: % of variance explained by principal components 1 to 10]
      The top PC seems to capture the gender subspace B.
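One way to picture this step: center each gender pair at its midpoint, collect the offsets, and look for the direction that explains most of their variance. The sketch below uses power iteration in place of a full PCA to find only the top component, on invented 3-D vectors; `pair_offsets` and `top_component` are hypothetical helper names, and this is not the authors' implementation.

```python
import math

def pair_offsets(emb, pairs):
    """Center each pair at its midpoint; return each word's offset from that midpoint."""
    offsets = []
    for a, b in pairs:
        mid = [(x + y) / 2 for x, y in zip(emb[a], emb[b])]
        offsets.append([x - m for x, m in zip(emb[a], mid)])
        offsets.append([y - m for y, m in zip(emb[b], mid)])
    return offsets

def top_component(vectors, iters=200):
    """Top principal direction and its share of variance, via power iteration."""
    dim = len(vectors[0])
    # Scatter matrix C = sum of outer products v v^T.
    C = [[sum(v[i] * v[j] for v in vectors) for j in range(dim)] for i in range(dim)]
    b = list(vectors[0])  # start from a data vector; assumes it is not orthogonal to the top direction
    for _ in range(iters):
        b = [sum(C[i][j] * b[j] for j in range(dim)) for i in range(dim)]
        n = math.sqrt(sum(x * x for x in b))
        b = [x / n for x in b]
    eigval = sum(b[i] * C[i][j] * b[j] for i in range(dim) for j in range(dim))
    explained = eigval / sum(C[i][i] for i in range(dim))
    return b, explained

# Toy 3-D vectors invented for illustration.
emb = {
    "she": [0.0, 1.0, 0.2], "he": [1.0, 0.0, 0.2],
    "woman": [0.1, 0.9, 0.5], "man": [0.9, 0.2, 0.4],
    "her": [0.0, 0.8, -0.3], "his": [0.8, 0.1, -0.2],
}
pairs = [("she", "he"), ("woman", "man"), ("her", "his")]
direction, explained = top_component(pair_offsets(emb, pairs))
print(direction, explained)
```

In the toy data nearly all of the variance falls on one direction, which mirrors the slide's observation that the top PC dominates.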
  17. Debiasing algorithm (ver. 1)
      1. Identify words that are gender-neutral (N) and gender-definitional (S).
      2. Project away the gender subspace from the gender-neutral words.
      3. Normalize vectors.
      $w := w - w \cdot B$ for $w \in N$; $B$ is the gender subspace.
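The projection and normalization steps can be sketched as follows, under the simplifying assumption that the gender subspace B is a single direction `g` (here taken as she−he). The `hard_debias` helper and the toy vectors are illustrative names and data, not the authors' code.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hard_debias(emb, neutral_words, g):
    """Remove the component along direction g from each neutral word, then renormalize."""
    g = normalize(g)
    debiased = dict(emb)  # gender-definitional words are left untouched
    for w in neutral_words:
        proj = dot(emb[w], g)
        debiased[w] = normalize([x - proj * gi for x, gi in zip(emb[w], g)])
    return debiased

# Toy 3-D vectors invented for illustration.
emb = {
    "he": [1.0, 0.0, 0.0],
    "she": [0.0, 1.0, 0.0],
    "programmer": [0.8, 0.2, 0.6],
    "homemaker": [0.1, 0.9, 0.4],
}
g = [s - h for s, h in zip(emb["she"], emb["he"])]  # assumed one-dimensional gender subspace
out = hard_debias(emb, ["programmer", "homemaker"], g)
print(dot(out["programmer"], normalize(g)))
```

After this step the neutral words have no component left along `g`, so they are equally aligned with "he" and "she" in the toy example.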
  18. Identify gender-definitional words
      [Figure: he, she, king, queen, programmer, homemaker, smart, cute, blue, pink]
      Linear SVM; 218 gender-definitional words.
  19. Projecting away gender component
      [Figure: he, she, king, queen, programmer, homemaker, smart, cute, blue, pink; labels: B, B]
  20. Projecting away gender component
      [Figure: he, she, king, queen, homemaker, smart, cute, blue, pink, programmer; labels: B, B, 299 dimensions]
      "Hard debiasing"
  21. Advanced debiasing
      Find a linear transformation T of the gender-neutral words to reduce the gender component while not moving the words too much.
      $\min_T \|(TW)^\top (TW) - W^\top W\|_F + \|(TN)^\top (TB)\|_F$
      First term: don't move too much. Second term: minimize gender component.
      W = matrix of all word vectors; N = matrix of neutral word vectors.
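To make the trade-off concrete, the sketch below only evaluates the two terms of such an objective for a given T; it does not optimize anything. It assumes word vectors are stored as matrix columns and that B is a single unit direction. The `objective` helper and the tiny 2-D matrices are invented for illustration.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob(A):
    """Frobenius norm of a matrix."""
    return math.sqrt(sum(x * x for row in A for x in row))

def objective(T, W, N, B):
    """Return (movement term, gender term) for a candidate transformation T."""
    TW, TN, TB = matmul(T, W), matmul(T, N), matmul(T, B)
    gram_new = matmul(transpose(TW), TW)
    gram_old = matmul(transpose(W), W)
    move = frob([[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(gram_new, gram_old)])
    gender = frob(matmul(transpose(TN), TB))
    return move, gender

s = 1 / math.sqrt(2)
W = [[1.0, 0.0, 0.8], [0.0, 1.0, 0.6]]   # columns: he, she, programmer (toy values)
N = [[0.8], [0.6]]                       # column: programmer
B = [[-s], [s]]                          # assumed unit gender direction
identity = [[1.0, 0.0], [0.0, 1.0]]
projection = [[0.5, 0.5], [0.5, 0.5]]    # I - B B^T, removes the gender direction entirely
print(objective(identity, W, N, B))
print(objective(projection, W, N, B))
```

The identity leaves inner products unchanged but keeps the gender component, while the full projection removes the gender component at the cost of moving the words; the objective on the slide looks for a T between these extremes.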
  22. Debiasing results: analogies
      Debiasing reduced stereotypic analogies while preserving the utility of the embedding.
      [Figure axis labels: # analogies generated, # stereotypic analogies; # analogies generated, # appropriate analogies]
  23. Debiasing results: indirect bias
      [Figure: original embedding, softball−football axis: pitcher, receptionist, footballer, maestro]
  24. Debiasing results: indirect bias
      [Figure: original embedding, softball−football axis: pitcher, receptionist, footballer, maestro]
      [Figure: debiased embedding, softball−football axis: pitcher, major leaguer, footballer, midfielder]
  25. Other types of embedding biases
      Embedding captures other types of biases: cultural, religious, ethnic, etc.
      Example:
      • Occupation words closest to minorities: butler, footballer.
      • Occupation words closest to whites: legislator, lawyer.
      Talk to Max Leiserson!
  26. computer programmer (kəmˈpjuːtə ˈprəʊɡræmə) n., pl., -s 1. A man who writes programs for the operation of computers, especially as an occupation.
      nurse (ˈnərs) n., pl., -s 1. A woman trained to care for the sick or infirm, especially in a hospital.
      Debias word embedding → debias machine learning's dictionary
  27. Discussion points
      • Who's responsible: data, algorithm or user?
      • Geometry captures bias.
      • Fairness and transparency of embeddings (interpretable).
      • Using debiased embedding for downstream applications.
      Paper: Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. NIPS'16.
  28. Thanks!
      • Who's responsible: data, algorithm or user?
      • Geometry captures bias.
      • Fairness and transparency of embeddings (interpretable).
      • Using debiased embedding for downstream applications.
      Paper: Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. NIPS'16.