Gender stereotypes are reflected in the distributional structure of 25 languages

mllewis
May 21, 2020

Transcript

  1. Gender stereotypes are reflected in the
    distributional structure of 25 languages
    Molly Lewis
    Carnegie Mellon University
    Computational Social Science Workshop @ U Chicago
    21 May 2020

  2. Gender disparities are pervasive
    e.g., STEM fields
    (Huang, et al., 2020)

  3. Stereotypes develop in childhood…
    [Figure: % male scientists drawn (0 to 100) by age in years (Miller et al., 2018)]

  4. Where do stereotypes come from?
    Learn a lot from language:
    The earth is round.
    Mongolia is really cold.
    Octopi have three hearts.
    Boys are better at math than girls.
    What about more implicit messages in
    language?

  5. Semantic information from word co-occurrences
    Distributional semantics: Semantic similarity between two words A and B is a
    function of the similarity of the linguistic contexts in which they appear.
    Example: "Sam ate the red apple near the red barn..."
    Co-occurrence counts for adjacent words:
             Sam   ate   the   red apple  near  barn
    Sam        0     1     0     0     0     0     0
    ate        1     0     1     0     0     0     0
    the        0     1     0     2     0     1     0
    red        0     0     2     0     1     0     1
    apple      0     0     0     1     0     1     0
    near       0     0     1     0     1     0     0
    barn       0     0     0     1     0     0     0
    ...
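A minimal sketch of how the counts on this slide could be produced, assuming a symmetric window of one word on either side (the slide does not state the window size, but the matrix is consistent with it). The function name `cooccurrence_counts` is hypothetical.

```python
from collections import Counter

def cooccurrence_counts(tokens, window=1):
    """Count symmetric co-occurrences of word pairs within `window` positions."""
    counts = Counter()
    for i, word in enumerate(tokens):
        # Look only to the right, then record the pair in both directions.
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            counts[(word, tokens[j])] += 1
            counts[(tokens[j], word)] += 1
    return counts

tokens = "Sam ate the red apple near the red barn".split()
vocab = list(dict.fromkeys(tokens))  # unique words in first-seen order
counts = cooccurrence_counts(tokens)

# Rows and columns follow the order of `vocab`.
matrix = [[counts[(row, col)] for col in vocab] for row in vocab]

for word, row in zip(vocab, matrix):
    print(f"{word:>5}", row)
```

Each row is that word's context vector; words with similar rows appear in similar contexts.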

  6. Reducing dimensionality of co-occurrence statistics
    extracts semantic information
    • Represent all words from a corpus within a k-dimensional space
    • Preserving distances between words in their local contexts
    • Similar to factor analysis
    [Figure: example space with k = 3 (Kozlowski, Taddy, & Evans, 2019)]

  7. Distributional models as learning models
    HAL (Lund & Burgess, 1996)
    LSA (Landauer & Dumais, 1997)
    Word2vec (Mikolov, Chen, Corrado, & Dean, 2013)
    GloVe (Pennington, Socher, & Manning, 2014)

    Cognitive Theory (Cognitive Science)
    Solving language tasks (Machine Learning)

  8. Humans are good at learning statistics
    • Co-occurrence statistics to identify
    words (Saffran, Aslin, & Newport, 1996)
    • Co-occurrence statistics to identify
    meanings (Smith & Yu, 2008)
    • Co-occurrence statistics in the
    visual domain (Kirkham, Slemmer, &
    Johnson, 2002)
    • Distributional statistics about
    everyday events (Griffiths & Tenenbaum,
    2006)

  9. Do humans learn social stereotypes by
    tracking distributional statistics in language?

  10. Gender stereotype
    Men - career; Women - family

  11. Implicit Association Test (IAT)
    Categories
    X = {man, male, he, him, boy}
    Y = {woman, female, she, her, girl}
    Attributes
    A = {career, salary, office, business, professional}
    B = {family, home, parents, children, cousins}
    Participants slower for incongruent mapping (right), suggesting
    bias to associate men with career.
    [Figure: two pairings of man, career, woman, and family; compare reaction time]

  12. Quantifying the implicit bias: IAT effect size
    Effect size = mean RT in incongruent condition – mean RT in congruent condition
    Bigger effect size -> bigger bias
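As a rough illustration of the definition on this slide, the effect size is a difference of mean reaction times (RT). The reaction times below are hypothetical values in milliseconds, and `iat_effect` is an illustrative name. Published IAT analyses often standardize this difference; the sketch follows the slide's simpler definition.

```python
from statistics import mean

def iat_effect(incongruent_rts, congruent_rts):
    """Mean RT in the incongruent condition minus mean RT in the congruent one."""
    return mean(incongruent_rts) - mean(congruent_rts)

# Hypothetical reaction times in milliseconds, for illustration only.
congruent = [620, 655, 700, 640]
incongruent = [780, 810, 745, 765]

effect = iat_effect(incongruent, congruent)
print(effect)  # 121.25
```

A positive value means slower responses in the incongruent condition, that is, a bias toward the congruent pairing.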

  13. Implicit psychological gender bias by country
    [Figure: implicit gender bias by country; scale from −0.04 (male−family) to 0.02 (male−career)]
    N = 764,520 participants
    (Project Implicit: Nosek, Banaji, & Greenwald, 2002)
    https://implicit.harvard.edu/implicit/

  14. Does bias in language predict bias in IAT?
    [Figure: implicit psychological gender bias by country; scale from −0.04 (male−family) to 0.02 (male−career)]
    Psychological measure (IAT)
    Language measure (word co-occurrences)
    Word embedding models trained on the 25 languages spoken by participants in the sample of countries

  15. Validating distributional statistics as encoding gender semantics
    Measure gender bias from human judgements (≈ 4,500 words; Scott et al., 2018)
    Example item: Rate the gender association of the word “home” (very masculine to very feminine)
    Word embedding model trained on corpus of movie and TV subtitles in English (Lison & Tiedemann, 2016; Van Paridon & Thompson, in prep.).
    Quantify gender association in semantic space as cosine distance
    [Figure: “home” shown in semantic space relative to male words (he, son, male, boy, his, him, brother, man) and female words (hers, daughter, female, girl, she, her, sister, woman)]
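One way the gender-association measure on this slide could be sketched: score a word by its mean cosine similarity to female anchor words minus its mean cosine similarity to male anchor words. The exact formula used in the talk is not given, so this difference score is an assumption, as are the function names and the toy 2-dimensional vectors (real embeddings have hundreds of dimensions).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def gender_association(word_vec, female_vecs, male_vecs):
    """Mean similarity to female anchors minus mean similarity to male anchors."""
    to_female = sum(cosine(word_vec, f) for f in female_vecs) / len(female_vecs)
    to_male = sum(cosine(word_vec, m) for m in male_vecs) / len(male_vecs)
    return to_female - to_male

# Hypothetical toy embeddings, for illustration only.
embeddings = {
    "he": [1.0, 0.1], "man": [0.9, 0.2],
    "she": [0.1, 1.0], "woman": [0.2, 0.9],
    "home": [0.4, 0.8],
}
male = [embeddings[w] for w in ("he", "man")]
female = [embeddings[w] for w in ("she", "woman")]

score = gender_association(embeddings["home"], female, male)
print(round(score, 3))  # positive means closer to the female anchors
```

With this sign convention, negative scores are male-associated and positive scores are female-associated.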

  16. [Figure: Human Judgement of Gender Association (1 = male to 7 = female) plotted against Linguistic Gender Association (−0.15 = male to 0.15 = female); r = 0.63]
    Replicated on model trained on English Wikipedia (r = .59)

  17. Implicit Association Test (IAT)
    Categories
    X = {man, male, he, him, boy}
    Y = {woman, female, she, her, girl}
    Attributes
    A = {career, salary, office, business, professional}
    B = {family, home, parents, children, cousins}
    [Figure: two pairings of man, career, woman, and family; compare reaction time]
    …based on word co-occurrences
    (using the same method as Caliskan, Bryson, & Narayanan, 2017)
    compare distance in semantic space

  18. Language IAT effect size
    Categories
    X* = {man, male, he, him, boy}
    Y* = {woman, female, she, her, girl}
    Attributes
    A* = {career, salary, office, business, professional}
    B* = {family, home, parents, children, cousins}
    * translated into each language
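A sketch of a language IAT effect size in the style of the word-embedding association test of Caliskan, Bryson, & Narayanan (2017): each category word gets an association score (mean cosine similarity to attribute set A minus mean cosine similarity to attribute set B), and the effect size is the standardized difference between the X and Y scores. The function names, the toy 2-dimensional vectors, and the use of the sample standard deviation are assumptions made for illustration.

```python
import math
from statistics import mean, stdev

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def association(w, A, B):
    """How much closer word vector w is to attribute set A than to set B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)

def language_iat_effect(X, Y, A, B):
    """Standardized difference in association between category sets X and Y."""
    x_scores = [association(x, A, B) for x in X]
    y_scores = [association(y, A, B) for y in Y]
    return (mean(x_scores) - mean(y_scores)) / stdev(x_scores + y_scores)

# Hypothetical toy vectors standing in for embeddings of the word sets.
X = [[1.0, 0.1], [0.9, 0.2]]  # male category words
Y = [[0.1, 1.0], [0.2, 0.9]]  # female category words
A = [[0.8, 0.3], [0.9, 0.1]]  # career attribute words
B = [[0.3, 0.8], [0.2, 0.9]]  # family attribute words

effect = language_iat_effect(X, Y, A, B)
print(effect > 0)  # True: male words sit closer to career words in this toy space
```

Swapping X and Y flips the sign, so a positive value here corresponds to a male-career, female-family association.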

  19. [Figure, panel a: Implicit and Linguistic Male−Career Association; r = 0.48. Axes: Language Male−Career Association (effect size, weaker to stronger) and Implicit Male−Career Association (residualized effect size, weaker to stronger). Point size: N participants (1,000 to 100,000). Languages plotted: Arabic, Danish, German, English, Spanish, Persian, Finnish, French, Hebrew, Hindi, Croatian, Indonesian, Italian, Japanese, Korean, Malay, Dutch, Norwegian, Polish, Portuguese, Romanian, Swedish, Filipino, Turkish, Mandarin]
    [Figure, panel b (partly cropped): implicit and linguistic associations in British and American participants. Axes: Language Association (effect size) and Implicit Association Difference (residualized effect size, US greater to UK greater). Points are attribute pairs such as Athletic−Intelligent, Career−Family, Money−Love, and Urban−Rural]

  20. What about gender information encoded
    more explicitly in language?
    Grammatical gender
    “enfermero”
    (nurse-MASC)
    “enfermera”
    (nurse-FEM)
    Lexicalized gender
    “waiter”
    (waiter-MASC)
    “waitress”
    (waiter-FEM)

  21. Explicit linguistic gender and implicit bias
    • 12/25 languages encoded gender grammatically
    • Languages that encoded gender grammatically did not have more implicit bias (Mdiff = 0.01 [-0.01, 0.03]).
    • In contrast, languages with more lexicalized gender forms have more implicit bias.
    [Figure, panel a: Implicit Male−Career Association and Gendered Occupation Terms; r = 0.57. Axes: Prop. Gender−Specific Occupation Terms (0 to 1) and Implicit Male−Career Association (residualized effect size, weaker to stronger). One point per language]
    [Figure, panel b (cropped): Implicit Male−Career Association (residualized effect size)]

  22. Summary
    Statistical associations in language reflect human judgements of gender associations
    Speakers of languages with stronger statistical associations between men-career and women-family tend to have a stronger bias when measured via the IAT
    Speakers of languages with more lexicalized gender distinctions (but not grammatical gender) tend to have a stronger bias in the IAT.
    [Figures recapped from earlier slides:
    Implicit Male−Career Association and Gendered Occupation Terms (Prop. Gender−Specific Occupation Terms), r = 0.57;
    Implicit Male−Career Association and Gender Associations for Occupation Terms (Genderness of Occupation Terms, neutral to gendered), r = 0.49;
    Implicit and Linguistic Male−Career Association, r = 0.48;
    Implicit and Linguistic Associations in British and American Participants (Language Association Difference vs. Implicit Association Difference, US greater to UK greater), r = 0.3;
    Human Judgement of Gender Association vs. Linguistic Gender Association, r = 0.63]

  23. Do humans learn social stereotypes by
    tracking distributional statistics in language?
    Evidence for a correspondence between human semantic
    knowledge and distributional statistics (necessary but not
    sufficient)
    Testing the causal question and other implications

  24. Is the link causal?
    • All the evidence I’ve presented so far is correlational
    • Likely bi-directional
    • What kind of evidence might we bring to bear on this?
    • Longitudinal analyses: e.g., testing whether changes in language statistics
    predict or follow changes in measured implicit associations (Greenwald, 2017;
    Charlesworth & Banaji, 2019)
    • Quasi-experimental tests: e.g., measuring implicit associations in bilinguals
    using stimuli in languages that embed different linguistic associations
    • Experimental designs: measure the effect of manipulating language
    statistics on people’s implicit associations.
    [Diagram labels: Distributional statistics; Human semantic representations]

  25. Gender bias in children’s books
    If biases are learned from language, expect them to be present
    in the input to people who are learning the biases (i.e. children)
    [Figure: Language IAT Bias in Different Corpora. Language IAT effect size (−0.5 to 1.5) by bias type: male−career, male−math (vs. language), male−math (vs. art). Corpus legend cropped]

  26. Language gender bias and other causal
    forces contributing to gender differences

  27. Language gender bias and other causal
    forces contributing to gender differences
    Students were asked to report on how easy they thought it would be
    for them to:
    • recognize the science question that underlies a
    newspaper report on a health issue
    • explain why earthquakes occur more frequently in
    some areas than in others
    • describe the role of antibiotics in the treatment of
    disease
    • etc.

  28. Do humans learn social stereotypes by
    tracking distributional statistics in language?
    • Evidence for a close correspondence
    between human semantic knowledge
    and distributional statistics in the case
    of a particular stereotype: men-career;
    women-family
    • Consistent with the idea that language is playing a causal role in shaping
    social stereotypes.
    • Strongly correlated with other hypothesized “psychological” causal factors
    • Additional work needs to be done to more directly test causality, and
    relationship to other causal forces
    • Suggests that intervening on language input could change biases

  29. Thanks!
    Gary Lupyan
    (U. of Wisconsin-Madison)
