
Machines Learning Human Biases: How Does It Happen? Can We Unteach Them?

Devney
January 30, 2017


Machine learning and predictive statistics systems learn human biases when they rely on data generated by humans. Humans generate data like text, search terms, browser histories, grades and employment status, criminal and financial records, etc. This talk draws on exciting work at MIT about how Google’s word2vec learns gender bias from Google News training data. Then comes the good news: we can use vector math to ‘debias’ word2vec’s learned representation of English words and their relationships. The second half of the talk explores the use of predictive statistics in criminal sentencing. It opens up questions we need to ask when applying machine learning to humans: 1) Will the future resemble the past? Do we want it to? 2) How objective is raw data? 3) Is there an acceptable margin of error? 4) Is more data always better? 5) How and when is a slight bias in raw data amplified by a machine learning application? The audience is encouraged to pursue the opportunity to use de-biasing in such sensitive applications, and the opportunity for deep discussions about how we want to treat each other in light of the data and biases we live with.


Transcript

  1. Q: HOW DO MACHINE LEARNING TOOLS LEARN HUMAN BIASES? A:

    We give machine learning algorithms and statistical models raw data that was generated by human behavior . . .
  2. Q: HOW DO MACHINE LEARNING TOOLS LEARN HUMAN BIASES? A:

    We give machine learning algorithms and statistical models raw data that was generated by human behavior . . . . . . humans that have biases.
  3. Q: CAN WE UNTEACH THEM? A: Yes . . .

    . . . with some math, more research, and commitment to each other.
  4. WHAT’S AT STAKE? When we make decisions about a person’s

    life based on machine learning, we risk amplifying any bias in the raw data.
  5. WHAT’S AT STAKE? When we make decisions about a person’s

    life based on machine learning, we risk amplifying any bias in the raw data. In some applications, our shared humanity is at stake.
  6. THE ROUTE FOR TODAY o Machine learning in five minutes

    o The curious case of Google’s Word2Vec o High stakes predictive statistics in the US criminal system
  7. QUESTIONS ALONG THE WAY o Will the future resemble the

    past? Do we want it to? o How objective is raw data? o What is the acceptable margin of error? o Is more data always better? o How and when is potential bias amplified? When we apply machine learning to people . . .
  8. PREDICTIVE STATS IN < 5 MINUTES 1. Ask a question

    2. 3. 4. 5. Hmm. . . . what movies does this internet user like?
  9. PREDICTIVE STATS IN < 5 MINUTES 1. Ask a question

    2. Find data where the answer is known 3. 4. 5. . . . like browser histories of a lot of internet users and what movies they like.
  10. PREDICTIVE STATS IN < 5 MINUTES 1. Ask a question

    2. Find data where the answer is known 3. Call each answer ‘y’ and the other data ‘x’ 4. 5. y = f(x) . . . like the browser history of an internet user and what movies they like.
  11. PREDICTIVE STATS IN < 5 MINUTES 1. Ask a question

    2. Find data where the answer is known 3. Call each answer ‘y’ and the other data ‘x’ 4. Find a function f good at predicting y given x 5. y = f(x) f is what we’re ‘learning’ in machine learning. There are many techniques for learning it.
  12. PREDICTIVE STATS IN < 5 MINUTES 1. Ask a question

    2. Find data where the answer is known 3. Call each answer ‘y’ and the other data ‘x’ 4. Find a function f good at predicting y given x 5. In the future when you only know x, compute f(x) to reasonably predict y. ? = f(x) Say I have a new internet user’s browser history. I can use the f I learned to make a reasonable prediction of what movies they like.
  13. PREDICTIVE STATS IN < 5 MINUTES 1. Ask a question

    2. Find data where the answer is known 3. Call each answer ‘y’ and the other data ‘x’ 4. Find a function f good at predicting y given x 5. In the future when you only know x, compute f(x) to reasonably predict y. y = f(x) Say I have a new internet user’s browser history. I can use the f I learned to make a reasonable prediction of what movies they like.
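    The five steps above map almost directly onto code. Below is a minimal sketch in Python, assuming a scikit-learn-style workflow; the browser-history features, movie labels, and numbers are invented for illustration, not taken from any real system.

        # Sketch of steps 2-5 for the movie question, using scikit-learn
        # (a library choice assumed here, not named in the talk). All data is made up.
        from sklearn.linear_model import LogisticRegression

        # Step 2: data where the answer is already known.
        # x: toy browser-history features [sci-fi site visits, rom-com site visits]
        x_known = [[12, 1], [0, 9], [8, 2], [1, 11]]
        # y: the known answer, i.e. which kind of movie each user likes
        y_known = ["sci-fi", "rom-com", "sci-fi", "rom-com"]

        # Step 4: find a function f that is good at predicting y given x.
        f = LogisticRegression().fit(x_known, y_known)

        # Step 5: for a new user where only x is known, f(x) is a reasonable prediction of y.
        new_user = [[10, 3]]
        print(f.predict(new_user))  # -> ['sci-fi']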
  14. EXAMPLE MACHINE LEARNING APPLICATIONS

    APPLICATION | y | x
    Self-driving vehicles | Optimum next state of vehicle controls | Sensor input, history, etc
    Targeted ads | Ads user will click | Internet history
    Statistics-based translation | Best translation | Input in starting language
    y = f(x)
  15. ASSUMPTIONS IN THIS PROCESS o The future will resemble the

    past o Some error rate is acceptable
  16. WHERE ARE WE NOW? o Machine learning in five minutes

    o The curious case of Google’s Word2Vec o High stakes predictive statistics in the US criminal system having fun yet?
  17. THE WHAT . . . trained Google’s Word2Vec tool using

    Google News as raw data. Word2Vec builds a word embedding representing the relationships between words.
  18. WHAT IS A WORD EMBEDDING? Words are arranged in vector

    space so that the closer two words are, the more similar they are.
  19. WHAT IS A WORD EMBEDDING? Words are arranged in vector

    space so that the closer two words are, the more similar they are. Directions in the space can correspond to abstract concepts.
  20. WHAT IS A WORD EMBEDDING? y = f(x) similar words

    = f(a given word) a word = f(some given words) f is the learned vector space and some cosine math
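    A toy illustration of the "cosine math": cosine similarity in a tiny, hand-written 3-dimensional embedding. The vectors and the use of numpy are assumptions for illustration; real word2vec vectors have hundreds of dimensions and are learned from data.

        import numpy as np

        # Invented 3-d vectors, for illustration only.
        embedding = {
            "king":  np.array([0.9, 0.8, 0.1]),
            "queen": np.array([0.9, 0.1, 0.8]),
            "apple": np.array([-0.6, 0.1, 0.2]),
        }

        def cosine_similarity(u, v):
            # Close to 1 means "pointing the same way", i.e. similar words.
            return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

        print(cosine_similarity(embedding["king"], embedding["queen"]))  # higher
        print(cosine_similarity(embedding["king"], embedding["apple"]))  # lower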
  21. BACK TO THE CURIOUS CASE In the word embedding, the

    concepts of ‘she’ and ‘he’ have directions. SHE-ALIGNED WORDS: Homemaker, Guidance Counselor, Housekeeper, Librarian, Nurse. HE-ALIGNED WORDS: Philosopher, Financier, Fighter pilot, Magician, Architect, Boss.
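    One way to picture ‘she’ and ‘he’ having directions: take the difference she - he as a gender direction and project other words onto it. A hypothetical sketch with invented 2-dimensional vectors; the word lists above come from the real, learned embedding.

        import numpy as np

        # Invented 2-d vectors for illustration; real embeddings are high-dimensional.
        vec = {
            "she":       np.array([ 1.0, 0.2]),
            "he":        np.array([-1.0, 0.2]),
            "nurse":     np.array([ 0.7, 0.5]),
            "architect": np.array([-0.6, 0.6]),
        }

        gender_direction = vec["she"] - vec["he"]            # points from "he" toward "she"
        gender_direction /= np.linalg.norm(gender_direction)

        for word in ("nurse", "architect"):
            score = float(np.dot(vec[word], gender_direction))
            print(word, "she-aligned" if score > 0 else "he-aligned", round(score, 2))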
  22. MORE OF CURIOUS CASE When applied to creating analogies, it

    came up with analogies reflecting: GENDER SEMANTICS IN LANGUAGE she:he::mother:father she:he::convent:monastery she:he::queen:king
  23. MORE OF THE CURIOUS CASE When applied to creating analogies,

    it came up with analogies reflecting: GENDER SEMANTICS IN LANGUAGE she:he::mother:father she:he::convent:monastery she:he::queen:king she:he::sewing:carpentry she:he::nurse:surgeon she:he::volleyball:football
  24. MORE OF THE CURIOUS CASE When applied to creating analogies,

    it came up with analogies reflecting: GENDER SEMANTICS IN LANGUAGE she:he::mother:father she:he::convent:monastery she:he::queen:king she:he::sewing:carpentry she:he::nurse:surgeon she:he::volleyball:football GENDER STEREOTYPES
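    The analogies come from vector arithmetic of roughly the form x - she + he, followed by a nearest-word lookup. A sketch of that standard word2vec-style query using gensim; the library choice, the model file name, and the assumption that the publicly distributed Google News vectors have been downloaded are all mine, and the paper behind this talk uses a more careful scoring than plain arithmetic.

        from gensim.models import KeyedVectors

        # Assumes the pretrained Google News word2vec file is available locally.
        wv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

        # she : he :: nurse : ?   ->   words nearest to (nurse - she + he)
        print(wv.most_similar(positive=["nurse", "he"], negative=["she"], topn=3))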
  25. . . . . so what’s the big deal? It’s

    just reflecting the state of the world . . .
  26. QUESTIONS ALONG THE WAY o Will the future resemble the

    past? Do we want it to? o How objective is raw data? o What is the acceptable margin of error? o Is more data always better? o How and when is potential bias amplified? When we apply machine learning to people . . .
  27. WILL THE FUTURE RESEMBLE THE PAST? DO WE WANT IT

    TO? she:he::registered nurse:physician Currently: ~91% of nurses are women, ~9% are men; ~33% of physicians are women, ~67% are men.
  28. HOW OBJECTIVE IS OUR RAW DATA? king:queen::man:[woman, attempted abduction,

    teenager, girl] Ummmm . . . Google News data. 65-75% of abductions are of girls. 81% of non-family abductions are of teenagers.
  29. RAW DATA WITH LITTLE TO NO HUMAN BIAS Weather Outer

    space Geology . . . RAW DATA WITH MORE HUMAN BIAS Search terms Text Medical, criminal, educational, financial records . . .
  30. HOW AND WHEN IS POTENTIAL BIAS AMPLIFIED? she:he::homemaker:computer programmer 1)

    word embeddings improve search results 2) ‘computer science’ aligns with stereotypically male names like ‘John’
  31. HOW AND WHEN IS POTENTIAL BIAS AMPLIFIED? she:he::homemaker:computer programmer 1)

    word embeddings improve search results 2) ‘computer science’ aligns with stereotypically male names like ‘John’ 3) Grad student pages are often identical except for names 4) You search “CMU Computer Science PhDs”
  32. HOW AND WHEN IS POTENTIAL BIAS AMPLIFIED? she:he::homemaker:computer programmer 1)

    word embeddings improve search results 2) ‘computer science’ aligns with stereotypically male names like ‘John’ 3) Grad student pages are often identical except for names 4) You search “CMU Computer Science PhDs” 5) There’s only one top spot on the search page. Who gets it - John? or Mary?
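    A hypothetical sketch of that amplification mechanism: if a ranker breaks ties between otherwise-identical pages using embedding similarity to the query, a small bias in the vectors decides who gets the single top spot. The vectors, names, and scoring rule below are invented for illustration.

        import numpy as np

        def cosine(u, v):
            return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

        # Invented 2-d vectors in which "john" sits slightly closer to "computer_science".
        vec = {
            "computer_science": np.array([1.0, 0.0]),
            "john":             np.array([0.9, 0.1]),
            "mary":             np.array([0.7, 0.4]),
        }

        # Two grad-student pages, identical except for the name on them.
        pages = ["john", "mary"]
        ranked = sorted(pages, key=lambda name: cosine(vec[name], vec["computer_science"]),
                        reverse=True)
        print(ranked)  # -> ['john', 'mary']: the slight vector bias picks the top result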
  33. HOW DO WE FIX IT? We can ‘debias’ the vector

    space with some geometry . . . . . . position a gender-neutral term like ‘computer programmer’ so that it’s equally distant from ‘she’ and ‘he.’
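    A minimal sketch of that geometry, in the spirit of the Bolukbasi et al. "neutralize" step: subtract a gender-neutral word's component along the she-he direction so it ends up equally distant from ‘she’ and ‘he.’ The 2-d vectors are invented, and the full method also decides which words should stay gendered and "equalizes" pairs like she/he.

        import numpy as np

        vec = {                                    # invented 2-d vectors, for illustration
            "she":                 np.array([ 1.0, 0.2]),
            "he":                  np.array([-1.0, 0.2]),
            "computer_programmer": np.array([-0.5, 0.8]),
        }

        g = vec["she"] - vec["he"]
        g = g / np.linalg.norm(g)                  # unit-length gender direction

        def neutralize(v, g):
            # Remove the component of v that lies along the gender direction.
            return v - np.dot(v, g) * g

        debiased = neutralize(vec["computer_programmer"], g)
        print(np.dot(debiased, g))                             # ~0: no gender component left
        print(np.linalg.norm(debiased - vec["she"]),           # now equally distant from
              np.linalg.norm(debiased - vec["he"]))            # "she" and "he"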
  34. WHAT WE HAVE LEARNED SO FAR: o Human biases enter

    raw data generated by humans. o Biases in raw data can be ‘learned’ by a machine learning algorithm o If those biases are represented in vector space, there’s a way to remedy that.
  35. WHERE ARE WE NOW? o Machine learning in five minutes

    o The curious case of Google’s Word2Vec o High stakes predictive statistics in the US criminal system
  36. WHAT WE’RE BRINGING WITH US FROM THE WORD2VEC CASE o

    Human biases enter raw data generated by humans. o Biases in raw data can be ‘learned’ by a machine learning algorithm o If those biases are represented in vector space, there’s a way to remedy that.
  37. THE WHAT Risk assessments such as COMPAS, which assign a

    risk score representing the risk that someone in the criminal system will be arrested and/or convicted again in the future.
  38. PAUSE - QUESTIONS THAT COME TO MIND? Risk assessments such

    as COMPAS, which assign a risk score representing the risk that someone in the criminal system will be arrested and/or convicted again in the future.
  39. THE WHAT y = f(x) risk of future arrest =

    f(arrest & conviction history, education, social relationships, employment, zip code)
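    A hypothetical sketch of what a model of this general shape could look like. COMPAS itself is proprietary, so this is not its actual model; the features, data, and choice of logistic regression are assumptions for illustration only.

        from sklearn.linear_model import LogisticRegression

        # Invented training rows: [prior arrests, age, employed (1/0)]
        x_known = [[0, 35, 1], [5, 22, 0], [1, 40, 1], [7, 19, 0], [2, 30, 1], [4, 25, 0]]
        y_known = [0, 1, 0, 1, 0, 1]   # 1 = re-arrested later, 0 = not (made up)

        model = LogisticRegression().fit(x_known, y_known)

        # "Risk score" for a new person: the model's predicted probability of re-arrest.
        new_person = [[3, 24, 0]]
        print(model.predict_proba(new_person)[0][1])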
  40. THE WHEN - arrest, to assign bail (y is risk

    of failing to appear) - release on parole - sentencing?
  41. THE OPPORTUNITIES - keep low-risk people out of the criminal

    system - make decisions more consistent
  42. THE DOUBLE-EDGED SWORDS - accuracy and errors are systematic -

    a computer program instead of a series of people making judgement calls - limited input, no individualized assessment
  43. THE HOLY GRAIL “We are at a unique time in history.

    We are being presented with the chance of a generation, and perhaps a lifetime, to reform sentencing and unwind mass incarceration in a scientific way and that opportunity is slipping away because of misinformation and misunderstanding about [risk assessment tools like COMPAS]” - Flores et al 2016.
  44. THE CONTROVERSY Makers of COMPAS: “no racial bias.” ProPublica’s investigation:

    “significant racial disparities.” Anthony Flores, Prof of Criminology, CSU Bakersfield: "We found no bias of any kind when we re-analysed [ProPublica’s] data . . . We didn't necessarily disagree with their findings, we just disagreed with their conclusion."
  45. SOURCES OF CONTROVERSY - The model and algorithm are considered

    private intellectual property. - Racial disparities in the model’s predictions reflect real racial disparities in arrests. - What is an acceptable margin of error?
  46. THE POINT OF AGREEMENT If black people in general are

    more likely to be re-arrested, then a black defendant is more likely to be given a higher risk score.
  47. THE POINT OF AGREEMENT People’s behavior is biased. People’s behavior

    produces biased raw data. Even an un-biased mathematical model trained on biased raw data produces results that perpetuate the bias.
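    A toy simulation of that point of agreement: if the raw data records one group being re-arrested more often (for whatever reason, including biased policing), then even a model that never sees group membership assigns that group higher scores through correlated features. Every number, feature, and distribution below is invented.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        n = 2000

        # Invented world: group B (group == 1) is policed more heavily, so the raw data
        # records more prior arrests and more re-arrests for it.
        group = rng.integers(0, 2, size=n)
        prior_arrests = rng.poisson(1 + 2 * group)
        rearrested = rng.binomial(1, np.clip(0.2 + 0.1 * prior_arrests, 0, 0.9))

        # The model sees only prior arrests, never "group" . . .
        model = LogisticRegression().fit(prior_arrests.reshape(-1, 1), rearrested)
        scores = model.predict_proba(prior_arrests.reshape(-1, 1))[:, 1]

        # . . . yet group B still gets higher risk scores on average.
        print("mean score, group A:", scores[group == 0].mean())
        print("mean score, group B:", scores[group == 1].mean())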
  48. QUESTIONS ALONG THE WAY o Will the future resemble the

    past? Do we want it to? o How objective is raw data? o What is the acceptable margin of error? o Is more data always better? o How and when is potential bias amplified? When we apply machine learning to people . . .
  49. WILL THE FUTURE RESEMBLE THE PAST? DO WE WANT IT

    TO? High risk parolee -> regular parole officer -> re-arrest WITHOUT RISK ASSESSMENT
  50. WILL THE FUTURE RESEMBLE THE PAST? DO WE WANT IT

    TO? High risk parolee -> better parole officer -> succeeds with re-entry WITH RISK ASSESSMENT
  51. HOW OBJECTIVE IS RAW DATA? The raw data is: -

    arrest record - conviction record - employment - education - zip code - gender - age - friends & family’s criminal history - friends & family’s substance use
  52. HOW OBJECTIVE IS RAW DATA? Part of the argument is

    that individual officers, corrections staff, and judges have bias, and statistics are more objective. These individual people are making decisions that create a person’s arrest record and conviction record.
  53. HOW OBJECTIVE IS RAW DATA? Part of the argument is

    that individual officers, corrections staff, and judges have bias, and statistics are more objective. These individual people are making decisions that create a person’s arrest record and conviction record.
  54. HOW AND WHEN MIGHT BIAS BE AMPLIFIED? 1) The real

    world 2) Raw data (with biases in it)
  55. HOW AND WHEN MIGHT BIAS BE AMPLIFIED? 1) The real

    world 2) Raw data (with biases in it) 3) Unbiased tool
  56. HOW AND WHEN MIGHT BIAS BE AMPLIFIED? 1) The real

    world 2) Raw data (with biases in it) 3) Unbiased tool 4) Some real number
  57. HOW AND WHEN MIGHT BIAS BE AMPLIFIED? 1) The real

    world 2) Raw data (with biases in it) 3) Unbiased tool 4) Some real number 5) A decision about a person’s life
  58. HOW AND WHEN MIGHT BIAS BE AMPLIFIED? 1) The real

    world 2) Raw data (with biases in it) 3) Unbiased tool 4) Some real number 5) A decision about a person’s life (and their family’s life)
  59. THE OPPORTUNITY Almost all data can be represented in a

    vector space. We have a precedent for de-biasing vector space.
  60. THE OPPORTUNITY Almost all data can be represented in a

    vector space. We have a precedent for de-biasing vector space. Can we de-bias risk assessment data?
  61. WHERE ARE WE NOW? o Machine learning in five minutes

    o The curious case of Google’s Word2Vec o High stakes predictive statistics in the US criminal system ready for more fun?
  62. THE BIG POINT Raw data that is generated by humans

    as we go about our daily lives contains whatever biases we have. Machine learning algorithms learn these biases. There are possibilities for controlling for them in machine learning. Our shared humanity depends on us doing so.
  63. WHAT’S NEW WHAT’S NOT - The mask of ‘objective algorithm’

    - Awareness and ability to mitigate bias in our systems - Laws, curriculum, art, and custom have encoded biases & passed them down to future generations before. - We have a choice.
  64. SO WHAT DO WE DO?

    if (SCALED)
      if (DESTRUCTIVE_POSSIBILITIES > 0)
        require(TRANSPARENT)
      else
        allow(SECRET)
      end
    end
    - paraphrased from Cathy O’Neil
  65. SOURCES

    Alexander, Michelle. The New Jim Crow: Mass Incarceration in the Age of Colorblindness. The New Press. 16 January 2012.
    Angwin, Julia, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine Bias. ProPublica. 23 May 2016.
    Barry-Jester, Anna Maria, Ben Casselman, and Dana Goldstein. Should Prison Sentences Be Based on Crimes That Haven’t Been Committed Yet? FiveThirtyEight Blog. 4 Aug 2015.
    Bolukbasi, Tolga, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. 21 July 2016.
    Dettmers, Tim. Deep Learning in a Nutshell: Sequence Learning. NVIDIA Blog. 7 March 2016.
    Ehrenfreund, Max. The Machines That Could Rid Courtrooms of Racism. Wonkblog at www.washingtonpost.com. 18 Aug 2016.
    Flores, Anthony, Christopher T. Lowenkamp, and Kristin Bechtel. False Positives, False Negatives, and False Analyses: A Rejoinder to “Machine Bias: There’s Software Used Across the Country to Predict Future Criminals. And it’s Biased Against Blacks.” Community Resources for Justice.
    Giovanni, Nikki. Quilting the Black-Eyed Pea (We're Going to Mars) in Quilting the Black-Eyed Pea (Poems and Not Quite Poems). Harper Perennial. 2010.
    Larson, Jeff, Surya Mattu, Lauren Kirchner, and Julia Angwin. How We Analyzed the COMPAS Recidivism Algorithm. ProPublica. 23 May 2016.
  66. MORE SOURCES

    Matthews, Dylan. The black/white marijuana arrest gap, in nine charts. Wonkblog at www.washingtonpost.com. 4 June 2013.
    Oberoi, Pri. Optimizing Failure Through Machine Learning at 2016 Lesbians Who Tech New York Summit. 23 October 2016.
    Vera Institute of Justice. The Price of Prisons Fact Sheet. January 2012.
    Zhang, Christie. Cathy O’Neil, author of Weapons of Math Destruction, on the dark side of big data. Los Angeles Times. 30 December 2016.