May 2017: Machines Learning Human Biases: How Does It Happen? Can We Unteach Them?

These are slides from my talk at Self.Conference 2017. Machine learning techniques rely on some assumptions: that the future will resemble the past, and that data is objective. Those assumptions have held up well in machine learning applications like advertising and self-driving cars. But what about applications that predict a person’s future actions and use that prediction to make a big decision about that person’s life? What if we train our machine learning systems on data containing human biases that we do not want to reinforce in the future? This talk first dives into how Google’s Word2Vec learns gender biases from its input data, and into promising work from MIT on how we can use math to ‘unteach’ the system these biases. It then looks at statistics-based prediction techniques used to make decisions in criminal sentencing: how racial bias enters these systems, the risks and consequences of exacerbating that bias, and the possibility of accounting for it so that the systems can ‘unlearn’ the bias. Throughout, we consider a set of questions we must ask when applying machine learning to make decisions about one another. The audience is invited to apply these questions to other human-focused applications such as health, hiring, insurance, finance, education, and media.

Devney

May 20, 2017

Transcript

  1. [email protected] Q: HOW DO MACHINE LEARNING TOOLS LEARN HUMAN BIASES? A: We give machine learning algorithms and statistical models raw data that was generated by human behavior . . .
  2. [email protected] Q: HOW DO MACHINE LEARNING TOOLS LEARN HUMAN BIASES? A: We give machine learning algorithms and statistical models raw data that was generated by human behavior . . . and we humans have biases shaped in a sexist, white supremacist, *ist society.
  3. [email protected] Q: CAN WE UNTEACH THEM? A: Yes . . . with some math, more research, and commitment to each other.
  4. [email protected] WHAT’S AT STAKE? When we make decisions about a person’s life based on machine learning, we risk amplifying any bias in the raw data.
  5. [email protected] WHAT’S AT STAKE? When we make decisions about a person’s life based on machine learning, we risk amplifying any bias in the raw data. In some applications, our shared humanity is at stake.
  6. [email protected] THE ROUTE FOR TODAY o Machine learning in five minutes o The curious case of Google’s Word2Vec o High-stakes predictive statistics in the US criminal system
  7. [email protected] QUESTIONS ALONG THE WAY When we apply machine learning to people . . . o Will the future resemble the past? Do we want it to? o How objective is raw data? o What is the acceptable margin of error? o Is more data always better? o How and when is potential bias amplified?
  8. [email protected] PREDICTIVE STATS IN < 5 MINUTES 1. Ask a question 2. 3. 4. 5. Hmm . . . what movies does this internet user like?
  9. [email protected] PREDICTIVE STATS IN < 5 MINUTES 1. Ask a question 2. Find data where the answer is known 3. 4. 5. . . . like browser histories of a lot of internet users and what movies they like.
  10. [email protected] PREDICTIVE STATS IN < 5 MINUTES 1. Ask a question 2. Find data where the answer is known 3. Call each answer ‘y’ and the other data ‘x’ 4. 5. y = f(x) . . . like the browser history of an internet user and what movies they like.
  11. [email protected] PREDICTIVE STATS IN < 5 MINUTES 1. Ask a question 2. Find data where the answer is known 3. Call each answer ‘y’ and the other data ‘x’ 4. Find a function f good at predicting y given x 5. y = f(x) f is what we’re ‘learning’ in machine learning. There are many techniques for learning it.
  12. [email protected] PREDICTIVE STATS IN < 5 MINUTES 1. Ask a question 2. Find data where the answer is known 3. Call each answer ‘y’ and the other data ‘x’ 4. Find a function f good at predicting y given x 5. In the future, when you only know x, compute f(x) to reasonably predict y. ? = f(x) Say I have a new internet user’s browser history. I can use the f I learned to make a reasonable prediction of what movies they like.
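For readers who want to see those five steps in code, here is a minimal sketch using scikit-learn. The movie-preference question, the example browser histories, and the labels are all invented for illustration; they stand in for the “data where the answer is known” on slides 9-12, and logistic regression is just one of the many techniques for learning f.

```python
# Minimal sketch of the five steps with scikit-learn (hypothetical data).
# Step 1: the question is "what movie genre will this internet user like?"
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Step 2: data where the answer is known (browser histories + liked genre).
# These example users and labels are invented purely for illustration.
histories = [
    "rottentomatoes imdb scifi-forum nasa.gov",
    "recipes imdb romcom-reviews fanfic",
    "nasa.gov arxiv scifi-forum robotics-blog",
]
liked_genre = ["sci-fi", "rom-com", "sci-fi"]

# Step 3: call the answers 'y' and the other data 'x'.
vectorizer = CountVectorizer()
x = vectorizer.fit_transform(histories)
y = liked_genre

# Step 4: learn a function f that is good at predicting y given x.
f = LogisticRegression().fit(x, y)

# Step 5: for a new user we only know x; compute f(x) to predict y.
new_user = vectorizer.transform(["imdb scifi-forum robotics-blog"])
print(f.predict(new_user))  # a reasonable guess at the genre they like
```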
  13. [email protected] EXAMPLE MACHINE LEARNING APPLICATIONS, each of the form y = f(x):
      APPLICATION                  | y                                      | x
      Self-driving vehicles        | Optimum next state of vehicle controls | Sensor input, history, etc.
      Targeted ads                 | Ads the user will click                | Internet history
      Statistics-based translation | Best translation                       | Input in the starting language
  14. [email protected] WHERE ARE WE NOW? o Machine learning in five minutes o The curious case of Google’s Word2Vec o High-stakes predictive statistics in the US criminal system Having fun yet?
  15. THE WHAT . . . trained Google’s Word2Vec tool using Google News as raw data. Word2Vec builds a word embedding representing the relationships between words.
  16. WHAT IS A WORD EMBEDDING? Words are arranged in vector space so that the closer two words are, the more similar they are.
  17. WHAT IS A WORD EMBEDDING? Words are arranged in vector space so that the closer two words are, the more similar they are. Directions in the space can correspond to abstract concepts.
  18. [email protected] WHAT IS A WORD EMBEDDING? y = f(x) similar words = f(a given word) a word = f(some given words) f is the learned vector space and some cosine math
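As a concrete illustration of “similar words = f(a given word)” and the cosine math on slide 18, here is a toy sketch. The three-dimensional vectors are invented for illustration; a real Word2Vec embedding has hundreds of dimensions learned from a large corpus.

```python
import numpy as np

# Toy word embedding: invented 3-d vectors, purely for illustration.
embedding = {
    "she":     np.array([ 1.0, 0.2, 0.1]),
    "he":      np.array([-1.0, 0.2, 0.1]),
    "nurse":   np.array([ 0.8, 0.9, 0.3]),
    "surgeon": np.array([-0.7, 0.9, 0.3]),
}

def cosine(u, v):
    # Cosine similarity: closer to 1.0 means the words point the same way.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def most_similar(word, k=3):
    """similar words = f(a given word): rank every other word by cosine similarity."""
    scores = {w: cosine(embedding[word], v) for w, v in embedding.items() if w != word}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(most_similar("nurse"))
```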
  19. BACK TO THE CURIOUS CASE In the word embedding, the concepts of ‘she’ and ‘he’ have directions. SHE-ALIGNED WORDS: homemaker, guidance counselor, housekeeper, librarian, nurse. HE-ALIGNED WORDS: philosopher, financier, fighter pilot, magician, architect, boss.
  20. MORE OF THE CURIOUS CASE When applied to creating analogies, it came up with analogies reflecting: GENDER SEMANTICS IN LANGUAGE she:he::mother:father she:he::convent:monastery she:he::queen:king
  21. MORE OF THE CURIOUS CASE When applied to creating analogies, it came up with analogies reflecting: GENDER SEMANTICS IN LANGUAGE she:he::mother:father she:he::convent:monastery she:he::queen:king she:he::sewing:carpentry she:he::nurse:surgeon she:he::volleyball:football
  22. MORE OF THE CURIOUS CASE When applied to creating analogies, it came up with analogies reflecting: GENDER SEMANTICS IN LANGUAGE she:he::mother:father she:he::convent:monastery she:he::queen:king GENDER STEREOTYPES she:he::sewing:carpentry she:he::nurse:surgeon she:he::volleyball:football
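Those she:he::x:y analogies come from simple vector arithmetic in the embedding. The sketch below uses the standard analogy query from the word-embedding literature (find the word closest to vec(x) - vec('she') + vec('he')), again with invented toy vectors rather than the real Google News embedding.

```python
import numpy as np

# Toy vectors, invented for illustration. The standard analogy query
# "she:he :: x:?" is answered by the word nearest to vec(x) - vec("she") + vec("he").
embedding = {
    "she":       np.array([ 1.0, 0.1, 0.0]),
    "he":        np.array([-1.0, 0.1, 0.0]),
    "nurse":     np.array([ 0.9, 0.8, 0.2]),
    "surgeon":   np.array([-0.9, 0.8, 0.2]),
    "sewing":    np.array([ 0.8, 0.1, 0.9]),
    "carpentry": np.array([-0.8, 0.1, 0.9]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, x):
    """Answer 'a:b :: x:?' by nearest neighbour to vec(x) - vec(a) + vec(b)."""
    target = embedding[x] - embedding[a] + embedding[b]
    candidates = {w: cosine(target, v) for w, v in embedding.items() if w not in (a, b, x)}
    return max(candidates, key=candidates.get)

print(analogy("she", "he", "nurse"))   # in a biased toy space: "surgeon"
print(analogy("she", "he", "sewing"))  # in a biased toy space: "carpentry"
```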
  23. . . . so what’s the big deal? It’s just reflecting the state of the world . . .
  24. [email protected] QUESTIONS ALONG THE WAY When we apply machine learning to people . . . o Will the future resemble the past? Do we want it to? o How objective is raw data? o What is the acceptable margin of error? o Is more data always better? o How and when is potential bias amplified?
  25. WILL THE FUTURE RESEMBLE THE PAST? DO WE WANT IT TO? she:he::registered nurse:physician Currently: ~91% of nurses are women, ~9% men; ~33% of physicians are women, ~67% men.
  26. HOW OBJECTIVE IS OUR RAW DATA? king:queen::man:[woman, attempted abduction, teenager, girl] Ummmm . . . Google News data. 65-75% of abductions are of girls. 81% of non-family abductions are of teenagers.
  27. [email protected] RAW DATA WITH LITTLE TO NO HUMAN BIAS Weather Outer space Geology . . . RAW DATA WITH MORE HUMAN BIAS Search terms Text Medical, criminal, educational, financial records . . .
  28. HOW AND WHEN IS POTENTIAL BIAS AMPLIFIED? she:he::homemaker:computer programmer 1) Word embeddings improve search results 2) ‘Computer science’ aligns with stereotypically male names like ‘John’
  29. HOW AND WHEN IS POTENTIAL BIAS AMPLIFIED? she:he::homemaker:computer programmer 1) Word embeddings improve search results 2) ‘Computer science’ aligns with stereotypically male names like ‘John’ 3) Grad student pages are often identical except for names 4) You search “CMU Computer Science PhDs”
  30. HOW AND WHEN IS POTENTIAL BIAS AMPLIFIED? she:he::homemaker:computer programmer 1) Word embeddings improve search results 2) ‘Computer science’ aligns with stereotypically male names like ‘John’ 3) Grad student pages are often identical except for names 4) You search “CMU Computer Science PhDs” 5) There’s only one top spot on the search page. Who gets it: John or Mary?
  31. HOW DO WE FIX IT? We can ‘debias’ the vector space with some geometry . . . position a gender-neutral term like ‘computer programmer’ so that it’s equally distant from ‘she’ and ‘he.’
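Here is a simplified sketch of that geometry, in the spirit of the ‘neutralize’ step described by Bolukbasi et al. (see sources): remove a gender-neutral word’s component along the she-to-he direction so the word ends up equally distant from ‘she’ and ‘he.’ The vectors are toy values, and the full method also equalizes pairs like she/he, which is omitted here.

```python
import numpy as np

# Toy vectors, for illustration only. 'programmer' leans toward 'he' in this space.
she = np.array([ 1.0, 0.2, 0.1])
he  = np.array([-1.0, 0.2, 0.1])
programmer = np.array([-0.6, 0.7, 0.5])

# The gender direction is the unit vector from 'she' toward 'he'.
gender_direction = he - she
gender_direction /= np.linalg.norm(gender_direction)

# Neutralize: project out the gender component from the neutral word.
debiased = programmer - (programmer @ gender_direction) * gender_direction

# Because the toy 'she' and 'he' vectors have equal length, the debiased word
# is now equally distant from both of them.
print(np.linalg.norm(debiased - she), np.linalg.norm(debiased - he))
```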
  32. WHAT WE HAVE LEARNED SO FAR: o Human biases enter raw data generated by humans. o Biases in raw data can be ‘learned’ by a machine learning algorithm. o If those biases are represented in vector space, there’s a way to remedy that.
  33. [email protected] WHERE ARE WE NOW? o Machine learning in five minutes o The curious case of Google’s Word2Vec o High-stakes predictive statistics in the US criminal system
  34. WHAT WE’RE BRINGING WITH US FROM THE WORD2VEC CASE o Human biases enter raw data generated by humans. o Biases in raw data can be ‘learned’ by a machine learning algorithm. o If those biases are represented in vector space, there’s a way to remedy that.
  35. [email protected] THE WHAT Risk assessments such as COMPAS, which assign a risk score representing the risk that someone in the criminal system will be arrested and/or convicted again in the future.
  36. [email protected] QUESTIONS THAT COME TO MIND? Risk assessments such as COMPAS, which assign a risk score representing the risk that someone in the criminal system will be arrested and/or convicted again in the future.
  37. [email protected] THE WHAT y = f(x) risk of future arrest = f(arrest & conviction history, education, social relationships, employment, zip code)
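Since COMPAS itself is proprietary (a point slide 43 returns to), the sketch below is purely hypothetical: it only illustrates the general y = f(x) shape of such a risk model, with invented features and invented training data.

```python
# Hypothetical sketch only: this is NOT COMPAS, just the general shape of a
# risk model, with made-up features and made-up historical records.
import numpy as np
from sklearn.linear_model import LogisticRegression

# x: invented features, e.g. [prior arrests, age, employed (0/1)]
x = np.array([
    [0, 35, 1],
    [4, 22, 0],
    [1, 45, 1],
    [6, 19, 0],
])
# y: whether the person was re-arrested in the (invented) historical data
y = np.array([0, 1, 0, 1])

f = LogisticRegression().fit(x, y)

# For a new defendant we only know x; f(x) becomes their 'risk score'.
new_defendant = np.array([[2, 28, 0]])
print(f.predict_proba(new_defendant)[0, 1])  # probability used as the risk score
```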
  38. [email protected] THE WHEN - at arrest, to assign bail (y is risk of failing to appear) - release on parole - sentencing?
  39. [email protected] THE OPPORTUNITIES - keep low-risk people out of the criminal system - make decisions more consistent
  40. [email protected] THE DOUBLE-EDGED SWORDS - accuracy and errors are systematic - a computer program instead of a series of people making judgment calls - limited input, no individualized assessment
  41. [email protected] THE HOLY GRAIL “We are at a unique time in history. We are being presented with the chance of a generation, and perhaps a lifetime, to reform sentencing and unwind mass incarceration in a scientific way and that opportunity is slipping away because of misinformation and misunderstanding about [risk assessment tools like COMPAS]” - Flores et al. 2016.
  42. [email protected] THE CONTROVERSY Makers of COMPAS: “no racial bias.” ProPublica’s investigation: “significant racial disparities.” Anthony Flores, professor of criminology at CSU Bakersfield: “We found no bias of any kind when we re-analysed [ProPublica’s] data . . . We didn’t necessarily disagree with their findings, we just disagreed with their conclusion.”
  43. [email protected] SOURCES OF CONTROVERSY - The model and algorithm are considered private intellectual property. - Racial disparities in the model’s predictions reflect real racial disparities in arrests. - What is an acceptable margin of error?
  44. [email protected] THE POINT OF AGREEMENT If black people in general are more likely to be re-arrested than white people, then a black defendant is more likely to be given a higher risk score than a white defendant.
  45. [email protected] THE POINT OF AGREEMENT People’s behavior is biased. People’s behavior produces biased raw data. Even an unbiased mathematical model trained on biased raw data produces results that perpetuate the bias.
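A toy calculation, with invented numbers, of why that follows: if a model knows nothing about an individual beyond their group’s historical re-arrest rate, a perfectly calibrated model simply reproduces those rates, so members of the group with the higher historical rate receive higher scores on average before anything about the individual is considered.

```python
# Toy illustration with invented numbers: historical re-arrest rates that
# differ by group (30% vs 20% here, made up) flow straight into the scores.
historical_rearrest_rate = {"group_a": 0.30, "group_b": 0.20}  # invented

def baseline_risk_score(group):
    # With no other information, a calibrated model can do no better than
    # predicting the group's historical rate.
    return historical_rearrest_rate[group]

print(baseline_risk_score("group_a"))  # 0.30 -> more likely to be labeled high risk
print(baseline_risk_score("group_b"))  # 0.20
```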
  46. [email protected] REPHRASED FOR WHITE SUPREMACY . . . People learn to favor white people in our decisions . . . producing data points where white people have fewer arrests, fewer charges, fewer convictions, lighter sentences. Even an unbiased mathematical model trained on that aggregated data produces results that perpetuate white supremacy.
  47. [email protected] QUESTIONS ALONG THE WAY When we apply machine learning to people . . . o Will the future resemble the past? Do we want it to? o How objective is raw data? o What is the acceptable margin of error? o Is more data always better? o How and when is bias amplified?
  48. [email protected] WILL THE FUTURE RESEMBLE THE PAST? DO WE WANT IT TO? WITHOUT RISK ASSESSMENT: High-risk parolee -> regular parole officer -> re-arrest
  49. [email protected] WILL THE FUTURE RESEMBLE THE PAST? DO WE WANT IT TO? WITH RISK ASSESSMENT: High-risk parolee -> better parole officer -> succeeds with re-entry
  50. [email protected] HOW OBJECTIVE IS RAW DATA? The raw data is: - arrest record - conviction record - employment - education - zip code - gender - age - friends & family’s criminal history - friends & family’s substance use
  51. [email protected] HOW OBJECTIVE IS RAW DATA? The raw data is: - arrest record - conviction record - employment - education - zip code - gender - age - friends & family’s criminal history - friends & family’s substance use
  52. [email protected] HOW OBJECTIVE IS RAW DATA? Part of the argument is that individual officers, corrections staff, and judges have bias, and statistics are more objective.
  53. [email protected] HOW OBJECTIVE IS RAW DATA? Part of the argument is that individual officers, corrections staff, and judges have bias, and statistics are more objective. These individual people are making decisions that create a person’s arrest record and conviction record.
  54. [email protected] HOW AND WHEN MIGHT BIAS BE AMPLIFIED? 1) The real world 2) Raw data (with biases in it)
  55. [email protected] HOW AND WHEN MIGHT BIAS BE AMPLIFIED? 1) The real world 2) Raw data (with biases in it) 3) Unbiased tool
  56. [email protected] HOW AND WHEN MIGHT BIAS BE AMPLIFIED? 1) The real world 2) Raw data (with biases in it) 3) Unbiased tool 4) Some real number
  57. [email protected] HOW AND WHEN MIGHT BIAS BE AMPLIFIED? 1) The real world 2) Raw data (with biases in it) 3) Unbiased tool 4) Some real number (with decimals) 5) A decision about a person’s life
  58. [email protected] HOW AND WHEN MIGHT BIAS BE AMPLIFIED? 1) The real world 2) Raw data (with biases in it) 3) Unbiased tool 4) Some real number 5) A decision about a person’s life (and their family’s life)
  59. [email protected] THE OPPORTUNITY Almost all data can be represented in a vector space. We have a precedent for de-biasing vector space.
  60. [email protected] THE OPPORTUNITY Almost all data can be represented in a vector space. We have a precedent for de-biasing vector space. Can we de-bias risk assessment data?
  61. [email protected] WHERE ARE WE NOW? o Machine learning in five minutes o The curious case of Google’s Word2Vec o High-stakes predictive statistics in the US criminal system Almost there . . .
  62. [email protected] THE BIG POINT Raw data that is generated by humans as we go about our daily lives contains whatever biases we have. Machine learning algorithms learn these biases. There are possibilities for controlling for them in machine learning. Our shared humanity depends on us doing so.
  63. [email protected] WHAT’S NEW / WHAT’S NOT - The mask of ‘objective algorithm’ - Awareness and ability to mitigate bias in our systems - Laws, curriculum, art, and custom have encoded biases & passed them down to future generations before. - We have a choice.
  64. SO WHAT DO WE DO? if (SCALED) if (DESTRUCTIVE_POSSIBILITIES > 0) require(TRANSPARENT) else allow(SECRET) end end - paraphrased from Cathy O’Neil
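One plain-Python reading of that rule, paraphrased from Cathy O’Neil; the argument names below are hypothetical stand-ins for SCALED and DESTRUCTIVE_POSSIBILITIES above.

```python
def transparency_required(scaled: bool, destructive_possibilities: int) -> bool:
    """A model that operates at scale and can damage lives must be open to scrutiny."""
    if scaled and destructive_possibilities > 0:
        return True   # require(TRANSPARENT)
    return False      # allow(SECRET)

print(transparency_required(scaled=True, destructive_possibilities=3))   # True
print(transparency_required(scaled=False, destructive_possibilities=0))  # False
```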
  65. SOURCES
      Alexander, Michelle. The New Jim Crow: Mass Incarceration in the Age of Colorblindness. The New Press. 16 January 2012.
      Angwin, Julia, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine Bias. ProPublica. 23 May 2016.
      Barry-Jester, Anna Maria, Ben Casselman, and Dana Goldstein. Should Prison Sentences Be Based on Crimes That Haven’t Been Committed Yet?. FiveThirtyEight Blog. 4 August 2015.
      Bolukbasi, Tolga, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. 21 July 2016.
      Dettmers, Tim. Deep Learning in a Nutshell: Sequence Learning. NVIDIA Blog. 7 March 2016.
      Ehrenfreund, Max. The Machines That Could Rid Courtrooms of Racism. Wonkblog at www.washingtonpost.com. 18 August 2016.
      Flores, Anthony, Christopher T. Lowenkamp, and Kristin Bechtel. False Positives, False Negatives, and False Analyses: A Rejoinder to “Machine Bias: There’s Software Used Across the Country to Predict Future Criminals. And It’s Biased Against Blacks.” Community Resources for Justice. 2016.
      Giovanni, Nikki. Quilting the Black-Eyed Pea (We’re Going to Mars) in Quilting the Black-Eyed Pea (Poems and Not Quite Poems). Harper Perennial. 2010.
  66. MORE SOURCES
      Matthews, Dylan. The black/white marijuana arrest gap, in nine charts. Wonkblog at www.washingtonpost.com. 4 June 2013.
      Oberoi, Pri. Optimizing Failure Through Machine Learning at 2016 Lesbians Who Tech New York Summit. 23 October 2016.
      Vera Institute of Justice. The Price of Prisons Fact Sheet. January 2012.
      Zhang, Christie. Cathy O’Neil, author of Weapons of Math Destruction, on the dark side of big data. Los Angeles Times. 30 December 2016.