May 2017: Machines Learning Human Biases: How Does It Happen? Can We Unteach Them?

These are slides from my talk at Self.Conference 2017. Machine learning techniques rely on some assumptions: that the future will resemble the past, and that data is objective. Those assumptions have held up well in machine learning applications like advertising and self-driving cars. But what about applications that predict a person’s future actions and use that prediction to make a big decision about that person’s life? What if we train our machine learning systems on data containing human biases that we do not want to reinforce in the future? This talk first dives into how Google’s Word2Vec learns gender biases from its input data, and into promising work from MIT on how we can use math to ‘unteach’ the system these biases. It then looks at statistics-based prediction techniques used to make decisions in criminal sentencing: how racial bias enters these systems, the risks and consequences of exacerbating that bias, and the possibility of accounting for it so that the systems can ‘unlearn’ the bias. Throughout, we consider a set of questions we must ask when applying machine learning to make decisions about one another. The audience is invited to apply these questions to other human-focused applications such as health, hiring, insurance, finance, education, and media.

Devney

May 20, 2017

Transcript

  1. [email protected] Q: HOW DO MACHINE LEARNING TOOLS LEARN HUMAN BIASES? A: We give machine learning algorithms and statistical models raw data that was generated by human behavior . . .
  2. [email protected] Q: HOW DO MACHINE LEARNING TOOLS LEARN HUMAN BIASES? A: We give machine learning algorithms and statistical models raw data that was generated by human behavior . . . and we humans have biases shaped in a sexist, white supremacist, *ist society.
  3. [email protected] Q: CAN WE UNTEACH THEM? A: Yes . . . with some math, more research, and commitment to each other.
  4. [email protected] WHAT’S AT STAKE? When we make decisions about a person’s life based on machine learning, we risk amplifying any bias in the raw data.
  5. [email protected] WHAT’S AT STAKE? When we make decisions about a person’s life based on machine learning, we risk amplifying any bias in the raw data. In some applications, our shared humanity is at stake.
  6. [email protected] THE ROUTE FOR TODAY o Machine learning in five minutes o The curious case of Google’s Word2Vec o High-stakes predictive statistics in the US criminal system
  7. [email protected] QUESTIONS ALONG THE WAY When we apply machine learning to people . . . o Will the future resemble the past? Do we want it to? o How objective is raw data? o What is the acceptable margin of error? o Is more data always better? o How and when is potential bias amplified?
  8. [email protected] PREDICTIVE STATS IN < 5 MINUTES 1. Ask a question 2. 3. 4. 5. Hmm . . . what movies does this internet user like?
  9. [email protected] PREDICTIVE STATS IN < 5 MINUTES 1. Ask a question 2. Find data where the answer is known 3. 4. 5. . . . like browser histories of a lot of internet users and what movies they like.
  10. [email protected] PREDICTIVE STATS IN < 5 MINUTES 1. Ask a question 2. Find data where the answer is known 3. Call each answer ‘y’ and the other data ‘x’ 4. 5. y = f(x) . . . like the browser history of an internet user and what movies they like.
  11. [email protected] PREDICTIVE STATS IN < 5 MINUTES 1. Ask a question 2. Find data where the answer is known 3. Call each answer ‘y’ and the other data ‘x’ 4. Find a function f good at predicting y given x 5. y = f(x) f is what we’re ‘learning’ in machine learning. There are many techniques for learning it.
  12. [email protected] PREDICTIVE STATS IN < 5 MINUTES 1. Ask a question 2. Find data where the answer is known 3. Call each answer ‘y’ and the other data ‘x’ 4. Find a function f good at predicting y given x 5. In the future, when you only know x, compute f(x) to reasonably predict y. ? = f(x) Say I have a new internet user’s browser history. I can use the f I learned to make a reasonable prediction of what movies they like.
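For readers who want to see those five steps in code, here is a minimal sketch using scikit-learn. The movie-preference question, the example browser histories, and the labels are all invented for illustration; they stand in for the “data where the answer is known” on slides 9-12, and logistic regression is just one of the many techniques for learning f.

```python
# Minimal sketch of the five steps with scikit-learn (hypothetical data).
# Step 1: the question is "what movie genre will this internet user like?"
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Step 2: data where the answer is known (browser histories + liked genre).
# These example users and labels are invented purely for illustration.
histories = [
    "rottentomatoes imdb scifi-forum nasa.gov",
    "recipes imdb romcom-reviews fanfic",
    "nasa.gov arxiv scifi-forum robotics-blog",
]
liked_genre = ["sci-fi", "rom-com", "sci-fi"]

# Step 3: call the answers 'y' and the other data 'x'.
vectorizer = CountVectorizer()
x = vectorizer.fit_transform(histories)
y = liked_genre

# Step 4: learn a function f that is good at predicting y given x.
f = LogisticRegression().fit(x, y)

# Step 5: for a new user we only know x; compute f(x) to predict y.
new_user = vectorizer.transform(["imdb scifi-forum robotics-blog"])
print(f.predict(new_user))  # a reasonable guess at the genre they like
```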
  13. [email protected] EXAMPLE MACHINE LEARNING APPLICATIONS, each of the form y = f(x):
      APPLICATION                  | y                                      | x
      Self-driving vehicles        | Optimum next state of vehicle controls | Sensor input, history, etc.
      Targeted ads                 | Ads the user will click                | Internet history
      Statistics-based translation | Best translation                       | Input in the starting language
  14. [email protected] WHERE ARE WE NOW? o Machine learning in five minutes o The curious case of Google’s Word2Vec o High-stakes predictive statistics in the US criminal system Having fun yet?
  15. THE WHAT . . . trained Google’s Word2Vec tool using Google News as raw data. Word2Vec builds a word embedding representing the relationships between words.
  16. WHAT IS A WORD EMBEDDING? Words are arranged in vector space so that the closer two words are, the more similar they are.
  17. WHAT IS A WORD EMBEDDING? Words are arranged in vector space so that the closer two words are, the more similar they are. Directions in the space can correspond to abstract concepts.
  18. [email protected] WHAT IS A WORD EMBEDDING? y = f(x) similar words = f(a given word) a word = f(some given words) f is the learned vector space and some cosine math
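As a concrete illustration of “similar words = f(a given word)” and the cosine math on slide 18, here is a toy sketch. The three-dimensional vectors are invented for illustration; a real Word2Vec embedding has hundreds of dimensions learned from a large corpus.

```python
import numpy as np

# Toy word embedding: invented 3-d vectors, purely for illustration.
embedding = {
    "she":     np.array([ 1.0, 0.2, 0.1]),
    "he":      np.array([-1.0, 0.2, 0.1]),
    "nurse":   np.array([ 0.8, 0.9, 0.3]),
    "surgeon": np.array([-0.7, 0.9, 0.3]),
}

def cosine(u, v):
    # Cosine similarity: closer to 1.0 means the words point the same way.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def most_similar(word, k=3):
    """similar words = f(a given word): rank every other word by cosine similarity."""
    scores = {w: cosine(embedding[word], v) for w, v in embedding.items() if w != word}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(most_similar("nurse"))
```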
  19. BACK TO THE CURIOUS CASE In the word embedding, the concepts of ‘she’ and ‘he’ have directions. SHE-ALIGNED WORDS: homemaker, guidance counselor, housekeeper, librarian, nurse. HE-ALIGNED WORDS: philosopher, financier, fighter pilot, magician, architect, boss.
  20. MORE OF THE CURIOUS CASE When applied to creating analogies, it came up with analogies reflecting: GENDER SEMANTICS IN LANGUAGE she:he::mother:father she:he::convent:monastery she:he::queen:king
  21. MORE OF THE CURIOUS CASE When applied to creating analogies, it came up with analogies reflecting: GENDER SEMANTICS IN LANGUAGE she:he::mother:father she:he::convent:monastery she:he::queen:king she:he::sewing:carpentry she:he::nurse:surgeon she:he::volleyball:football
  22. MORE OF THE CURIOUS CASE When applied to creating analogies, it came up with analogies reflecting: GENDER SEMANTICS IN LANGUAGE she:he::mother:father she:he::convent:monastery she:he::queen:king GENDER STEREOTYPES she:he::sewing:carpentry she:he::nurse:surgeon she:he::volleyball:football
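Those she:he::x:y analogies come from simple vector arithmetic in the embedding. The sketch below uses the standard analogy query from the word-embedding literature (find the word closest to vec(x) - vec('she') + vec('he')), again with invented toy vectors rather than the real Google News embedding.

```python
import numpy as np

# Toy vectors, invented for illustration. The standard analogy query
# "she:he :: x:?" is answered by the word nearest to vec(x) - vec("she") + vec("he").
embedding = {
    "she":       np.array([ 1.0, 0.1, 0.0]),
    "he":        np.array([-1.0, 0.1, 0.0]),
    "nurse":     np.array([ 0.9, 0.8, 0.2]),
    "surgeon":   np.array([-0.9, 0.8, 0.2]),
    "sewing":    np.array([ 0.8, 0.1, 0.9]),
    "carpentry": np.array([-0.8, 0.1, 0.9]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, x):
    """Answer 'a:b :: x:?' by nearest neighbour to vec(x) - vec(a) + vec(b)."""
    target = embedding[x] - embedding[a] + embedding[b]
    candidates = {w: cosine(target, v) for w, v in embedding.items() if w not in (a, b, x)}
    return max(candidates, key=candidates.get)

print(analogy("she", "he", "nurse"))   # in a biased toy space: "surgeon"
print(analogy("she", "he", "sewing"))  # in a biased toy space: "carpentry"
```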
  23. . . . so what’s the big deal? It’s just reflecting the state of the world . . .
  24. [email protected] QUESTIONS ALONG THE WAY When we apply machine learning to people . . . o Will the future resemble the past? Do we want it to? o How objective is raw data? o What is the acceptable margin of error? o Is more data always better? o How and when is potential bias amplified?
  25. WILL THE FUTURE RESEMBLE THE PAST? DO WE WANT IT TO? she:he::registered nurse:physician Currently: ~91% of nurses are women, ~9% men; ~33% of physicians are women, ~67% men.
  26. HOW OBJECTIVE IS OUR RAW DATA? king:queen::man:[woman, attempted abduction, teenager, girl] Ummmm . . . Google News data. 65-75% of abductions are of girls. 81% of non-family abductions are of teenagers.
  27. [email protected] RAW DATA WITH LITTLE TO NO HUMAN BIAS Weather Outer space Geology . . . RAW DATA WITH MORE HUMAN BIAS Search terms Text Medical, criminal, educational, financial records . . .
  28. HOW AND WHEN IS POTENTIAL BIAS AMPLIFIED? she:he::homemaker:computer programmer 1) Word embeddings improve search results 2) ‘Computer science’ aligns with stereotypically male names like ‘John’
  29. HOW AND WHEN IS POTENTIAL BIAS AMPLIFIED? she:he::homemaker:computer programmer 1) Word embeddings improve search results 2) ‘Computer science’ aligns with stereotypically male names like ‘John’ 3) Grad student pages are often identical except for names 4) You search “CMU Computer Science PhDs”
  30. HOW AND WHEN IS POTENTIAL BIAS AMPLIFIED? she:he::homemaker:computer programmer 1) Word embeddings improve search results 2) ‘Computer science’ aligns with stereotypically male names like ‘John’ 3) Grad student pages are often identical except for names 4) You search “CMU Computer Science PhDs” 5) There’s only one top spot on the search page. Who gets it: John or Mary?
  31. HOW DO WE FIX IT? We can ‘debias’ the vector space with some geometry . . . position a gender-neutral term like ‘computer programmer’ so that it’s equally distant from ‘she’ and ‘he.’
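Here is a simplified sketch of that geometry, in the spirit of the ‘neutralize’ step described by Bolukbasi et al. (see sources): remove a gender-neutral word’s component along the she-to-he direction so the word ends up equally distant from ‘she’ and ‘he.’ The vectors are toy values, and the full method also equalizes pairs like she/he, which is omitted here.

```python
import numpy as np

# Toy vectors, for illustration only. 'programmer' leans toward 'he' in this space.
she = np.array([ 1.0, 0.2, 0.1])
he  = np.array([-1.0, 0.2, 0.1])
programmer = np.array([-0.6, 0.7, 0.5])

# The gender direction is the unit vector from 'she' toward 'he'.
gender_direction = he - she
gender_direction /= np.linalg.norm(gender_direction)

# Neutralize: project out the gender component from the neutral word.
debiased = programmer - (programmer @ gender_direction) * gender_direction

# Because the toy 'she' and 'he' vectors have equal length, the debiased word
# is now equally distant from both of them.
print(np.linalg.norm(debiased - she), np.linalg.norm(debiased - he))
```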
  32. WHAT WE HAVE LEARNED SO FAR: o Human biases enter raw data generated by humans. o Biases in raw data can be ‘learned’ by a machine learning algorithm. o If those biases are represented in vector space, there’s a way to remedy that.
  33. [email protected] WHERE ARE WE NOW? o Machine learning in five minutes o The curious case of Google’s Word2Vec o High-stakes predictive statistics in the US criminal system
  34. WHAT WE’RE BRINGING WITH US FROM THE WORD2VEC CASE o Human biases enter raw data generated by humans. o Biases in raw data can be ‘learned’ by a machine learning algorithm. o If those biases are represented in vector space, there’s a way to remedy that.
  35. [email protected] THE WHAT Risk assessments such as COMPAS, which assign a risk score representing the risk that someone in the criminal system will be arrested and/or convicted again in the future.
  36. [email protected] QUESTIONS THAT COME TO MIND? Risk assessments such as COMPAS, which assign a risk score representing the risk that someone in the criminal system will be arrested and/or convicted again in the future.
  37. [email protected] THE WHAT y = f(x) risk of future arrest = f(arrest & conviction history, education, social relationships, employment, zip code)
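Since COMPAS itself is proprietary (a point slide 43 returns to), the sketch below is purely hypothetical: it only illustrates the general y = f(x) shape of such a risk model, with invented features and invented training data.

```python
# Hypothetical sketch only: this is NOT COMPAS, just the general shape of a
# risk model, with made-up features and made-up historical records.
import numpy as np
from sklearn.linear_model import LogisticRegression

# x: invented features, e.g. [prior arrests, age, employed (0/1)]
x = np.array([
    [0, 35, 1],
    [4, 22, 0],
    [1, 45, 1],
    [6, 19, 0],
])
# y: whether the person was re-arrested in the (invented) historical data
y = np.array([0, 1, 0, 1])

f = LogisticRegression().fit(x, y)

# For a new defendant we only know x; f(x) becomes their 'risk score'.
new_defendant = np.array([[2, 28, 0]])
print(f.predict_proba(new_defendant)[0, 1])  # probability used as the risk score
```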
  38. [email protected] THE WHEN - at arrest, to assign bail (y is risk of failing to appear) - release on parole - sentencing?
  39. [email protected] THE OPPORTUNITIES - keep low-risk people out of the criminal system - make decisions more consistent
  40. [email protected] THE DOUBLE-EDGED SWORDS - accuracy and errors are systematic - a computer program instead of a series of people making judgment calls - limited input, no individualized assessment
  41. [email protected] THE HOLY GRAIL “We are at a unique time in history. We are being presented with the chance of a generation, and perhaps a lifetime, to reform sentencing and unwind mass incarceration in a scientific way and that opportunity is slipping away because of misinformation and misunderstanding about [risk assessment tools like COMPAS]” - Flores et al. 2016.
  42. [email protected] THE CONTROVERSY Makers of COMPAS: “no racial bias.” ProPublica’s investigation: “significant racial disparities.” Anthony Flores, professor of criminology at CSU Bakersfield: “We found no bias of any kind when we re-analysed [ProPublica’s] data . . . We didn’t necessarily disagree with their findings, we just disagreed with their conclusion.”
  43. [email protected] SOURCES OF CONTROVERSY - The model and algorithm are considered private intellectual property. - Racial disparities in the model’s predictions reflect real racial disparities in arrests. - What is an acceptable margin of error?
  44. [email protected] THE POINT OF AGREEMENT If black people in general are more likely to be re-arrested than white people, then a black defendant is more likely to be given a higher risk score than a white defendant.
  45. [email protected] THE POINT OF AGREEMENT People’s behavior is biased. People’s behavior produces biased raw data. Even an unbiased mathematical model trained on biased raw data produces results that perpetuate the bias.
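A toy calculation, with invented numbers, of why that follows: if a model knows nothing about an individual beyond their group’s historical re-arrest rate, a perfectly calibrated model simply reproduces those rates, so members of the group with the higher historical rate receive higher scores on average before anything about the individual is considered.

```python
# Toy illustration with invented numbers: historical re-arrest rates that
# differ by group (30% vs 20% here, made up) flow straight into the scores.
historical_rearrest_rate = {"group_a": 0.30, "group_b": 0.20}  # invented

def baseline_risk_score(group):
    # With no other information, a calibrated model can do no better than
    # predicting the group's historical rate.
    return historical_rearrest_rate[group]

print(baseline_risk_score("group_a"))  # 0.30 -> more likely to be labeled high risk
print(baseline_risk_score("group_b"))  # 0.20
```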
  46. [email protected] REPHRASED FOR WHITE SUPREMACY . . . People learn to favor white people in our decisions . . . producing data points where white people have fewer arrests, fewer charges, fewer convictions, lighter sentences. Even an unbiased mathematical model trained on that aggregated data produces results that perpetuate white supremacy.
  47. [email protected] QUESTIONS ALONG THE WAY When we apply machine learning to people . . . o Will the future resemble the past? Do we want it to? o How objective is raw data? o What is the acceptable margin of error? o Is more data always better? o How and when is bias amplified?
  48. [email protected] WILL THE FUTURE RESEMBLE THE PAST? DO WE WANT IT TO? WITHOUT RISK ASSESSMENT: High-risk parolee -> regular parole officer -> re-arrest
  49. [email protected] WILL THE FUTURE RESEMBLE THE PAST? DO WE WANT IT TO? WITH RISK ASSESSMENT: High-risk parolee -> better parole officer -> succeeds with re-entry
  50. [email protected] HOW OBJECTIVE IS RAW DATA? The raw data is: - arrest record - conviction record - employment - education - zip code - gender - age - friends & family’s criminal history - friends & family’s substance use
  51. [email protected] HOW OBJECTIVE IS RAW DATA? The raw data is: - arrest record - conviction record - employment - education - zip code - gender - age - friends & family’s criminal history - friends & family’s substance use
  52. [email protected] HOW OBJECTIVE IS RAW DATA? Part of the argument is that individual officers, corrections staff, and judges have bias, and statistics are more objective.
  53. [email protected] HOW OBJECTIVE IS RAW DATA? Part of the argument is that individual officers, corrections staff, and judges have bias, and statistics are more objective. These individual people are making decisions that create a person’s arrest record and conviction record.
  54. [email protected] HOW AND WHEN MIGHT BIAS BE AMPLIFIED? 1) The real world 2) Raw data (with biases in it)
  55. [email protected] HOW AND WHEN MIGHT BIAS BE AMPLIFIED? 1) The real world 2) Raw data (with biases in it) 3) Unbiased tool
  56. [email protected] HOW AND WHEN MIGHT BIAS BE AMPLIFIED? 1) The real world 2) Raw data (with biases in it) 3) Unbiased tool 4) Some real number
  57. [email protected] HOW AND WHEN MIGHT BIAS BE AMPLIFIED? 1) The real world 2) Raw data (with biases in it) 3) Unbiased tool 4) Some real number (with decimals) 5) A decision about a person’s life
  58. [email protected] HOW AND WHEN MIGHT BIAS BE AMPLIFIED? 1) The real world 2) Raw data (with biases in it) 3) Unbiased tool 4) Some real number 5) A decision about a person’s life (and their family’s life)
  59. [email protected] THE OPPORTUNITY Almost all data can be represented in a vector space. We have a precedent for de-biasing vector space.
  60. [email protected] THE OPPORTUNITY Almost all data can be represented in a vector space. We have a precedent for de-biasing vector space. Can we de-bias risk assessment data?
  61. [email protected] WHERE ARE WE NOW? o Machine learning in five minutes o The curious case of Google’s Word2Vec o High-stakes predictive statistics in the US criminal system Almost there . . .
  62. [email protected] THE BIG POINT Raw data that is generated by humans as we go about our daily lives contains whatever biases we have. Machine learning algorithms learn these biases. There are possibilities for controlling for them in machine learning. Our shared humanity depends on us doing so.
  63. [email protected] WHAT’S NEW / WHAT’S NOT - The mask of ‘objective algorithm’ - Awareness and ability to mitigate bias in our systems - Laws, curriculum, art, and custom have encoded biases & passed them down to future generations before. - We have a choice.
  64. SO WHAT DO WE DO? if (SCALED) if (DESTRUCTIVE_POSSIBILITIES > 0) require(TRANSPARENT) else allow(SECRET) end end - paraphrased from Cathy O’Neil
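One plain-Python reading of that rule, paraphrased from Cathy O’Neil; the argument names below are hypothetical stand-ins for SCALED and DESTRUCTIVE_POSSIBILITIES above.

```python
def transparency_required(scaled: bool, destructive_possibilities: int) -> bool:
    """A model that operates at scale and can damage lives must be open to scrutiny."""
    if scaled and destructive_possibilities > 0:
        return True   # require(TRANSPARENT)
    return False      # allow(SECRET)

print(transparency_required(scaled=True, destructive_possibilities=3))   # True
print(transparency_required(scaled=False, destructive_possibilities=0))  # False
```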
  65. SOURCES
      Alexander, Michelle. The New Jim Crow: Mass Incarceration in the Age of Colorblindness. The New Press. 16 January 2012.
      Angwin, Julia, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine Bias. ProPublica. 23 May 2016.
      Barry-Jester, Anna Maria, Ben Casselman, and Dana Goldstein. Should Prison Sentences Be Based on Crimes That Haven’t Been Committed Yet?. FiveThirtyEight Blog. 4 August 2015.
      Bolukbasi, Tolga, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. 21 July 2016.
      Dettmers, Tim. Deep Learning in a Nutshell: Sequence Learning. NVIDIA Blog. 7 March 2016.
      Ehrenfreund, Max. The Machines That Could Rid Courtrooms of Racism. Wonkblog at www.washingtonpost.com. 18 August 2016.
      Flores, Anthony, Christopher T. Lowenkamp, and Kristin Bechtel. False Positives, False Negatives, and False Analyses: A Rejoinder to “Machine Bias: There’s Software Used Across the Country to Predict Future Criminals. And It’s Biased Against Blacks.” Community Resources for Justice. 2016.
      Giovanni, Nikki. Quilting the Black-Eyed Pea (We’re Going to Mars) in Quilting the Black-Eyed Pea (Poems and Not Quite Poems). Harper Perennial. 2010.
  66. MORE SOURCES
      Matthews, Dylan. The black/white marijuana arrest gap, in nine charts. Wonkblog at www.washingtonpost.com. 4 June 2013.
      Oberoi, Pri. Optimizing Failure Through Machine Learning at 2016 Lesbians Who Tech New York Summit. 23 October 2016.
      Vera Institute of Justice. The Price of Prisons Fact Sheet. January 2012.
      Zhang, Christie. Cathy O’Neil, author of Weapons of Math Destruction, on the dark side of big data. Los Angeles Times. 30 December 2016.