
Machines Learning Human Biases: How Does It Happen? Can We Unteach Them?

Devney
January 30, 2017


Machine learning and predictive statistics systems learn human biases when they rely on data generated by humans. Humans generate data like text, search terms, browser histories, grades and employment status, criminal and financial records, etc. This talk draws on exciting work at MIT about how Google’s word2vec learns gender bias from Google News training data. Then comes the good news: we can use vector math to ‘debias’ word2vec’s learned representation of English words and their relationships. The second half of the talk explores the use of predictive statistics in criminal sentencing. It opens up questions we need to ask when applying machine learning to humans: 1) Will the future resemble the past? Do we want it to? 2) How objective is raw data? 3) Is there an acceptable margin of error? 4) Is more data always better? 5) How and when is a slight bias in raw data amplified by a machine learning application? The audience is encouraged to pursue the opportunity to use de-biasing in such sensitive applications, and the opportunity for deep discussions about how we want to treat each other in light of the data and biases we live with.


Transcript

  1. Q: HOW DO MACHINE LEARNING TOOLS LEARN HUMAN BIASES? A:

    We give machine learning algorithms and statistical models raw data that was generated by human behavior . . .
  2. Q: HOW DO MACHINE LEARNING TOOLS LEARN HUMAN BIASES? A:

    We give machine learning algorithms and statistical models raw data that was generated by human behavior . . . . . . humans that have biases.
  3. Q: CAN WE UNTEACH THEM? A: Yes . . .

    . . . with some math, more research, and commitment to each other.
  4. WHAT’S AT STAKE? When we make decisions about a person’s

    life based on machine learning, we risk amplifying any bias in the raw data.
  5. WHAT’S AT STAKE? When we make decisions about a person’s

    life based on machine learning, we risk amplifying any bias in the raw data. In some applications, our shared humanity is at stake.
  6. THE ROUTE FOR TODAY o Machine learning in five minutes

    o The curious case of Google’s Word2Vec o High stakes predictive statistics in the US criminal system
  7. QUESTIONS ALONG THE WAY o Will the future resemble the

    past? Do we want it to? o How objective is raw data? o What is the acceptable margin of error? o Is more data always better? o How and when is potential bias amplified? When we apply machine learning to people . . .
  8. PREDICTIVE STATS IN < 5 MINUTES 1. Ask a question

    2. 3. 4. 5. Hmm. . . . what movies does this internet user like?
  9. PREDICTIVE STATS IN < 5 MINUTES 1. Ask a question

    2. Find data where the answer is known 3. 4. 5. . . . like browser histories of a lot of internet users and what movies they like.
  10. PREDICTIVE STATS IN < 5 MINUTES 1. Ask a question

    2. Find data where the answer is known 3. Call each answer ‘y’ and the other data ‘x’ 4. 5. y = f(x) . . . like the browser history of an internet user and what movies they like.
  11. PREDICTIVE STATS IN < 5 MINUTES 1. Ask a question

    2. Find data where the answer is known 3. Call each answer ‘y’ and the other data ‘x’ 4. Find a function f good at predicting y given x 5. y = f(x) f is what we’re ‘learning’ in machine learning. There are many techniques for learning it.
  12. PREDICTIVE STATS IN < 5 MINUTES 1. Ask a question

    2. Find data where the answer is known 3. Call each answer ‘y’ and the other data ‘x’ 4. Find a function f good at predicting y given x 5. In the future when you only know x, compute f(x) to reasonably predict y. ? = f(x) Say I have a new internet user’s browser history. I can use the f I learned to make a reasonable prediction of what movies they like.
  13. PREDICTIVE STATS IN < 5 MINUTES 1. Ask a question

    2. Find data where the answer is known 3. Call each answer ‘y’ and the other data ‘x’ 4. Find a function f good at predicting y given x 5. In the future when you only know x, compute f(x) to reasonably predict y. y = f(x) Say I have a new internet user’s browser history. I can use the f I learned to make a reasonable prediction of what movies they like.
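    The five steps above map almost directly onto code. Below is a minimal sketch in Python, assuming a scikit-learn-style workflow; the browser-history features, movie labels, and numbers are invented for illustration, not taken from any real system.

        # Sketch of steps 2-5 for the movie question, using scikit-learn
        # (a library choice assumed here, not named in the talk). All data is made up.
        from sklearn.linear_model import LogisticRegression

        # Step 2: data where the answer is already known.
        # x: toy browser-history features [sci-fi site visits, rom-com site visits]
        x_known = [[12, 1], [0, 9], [8, 2], [1, 11]]
        # y: the known answer, i.e. which kind of movie each user likes
        y_known = ["sci-fi", "rom-com", "sci-fi", "rom-com"]

        # Step 4: find a function f that is good at predicting y given x.
        f = LogisticRegression().fit(x_known, y_known)

        # Step 5: for a new user where only x is known, f(x) is a reasonable prediction of y.
        new_user = [[10, 3]]
        print(f.predict(new_user))  # -> ['sci-fi']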
  14. EXAMPLE MACHINE LEARNING APPLICATIONS

    APPLICATION | y | x
    Self-driving vehicles | Optimum next state of vehicle controls | Sensor input, history, etc
    Targeted ads | Ads user will click | Internet history
    Statistics-based translation | Best translation | Input in starting language
    y = f(x)
  15. ASSUMPTIONS IN THIS PROCESS o The future will resemble the

    past o Some error rate is acceptable
  16. WHERE ARE WE NOW? o Machine learning in five minutes

    o The curious case of Google’s Word2Vec o High stakes predictive statistics in the US criminal system having fun yet?
  17. THE WHAT . . . trained Google’s Word2Vec tool using

    Google News as raw data. Word2Vec builds a word embedding representing the relationships between words.
  18. WHAT IS A WORD EMBEDDING? Words are arranged in vector

    space so that the closer two words are, the more similar they are.
  19. WHAT IS A WORD EMBEDDING? Words are arranged in vector

    space so that the closer two words are, the more similar they are. Directions in the space can correspond to abstract concepts.
  20. WHAT IS A WORD EMBEDDING? y = f(x) similar words

    = f(a given word) a word = f(some given words) f is the learned vector space and some cosine math
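    A toy illustration of the "cosine math": cosine similarity in a tiny, hand-written 3-dimensional embedding. The vectors and the use of numpy are assumptions for illustration; real word2vec vectors have hundreds of dimensions and are learned from data.

        import numpy as np

        # Invented 3-d vectors, for illustration only.
        embedding = {
            "king":  np.array([0.9, 0.8, 0.1]),
            "queen": np.array([0.9, 0.1, 0.8]),
            "apple": np.array([-0.6, 0.1, 0.2]),
        }

        def cosine_similarity(u, v):
            # Close to 1 means "pointing the same way", i.e. similar words.
            return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

        print(cosine_similarity(embedding["king"], embedding["queen"]))  # higher
        print(cosine_similarity(embedding["king"], embedding["apple"]))  # lower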
  21. BACK TO THE CURIOUS CASE In the word embedding, the

    concepts of ‘she’ and ‘he’ have directions. SHE-ALIGNED WORDS: Homemaker, Guidance Counselor, Housekeeper, Librarian, Nurse. HE-ALIGNED WORDS: Philosopher, Financier, Fighter pilot, Magician, Architect, Boss.
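    One way to picture ‘she’ and ‘he’ having directions: take the difference she - he as a gender direction and project other words onto it. A hypothetical sketch with invented 2-dimensional vectors; the word lists above come from the real, learned embedding.

        import numpy as np

        # Invented 2-d vectors for illustration; real embeddings are high-dimensional.
        vec = {
            "she":       np.array([ 1.0, 0.2]),
            "he":        np.array([-1.0, 0.2]),
            "nurse":     np.array([ 0.7, 0.5]),
            "architect": np.array([-0.6, 0.6]),
        }

        gender_direction = vec["she"] - vec["he"]            # points from "he" toward "she"
        gender_direction /= np.linalg.norm(gender_direction)

        for word in ("nurse", "architect"):
            score = float(np.dot(vec[word], gender_direction))
            print(word, "she-aligned" if score > 0 else "he-aligned", round(score, 2))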
  22. MORE OF CURIOUS CASE When applied to creating analogies, it

    came up with analogies reflecting: GENDER SEMANTICS IN LANGUAGE she:he::mother:father she:he::convent:monastery she:he::queen:king
  23. MORE OF THE CURIOUS CASE When applied to creating analogies,

    it came up with analogies reflecting: GENDER SEMANTICS IN LANGUAGE she:he::mother:father she:he::convent:monastery she:he::queen:king she:he::sewing:carpentry she:he::nurse:surgeon she:he::volleyball:football
  24. MORE OF THE CURIOUS CASE When applied to creating analogies,

    it came up with analogies reflecting: GENDER SEMANTICS IN LANGUAGE she:he::mother:father she:he::convent:monastery she:he::queen:king she:he::sewing:carpentry she:he::nurse:surgeon she:he::volleyball:football GENDER STEREOTYPES
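    The analogies come from vector arithmetic of roughly the form x - she + he, followed by a nearest-word lookup. A sketch of that standard word2vec-style query using gensim; the library choice, the model file name, and the assumption that the publicly distributed Google News vectors have been downloaded are all mine, and the paper behind this talk uses a more careful scoring than plain arithmetic.

        from gensim.models import KeyedVectors

        # Assumes the pretrained Google News word2vec file is available locally.
        wv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

        # she : he :: nurse : ?   ->   words nearest to (nurse - she + he)
        print(wv.most_similar(positive=["nurse", "he"], negative=["she"], topn=3))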
  25. . . . . so what’s the big deal? It’s

    just reflecting the state of the world . . .
  26. QUESTIONS ALONG THE WAY o Will the future resemble the

    past? Do we want it to? o How objective is raw data? o What is the acceptable margin of error? o Is more data always better? o How and when is potential bias amplified? When we apply machine learning to people . . .
  27. WILL THE FUTURE RESEMBLE THE PAST? DO WE WANT IT

    TO? she:he::registered nurse:physician Currently: ~91% of nurses are women, ~9% are men; ~33% of physicians are women, ~67% are men.
  28. HOW OBJECTIVE IS OUR RAW DATA? king:queen::man:[woman, attempted abduction,

    teenager, girl] Ummmm . . . Google News data. 65-75% of abductions are of girls. 81% of non-family abductions are of teenagers.
  29. RAW DATA WITH LITTLE TO NO HUMAN BIAS Weather Outer

    space Geology . . . RAW DATA WITH MORE HUMAN BIAS Search terms Text Medical, criminal, educational, financial records . . .
  30. HOW AND WHEN IS POTENTIAL BIAS AMPLIFIED? she:he::homemaker:computer programmer 1)

    word embeddings improve search results 2) ‘computer science’ aligns with stereotypically male names like ‘John’
  31. HOW AND WHEN IS POTENTIAL BIAS AMPLIFIED? she:he::homemaker:computer programmer 1)

    word embeddings improve search results 2) ‘computer science’ aligns with stereotypically male names like ‘John’ 3) Grad student pages are often identical except for names 4) You search “CMU Computer Science PhDs”
  32. HOW AND WHEN IS POTENTIAL BIAS AMPLIFIED? she:he::homemaker:computer programmer 1)

    word embeddings improve search results 2) ‘computer science’ aligns with stereotypically male names like ‘John’ 3) Grad student pages are often identical except for names 4) You search “CMU Computer Science PhDs” 5) There’s only one top spot on the search page. Who gets it - John? or Mary?
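    A hypothetical sketch of that amplification mechanism: if a ranker breaks ties between otherwise-identical pages using embedding similarity to the query, a small bias in the vectors decides who gets the single top spot. The vectors, names, and scoring rule below are invented for illustration.

        import numpy as np

        def cosine(u, v):
            return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

        # Invented 2-d vectors in which "john" sits slightly closer to "computer_science".
        vec = {
            "computer_science": np.array([1.0, 0.0]),
            "john":             np.array([0.9, 0.1]),
            "mary":             np.array([0.7, 0.4]),
        }

        # Two grad-student pages, identical except for the name on them.
        pages = ["john", "mary"]
        ranked = sorted(pages, key=lambda name: cosine(vec[name], vec["computer_science"]),
                        reverse=True)
        print(ranked)  # -> ['john', 'mary']: the slight vector bias picks the top result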
  33. HOW DO WE FIX IT? We can ‘debias’ the vector

    space with some geometry . . . . . . position a gender-neutral term like ‘computer programmer’ so that it’s equally distant from ‘she’ and ‘he.’
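    A minimal sketch of that geometry, in the spirit of the Bolukbasi et al. "neutralize" step: subtract a gender-neutral word's component along the she-he direction so it ends up equally distant from ‘she’ and ‘he.’ The 2-d vectors are invented, and the full method also decides which words should stay gendered and "equalizes" pairs like she/he.

        import numpy as np

        vec = {                                    # invented 2-d vectors, for illustration
            "she":                 np.array([ 1.0, 0.2]),
            "he":                  np.array([-1.0, 0.2]),
            "computer_programmer": np.array([-0.5, 0.8]),
        }

        g = vec["she"] - vec["he"]
        g = g / np.linalg.norm(g)                  # unit-length gender direction

        def neutralize(v, g):
            # Remove the component of v that lies along the gender direction.
            return v - np.dot(v, g) * g

        debiased = neutralize(vec["computer_programmer"], g)
        print(np.dot(debiased, g))                             # ~0: no gender component left
        print(np.linalg.norm(debiased - vec["she"]),           # now equally distant from
              np.linalg.norm(debiased - vec["he"]))            # "she" and "he"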
  34. WHAT WE HAVE LEARNED SO FAR: o Human biases enter

    raw data generated by humans. o Biases in raw data can be ‘learned’ by a machine learning algorithm o If those biases are represented in vector space, there’s a way to remedy that.
  35. WHERE ARE WE NOW? o Machine learning in five minutes

    o The curious case of Google’s Word2Vec o High stakes predictive statistics in the US criminal system
  36. WHAT WE’RE BRINGING WITH US FROM THE WORD2VEC CASE o

    Human biases enter raw data generated by humans. o Biases in raw data can be ‘learned’ by a machine learning algorithm o If those biases are represented in vector space, there’s a way to remedy that.
  37. THE WHAT Risk assessments such as COMPAS, which assign a

    risk score representing the risk that someone in the criminal system will be arrested and/or convicted again in the future.
  38. PAUSE - QUESTIONS THAT COME TO MIND? Risk assessments such

    as COMPAS, which assign a risk score representing the risk that someone in the criminal system will be arrested and/or convicted again in the future.
  39. THE WHAT y = f(x) risk of future arrest =

    f(arrest & conviction history, education, social relationships, employment, zip code)
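    A hypothetical sketch of what a model of this general shape could look like. COMPAS itself is proprietary, so this is not its actual model; the features, data, and choice of logistic regression are assumptions for illustration only.

        from sklearn.linear_model import LogisticRegression

        # Invented training rows: [prior arrests, age, employed (1/0)]
        x_known = [[0, 35, 1], [5, 22, 0], [1, 40, 1], [7, 19, 0], [2, 30, 1], [4, 25, 0]]
        y_known = [0, 1, 0, 1, 0, 1]   # 1 = re-arrested later, 0 = not (made up)

        model = LogisticRegression().fit(x_known, y_known)

        # "Risk score" for a new person: the model's predicted probability of re-arrest.
        new_person = [[3, 24, 0]]
        print(model.predict_proba(new_person)[0][1])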
  40. THE WHEN - arrest, to assign bail (y is risk

    of failing to appear) - release on parole - sentencing?
  41. THE OPPORTUNITIES - keep low-risk people out of the criminal

    system - make decisions more consistent
  42. THE DOUBLE-EDGED SWORDS - accuracy and errors are systematic -

    a computer program instead of a series of people making judgement calls - limited input, no individualized assessment
  43. THE HOLY GRAIL “We are at a unique time in history.

    We are being presented with the chance of a generation, and perhaps a lifetime, to reform sentencing and unwind mass incarceration in a scientific way and that opportunity is slipping away because of misinformation and misunderstanding about [risk assessment tools like COMPAS]” - Flores et al 2016.
  44. THE CONTROVERSY Makers of COMPAS: “no racial bias.” ProPublica’s investigation:

    “significant racial disparities.” Anthony Flores, Prof of Criminology, CSU Bakersfield: "We found no bias of any kind when we re-analysed [ProPublica’s] data . . . We didn't necessarily disagree with their findings, we just disagreed with their conclusion."
  45. SOURCES OF CONTROVERSY - The model and algorithm are considered

    private intellectual property. - Racial disparities in the model’s predictions reflect real racial disparities in arrests. - What is an acceptable margin of error?
  46. THE POINT OF AGREEMENT If black people in general are

    more likely to be re-arrested, then a black defendant is more likely to be given a higher risk score.
  47. THE POINT OF AGREEMENT People’s behavior is biased. People’s behavior

    produces biased raw data. Even an un-biased mathematical model trained on biased raw data produces results that perpetuate the bias.
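    A toy simulation of that point of agreement: if the raw data records one group being re-arrested more often (for whatever reason, including biased policing), then even a model that never sees group membership assigns that group higher scores through correlated features. Every number, feature, and distribution below is invented.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        n = 2000

        # Invented world: group B (group == 1) is policed more heavily, so the raw data
        # records more prior arrests and more re-arrests for it.
        group = rng.integers(0, 2, size=n)
        prior_arrests = rng.poisson(1 + 2 * group)
        rearrested = rng.binomial(1, np.clip(0.2 + 0.1 * prior_arrests, 0, 0.9))

        # The model sees only prior arrests, never "group" . . .
        model = LogisticRegression().fit(prior_arrests.reshape(-1, 1), rearrested)
        scores = model.predict_proba(prior_arrests.reshape(-1, 1))[:, 1]

        # . . . yet group B still gets higher risk scores on average.
        print("mean score, group A:", scores[group == 0].mean())
        print("mean score, group B:", scores[group == 1].mean())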
  48. QUESTIONS ALONG THE WAY o Will the future resemble the

    past? Do we want it to? o How objective is raw data? o What is the acceptable margin of error? o Is more data always better? o How and when is potential bias amplified? When we apply machine learning to people . . .
  49. WILL THE FUTURE RESEMBLE THE PAST? DO WE WANT IT

    TO? High risk parolee -> regular parole officer -> re-arrest WITHOUT RISK ASSESSMENT
  50. WILL THE FUTURE RESEMBLE THE PAST? DO WE WANT IT

    TO? High risk parolee -> better parole officer -> succeeds with re-entry WITH RISK ASSESSMENT
  51. HOW OBJECTIVE IS RAW DATA? The raw data is: -

    arrest record - conviction record - employment - education - zip code - gender - age - friends & family’s criminal history - friends & family’s substance use
  52. HOW OBJECTIVE IS RAW DATA? Part of the argument is

    that individual officers, corrections staff, and judges have bias, and statistics are more objective. These individual people are making decisions that create a person’s arrest record and conviction record.
  53. HOW OBJECTIVE IS RAW DATA? Part of the argument is

    that individual officers, corrections staff, and judges have bias, and statistics are more objective. These individual people are making decisions that create a person’s arrest record and conviction record.
  54. HOW AND WHEN MIGHT BIAS BE AMPLIFIED? 1) The real

    world 2) Raw data (with biases in it)
  55. HOW AND WHEN MIGHT BIAS BE AMPLIFIED? 1) The real

    world 2) Raw data (with biases in it) 3) Unbiased tool
  56. HOW AND WHEN MIGHT BIAS BE AMPLIFIED? 1) The real

    world 2) Raw data (with biases in it) 3) Unbiased tool 4) Some real number
  57. HOW AND WHEN MIGHT BIAS BE AMPLIFIED? 1) The real

    world 2) Raw data (with biases in it) 3) Unbiased tool 4) Some real number 5) A decision about a person’s life
  58. HOW AND WHEN MIGHT BIAS BE AMPLIFIED? 1) The real

    world 2) Raw data (with biases in it) 3) Unbiased tool 4) Some real number 5) A decision about a person’s life (and their family’s life)
  59. THE OPPORTUNITY Almost all data can be represented in a

    vector space. We have a precedent for de-biasing vector space.
  60. THE OPPORTUNITY Almost all data can be represented in a

    vector space. We have a precedent for de-biasing vector space. Can we de-bias risk assessment data?
  61. WHERE ARE WE NOW? o Machine learning in five minutes

    o The curious case of Google’s Word2Vec o High stakes predictive statistics in the US criminal system ready for more fun?
  62. THE BIG POINT Raw data that is generated by humans

    as we go about our daily lives contains whatever biases we have. Machine learning algorithms learn these biases. There are possibilities for controlling for them in machine learning. Our shared humanity depends on us doing so.
  63. WHAT’S NEW WHAT’S NOT - The mask of ‘objective algorithm’

    - Awareness and ability to mitigate bias in our systems - Laws, curriculum, art, and custom have encoded biases & passed them down to future generations before. - We have a choice.
  64. SO WHAT DO WE DO?

    if (SCALED)
      if (DESTRUCTIVE_POSSIBILITIES > 0)
        require(TRANSPARENT)
      else
        allow(SECRET)
      end
    end
    - paraphrased from Cathy O’Neil
  65. SOURCES

    Alexander, Michelle. The New Jim Crow: Mass Incarceration in the Age of Colorblindness. The New Press. 16 January 2012.
    Angwin, Julia, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine Bias. ProPublica. 23 May 2016.
    Barry-Jester, Anna Maria, Ben Casselman, and Dana Goldstein. Should Prison Sentences Be Based on Crimes That Haven’t Been Committed Yet? FiveThirtyEight Blog. 4 Aug 2015.
    Bolukbasi, Tolga, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. 21 July 2016.
    Dettmers, Tim. Deep Learning in a Nutshell: Sequence Learning. NVIDIA Blog. 7 March 2016.
    Ehrenfreund, Max. The Machines That Could Rid Courtrooms of Racism. Wonkblog at www.washingtonpost.com. 18 Aug 2016.
    Flores, Anthony, Christopher T. Lowenkamp, and Kristin Bechtel. False Positives, False Negatives, and False Analyses: A Rejoinder to “Machine Bias: There’s Software Used Across the Country to Predict Future Criminals. And it’s Biased Against Blacks.” Community Resources for Justice.
    Giovanni, Nikki. Quilting the Black-Eyed Pea (We're Going to Mars) in Quilting the Black-Eyed Pea (Poems and Not Quite Poems). Harper Perennial. 2010.
    Larson, Jeff, Surya Mattu, Lauren Kirchner, and Julia Angwin. How We Analyzed the COMPAS Recidivism Algorithm. ProPublica. 23 May 2016.
  66. MORE SOURCES

    Matthews, Dylan. The black/white marijuana arrest gap, in nine charts. Wonkblog at www.washingtonpost.com. 4 June 2013.
    Oberoi, Pri. Optimizing Failure Through Machine Learning at 2016 Lesbians Who Tech New York Summit. 23 October 2016.
    Vera Institute of Justice. The Price of Prisons Fact Sheet. January 2012.
    Zhang, Christie. Cathy O’Neil, author of Weapons of Math Destruction, on the dark side of big data. Los Angeles Times. 30 December 2016.