An AI with an Agenda: How Our Biases Leak Into Machine Learning (NDC Minnesota 2019)

In the glorious AI-assisted future, all decisions are objective and perfect, and there’s no such thing as cognitive bias. That’s why we created AI and machine learning, right? Because humans can make mistakes, and computers are perfect. Well, there’s some bad news: humans build those AIs and machine learning models, and as a result humanity’s biases and missteps can subtly work their way into our systems.

All hope isn’t lost, though! In this talk you’ll learn how science and statistics have already solved some of these problems and how a robust awareness of cognitive biases can help with many of the rest. Come learn what else we can do to protect ourselves from these old mistakes, because we owe it to the people who’ll rely on our algorithms to deliver the best possible intelligence!


Arthur Doler

May 08, 2019

Transcript

  1. AN AI WITH AN AGENDA: How Our Biases Leak Into Machine Learning. Arthur Doler, @arthurdoler, arthurdoler@gmail.com. Slides/Handout: bit.ly/art-ai-with-agenda
  2. LET’S ALL PLAY A GAME

  3. “THE NURSE SAID”

  4. “THE SOFTWARE ENGINEER SAID”

  5.–12. (image-only slides)
  13. REAL CONSEQUENCES

  14. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

  15. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

  16. http://blog.conceptnet.io/posts/2017/how-to-make-a-racist-ai-without-really-trying/ Aylin Caliskan-Islam, Joanna J. Bryson, and Arvind Narayanan, 2016
  17. http://blog.conceptnet.io/posts/2017/how-to-make-a-racist-ai-without-really-trying/
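
The linked post shows how an off-the-shelf sentiment classifier absorbs racial bias from its word embeddings, scoring common African-American names as more negative than common European-American names. Below is a minimal sketch of that audit, with hypothetical stand-ins (random vectors in place of GloVe embeddings, a random projection in place of the trained classifier) so it runs self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: in the linked post the vectors are GloVe embeddings
# and the scorer is a logistic-regression sentiment model trained on a
# word-polarity lexicon. Random vectors here just make the sketch runnable.
embeddings = {name: rng.normal(size=50)
              for name in ["emily", "matthew", "shaniqua", "darnell"]}
sentiment_direction = rng.normal(size=50)

def sentiment(word):
    # Project the word's vector onto the learned sentiment direction.
    return float(embeddings[word] @ sentiment_direction)

def mean_sentiment(names):
    return np.mean([sentiment(n) for n in names])

# A systematic gap between groups of names flags bias the model absorbed
# from its training text, which is exactly the effect the post demonstrates.
gap = mean_sentiment(["emily", "matthew"]) - mean_sentiment(["shaniqua", "darnell"])
print(f"sentiment gap between name groups: {gap:+.3f}")
```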

  18. (image-only slide)
  19. SIX CLASSES OF PROBLEM WITH AI/ML

  20.–22. (image-only slides)
  23. Class I – Phantoms of False Correlation; Class II – Specter of Biased Sample Data; Class III – Shade of Overly-Simplistic Maximization; Class V – The Simulation Surprise; Class VI – Apparition of Fairness; Class VII – The Feedback Devil
  24.–29. (image-only slides)
  30. http://www.tylervigen.com/spurious-correlations - Data sources: Centers for Disease Control & Prevention and Internet Movie Database

  31. http://www.tylervigen.com/spurious-correlations - Data sources: National Vital Statistics Reports and U.S. Department of Agriculture

  32. http://www.tylervigen.com/spurious-correlations - Data sources: National Spelling Bee and Centers for Disease Control & Prevention
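
These slides show tylervigen.com’s famously absurd pairings. A small sketch (not from the talk) of why such phantoms are inevitable: test enough unrelated series against each other and near-perfect correlations appear by chance alone:

```python
import numpy as np

rng = np.random.default_rng(42)

# 200 independent random walks, 20 yearly observations each, mimicking the
# short annual series on tylervigen.com. None are actually related.
n_series, n_years = 200, 20
series = rng.normal(size=(n_series, n_years)).cumsum(axis=1)

# Search every pair for a high correlation, the way an unconstrained
# data-dredging process effectively does.
best = max(
    ((abs(np.corrcoef(series[i], series[j])[0, 1]), i, j)
     for i in range(n_series) for j in range(i + 1, n_series)),
    key=lambda t: t[0])

print(f"best |r| among {n_series*(n_series-1)//2} unrelated pairs: {best[0]:.2f}")
# Typically prints something above 0.9: check enough pairs and a phantom
# correlation is guaranteed to turn up.
```
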
  33. (image-only slide)
  34. KNOW WHAT QUESTION YOU’RE ASKING UP FRONT

  35. USE CONDITIONAL PROBABILITY OVER CORRELATION

  36. https://versionone.vc/correlation-probability/
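
A toy sketch of the distinction the linked article draws, using made-up data and hypothetical column names: the correlation coefficient compresses the relationship into one opaque number, while conditional probabilities answer the actual question directly:

```python
import pandas as pd

# Made-up illustration: does a trait change the probability of the outcome,
# or does it merely co-vary with it?
df = pd.DataFrame({
    "used_free_trial": [1, 1, 1, 0, 0, 0, 1, 0, 1, 0],
    "converted":       [1, 1, 0, 0, 0, 1, 1, 0, 1, 0],
})

# Correlation gives one opaque number...
print("correlation:", round(df["used_free_trial"].corr(df["converted"]), 2))

# ...while conditional probabilities state the finding in decision terms.
p_trial = df.loc[df["used_free_trial"] == 1, "converted"].mean()
p_no_trial = df.loc[df["used_free_trial"] == 0, "converted"].mean()
print(f"P(converted | trial)    = {p_trial:.2f}")
print(f"P(converted | no trial) = {p_no_trial:.2f}")
```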

  37.–38. (image-only slides)
  39. MORTGAGE LENDING ANALYSIS

  40.–43. (image-only slides)
  44. https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G

  45. Twitter - @quantoidasaurus (Used with permission)

  46. YOUR SAMPLE MIGHT NOT BE REPRESENTATIVE

  47. YOUR DATA MIGHT NOT BE REPRESENTATIVE

  48. (image-only slide)
  49. MODELS REPRESENT WHAT WAS; THEY DON’T TELL YOU WHAT SHOULD BE
  50. FIND A BETTER DATA SET! CONCEPTNET.IO

  51. BUILD A BETTER DATA SET!

  52. (image-only slide)
  53. BEWARE SHADOW COLUMNS
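
“Shadow columns” are innocuous-looking features that still encode an excised sensitive attribute; the Reuters article above describes Amazon’s model penalizing phrases like “women’s chess club captain” even with gender removed. One way to hunt for them, sketched here with synthetic data: check how well the features you kept can reconstruct the attribute you dropped:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def shadow_column_check(X, protected, cv=5):
    """If the retained features can reconstruct the excised protected
    attribute, the model can still discriminate through those proxies.
    X: feature matrix with the protected column already removed.
    protected: the removed column's values."""
    auc = cross_val_score(LogisticRegression(max_iter=1000), X, protected,
                          scoring="roc_auc", cv=cv).mean()
    return auc  # ~0.5 means no leakage; near 1.0 means strong proxies remain

# Hypothetical demo: one column secretly encodes the protected attribute.
rng = np.random.default_rng(1)
gender = rng.integers(0, 2, 500)
X = np.c_[gender + rng.normal(0, 0.3, 500),   # strong proxy column
          rng.normal(size=500)]               # unrelated column
print(f"AUC reconstructing the excised attribute: {shadow_column_check(X, gender):.2f}")
```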

  54. MAKE SURE YOUR SAMPLE SET IS REPRESENTATIVE
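
Two routine checks that follow from this slide, sketched with hypothetical numbers: compare your sample’s group mix against the population it is supposed to mirror, and stratify any splits you control so each group keeps its share:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed population mix vs. what was actually collected.
population = pd.Series(["A"] * 700 + ["B"] * 300)
sample = pd.Series(["A"] * 90 + ["B"] * 10)

print(pd.DataFrame({"population": population.value_counts(normalize=True),
                    "sample": sample.value_counts(normalize=True)}))

# When you control the split yourself, stratify to preserve proportions.
df = pd.DataFrame({"group": population, "y": 0})
train, test = train_test_split(df, test_size=0.2, stratify=df["group"],
                               random_state=0)
print(train["group"].value_counts(normalize=True))  # keeps the 70/30 mix
```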

  55.–56. (image-only slides)
  57. IBM’S AI FAIRNESS TOOLKIT

  58. https://aif360.mybluemix.net AI FAIRNESS TOOLKIT

  59. https://aif360.mybluemix.net

  60.–62. (image-only slides)
  63. https://aif360.mybluemix.net

  64. https://aif360.mybluemix.net
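
A minimal sketch of the AIF360 workflow these slides demo, assuming the library’s API as of 2019 (`pip install aif360`; `AdultDataset` additionally requires downloading the UCI Adult data files per the library’s instructions):

```python
from aif360.datasets import AdultDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

privileged = [{"sex": 1}]
unprivileged = [{"sex": 0}]

data = AdultDataset()

# Measure bias before mitigation: mean_difference is the gap in favorable
# outcome rates between groups (0 would be parity).
metric = BinaryLabelDatasetMetric(data, unprivileged_groups=unprivileged,
                                  privileged_groups=privileged)
print("mean difference before:", metric.mean_difference())

# One of the toolkit's pre-processing mitigations: reweigh training examples
# so outcome rates balance across groups before a model ever sees them.
rw = Reweighing(unprivileged_groups=unprivileged, privileged_groups=privileged)
data_transf = rw.fit_transform(data)

metric_after = BinaryLabelDatasetMetric(data_transf,
                                        unprivileged_groups=unprivileged,
                                        privileged_groups=privileged)
print("mean difference after:", metric_after.mean_difference())
```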

  65. https://pair-code.github.io/what-if-tool

  66. HAVE A GOOD PROCESS

  67. KEEP IN MIND YOU NEED TO KNOW WHO CAN BE AFFECTED IN ORDER TO UN-BIAS
  68. (image-only slide)
  69. PRICING ALGORITHMS

  70. Calvano, Calzolari, Denicolò and Pastorello (2018)

  71.–72. (image-only slides)
  73. Calvano, Calzolari, Denicolò and Pastorello (2018)

  74. WHAT IF AMAZON BUILT A SALARY TOOL INSTEAD?

  75. THE BRATWURST PROBLEM

  76. HUMANS ARE RARELY SINGLE-MINDED

  77.–78. (image-only slides)
  79. https://www.alexirpan.com/2018/02/14/rl-hard.html; Gu, Lillicrap, Sutskever, & Levine, 2016

  80. (image-only slide)
  81. MODELS REPRESENT WHAT WAS; THEY DON’T TELL YOU WHAT SHOULD BE
  82. DON’T TRUST ALGORITHMS TO MAKE SUBTLE OR LARGE MULTI-VARIABLE JUDGEMENTS

  83. (image-only slide)
  84. MORE COMPLEX ALGORITHMS THAT INCLUDE OUTSIDE INFLUENCE

  85. (image-only slide)
  86. Lehman, Clune, & Misevic, 2018

  87. Cheney, MacCurdy, Clune, & Lipson, 2013

  88. (image-only slide)
  89. BE READY

  90. DON’T CONFUSE THE MAP WITH THE TERRITORY

  91. VERIFY AND CHECK SOLUTIONS DERIVED FROM SIMULATION

  92.–93. (image-only slides)
  94. BUT WHAT HAPPENS WITH DIALECTAL LANGUAGE? Blodgett, Green, and O’Connor, 2016
  95. MANY AI/ML TOOLS ARE TRAINED TO MINIMIZE AVERAGE LOSS

  96. REPRESENTATION DISPARITY Hashimoto, Srivastava, Namkoong, and Liang, 2018

  97. (image-only slide)
  98. CONSIDER PREDICTIVE ACCURACY AS A RESOURCE TO BE ALLOCATED Hashimoto, Srivastava, Namkoong, and Liang, 2018
  99. DISTRIBUTIONALLY ROBUST OPTIMIZATION Hashimoto, Srivastava, Namkoong, and Liang, 2018
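
A toy numpy sketch (not from the paper) of the representation-disparity problem and the DRO remedy: minimizing average loss lets a 90% majority group dictate the fit, while minimizing the worst group’s loss protects the minority. Hashimoto et al.’s actual method uses a chi-squared DRO objective that needs no group labels at all; the grid search below is just a stand-in to show the objective’s effect:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a majority group (90%) and a minority group (10%) whose optimal
# prediction differs. We fit a single scalar prediction theta.
y_major = rng.normal(loc=0.0, size=900)
y_minor = rng.normal(loc=3.0, size=100)

def group_losses(theta):
    return np.array([np.mean((y_major - theta) ** 2),
                     np.mean((y_minor - theta) ** 2)])

# Average-loss fit: the closed-form minimizer is the overall mean.
theta_avg = np.concatenate([y_major, y_minor]).mean()

# Worst-group (DRO-style) fit: minimize the max per-group loss by grid search.
grid = np.linspace(-1, 4, 501)
theta_dro = grid[np.argmin([group_losses(t).max() for t in grid])]

print("avg-loss fit   :", round(theta_avg, 2), group_losses(theta_avg).round(2))
print("worst-group fit:", round(theta_dro, 2), group_losses(theta_dro).round(2))
# The average-loss fit leaves the minority group with a far larger loss;
# the worst-group fit spreads predictive accuracy more evenly.
```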

  100. (image-only slide)
  101. LET’S BUILD A PRODUCT WITH OUR TWITTER NLP

  102. WHAT HAPPENS TO PEOPLE WHO USE DIALECT?

  103. PREDICTIVE POLICING

  104. Image via Reddit, by user u/jakeroot

  105.–110. Ensign, Friedler, Neville, Scheidegger, & Venkatasubramanian, 2017 (slide series)
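
A simplified simulation of the runaway feedback loop Ensign et al. model (their formulation uses Pólya urns; this sketch is a cruder stand-in): two districts with identical true incident rates, where patrols go to whichever district’s records look worst, and patrols only record crime where they are:

```python
import numpy as np

rng = np.random.default_rng(7)

true_rate = [0.3, 0.3]           # both districts are actually identical
observed = np.array([1.0, 1.0])  # small initial recorded counts

for day in range(1000):
    d = int(np.argmax(observed))             # police the "hot" district
    observed[d] += rng.random() < true_rate[d]  # only record where we patrol

print("recorded incidents:", observed)
# Runaway feedback: whichever district gets an early lead absorbs every
# patrol, and its records grow without bound while the other's stay flat.
# Ensign et al.'s fix is to discount or discard incidents discovered only
# because the algorithm sent the patrol there (the next slide's advice).
```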

  111. (image-only slide)
  112. IGNORE OR ADJUST FOR ALGORITHM-SUGGESTED RESULTS

  113. LOOK TO CONTROL ENGINEERING

  114. By Arturo Urquizo - http://commons.wikimedia.org/wiki/File:PID.svg, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=17633925
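
A minimal textbook PID controller matching the diagram on this slide. Applying it to an ML pipeline (for instance, damping a district’s patrol share toward a setpoint instead of letting the feedback loop run open) is an assumed illustration of the talk’s suggestion, not something shown on the slide:

```python
class PID:
    """Textbook PID: output = Kp*e + Ki*integral(e) + Kd*de/dt."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measured, dt=1.0):
        error = setpoint - measured
        self.integral += error * dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

# Hypothetical use: steer a quantity toward a 50% target.
pid = PID(kp=0.5, ki=0.1, kd=0.05)
level = 0.9
for _ in range(20):
    level += pid.update(setpoint=0.5, measured=level)
print(f"settled near: {level:.2f}")
```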

  115. (image-only slide)
  116. CLASS I - PHANTOMS OF FALSE CORRELATION: Know what question you’re asking; trust conditional probability over straight correlation
  117. CLASS II - SPECTER OF BIASED SAMPLE DATA: Recognize data is biased even at rest; make sure your sample set is crafted properly; excise problematic predictors, but beware their shadow columns; build a learning system that can incorporate false positives and false negatives as you find them; try using adversarial techniques to detect bias
  118. CLASS III - SHADE OF OVERLY-SIMPLISTIC MAXIMIZATION: Remember models tell you what was, not what should be; try combining dependent columns and predicting that; try complex algorithms that allow more flexible reinforcement
  119. CLASS V – THE SIMULATION SURPRISE: Don’t confuse the map with the territory; always reality-check solutions from simulations
  120. CLASS VI - APPARITION OF FAIRNESS: Consider predictive accuracy as a resource to be allocated; possibly seek external auditing of results, or at least another team
  121. CLASS VII - THE FEEDBACK DEVIL: Ignore or adjust for algorithm-suggested results; look to control engineering for potential answers
  122.–123. (image-only slides)
  124. MODELS REPRESENT WHAT WAS; THEY DON’T TELL YOU WHAT SHOULD BE
  125. (image-only slide)
  126. OR GET TRAINING

  127. Bootcamps, Coursera, Udemy, Actual Universities

  128. (image-only slide)
  129. AI Now Institute, Georgetown Law Center on Privacy and Technology, Knight Foundation’s AI ethics initiative, fast.ai
  130. ABIDE BY ETHICS GUIDELINES

  131. Privacy/Consent, Transparency of Use, Transparency of Algorithms, Ownership

  132. https://www.accenture.com/_acnmedia/PDF-24/Accenture-Universal-Principles-Data-Ethics.pdf

  133. Arthur Doler, @arthurdoler, arthurdoler@gmail.com. Slides/Handout: bit.ly/art-ai-with-agenda