Arthur Doler
@arthurdoler
arthurdoler@gmail.com
Slides:
Handout:
AN AI WITH AN AGENDA
How Our Biases Leak Into Machine Learning
bit.ly/art-ai-with-agenda
Class I – Phantoms of False Correlation
Class II – Specter of Biased Sample Data
Class III – Shade of Overly-Simplistic Maximization
Class V – The Simulation Surprise
Class VI – Apparition of Fairness
Class VII – The Feedback Devil
Slide 30
Slide 30 text
http://www.tylervigen.com/spurious-correlations - Data sources: Centers for Disease Control & Prevention and Internet Movie Database
Slide 31
Slide 31 text
http://www.tylervigen.com/spurious-correlations - Data sources: National Vital Statistics Reports and U.S. Department of Agriculture
Slide 32
Slide 32 text
http://www.tylervigen.com/spurious-correlations - Data sources: National Spelling Bee and Centers for Disease Control & Prevention
By Arturo Urquizo - http://commons.wikimedia.org/wiki/File:PID.svg, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=17633925
Slide 116
Slide 116 text
CLASS I - PHANTOMS OF FALSE CORRELATION
Know what question you’re asking
Trust conditional probability over straight correlation
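A minimal sketch of why the question matters, using invented patient counts (not data from the talk). Across the whole sample, treatment correlates negatively with recovery, but once the question is conditioned on severity, treated patients do better in both groups. The names `counts`, `pearson`, and `recovery_rate` are illustrative.

```python
from math import sqrt

# Hypothetical counts, keyed by (severe, treated, recovered).
counts = {
    (0, 0, 1): 72, (0, 0, 0): 8,    # mild, untreated
    (0, 1, 1): 19, (0, 1, 0): 1,    # mild, treated
    (1, 0, 1): 6,  (1, 0, 0): 14,   # severe, untreated
    (1, 1, 1): 32, (1, 1, 0): 48,   # severe, treated
}
# Expand the counts into one row per patient.
rows = [key for key, n in counts.items() for _ in range(n)]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def recovery_rate(rows, severe, treated):
    """P(recovered | severe, treated), estimated from the rows."""
    matching = [r for r in rows if r[0] == severe and r[1] == treated]
    return sum(r[2] for r in matching) / len(matching)


# "Straight" correlation between treatment and recovery, ignoring severity.
overall_r = pearson([r[1] for r in rows], [r[2] for r in rows])
print("overall correlation:", round(overall_r, 3))

# Conditional view: compare treated vs. untreated within each severity level.
for severe in (0, 1):
    print("severe =", severe,
          "untreated:", recovery_rate(rows, severe, 0),
          "treated:", recovery_rate(rows, severe, 1))
```

In this made-up sample, sicker patients are treated more often, so the raw correlation answers "who gets treated?" rather than "does treatment help?".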
Slide 117
Slide 117 text
CLASS II - SPECTER OF BIASED SAMPLE DATA
Recognize data is biased even at rest
Make sure your sample set is crafted properly
Excise problematic predictors, but beware their shadow columns
Build a learning system that can incorporate false positives and false negatives as you find them
Try using adversarial techniques to detect bias
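One rough way to look for "shadow columns" after excising a predictor is to ask how well each remaining column alone can guess the removed one. The sketch below assumes a tiny invented data set; the column names (`group`, `zip`, `years_exp`) and the helper `proxy_strength` are hypothetical, not part of the talk.

```python
from collections import Counter, defaultdict

# Hypothetical records; "group" is the predictor we intend to remove.
records = [
    {"group": "A", "zip": "10001", "years_exp": 1},
    {"group": "A", "zip": "10001", "years_exp": 2},
    {"group": "A", "zip": "10001", "years_exp": 3},
    {"group": "A", "zip": "10002", "years_exp": 4},
    {"group": "B", "zip": "10002", "years_exp": 1},
    {"group": "B", "zip": "10002", "years_exp": 2},
    {"group": "B", "zip": "10002", "years_exp": 3},
    {"group": "B", "zip": "10002", "years_exp": 4},
]


def baseline(records, protected):
    """Accuracy of always guessing the most common protected value."""
    tally = Counter(r[protected] for r in records)
    return tally.most_common(1)[0][1] / len(records)


def proxy_strength(records, protected, candidate):
    """Accuracy of guessing `protected` from `candidate` alone,
    using a majority vote within each candidate value."""
    by_value = defaultdict(Counter)
    for r in records:
        by_value[r[candidate]][r[protected]] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return correct / len(records)


base = baseline(records, "group")
for column in ("zip", "years_exp"):
    score = proxy_strength(records, "group", column)
    flag = "possible shadow column" if score > base else "no better than baseline"
    print(column, score, flag)
```

This check is scored on the same rows it is fitted to, so a column with many unique values will look like a strong proxy by accident. On real data, a held-out split would be a safer choice.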
Slide 118
Slide 118 text
CLASS III - SHADE OF OVERLY-SIMPLISTIC MAXIMIZATION
Remember models tell you what was, not what should be
Try combining dependent columns and predicting that
Try complex algorithms that allow more flexible reinforcement
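One possible reading of "combining dependent columns" is to build a single composite target from several outcome columns, instead of maximizing just one of them. The sketch below assumes that reading; the column names, weights, and the helper `composite_target` are invented for illustration.

```python
def min_max(values):
    """Scale values to the range [0, 1]; constant columns map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_target(rows, weights):
    """Weighted average of min-max-scaled outcome columns, one value per row."""
    scaled = {col: min_max([r[col] for r in rows]) for col in weights}
    total = sum(weights.values())
    return [
        sum(weights[col] * scaled[col][i] for col in weights) / total
        for i in range(len(rows))
    ]


# Hypothetical outcomes for three pieces of content.
rows = [
    {"clicks": 10, "minutes_watched": 1, "returned_next_week": 0},
    {"clicks": 5, "minutes_watched": 6, "returned_next_week": 1},
    {"clicks": 0, "minutes_watched": 11, "returned_next_week": 1},
]
# Assumed weights: returning next week counts twice as much as the others.
weights = {"clicks": 1.0, "minutes_watched": 1.0, "returned_next_week": 2.0}

targets = composite_target(rows, weights)
print(targets)
```

A model trained on `targets` would no longer rank the first row highest just because it drew the most clicks. The weights are a value judgment and need to be chosen deliberately.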
Slide 119
Slide 119 text
CLASS V – THE SIMULATION SURPRISE
Don’t confuse the map with the territory
Always reality-check solutions from simulations
Slide 120
Slide 120 text
CLASS VI - APPARITION OF FAIRNESS
Consider predictive accuracy as a resource to be allocated
Possibly seek external auditing of results, or at least another team
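To see how predictive accuracy is currently "allocated", a first step is to break the error rates down by group instead of reporting one overall number. This is a minimal sketch with invented labels and predictions; `rates_by_group` is an illustrative helper, not an established API.

```python
def rates_by_group(examples):
    """examples: iterable of (group, actual, predicted) with 0/1 labels.
    Returns accuracy and false positive rate for each group."""
    stats = {}
    for group, actual, predicted in examples:
        s = stats.setdefault(group, {"n": 0, "correct": 0, "neg": 0, "fp": 0})
        s["n"] += 1
        s["correct"] += int(actual == predicted)
        if actual == 0:
            s["neg"] += 1
            s["fp"] += int(predicted == 1)
    return {
        group: {
            "accuracy": s["correct"] / s["n"],
            # Undefined when the group has no actual negatives.
            "false_positive_rate": s["fp"] / s["neg"] if s["neg"] else None,
        }
        for group, s in stats.items()
    }


# Hypothetical (group, actual, predicted) triples.
examples = [
    ("A", 1, 1), ("A", 0, 0), ("A", 0, 0), ("A", 1, 1),
    ("B", 0, 1), ("B", 0, 0), ("B", 1, 1), ("B", 1, 0),
]

overall_accuracy = sum(a == p for _, a, p in examples) / len(examples)
per_group = rates_by_group(examples)

print("overall accuracy:", overall_accuracy)
for group, rates in per_group.items():
    print(group, rates)
```

In this toy data the overall accuracy looks acceptable, yet every error lands on group B. An external auditor or second team would want to see a breakdown like this.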
Slide 121
Slide 121 text
CLASS VII - THE FEEDBACK DEVIL
Ignore or adjust for algorithm-suggested results
Look to control engineering for potential answers
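As a reminder of what control engineering offers, here is a textbook discrete PID controller driving a toy system toward a setpoint. It is a sketch under assumptions: the gains, the time step, and the simulated system are invented, and the slides do not say how such a controller would map onto a specific machine-learning feedback loop.

```python
class PID:
    """Textbook discrete PID controller (illustrative only)."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement, dt):
        """Return the control output for the latest measurement."""
        error = self.setpoint - measurement
        self.integral += error * dt
        # No derivative term on the first call, since there is no previous error.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(controller, steps=500, dt=0.1, start=0.0):
    """Toy system: the measured value changes at a rate equal to the control output."""
    value = start
    for _ in range(steps):
        value += controller.update(value, dt) * dt
    return value


# Assumed gains; kd is set to 0 to keep the demo easy to reason about.
final_value = simulate(PID(kp=2.0, ki=0.5, kd=0.0, setpoint=1.0))
print("final value:", round(final_value, 4))
```

The controller reacts to the measured error rather than trusting its own earlier output, which is the property that speaks to runaway feedback.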
Slide 124
Slide 124 text
MODELS REPRESENT WHAT WAS
THEY DON’T TELL YOU WHAT SHOULD BE
Slide 126
Slide 126 text
OR GET TRAINING
Slide 127
Slide 127 text
Bootcamps
Coursera
Udemy
Actual Universities
Slide 129
Slide 129 text
AI Now Institute
Georgetown Law Center on Privacy and Technology
Knight Foundation’s AI ethics initiative
fast.ai
Slide 130
Slide 130 text
ABIDE BY ETHICS GUIDELINES
Slide 131
Slide 131 text
Privacy / Consent
Transparency of Use
Transparency of Algorithms
Ownership