An AI with an Agenda: How Our Biases Leak Into Machine Learning (KCDC 2019)

In the glorious AI-assisted future, all decisions are objective and perfect, and there’s no such thing as cognitive bias. That’s why we created AI and machine learning, right? Because humans make mistakes and computers are perfect. Well, there’s some bad news: humans build those AIs and machine learning models, and as a result humanity’s biases and missteps can subtly work their way into them.

All is not lost, though! In this talk you’ll learn how science and statistics have already solved some of these problems, and how a robust awareness of cognitive biases can help with many of the rest. Come learn what else we can do to protect ourselves from these old mistakes, because we owe it to the people who’ll rely on our algorithms to deliver the best possible intelligence!

Arthur Doler

July 19, 2019

Transcript

  1. Arthur Doler @arthurdoler arthurdoler@gmail.com. AN AI WITH AN AGENDA: How Our Biases Leak Into Machine Learning. Slides and handout: bit.ly/art-ai-agenda-kcdc2019
  2. Titanium Sponsors Platinum Sponsors Gold Sponsors

  3. LET’S ALL PLAY A GAME

  4. “THE NURSE SAID”

  5. “THE SOFTWARE ENGINEER SAID”

  6.–13. (Image-only slides)
  14. REAL CONSEQUENCES

  15. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

  16. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

  17. http://blog.conceptnet.io/posts/2017/how-to-make-a-racist-ai-without-really-trying/ Aylin Caliskan-Islam, Joanna J. Bryson, and Arvind Narayanan, 2016
  18. http://blog.conceptnet.io/posts/2017/how-to-make-a-racist-ai-without-really-trying/
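
The recipe in that post is worth seeing in miniature: train an ordinary sentiment classifier on top of pretrained word embeddings, and the embeddings’ baggage comes along for free. The sketch below follows the same shape but is not the post’s exact code; load_embeddings, the file names, and the probe words are illustrative assumptions.

    # Simplified sketch of the cited post's pipeline, not its exact code.
    # Assumes GloVe-style vectors and a positive/negative sentiment lexicon.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def load_embeddings(path):
        """Hypothetical helper: parse 'word v1 v2 ...' lines into a dict."""
        vecs = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                word, *nums = line.rstrip().split(" ")
                vecs[word] = np.array(nums, dtype=np.float32)
        return vecs

    embeddings = load_embeddings("glove.42B.300d.txt")          # assumed file
    positive = [w.strip() for w in open("positive-words.txt")]  # assumed lexicon
    negative = [w.strip() for w in open("negative-words.txt")]

    labeled = [(w, 1) for w in positive if w in embeddings] + \
              [(w, 0) for w in negative if w in embeddings]
    X = np.array([embeddings[w] for w, _ in labeled])
    y = np.array([label for _, label in labeled])
    model = LogisticRegression(max_iter=1000).fit(X, y)

    def sentiment(word):
        """P(positive) the model assigns to a single word's vector."""
        return model.predict_proba(embeddings[word].reshape(1, -1))[0, 1]

    # The uncomfortable part: words that should be neutral, like first names,
    # inherit "sentiment" from the text the embeddings were trained on.
    for name in ["emily", "shaniqua"]:    # illustrative probe names
        print(name, sentiment(name))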

  19. (Image-only slide)
  20. SIX CLASSES OF PROBLEM WITH AI/ML

  21.–23. (Image-only slides)
  24. Class I – Phantoms of False Correlation; Class II – Specter of Biased Sample Data; Class III – Shade of Overly-Simplistic Maximization; (Class IV is boring); Class V – The Simulation Surprise; Class VI – Apparition of Fairness; Class VII – The Feedback Devil
  25.–30. (Image-only slides)
  31. http://www.tylervigen.com/spurious-correlations - Data sources: Centers for Disease Control & Prevention and Internet Movie Database
  32. http://www.tylervigen.com/spurious-correlations - Data sources: National Vital Statistics Reports and U.S. Department of Agriculture
  33. http://www.tylervigen.com/spurious-correlations - Data sources: National Spelling Bee and Centers for Disease Control & Prevention
  34. (Image-only slide)
  35. KNOW WHAT QUESTION YOU’RE ASKING UP FRONT

  36. USE CONDITIONAL PROBABILITY OVER CORRELATION

  37. https://versionone.vc/correlation-probability/
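
The distinction on the last two slides can be made concrete in a few lines. Here is a minimal sketch with made-up data: the correlation coefficient compresses a relationship into one suggestive number, while the conditional probabilities answer the question you actually asked up front.

    # Correlation vs. conditional probability on made-up data.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(42)
    n = 1000

    # Hypothetical scenario: does filing a support ticket predict churn?
    df = pd.DataFrame({"filed_ticket": rng.integers(0, 2, size=n)})
    churn_prob = np.where(df["filed_ticket"] == 1, 0.30, 0.25)
    df["churned"] = (rng.random(n) < churn_prob).astype(int)

    # One opaque number...
    print("Pearson r:", df["filed_ticket"].corr(df["churned"]))

    # ...versus the question being asked: how likely is churn in each case?
    print("P(churn | ticket):   ", df.loc[df.filed_ticket == 1, "churned"].mean())
    print("P(churn | no ticket):", df.loc[df.filed_ticket == 0, "churned"].mean())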

  38.–39. (Image-only slides)
  40. MORTGAGE LENDING ANALYSIS

  41.–47. (Image-only slides)
  48. https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G

  49. Twitter - @quantoidasaurus (Used with permission)

  50. YOUR SAMPLE MIGHT NOT BE REPRESENTATIVE

  51. YOUR DATA MIGHT NOT BE REPRESENTATIVE

  52. (Image-only slide)
  53. MODELS REPRESENT WHAT WAS; THEY DON’T TELL YOU WHAT SHOULD BE
  54. FIND A BETTER DATA SET! CONCEPTNET.IO

  55. BUILD A BETTER DATA SET!

  56. (Image-only slide)
  57. BEWARE SHADOW COLUMNS
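
A shadow column is an innocuous-looking feature that reconstructs an excised sensitive one (ZIP code standing in for race is the classic example). One cheap smoke test, sketched below with hypothetical file and column names: drop the sensitive column, then see how well the remaining features can predict it. If they can, the sensitive attribute never really left.

    # Smoke test for shadow (proxy) columns; names here are assumptions.
    # Assumes the feature columns are already numerically encoded.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    df = pd.read_csv("applicants.csv")                # hypothetical data set
    sensitive = df["race"]                            # excised sensitive column
    features = df.drop(columns=["race", "approved"])  # drop the label too

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    scores = cross_val_score(clf, features, sensitive, cv=5)
    print("Sensitive attribute recoverable with accuracy:", scores.mean())

    # Which features are doing the reconstructing?
    clf.fit(features, sensitive)
    ranked = sorted(zip(features.columns, clf.feature_importances_),
                    key=lambda pair: -pair[1])
    for name, importance in ranked[:5]:
        print(f"{name}: {importance:.3f}")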

  58. MAKE SURE YOUR SAMPLE SET IS REPRESENTATIVE

  59.–60. (Image-only slides)
  61. IBM’S AI FAIRNESS TOOLKIT

  62. https://aif360.mybluemix.net AI FAIRNESS TOOLKIT

  63. https://aif360.mybluemix.net

  64.–66. (Image-only slides)
  67. https://aif360.mybluemix.net

  68. https://aif360.mybluemix.net

  69. https://pair-code.github.io/what-if-tool
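
For readers who want to try the toolkit, here is a minimal sketch of the AIF360 flow the slides walk through: measure bias on a dataset, apply a pre-processing mitigation, and measure again. The group definitions are illustrative assumptions, and the bundled AdultDataset requires downloading its raw census files per the library’s instructions; see https://aif360.mybluemix.net for the real tutorials.

    # Minimal AIF360 sketch: measure bias, reweigh, measure again.
    from aif360.datasets import AdultDataset
    from aif360.metrics import BinaryLabelDatasetMetric
    from aif360.algorithms.preprocessing import Reweighing

    privileged = [{"sex": 1}]      # illustrative group definitions
    unprivileged = [{"sex": 0}]

    data = AdultDataset()          # bundled census data (raw files required)

    metric = BinaryLabelDatasetMetric(
        data, privileged_groups=privileged, unprivileged_groups=unprivileged)
    print("Mean difference: ", metric.mean_difference())   # 0.0 means parity
    print("Disparate impact:", metric.disparate_impact())  # 1.0 means parity

    # Pre-processing mitigation: reweight examples to balance the groups.
    rw = Reweighing(privileged_groups=privileged,
                    unprivileged_groups=unprivileged)
    repaired = rw.fit_transform(data)

    metric_after = BinaryLabelDatasetMetric(
        repaired, privileged_groups=privileged, unprivileged_groups=unprivileged)
    print("Mean difference after reweighing:", metric_after.mean_difference())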

  70. HAVE A GOOD PROCESS

  71. KEEP IN MIND YOU NEED TO KNOW WHO CAN BE AFFECTED IN ORDER TO UN-BIAS
  72. (Image-only slide)
  73. PRICING ALGORITHMS

  74. Calvano, Calzolari, Denicolò and Pastorello (2018)

  75.–76. (Image-only slides)
  77. Calvano, Calzolari, Denicolò and Pastorello (2018)

  78. WHAT IF AMAZON BUILT A SALARY TOOL INSTEAD?

  79. THE BRATWURST PROBLEM

  80. HUMANS ARE RARELY SINGLE-MINDED

  81.–82. (Image-only slides)
  83. https://www.alexirpan.com/2018/02/14/rl-hard.html; Gu, Lillicrap, Sutskever, & Levine, 2016

  84. (Image-only slide)
  85. MODELS REPRESENT WHAT WAS; THEY DON’T TELL YOU WHAT SHOULD BE
  86. DON’T TRUST ALGORITHMS TO MAKE SUBTLE OR LARGE MULTI-VARIABLE JUDGEMENTS

  87. (Image-only slide)
  88. MORE COMPLEX ALGORITHMS THAT INCLUDE OUTSIDE INFLUENCE

  89. (Image-only slide)
  90. Lehman, Clune, & Misevic, 2018

  91. Cheney, MacCurdy, Clune, & Lipson, 2013

  92. (Image-only slide)
  93. BE READY

  94. DON’T CONFUSE THE MAP WITH THE TERRITORY

  95. VERIFY AND CHECK SOLUTIONS DERIVED FROM SIMULATION

  96.–97. (Image-only slides)
  98. BUT WHAT HAPPENS WITH DIALECTAL LANGUAGE? Blodgett, Green, and O’Connor, 2016
  99. MANY AI/ML TOOLS ARE TRAINED TO MINIMIZE AVERAGE LOSS

  100. REPRESENTATION DISPARITY Hashimoto, Srivastava, Namkoong, and Liang, 2018

  101. (Image-only slide)
  102. CONSIDER PREDICTIVE ACCURACY AS A RESOURCE TO BE ALLOCATED Hashimoto, Srivastava, Namkoong, and Liang, 2018
  103. DISTRIBUTIONALLY ROBUST OPTIMIZATION Hashimoto, Srivastava, Namkoong, and Liang, 2018
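
Hashimoto et al. optimize against a worst-case distribution near the training data, which protects groups the model would otherwise happily sacrifice to the average. The sketch below shows the spirit of the idea in its simplest form, a worst-group loss when group labels happen to be available; it is an illustrative group-DRO-style variant, not the paper’s exact chi-squared-ball formulation.

    # Worst-group training step in the spirit of DRO: instead of minimizing
    # the AVERAGE loss, minimize the loss of the worst-off group.
    import torch

    def worst_group_loss(model, X, y, group,
                         loss_fn=torch.nn.functional.binary_cross_entropy_with_logits):
        losses = []
        for g in torch.unique(group):
            mask = group == g
            logits = model(X[mask]).squeeze(-1)
            losses.append(loss_fn(logits, y[mask]))
        return torch.stack(losses).max()   # average-loss training would use .mean()

    # Usage inside an ordinary training loop, on toy data where a 10%
    # minority group needs a different rule than the majority.
    model = torch.nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    X = torch.randn(256, 10)
    group = (torch.rand(256) < 0.1).long()
    y = ((X[:, 0] > 0) ^ (group == 1)).float()

    for _ in range(100):
        opt.zero_grad()
        loss = worst_group_loss(model, X, y, group)
        loss.backward()
        opt.step()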

  104. (Image-only slide)
  105. LET’S BUILD A PRODUCT WITH OUR TWITTER NLP

  106. WHAT HAPPENS TO PEOPLE WHO USE DIALECT?

  107. PREDICTIVE POLICING

  108. Image via Reddit, by user u/jakeroot

  109.–114. Ensign, Friedler, Neville, Scheidegger, & Venkatasubramanian, 2017 (series of image slides)
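
The flavor of Ensign et al.’s runaway feedback loop can be reproduced in a toy simulation (all parameters here are made up): two districts with identical true crime rates, patrols allocated in proportion to previously discovered crime, and crime only discoverable where the patrols go.

    # Toy runaway-feedback simulation in the spirit of Ensign et al. (2017).
    import random

    random.seed(1)
    true_rate = [0.1, 0.1]   # both districts have the SAME true crime rate
    discovered = [1, 1]      # prior discoveries; start perfectly balanced

    for day in range(10_000):
        # Patrol where past data says the crime is (proportional allocation).
        share0 = discovered[0] / (discovered[0] + discovered[1])
        d = 0 if random.random() < share0 else 1
        # Crime is only discovered where officers are sent.
        if random.random() < true_rate[d]:
            discovered[d] += 1

    print("Discovered crime per district:", discovered)
    # The truth is symmetric, but the split typically drifts far from 50/50:
    # early luck attracts patrols, patrols produce discoveries, and the loop
    # feeds itself.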

  115. (Image-only slide)
  116. IGNORE OR ADJUST FOR ALGORITHM-SUGGESTED RESULTS

  117. LOOK TO CONTROL ENGINEERING

  118. By Arturo Urquizo - http://commons.wikimedia.org/wiki/File:PID.svg, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=17633925
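
Concretely, the PID controller in that diagram corrects a system using the present error (P), the accumulated error (I), and the error’s trend (D), which is one way to damp a decision system that feeds its own outputs back in as inputs. A bare-bones sketch with made-up gains and a made-up plant:

    # Bare-bones PID controller: u = Kp*e + Ki*integral(e) + Kd*de/dt.
    import random

    class PID:
        def __init__(self, kp, ki, kd, setpoint):
            self.kp, self.ki, self.kd = kp, ki, kd
            self.setpoint = setpoint
            self.integral = 0.0
            self.prev_error = 0.0

        def update(self, measurement, dt):
            error = self.setpoint - measurement
            self.integral += error * dt
            derivative = (error - self.prev_error) / dt
            self.prev_error = error
            return (self.kp * error
                    + self.ki * self.integral
                    + self.kd * derivative)

    # Toy plant: the value moves half as far as commanded, plus noise.
    pid = PID(kp=0.8, ki=0.2, kd=0.05, setpoint=10.0)
    value = 0.0
    for step in range(50):
        value += 0.5 * pid.update(value, dt=1.0) + random.gauss(0, 0.1)
    print("Final value (target 10.0):", round(value, 2))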

  119. (Image-only slide)
  120. CLASS I - PHANTOMS OF FALSE CORRELATION: Know what question you’re asking; trust conditional probability over straight correlation
  121. CLASS II - SPECTER OF BIASED SAMPLE DATA: Recognize data is biased even at rest; make sure your sample set is crafted properly; excise problematic predictors, but beware their shadow columns; build a learning system that can incorporate false positives and false negatives as you find them; try using adversarial techniques to detect bias
  122. CLASS III - SHADE OF OVERLY-SIMPLISTIC MAXIMIZATION: Remember models tell you what was, not what should be; try combining dependent columns and predicting that; try complex algorithms that allow more flexible reinforcement
  123. CLASS V – THE SIMULATION SURPRISE: Don’t confuse the map with the territory; always reality-check solutions from simulations
  124. CLASS VI - APPARITION OF FAIRNESS: Consider predictive accuracy as a resource to be allocated; possibly seek external auditing of results, or at least another team
  125. CLASS VII - THE FEEDBACK DEVIL: Ignore or adjust for algorithm-suggested results; look to control engineering for potential answers
  126.–127. (Image-only slides)
  128. MODELS REPRESENT WHAT WAS; THEY DON’T TELL YOU WHAT SHOULD BE
  129. (Image-only slide)
  130. OR GET TRAINING

  131. Bootcamps, Coursera, Udemy, actual universities

  132. (Image-only slide)
  133. AI Now Institute; Georgetown Law Center on Privacy and Technology; Knight Foundation’s AI ethics initiative; fast.ai; Algorithmic Justice League
  134. ABIDE BY ETHICS GUIDELINES

  135. Privacy / Consent; Transparency of Use; Transparency of Algorithms; Ownership

  136. https://www.accenture.com/_acnmedia/PDF-24/Accenture-Universal-Principles-Data-Ethics.pdf

  137. Arthur Doler @arthurdoler arthurdoler@gmail.com. Slides and handout: bit.ly/art-ai-agenda-kcdc2019