
Data Ethics: Bias and Big Data

Dorothea Salo
June 24, 2019

For LIS 640 "[Big] Data Ethics." Content note: racial slurs used in reference to bias problems in the commonly-used WordNet training set.


Transcript

  1. BIAS AND BIG DATA

  2. ON THE SHOULDERS OF GIANTS
     • I couldn’t teach this course without drawing from the hard work of so many, many people. Let me acknowledge some of them here.
     • Ifeoma Ajunwa • danah boyd • Khiara M. Bridges • Joy Buolamwini • Casey Fiesler • Chris Gilliard • Jacob Metcalf • Arvind Narayanan • Safiya U. Noble • Ellen Pao • Irina Raicu • Latanya Sweeney • Shannon Vallor
  3. DEFINING “BIAS”
     • I mean, a lot of us kind of know it when we see it? But for our purposes…
     • …it’s worth thinking of it as in opposition to the deontological concepts of “fairness” and “beneficence.” It’s harming one class of people by systematically and causelessly treating them worse than another.
       • The intent of that treatment doesn’t matter (so “benevolent sexism” still fails fairness forever). It is also absolutely possible (universal!) to create and/or act on bias without being aware of it.
     • Bias operates along fracture lines of societal power.
       • (This is, of course, why “reverse racism/sexism” is not a thing.)
     • Virtue ethics’ opposition to bias should be fairly obvious; this is not the kind of people we want to be.
     • Consequentialism would note the amazingly horrific consequences (individual and society-wide) of bias. I probably don’t need to say any more than that.
  4. A NOTE ABOUT BIAS IN ETHICS
     • The “common good” is… tricky. Is there really one “common” to refer to?
       • When misused, “common good” analyses can boil down to “good for those who paint only their own experience as ‘common.’” Which, not coincidentally, tends to be powerful and privileged people!
     • Utilitarian analyses can be deeply distorted by bias.
       • If an ethicist doesn’t even manage to SEE a group of people who will be harmed by an action… how will they take that group into account in their ethical analysis?
     • Virtue ethics deployed by the profoundly unvirtuous is… yeah.
       • And the way in which it is… yeah… is often conditioned by unexamined or even embraced biases.
     • This tendency can be fought, but you have to be conscious and deliberate about it. You also have to pay attention to who’s at ethics-debating tables.
       • You may not entirely win the fight. (I doubt I ever will.) Fight anyway.
  5. SYSTEMATIZING BIAS AROUND BIG DATA
     • (I had to think about this taxonomy… a lot. It’s tentative. Call me on it.)
     • Biaswashing via Big Data’s/AI’s/ML’s supposed neutrality
       • That is, biasing Big Data analyses from the get-go, then justifying use of the results with “it’s the computer! the computer can’t be wrong!”
     • Patterns of bias in training data yielding (sometimes opaquely, sometimes obviously) biased results
       • AI/ML is pattern detection at heart… but it can’t tell patterns caused by bias from any other patterns it detects.
       • Training data may be biased by the trainers’ selection, or by bias in society at large. (Hold this thought; I’ll get back to it.)
       • (Also, I am aware of the irony of building a taxonomy around a pattern-matching phenomenon notorious for bias!)
     • Implementing Big Data regimes against the powerless only
       • or, at least, only to START
     • Implementing Big Data regimes without assessing bias in results
     • Helping biased people act on bias
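The point that AI/ML "can't tell patterns caused by bias from any other patterns it detects" can be sketched with a toy model. Everything here (the group names, the "hiring" framing, the 80/20 frequencies) is invented purely for illustration; it is not from the deck:

```python
from collections import Counter

# Hypothetical historical hiring decisions as (group, hired?) pairs.
# The data is deliberately skewed: group A was hired 80% of the time,
# group B only 20% of the time.
history = (
    [("A", True)] * 80 + [("A", False)] * 20
    + [("B", True)] * 20 + [("B", False)] * 80
)

def train(data):
    """Learn P(hired) per group -- the only 'pattern' this model sees."""
    totals = Counter(group for group, _ in data)
    hires = Counter(group for group, hired in data if hired)
    return {group: hires[group] / totals[group] for group in totals}

model = train(history)
print(model)  # {'A': 0.8, 'B': 0.2} -- the historical bias, faithfully learned
```

Nothing in the training step can distinguish "group B was genuinely less qualified" from "group B was discriminated against"; the model reproduces whichever pattern the data encodes.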
  6. EXAMPLES

  7. BIASWASHING

  8. None
  9. None
  10. None
  11. TRAINING-SET BIAS, INTENTIONAL AND UN-

  12. HOW AI/ML USES BIG DATA

  13. WHERE DOES TRAINING DATA COME FROM?

  14. “TAY” CHATBOT: MICROSOFT FAILS CONSEQUENTIALISM FOREVER

  15. A BIASED WAY THIS BREAKS

  16. TRAINING-SET BIAS

  17. None
  18. None
  19. FIXABLE… MAYBE? IF THE DATASET IS TRANSPARENT.

  20. BIASED APPLICATION OF BIG DATA, INTENTIONAL AND UN-

  21. None
  22. None
  23. None
  24. Like, read everything Virginia Eubanks has ever written?

  25. COMMON CLUES TO MISAPPLIED BIG DATA
     • Applied to Them, not Us, for relatively powerless values of “Them” and relatively privilege-clueless values of “Us”
     • Impulse purchase, likely by someone who doesn’t understand what they’re buying
       • I don’t understand how this happens in bureaucracies… but it does.
     • No testing runs of the model, no assessment for bias (or even effectiveness) at any time during purchase or implementation
       • (At least, that’s how it looks to me, though it’s quite possible that some implementations shrug and live with—or even welcome—the bias.)
     • No appeal or dispute processes
       • commonest when the system is intended to “efficiently” replace human labor
  26. HELPING BIASED PEOPLE ACT ON BIAS

  27. FACEBOOK AND AD TARGETING

  28. REDLINING

  29. None
  30. “BUT HUMANS ARE BIASED TOO!”
     • Yeah. Unquestionably true! But that doesn’t mean Big Data/AI/ML is the only or right fix for that problem.
     • Why not?
       • The ethics problems with surveillance, just for starters—founding a supposedly “more ethical” system on something with serious ethical problems just weirds me out. How do people think this is okay?
       • The inability of AI/ML to notice, much less be held (or hold itself) accountable for, bias. We just do not know how to fix this at this point. (If we ever figure it out, maybe it’ll be time to reassess.)
       • The use of these technologies to enable, foster, and exacerbate bias
  31. THANKS! This presentation copyright 2019 by Dorothea Salo. It is available under a Creative Commons Attribution 4.0 International license.