
Data Ethics: Bias and Big Data

Dorothea Salo
June 24, 2019

For LIS 640 "[Big] Data Ethics." Content note: racial slurs used in reference to bias problems in the commonly-used WordNet training set.


Transcript

  1. BIAS AND BIG DATA

  2. ON THE SHOULDERS OF GIANTS
     • I couldn’t teach this course without drawing from the hard work of so many, many people. Let me acknowledge some of them here.
     • Ifeoma Ajunwa • danah boyd • Khiara M. Bridges • Joy Buolamwini • Casey Fiesler • Chris Gilliard • Jacob Metcalf • Arvind Narayanan • Safiya U. Noble • Ellen Pao • Irina Raicu • Latanya Sweeney • Shannon Vallor
  3. DEFINING “BIAS”
     • I mean, a lot of us kind of know it when we see it? But for our purposes…
     • …it’s worth thinking of it as in opposition to the deontological concepts of “fairness” and “beneficence.” It’s harming one class of people by systematically and causelessly treating them worse than another.
       • The intent of that treatment doesn’t matter (so “benevolent sexism” still fails fairness forever). It is also absolutely possible (universal!) to create and/or act on bias without being aware of it.
     • Bias operates along fracture lines of societal power.
       • (This is, of course, why “reverse racism/sexism” is not a thing.)
     • Virtue ethics’ opposition to bias should be fairly obvious; this is not the kind of people we want to be.
     • Consequentialism would note the amazingly horrific consequences (individual and society-wide) of bias. I probably don’t need to say any more than that.
  4. A NOTE ABOUT BIAS IN ETHICS
     • The “common good” is… tricky. Is there really one “common” to refer to?
       • When misused, “common good” analyses can boil down to “good for those who paint only their own experience as ‘common.’” Which, not coincidentally, tends to be powerful and privileged people!
     • Utilitarian analyses can be deeply distorted by bias.
       • If an ethicist doesn’t even manage to SEE a group of people who will be harmed by an action… how will they take that group into account in their ethical analysis?
     • Virtue ethics deployed by the profoundly unvirtuous is… yeah.
       • And the way in which it is… yeah… is often conditioned by unexamined or even embraced biases.
     • This tendency can be fought, but you have to be conscious and deliberate about it. You also have to pay attention to who’s at ethics-debating tables.
       • You may not entirely win the fight. (I doubt I ever will.) Fight anyway.
  5. SYSTEMATIZING BIAS AROUND BIG DATA
     • (I had to think about this taxonomy… a lot. It’s tentative. Call me on it.)
     • Biaswashing via Big Data’s/AI’s/ML’s supposed neutrality
       • That is, biasing Big Data analyses from the get-go, then justifying use of the results with “it’s the computer! the computer can’t be wrong!”
     • Patterns of bias in training data yielding (sometimes opaquely, sometimes obviously) biased results
       • AI/ML is pattern detection at heart… but it can’t tell patterns caused by bias from any other patterns it detects.
       • Training data may be biased by the trainers’ selection, or by bias in society at large. (Hold this thought; I’ll get back to it.)
       • (Also, I am aware of the irony of building a taxonomy around a pattern-matching phenomenon notorious for bias!)
     • Implementing Big Data regimes against the powerless only
       • or, at least, only to START
     • Implementing Big Data regimes without assessing bias in results
     • Helping biased people act on bias
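The point that AI/ML "can't tell patterns caused by bias from any other patterns it detects" can be sketched with a toy model. Everything here (the group names, the "hiring" framing, the 80/20 frequencies) is invented purely for illustration; it is not from the deck:

```python
from collections import Counter

# Hypothetical historical hiring decisions as (group, hired?) pairs.
# The data is deliberately skewed: group A was hired 80% of the time,
# group B only 20% of the time.
history = (
    [("A", True)] * 80 + [("A", False)] * 20
    + [("B", True)] * 20 + [("B", False)] * 80
)

def train(data):
    """Learn P(hired) per group -- the only 'pattern' this model sees."""
    totals = Counter(group for group, _ in data)
    hires = Counter(group for group, hired in data if hired)
    return {group: hires[group] / totals[group] for group in totals}

model = train(history)
print(model)  # {'A': 0.8, 'B': 0.2} -- the historical bias, faithfully learned
```

Nothing in the training step can distinguish "group B was genuinely less qualified" from "group B was discriminated against"; the model reproduces whichever pattern the data encodes.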
  6. EXAMPLES

  7. BIASWASHING

  8. None
  9. None
  10. None
  11. TRAINING-SET BIAS, INTENTIONAL AND UN-

  12. HOW AI/ML USES BIG DATA

  13. WHERE DOES TRAINING DATA COME FROM?

  14. “TAY” CHATBOT: MICROSOFT FAILS CONSEQUENTIALISM FOREVER

  15. A BIASED WAY THIS BREAKS

  16. TRAINING-SET BIAS

  17. None
  18. None
  19. FIXABLE… MAYBE? IF THE DATASET IS TRANSPARENT.

  20. BIASED APPLICATION OF BIG DATA, INTENTIONAL AND UN-

  21. None
  22. None
  23. None
  24. Like, read everything Virginia Eubanks has ever written?

  25. COMMON CLUES TO MISAPPLIED BIG DATA
     • Applied to Them, not Us, for relatively powerless values of “Them” and relatively privilege-clueless values of “Us”
     • Impulse purchase, likely by someone who doesn’t understand what they’re buying
       • I don’t understand how this happens in bureaucracies… but it does.
     • No testing runs of the model, no assessment for bias (or even effectiveness) at any time during purchase or implementation
       • (At least, that’s how it looks to me, though it’s quite possible that some implementations shrug and live with—or even welcome—the bias.)
     • No appeal or dispute processes
       • commonest when the system is intended to “efficiently” replace human labor
  26. HELPING BIASED PEOPLE ACT ON BIAS

  27. FACEBOOK AND AD TARGETING

  28. REDLINING

  29. None
  30. “BUT HUMANS ARE BIASED TOO!”
     • Yeah. Unquestionably true! But that doesn’t mean Big Data/AI/ML is the only or right fix for that problem.
     • Why not?
       • The ethics problems with surveillance, just for starters—founding a supposedly “more ethical” system on something with serious ethical problems just weirds me out. How do people think this is okay?
       • The inability of AI/ML to notice, much less be held (or hold itself) accountable for, bias. We just do not know how to fix this at this point. (If we ever figure it out, maybe it’ll be time to reassess.)
       • The use of these technologies to enable, foster, and exacerbate bias
  31. THANKS! This presentation copyright 2019 by Dorothea Salo. It is available under a Creative Commons Attribution 4.0 International license.