Data Ethics: Manipulation through Big Data

For one-credit LIS 640 "[Big] Data Ethics."


Dorothea Salo

July 01, 2019



  2. IS MANIPULATION BAD? THINKING IT THROUGH
    • Manipulation: a deliberate, usually sustained/systematic attempt to change someone’s beliefs and/or behavior.
      • Beliefs and behavior are not necessarily separable; sometimes belief change is what fuels behavior change, and vice versa.
    • Advertising is manipulation. Twitter and Facebook bots are manipulators. So is a recommender engine (hi, YouTube!) that leads people to ever-more-hateful content.
    • … But the calendar reminders I set for myself, which are innocuous, are also manipulation! Another case of “we can’t outlaw this; we have to draw lines.”
    • So… when is manipulation ethically okay, and when is it not?
      • (If you think you’re about to get bright lines from me, um… but I suspect you all know better than that by now.)
  3. SOME CONSIDERATIONS
    • At whose behest is the manipulation happening? Who benefits? Who is harmed?
      • I’m ethically allowed to manipulate myself! And to use tools (including tools made by others) to do so. I’m looking out for my own interests; who better?
      • When others are manipulating me, they have to think harder about ethics… and about people who are not them. Naked self-regard is never a good look.
    • How transparent is the manipulation attempt? How escapable is it?
      • Compare Twitter and Facebook. Twitter has been somewhat more transparent about algorithmic timeline manipulation, and it’s escapable if you know how. Facebook has neither been transparent nor offered a genuine escape hatch.
      • (Is Twitter off the hook ethically? Nah. The escape hatch, lists, isn’t easy.)
    • Are there side effects/externalities of the manipulation?
      • Such as otherwise-unneeded surveillance, or biased outcomes, or traveling data…
    • What is the goal of the manipulation?
      • I put this last for a reason: the ends do not always justify the means!
      • Still, if the goal is something putrid (e.g. voter suppression in a democracy), that puts the whole situation on the “probably unethical” side of the fence.
      • Watch out for skeevy self-justifying rationalization here. It’s pretty common.
  4. MANIPULATION FACILITATORS
    • Cognitive heuristics and other habits of thought
      • Heuristic: a cognitive shortcut we (as cognitively limited human beings) use to keep our info-gathering and decision-making from bogging down
      • E.g. “social proof”: we tend to do and believe what we see others around us doing and believing.
      • These are well enumerated and documented by psychology research. (I like Gerd Gigerenzer’s book Simple Heuristics That Make Us Smart.)
      • Please don’t make the mistake of assuming heuristics are thought errors, or escapable via “rationality.” They’re not. They’re fundamentally how we think!
    • Ignorance and ignorant trust
    • Power. As always. But we’ll see some exercises of power that maybe aren’t what most people expect when I say the word.

    decisions that landed us in the manipulation morass we’re in? Why did they make the decisions they did? Are those decisions ethically justifiable? When they’re not, why not?
    • How do we notice situations that are liable to evolve into unacceptable manipulation while course change is still possible?
      • How do we keep ourselves from going-along-to-get-along?
    • When we do notice such situations (as insiders to them), what do we do?
      • I don’t have great answers for this, by the way. This is one I am definitely still working on…
  6. SO LET’S START WITH AN EASY ONE
    • Apps, games, and social media manipulating kids into purchases, clicks, and time spent
    • At whose behest? The company’s. Who benefits? The company.
      • And it’s important to keep in mind that we protect kids more than adults because kids are not yet fully able to look after themselves. These companies are manipulating a known-vulnerable population!
    • Transparent? Not to kids. Escapable? Not readily.
      • See the work of the iSchool’s own Dr. Rebekah Willett regarding what kids understand about online/app marketing and data capture. Spoiler: not much.
      • And these companies used what psych researchers know about kids’ heuristics and thought habits to keep the kids buying and clicking. Do kids know what psych researchers know about them? Of course not, and that asymmetry is troubling.
    • Side effects/externalities?
      • What happens to these kids’ social and educational lives is not good. Plus, you know, surveillance.
    • Goal? If it’s anything other than “make money,” I’m not seeing it.

  8. BOTS, PROPAGANDA, MISINFORMATION
    • At one end, we have the anti-vaxxers, many of whom are sincere (though wrong) in their belief that vaccination is harmful. To what extent can we call their campaigns manipulative? Unethical?
      • There’s no question that their techniques and tactics often rely on manipulating heuristics (particularly social proof) and even dodgier stuff like Twitter bots and scholarly-journal manipulation.
      • One indicator, perhaps: protecting their own kids at the expense of others?
      • Another indicator: contempt for and dehumanization of opponents (and I say this knowing that contempt is one of my own bad habits!)
      • No conclusion, but suggestions: 1) sincerity matters but is not by itself ethically determinative, and 2) unethical tactics can be deployed in service of a goal that is not necessarily itself unethical
    • At the other end… well. By now we’ve all seen it, and its effects. I don’t need to belabor the point!
  9. SOCIAL-MEDIA PLATFORMS THEMSELVES
    • Between various rocks and hard places
      • Legally, in the US they are not liable for illegality in their users’ speech. As we know, though, lack of legal responsibility does not automatically imply lack of ethical responsibility! (Also, other parts of the world have different laws here: Germany, for example, forbids much more hate speech than the US, and China strictly controls political speech.)
      • Much too large for human moderation to work (and when they try, the toll on the moderators is absolutely horrific, another ethical consideration)
      • AI/ML moderation is impractical for various reasons: bias is one, of course, but in addition, correctly parsing the content and context of human speech is well beyond computers at present
      • Private groups (e.g. on Facebook) present another ethical dilemma: they are used by people who reasonably want to protect themselves from observation (e.g. folks with serious illnesses), but also by terrible people to avoid consequences for their terribleness (e.g. racist, sexist, homophobic law-enforcement officers)
  10. HOW DID THEY GET INTO THIS MESS?
    • Terrifying “who could have known this would be bad?” naiveté.
      • E.g. The Zuck: “Only connect.” Including horrible people, Zuck? Did you think horrible people would not use Facebook, or what?
      • Arguably, in at least some cases the naiveté is a smokescreen for knowing and just not caring. Former Facebook employee Mark Luckie made this point forcefully about Facebook’s handling of racism.
      • At some point just not knowing is ethically indefensible. Social media platforms are a long way over the line. The ethical thing is to do one’s historical, psychological, sociological, economic… homework.
    • Designing the service and its affordances to manipulate people into spending more time there: “engagement”
      • This was an explicit decision, in all cases! No way to dodge responsibility for its side effects. (Which are a long story, but tl;dr: “what engages us is not always good, or good for us.”)
      • The lesson I want you to take from this is that design is a form of power that, like all power, must be deployed ethically and carefully.
  11. IS THAT ALL?
    • No. Repeated, stubborn pattern of not noticing problems (especially when those harmed are concentrated among marginalized or otherwise voiceless populations) and letting them fester.
      • Also not learning from mistakes. After many episodes of “hoovering up somebody’s email contacts includes people they don’t want contacting them,” Google still made it inescapable in Google Buzz.
      • Result? Someone had her stalker become her GBuzz contact without her knowledge.
    • When those problems become a PR problem for them: 1) lying, 2) pulling a “gosh, who knew?” (often falsely), 3) casting blame everywhere but themselves, and 4) implementing hasty, poorly thought-out band-aid “solutions” that often leave the burden on those worst harmed.
      • Fixing problems costs money! Ignoring them doesn’t!
    • There’s a certain tendency in ethical analysis to pay attention to actions. That’s not wrong, but inaction can also be unethical.
  12. WHAT CAN WE LEARN FROM THAT?
    • Naïve optimism about human nature… doesn’t end well.
      • In infosec, there’s the concept of “adversarial thinking”: to defend against hackers, you have to think about how they’ll attack you and what they’ll want.
      • Same deal with any system involving humans, really. How can it be used for harm? How can those harms be prevented and defended against?
      • Key information here: patterns of past and present harms! This is one reason “why we gotta study history? ain’t no jobs in history!” grumbling irks me.
    • Fix problems when everything is small, because the problems are smaller and more tractable too. Scale should scare us. It needs preparing for.
    • Fix externalities: problems you’re creating or exacerbating, then shoving off on other people, especially those harmed.

    Sandberg, Jack Dorsey, etc… own a lot of the ethics problems here, no question about that.
    • “But I’m just a working stiff,” says a developer at FaceTwitGoogaZon. “It’s not MY fault! I’d have been fired if I hadn’t!”
      • “I was ordered to” is colloquially known as the “Nuremberg defense,” after many defendants in the international trials over the Holocaust used it.
      • The courts at Nuremberg soundly rejected it. Own what you do!
      • An early model of how people respond to conflicts with organizations they belong to is “exit, voice, loyalty.” In cases of clearly unethical behavior by the organization, loyalty is rarely the ethical choice.
      • Virtue ethics can be a fruitful intervention here. If you ask people “is this the kind of person you want to be?” more than a few will realize that… it isn’t.
    • Notably, (ex-)employees of e.g. Facebook and Uber are facing trouble finding new jobs, as employers question their personal ethics.
  14. SOCIAL NORMS, SOCIAL PRESSURE, SLIPPERY SLOPES
    • “Abilene paradox”: everybody goes to Abilene when nobody wants to, because nobody realizes everybody else doesn’t want to. Social proof in action!
      • Our implicit beliefs about the social norms around us can be wrong!
      • This includes beliefs about the ethics of collective endeavors.
      • Palliative: explicitly ask about beliefs, often one-on-one with people.
    • That said, social and occupational groups absolutely can and do use social pressure to silence ethics-based dissent. Not okay!
    • We can also gradually traduce ourselves by not checking in with our ethics often enough, assuming somebody else will object, letting social proof rule us.
      • Western European fable: The Emperor’s New Clothes
    • The FOMOngering ethics smell runs on these phenomena.
      • I wish I had a quick-and-easy counter for it. I don’t.
  15. WHAT IF YOU HONESTLY DIDN’T KNOW?
    • This one does happen.
      • Parceling out a large system-programming job such that few of those working on it understand what they’re building and how it will be used?
      • It’s been done, including with surveillance systems (see: Google’s Project Maven, Salesforce’s cooperation with immigration enforcement).
      • Ethical responsibility for that lies largely with the higher-ups who hid it.
    • You do have to ask yourself whether and how you could have found out. Wilful ignorance, as we’ve discussed, is an ethics smell.
    • When you find out:
      • If you’re still working on the project and/or for the organization, you have an exit-voice-loyalty decision to make! (Depending on the issue and its present transparency, you may also have to decide about whistleblowing. I don’t envy you.)
      • Virtue ethics would also suggest that even though you didn’t knowingly cause harm, you still caused harm… and likely owe amends, if you want to be your best self.
  16. CALLING OUT ETHICS FAILURES
    • I, um. This is where I admit that if there’s a right way to do this, I don’t practice it and am not even sure what it might look like.
      • Wrong ways? Oh yeah. I know a lot about those, from experience.
    • It is (I will not mince words) so damn hard. Hard to do at all, hard to do well, hard to do compassionately, hard to do effectively.
    • And the cost can be very, very high.
      • I’m not even talking Edward “exiled, likely permanently” Snowden levels of cost!
      • Career cost, relationship cost, reputational cost, financial cost…
    • I’m not saying “don’t do it.” I’m saying I understand how hard it is, and because of that, I respect people who do it anyway very, very highly.

  18. THANKS! This presentation copyright 2019 by Dorothea Salo. It is available under a Creative Commons Attribution 4.0 International license.
  19. “PERSONALIZATION,” INFERENCE, MANIPULATION (n.b. if you’ve taken my 510 you’ve seen most of this; I’m fine if you skip it this time.)
  20. WHAT’S “PERSONALIZATION?”
    • Modifying a system’s behavior toward you based on what the system knows about you and your previous behavior.
      • In theory, and often in reality, the more the system knows about you, the more effectively (from its point of view) it can personalize.
      • Hi, Big Data! Hi, surveillance capitalism!
    • Examples:
      • online ads, of course, especially real-time bidding
      • algorithmic social-media filtering (Twitter, Facebook)
      • search-engine results personalization (to avoid this, use DuckDuckGo instead of Google)
      • most cases of “you might also like…”, e.g. product recommenders, news recommenders, the YouTube recommender
      • education: “adaptive learning,” “personalized” advising
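To make “modifying a system’s behavior toward you based on what it knows about you” concrete, here is a toy Python sketch. The results, their topic tags, and the click history are all made up for illustration; real personalization systems are far more complex, and their details are usually not public. Notice that nothing in the scoring asks whether the reranked list is good for you, only whether it matches what you clicked before.

```python
from collections import Counter

# Made-up results, each tagged with topics (illustrative only, not real data).
RESULTS = [
    ("Local election guide", {"politics", "local"}),
    ("Celebrity gossip roundup", {"entertainment"}),
    ("Vaccine safety explainer", {"health", "science"}),
    ("Outrage of the day", {"politics", "outrage"}),
]

def personalize(results, click_history):
    """Re-rank results by how often this user clicked each result's topics before."""
    topic_counts = Counter(topic for clicked in click_history for topic in clicked)

    def score(result):
        _, topics = result
        return sum(topic_counts[t] for t in topics)  # unseen topics count as 0

    # sorted() is stable, so ties keep their original (unpersonalized) order.
    return sorted(results, key=score, reverse=True)

history = [{"politics", "outrage"}, {"outrage"}, {"politics"}]
for title, _ in personalize(RESULTS, history):
    print(title)
```

With this history, “Outrage of the day” jumps to the top and the vaccine explainer sinks to the bottom; the more history the system has, the harder that skew gets.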

    personalization is aimed at manipulating your behavior!!!!!!!!
      • Social media, for example, uses personalization to manipulate you into staying longer. Even when that’s not good for you or anyone else.
    • With product advertising, we understand that and we tend to be okay with it?
      • Arguably in many situations we shouldn’t be! For example, tricking children into overspending on games.
    • But what about “personalized” education? Or “personalized” news? Or “personalized” politics?
      • We can end up with a dangerously skewed sense of the world this way… and that can lead us to do dangerously messed-up things.
      • (I say “us” because no one is immune to this! Not me, not you, not anyone. That’s not how human brains work.)

    tell you:
      • “Ads/content tailored to your interests!” (Not… exactly. Ads/content they believe you will click on, whether it’s good for you or not. People get bamboozled into believing conspiracy theories via “tailoring.”)
      • “A better experience!” (Better for whom? Who really benefits here?)
    • What they don’t tell you is all the surveillance-capitalism stuff:
      • Outright data sale, without your knowledge or consent
      • Inferring further data about you (including stuff you’d rather keep private)
      • Manipulating you (especially, though not exclusively, financially and politically)
      • Rolling over on you to governments and law-enforcement agencies
      • Lots of other things! They won’t tell you what they’re doing!
    • “Own the good, ignore the bad” ethics smell: “personalization” is good, so we’ll flog it at every turn; the rest, as Hamlet said, is silence.

    if the computer discovers that students like you shouldn’t be in your major/program because they fail it a lot?
      • What if the computer is actually noticing your gender or your race/ethnicity, and the REAL problem is that the department offering that major behaves in racist and sexist ways? Do you trust UW-Madison to handle this correctly?
    • What if the computer discovers that students who eat that thing you often order at the Union don’t do well on final exams? Or have mental-health issues?
      • (Such a correlation would almost certainly be spurious! But would that necessarily stop UW-Madison from acting on it?)
    • Basically, what if the computer thinks You Are Studenting Wrong?
      • What is your recourse if that’s used against you? Or if it’s incorrect? Or if the problem is real but it’s the university’s responsibility/error, not yours?
  24. BUT IT’S OKAY IF IT’S THE TRUTH, RIGHT?
    • Inferences, especially, can be wrong, and often are.
      • Garbage (data) in, garbage (inferences) out.
      • Bias in, bias out. (For example, Amazon tried to infer who would be a good hire by analyzing resumes. The result? The computer marked down anything showing an applicant was female, because Amazon’s longstanding gender biases in hiring showed up in the data!)
      • The data we can collect, even Big Data, can never paint a whole picture. (Term of art: “availability bias”: we judge on the data we can easily collect, which is not always the best or most useful data.)
      • Correlation is not causation, and all the other cautions that come from a good statistics or research-methods course!
    • Even truths can harm you unfairly.
      • Ask anyone who’s dealt with discrimination based on a personal characteristic.
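A toy Python illustration of “bias in, bias out,” using entirely made-up data. This is not a model of Amazon’s actual system; it only shows how a naive learner that scores resume words by historical hire rates reproduces whatever bias the historical decisions contained. The function names and the word “womens” are hypothetical choices for the example.

```python
from statistics import mean

# Entirely made-up hiring history: (words on resume, was the person hired?).
# The past human decisions rejected every resume that mentioned "womens".
HISTORY = [
    ({"python", "captain"}, True),
    ({"java", "captain"}, True),
    ({"python", "womens", "captain"}, False),
    ({"java", "womens", "captain"}, False),
    ({"python", "sql"}, True),
    ({"sql", "womens"}, False),
]

def learn_word_rates(history):
    """'Train' by computing, per word, the fraction of resumes containing it that were hired."""
    seen, hired = {}, {}
    for words, was_hired in history:
        for w in words:
            seen[w] = seen.get(w, 0) + 1
            hired[w] = hired.get(w, 0) + int(was_hired)
    return {w: hired[w] / seen[w] for w in seen}

def score(resume, rates, default=0.5):
    """Average the learned rates of the resume's known words; unknown words are ignored."""
    known = [rates[w] for w in resume if w in rates]
    return mean(known) if known else default

rates = learn_word_rates(HISTORY)
same_skills = {"python", "sql", "captain"}
print(rates["womens"])                         # 0.0: the old bias, now "learned"
print(score(same_skills, rates))               # the higher score
print(score(same_skills | {"womens"}, rates))  # lower, despite identical skills
```

Nothing in the code mentions gender; the penalty comes entirely from the historical labels it was fed.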

  28. DARK PATTERNS
    • Just as researchers have studied our cognitive shortcuts, they have studied how to leverage them to manipulate us.
      • When this is done directly, yes, it gets a significant IRB going-over.
      • Quite often, however, natural experiments are feasible.
    • Who’s been paying attention? E-commerce, social media, game, and mobile-app designers. “Dark patterns” is a term of art for manipulative (sometimes outright deceptive) interaction/UX design patterns that operate, often unbeknownst to the user, to that user’s detriment.
    • It’s hard to make a case that these are ethical.
      • Deontologists: they’re not beneficent, fair, transparent, dignified…
      • Consequentialists: the consequences so far have included malinvested consumer spending, viral misinformation, harmed children…
      • Virtue ethicists: manipulating others is not exactly being our best selves.
  29. FUN (?) WITH DARK PATTERNS
    • Tricky Sites:
      • Based on research into (and catalogs of) dark patterns, Princeton CS (hi, Arvind Narayanan!) developed a dark-pattern detector for websites.
      • Relies on Big Data to an extent… but not Big Data about individuals. Instead, Big Data about websites and B2B web services.
    • Also has a nice glossary of dark patterns on the front page of the site.
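For a feel of what automated dark-pattern detection involves, here is a deliberately crude Python sketch that flags two commonly catalogued patterns, false urgency and scarcity messages, with keyword regexes. The pattern list is a hypothetical example; this is not a reconstruction of the Princeton tool, and a keyword list like this would miss a lot and occasionally misfire.

```python
import re

# Hypothetical keyword patterns for two commonly catalogued dark patterns.
# Real detectors are far more sophisticated than a fixed list of phrases.
PATTERNS = {
    "urgency": re.compile(
        r"\b(hurry|ends in \d+|only \d+ (hours|minutes) left|offer expires)\b", re.I
    ),
    "scarcity": re.compile(
        r"\bonly \d+ left\b|\bselling fast\b|\b\d+ people are viewing\b", re.I
    ),
}

def flag(text):
    """Return the sorted names of every pattern that matches somewhere in `text`."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))

print(flag("Hurry! Only 3 left in stock"))   # ['scarcity', 'urgency']
print(flag("Free shipping on all orders"))   # []
```

Note that a match only says the message is present, not that it is false; whether “only 3 left” is a lie is exactly what a text-only check cannot tell.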

    Like search engines, recommender systems try to match users with the items most relevant to them.
      • “Relevant” is a weasel word, of course. It’s not always safe to assume that the system considers the users’ needs to be paramount.
    • Like categorization systems (thesauri, subject headings, etc.), recommender systems categorize items in various ways to help with this.
      • Often along many different, sometimes rather surprising facets!
  33. NOT EXACTLY LIKE EITHER!
    • Unlike search engines, recommender systems don’t have to start with a text query.
      • They can also start with an item they know you are interested in (because you bought/read/listened to/watched/liked/5-starred/clicked on it).
    • Unlike categorization systems, recommender systems can spontaneously build categories.
      • So they’re not stuck with bad decisions and inaccurate ideas about users and how users think. They can also evolve much faster (though this is a double-edged sword).
      • How do they build categories? Fundamentally, by storing “this is like that” comparisons. Over time, likenesses tend to cluster.
      • It may not even matter WHY one thing is like another for the system to work! This is hugely different from pre-made categorization systems, which are always predicated on some kind of “why.”
  34. WHY IS THIS LIKE THAT?
    • Because other people similar to you think it is: user-user comparisons
      • The more the system knows about each user, the easier it is to compare users effectively. I hope your surveillance-capitalism nerves are twitching.
      • This isn’t limited to knowing things that are obviously correlated with whatever the site is trying to recommend. (Comparisons are social!)
    • Because users often pair this with that, or rate this like that: item-item comparisons
    • Because it belongs to one or more categories a user likes: dimensionality reduction
    • Item-item comparisons can be stored and analyzed without reference to people, but the other two types of comparisons require Big Data via surveillance.
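Here is a minimal Python sketch of the first two comparison styles, using cosine similarity on a tiny made-up ratings table. Cosine similarity is one common choice, assumed here for illustration; production recommenders use many other techniques. Note the difference the slide points at: the user-user comparison needs a profile of every user, whereas the item-item scores, once computed, can be stored keyed only by item.

```python
from math import sqrt

# Made-up ratings: user -> {item: rating}. Not real data.
ratings = {
    "ana":   {"book_a": 5, "book_b": 4, "book_c": 1},
    "ben":   {"book_a": 4, "book_b": 5, "book_d": 2},
    "chloe": {"book_c": 5, "book_d": 4, "book_a": 1},
}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def item_vectors(ratings):
    """Pivot user -> item ratings into item -> user ratings."""
    items = {}
    for user, prefs in ratings.items():
        for item, r in prefs.items():
            items.setdefault(item, {})[user] = r
    return items

def most_similar(target, vectors):
    """Rank every other key in `vectors` by cosine similarity to `target`."""
    scores = [(other, cosine(vectors[target], vec))
              for other, vec in vectors.items() if other != target]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# User-user: who rates like ana? Requires every user's profile.
print(most_similar("ana", ratings))                 # ben ranks first
# Item-item: what is rated like book_a? The result table needs no user names.
print(most_similar("book_a", item_vectors(ratings)))  # book_b ranks first
```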
  35. WHEN RECOMMENDER SYSTEMS GO BAD
    • Optimizing for “engagement” is problematic for the same reasons as on social media
      • We get engaged by anger and self-righteousness, but over time we become acclimated to them, meaning we need more intense stimuli. Big Data analysis is capable of noticing this and feeding us ever-more-hateful stuff!
      • Unfortunately, as YouTube has abundantly demonstrated, a nonstop diet of this style of engagement has really, really bad individual and societal consequences. Political radicalization is just one.
    • One recommender does not fit all.
      • Some vulnerable populations, such as children, need better filters and curation than the rest of us. YouTube, again, either didn’t realize this, didn’t consider it important enough to build into their recommender… or wasn’t sure how to pick kids out of the mix (though I doubt that last one).
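A toy Python simulation of the acclimation loop described above. Both the engagement model (engagement peaks one notch above whatever the user is used to) and the acclimation rule (the user gets used to whatever they just saw) are made-up assumptions for illustration, not measurements of any real platform. Even so, a recommender that greedily maximizes engagement at each step ratchets steadily toward the most extreme content available.

```python
# Hypothetical intensity scale: 0 = calm ... 10 = most inflammatory.
INTENSITIES = range(11)

def predicted_engagement(intensity, tolerance):
    """Toy assumption: engagement peaks one notch above what the user is used to."""
    return -(intensity - (tolerance + 1)) ** 2

def simulate(steps, tolerance=0):
    """Greedily recommend the most 'engaging' item each step; return the picks."""
    picks = []
    for _ in range(steps):
        pick = max(INTENSITIES, key=lambda i: predicted_engagement(i, tolerance))
        picks.append(pick)
        tolerance = pick  # Toy assumption: the user acclimates to what they just saw.
    return picks

print(simulate(12))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10]
```

No step in the loop “wants” extremity; it only chases the next click, and the escalation falls out of the feedback between the optimizer and the acclimating user.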

    and occasionally honorable history!
      • “Googlebombing”: using a canned phrase in links across many websites to make Google surface a specific site in response to that phrase. Often used satirically.
    • Manipulation via “data voids,” however, is not honorable at all.
      • (Hat tip: danah boyd, who I believe coined the term)
      • Come up with a phrase that doesn’t have much Google presence.
      • Make a bunch of propaganda websites that use the phrase heavily. Wait for Google to index them.
      • Now pump the phrase hard on social media (and broadcast media if possible). When people (naturally) search for it, they’ll end up at your sites because there is nothing else!
      • Used for political, commercial, and health-info propaganda and misinformation.
    • Uses people’s trust in search engines against them: poisoning the well
      • Worth noting that this isn’t an especially ethical way to treat the search engine, either.
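A toy Python sketch of why a data void works. The “search engine” here is a hypothetical keyword matcher over a handful of made-up pages (using the reserved .example domains), and “zorblat flu” is an invented phrase. Real search ranking is far more elaborate, but the core problem is the same: if only planted pages contain the phrase, only planted pages can come back.

```python
import re

def tokenize(text):
    """Lowercase and split into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search(pages, query):
    """Return URLs of pages containing every query word, most query-word hits first."""
    terms = tokenize(query)
    hits = []
    for url, text in pages.items():
        words = tokenize(text)
        if all(t in words for t in terms):
            hits.append((sum(words.count(t) for t in terms), url))
    return [url for _, url in sorted(hits, reverse=True)]

# Made-up legitimate pages that exist before the campaign.
LEGIT = {
    "health.example/flu": "Seasonal flu vaccines are tested for safety.",
    "news.example/flu": "Flu season arrives early this year.",
}
# Made-up planted pages that use a new, previously unused phrase heavily.
PLANTED = {
    "planted1.example": "The zorblat flu cover-up: what they hide about zorblat flu.",
    "planted2.example": "Zorblat flu truth. Zorblat flu facts. Zorblat flu!",
}

print(search(LEGIT, "zorblat flu"))                 # [] : a data void
print(search({**LEGIT, **PLANTED}, "zorblat flu"))  # only planted pages
```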