Surveillance capitalism

Surveillance capitalism Dorothea Salo

Why am I talking about this? ✦ Well, it does
hit the “privacy” part of the course title, of course. ✦ Fundamentally, though? Because I can’t expect that people come into this course understanding how all this works. It’s important that you do! ✦ This expectation of mine is research-based. Happy to show you the research if you’re curious. ✦ There are very strong societal discourses trying to make a large constellation of privacy violations okay with us. I am NOT okay with that! ✦ You get to decide for yourself—but I at least want you to do so with facts in mind and threat models and other litmus tests to test against.

Lecture will be shortish this week! (Yay?) ✦ You will
spend most class time this week: ✦ working through your own reactions to hypothetical (but real-world-based) situations ✦ testing those reactions against litmus tests I point you to ✦ checking those reactions against real-world situations ✦ No matter where you are on the privacy-vs.-convenience spectrum, I’m asking you to keep an open mind. ✦ I’m also asking you to think not only about yourself, but other people—people you know AND people you don’t—as well as the health of society generally. ✦ What works for you may not work for others.

“Join FaceTwitInstaZon! It’s free!” ✦ It is NOT FREE. You’re
paying with your data. ✦ Data you give the service about yourself (often PII) ✦ Data the service finds out about you by observing your behavior as you interact with it ✦ Data the service finds out about you by observing your behavior elsewhere online, through ad networks and/or other online-behavior surveillance networks ✦ Data the service can INFER (figure out) from your behavior (or, in especially creepy cases, by trying to manipulate your behavior) ✦ Data the service can buy about you from other services, from ISPs and cell-service providers, or from data clearinghouses called “data brokers” (which in turn are happy to buy data about you from services you use…)

“Big Data” ✦ When you hear “Big Data” in the
media, quite often huge conglomerations of Data About Individual People are what they mean. ✦ Just to be perfectly clear: it doesn’t have to be. Weather forecasting uses Big Data, for example, but it’s not data about people. ✦ When you hear “AI” or “machine learning” in the media, they’re talking about analyzing Big Data. ✦ This is often that “inference” step I mentioned. “We know these umpty-bajillion things about Dorothea; if we put them together and compare that to the umpty-bajillion things we know about other people who resemble her, what else can we learn about her?” ✦ Hold this thought. I will definitely come back to it.

Who collects data? ✦ Online advertising relies on data collection.
✦ “Real-time ad bidding:” 1) You visit website. 2) Website figures out who you are and what it already knows about you, and 3) sends that (plus new info about you) to an ad network asking “what ad should I show this person?” ✦ QUITE PERSONAL INFORMATION, e.g. race/ethnicity, sexuality, health status, MAY TRAVEL BETWEEN THE PARTIES HERE. ✦ Online media (including journalism) relies on online advertising as well as other kinds/uses of surveillance. ✦ Social media, mobile apps, and data brokers rely on data collection, exploitation, and (re)sale. ✦ Increasingly, workplaces, schools (K-12 through higher ed), and whole societies are gearing up to surveil their employees/students/citizens! ✦ In some cases, theoretically to “serve them better.” Some, “to justify our existence.” Some, it’s plain old naked authoritarian control.
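
If it helps to see the three real-time ad bidding steps as code, here is a minimal Python sketch. Everything in it is hypothetical: the profile fields, the function names `build_bid_request` and `run_auction`, and the two bidders are invented for illustration and are not any real ad exchange's API. The point it tries to show is that every bidder receives the personal data, including the ones that lose.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """What a (hypothetical) website already knows about a visitor."""
    user_id: str
    interests: list = field(default_factory=list)
    inferred: dict = field(default_factory=dict)  # guesses, e.g. health status


def build_bid_request(profile, page_url):
    """Steps 2-3: bundle known data plus new info about the visitor."""
    return {
        "user_id": profile.user_id,
        "interests": list(profile.interests),
        "inferred": dict(profile.inferred),
        "page_url": page_url,  # new info: what the visitor is reading right now
    }


def run_auction(bid_request, bidders):
    """Ask every bidder for a price; every bidder sees the whole request."""
    bids = [(bidder(bid_request), name) for name, bidder in bidders.items()]
    price, winner = max(bids)  # highest price wins
    return winner, price, len(bids)


# Two made-up advertisers. One pays more when it sees a health inference.
def shoe_ads(request):
    return 0.10


def clinic_ads(request):
    return 2.50 if request["inferred"].get("health") == "diabetes" else 0.05


profile = Profile("u123", ["figure skating"], {"health": "diabetes"})
request = build_bid_request(profile, "https://example.com/news")
winner, price, seen_by = run_auction(
    request, {"shoes": shoe_ads, "clinic": clinic_ads}
)
print(f"{winner} wins at {price}; {seen_by} parties saw the request")
```

Notice that the losing bidder still got a copy of the request. Nothing in this sketch stops it from keeping that copy.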

So all data collection is online? ✦ Thanks for asking
that! … No. ✦ Your physical location over time is heavily tracked. ✦ Yes, this happens substantially through your devices… but also via camera and voice surveillance, license-plate surveillance, credit-card and other ID records, etc. ✦ Interactions you have in the physical world also leave traces in databases. ✦ Non-cash purchases, of course, but also… ✦ … through facial and voice recognition technologies. ✦ “Brick-and-mortar” stores are actively researching how to track you more.

“Wait, don’t they have to tell you first?” ✦ Sort
of, some places (GDPR!) Not in the US. ✦ In the US, this is mostly governed through contract law, which online means those Terms of Service and Privacy Policy things you never read. ✦ Don’t lie; I know you don’t read them. There isn’t enough time in the universe to read them all! Which is part of the problem here! ✦ US law is perfectly happy to let you agree to ToSes and PPs that are terrible for you. You’re an adult, you are supposed to have read it! If you didn’t and you’re hosed, OH WELL. ✦ (US law does have the concept of “contract of adhesion,” which is “somebody had you backed into a corner and forced you to sign something really bad for you.” No ToS or PP has ever, to my knowledge, been successfully challenged as a contract of adhesion.)

“But it’s all anonymized, so no big deal, right?” ✦
Wrong. Given enough data, removal of PII is meaningless. Big Data knows it’s you. ✦ Pet peeve of mine: removal of PII is not “anonymization,” but “DEIDENTIFICATION.” ✦ “ANONYMIZATION” is “ensuring that no one in the data can be identified from it”—and most security researchers believe it to be impossible. ✦ “REIDENTIFICATION” is “determining someone’s identity from deidentified data.” There are several ways to do it, and it’s often not even hard. ✦ Common weasel phrase in ToSes and PPs: “We guard your PII carefully.” ✦ What this actually means: “All the rest of the data is fair game for whatever we want to do with it and whoever we want to sell it to!”
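
One of the several ways to reidentify people is a "linkage attack": match the deidentified records against some other dataset that still has names in it, using fields both datasets share. The Python sketch below is only an illustration under stated assumptions. The function name `reidentify`, the choice of ZIP code, birth date, and sex as the shared fields, and every record in both datasets are invented.

```python
def reidentify(deidentified_rows, named_rows, keys=("zip", "birth_date", "sex")):
    """Attach names to nameless rows by matching on shared fields."""
    # Index the named dataset by its combination of shared fields.
    index = {}
    for row in named_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])

    matches = []
    for row in deidentified_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # exactly one candidate: that row is reidentified
            matches.append((names[0], row))
    return matches


# "Deidentified" health data: the names were removed, and nothing else.
health_records = [
    {"zip": "53703", "birth_date": "1990-04-01", "sex": "F", "diagnosis": "asthma"},
    {"zip": "53706", "birth_date": "1985-11-23", "sex": "M", "diagnosis": "diabetes"},
]

# A separate (made-up) dataset that includes names.
voter_roll = [
    {"name": "Pat Example", "zip": "53703", "birth_date": "1990-04-01", "sex": "F"},
    {"name": "Sam Sample", "zip": "53706", "birth_date": "1985-11-23", "sex": "M"},
    {"name": "Lee Placeholder", "zip": "53711", "birth_date": "1970-01-15", "sex": "F"},
]

reidentified = reidentify(health_records, voter_roll)
for name, row in reidentified:
    print(name, "->", row["diagnosis"])
```

No PII was stored in `health_records`, and both people in it were named anyway. That is why "we removed the PII" is not much of a promise.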

Can you say no? ✦ Yes… and then again, no.
✦ Some (not all) data brokers allow you to opt out of their databases. ✦ They don’t have to make it easy… and they don’t. ✦ This also doesn’t necessarily mean that they don’t collect and analyze your data! ✦ Social media: only if you leave the service ✦ And possibly not even then: see Facebook’s “shadow profiles.” ✦ Ad brokers etc.: you can use ad blockers, but that’s not a total solution (though it helps). ✦ Circumventing ad blockers via such techniques as “browser/device fingerprinting” is a major research-and-development effort.
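
Here is a rough Python sketch of the idea behind browser/device fingerprinting, to show why blocking ads or clearing cookies does not fully stop tracking. It is a simplification under stated assumptions: the attribute names and values are made up, the `fingerprint` function is hypothetical, and real fingerprinting systems reportedly use many more signals and tolerate small changes, which this sketch does not.

```python
import hashlib
import json


def fingerprint(attributes):
    """Turn a set of browser/device details into one short identifier."""
    # Sort the keys so the same details always produce the same string.
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Monday: a visit with made-up browser details.
monday_visit = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1920x1080",
    "timezone": "America/Chicago",
    "fonts": ["Arial", "Comic Sans MS"],
}

# Friday: cookies cleared, but the browser and device are unchanged.
friday_visit = {
    "fonts": ["Arial", "Comic Sans MS"],
    "timezone": "America/Chicago",
    "screen": "1920x1080",
    "user_agent": "ExampleBrowser/1.0",
}

# Somebody else's device.
other_device = dict(monday_visit, screen="1366x768")

same_person = fingerprint(monday_visit) == fingerprint(friday_visit)
different_device = fingerprint(monday_visit) != fingerprint(other_device)
print(same_person, different_device)
```

No cookie and no login is involved anywhere in this sketch. The identifier comes entirely from details the browser reveals on its own.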

No, really, can you say no? ✦ No, because big
companies that track you run too much underlying Internet infrastructure. ✦ Did you know that Amazon makes more money from selling computing capacity than from sales? ✦ Journalist Kashmir Hill investigated this in 2019 in Gizmodo: Facebook, Apple, Microsoft, Google, Amazon. ✦ Hill’s conclusion: even if you WANT to avoid Big Tracking Corps, you pretty much can’t. They run too much infrastructure for others. ✦ (This is something you should tell people who tell you “well, if you don’t like it, just leave!”)

How did we let this happen? ✦ We like (monetarily)
free stuff. Of course we do. ✦ Ignorance… combined with many of the responsible parties concealing truths or even lying to us outright about it ✦ Facebook. FAAAAAAACEBOOOOOOOOOOK. But also Google, Uber, Amazon, Twitter… the lies are legion, frankly. ✦ This kind of privacy threat doesn’t feel real to many people. ✦ Most of us would get creeped out really fast if we noticed someone following us around all day and taking notes on where we go when. ✦ But when our phones do EXACTLY THIS, and cell phone companies (or whoever) capture, store, and sell/share EXACTLY THIS, we… don’t get as creeped out, somehow? Out of sight, out of mind.

Questions? Please ask! This lecture is copyright 2019 by Dorothea
Salo. It is available under a Creative Commons Attribution 4.0 International license.

Uses and abuses of Big Data About People

What are data about you used for? ✦ What they’ll
tell you: ✦ “Ads/content tailored to your interests!” (Not… exactly. Ads/content they believe you will click on, whether it’s good for you or not. People get bamboozled into believing conspiracy theories via “tailoring.”) ✦ “A better experience!” (Whatever that even means.) ✦ What they won’t tell you: ✦ Outright data sale, without your knowledge or consent ✦ Inferring further data (including stuff you’d rather keep private) about you ✦ Manipulating you (especially though not exclusively financially and politically) ✦ Rolling over on you to governments and law-enforcement agencies ✦ Lots of other things! They won’t tell you what they’re doing! (FACEBOOK!)

Inference ✦ Once the computer shows you patterns in Big
Data, you can INFER things about individuals that they didn’t actually tell you. ✦ Real-world inference has included: gender, race/ethnicity, biological-family relationships (including ones unknown to the people involved), romantic relationships, age, country of origin, sexuality, religion, marital status, income, indebtedness, political preferences/beliefs, educational level, (lots of variations on) health status (including mental health), veteran status, pregnancy status… ✦ Computers detect patterns humans can’t. ✦ So we can never know exactly what it is in the data collected about us that will give away something we don’t want known (or don’t want used against us). WE CANNOT KNOW. And at present we have no realistic way to challenge the conclusions or stop the inferencing.
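
A toy version of inference, sketched in Python: guess something a person never disclosed by looking at the people whose known traits most resemble theirs. This is a deliberately tiny stand-in (a nearest-neighbor vote), not a description of how any particular company does it. The function name `infer_attribute`, the traits, and the people are all invented.

```python
from collections import Counter


def infer_attribute(target_traits, known_people, attribute, k=3):
    """Guess an undisclosed attribute from the k most similar known people."""
    def overlap(person):
        # Similarity here is just the number of shared traits.
        return len(target_traits & person["traits"])

    nearest = sorted(known_people, key=overlap, reverse=True)[:k]
    votes = Counter(person[attribute] for person in nearest)
    return votes.most_common(1)[0][0]


# Made-up people who DID reveal their politics somewhere.
known_people = [
    {"traits": {"skating", "knitting", "tea"}, "politics": "A"},
    {"traits": {"skating", "tea", "cats"}, "politics": "A"},
    {"traits": {"football", "grilling"}, "politics": "B"},
    {"traits": {"skating", "grilling"}, "politics": "B"},
    {"traits": {"football", "tea"}, "politics": "B"},
]

# Someone who never told anyone their politics.
target_traits = {"skating", "tea", "cats"}

guess = infer_attribute(target_traits, known_people, "politics")
print("Inferred politics:", guess)
```

The target never supplied a political preference, and the sketch produced one anyway. It also has no way to tell whether the guess is right.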

Take me, for example ✦ I happen to be a
fan of competitive figure skating. ✦ Ladies’, men’s, pairs, ice dance—I enjoy it all. ✦ I don’t think this says anything worrisome about me? But I don’t know what a computer using a Big Data store might infer based on it. ✦ I don’t know! I can’t know! I can’t even guess! ✦ And the Big Data store might get it wrong; they often do. (Many figure-skating fans are Japanese. I’m not. Does Big Data know the difference?) ✦ Maybe figure-skating fans are politically dangerous? Or unhealthy? ✦ So “I don’t have anything to worry about, so I don’t care about data collection” is wrongheaded. ✦ You can’t know what patterns might turn up that implicate you in something dangerous or unfair to you—rightly or wrongly. ✦ You can’t know when the computer gets it wrong.

But how would Big Data know? ✦ Easily. Trivially. ✦
You mostly can’t watch figure skating on broadcast TV any more. That leaves: ✦ Cable or satellite TV services, trackable by the cable/satellite companies, as well as by “smart TVs” ✦ Subscriptions to streaming sports services, traceable through my credit-card purchases, my Internet Service Provider, and my Roku (and my “smart TV” if I had one WHICH I DON’T AND WON’T) ✦ My social media (I do occasionally tweet about competitions I’m watching) ✦ Someday, my travel records (I’ve never gone to a live competition, but it’s on my bucket list to go to Skate America or US Nationals) ✦ Go ahead, think about your hobbies and how trackable they are.

How else can Big Data be used against me? ✦
Deny you opportunity ✦ Facebook patented a system to test your loanworthiness based on analysis of your friend network ✦ Colleges and universities monitor social media during application season. Students have been denied admission over social-media actions. ✦ Deny you services ✦ Health insurers want to kick people who may have expensive health situations off their rolls. They’ll do almost anything (short of violating HIPAA) to find out if you’re in such a situation. ✦ Yes, in the US Big Data can deny you health care! ✦ Get you in legal or reputational trouble ✦ Employers, for example, also want to know if you’re a “health risk” or if you’re liable not to be the perfect employee for some reason.

But I’m just a student… ✦ What if the computer
discovers that students like you shouldn’t be in your major because they fail it a lot? ✦ What if the computer is actually noticing your gender or your race/ethnicity, and the REAL problem is that the department offering that major behaves in racist and sexist ways? Do you trust UW-Madison to handle this correctly? ✦ What if the computer discovers that students who eat that thing you often order at the Union don’t do well on final exams? Or have mental-health issues? ✦ (Such a correlation would almost certainly be spurious! But would that necessarily stop UW-Madison from acting on it?) ✦ Basically, what if the computer thinks You Are Studenting Wrong? ✦ What is your recourse if that’s used against you? Or it’s incorrect? Or the problem isn’t actually your fault, but the university’s?

But it’s okay if it’s the truth, right? ✦ Inferences
especially can be wrong, and often are. ✦ Garbage (data) in, garbage (inferences) out. ✦ Bias in, bias out. (For example, Amazon tried to infer who would be a good hire from analyzing resumes. The result? The computer marked down anything showing an applicant was female, because Amazon’s longstanding gender biases in hiring showed up in the data!) ✦ The data we can collect—even Big Data—can never paint a whole picture. (Term of art: “AVAILABILITY BIAS”—we judge on the data we can easily collect, which is not always the best or most useful data.) ✦ Correlation is not causation, and all the other cautions that come from a good statistics or research-methods course! ✦ Even truths can harm you unfairly. ✦ Ask anyone who’s dealt with discrimination based on a personal characteristic.
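
"Bias in, bias out" can be shown with a very small Python sketch. To be loud about the assumptions: this is not Amazon's system, whose details are not given here. The hiring history, the resume words, and the scoring method (rate each word by how often past resumes containing it were hired) are all invented to illustrate the mechanism.

```python
from collections import defaultdict


def learn_word_scores(history):
    """Score each resume word by the hire rate of past resumes containing it."""
    seen = defaultdict(int)
    hired = defaultdict(int)
    for words, was_hired in history:
        for word in set(words):
            seen[word] += 1
            hired[word] += int(was_hired)
    return {word: hired[word] / seen[word] for word in seen}


def score_resume(words, word_scores):
    """Average the learned scores of the words the model recognizes."""
    known = [word_scores[w] for w in set(words) if w in word_scores]
    return sum(known) / len(known) if known else 0.0


# Made-up history from a company whose past hiring was biased:
# resumes mentioning a women's club were never hired.
history = [
    (["python", "chess_club"], True),
    (["python", "rugby_club"], True),
    (["java", "chess_club"], True),
    (["python", "womens_chess_club"], False),
    (["java", "womens_rugby_club"], False),
]

word_scores = learn_word_scores(history)

# Two new applicants with the same skill and the same hobby.
score_a = score_resume(["python", "chess_club"], word_scores)
score_b = score_resume(["python", "womens_chess_club"], word_scores)
print(round(score_a, 2), round(score_b, 2))
```

Nobody wrote a rule saying "mark down women." The sketch picked that up from the history it was given, which is the whole problem.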

When does personalization become manipulation? ✦ Well, always, really. All
personalization is aimed at manipulating your behavior!!!!!!!! ✦ With product advertising, we understand that and we tend to be okay with it? ✦ Arguably in many situations we shouldn’t be! For example, Facebook tricking children into overspending on Facebook games. ✦ But what about “personalized” education? Or “personalized” news? Or “personalized” politics? ✦ We can end up with a dangerously skewed sense of the world this way… and that can lead us to do dangerously messed-up things. ✦ (I say “us” because no one is immune to this! Not me, not you, not anyone. That’s not how human brains work.)

Big Data About People and individual information security

Wait, I’m less secure because of Big Data? ✦ Yeah.
✦ One big reason: Remember how I said that attackers attack the easy way? And the easy way is often attacking people, not tech? ✦ We’ll talk more about this later. What I want you to remember NOW is that the more an attacker knows about someone, the easier it is to trick that person into trusting or obeying the attacker. ✦ E.g. 2018-19 “sextortion” email scams that rely on citing someone’s already-breached password ✦ Or the “grandparent scam:” hi grandpa, it’s little Billy, I’m in trouble, please send money! (Doesn’t work without knowing who someone’s grandkids are.)

Big Data is hackable. ✦ Any digital data is! ✦
(Same goes for unintentional leaks.) ✦ So not only can information about you be sold to whoever to be used for whatever, it can end up with ruthless bad guys for… ✦ Identity theft (with associated financial/reputational losses) ✦ Conning you into stuff you shouldn’t do ✦ Conning your loved ones into stuff they shouldn’t do, based on their belief that they’re helping you ✦ Manipulating you or your loved ones politically or financially or romantically (romance scams are big just now) ✦ “Research” into you and people like you that you wouldn’t agree to (or with), and that might even endanger you or people like you

How is data collection and analysis at this scale even legal?

That thing where… ✦ … unlike other countries, the United
States doesn’t have broad privacy laws, only narrow “sectoral” laws like HIPAA (health info) and FERPA (student info) and ECPA (email)? ✦ That thing. That’s how this is legal. ✦ Anything that is not actually forbidden is allowed. ✦ And it’s often possible to do an end-run around existing US privacy laws. For example, my doctor can’t legally reveal my health information to you… but a health- or fitness-tracking mobile app can! HIPAA only governs health-care professionals! ✦ (Later on in the course I’ll teach you how to surveil email legally in the US. It’s totally doable. Never trust email to be secure or private!)

Notice and consent ✦ NOTICE AND CONSENT: A common legal
maneuver (particularly for US-based online businesses) in which privacy concerns are considered satisfied if ✦ the business NOTIFIES its customers how their data will be collected and used ✦ and has them CONSENT to it somehow (e.g. via clickthrough). ✦ It doesn’t work; it gives people false trust. ✦ Impenetrable legalese allowed! Vague weasel-wording allowed! Changes without renotification allowed! That is not communication. ✦ Misleading language is allowed in the gaining-consent process. ✦ Research: many people wrongly think that the mere existence of a privacy policy means that the service does not share or sell data. WRONG! A privacy policy can absolutely say “we share and/or sell your data” and most of them do!

The sharing/sale problem ✦ You “consent” to let a service
collect data on you. ✦ Once that service sells your data to Someone Else (which you consented to let it do), Someone Else can do whatever they feel like with it. ✦ They don’t have to get your consent. They don’t even have to tell you! ✦ If that service is ever bought by Someone Else (including via bankruptcy), your original agreement is basically null and void. ✦ Data sharing and sale is ubiquitous. You have no control over it, or over the data sold about you.

“But we don’t sell your data!” ✦ Facebook, Google, others
have said this. ✦ There’s a Spanish phrase “engañar con la verdad.” ✦ To mislead [somebody] with the truth. ✦ One meaning: “We’re not selling your PII (but we are selling everything else).” ✦ Another meaning: “We’re not selling data as such; we analyze it internally and sell the results.” ✦ This is how Facebook operates: “microtargeting” of advertising. ✦ One problem is that the targeting is SO micro- that if an advertiser wants to target to you and already knows stuff about you, they can. ✦ Another problem is that Facebook lets advertisers target based on PII they already have. “Target to the person with this email address… oh, and tell me what else you know about them!” And Facebook does.

Sweetheart deals: “It’s not a breach!” ✦ Sometimes they don’t
sell the data. They give it away to their “special friends.” ✦ This is how the Facebook / Cambridge Analytica thing worked. ✦ This can lead to disingenuous miscommunication: ✦ News media: “GIANT FACEBOOK / CAMBRIDGE ANALYTICA DATA BREACH!” ✦ Facebook: It wasn’t a breach! Cambridge Analytica had permission to see the data. They just, uh, um, well, kind of misused it? ✦ Cambridge Analytica: Facebook never told us we couldn’t. ✦ Entire Facebook-using world: OH, COME ON. ✦ (This is a classic contextual-integrity situation. We felt that our data got misused. Facebook and Cambridge Analytica stuck to the letter of their agreements. Pretty much nobody felt that made this okay.)

What about data ownership? ✦ Well, if all these entities
are making money collecting and analyzing your data, maybe the way to make this fair is to make clear legally that it’s YOUR data and YOU OWN IT, right? ✦ So those companies should be paying you to use it! ✦ Sounds good in theory, but in practice… ✦ It doesn’t solve the notice-and-consent problem. (You can still sign over the right to use “your” data, for free even, via a crappy Privacy Policy you didn’t even read.) ✦ It doesn’t really give you rights over how your data get (ab)used. ✦ It doesn’t stop anyone from inferring data about you based on data from other people. And is “inferred” data even yours? ✦ It doesn’t prevent most unfair uses of data about you.

Don’t these people have any ethics? ✦ Often they don’t.
Or they care about money more than they care about you, or about us. ✦ Or they believe total garbage like “technology is neutral” or “it’s just a tool (so the implications aren’t my problem)” or “anything not actually illegal is obviously okay.” ✦ (If you believe any of these things, PLEASE STOP. Learn better.) ✦ There’s serious Fear of Missing Out (“FOMO”) around Big Data collection and analysis… even in non-profit sectors like public education. ✦ Yes, even in libraries. This DEVASTATES me as a librarian. I was taught to do better! But it’s true.
