hit the “privacy” part of the course title, of course. ✦ Fundamentally, though? Because I can’t expect that people come into this course understanding how all this works. It’s important that you do! ✦ This expectation of mine is research-based. Happy to show you the research if you’re curious. ✦ There are very strong societal discourses trying to make a large constellation of privacy violations okay with us. I am NOT okay with that! ✦ You get to decide for yourself—but I at least want you to do so with facts in mind, and with threat models and other litmus tests to measure your reactions against.
spend most class time this week: ✦ working through your own reactions to hypothetical (but real-world-based) situations ✦ testing those reactions against litmus tests I point you to ✦ checking those reactions against real-world situations ✦ No matter where you are on the privacy-vs.-convenience spectrum, I’m asking you to keep an open mind. ✦ I’m also asking you to think not only about yourself, but also about other people—people you know AND people you don’t—as well as the health of society generally. ✦ What works for you may not work for others.
paying with your data. ✦ Data you give the service about yourself (often PII) ✦ Data the service finds out about you by observing your behavior as you interact with it ✦ Data the service finds out about you by observing your behavior elsewhere online, through ad networks and/or other online-behavior surveillance networks ✦ Data the service can INFER (figure out) from your behavior (or, in especially creepy cases, by trying to manipulate your behavior) ✦ Data the service can buy about you from other services, from ISPs and cell-service providers, or from data clearinghouses called “data brokers” (which in turn are happy to buy data about you from services you use…)
media, quite often huge conglomerations of Data About Individual People are what they mean. ✦ Just to be perfectly clear: it doesn’t have to be. Weather forecasting uses Big Data, for example, but it’s not data about people. ✦ When you hear “AI” or “machine learning” in the media, they’re usually talking about analyzing Big Data. ✦ This is often that “inference” step I mentioned. “We know these umpty-bajillion things about Dorothea; if we put them together and compare that to the umpty-bajillion things we know about other people who resemble her, what else can we learn about her?” ✦ Hold this thought. I will definitely come back to it.
✦ “Real-time ad bidding:” 1) You visit website. 2) Website figures out who you are and what it already knows about you, and 3) sends that (plus new info about you) to an ad network asking “what ad should I show this person?” ✦ QUITE PERSONAL INFORMATION, e.g. race/ethnicity, sexuality, health status, MAY TRAVEL BETWEEN THE PARTIES HERE. ✦ Online media (including journalism) relies on online advertising as well as other kinds/uses of surveillance. ✦ Social media, mobile apps, and data brokers rely on data collection, exploitation, and (re)sale. ✦ Increasingly, workplaces, schools (K-12 through higher ed), and whole societies are gearing up to surveil their employees/students/citizens! ✦ In some cases, theoretically to “serve them better.” Some, “to justify our existence.” Some, it’s plain old naked authoritarian control.
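The bidding flow above can be sketched in a few lines of Python. Everything here is a toy: the field names are loosely modeled on OpenRTB-style bid requests, and the “bidders” are made up—this is the shape of the data flow, not any real exchange’s API.

```python
# Toy sketch of real-time ad bidding. Field names and bidder logic are
# illustrative (loosely OpenRTB-shaped), not a real ad exchange's API.

def build_bid_request(user_profile, page_url):
    """The ad exchange bundles what it knows about the visitor into a
    bid request broadcast to many bidders at once."""
    return {
        "site": {"page": page_url},
        "user": {
            "id": user_profile["tracking_id"],
            "interests": user_profile["interests"],  # observed behavior
            "inferred": user_profile["inferred"],    # e.g. health status
        },
    }

def run_auction(bid_request, bidders):
    """Every bidder sees the request (and its personal data!); the
    highest bid wins the ad slot."""
    bids = [(bidder(bid_request), name) for name, bidder in bidders.items()]
    price, winner = max(bids)
    return winner, price

# Two toy bidders: one pays more when it sees a sensitive inferred trait.
bidders = {
    "generic_ads": lambda req: 0.10,
    "pharma_ads": lambda req: (0.80 if "chronic_illness"
                               in req["user"]["inferred"] else 0.05),
}

profile = {
    "tracking_id": "abc123",
    "interests": ["figure skating"],
    "inferred": ["chronic_illness"],  # inferred, never volunteered!
}

winner, price = run_auction(build_bid_request(profile, "example.com/news"),
                            bidders)
```

Notice that every bidder received the profile, even the losing one—the personal data travels to all parties whether or not they win the auction.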
that! … No. ✦ Your physical location over time is heavily tracked. ✦ Yes, this happens substantially through your devices… but also via camera and voice surveillance, license-plate surveillance, credit-card and other ID records, etc. ✦ Interactions you have in the physical world also leave traces in databases. ✦ Non-cash purchases, of course, but also… ✦ … through facial and voice recognition technologies. ✦ “Brick-and-mortar” stores are actively researching how to track you more.
Wrong. Given enough data, removal of PII is meaningless. Big Data knows it’s you. ✦ Pet peeve of mine: removal of PII is not “anonymization,” but “DEIDENTIFICATION.” ✦ “ANONYMIZATION” is “ensuring that no one in the data can be identified from it”—and most security researchers believe it to be impossible. ✦ “REIDENTIFICATION” is “determining someone’s identity from deidentified data.” There are several ways to do it, and it’s often not even hard. ✦ Common weasel phrase in ToSes and PPs: “We guard your PII carefully.” ✦ What this actually means: “All the rest of the data is fair game for whatever we want to do with it and whoever we want to sell it to!”
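Here’s how one classic reidentification technique—the “linkage attack” Latanya Sweeney famously used to reidentify a governor’s medical records with a voter roll—works in miniature. The data below is invented; the join logic is the real technique.

```python
# Toy linkage attack: a deidentified dataset (names stripped) joined
# against a public record on quasi-identifiers (zip, birthdate, sex).

deidentified_health = [
    {"zip": "53703", "birthdate": "1980-02-29", "sex": "F",
     "diagnosis": "asthma"},
    {"zip": "53705", "birthdate": "1990-07-04", "sex": "M",
     "diagnosis": "diabetes"},
]

public_voter_roll = [
    {"name": "Pat Smith", "zip": "53703", "birthdate": "1980-02-29",
     "sex": "F"},
    {"name": "Lee Jones", "zip": "53711", "birthdate": "1985-01-01",
     "sex": "M"},
]

def reidentify(deidentified, public):
    """Join on (zip, birthdate, sex). When that combination is unique
    in the population -- as it is for most Americans -- the 'anonymous'
    record gets its name back."""
    matches = []
    for record in deidentified:
        key = (record["zip"], record["birthdate"], record["sex"])
        for person in public:
            if (person["zip"], person["birthdate"], person["sex"]) == key:
                matches.append({"name": person["name"],
                                "diagnosis": record["diagnosis"]})
    return matches

result = reidentify(deidentified_health, public_voter_roll)
# One record reidentified: that person's diagnosis is no longer private.
```

No PII was in the “deidentified” dataset at all—the quasi-identifiers did the work.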
✦ Some (not all) data brokers allow you to opt out of their databases. ✦ They don’t have to make it easy… and they don’t. ✦ This also doesn’t necessarily mean that they don’t collect and analyze your data! ✦ Social media: only if you leave the service ✦ And possibly not even then: see Facebook’s “shadow profiles.” ✦ Ad brokers etc.: you can use ad blockers, but that’s not a total solution (though it helps). ✦ Circumventing ad blockers via such techniques as “browser/device fingerprinting” is a major research-and-development effort.
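To make fingerprinting concrete, here’s a minimal sketch of the idea: hash together many individually boring browser attributes to get a stable identifier that survives cookie deletion. The attribute set is a simplification I chose for illustration—real fingerprinters use dozens more signals (canvas rendering, installed fonts, audio stack, and so on).

```python
# Minimal sketch of browser/device fingerprinting: no cookie needed.
import hashlib

def fingerprint(attrs):
    """Serialize attributes in a fixed order and hash them; the digest
    is a stable pseudo-identifier for this browser configuration."""
    canonical = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

browser = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0",
    "screen": "2560x1440",
    "timezone": "America/Chicago",
    "language": "en-US",
    "fonts": "Arial,DejaVu Sans,Noto",
}

fp_before = fingerprint(browser)
# "Clearing cookies" changes nothing: same attributes, same digest.
fp_again = fingerprint(dict(browser))
# But change any one attribute and the fingerprint changes completely:
browser["timezone"] = "America/New_York"
fp_after = fingerprint(browser)
```

The privacy paradox here: making your browser *more* unusual (odd fonts, rare settings) makes you *easier* to fingerprint, not harder.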
companies that track you run too much underlying Internet infrastructure. ✦ Did you know that Amazon makes more profit from selling computing capacity (AWS) than from retail sales? ✦ Journalist Kashmir Hill investigated this in 2019 for Gizmodo, trying to cut Facebook, Apple, Microsoft, Google, and Amazon out of her life one at a time. ✦ Hill’s conclusion: even if you WANT to avoid Big Tracking Corps, you pretty much can’t. They run too much infrastructure for others. ✦ (This is something you should tell people who tell you “well, if you don’t like it, just leave!”)
free stuff. Of course we do. ✦ Ignorance… combined with many of the responsible parties concealing truths or even lying to us outright about it ✦ Facebook. FAAAAAAACEBOOOOOOOOOOK. But also Google, Uber, Amazon, Twitter… the lies are legion, frankly. ✦ This kind of privacy threat doesn’t feel real to many people. ✦ Most of us would get creeped out really fast if we noticed someone following us around all day and taking notes on where we go when. ✦ But when our phones do EXACTLY THIS, and cell phone companies (or whoever) capture, store, and sell/share EXACTLY THIS, we… don’t get as creeped out, somehow? Out of sight, out of mind.
tell you: ✦ “Ads/content tailored to your interests!” (Not… exactly. Ads/content they believe you will click on, whether it’s good for you or not. People get bamboozled into believing conspiracy theories via “tailoring.”) ✦ “A better experience!” (Whatever that even means.) ✦ What they won’t tell you: ✦ Outright data sale, without your knowledge or consent ✦ Inferring further data (including stuff you’d rather keep private) about you ✦ Manipulating you (especially though not exclusively financially and politically) ✦ Rolling over on you to governments and law-enforcement agencies ✦ Lots of other things! They won’t tell you what they’re doing! (FACEBOOK!)
Data, you can INFER things about individuals that they didn’t actually tell you. ✦ Real-world inference has included: gender, race/ethnicity, biological-family relationships (including ones unknown to the people involved), romantic relationships, age, country of origin, sexuality, religion, marital status, income, indebtedness, political preferences/beliefs, educational level, (lots of variations on) health status (including mental health), veteran status, pregnancy status… ✦ Computers detect patterns humans can’t. ✦ So we can never know exactly what it is in the data collected about us that will give away something we don’t want known (or don’t want used against us). WE CANNOT KNOW. And at present we have no realistic way to challenge the conclusions or stop the inferencing.
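The core of attribute inference is simpler than it sounds: guess an undisclosed trait from the traits disclosed by *similar* people. This toy sketch (invented data, a crude nearest-neighbor vote instead of real machine learning) shows the logic—echoing the much-reported retail cases of inferring pregnancy from shopping baskets.

```python
# Toy attribute inference: infer what someone never disclosed from
# people who resemble them. Data invented; real systems use millions
# of profiles and ML models, but the logic is the same.

known_people = [
    {"likes": {"unscented_lotion", "vitamin_supplements"}, "pregnant": True},
    {"likes": {"unscented_lotion", "cotton_balls"},        "pregnant": True},
    {"likes": {"video_games", "energy_drinks"},            "pregnant": False},
]

def infer(target_likes, population, trait):
    """Find the known profile with the most overlapping 'likes' and
    borrow its trait. The target never disclosed it -- the crowd of
    lookalikes did it for them."""
    def overlap(person):
        return len(person["likes"] & target_likes)
    nearest = max(population, key=overlap)
    return nearest[trait]

# This shopper said nothing at all about pregnancy...
guess = infer({"unscented_lotion", "cotton_balls", "tea"},
              known_people, "pregnant")
```

And of course the guess can be *wrong*—the system acts on it either way, which is the next slide’s point.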
fan of competitive figure skating. ✦ Ladies’, men’s, pairs, ice dance—I enjoy it all. ✦ I don’t think this says anything worrisome about me? But I don’t know what a computer using a Big Data store might infer based on it. ✦ I don’t know! I can’t know! I can’t even guess! ✦ And the Big Data store might get it wrong; they often do. (Many figure-skating fans are Japanese. I’m not. Does Big Data know the difference?) ✦ Maybe figure-skating fans are politically dangerous? Or unhealthy? ✦ So “I don’t have anything to worry about, so I don’t care about data collection” is wrongheaded. ✦ You can’t know what patterns might turn up that implicate you in something dangerous or unfair to you—rightly or wrongly. ✦ You can’t know when the computer gets it wrong.
You mostly can’t watch figure skating on broadcast TV any more. That leaves: ✦ Cable or satellite TV services, trackable by the cable/satellite companies, as well as by “smart TVs” ✦ Subscriptions to streaming sports services, traceable through my credit-card purchases, my Internet Service Provider, and my Roku (and my “smart TV” if I had one WHICH I DON’T AND WON’T) ✦ My social media (I do occasionally tweet about competitions I’m watching) ✦ Someday, my travel records (I’ve never gone to a live competition, but it’s on my bucket list to go to Skate America or US Nationals) ✦ Go ahead, think about your hobbies and how trackable they are.
Deny you opportunity ✦ Facebook patented a system to test your loanworthiness based on analysis of your friend network ✦ Colleges and universities monitor social media during application season. Students have been denied admission over social-media actions. ✦ Deny you services ✦ Health insurers want to kick people who may have expensive health situations off their rolls. They’ll do almost anything (short of violating HIPAA) to find out if you’re in such a situation. ✦ Yes, in the US Big Data can deny you health care! ✦ Get you in legal or reputational trouble ✦ Employers, for example, also want to know if you’re a “health risk” or if you’re liable not to be the perfect employee for some reason.
discovers that students like you shouldn’t be in your major because they fail it a lot? ✦ What if the computer is actually noticing your gender or your race/ethnicity, and the REAL problem is that the department offering that major behaves in racist and sexist ways? Do you trust UW-Madison to handle this correctly? ✦ What if the computer discovers that students who eat that thing you often order at the Union don’t do well on final exams? Or have mental-health issues? ✦ (Such a correlation would almost certainly be spurious! But would that necessarily stop UW-Madison from acting on it?) ✦ Basically, what if the computer thinks You Are Studenting Wrong? ✦ What is your recourse if that’s used against you? Or it’s incorrect? Or the problem isn’t actually your fault, but the university’s?
especially can be wrong, and often are. ✦ Garbage (data) in, garbage (inferences) out. ✦ Bias in, bias out. (For example, Amazon tried to infer who would be a good hire from analyzing resumes. The result? The computer marked down anything showing an applicant was female, because Amazon’s longstanding gender biases in hiring showed up in the data!) ✦ The data we can collect—even Big Data—can never paint a whole picture. (Term of art: “AVAILABILITY BIAS”—we judge on the data we can easily collect, which is not always the best or most useful data.) ✦ Correlation is not causation, and all the other cautions that come from a good statistics or research-methods course! ✦ Even truths can harm you unfairly. ✦ Ask anyone who’s dealt with discrimination based on a personal characteristic.
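“Bias in, bias out” is easy to demonstrate. This toy sketch is loosely modeled on the reported Amazon resume-screening failure: the data and the scoring method are mine (a crude word-count scorer, nowhere near a real ML pipeline), but the mechanism is faithful—nothing in the code “knows” about gender, yet the bias arrives anyway through the historical labels.

```python
# Toy "bias in, bias out" demo: a scorer trained on biased hiring
# history reproduces the bias, with no gender field anywhere in sight.

past_resumes = [
    (["chess", "captain"],           1),  # hired
    (["software", "captain"],        1),  # hired
    (["womens", "chess", "captain"], 0),  # rejected (biased history!)
    (["womens", "software"],         0),  # rejected (biased history!)
]

def train(data):
    """Weight each word by how often it appears in hired vs. rejected
    resumes. The bias enters entirely through the labels."""
    scores = {}
    for words, hired in data:
        for w in words:
            scores[w] = scores.get(w, 0) + (1 if hired else -1)
    return scores

def score_resume(words, scores):
    return sum(scores.get(w, 0) for w in words)

scores = train(past_resumes)
# "womens" (as in "women's chess club captain") ends up with a negative
# weight: two otherwise-identical resumes now score differently.
```

This is exactly the reported failure mode: the model learned to penalize the word “women’s” because past (biased) hiring outcomes had.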
personalization is aimed at manipulating your behavior! ✦ With product advertising, we understand that and we tend to be okay with it? ✦ Arguably in many situations we shouldn’t be! For example, Facebook tricking children into overspending on Facebook games. ✦ But what about “personalized” education? Or “personalized” news? Or “personalized” politics? ✦ We can end up with a dangerously skewed sense of the world this way… and that can lead us to do dangerously messed-up things. ✦ (I say “us” because no one is immune to this! Not me, not you, not anyone. That’s not how human brains work.)
✦ One big reason: Remember how I said that attackers attack the easy way? And the easy way is often attacking people, not tech? ✦ We’ll talk more about this later. What I want you to remember NOW is that the more an attacker knows about someone, the easier it is to trick that person into trusting or obeying the attacker. ✦ E.g. 2018-19 “sextortion” email scams that rely on citing someone’s already-breached password ✦ Or the “grandparent scam:” hi grandpa, it’s little Billy, I’m in trouble, please send money! (Doesn’t work without knowing who someone’s grandkids are.)
(Same goes for unintentional leaks.) ✦ So not only can information about you be sold to whoever to be used for whatever, it can end up with ruthless bad guys for… ✦ Identity theft (with associated financial/reputational losses) ✦ Conning you into stuff you shouldn’t do ✦ Conning your loved ones into stuff they shouldn’t do, based on their belief that they’re helping you ✦ Manipulating you or your loved ones politically or financially or romantically (romance scams are big just now) ✦ “Research” into you and people like you that you wouldn’t agree to (or with), and that might even endanger you or people like you
States doesn’t have broad privacy laws, only narrow “sectoral” laws like HIPAA (health info) and FERPA (student info) and ECPA (email)? ✦ That thing. That’s how this is legal. ✦ Anything that is not actually forbidden is allowed. ✦ And it’s often possible to do an end-run around existing US privacy laws. For example, my doctor can’t legally reveal my health information to you… but a health- or fitness-tracking mobile app can! HIPAA only governs health-care professionals! ✦ (Later on in the course I’ll teach you how to surveil email legally in the US. It’s totally doable. Never trust email to be secure or private!)
collect data on you. ✦ Once that service sells your data to Someone Else (which you consented to let it do), Someone Else can do whatever they feel like with it. ✦ They don’t have to get your consent. They don’t even have to tell you! ✦ If that service is ever bought by Someone Else (including via bankruptcy), your original agreement is basically null and void. ✦ Data sharing and sale are ubiquitous. You have no control over them, or over the data sold about you.
have said this. ✦ There’s a Spanish phrase “engañar con la verdad.” ✦ To mislead [somebody] with the truth. ✦ One meaning: “We’re not selling your PII (but we are selling everything else).” ✦ Another meaning: “We’re not selling data as such; we analyze it internally and sell the results.” ✦ This is how Facebook operates: “microtargeting” of advertising. ✦ One problem is that the targeting is SO micro that if an advertiser wants to target to you and already knows stuff about you, they can. ✦ Another problem is that Facebook lets advertisers target based on PII they already have. “Target to the person with this email address… oh, and tell me what else you know about them!” And Facebook does.
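That last trick—targeting people by PII the advertiser already holds—is how “custom audience” matching works on platforms like Facebook: the advertiser uploads hashed email addresses, and the platform hashes its own user list the same way and joins. Here’s a minimal sketch (invented names and emails; normalize-then-SHA-256 is the standard recipe for such uploads):

```python
# Toy sketch of "custom audience" matching via hashed emails.
# Hashing is NOT anonymization here: the platform can hash its own
# user list identically and join the two.
import hashlib

def hash_email(email):
    """Normalize (trim, lowercase) and SHA-256 the address."""
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()

# The advertiser uploads hashes of emails it already has:
advertiser_list = ["Pat.Smith@example.com", "lee@example.com"]

# The platform holds the same hashes for its own users:
platform_users = {hash_email(e): name for e, name in
                  [("pat.smith@example.com", "Pat Smith"),
                   ("dana@example.com", "Dana Doe")]}

matched = [platform_users[h] for h in map(hash_email, advertiser_list)
           if h in platform_users]
# The advertiser's PII lets the platform target that exact person --
# no "anonymity" was ever really in play.
```

Note the asymmetry: the hash hides nothing from anyone who already has the email address, which is precisely the party doing the matching.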
sell the data. They give it away to their “special friends.” ✦ This is how the Facebook / Cambridge Analytica thing worked. ✦ This can lead to disingenuous miscommunication: ✦ News media: “GIANT FACEBOOK / CAMBRIDGE ANALYTICA DATA BREACH!” ✦ Facebook: It wasn’t a breach! Cambridge Analytica had permission to see the data. They just, uh, um, well, kind of misused it? ✦ Cambridge Analytica: Facebook never told us we couldn’t. ✦ Entire Facebook-using world: OH, COME ON. ✦ (This is a classic contextual-integrity situation. We felt that our data got misused. Facebook and Cambridge Analytica stuck to the letter of their agreements. Pretty much nobody felt that made this okay.)
Or they care about money more than they care about you, or about us. ✦ Or they believe total garbage like “technology is neutral” or “it’s just a tool (so the implications aren’t my problem)” or “anything not actually illegal is obviously okay.” ✦ (If you believe any of these things, PLEASE STOP. Learn better.) ✦ There’s serious Fear of Missing Out (“FOMO”) around Big Data collection and analysis… even in non-profit sectors like public education. ✦ Yes, even in libraries. This DEVASTATES me as a librarian. I was taught to do better! But it’s true.