The surveillance industrial complex: tracking, ML/AI, Big Data

For Computer Science 202 "Introduction to Computing," summer 2021. Includes an extended discussion of learning analytics and Canvas tracking.

Dorothea Salo

July 21, 2021

Transcript

  1. The surveillance industrial complex: tracking, ML/AI, Big Data CS 202

  2. Knowledge is power! • I actually hate that cliché, being

    in the knowledge business. Nothing’s that simple. • But I do want to say out loud that knowledge about you translates into power over you. • And the other way around, too! It is NOT COINCIDENCE that areas with a lot of people of color in them are heavily oversurveilled in the US! It’s a power play! • It’s also not coincidence that you students are more heavily surveilled by UW-Madison than I am as an employee! • UW-Madison is more scared of me than it is of you, even if only a little bit. I have certain protections enshrined in campus policy. You… don’t have nearly as many.
  3. AI/ML, in a nutshell • AI: “artificial intelligence.”

    An unrealistic pipe dream. • Even humans don’t know exactly how humans think. So we’re gonna train a computer to do it? Yeah… no, not really. It’s been tried; it’s always failed. • In certain sharply limited situations, can be made to work. Kind of. • ML: “machine learning.” A set of computational and mathematical/statistical techniques that enable computers to find (“model”), report, and act on patterns in the data used to train them… and sometimes (only sometimes!!!) similar data to the training data. • Limitations in the training data = limitations in pattern detection capacity • Bias in the training data = bias in the model • Chose a wrong technique to use with your data? Garbage in, garbage out. • They’re computers, not magic wands!
  4. With that in mind… • I’m going to toss y’all

    into breakout rooms again. • In your rooms, make two lists: • One list of information about students’ homework practices that you think it’s reasonable for your instructors (good ones AND bad ones) to know. • One list of information about students’ homework practices that you think is just plain none of our business. • “Homework practices” include (but are not limited to): • When and how long you do homework • Where you are (in realspace) and whom you’re with when you do homework • What you read and do online (e.g. your web searches, clicks, browser URLs, online library visits, etc) while you’re working on homework • What you read and do offline (e.g. reading print books, highlighting, writing things on paper, going to the library) while you’re working on homework • Mistakes you make and then correct on your homework • NO RIGHT/WRONG ANSWERS! Just what you think.
  5. Jargon, first: PII • Personally Identifiable Information.

    Anything that uniquely (or close to…) identifies you, such as: • Your name • ID number (SSN, driver’s license number, student ID number, passport number, etc) • Sensitive (combinations of) personal information are sometimes included, such as: • race/ethnicity, gender, birth date • In the US, PII tends to be more protected, legally, than other kinds of information about you (for example, your web-browsing habits).
  6. “Join FaceTwitInstaZon! It’s free!” • It is NOT FREE. You’re

    paying with your data, the analysis and sale of which can and do make money. • Data you give the service about yourself (often PII) • Data the service finds out about you by observing your behavior as you interact with it • Data the service finds out about you by observing your behavior elsewhere online, through ad networks and/or other online-behavior surveillance networks • Data the service can infer (figure out) from your behavior (or, in especially creepy cases, by trying to manipulate your behavior) • Data the service can buy about you from other services, from ISPs and cell-service providers, or from data clearinghouses called data brokers (which in turn are happy to buy data about you from services you use…)
  7. “Big Data” • When you hear “Big Data” in the

    media, usually they mean huge aggregations of Data About Individual People. • Just to be perfectly clear: it doesn’t HAVE to be. Weather forecasting uses Big Data, for example, but it’s not data about people. • So do physics, biomedicine, economic modeling/forecasting… • When you hear “AI” or “machine learning” in the media, though, they’re usually talking about analyzing Big Data About People with computers to look for patterns. • Computers are quite a bit better at sifting through large piles of data looking for patterns than human beings are. • What they can’t do is figure out why the pattern is the way it is (as we saw with patterns of bias) or the implications of acting on the pattern. • One variety of analysis for Big Data about people is often called “inference.”
  8. Inference • Once a computer finds patterns in Big

    Data about people, you can infer things about the people that they didn’t actually tell you and might not even want you knowing. • Real-world inference based on data collected online has included: gender, race/ethnicity, biological-family relationships (including ones unknown to the people involved), romantic relationships, age, country of origin, sexuality, religion, marital status, income, debt level, political preferences/beliefs, educational level, (lots of variations on) health status (including mental health), veteran status, pregnancy status, gullibility… • Again, computers detect patterns humans can’t. • So we usually can’t know exactly what it is in the data collected about us that will give away something we don’t want known (or don’t want used against us). WE CANNOT KNOW. And at present we have no realistic way to challenge the conclusions or stop the inferencing.
  9. Who collects Big Data? • Online advertising relies on data

    collection. • “Real-time ad bidding:” 1) You visit website. 2) Website figures out who you are and what it already knows about you, and 3) sends that (plus new info about you) to an ad network asking “what ad should I show this person?” • QUITE PERSONAL INFORMATION, e.g. race/ethnicity, sexuality, health status, MAY TRAVEL BETWEEN THE PARTIES HERE. • Online media (including journalism) relies on online advertising as well as other kinds/uses of surveillance. • Social media, mobile apps, and companies called “data brokers” rely on data collection, analysis, exploitation, and (re)sale. • Increasingly, workplaces, schools (K-12 through higher ed), and whole societies are surveilling their employees/students/citizens! • In some cases, theoretically to “serve them better.” Some, “to justify our existence.” Some, it’s plain old naked authoritarian control.
  10. “So Big Data is all online?” • Thanks for asking

    that! … No. • Your physical location over time is heavily tracked. • This happens substantially through your phone… but also via camera and voice surveillance, license-plate surveillance, credit-card and other ID records, etc. • Interactions you have in the physical world also leave traces in databases. • Non-cash purchases, of course, but also… • … through facial, gait, and voice recognition technologies. • “Brick-and-mortar” stores are actively researching how to track you more.
  11. Device identifiers • All your devices have them.

    • Network-card identifiers: MAC addresses (though these are less useful for surveillance than formerly) • Advertising identifiers on mobile devices, specifically for ad tracking but used in many other surveillance contexts as well • Phone numbers • Serial numbers • Since most devices are one-owner, if they’ve got a device identifier, THEY PRETTY MUCH KNOW IT’S YOU. • Weasel trick: don’t collect the stuff that counts as PII, but DO collect device identifiers. • July 2021: revelation of the existence of companies that exist to match device identifiers to their owners’ identity • Fingerprinting: combos of settings that make your device (or web browser) unique
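To make the fingerprinting bullet concrete, here is a minimal Python sketch. It is an illustration only, not any real tracker's code: the settings, their values, and the `fingerprint` function are all made up, and real fingerprinting scripts collect many more signals than this.

```python
import hashlib

def fingerprint(settings: dict) -> str:
    """Combine device/browser settings into one short, stable identifier."""
    # Sort the keys so the same settings always produce the same string.
    canonical = "|".join(f"{key}={settings[key]}" for key in sorted(settings))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical visitors. Note: no name, no ID number, nothing that counts as PII.
visitor_a = {
    "screen": "1920x1080",
    "timezone": "America/Chicago",
    "language": "en-US",
    "fonts": "Arial,Calibri,Comic Sans MS",
}
visitor_b = {
    "screen": "1920x1080",
    "timezone": "America/Chicago",
    "language": "en-US",
    "fonts": "Arial,Calibri",
}
returning_a = dict(visitor_a)  # same device, coming back a week later

print(fingerprint(visitor_a) == fingerprint(returning_a))  # True: recognized
print(fingerprint(visitor_a) == fingerprint(visitor_b))    # False: told apart
```

No single setting here is unusual. It is the combination that singles a device out, which is why collecting "just settings" still amounts to tracking.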
  12. Stingrays • Fake cell phone towers • Cell phones have

    to connect to towers to work at all. • Exist to collect phone identifiers and locations, undetectably to phone users • In use by many US local, state, and federal law-enforcement agencies • Stores and some college/university campuses considering using these (and similar follow-you-via-your-phone tech).
  13. “Wait, don’t they have to tell you first?” •

    Sort of, some places. Not in the US. • In the US, this is mostly governed through contract law, which online means those Terms of Service and Privacy Policy things you never read. • Don’t lie; I know you don’t read them. There isn’t enough time in the universe to read them all! Which is part of the problem here! • US law is perfectly happy to let you agree to ToSes and PPs that are terrible for you. You’re an adult, you are supposed to have read it! If you didn’t and you’re hosed, OH WELL.
  14. Notice and consent • Notice and consent: A common legal

    maneuver (particularly for US-based online businesses) in which privacy concerns are considered satisfied if • the business notifies its customers how their data will be collected and used • and has them consent to it somehow (e.g. via clickthrough). • It doesn’t work; gives people false trust. • Impenetrable legalese allowed! Vague weasel-wording allowed! Changes without renotification allowed! That is not communication. • Misleading language is allowed in the gaining-consent process. • Research: many people wrongly think that the mere existence of a privacy policy means that the service does not share or sell data. WRONG! A privacy policy can absolutely say “we share and/or sell your data” and most of them do!
  15. “But it’s all anonymized, so no big deal, right?” •

    Wrong. Given enough data, removal of PII is meaningless. Big Data knows it’s you. • Remember those device identifiers? Yeah. Those. They don’t count as PII, legally. • Pet peeve of mine: removal of PII is not “anonymization,” but deidentification. • Anonymization is “ensuring that no one in the data can be identified from it,” and most security researchers believe it to be impossible. • Reidentification is “determining someone’s identity from deidentified data.” There are several ways to do it, and it’s often not even hard. • Common weasel phrase in ToSes and PPs: “We guard your PII carefully.” • What this actually means: “All the rest of the data is fair game for whatever we want to do with it and whoever we want to sell it to!”
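One of the "several ways" to reidentify people is simply matching leftover details against another list that still has names on it. A minimal Python sketch, assuming entirely made-up people and records; the field names and the `reidentify` function are hypothetical, not taken from any real dataset or tool.

```python
# Hypothetical "deidentified" records: names removed, other details kept.
deidentified = [
    {"zip": "53703", "birth_date": "2001-04-12", "gender": "F", "diagnosis": "asthma"},
    {"zip": "53706", "birth_date": "2000-11-30", "gender": "M", "diagnosis": "anxiety"},
]

# Hypothetical second list that still has names (a purchased marketing list, say).
named_list = [
    {"name": "Alex Example", "zip": "53706", "birth_date": "2000-11-30", "gender": "M"},
    {"name": "Sam Sample", "zip": "53703", "birth_date": "2001-04-12", "gender": "F"},
    {"name": "Pat Placeholder", "zip": "53715", "birth_date": "1999-01-05", "gender": "F"},
]

# Details that are not names or ID numbers but narrow things down fast.
MATCH_FIELDS = ("zip", "birth_date", "gender")

def reidentify(records, named):
    """Attach a name to each record whose details match exactly one named person."""
    names_by_details = {}
    for person in named:
        key = tuple(person[field] for field in MATCH_FIELDS)
        names_by_details.setdefault(key, []).append(person["name"])
    matches = []
    for record in records:
        key = tuple(record[field] for field in MATCH_FIELDS)
        candidates = names_by_details.get(key, [])
        if len(candidates) == 1:  # only one person fits: identity recovered
            matches.append((candidates[0], record["diagnosis"]))
    return matches

print(reidentify(deidentified, named_list))
# [('Sam Sample', 'asthma'), ('Alex Example', 'anxiety')]
```

Nothing in the "deidentified" list was a name or an ID number, and both people were still found.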
  16. What are data about you used for? • What they’ll

    tell you: • Personalization: “Ads/content tailored to your interests!” (Not… exactly. Ads/content they believe you will click on, whether it’s good for you or not. People get bamboozled into believing conspiracy theories via “tailoring.”) • “A better experience!” (Whatever that even means.) • What they won’t tell you: • Outright data sale, without your knowledge or consent • Inferring further data (including stuff you’d rather keep private) about you • Manipulating you (especially though not exclusively financially and politically) • Making important decisions about you (loans, insurance, college admission, jobs) • Rolling over on you to government, law enforcement, and other authorities • Lots of other things! They won’t tell you what they’re doing! (FACEBOOK!)
  17. When does personalization become manipulation? • Well, always, really. All

    personalization is aimed at manipulating your behavior!!!!!!!! • Social media, for example, uses personalization to manipulate you into staying longer. Even when that’s not good for you or anyone else. • With product advertising, we understand that and we tend to be okay with it? • Arguably in many situations we shouldn’t be! For example, Facebook tricking children into overspending on Facebook games. • But what about “personalized” education? Or “personalized” news? Or “personalized” politics? • We can end up with a dangerously skewed sense of the world this way… and that can lead us to do dangerously messed-up things. • (I say “us” because no one is immune to this! Not me, not you, not anyone. That’s not how human brains work.)
  18. How else can Big Data be used against you? •

    Deny you opportunity • Facebook patented a system to test your loanworthiness based on analysis of your friend network • Colleges and universities use data brokers and monitor use of the campus website and social media to make admissions decisions. • Deny you services • Health insurers want to kick people who may have expensive health situations off their rolls. They’ll do almost anything (short of violating HIPAA) to fi nd out if you’re in such a situation. • Yes, in the US Big Data can deny you health care! • Get you in legal or reputational trouble • Employers, for example, also want to know if you’re a “health risk” or if you’re liable not to be the perfect employee for some reason.
  19. But it’s okay if it’s the truth, right? • Inferences

    especially can be wrong, and often are. • Garbage (data) in, garbage (inferences) out. • Bias in, bias out. For example, Amazon tried to infer who would be a good hire from analyzing resumes. The result? The computer marked down anything showing an applicant was female, because Amazon’s longstanding gender biases in hiring showed up in the data! • The data we can collect—even Big Data—can never paint a whole picture. (Jargon: availability bias—we judge on the data we can easily collect, which is not always the best or most useful data.) • Correlation is not causation, and all the other cautions that come from a good statistics or research-methods course! • Even truths can harm you unfairly. • Ask anyone who’s dealt with discrimination based on a personal characteristic.
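"Bias in, bias out" can be shown with a toy model. The Python sketch below is a deliberately simple illustration and is NOT how Amazon's system worked: the hiring history, the keywords, and the `train` and `score` functions are all invented for this example.

```python
from collections import defaultdict

def train(history):
    """Learn, for each resume keyword, how often past applicants with it were hired."""
    counts = defaultdict(lambda: [0, 0])  # keyword -> [times hired, times seen]
    for keywords, hired in history:
        for keyword in keywords:
            counts[keyword][1] += 1
            if hired:
                counts[keyword][0] += 1
    return {keyword: hired / seen for keyword, (hired, seen) in counts.items()}

def score(model, keywords):
    """Average the learned hire rates of the keywords the model has seen before."""
    rates = [model[keyword] for keyword in keywords if keyword in model]
    return sum(rates) / len(rates) if rates else 0.0

# Made-up hiring history in which past (human) decisions were biased.
history = [
    (["python", "chess club"], True),
    (["python", "chess club"], True),
    (["python", "women's chess club"], False),
    (["java", "women's chess club"], False),
    (["java", "chess club"], True),
]

model = train(history)
resume_1 = ["python", "chess club"]
resume_2 = ["python", "women's chess club"]  # same skills as resume_1
print(score(model, resume_1) > score(model, resume_2))  # True
```

Nobody told the model to care about gender. It found the pattern in the biased decisions it was trained on and repeated it.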
  20. But ____ are the good guys, right? So it’s okay?

    • Even if that’s so (and it’s debatable)… • … anything ____ can collect, a bad guy (or bad government, or terrorist organization) can figure out how to collect also. • Or buy. Or hack ____ for. • Plenty of Big Datastores and data brokers and governments and whatnot have been hacked, or have leaked info. (EQUIFAAAAAAAAAX) • There is no such thing as tracking or data collection “only for the good guys.” • Moral: Even if you trust me as a person, or UW-Madison as an organization… you shouldn’t just trust us with lots of your data. Make us explain and justify what we’re doing!
  21. Who’s making it easier for law enforcement? MARKETERS. • Any

    information marketers can collect, law enforcement can also collect. • Sometimes directly from the marketers! After all, many sell data. • But more commonly, they just copy marketers’ tracking techniques. • Any tracking marketers can do, much law enforcement can also do. It’s all algorithms! • Anything marketers can learn about people, individually or collectively, law enforcement can also learn. • Few legal constraints on any of this!
  22. Geofeedia: or, police surveillance of social media • Geofeedia used

    geotagging on various social media along with text mining to create alerts for police. • Twitter, Facebook, Instagram, YouTube, Vine, Periscope, and more • One known dubiously-legal, unethical use: tracking the real-world locations of activists of color • Geofeedia got bad press, was abandoned by police. • Many similar tools still in use! • Do you know what your local police use? Possibly time to ask! • George Floyd protests: a data broker called Mobilewalla geolocated protestors via their phones… just for fun, apparently… and published the results. They were super-proud of themselves!
  23. But I’m just a student… • What if the computer

    finds a pattern suggesting students “like you” shouldn’t be in your major because they fail it a lot? • What if the computer is actually noticing your gender or your race/ethnicity, and the REAL problem is that the department offering that major behaves in racist and sexist ways? Do you trust UW-Madison to handle this correctly? • What if the computer discovers that students who eat that thing you often get at the Union don’t do well on final exams? Or have mental-health issues? • Such a pattern would almost certainly be nonsense coincidence! But would that necessarily stop UW-Madison from acting on it? • Basically, what if a computer thinks You Are Studenting Wrong? • What is your recourse if that’s used against you? Or it’s incorrect? Or the problem isn’t actually your problem, but the university’s?
  24. Don’t these people have any ethics? • Often they don’t.

    Or they care about money more than they care about you, or about us. • Or they believe total garbage like “technology is neutral” or “it’s just a tool (so the implications aren’t my problem)” or “anything not actually illegal is obviously okay.” • (If you believe any of these things, PLEASE STOP. Learn better.) • There’s serious Fear of Missing Out (“FOMO”) around Big Data collection and analysis… even in non-profit sectors like public education. • Yes, even in libraries. This DEVASTATES me as a librarian. I was taught to do better! But it’s true.
  25. Educational surveillance also known as “learning analytics”

  26. Did anybody ever tell you “this is going on your

    permanent record?” They were bluffing, mostly. But now they’re not.
  27. FERPA • Family Educational Rights and Privacy Act • Protects

    any US-based “educational record” you have. • Grades are educational records, for example. • Not just anybody can waltz in and ask to see it; usually you (if you are adult) or your parents/guardians (if not) have to consent first. • Even your instructors/advisors/counselors etc. have to have a reason to look up your records. • Not perfect law, but not bad either, for its time… which was roughly my lifetime ago.
  28. Here’s the thing… • The current definition of

    what counts as an “educational record” is pretty specific and completely print-based. • Most digital surveillance it’s possible to do in (for example) Canvas? Doesn’t count as a “record” under FERPA. • FERPA also has a giant loophole: an organization can use your data for internal research and assessment. • And can extend this ability to companies it contracts with to do research or assessment. You see (based on what I’ve said) how that could get sticky, I trust? • Add that to the Big Data movement, and you get…
  29. “Learning analytics” • Surveilling students as they learn, both online

    and in the physical world, and (supposedly… but not always) trying to use the information to help them learn. • Not properly tested. A lot of the “innovation” in this space is going on hunches and guesses, even as it’s affecting real students. • I repeat: WE DON’T EVEN KNOW IF/WHEN THIS WORKS. The results we have so far are pretty unimpressive. • Worse, a lot of the experimentation is not undergoing regular research oversight. • Obviously this information is gold to lots of others too… • … imagine if a prospective employer got hold of it. (Some of them are sleazy enough with the educational records they CAN get.)
  30. Including… • Anything you do in Canvas or Canvas add-ons

    • E-resources you use from the library • Website-based interactions and use, sometimes • Anything you do in the Student Center • enrollment, classes you look at (without enrolling in them), etc. • And more. • Higher-education institutions building “data warehouses” to retain all this information and connect it up with other information… just like a commercial data broker, really, except for (so far! no guarantees!) somewhat less actual data sale.
  31. Wait, the physical world too? How does that work? •

    ID-card swiping • Any time you swipe your WisCard, that turns into a row in a campus database, tied directly to you (via your student ID number as identifier). • That includes purchases, whenever you use WisCard for them! • So that you know: if you WisCard-swipe your way into a building or other physical space, that data goes to the UW Police Department. • Same space-surveillance techniques that retail and law enforcement use • Stingrays to track student cell phones • Video surveillance • IP address (for off-campus geolocation) when you use campus’s online resources • Wi-Fi geolocation of your devices when you’re physically on-campus
  32. Canvas analytics (from its documentation)

  33. None
  34. When you did what

  35. Exact page views

  36. And that’s just what Canvas shows ME. Canvas collects more

    data than this. The Unizin Consortium is building a “data platform” to hold it.
  37. Here’s what UW-Madison thinks it’s okay to do with learning

    analytics. As I explain, please consider misinterpretations and abuses. • https://at.doit.wisc.edu/evaluation-design-analysis/learning-analytics-functional-taxonomy/
  38. None
  39. None
  40. WHAT DID I SAY about personalization?

  41. None
  42. So many assumptions. • “A pattern of interactions that works

    for one or a few students (or even many!) must work for everybody.” • Y’all are individuals!!! In individual situations!!!!! • “There’s a correlation between time spent and grades.” • OH MY GOSH Y’ALL. This is such nonsense I can’t even. Canvas can’t even measure all the time you spend! (And what if you just leave a window open, inactive?) • Some A+ students don’t spend much time (often due to prior experience). Some F students spend LOTS of time (because they’re lost). Do I trust that “predictive analytics” systems understand this? I DO NOT. • Communication habits depend on a lot of things — such as whether interactions with the instructor and/or other students have been positive. (*-isms do happen here. Are we going to blame their targets for “not interacting enough”?) • Students do not all have equal amounts of time to dedicate to school! Are we measuring “engagement” or privilege here?
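The "what if you just leave a window open" problem is easy to demonstrate. A minimal Python sketch, assuming made-up page-view timestamps and a deliberately naive estimate; this is NOT Canvas's actual calculation (which I have not seen), and `naive_minutes` is a hypothetical function.

```python
from datetime import datetime

def naive_minutes(page_view_times):
    """Naive 'time spent': minutes from the first page view to the last one."""
    times = sorted(datetime.fromisoformat(stamp) for stamp in page_view_times)
    return (times[-1] - times[0]).total_seconds() / 60

# Hypothetical students, made-up timestamps.
# This one read for five minutes, left the tab open over dinner, clicked once later.
left_window_open = [
    "2021-07-20T19:00:00",
    "2021-07-20T19:05:00",
    "2021-07-20T22:00:00",
]
# This one worked the whole time they were logged in.
worked_steadily = [
    "2021-07-20T19:00:00",
    "2021-07-20T19:20:00",
    "2021-07-20T19:40:00",
]

print(naive_minutes(left_window_open))  # 180.0
print(naive_minutes(worked_steadily))   # 40.0
```

By this measure the student with the idle tab looks more than four times as "engaged." Neither number includes reading a print book, working on paper, or anything else that happens away from the browser.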
  43. Not gonna lie: when I was a new instructor I

    had a lot to learn about how best to understand and interact with students. (I don’t pretend that I’m perfect at it now.) Based on what I know, I can say that these systems and assumptions are lots more clueless than I was then.
  44. I am ANGRY about this. (Can you tell?) But I

    do not control it. (I’ve been told by someone I trust that campus folks responsible for this won’t work with me or even talk with me because of what they know about my work and my beliefs and how shy I’m NOT about calling bad practice out.)
  45. Do I use this stuff? • Thank you for thinking

    about this. You are absolutely right that you are owed this information. • The answer: Very, very rarely. • The only situation in which I go into an individual student’s Canvas analytics is if they’ve dropped off the radar. • And when that happens, I email them to ask what’s up. No rush to judgment. • When or where or with whom you do homework, as long as it’s in on time and you come prepared to class? I consider it NONE OF MY BUSINESS. • I also don’t pretend that beyond the incredibly obvious (turn work in!), I (much less a computer) can make reliable predictions about student performance. • And I wish the rest of campus thought as I do.
  46. But cheaters! • Is there cheating? Yeah. A lot of

    it? Not that I’ve ever noticed in my classes. • Pedagogy research: A lot of cheating comes from students feeling anxious, unsure what to do, or overloaded. I try to make clear that I’d rather students ASK than cheat! • How instructors respond to potential cheating matters. • Pedagogy research: Calling on students to be honorable people stops a lot of cheating cold. • Draconian measures to prevent cheating damage student trust and raise student anxiety. (I mean, duh, right?) My teaching style relies on student trust quite a bit. • So I strongly prefer to treat you as the honorable adults you are. I think I teach better and you learn better that way. • I also believe that real cheaters get theirs, even if not directly from me.
  47. Where’s the harm?

  48. Where’s the harm?

  49. Exam proctoring • I hope you now understand why I

    was vehement about not using it when the course began. • Northwestern University students are suing under Illinois’s Biometric Information Privacy Act. • Student protest has gotten proctoring contracts cancelled at several universities nationwide. • And proctoring settings (marginally) improved elsewhere, here included. • You DO NOT have to take this garbage lying down.
  50. Where’s the harm? • Naming-and-shaming: this is the University of

    Arizona, business-school researcher Dr. Sudha Ram.
  51. (the research ethics here are EXTRA-sketchy, but that’s a whole

    other discussion)
  52. To help possible dropouts? Not necessarily.

  53. Instructors, advisors, and administrators using these tools to judge and

    punish instead of understand is a serious, unsolved problem. (And a microcosm of Big Data About People use in the rest of society.)
  54. What students can do • EACH ONE TEACH ONE. Tell

    others what you now know! • Tell helicopter parents to step off. • Too many parents are demanding that campus surveil students “for safety.” • When your instructor says “Let’s use this online thing!” ask back “What’ll it do with my personal data? Behavioral data? Grades?” • They probably won’t have thought about it. At least ask them to THINK. • (“Don’t use your real name!” is one reason I’m comfortable using Scratch.) • Raise this with student organizations and *PIRGs. • If you need someone to explain it, CALL ON ME. PLEASE. I will back you up! • We will discuss some tracking-prevention tools and techniques a bit later on. Cross my heart.
  55. Thanks.