The surveillance industrial complex: tracking, ML/AI, Big Data

For Computer Science 202 "Introduction to Computing," summer 2021. Includes an extended discussion of learning analytics and Canvas tracking.

Dorothea Salo

July 21, 2021


  1. The surveillance industrial
    complex: tracking, ML/AI,
    Big Data
    CS 202

  2. Knowledge is power!
    • I actually hate that cliché, being in the knowledge business.
    Nothing’s that simple.

    • But I do want to say out loud that knowledge about you
    translates into power over you.

    • And the other way around, too! It is NOT COINCIDENCE that areas with a lot of
    people of color in them are heavily oversurveilled in the US! It’s a power play!

    • It’s also not coincidence that you students are more heavily surveilled by UW-
    Madison than I am as an employee!

    • UW-Madison is more scared of me than it is of you, even if only a little bit. I have
    certain protections enshrined in campus policy. You… don’t have nearly as many.

  3. AI/ML, in a nutshell
    • AI: “artificial intelligence.” An unrealistic pipe dream.

    • Even humans don’t know exactly how humans think. So we’re gonna train a
    computer to do it? Yeah… no, not really. It’s been tried; it’s always failed.

    • In certain sharply limited situations, can be made to work. Kind of.

    • ML: “machine learning.” A set of computational and
    mathematical/statistical techniques that enable computers
    to find (“model”), report, and act on patterns in the data used
    to train them… and sometimes (only sometimes!!!) data similar
    to the training data.

    • Limitations in the training data = limitations in pattern detection capacity

    • Bias in the training data = bias in the model

    • Chose a wrong technique to use with your data? Garbage in, garbage out.

    • They’re computers, not magic wands!
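
    To make “bias in the training data = bias in the model” concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `history` data is made up, the groups “A” and “B” are hypothetical, and the “model” is nothing fancier than counting past outcomes. Real ML techniques are more complicated, but they inherit the same problem.

```python
from collections import defaultdict

# Hypothetical, made-up "historical decisions": (group, was_hired).
# Group "A" was favored in the past; the data records that favoritism.
history = (
    [("A", True)] * 80 + [("A", False)] * 20
    + [("B", True)] * 20 + [("B", False)] * 80
)


def train(records):
    """'Learn' the hire rate for each group by counting past outcomes."""
    hired = defaultdict(int)
    total = defaultdict(int)
    for group, was_hired in records:
        total[group] += 1
        hired[group] += int(was_hired)
    return {group: hired[group] / total[group] for group in total}


def predict(model, group):
    """Recommend hiring whenever the learned rate is above 50%."""
    return model[group] > 0.5


model = train(history)
```

    With this made-up data, `predict(model, "A")` is `True` and `predict(model, "B")` is `False`. The computer found the pattern correctly; the pattern itself is the bias.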

  4. With that in mind…
    • I’m going to toss y’all into breakout rooms again.

    • In your rooms, make two lists:

    • One list of information about students’ homework practices that you think it’s reasonable
    for your instructors (good ones AND bad ones) to know.

    • One list of information about students’ homework practices that you think is just plain
    none of our business.

    • “Homework practices” include (but are not limited to):

    • When and how long you do homework

    • Where you are (in realspace) and whom you’re with when you do homework

    • What you read and do online (e.g. your web searches, clicks, browser URLs, online library
    visits, etc) while you’re working on homework

    • What you read and do offline (e.g. reading print books, highlighting, writing things on
    paper, going to the library) while you’re working on homework

    • Mistakes you make and then correct on your homework

    • NO RIGHT/WRONG ANSWERS! Just what you think.

  5. Jargon, first: PII
    • Personally Identifiable Information. Anything that
    uniquely (or close to…) identifies you, such as:

    • Your name

    • ID number (SSN, driver’s license number, student ID number, passport number,

    • Sensitive (combinations of) personal information are
    sometimes included, such as:

    • race/ethnicity, gender, birth date

    • In the US, PII tends to be more protected, legally, than
    other kinds of information about you (for example, your
    web-browsing habits).

  6. “Join FaceTwitInstaZon!
    It’s free!”
    • It is NOT FREE. You’re paying with your data, the analysis
    and sale of which can and do make money.

    • Data you give the service about yourself (often PII)

    • Data the service finds out about you by observing your behavior as you
    interact with it

    • Data the service finds out about you by observing your behavior
    elsewhere online, through ad networks and/or other online-behavior
    surveillance networks

    • Data the service can infer (figure out) from your behavior (or, in especially creepy
    cases, by trying to manipulate your behavior)

    • Data the service can buy about you from other services, from ISPs and
    cell-service providers, or from data clearinghouses called data brokers (which
    in turn are happy to buy data about you from services you use…)

  7. “Big Data”
    • When you hear “Big Data” in the media, usually they mean
    huge aggregations of Data About Individual People.

    • Just to be perfectly clear: it doesn’t HAVE to be. Weather forecasting uses Big
    Data, for example, but it’s not data about people.

    • So do physics, biomedicine, economic modeling/forecasting…

    • When you hear “AI” or “machine learning” in the media,
    though, they’re usually talking about analyzing Big Data
    About People with computers to look for patterns.

    • Computers are quite a bit better at sifting through large piles of data looking for
    patterns than human beings are.

    • What they can’t do is figure out why the pattern is the way it is (as we saw with
    patterns of bias) or the implications of acting on the pattern.

    • One variety of analysis for Big Data about people is often called “inference.”

  8. Inference
    • Once a computer finds patterns in Big Data about people,
    you can infer things about the people that they didn’t
    actually tell you and might not even want you knowing.

    • Real-world inference based on data collected online has included: gender, race/
    ethnicity, biological-family relationships (including ones unknown to the people
    involved), romantic relationships, age, country of origin, sexuality, religion,
    marital status, income, debt level, political preferences/beliefs, educational level,
    (lots of variations on) health status (including mental health), veteran status,
    pregnancy status, gullibility…

    • Again, computers detect patterns humans can’t.

    • So we usually can’t know exactly what it is in the data collected about us that will
    give away something we don’t want known (or don’t want used against us). WE
    CANNOT KNOW. And at present we have no realistic way to challenge the
    conclusions or stop the inferencing.

  9. Who collects Big Data?
    • Online advertising relies on data collection.

    • “Real-time ad bidding:” 1) You visit website. 2) Website figures out who you are and what
    it already knows about you, and 3) sends that (plus new info about you) to an ad network
    asking “what ad should I show this person?”

    • QUITE PERSONAL INFORMATION, e.g. race/ethnicity, sexuality, health status, MAY TRAVEL

    • Online media (including journalism) relies on online advertising
    as well as other kinds/uses of surveillance.

    • Social media, mobile apps, and companies called “data brokers”
    rely on data collection, analysis, exploitation, and (re)sale.

    • Increasingly, workplaces, schools (K-12 through higher ed), and
    whole societies are surveilling their employees/students/citizens!

    • In some cases, theoretically to “serve them better.” Some, “to justify our existence.” Some,
    it’s plain old naked authoritarian control.

  10. “So Big Data is all online?”
    • Thanks for asking that! … No.

    • Your physical location over time is heavily tracked.

    • This happens substantially through your phone… but also via camera and voice
    surveillance, license-plate surveillance, credit-card and other ID records, etc.

    • Interactions you have in the physical world also leave traces
    in databases.

    • Non-cash purchases, of course, but also…

    • … through facial, gait, and voice recognition technologies.

    • “Brick-and-mortar” stores are actively researching how to
    track you more.

  11. Device identifiers
    • All your devices have them.

    • Network-card identifiers: MAC addresses (though these are less useful for
    surveillance than formerly)

    • Advertising identifiers on mobile devices, specifically for ad tracking but used in
    many other surveillance contexts as well

    • Phone numbers

    • Serial numbers

    • Since most devices are one-owner, if they’ve got a device identifier…

    • Weasel trick: don’t collect the stuff that counts as PII, but DO collect device identifiers

    • July 2021: revelation of the existence of companies that exist to match device
    identifiers to their owners’ identity

    • Fingerprinting: combos of settings that make your device
    (or web browser) unique
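
    A rough Python sketch of why fingerprinting works: no single setting identifies you, but the combination can. The `visitors` list and its field names are invented for illustration, and real fingerprinting scripts read many more settings than this. Treat it as a toy model of the idea, not as how any particular tracker does it.

```python
import hashlib
from collections import Counter

# Hypothetical visitors: none of these settings is a name or an ID number.
visitors = [
    {"browser": "Firefox 90", "screen": "1920x1080", "timezone": "UTC-5", "fonts": 212},
    {"browser": "Firefox 90", "screen": "1920x1080", "timezone": "UTC-5", "fonts": 187},
    {"browser": "Chrome 91", "screen": "1366x768", "timezone": "UTC-5", "fonts": 212},
    {"browser": "Chrome 91", "screen": "1366x768", "timezone": "UTC-6", "fonts": 212},
]


def fingerprint(settings):
    """Hash the settings (in sorted key order) into one short, stable identifier."""
    canonical = "|".join(f"{key}={settings[key]}" for key in sorted(settings))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def unique_share(people, keys):
    """Fraction of people who are the only one with their combination of `keys`."""
    combos = Counter(tuple(person[k] for k in keys) for person in people)
    alone = sum(1 for person in people if combos[tuple(person[k] for k in keys)] == 1)
    return alone / len(people)


by_browser_only = unique_share(visitors, ["browser"])
by_all_settings = unique_share(visitors, ["browser", "screen", "timezone", "fonts"])
```

    In this toy data, nobody is unique by browser alone (`by_browser_only` is 0.0), but everybody is unique once all four settings are combined (`by_all_settings` is 1.0). The hash from `fingerprint` then works like an ID number that nobody ever assigned to you.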

  12. Stingrays
    • Fake cell phone towers

    • Cell phones have to connect to towers to work at all.

    • Exist to collect phone identifiers and locations,
    undetectably to phone users

    • In use by many US local, state, and federal law-enforcement

    • Stores and some college/university campuses considering
    using these (and similar follow-you-via-your-phone tech).

  13. “Wait, don’t they have to
    tell you
    • Sort of, some places. Not in the US.

    • In the US, this is mostly governed through contract law,
    which online means those Terms of Service and Privacy
    Policy things you never read.

    • Don’t lie; I know you don’t read them. There isn’t enough time in the universe to
    read them all! Which is part of the problem here!

    • US law is perfectly happy to let you agree to ToSes and PPs that are terrible for
    you. You’re an adult, you are supposed to have read it! If you didn’t and you’re
    hosed, OH WELL.

  14. Notice and consent
    • Notice and consent: A common legal maneuver
    (particularly for US-based online businesses) in which
    privacy concerns are considered satisfied if

    • the business notifies its customers how their data will be collected and used

    • and has them consent to it somehow (e.g. via clickthrough).

    • It doesn’t work; gives people false trust.

    • Impenetrable legalese allowed! Vague weasel-wording allowed! Changes without
    notification allowed! That is not communication.

    • Misleading language is allowed in the gaining-consent process.

    • Research: many people wrongly think that the mere existence of a privacy policy
    means that the service does not share or sell data. WRONG! A privacy policy can
    absolutely say “we share and/or sell your data” and most of them do!

  15. “But it’s all anonymized,
    so no big deal, right?”
    • Wrong. Given enough data, removal of PII is meaningless.
    Big Data knows it’s you.

    • Remember those device identifiers? Yeah. Those. They don’t count as PII, legally.

    • Pet peeve of mine: removal of PII is not “anonymization,” but deidentification.

    • Anonymization is “ensuring that no one in the data can be identified from it,”
    and most security researchers believe it to be impossible.

    • Reidentification is “determining someone’s identity from deidentified data.”
    There are several ways to do it, and it’s often not even hard.

    • Common weasel phrase in ToSes and PPs: “We guard your
    PII carefully.”

    • What this actually means: “All the rest of the data is fair game for whatever we
    want to do with it and whoever we want to sell it to!”
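
    One of the simpler ways to reidentify is linking: match the leftover details in a “deidentified” dataset against another dataset that still has names. The Python sketch below is an illustration only. Both datasets, all the field names, and the choice of ZIP code, birth year, and gender as the matching details are assumptions made up for this example.

```python
# "Deidentified" records: names removed, other details kept. (Made-up data.)
deidentified = [
    {"zip": "53703", "birth_year": 2001, "gender": "F", "diagnosis": "asthma"},
    {"zip": "53703", "birth_year": 2002, "gender": "M", "diagnosis": "anxiety"},
    {"zip": "53715", "birth_year": 2001, "gender": "F", "diagnosis": "diabetes"},
]

# A separate, hypothetical list that includes names (say, a directory).
public_list = [
    {"name": "Person One", "zip": "53703", "birth_year": 2001, "gender": "F"},
    {"name": "Person Two", "zip": "53703", "birth_year": 2002, "gender": "M"},
    {"name": "Person Three", "zip": "53715", "birth_year": 2001, "gender": "F"},
    {"name": "Person Four", "zip": "53715", "birth_year": 2003, "gender": "M"},
]

# Details that are not PII one by one but narrow things down in combination.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def reidentify(records, named):
    """Attach a name to each record whose details match exactly one named person."""
    names_by_key = {}
    for person in named:
        key = tuple(person[q] for q in QUASI_IDENTIFIERS)
        names_by_key.setdefault(key, []).append(person["name"])

    matches = {}
    for record in records:
        key = tuple(record[q] for q in QUASI_IDENTIFIERS)
        names = names_by_key.get(key, [])
        if len(names) == 1:  # only an unambiguous match counts
            matches[names[0]] = record["diagnosis"]
    return matches


reidentified = reidentify(deidentified, public_list)
```

    In this toy case all three “deidentified” records get a name back, even though no name or ID number was ever in that dataset.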

  16. What are data about you
    used for?
    • What they’ll tell you:

    • Personalization: “Ads/content tailored to your interests!” (Not… exactly. Ads/
    content they believe you will click on, whether it’s good for you or not. People get
    bamboozled into believing conspiracy theories via “tailoring.”)

    • “A better experience!” (Whatever that even means.)

    • What they won’t tell you:

    • Outright data sale, without your knowledge or consent

    • Inferring further data (including stuff you’d rather keep private) about you

    • Manipulating you (especially though not exclusively financially and politically)

    • Making important decisions about you (loans, insurance, college admission, jobs)

    • Rolling over on you to government, law enforcement, and other authorities

    • Lots of other things! They won’t tell you what they’re doing! (FACEBOOK!)

  17. When does personalization
    become manipulation?
    • Well, always, really. All personalization is aimed at manipulating
    your behavior!!!!!!!!

    • Social media, for example, uses personalization to manipulate you into staying longer.
    Even when that’s not good for you or anyone else.

    • With product advertising, we understand that and we tend to be
    okay with it?

    • Arguably in many situations we shouldn’t be! For example, Facebook tricking children
    into overspending on Facebook games.

    • But what about “personalized” education? Or “personalized”
    news? Or “personalized” politics?

    • We can end up with a dangerously skewed sense of the world this way… and that can
    lead us to do dangerously messed-up things.

    • (I say “us” because no one is immune to this! Not me, not you, not anyone. That’s not
    how human brains work.)

  18. How else can Big Data be
    used against you?
    • Deny you opportunity

    • Facebook patented a system to test your loanworthiness based on analysis of your
    friend network

    • Colleges and universities use data brokers and monitor use of the campus
    website and social media to make admissions decisions.

    • Deny you services

    • Health insurers want to kick people who may have expensive health situations off
    their rolls. They’ll do almost anything (short of violating HIPAA) to find out if
    you’re in such a situation.

    • Yes, in the US Big Data can deny you health care!

    • Get you in legal or reputational trouble

    • Employers, for example, also want to know if you’re a “health risk” or if you’re
    liable not to be the perfect employee for some reason.

  19. But it’s okay if it’s the
    truth, right?
    • Inferences especially can be wrong, and often are.

    • Garbage (data) in, garbage (inferences) out.

    • Bias in, bias out. For example, Amazon tried to infer who would be a good hire
    from analyzing resumes. The result? The computer marked down anything
    showing an applicant was female, because Amazon’s longstanding gender biases
    in hiring showed up in the data!

    • The data we can collect—even Big Data—can never paint a whole picture. (Jargon:
    availability bias—we judge on the data we can easily collect, which is not
    always the best or most useful data.)

    • Correlation is not causation, and all the other cautions that come from a good
    statistics or research-methods course!

    • Even truths can harm you unfairly.

    • Ask anyone who’s dealt with discrimination based on a personal characteristic.

  20. But ____ are the good
    guys, right? So it’s okay?
    • Even if that’s so (and it’s debatable)…

    • … anything ____ can collect, a bad guy (or bad
    government, or terrorist organization) can figure out how
    to collect also.

    • Or buy. Or hack ____ for.

    • Plenty of Big Datastores and data brokers and governments and whatnot have
    been hacked, or have leaked info. (EQUIFAAAAAAAAAX)

    • There is no such thing as tracking or data collection
    “only for the good guys.”

    • Moral: Even if you trust me as a person, or UW-Madison as
    an organization… you shouldn’t just trust us with lots of
    your data. Make us explain and justify what we’re doing!

  21. Who’s making it easier
    for law enforcement?
    • Any information marketers can collect, law
    enforcement can also collect.

    • Sometimes directly from the marketers! After all, many sell data.

    • But more commonly, they just copy marketers’ tracking techniques.

    • Any tracking marketers can do, much law enforcement can
    also do. It’s all algorithms!

    • Anything marketers can learn about people, individually
    or collectively, law enforcement can also learn.

    • Few legal constraints on any of this!

  22. Geofeedia: or, police
    surveillance of social media
    • Geofeedia used geotagging on various social media along
    with text mining to create alerts for police.

    • Twitter, Facebook, Instagram, YouTube, Vine, Periscope, and more

    • One known dubiously-legal, unethical use: tracking the real-world locations of
    activists of color

    • Geofeedia got bad press, was abandoned by police.

    • Many similar tools still in use!

    • Do you know what your local police use? Possibly time to ask!

    • George Floyd protests: a data broker called Mobilewalla geolocated protestors via
    their phones… just for fun, apparently… and published the results. They were
    super-proud of themselves!

  23. But I’m just a student…
    • What if the computer finds a pattern suggesting students “like
    you” shouldn’t be in your major because they fail it a lot?

    • What if the computer is actually noticing your gender or your race/ethnicity, and the
    REAL problem is that the department offering that major behaves in racist and sexist
    ways? Do you trust UW-Madison to handle this correctly?

    • What if the computer discovers that students who eat that
    thing you often get at the Union don’t do well on final exams?
    Or have mental-health issues?

    • Such a pattern would almost certainly be nonsense coincidence! But would that
    necessarily stop UW-Madison from acting on it?

    • Basically, what if a computer thinks You Are Studenting Wrong?

    • What is your recourse if that’s used against you? Or it’s incorrect? Or the problem isn’t
    actually your problem, but the university’s?

  24. Don’t these people have
    any ethics?
    • Often they don’t. Or they care about money more than they
    care about you, or about us.

    • Or they believe total garbage like “technology is neutral” or “it’s just a tool (so the
    implications aren’t my problem)” or “anything not actually illegal is obviously fine.”

    • (If you believe any of these things, PLEASE STOP. Learn better.)

    • There’s serious Fear of Missing Out (“FOMO”) around Big
    Data collection and analysis… even in non-profit sectors
    like public education.

    • Yes, even in libraries. This DEVASTATES me as a librarian. I
    was taught to do better! But it’s true.

  25. Educational
    also known as “learning analytics”

  26. Did anybody ever tell
    you “this is going on your
    permanent record?”
    They were bluffing, mostly.

    But now they’re not.

  27. FERPA
    • Family Educational Rights and Privacy Act

    • Protects any US-based “educational record” you have.

    • Grades are educational records, for example.

    • Not just anybody can waltz in and ask to see it; usually you
    (if you are adult) or your parents/guardians (if not) have to

    • Even your instructors/advisors/counselors etc. have to have a reason to look up
    your records.

    • Not perfect law, but not bad either, for its time… which was
    roughly my lifetime ago.

  28. Here’s the thing…
    • The current definition of what counts as an “educational
    record” is pretty specific and completely print-based.

    • Most digital surveillance it’s possible to do in (for example)
    Canvas? Doesn’t count as a “record” under FERPA.

    • FERPA also has a giant loophole: an organization can use
    your data for internal research and assessment.

    • And can extend this ability to companies it contracts with to do research or
    assessment. You see (based on what I’ve said) how that could get sticky, I trust?

    • Add that to the Big Data movement, and you get…

  29. “Learning analytics”
    • Surveilling students as they learn, both online and in the
    physical world, and (supposedly… but not always) trying
    to use the information to help them learn.

    • Not properly tested. A lot of the “innovation” in this
    space is going on hunches and guesses, even as it’s
    affecting real students.

    • I repeat: WE DON’T EVEN KNOW IF/WHEN THIS WORKS. The results we
    have so far are pretty unimpressive.

    • Worse, a lot of the experimentation is not undergoing regular research oversight.

    • Obviously this information is gold to lots of others too…

    • … imagine if a prospective employer got hold of it. (Some of them are sleazy
    enough with the educational records they CAN get.)

  30. Including…
    • Anything you do in Canvas or Canvas add-ons

    • E-resources you use from the library

    • Website-based interactions and use, sometimes

    • Anything you do in the Student Center

    • enrollment, classes you look at (without enrolling in them), etc.

    • And more.

    • Higher-education institutions building “data warehouses” to retain all this
    information and connect it up with other information… just like a commercial
    data broker, really, except for (so far! no guarantees!) somewhat less actual data

  31. Wait, the physical world
    too? How does that work?
    • ID-card swiping

    • Any time you swipe your WisCard, that turns into a row in a campus database, tied
    directly to you (via your student ID number as identifier).

    • That includes purchases, whenever you use WisCard for them!

    • So that you know: if you WisCard-swipe your way into a building or other physical
    space, that data goes to the UW Police Department.

    • Same space-surveillance techniques that retail and law
    enforcement use

    • Stingrays to track student cell phones

    • Video surveillance

    • IP address (for off-campus geolocation) when you use campus’s online resources

    • Wi-Fi geolocation of your devices when you’re physically on-campus

  32. Canvas analytics

    (from its documentation)

  33. View Slide

  34. When you did what

  35. Exact page views

  36. And that’s just

    what Canvas shows ME.

    Canvas collects

    more data than this.

    The Unizin Consortium

    is building a “data platform”
    to hold it.

  37. Here’s what UW-Madison

    thinks it’s okay to do

    with learning analytics.

    As I explain, please consider

    misinterpretations and abuses.
    • https://at.doit.wisc.edu/evaluation-design-analysis/

  38. View Slide

  39. View Slide


    about personalization?

  41. View Slide

  42. So many assumptions.
    • “A pattern of interactions that works for one or a few
    students (or even many!) must work for everybody.”

    • Y’all are individuals!!! In individual situations!!!!!

    • “There’s a correlation between time spent and grades.”

    • OH MY GOSH Y’ALL. This is such nonsense I can’t even. Canvas can’t even measure
    all the time you spend! (And what if you just leave a window open, inactive?)

    • Some A+ students don’t spend much time (often due to prior experience). Some F
    students spend LOTS of time (because they’re lost). Do I trust that “predictive
    analytics” systems understand this? I DO NOT.

    • Communication habits depend on a lot of things — such as whether interactions
    with the instructor and/or other students have been positive. (*-isms do happen
    here. Are we going to blame their targets for “not interacting enough”?)

    • Students do not all have equal amounts of time to dedicate to school! Are we
    measuring “engagement” or privilege here?
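
    Here is a small Python sketch of why “time spent” is such a shaky number. It is NOT how Canvas computes anything; I am assuming, purely for illustration, a system that only sees page-view timestamps and adds up the gaps between them. The function name and the timestamps are made up.

```python
from datetime import datetime


def naive_time_spent(page_views):
    """Sum the gaps between consecutive page views, in minutes."""
    times = sorted(datetime.fromisoformat(t) for t in page_views)
    gaps = (later - earlier for earlier, later in zip(times, times[1:]))
    return sum(gap.total_seconds() for gap in gaps) / 60


# Hypothetical student 1: clicked twice, then left the window open for hours.
idle_window = [
    "2021-07-21T19:00:00",
    "2021-07-21T19:05:00",
    "2021-07-21T22:00:00",
]

# Hypothetical student 2: opened the reading once, then studied it on paper.
studied_offline = ["2021-07-21T19:00:00"]

minutes_idle = naive_time_spent(idle_window)
minutes_offline = naive_time_spent(studied_offline)
```

    The idle window gets credited with 180 minutes of “engagement,” and the student who actually studied offline gets zero. Any prediction built on a number like this inherits the problem.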

  43. Not gonna lie:

    when I was a new instructor

    I had a lot to learn

    about how best to understand

    and interact with students.

    (I don’t pretend

    that I’m perfect at it now.)

    Based on what I know, I can say

    that these systems and assumptions

    are lots more clueless than I was then.

  44. I am ANGRY about this.

    (Can you tell?)

    But I do not control it.

    (I’ve been told by someone I trust

    that campus folks responsible for this

    won’t work with me

    or even talk with me

    because of what they know

    about my work and my beliefs

    and how shy I’m NOT

    about calling bad practice out.)

  45. Do I use this stuff?
    • Thank you for thinking about this. You are absolutely right
    that you are owed this information.

    • The answer: Very, very rarely.

    • The only situation in which I go into an individual student’s
    Canvas analytics is if they’ve dropped off the radar.

    • And when that happens, I email them to ask what’s up. No rush to judgment.

    • When or where or with whom you do homework, as long as
    it’s in on time and you come prepared to class? I consider it

    • I also don’t pretend that beyond the incredibly obvious (turn work in!), I (much
    less a computer) can make reliable predictions about student performance.

    • And I wish the rest of campus thought as I do.

  46. But cheaters!
    • Is there cheating? Yeah. A lot of it? Not that I’ve ever noticed
    in my classes.

    • Pedagogy research: A lot of cheating comes from students feeling anxious, unsure
    what to do, or overloaded. I try to make clear that I’d rather students ASK than cheat!

    • How instructors respond to potential cheating matters.

    • Pedagogy research: Calling on students to be honorable people stops a lot of
    cheating cold.

    • Draconian measures to prevent cheating damage student trust and raise student
    anxiety. (I mean, duh, right?) My teaching style relies on student trust quite a bit.

    • So I strongly prefer to treat you as the honorable adults you
    are. I think I teach better and you learn better that way.

    • I also believe that real cheaters get theirs, even if not directly
    from me.

  47. Where’s the harm?

  48. Where’s the harm?

  49. Exam proctoring
    • I hope you now understand why I was vehement about not
    using it when the course began.

    • Northwestern University students are suing under Illinois’s
    Biometric Information Protection Act.

    • Student protest has gotten proctoring contracts cancelled at
    several universities nationwide.

    • And proctoring settings (marginally) improved elsewhere, here included.

    • You DO NOT have to take this garbage lying down.

  50. Where’s the harm?
    • Naming-and-shaming: this is the University of Arizona,
    business-school researcher Dr. Sudha Ram.

  51. (the research ethics here
    are EXTRA-sketchy,

    but that’s a whole other

  52. To help possible dropouts?
    Not necessarily.

  53. Instructors, advisors,

    and administrators

    using these tools to judge and punish

    instead of understand

    is a serious, unsolved problem.

    (And a microcosm

    of Big Data About People use

    in the rest of society.)

  54. What students can do
    • EACH ONE TEACH ONE. Tell others what you now know!

    • Tell helicopter parents to step off.

    • Too many parents are demanding that campus surveil students “for safety.”

    • When your instructor says “Let’s use this online thing!” ask
    back “What’ll it do with my personal data? Behavioral data?

    • They probably won’t have thought about it. At least ask them to THINK.

    • (“Don’t use your real name!” is one reason I’m comfortable using Scratch.)

    • Raise this with student organizations and *PIRGs.

    • If you need someone to explain it, CALL ON ME. PLEASE. I will back you up!

    • We will discuss some tracking-prevention tools and
    techniques a bit later on. Cross my heart.

  55. Thanks.
