Knowledge is power! • I actually hate that cliché, being in the knowledge business. Nothing’s that simple.
• But I do want to say out loud that knowledge about you translates into power over you.
• And the other way around, too! It is NO COINCIDENCE that areas where many people of color live are heavily oversurveilled in the US! It’s a power play!
• It’s also not coincidence that you students are more heavily surveilled by UW-Madison than I am as an employee!
• UW-Madison is more scared of me than it is of you, even if only a little bit. I have certain protections enshrined in campus policy. You… don’t have nearly as many.
AI/ML, in a nutshell • AI: “artificial intelligence.” An unrealistic pipe dream.
• Even humans don’t know exactly how humans think. So we’re gonna train a computer to do it? Yeah… no, not really. It’s been tried; it’s always failed.
• In certain sharply limited situations, it can be made to work. Kind of.
• ML: “machine learning.” A set of computational and mathematical/statistical techniques that enable computers to find (“model”), report, and act on patterns in the data used to train them… and sometimes (only sometimes!!!) in data similar to the training data.
• Limitations in the training data = limitations in pattern detection capacity
• Bias in the training data = bias in the model
• Choose the wrong technique for your data? Garbage in, garbage out. (See the sketch below.)
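• A minimal sketch of the “limitations in the training data” point, in Python with scikit-learn. The data is invented and this is an illustration, not a claim about any real system:

```python
# Toy illustration: a model only "knows" the kind of data it was trained on.
from sklearn.linear_model import LogisticRegression

# Invented training data: weekly study hours -> passed (1) / failed (0).
# Every example the model sees is between 1 and 9 hours.
X_train = [[1], [2], [3], [4], [6], [7], [8], [9]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]

model = LogisticRegression().fit(X_train, y_train)

# Inside the range it saw, the model does fine...
print(model.predict([[2], [8]]))  # -> [0 1], as expected

# ...but it will also cheerfully "predict" for inputs nothing like its
# training data. That answer is a guess dressed up as a result.
print(model.predict([[80]]))
```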
With that in mind… • I’m going to toss y’all into breakout rooms again.
• In your rooms, make two lists:
• One list of information about students’ homework practices that you think it’s reasonable for your instructors (good ones AND bad ones) to know.
• One list of information about students’ homework practices that you think is just plain none of our business.
• “Homework practices” include (but are not limited to):
• When and how long you do homework
• Where you are (in realspace) and whom you’re with when you do homework
• What you read and do online (e.g. your web searches, clicks, browser URLs, online library visits, etc.) while you’re working on homework
• What you read and do offline (e.g. reading print books, highlighting, writing things on paper, going to the library) while you’re working on homework
• Mistakes you make and then correct on your homework
“Join FaceTwitInstaZon! It’s free!” • It is NOT FREE. You’re paying with your data, the analysis and sale of which can and do make money.
• Data you give the service about yourself (often PII)
• Data the service finds out about you by observing your behavior as you interact with it
• Data the service finds out about you by observing your behavior elsewhere online, through ad networks and/or other online-behavior surveillance networks
• Data the service can infer (figure out) from your behavior (or, in especially creepy cases, by trying to manipulate your behavior)
• Data the service can buy about you from other services, from ISPs and cell-service providers, or from data clearinghouses called data brokers (which in turn are happy to buy data about you from services you use…)
“Big Data” • When you hear “Big Data” in the media, usually they mean huge aggregations of Data About Individual People.
• Just to be perfectly clear: it doesn’t HAVE to be. Weather forecasting uses Big Data, for example, but it’s not data about people.
• So does physics, biomedicine, economic modeling/forecasting…
• When you hear “AI” or “machine learning” in the media, though, they’re usually talking about analyzing Big Data About People with computers to look for patterns.
• Computers are quite a bit better at sifting through large piles of data looking for patterns than human beings are.
• What they can’t do is figure out why the pattern is the way it is (as we saw with patterns of bias) or the implications of acting on the pattern.
• One variety of analysis for Big Data about people is often called “inference.”
Inference • Once a computer finds patterns in Big Data about people, you can infer things about the people that they didn’t actually tell you and might not even want you knowing. (Toy sketch at the end of this list.)
• Real-world inference based on data collected online has included: gender, race/ethnicity, biological-family relationships (including ones unknown to the people involved), romantic relationships, age, country of origin, sexuality, religion, marital status, income, debt level, political preferences/beliefs, educational level, (lots of variations on) health status (including mental health), veteran status, pregnancy status, gullibility…
• Again, computers detect patterns humans can’t.
• So we usually can’t know exactly what it is in the data collected about us that will give away something we don’t want known (or don’t want used against us). WE CANNOT KNOW. And at present we have no realistic way to challenge the conclusions or stop the inferencing.
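• The promised toy sketch of inference, with invented shoppers and purchases (loosely in the spirit of the widely reported retail pregnancy-prediction stories; this is nobody’s real method):

```python
# Toy inference: guess something a person never disclosed, from patterns
# in the data of people who did disclose it. All data here is invented.
from collections import Counter

# Profiles where people DID state an attribute, plus their purchases.
disclosed = [
    ({"unscented lotion", "prenatal vitamins", "cotton balls"}, "pregnant"),
    ({"unscented lotion", "prenatal vitamins"}, "pregnant"),
    ({"beer", "razors", "chips"}, "not pregnant"),
    ({"chips", "cotton balls", "razors"}, "not pregnant"),
]

def infer(purchases):
    """Vote by overlap with the disclosed profiles: shared items = evidence."""
    votes = Counter()
    for basket, label in disclosed:
        votes[label] += len(purchases & basket)
    return votes.most_common(1)[0][0]

# This shopper never said a word about pregnancy. The data did.
print(infer({"prenatal vitamins", "unscented lotion", "chips"}))  # -> pregnant
```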
Who collects Big Data? • Online advertising relies on data collection.
• “Real-time ad bidding”: 1) You visit a website. 2) The website figures out who you are and what it already knows about you, and 3) sends that (plus new info about you) to an ad network, asking “what ad should I show this person?” (Sketched at the end of this list.)
• QUITE PERSONAL INFORMATION, e.g. race/ethnicity, sexuality, health status, MAY TRAVEL BETWEEN THE PARTIES HERE.
• Online media (including journalism) relies on online advertising as well as other kinds/uses of surveillance.
• Social media, mobile apps, and companies called “data brokers” rely on data collection, analysis, exploitation, and (re)sale.
• Increasingly, workplaces, schools (K-12 through higher ed), and whole societies are surveilling their employees/students/citizens!
• In some cases, theoretically to “serve them better.” Some, “to justify our existence.” Some, it’s plain old naked authoritarian control.
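• Here’s the promised sketch of a real-time-bidding hand-off: an illustrative payload, loosely modeled on OpenRTB-style bid requests. Every field name and value below is simplified or invented; the shape is the point, because the “ask” for an ad carries a dossier about you:

```python
import json

# Hypothetical bid request a website's ad slot might broadcast to bidders.
bid_request = {
    "id": "req-4412",
    "site": {"page": "https://example-news-site.com/health/article"},
    "device": {
        "ip": "203.0.113.57",       # enables rough geolocation
        "ua": "Mozilla/5.0 (...)",  # browser/OS fingerprinting input
    },
    "user": {
        "id": "cookie-or-ad-id-8839a",  # persistent cross-site identifier
        # Audience segments attached by trackers/data brokers. These can
        # encode exactly the sensitive traits listed above.
        "segments": ["expecting-parent", "high-debt", "chronic-illness"],
    },
}

# Sent to many bidders at once. Each one can keep what it learns about
# you whether or not it wins the auction.
print(json.dumps(bid_request, indent=2))
```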
“So Big Data is all online?” • Thanks for asking that! … No.
• Your physical location over time is heavily tracked.
• This happens substantially through your phone… but also via camera and voice surveillance, license-plate surveillance, credit-card and other ID records, etc.
• Interactions you have in the physical world also leave traces in databases.
• Non-cash purchases, of course, but also…
• … through facial, gait, and voice recognition technologies.
• “Brick-and-mortar” stores are actively researching how to track you more.
“Wait, don’t they have to tell you first?” • Sort of, some places. Not in the US.
• In the US, this is mostly governed through contract law, which online means those Terms of Service and Privacy Policy things you never read.
• Don’t lie; I know you don’t read them. There isn’t enough time in the universe to read them all! Which is part of the problem here!
• US law is perfectly happy to let you agree to ToSes and PPs that are terrible for you. You’re an adult; you were supposed to have read it! If you didn’t and you’re hosed, OH WELL.
Notice and consent • Notice and consent: A common legal maneuver (particularly for US-based online businesses) in which privacy concerns are considered satisfied if
• the business notifies its customers how their data will be collected and used
• and has them consent to it somehow (e.g. via clickthrough).
• It doesn’t work; it gives people a false sense of trust.
• Impenetrable legalese allowed! Vague weasel-wording allowed! Changes without renotification allowed! That is not communication.
• Misleading language is allowed in the gaining-consent process.
• Research: many people wrongly think that the mere existence of a privacy policy means that the service does not share or sell data. WRONG! A privacy policy can absolutely say “we share and/or sell your data” and most of them do!
What are data about you used for? • What they’ll tell you:
• Personalization: “Ads/content tailored to your interests!” (Not… exactly. Ads/content they believe you will click on, whether it’s good for you or not. People get bamboozled into believing conspiracy theories via “tailoring.”)
• “A better experience!” (Whatever that even means.)
• What they won’t tell you:
• Outright data sale, without your knowledge or consent
• Inferring further data (including stuff you’d rather keep private) about you
• Manipulating you (especially though not exclusively financially and politically)
• Making important decisions about you (loans, insurance, college admission, jobs)
• Rolling over on you to government, law enforcement, and other authorities
• Lots of other things! They won’t tell you what they’re doing! (FACEBOOK!)
How else can Big Data be used against you? • Deny you opportunity
• Facebook patented a system to test your loanworthiness based on analysis of your friend network (toy sketch at the end of this list)
• Colleges and universities use data brokers and monitor use of the campus website and social media to make admissions decisions.
• Deny you services
• Health insurers want to kick people who may have expensive health situations off their rolls. They’ll do almost anything (short of violating HIPAA) to find out if you’re in such a situation.
• Yes, in the US Big Data can deny you health care!
• Get you in legal or reputational trouble
• Employers, for example, also want to know if you’re a “health risk” or if you’re liable not to be the perfect employee for some reason.
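• The promised toy sketch of friend-network loan scoring. This is my speculative illustration of the idea, not the method in Facebook’s actual patent:

```python
# Score an applicant by the creditworthiness of the people they know.
credit_scores = {"ana": 720, "ben": 580, "cam": 610, "dee": 760}  # invented
friends = {"applicant": ["ana", "ben", "cam"]}

def network_score(person):
    """Average the known scores of a person's connections."""
    scores = [credit_scores[f] for f in friends[person] if f in credit_scores]
    return sum(scores) / len(scores)

# You can be marked down for who you know, not anything you did.
print(network_score("applicant"))  # -> ~636.7
```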
But it’s okay if it’s the truth, right? • Inferences especially can be wrong, and often are.
• Garbage (data) in, garbage (inferences) out.
• Bias in, bias out. For example, Amazon tried to infer who would be a good hire by analyzing resumes. The result? The model marked down anything showing an applicant was female, because Amazon’s longstanding gender biases in hiring showed up in the training data! (Toy sketch at the end of this list.)
• The data we can collect—even Big Data—can never paint a whole picture. (Jargon: availability bias—we judge on the data we can easily collect, which is not always the best or most useful data.)
• Correlation is not causation, and all the other cautions that come from a good statistics or research-methods course!
• Even truths can harm you unfairly.
• Ask anyone who’s dealt with discrimination based on a personal characteristic.
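• The promised toy reconstruction of the “bias in, bias out” failure. The data is invented and this is not Amazon’s actual system; the point is that a proxy for gender gets learned as if it were merit:

```python
from sklearn.linear_model import LogisticRegression

# Features: [years_of_experience, resume_mentions_womens_org (gender proxy)]
# Labels record biased historical hiring decisions, not actual ability.
X = [[5, 0], [6, 0], [4, 0], [7, 0],   # hired
     [5, 1], [6, 1], [4, 1], [7, 1]]   # equally qualified, not hired
y = [1, 1, 1, 1, 0, 0, 0, 0]

model = LogisticRegression().fit(X, y)

# Two identical resumes, except one mentions a women's organization:
print(model.predict_proba([[6, 0]])[0][1])  # high "hire" probability
print(model.predict_proba([[6, 1]])[0][1])  # marked down for the proxy
```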
But ____ are the good guys, right? So it’s okay? • Even if that’s so (and it’s debatable)…
• … anything ____ can collect, a bad guy (or bad government, or terrorist organization) can figure out how to collect also.
• Or buy. Or hack ____ for.
• Plenty of Big Datastores and data brokers and governments and whatnot have been hacked, or have leaked info. (EQUIFAAAAAAAAAX)
• There is no such thing as tracking or data collection “only for the good guys.”
• Moral: Even if you trust me as a person, or UW-Madison as an organization… you shouldn’t just trust us with lots of your data. Make us explain and justify what we’re doing!
Geofeedia: or, police surveillance of social media • Geofeedia used geotagging on various social media along with text mining to create alerts for police. (Schematic sketch at the end of this list.)
• Twitter, Facebook, Instagram, YouTube, Vine, Periscope, and more
• One known dubiously-legal, unethical use: tracking the real-world locations of activists of color
• Geofeedia got bad press and was abandoned by police.
• Many similar tools still in use!
• Do you know what your local police use? Possibly time to ask!
• George Floyd protests: a data broker called Mobilewalla geolocated protestors via their phones… just for fun, apparently… and published the results. They were super-proud of themselves!
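• The promised schematic sketch of a Geofeedia-style pipeline: filter geotagged posts to a bounding box, then flag watchwords. Posts, names, and coordinates are made up; real products ingest far more data and do far more mining:

```python
posts = [
    {"user": "a", "lat": 43.0747, "lon": -89.3841,
     "text": "March at the capitol at noon"},
    {"user": "b", "lat": 41.8781, "lon": -87.6298,
     "text": "great lunch downtown"},
    {"user": "c", "lat": 43.0731, "lon": -89.4012,
     "text": "protest forming on State St"},
]

# Approximate bounding box around downtown Madison, WI.
BOX = {"lat_min": 43.05, "lat_max": 43.10, "lon_min": -89.45, "lon_max": -89.35}
WATCHWORDS = {"protest", "march", "rally"}

def alerts(posts):
    for p in posts:
        in_box = (BOX["lat_min"] <= p["lat"] <= BOX["lat_max"]
                  and BOX["lon_min"] <= p["lon"] <= BOX["lon_max"])
        flagged = any(w in p["text"].lower() for w in WATCHWORDS)
        if in_box and flagged:
            # A real tool pushes these to a police dashboard in real time.
            yield p["user"], p["text"]

for user, text in alerts(posts):
    print(user, "->", text)
```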
But I’m just a student… • What if the computer finds a pattern suggesting students “like you” shouldn’t be in your major because they fail it a lot?
• What if the computer is actually noticing your gender or your race/ethnicity, and the REAL problem is that the department offering that major behaves in racist and sexist ways? Do you trust UW-Madison to handle this correctly?
• What if the computer discovers that students who eat that thing you often get at the Union don’t do well on final exams? Or have mental-health issues?
• Such a pattern would almost certainly be nonsense coincidence! But would that necessarily stop UW-Madison from acting on it?
• Basically, what if a computer thinks You Are Studenting Wrong?
• What is your recourse if that’s used against you? Or it’s incorrect? Or the problem isn’t actually your problem, but the university’s?
Don’t these people have any ethics? • Often they don’t. Or they care about money more than they care about you, or about us.
• Or they believe total garbage like “technology is neutral” or “it’s just a tool (so the implications aren’t my problem)” or “anything not actually illegal is obviously okay.”
• (If you believe any of these things, PLEASE STOP. Learn better.)
• There’s serious Fear of Missing Out (“FOMO”) around Big Data collection and analysis… even in non-profit sectors like public education.
• Yes, even in libraries. This DEVASTATES me as a librarian. I was taught to do better! But it’s true.
Here’s the thing… • The current definition of what counts as an “educational record” is pretty specific and completely print-based.
• Most digital surveillance it’s possible to do in (for example) Canvas? Doesn’t count as a “record” under FERPA.
• FERPA also has a giant loophole: an organization can use your data for internal research and assessment.
• And can extend this ability to companies it contracts with to do research or assessment. You see (based on what I’ve said) how that could get sticky, I trust?
“Learning analytics” • Surveilling students as they learn, both online and in the physical world, and (supposedly… but not always) trying to use the information to help them learn.
• Not properly tested. A lot of the “innovation” in this space is going on hunches and guesses, even as it’s affecting real students.
• I repeat: WE DON’T EVEN KNOW IF/WHEN THIS WORKS. The results we have so far are pretty unimpressive.
• Worse, a lot of the experimentation is not undergoing regular research oversight.
• Obviously this information is gold to lots of others too…
• … imagine if a prospective employer got hold of it. (Some of them are sleazy enough with the educational records they CAN get.)
Including… • Anything you do in Canvas or Canvas add-ons
• E-resources you use from the library
• Website-based interactions and use, sometimes
• Anything you do in the Student Center
• enrollment, classes you look at (without enrolling in them), etc.
• And more.
• Higher-education institutions are building “data warehouses” to retain all this information and connect it up with other information… just like a commercial data broker, really, except with (so far! no guarantees!) somewhat less actual data sale.
So many assumptions. • “A pattern of interactions that works for one or a few students (or even many!) must work for everybody.”
• Y’all are individuals!!! In individual situations!!!!!
• “There’s a correlation between time spent and grades.”
• OH MY GOSH Y’ALL. This is such nonsense I can’t even. Canvas can’t even measure all the time you spend! (And what if you just leave a window open, inactive?)
• Some A+ students don’t spend much time (often due to prior experience). Some F students spend LOTS of time (because they’re lost). Do I trust that “predictive analytics” systems understand this? I DO NOT. (Toy illustration at the end of this list.)
• Communication habits depend on a lot of things — such as whether interactions with the instructor and/or other students have been positive. (*-isms do happen here. Are we going to blame their targets for “not interacting enough”?)
• Students do not all have equal amounts of time to dedicate to school! Are we measuring “engagement” or privilege here?
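• The promised illustration of why “time logged” is a poor proxy for learning. The numbers are made up:

```python
students = [
    # (description, focused minutes, minutes the LMS logged, final grade)
    ("experienced A student", 40, 45, 96),
    ("lost and struggling", 300, 310, 58),
    ("left a tab open overnight", 50, 480, 81),
]

# A naive "engagement" rule of the kind predictive-analytics dashboards use:
def naive_risk_flag(logged_minutes):
    return "at risk" if logged_minutes < 120 else "engaged"

for name, real, logged, grade in students:
    # The rule flags the A student as disengaged and reads the idle open
    # tab as deep engagement. Logged time is not learning.
    print(f"{name}: logged {logged} min (really {real}), grade {grade},"
          f" flag: {naive_risk_flag(logged)}")
```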
Do I use this stuff? • Thank you for thinking about this. You are absolutely right that you are owed this information.
• The answer: Very, very rarely.
• The only situation in which I go into an individual student’s Canvas analytics is if they’ve dropped off the radar.
• And when that happens, I email them to ask what’s up. No rush to judgment.
• When or where or with whom you do homework, as long as it’s in on time and you come prepared to class? I consider it NONE OF MY BUSINESS.
• I also don’t pretend that beyond the incredibly obvious (turn work in!), I (much less a computer) can make reliable predictions about student performance.
But cheaters! • Is there cheating? Yeah. A lot of it? Not that I’ve ever noticed in my classes.
• Pedagogy research: A lot of cheating comes from students feeling anxious, unsure what to do, or overloaded. I try to make clear that I’d rather students ASK than cheat!
• How instructors respond to potential cheating matters.
• Pedagogy research: Calling on students to be honorable people stops a lot of cheating cold.
• Draconian measures to prevent cheating damage student trust and raise student anxiety. (I mean, duh, right?) My teaching style relies on student trust quite a bit.
• So I strongly prefer to treat you as the honorable adults you are. I think I teach better and you learn better that way.
• I also believe that real cheaters get theirs, even if not directly from me.