Hi there, and thanks for inviting me. I understand that John Wilbanks was instrumental in getting me invited, so thanks, John, and thanks also to Lara for clear communication about the nature and purpose of this event. I also wish to acknowledge that I stand on the unceded territory of a great many Coast Salish peoples.
For those of you who don’t know me already, I’m a librarian and I am totally Team Librarian. What I actually do these days is teach in the Information School at the University of Wisconsin-Madison. I train people to join Team Librarian, and also Team Web Manager and Team Database Admin and Team Digital Preservationist and Team Research Data Steward and Team Digitizer and Team Cybersecurity and, you know, lots of related teams having to do with information and data management in some form or other. That wasn’t an exhaustive list or anything.
With my research-data-steward hat on, I’ve worked with biomedical researchers, both directly and indirectly, and the impression I usually get is that all they want to do with data from a finished project is put a tombstone over it. Project’s done, data’s dead, bury it. And it may surprise you to hear a librarian and research-data steward say this, and especially say this HERE, but that’s… not actually the worst way for a biomedical researcher to be. The human-subjects research enterprise is built on human beings TRUSTING researchers and research. Memory, both individual and cultural, definitely figures into trust, because a lot of DIStrust is built on remembrance of wrong, and a lot of trust boils down to trust in the act of forgetting.
But of course the memory of research is changing, or we wouldn’t be here today talking about algorithms, because there wouldn’t be nearly as much data to run the algorithms ON as there actually is! And digital definitely changes WHAT we remember about research, how MUCH we remember, how well we remember it, and how much we can DO with what we remember. And my sense is that these changes have serious implications for conducting human-subjects research ethically, with appropriate regard for the human beings you are studying. So let’s talk about that.
(Photo: “St. Paul,” https://www.flickr.com/photos/juggernautco/6063339595/, CC-BY, cropped.)
So, as I said, I’m a librarian, and I earned my way through library school working on a grant-funded project transcribing early 20th-century census records from microfilm—speaking of memory tools—into a computer for programmable analysis. The principal investigator for this grant was a demographer, a population scientist. And the census records we were transcribing were specifically limited to the island of Puerto Rico. So, let’s see who’s paying attention: Why would a demographer be interested in the Puerto Rican population of the early-to-mid 20th century, and what does that have to do with biomedical research?
Here’s a hint in the form of some seriously great Puerto Rican street art. Jog anybody’s memory? (if STILL nobody gets it) Wow. I’m disappointed. This is Research Ethics 101 stuff. (if somebody got it) Good memory you have there! I’m curious, how did you know about this? Puerto Rico is interesting to demographers because it was Ground Zero for the early-to-mid-twentieth-century eugenics movement that originated among wealthy white people in the United States and was directed largely at poor people, people with disabilities, people of color, and the intersections among those groups. This movement took several different forms, but what’s especially relevant to us here today talking about digital biomedicine is the widespread, decades-long ABUSE of Puerto Rican women as human subjects for research into birth control, particularly though not exclusively birth-control pills and surgical sterilization techniques.
I don’t have time to do justice to this ethical horror, so instead I’ll recommend a couple of excellent books about it. Iris López’s book Matters of Choice is matter-of-fact and utterly devastating reading. Lourdes Lugo-Ortiz, in Tropiezos con la memoria, does a ton of work picking apart the Puerto Rican press discourse of the time around eugenics and female sterilization.
So how do we still have memories of the research-ethics breaches in Puerto Rico, and what form do those memories take? Not really records of the research. If there are consent forms still extant anywhere, for example, I don’t know about them, though we’re pretty sure consent processes happened for at least some of the work, because we know a lot of those researchers LIED to a lot of women. We know through other forms of cultural memory, such as the census records I transcribed, the interviews Iris López did, and the preserved Puerto Rican newspapers that Lourdes Lugo-Ortiz analyzed in her book. So let’s talk about what digital memories look like. Digital memories of research, digital memories in life more generally, and how those intersect.
I can use it, it’s my own photo! You can too; it’s licensed CC-BY 4.0 International. I come from Madison, so I am a Mad Information Scientist—here, I’ll put on my Mad Information Scientist headgear to prove it—and in this photo is my latest Mad Information Science. It’s called RADD, which stands for Recover Analog and Digital Data, because that is what it does: it quite literally saves memories. Remember Senator Thom Tillis in the Facebook hearings, talking about a “history grabber machine”? This. This is that. It’s rescued oral histories from open-reel audiotapes, community-created video from VHS and U-Matic tapes, and—you’ll love this—it’s rescued biomedical research data from five-and-a-quarter-inch floppy diskettes. Remember those?
If you’re expecting me to go off on a rant about the coming digital information apocalypse, well, sorry to disappoint. All information is fragile unless properly cared for, no matter how it’s recorded or what it’s recorded ON. Got a stone tablet? Gimme a sledgehammer. And digital information actually has a significant preservation virtue: the verifiably perfect copy. So if the will is there and the money is there, we can save as much digital information as we decide we want to. That’s not to say digital preservation is cheap or easy, it’s NOT; just that NO kind of memory preservation is cheap or easy. But that doesn’t make a digital information apocalypse inevitable, like some people say. I train people who prevent that. That’s my job. (LOSE THE HEADGEAR.)
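For the curious: that “verifiably perfect copy” isn’t hand-waving. Digital preservationists verify copies with what they call fixity checks, comparing cryptographic checksums of the original and the copy. Here’s a minimal sketch in Python (the function names and chunk size are my own illustrative choices, not any particular preservation system’s API):

```python
import hashlib

def sha256_of(path, chunk_size=65536):
    """Stream a file through SHA-256 so even very large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_is_perfect(original_path, copy_path):
    """A copy is bit-for-bit perfect exactly when its checksum matches the original's."""
    return sha256_of(original_path) == sha256_of(copy_path)
```

A preservation workflow records the checksum at ingest and re-runs this comparison on a schedule, so silent corruption or tampering shows up as a mismatch instead of going unnoticed for decades.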
And digital data leaks out into the world. A lot. Ask Equifax. Or, closer to home, a whole lot of research institutions, apparently. I’m at an institution that got spear-phished by the Mabna Institute; maybe you are too? Anyway, the longer digital data lingers, the more likely it is to leak.
And because digital data packs up way smaller than paper—just LOOK at this photo; how many petabytes’ worth of hard drives could I fit into the space those books are taking up?—we as societies are remembering a lot more data about people than we used to, and grabbing more any way we can think of. Not just in biomedical research, though y’all are certainly busy little data-grabbers, but in lots of OTHER kinds of research, AND in government, not to say law enforcement, AND in marketing, AND in education, AND online. And giant wodges of this data about people end up—one way or another—with companies called “data brokers” who are in business to aggregate and sell it. And in a truly weird parallel with print books, we never seem to want to throw any of the data away. So our memory of people—individual people—has expanded VASTLY in the last quarter-century or so. We have NEVER BEFORE in our history as a species been able to remember so much about so many different people!
One really big problem: your anonymized data isn’t. Period, exclamation point: anonymization is OVER. There’s just too much data available about too many people for standard anonymization techniques to work, and I’m frankly not entirely convinced by the newfangled fuzzing techniques I’m reading about. *CLICK* I can be picked right out of a crowd, any crowd, based on data about me. So can you, and so can anybody else. Not just that: you and I don’t know when some health-related or even NON-health-related service we use is going to completely roll over on our health data, making it way easier to pick us out of a crowd or use our data against us—just ask any HIV-positive person who’s used Grindr. Given that, it’s not just you in your research lab or startup I have to worry about. It’s everybody ELSE in THEIR research labs. It’s Google, and news websites, and Facebook and Cambridge Analytica, and Acxiom, and random hackers. It’s the NSA and FBI and ICE and their analogues across the world. It’s health-care providers and HMOs and insurance companies and their craptastically insecure computer systems. It’s Fitbit and Strava and Under Armour and the rest of the Internet of Craptastically Insecure Health Surveillance Things. It’s my very own employer, who would love to find a data-driven excuse to kick me off their health-insurance rolls because—well, take one look at me, we all know why. Any insurer or HMO who doesn’t try to kick out fat people like me has shareholders to answer to. So if harm could come to someone if the data your research collects about them is tied to them—and that’s exactly the assumption on which anonymization as a harm reducer is based—then y’all have a PROBLEM, because every single one of the people you’re working with is at major risk of exactly that harm. We. Let. Computers. Remember. Too. Much. Data. About. People.
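To make the re-identification point concrete: the classic move, demonstrated by Latanya Sweeney back in the 1990s, is a linkage attack that joins an “anonymized” dataset to a public one on quasi-identifiers such as ZIP code, birth date, and sex. A toy sketch, with every record invented for illustration:

```python
# Toy linkage attack: re-identify "anonymized" health records by joining them
# with a public dataset (think voter roll or data-broker file) on the
# quasi-identifier triple (ZIP, birth date, sex). All data here is fabricated.

anonymized_health = [
    {"zip": "53703", "dob": "1972-04-01", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "53711", "dob": "1985-09-17", "sex": "M", "diagnosis": "HIV+"},
]

public_roll = [  # names attached, freely obtainable
    {"name": "A. Smith", "zip": "53703", "dob": "1972-04-01", "sex": "F"},
    {"name": "B. Jones", "zip": "53711", "dob": "1985-09-17", "sex": "M"},
    {"name": "C. Brown", "zip": "53711", "dob": "1990-01-30", "sex": "M"},
]

def reidentify(health_rows, public_rows):
    """Join on the quasi-identifier triple; a unique match is a re-identification."""
    by_key = {}
    for p in public_rows:
        by_key.setdefault((p["zip"], p["dob"], p["sex"]), []).append(p["name"])
    hits = []
    for h in health_rows:
        names = by_key.get((h["zip"], h["dob"], h["sex"]), [])
        if len(names) == 1:  # exactly one person fits: anonymity is gone
            hits.append((names[0], h["diagnosis"]))
    return hits
```

Sweeney famously estimated that ZIP code, birth date, and sex alone uniquely identified roughly 87% of the U.S. population; the more outside data exists, the more of these matches come back unique.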
So the data and algorithms coming from modern biomedical research couldn’t possibly be used to harm or limit individuals unfairly, could they? Surely there’s no potential for abuse. (sorry, let me wipe the sarcasm off my keyboard here. if you sense that I am angry about this? yes. I am ANGRY about this.) Like, I hope we all know this isn’t even the most exploitable example I could have picked. Privacy isn’t just a librarian buzzword; it’s an umbrella defense against known ways people get hurt, okay?
So that’s one problem with digital memory. I’ll return to Puerto Rico to discuss a second problem. Anybody recognize either of these folks? (Margaret Sanger, Clarence “Proctor and” Gamble) So Sanger, she was a racist, classist, ableist eugenicist. I’m not gonna soften that! She honestly, deeply believed that there needed to be fewer people in general, especially if they were poor, and fewer people with DISABILITIES and people of COLOR in particular. So she rustled up startup money from rich white racist society friends of hers for testing birth-control pills in Puerto Rico. (And if you think I’m saying something about where startup money is coming from today and what the ethical implications of THAT are, gold star! I am.) I don’t have time to tell you even half the grossly unethical shenanigans Gamble got up to, but just as one example: he and the pharma company Searle purposefully chose poor women from a housing project in Río Piedras to test the Pill on. Some of the women ended up in a nearby hospital with severe side effects. What did our boy Clarence do about that? He shrugged and kept testing. Sanger and Gamble couldn’t see Puerto Rican women as people. Which leads me to ask y’all a couple of rhetorical questions: the Big Data datapoints that your algorithms are running on, do you see them as people? The numbers in your computer, do you remember that they’re people?
(Photo: Grigg, “Not 100% Effective,” https://www.flickr.com/photos/nateone/2713580189/, CC-BY.)
But look, where I am as an individual in all this… I’m not of Puerto Rican ancestry myself, as far as I know. (Do I trust 23andMe? Ahahahahaha no.) So besides having worked on that grant project, here’s the connection. Well, several connections. *CLICK* First, I have in the course of my life relied on Planned Parenthood, which Margaret Sanger founded, for reproductive health care. *CLICK* Second, I spent a few years taking birth-control pills to control my fertility. And when the pill gave me high blood pressure, that was a known side effect and my doctors knew how to deal with it. *CLICK* Third, when I decided I absolutely did not want to bear children, I had what in Puerto Rico they still call “la operación” myself—I had a tubal ligation, which went fine and has done its job. So I benefited directly from the unconscionable things Sanger and Gamble did in Puerto Rico. I am complicit. What’s more, the biomedical research enterprise, along with money from creepy rich people, MADE ME COMPLICIT in systematic dehumanization of, and direct harm to, Puerto Rican women. And I am HORRIFIED by that. I can’t just un-know this, somehow remove it from my memory; I sure hope now YOU can’t either. It is part of our collective memory now, as it should be. So how do I trust the industry that made me complicit in this? How does anybody trust you? Why should we?
Here’s an interesting thing I read in López’s book. One of Sanger’s first attempts to arrange a test of the Pill in Puerto Rico failed. Why? Because the women she tried to recruit were educated and had some social standing. They figured out there were shenanigans going on, and they felt confident enough to tell Sanger NOPE, step off, lady. So what did Sanger and Gamble do? Well, I told you already: find uneducated women in poverty-stricken areas to test on, of course. Pieces of WORK, those two. So yeah. Me consenting to give you my health data for your algorithms amounts to letting you make me complicit in whatever shenanigans you get up to with it. AND whatever shenanigans anybody ELSE gets up to when you let them analyze it or combine it with THEIR data. AND whatever shenanigans happen when you leak or abandon or sell or bury your data and somebody else takes it over. And I don’t want to be complicit in any more racism, any more ableism, any more dehumanization than I already am!
How much of the research based on Big Data and algorithms has been sexist, racist, classist, ableist, and dehumanizing? For pity’s sake, you heard Dr. Melissa Creary; how much of it has been anything ELSE?! I don’t have time to even get started on this, but hey, this is a bit of my Pinboard where I keep all my web bookmarks; y’all can check it out. So I? Am a biomedical research refusenik. I WILL NOT knowingly be a subject of biomedical research, I WILL NOT be part of personalized medicine, and I WILL NOT just give you my data, about my health or anything else. NOPE. Y’all can step RIGHT off, I will not be complicit in y’all’s shenanigans.
I’m lucky, at the moment. I’m not depending on current biomedical research or researchers just to go on living. But if any of you is thinking right now, “but my research is saving my research subjects’ lives!” let me just say: studying vulnerable people who are utterly dependent on your work does NOT exempt you from the responsibility to keep them safe from abuse, including abuse of the data you collect and analyze about them. It actually means you have MORE responsibility, because those folks don’t have the luxury I have of telling you…
step off. Now. I’m an educator. I teach people about data, algorithms, surveillance, reidentification, and their personal and societal implications. So, finally, here’s your second problem: if I’m successful at my job, and I sure hope I am, at some point y’all are gonna have a real hard time finding people who will sign a consent form or download your app or release control over their data. My students will react like that first group of Puerto Rican women: NOPE! Step off!
Leaky data is a third problem y’all have. Who’s this, anybody know? This is Dr. Bonnie Yankaskas, formerly a cancer researcher at the University of North Carolina, and I’m sorry if any of y’all know her, but this memory of research is important. Yankaskas’s lab computers burbled away on the open Internet, unsecured and unpatched, so they were predictably hacked. And those computers held a bunch of human-subject data for breast-cancer research, including most subjects’ Social Security numbers, so most of the women were absolutely individually identifiable. Today, with all the data out there, I’m betting just about all of them would be identifiable. About a hundred-eighty thousand women were represented in that data. Oh, and I want you all to know this if you don’t already: health data is SUPER sought-after on the black market for personal data. Lots of identifiers, lots of financial and personal and family details; it’s juicy stuff for identity thieves and social engineers. When the hack was finally noticed, did everybody involved feel appropriate horror, make amends, and do better? Yeah, no. There was this ghastly public slapfight in which Yankaskas blamed U-N-C, which turned around and blamed HER, and then she blamed the systems administrator that SHE HERSELF HIRED, and so on. So NOBODY AT ALL in this fiasco TOOK ANY RESPONSIBILITY FOR THOSE WOMEN AND THEIR DATA, much less for fixing things to try to prevent the next such fiasco. Sure. That increases my trust in digital biomedicine and research memory a WHOLE lot.
Now suppose one of y’all happens to find Yankaskas’s leaked data, those hundred-eighty thousand de-anonymizable women, on the open web somewhere, and you decide to use the data in your research. What’s your IRB gonna say? The data are openly available, no matter how they got to be that way, so your IRB is probably gonna say “public data, not our problem. Harm’s already been done. EXEMPT!” Y’all, that is NOT the answer that makes data less leaky, and it is NOT the answer that makes all y’all trustworthy! I mean, if I’m a REAL jerk researcher, I’m going to hack some other researcher’s data and accidentally-on-purpose leak it on the open web, because then it’s olly olly oxen free! (Possibly I should not have said that out loud.) And of course “public data” is the exact rationale researchers are using to con IRBs into letting them ferret out sensitive health data from social media. Without any consent process whatever, much less INFORMED consent. So the Common Rule not having anything helpful to say about use of so-called public data, or subsequent uses of already-collected data, is definitely part of the problem here. In a time of vastly expanded digital memory, not all harm to human subjects happens at the data-collection stage!
(Source: Genome magazine, 2018. http://genomemag.com/do-you-belong-to-you/ Fair use asserted.)
But look, some of y’all don’t even PRETEND to care about any of this. This here is from a recent news story about genomic data getting reused in subsequent research without its donors’ knowledge or consent. “Eh,” says Dr. Jorge Contreras of the University of Utah. “Everyone benefits from medical research.” So whatever, right? Leaks, dehumanizing shenanigans, unconsented data collection and reuse, whatevs, anything goes, because everyone benefits from medical research. Well. I tell you what. I carry in my memory of research a lot of women in Puerto Rico and North Carolina who did not benefit. I wonder if Dr. Contreras knows about them? And look, this Contreras dude? He’s a big wheel! He’s on the Scientific Advisory Board of the Utah Genome Project, among plenty of other high-profile work in science data policy. This dude right here is deciding what a whole lot of people just like you should be able to do with the intensely personal, intensely risk-laden health data of people just like me. I have SO MANY PROBLEMS with that. Biomedical researchers and entrepreneurs CANNOT be trusted, not with digital memory as sharp and as extensive as it is, until somebody makes this Contreras and everybody like him write out longhand “HUMAN PRIVACY, DIGNITY, AND SAFETY ARE MORE IMPORTANT THAN YOUR RESEARCH.” A hundred times, a thousand, however many times it takes to commit that to memory.
So, you might be wondering at this point, do I fill out my census forms? I get why you’re curious. The Census Bureau has a lot of sensitive data, obviously, and its history is NOT pretty. It rolled over on Japanese-Americans in World War II, as well as Muslim-Americans in two-thousand-four. And now somebody wants to add a citizenship question to the twenty-twenty census? NOPE, talk about things I don’t want to be complicit in. But. In the Census Bureau’s minimal defense, it solved the random-third-party-data-exploitation problem ages ago. How’s that, you say? By only allowing access to individual records once seventy-two years have passed since data collection. Seventy-two years. Are y’all ready to do that? If not, how am I supposed to trust you, considering vastly expanded digital memory? How is anybody else supposed to trust you either?
As I told you at the beginning of my talk, I’m a librarian and I’m totally Team Librarian. So understand how hard it is for me to say this: I am fighting tooth and nail with my own much-loved profession right now over surveillance, digital surveillance in particular. We librarians talk REAL BIG about privacy, but we are also showing our underwear in some very major ways, and I’m not happy with that, so I’m trying to do something about it. My profession, so it’s my mess, and I’m trying to clean it up. But here’s the thought this librarian wants to leave you with. Biomedicine folks, never mind web companies, aren’t even up to the privacy and ethics-of-care standards around data and memory that we librarians adopted for ourselves, lobbied to get made into law, and mostly held to in our ANALOG days. You didn’t and still don’t self-regulate and lobby to protect people the way we did. You adopted the Common Rule kicking and screaming, and some of you are trying to tear it down instead of building it up to where it needs to be to truly protect people. Some of you don’t even understand why you SHOULD self-regulate, much less be regulated. And some of you either don’t understand or don’t CARE about the damage digital research memory can do to the people who, conspicuously unlike me, are trusting you enough to let you study them.
So I’ll tell you the same thing I’m telling my fellow librarians right now about digital surveillance: Don’t. Don’t be like that. Don’t do it. If Margaret Sanger and Clarence Gamble would have loved to use your data to further their racist, classist, ableist eugenics, seriously, don’t. Don’t admire Mark Zuckerberg and Sergey Brin and Larry Page and Jorge Contreras, much less emulate them, MUCH LESS GIVE THEM DATA. Don’t do it. DO BETTER. BE BETTER.
This presentation is copyright 2018 by Dorothea Salo. It is available under a Creative Commons Attribution 4.0 International license. Dorothea Salo Information School University of Wisconsin-Madison email@example.com https://speakerdeck.com/dsalo Thank you.