Memory of Research (with notes)

Given for the Sage Assembly meeting, 20 April 2018.

Dorothea Salo

April 20, 2018

  1. Photo: Tim Green, “Memory,” https://www.flickr.com/photos/atoach/3075699326/ CC-BY, cropped. The MEMORY of Research

    Hi there, and thanks for inviting me. I understand that John Wilbanks was instrumental in getting me invited, so thanks, John, and thanks also to Lara for clear communication about the nature and purpose of this event. I also wish to acknowledge that I stand on the unceded territory of a great many Coast Salish peoples.
  2. Photo: anneheathen, “Team Librarian bunting,” https://www.flickr.com/photos/annethelibrarian/5972028418/ CC-BY, cropped

    For those who don’t know me already, I’m a librarian and I am totally Team Librarian. What I actually do these days is teach in the Information School at the University of Wisconsin-Madison. I train people to join Team Librarian, and also Team Web Manager and Team Database Admin and Team Digital Preservationist and Team Research Data Steward and Team Digitizer and Team Cybersecurity and, you know, lots of related teams having to do with information and data management in some form or other. That wasn’t an exhaustive list or anything.
  3. Photo: Tim Green, “Memory,” https://www.flickr.com/photos/atoach/3075699326/ CC-BY, cropped

    With my research-data steward hat on, I’ve worked with biomedical researchers, both directly and indirectly, and the impression I usually get is that all they want to do with data from a finished project is put a tombstone over it. Project’s done, data’s dead, bury it. And it may surprise you to hear a librarian and research-data steward say this, and especially say this HERE, but that’s… not actually the worst way for a biomedical researcher to be. The human-subjects research enterprise is built on human beings TRUSTING researchers and research. Memory, both individual and cultural, definitely figures into trust, because a lot of DIStrust is built on remembrance of wrong, and a lot of trust boils down to trust in the act of forgetting.
  4. Photo: Paul Hudson, “126/366 - Old memory chips,” https://www.flickr.com/photos/pahudson/7359203182/ CC-BY, cropped. The MEMORY of Research

    But of course the memory of research is changing, or we wouldn’t be here today talking about algorithms, because there wouldn’t be nearly as much data to run the algorithms ON as there actually is! And digital definitely changes WHAT we remember about research, how MUCH we remember, how well we remember it, and how much we can DO with what we remember. And my sense is that these changes have serious implications for conducting human-subjects research ethically, with appropriate regard for the human beings you are studying. So let’s talk about that.
  5. Photo: Daniel X. O’Neil, “1910 Census for 2000 Block of St. Paul,” https://www.flickr.com/photos/juggernautco/6063339595/ CC-BY, cropped

    So, as I said, I’m a librarian, and I earned my way through library school working on a grant-funded project transcribing early 20th-century census records from microfilm—speaking of memory tools—into a computer for programmable analysis. The principal investigator for this grant was a demographer, a population scientist. And the census records we were transcribing were specifically limited to the island of Puerto Rico. So, let’s see who’s paying attention: Why would a demographer be interested in the Puerto Rican population of the early-to-mid 20th century, and what does that have to do with biomedical research?
  6. Photo: Juan Cristobal Argueta, “Graffiti,” https://www.flickr.com/photos/28312366@N08/14960339281/ CC-BY

    (if nobody got it) Here’s a hint in the form of some seriously great Puerto Rican street art. Jog anybody’s memory? (if nobody STILL gets it) Wow. I’m disappointed. This is Research Ethics 101 stuff. (if somebody got it) Good memory you have there! I’m curious, how did you know about this? Puerto Rico is interesting to demographers because it was Ground Zero for the early-to-mid twentieth-century eugenics movement that originated among wealthy white people in the United States, and was directed largely at poor people, people with disabilities, people of color, and the intersections among those groups. This movement took several different forms, but what’s especially relevant to us here today talking about digital biomedicine is the widespread, decades-long ABUSE of Puerto Rican women as human subjects for research into birth control, particularly though not exclusively birth-control pills and surgical sterilization techniques.
  7. I am not expert enough to talk extensively about this ethical horror, so instead I’ll recommend a couple of excellent books about it.

    Iris López’s book Matters of Choice is matter-of-fact and utterly devastating reading. Lourdes Lugo-Ortiz, in Tropiezos con la memoria, does a ton of work picking apart the Puerto Rican press discourse of the time around eugenics and female sterilization.
  8. Photo: Daniel X. O’Neil, “1910 Census for 2000 Block of St. Paul,” https://www.flickr.com/photos/juggernautco/6063339595/ CC-BY, cropped

    So how do we still have memories of the research-ethics breaches in Puerto Rico, and what form do those memories take? Not really records of the research. If there are consent forms still extant anywhere, for example, I don’t know about them, though we’re pretty sure consent processes happened for at least some of the work, because we know a lot of those researchers LIED to a lot of women. We know about these breaches through other forms of cultural memory, such as the census records I transcribed, the interviews Iris López did, and the preserved Puerto Rican newspapers that Lourdes Lugo-Ortiz analyzed in her book. So let’s talk about what digital memories look like: digital memories of research, digital memories in life more generally, and how those intersect.
  9. Photo: Dorothea Salo, “Recover Analog and Digital Data”

    Of course I can use it; it’s my own photo! You can too; it’s licensed CC-BY 4.0 International. I come from Madison, so I am a Mad Information Scientist—here, I’ll put on my Mad Information Scientist headgear to prove it—and in this photo is my latest Mad Information Science. It’s called RADD, which stands for Recover Analog and Digital Data, because that is what it does: it quite literally saves memories. Senator Thom Tillis in the Facebook hearings, talking about a “history grabber machine”? This. This is that. It’s rescued oral histories from open-reel audiotapes, community-created video from VHS and U-Matic tapes, and—you’ll love this—it’s rescued biomedical research data from five-and-a-quarter-inch floppy diskettes. Remember those?
  10. Photo: Pete, “Irony Stops Play,” https://www.flickr.com/photos/comedynose/2705794133/ CC0, cropped

    And if you’re expecting me to go off on a rant about the coming digital information apocalypse, well, sorry to disappoint. All information is fragile unless properly cared for, no matter how it’s recorded or what it’s recorded ON. Got a stone tablet? Gimme a sledgehammer. And digital information actually has a significant preservation virtue: the verifiably perfect copy. So if the will is there and the money is there, we can save as much digital information as we decide we want to. I’m not saying digital preservation is cheap or easy; it’s NOT. I’m just saying that NO kind of memory preservation is cheap or easy. But that doesn’t make a digital information apocalypse inevitable, like some people say. I train people who prevent that. That’s my job. (LOSE THE HEADGEAR.)
  11. Photo: Rikki’s Refuge, “Skunk,” https://www.flickr.com/photos/rikkis_refuge/8295769018/ CC-BY

    Another thing about digital information vis-à-vis memory is that, like a bad smell, it lingers unpredictably, including when it’s not necessarily supposed to, or when its lingering is not actually a good thing.
  12. Photo: Robert Hoge, “Drip,” https://www.flickr.com/photos/32933171@N04/5067707348/ CC-BY, cropped

    And digital data leaks out into the world. A lot. Ask Equifax. Or, closer to home, a whole lot of research institutions, apparently. I’m at an institution that got spear-phished by the Mabna Institute; maybe yours did too? Anyway, the longer digital data lingers, the more likely it is to leak.
  13. Photo: Natalie R., “1/9/10,” https://www.flickr.com/photos/theshrubberyblog/4260641084/ CC-BY, cropped

    And because digital data packs up way smaller than paper—just LOOK at this photo, how many petabytes’ worth of hard drives could I fit into the space those books are taking up?—we as societies are remembering a lot more data about people than we used to, and grabbing more any way we can think of. Not just in biomedical research, though y’all are certainly busy little data-grabbers, but in lots of OTHER kinds of research, AND in government, not to mention law enforcement, AND in marketing, AND in education, AND online. And giant wodges of this data about people end up—one way or another—with companies called “data brokers” who are in business to aggregate and sell it. And in a truly weird parallel with print books, we never seem to want to throw any of the data away. So our memory of people—individual people—has expanded VASTLY in the last quarter-century or so. We have NEVER BEFORE in our history as a species been able to remember so much about so many different people!
  14. Photo: Neal Jennings, “So What?” https://www.flickr.com/photos/sweetone/2666516868/ CC-BY, cropped/brightened

    So what? What is the practical upshot of this vastly expanded digital memory, inside and outside biomedical research, for research ethics, and for you as researchers and health entrepreneurs?
  15. Photo: Parker Knight, “Run Like Hell 2015 800,” https://www.flickr.com/photos/rocketboom/22483293881/ CC-BY, cropped, animated

    One really big problem: your anonymized data isn’t. Period, exclamation point, anonymization is OVER. There’s just too much data available about too many people for standard anonymization techniques to work, and I’m frankly not entirely convinced by the newfangled fuzzing techniques I’m reading about. *CLICK* I can be picked right out of a crowd, any crowd, based on data about me. So can you, and so can anybody else. Not only that, you and I don’t know when some health-related or even NON-health-related service we use is going to completely roll over on our health data, making it way easier to pick us out of a crowd or use our data against us—just ask any HIV-positive person who’s used Grindr. Given that, it’s not just you in your research lab or startup I have to worry about. It’s everybody ELSE in THEIR research labs. It’s Google, and news websites, and Facebook and Cambridge Analytica, and Acxiom, and random hackers. It’s the NSA and FBI and ICE and their analogues across the world. It’s health-care providers and HMOs and insurance companies and their craptastically insecure computer systems. It’s Fitbit and Strava and Under Armour and the rest of the Internet of Craptastically Insecure Health Surveillance Things. It’s my very own employer, who would love to find a data-driven excuse to kick me off their health-insurance rolls because—well, take one look at me, we all know why. Any insurer or HMO who doesn’t try to kick out fat people like me has shareholders to answer to. So if harm could come to someone if the data your research collects about them is tied to them—and that’s exactly the assumption on which anonymization as a harm reducer is based—then y’all have a PROBLEM, because every single one of the people you’re working with is at major risk of exactly that harm. We. Let. Computers. Remember. Too. Much. Data. About. People.
  16. Story from MIT Technology Review. Fair use asserted.

    But data and algorithms coming from modern biomedical research couldn’t possibly be used to harm or limit individuals unfairly, could they? Surely there’s no potential for abuse. (Sorry, let me wipe the sarcasm off my keyboard here. If you sense that I am angry about this? Yes. I am ANGRY about this.) Like, I hope we all know this isn’t even the most exploitable example I could have picked. Privacy isn’t just a librarian buzzword; it’s an umbrella defense against known ways people get hurt, okay?
  17. The death of anonymization is just one problem created by digital memory.

    I’ll return to Puerto Rico to discuss a second problem. Anybody recognize either of these folks? (Margaret Sanger, Clarence “Proctor and” Gamble) So Sanger, she was a racist, classist, ableist eugenicist. I’m not gonna soften that! She honestly, deeply believed that there needed to be fewer people in general, especially if they were poor, and fewer people with DISABILITIES and people of COLOR in particular. So she rustled up startup money from rich white racist society friends of hers for testing birth control pills in Puerto Rico. (And if you think I’m saying something about where startup money is coming from today and what the ethical implications of THAT are, gold star! I am.) I don’t have time to tell you even half the grossly unethical shenanigans Gamble got up to, but just as one example: He and the pharma company Searle purposefully chose poor women from a housing project in Rio Piedras to test the Pill on. Some of the women ended up in a nearby hospital with severe side effects. What did our boy Clarence do about that? He shrugged and kept testing. Sanger and Gamble couldn’t see Puerto Rican women as people. Which leads me to ask y’all a couple of rhetorical questions: The Big Data datapoints that your algorithms are running on, do you see them as people? The numbers in your computer, do you remember that they’re people?
  18. Photo: Hey Paul Studios, “Uterus Art,” https://www.flickr.com/photos/hey__paul/5870794493/ CC-BY. Photo: Nate Grigg, “Not 100% Effective,” https://www.flickr.com/photos/nateone/2713580189/ CC-BY

    But look, where I am as an individual in all this… I’m not of Puerto Rican ancestry myself, as far as I know. (Do I trust 23andMe? Ahahahahaha no.) So besides having worked on that grant project, here’s the connection. Well, several connections. *CLICK* First, I have in the course of my life relied on Planned Parenthood, which Margaret Sanger founded, for reproductive health care. *CLICK* Second, I spent a few years taking birth control pills to control my fertility. And when the pill gave me high blood pressure, that was a known side effect and my doctors knew how to deal with it. *CLICK* Third, when I decided I absolutely did not want to bear children, I had what in Puerto Rico they still call “la operación” (“the operation”) myself—I had a tubal ligation, which went fine and has done its job. So I benefited directly from the unconscionable things Sanger and Gamble did in Puerto Rico. I am complicit. What’s more, the biomedical research enterprise along with money from creepy rich people MADE ME COMPLICIT in systematic dehumanization of, and direct harm to, Puerto Rican women. And I am HORRIFIED by that. I can’t just un-know this, somehow remove it from my memory; I sure hope now YOU can’t either. It is part of our collective memory now, as it should be. So how do I trust the industry that made me complicit in this? How does anybody trust you? Why should we?
  19. Photo: Erokism, “Nope in Manchester,” https://www.flickr.com/photos/10295270@N05/3858329955/ CC-BY, cropped/brightened

    Now here’s an interesting thing I read in López’s book. One of Sanger’s first attempts to arrange a test of the Pill in Puerto Rico failed. Why? Because the women she tried to recruit were educated and had some social standing. They figured out there were shenanigans going on, and they felt confident enough to tell Sanger NOPE, step off, lady. So what did Sanger and Gamble do? Well, I told you already: find uneducated women in poverty-stricken areas to test on, of course. Pieces of WORK, those two. So yeah. Me consenting to give you my health data for your algorithms amounts to letting you make me complicit in whatever shenanigans you get up to with it. AND whatever shenanigans anybody ELSE gets up to when you let them analyze it or combine it with THEIR data. AND whatever shenanigans happen when you leak or abandon or sell or bury your data and somebody else takes it over. And I don’t want to be complicit in any more racism, any more ableism, any more dehumanization than I already am!
  20. And do I have evidence that research and practice based on Big Data and algorithms have been sexist, racist, classist, ableist, and dehumanizing?

    For pity’s sake, you heard Dr. Melissa Creary; how much of it has been anything ELSE?! I don’t have time to even get started on this, but hey, this is a bit of my Pinboard, where I keep all my web bookmarks; y’all can check it out. So I? Am a biomedical research refusenik. I WILL NOT knowingly be a subject of biomedical research, I WILL NOT be part of personalized medicine, and I WILL NOT just give you my data, about my health or anything else. NOPE. Y’all can step RIGHT off; I will not be complicit in y’all’s shenanigans.
  21. Photo: Bride of Frankenstein, “Lucky,” https://www.flickr.com/photos/frankenhut/2776373150/ CC-BY, cropped

    And I’m lucky, at the moment. I’m not depending on current biomedical research or researchers just to go on living. But if any of you is thinking right now, “but my research is saving my research subjects’ lives!” let me just say, studying vulnerable people who are utterly dependent on your work does NOT exempt you from the responsibility to keep them safe from abuse, including abuse of the data you collect and analyze about them. It actually means you have MORE responsibility, because those folks don’t have the luxury I have of telling you…
  22. Photo: Erokism, “Nope in Manchester,” https://www.flickr.com/photos/10295270@N05/3858329955/ CC-BY, cropped/brightened

    … NOPE, step off. Now. I’m an educator. I teach people about data, algorithms, surveillance, reidentification, and their personal and societal implications. So, finally, here’s your second problem: if I’m successful at my job, and I sure hope I am, at some point y’all are gonna have a real hard time finding people who will sign a consent form or download your app or release control over their data. My students will react like that first group of Puerto Rican women: NOPE! Step off!
  23. Photo: From the Chronicle of Higher Education, https://www.chronicle.com/article/Chapel-Hill-Researcher-Fights/124821. Fair use asserted.

    Leaky data is a third problem y’all have. Who’s this, anybody know? This is Doctor Bonnie Yankaskas, formerly a cancer researcher at the University of North Carolina at Chapel Hill, and I’m sorry if any of y’all know her, but this memory of research is important. Yankaskas’s lab computers burbled away on the open Internet unsecured and unpatched, so they were predictably hacked. And those computers had a bunch of human-subject data for breast-cancer research, including most subjects’ social-security numbers, so most of the women were absolutely individually identifiable. Today, with all the data out there, I’m betting just about all of them would be identifiable. About a hundred and eighty thousand women were represented in that data. Oh, and I want you all to know this if you don’t already: health data is SUPER sought-after in the black market for personal data. Lots of identifiers, lots of financial and personal and family details; it’s juicy stuff for identity thieves and social engineers. When the hack was finally noticed, did everybody involved feel appropriate horror, make amends, and do better? Yeah, no. There was this ghastly public slapfight in which Yankaskas blamed U-N-C, which turned around and blamed HER, and then she blamed the systems administrator that SHE HERSELF HIRED, and so on. So NOBODY AT ALL in this fiasco TOOK ANY RESPONSIBILITY FOR THOSE WOMEN AND THEIR DATA, much less for fixing things to try to prevent the next such fiasco. Sure. That increases my trust in digital biomedicine and research memory a WHOLE lot.
  24. Photo: Martin Cooper, “IRB,” https://www.flickr.com/photos/m-a-r-t-i-n/15766212458/ CC-BY

    Let’s say one of y’all happens to find Yankaskas’s leaked data, those hundred and eighty thousand de-anonymizable women, on the open web somewhere, and you decide to use the data in your research. What’s your IRB gonna say? The data are openly available, no matter how they got to be that way, so your IRB is probably gonna say “public data, not our problem. Harm’s already been done, EXEMPT!” Y’all, that is NOT the answer that makes data less leaky, and it is NOT the answer that makes all y’all trustworthy! I mean, if I’m a REAL jerk researcher, I’m going to hack some other researcher’s data and accidentally-on-purpose leak it on the open web, because then it’s olly olly oxen free! (Possibly I should not have said that out loud.) And of course “public data” is the exact rationale researchers are using to con IRBs into letting them ferret out sensitive health data from social media, without any consent process whatever, much less INFORMED consent. So the Common Rule not having anything helpful to say about use of so-called public data, or subsequent uses of already-collected data, is definitely part of the problem here. In a time of vastly expanded digital memory, not all harm to human subjects happens at the data-collection stage!
  25. Story: Misha Angrist, “Do You Belong To You?” 2 January 2018. http://genomemag.com/do-you-belong-to-you/ Fair use asserted.

    But look, some of y’all don’t even PRETEND to care about any of this. This here is from a recent news story about genomic data getting reused in subsequent research without its donors’ knowledge or consent. “Eh,” says Dr. Jorge Contreras of the University of Utah. “Everyone benefits from medical research.” So whatever, right? Leaks, dehumanizing shenanigans, unconsented data collection and reuse, whatevs, anything goes, because everyone benefits from medical research. Well. I tell you what. I carry in my memory of research a lot of women in Puerto Rico and North Carolina who did not benefit. I wonder if Dr. Contreras knows about them? And look, this Contreras dude? He’s a big wheel! He’s on the Scientific Advisory Board of the Utah Genome Project, among plenty of other high-profile work in science data policy. This dude right here is deciding what a whole lot of people just like you should be able to do with the intensely personal, intensely risk-laden health data of people just like me. I have SO MANY PROBLEMS with that. Biomedical researchers and entrepreneurs CANNOT be trusted, not with digital memory as sharp and as extensive as it is, until somebody makes this Contreras and everybody like him write out longhand “HUMAN PRIVACY, DIGNITY, AND SAFETY ARE MORE IMPORTANT THAN YOUR RESEARCH.” A hundred times, a thousand, however many times it takes to commit that to memory.
  26. Otherwise we get this. Which is so obscene I cannot even find the words.

    We all know this isn’t even the latest Facebook-gets-sketchy-with-health-data story, right? Right? Good.
  27. With thanks to Ryan Baumann on Twitter.

    But hey, I could live in the UK, where the National Health Service illegally handed a crapton of health data to Google! No notification, no consent. But I’m here in the US, so everything’s shiny, right?
  28. Photo: Daniel X. O’Neil, “1910 Census for 2000 Block of St. Paul,” https://www.flickr.com/photos/juggernautco/6063339595/ CC-BY, cropped

    So, you might be wondering at this point, do I fill out my census forms? I get why you’re curious. The Census Bureau has a lot of sensitive data, obviously, and its history is NOT pretty. It rolled over on Japanese-Americans in World War II, as well as Arab-Americans in two-thousand-four. And now somebody wants to add a citizenship question to the twenty-twenty census? NOPE, talk about things I don’t want to be complicit in. But in the Census Bureau’s minimal defense, it solved the random-third-party-data-exploitation problem ages ago. How’s that, you say? By allowing access to individual records only after more than seventy years have passed since data collection. Seventy years. Are y’all ready to do that? If not, how am I supposed to trust you, considering vastly expanded digital memory? How is anybody else supposed to trust you either?
  29. Photo: Sean MacEntee, “privacy,” https://www.flickr.com/photos/smemon/4592915995/ CC-BY

    (SKIP THIS SLIDE IF TIME IS RUNNING SHORT.) As I told you at the beginning of my talk, I’m a librarian and I’m totally Team Librarian. So understand how hard it is for me to say this: I am fighting tooth and nail with my own much-loved profession right now over surveillance, digital surveillance in particular. We librarians talk REAL BIG about privacy, but we are also showing our underwear in some very major ways, and I’m not happy with that, so I’m trying to do something about it. It’s my profession, so it’s my mess, and I’m trying to clean it up. But here’s the thought this librarian wants to leave you with. Biomedicine folks, never mind web companies, aren’t even up to the privacy and ethics-of-care standards around data and memory that we librarians adopted for ourselves, lobbied to get made into law, and mostly held to in our ANALOG days. You didn’t and still don’t self-regulate and lobby to protect people the way we did. You adopted the Common Rule kicking and screaming, and some of you are trying to tear it down instead of building it up to where it needs to be to truly protect people. Some of you don’t even understand why you SHOULD self-regulate, much less be regulated. And some of you either don’t understand or don’t CARE about the damage digital research memory can do to the people who, conspicuously unlike me, are trusting you enough to let you study them.
  30. And the endgame there is that all of you as researchers and entrepreneurs, and all of your projects, suffer because some of you are a Margaret-Sanger-Clarence-Gamble level of untrustworthy or clueless.
  31. Photo: @markheybo, “Don’t Do It.” https://www.flickr.com/photos/cybercafe/6623373705/ CC-BY

    So I’ll tell you the same thing I’m telling my fellow librarians right now about digital surveillance: Don’t. Don’t be like that. Don’t do it. If Margaret Sanger and Clarence Gamble would have loved to use your data to further their racist, classist, ableist eugenics, seriously, don’t. Don’t admire Mark Zuckerberg and Sergey Brin and Larry Page and Jorge Contreras, much less emulate them, MUCH LESS GIVE THEM DATA. Don’t do it. DO BETTER. BE BETTER.
  32. Photo: @markheybo, “Don’t Do It.” https://www.flickr.com/photos/cybercafe/6623373705/ CC-BY, darkened

    Thank you. This presentation is copyright 2018 by Dorothea Salo. It is available under a Creative Commons Attribution 4.0 International license. Dorothea Salo, Information School, University of Wisconsin-Madison, https://speakerdeck.com/dsalo
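Slide 15 claims that "your anonymized data isn't." That claim rests on what privacy researchers call a linkage attack. You strip the names from a dataset, but the remaining quasi-identifiers (ZIP code, birth date, sex, and the like) can still be joined against another dataset that does carry names. Latanya Sweeney's well-known estimate was that ZIP code, birth date, and sex alone uniquely identify most Americans.

The Python sketch below illustrates the idea under stated assumptions. It is not anything presented in the talk. Both tables, every name and record in them, and the helper functions `key`, `k_anonymity`, and `link` are hypothetical. The sketch also computes k, the size of the smallest group of records that share the same quasi-identifiers. A value of k = 1 means at least one record stands alone and can be linked.

```python
"""Toy linkage attack: re-identifying "anonymized" records by joining them
to a public roster on quasi-identifiers. All data here is invented."""

from collections import Counter

# Hypothetical "de-identified" research records: names removed,
# but quasi-identifiers (ZIP code, birth date, sex) kept.
research_records = [
    {"zip": "53703", "birth_date": "1971-03-14", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "53703", "birth_date": "1985-11-02", "sex": "M", "diagnosis": "asthma"},
    {"zip": "53711", "birth_date": "1990-07-21", "sex": "F", "diagnosis": "HIV"},
    {"zip": "53711", "birth_date": "1990-07-21", "sex": "F", "diagnosis": "diabetes"},
]

# Hypothetical public roster (think of a voter list) that does include names.
public_roster = [
    {"name": "A. Jones", "zip": "53703", "birth_date": "1971-03-14", "sex": "F"},
    {"name": "B. Smith", "zip": "53703", "birth_date": "1985-11-02", "sex": "M"},
    {"name": "C. Lee", "zip": "53711", "birth_date": "1990-07-21", "sex": "F"},
    {"name": "D. Park", "zip": "53711", "birth_date": "1990-07-21", "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def key(record):
    """Project a record onto its tuple of quasi-identifier values."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def k_anonymity(records):
    """Return the size of the smallest group sharing identical quasi-identifiers.
    k == 1 means at least one record is unique, and therefore linkable."""
    return min(Counter(key(r) for r in records).values())


def link(records, roster):
    """Return (name, diagnosis) pairs for records matching exactly one roster entry."""
    names_by_key = {}
    for person in roster:
        names_by_key.setdefault(key(person), []).append(person["name"])
    matches = []
    for r in records:
        names = names_by_key.get(key(r), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], r["diagnosis"]))
    return matches


if __name__ == "__main__":
    print("k =", k_anonymity(research_records))
    for name, diagnosis in link(research_records, public_roster):
        print(f"{name}: {diagnosis}")
```

Running it prints `k = 1`, then `A. Jones: hypertension` and `B. Smith: asthma`. The two records that share a quasi-identifier tuple escape this particular join. They stay hidden only until an attacker brings in one more auxiliary attribute, such as a location trail from a fitness app. That is the slide's point about how much other data is out there.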