Memory of Research (with notes)

Given for the Sage Assembly meeting, 20 April 2018.

Dorothea Salo


  1. Photo: Tim Green, “Memory,”
    CC-BY, cropped
    of Research
    Hi there, and thanks for inviting me. I understand that John Wilbanks was instrumental in getting me invited, so thanks, John,
    and thanks also to Lara for clear communication about the nature and purpose of this event. I also wish to acknowledge that I
    stand on the unceded territory of a great many Coast Salish peoples.

  2. Photo: anneheathen, “Team Librarian bunting,” https://www.flickr.com/photos/annethelibrarian/5972028418/
    CC-BY, cropped
    For those who don’t know me already, I’m a librarian and I am totally Team Librarian.

    What I actually do these days is teach in the Information School at the University of Wisconsin-Madison. I train people to join
    Team Librarian, and also Team Web Manager and Team Database Admin and Team Digital Preservationist and Team Research
    Data Steward and Team Digitizer and Team Cybersecurity and, you know, lots of related teams having to do with information and
    data management in some form or other. That wasn’t an exhaustive list or anything.

  3. Photo: Tim Green, “Memory,”
    CC-BY, cropped
    With my research-data steward hat on, I’ve worked with biomedical researchers, both directly and indirectly, and the impression I
    usually get is that all they want to do with data from a finished project is put a tombstone over it. Project’s done, data’s dead,
    bury it.

    And it may surprise you to hear a librarian and research-data steward say this, and especially say this HERE, but that’s… not
    actually the worst way for a biomedical researcher to be. The human-subjects research enterprise is built on human beings
    TRUSTING researchers and research. Memory, both individual and cultural, definitely figures into trust, because a lot of DIStrust
    is built on remembrance of wrong, and a lot of trust boils down to trust in the act of forgetting.

  4. Photo: Paul Hudson, “126/366 - Old memory chips,”
    CC-BY, cropped
    of Research
    But of course the memory of research is changing, or we wouldn’t be here today talking about algorithms, because there
    wouldn’t be nearly as much data to run the algorithms ON as there actually is!

    And digital definitely changes WHAT we remember about research, how MUCH we remember, how well we remember it, and
    how much we can DO with what we remember. And my sense is that these changes have serious implications for conducting
    human-subjects research ethically, with appropriate regard for the human beings you are studying. So let’s talk about that.

  5. Photo: Daniel X. O’Neil, “1910 Census for 2000 Block of St. Paul” https://www.flickr.com/photos/juggernautco/6063339595/
    CC-BY, cropped
    So, as I said, I’m a librarian, and I earned my way through library school working on a grant-funded project transcribing early
    20th-century census records from microfilm—speaking of memory tools—into a computer for programmable analysis.

    The principal investigator for this grant was a demographer, a population scientist. And the census records we were transcribing
    were specifically limited to the island of Puerto Rico. So, let’s see who’s paying attention: Why would a demographer be
    interested in the Puerto Rican population of the early-to-mid 20th century, and what does that have to do with biomedical

  6. Photo: Juan Cristobal Argueta, “Graffiti” https://www.flickr.com/photos/28312366@N08/14960339281/ CC-BY
    (if nobody got it) Here’s a hint in the form of some seriously great Puerto Rican street art. Jog anybody’s memory?

    (if nobody STILL gets it) Wow. I’m disappointed. This is Research Ethics 101 stuff.

    (if somebody got it) Good memory you have there! I’m curious, how did you know about this?

    Puerto Rico is interesting to demographers because it was Ground Zero for the early-to-mid twentieth-century eugenics
    movement that originated among wealthy white people in the United States, and was directed largely at poor people, people
    with disabilities, people of color, and the intersections among those groups. This movement took several different forms, but
    what’s especially relevant to us here today talking about digital biomedicine is the widespread, decades-long ABUSE of Puerto
    Rican women as human subjects for research into birth control, particularly though not exclusively birth-control pills and surgical
    sterilization techniques.

  7. I am not expert enough to talk extensively about this ethical horror, so instead I’ll recommend a couple of excellent books about
    it. Iris López’s book Matters of Choice is matter-of-fact and utterly devastating reading. Lourdes Lugo-Ortiz, in Tropiezos con la
    memoria, does a ton of work picking apart the Puerto Rican press discourse of the time around eugenics and female

  8. Photo: Daniel X. O’Neil, “1910 Census for 2000 Block of St. Paul” https://www.flickr.com/photos/juggernautco/6063339595/
    CC-BY, cropped
    So how do we still have memories of the research-ethics breaches in Puerto Rico, and what form do those memories take? Not
    really records of the research. If there are consent forms still extant anywhere, for example, I don’t know about them, though
    we’re pretty sure consent processes happened for at least some of the work, because we know a lot of those researchers LIED
    to a lot of women.

    We know through other forms of cultural memory, such as the census records I transcribed, the interviews Iris López did, and the
    preserved Puerto Rican newspapers that Lourdes Lugo-Ortiz analyzed in her book. So let’s talk about what digital memories
    look like. Digital memories of research, digital memories in life more generally, and how those intersect.

  9. Photo: Dorothea Salo, “Recover Analog and Digital Data”
    Of course I can use it, it’s my own photo! You can too; it’s licensed CC-BY 4.0 International.
    I come from Madison, so I am a Mad Information Scientist—here, I’ll put on my Mad Information Scientist headgear to prove it—
    and in this photo is my latest Mad Information Science. It’s called RADD, which stands for Recover Analog and Digital Data,
    because that is what it does: it quite literally saves memories. Senator Thom Tillis in the Facebook hearings, talking about a
    “history grabber machine?” This. This is that.

    It’s rescued oral histories from open-reel audiotapes, community-created video from VHS and U-Matic tapes, and—you’ll love
    this—it’s rescued biomedical research data from five-and-a-quarter-inch floppy diskettes. Remember those?

  10. Photo: Pete, “Irony Stops Play,”
    CC0, cropped
    And if you’re expecting me to go off on a rant about the coming digital information apocalypse, well, sorry to disappoint. All
    information is fragile unless properly cared for, no matter how it’s recorded or what it’s recorded ON. Got a stone tablet? Gimme
    a sledgehammer. And digital information actually has a significant preservation virtue: the verifiably perfect copy. So if the will is
    there and the money is there, we can save as much digital information as we decide we want to.
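
    The “verifiably perfect copy” can be made concrete with a fixity check. The sketch below is illustrative only and is not how
    RADD or any particular archive works: it assumes SHA-256 as the checksum, and the helper names (`sha256_of`,
    `copy_and_verify`, `demo`) and the file names are hypothetical.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination):
    """Copy a file, then confirm both files have the same checksum."""
    shutil.copyfile(source, destination)
    return sha256_of(source) == sha256_of(destination)


def demo():
    """Copy a made-up file, verify it, then damage the copy by one byte."""
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "original.bin")
        duplicate = os.path.join(workdir, "duplicate.bin")
        with open(original, "wb") as handle:
            handle.write(b"rescued research data" * 1000)

        intact = copy_and_verify(original, duplicate)

        # Overwrite a single byte in the copy to simulate silent damage.
        with open(duplicate, "r+b") as handle:
            handle.seek(10)
            handle.write(b"X")
        still_matches = sha256_of(original) == sha256_of(duplicate)

        return intact, still_matches


if __name__ == "__main__":
    print(demo())
```

    A matching checksum shows the copy is bit-for-bit identical to the original; once a single byte changes, the checksums no
    longer match, so damage can be detected rather than silently carried forward.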

    Not to say digital preservation is cheap or easy. It’s NOT. I’m just saying that NO kind of memory preservation is cheap or easy. But
    that doesn’t make a digital information apocalypse inevitable, like some people say. I train people who prevent that. That’s my


  11. Photo: Rikki’s Refuge, “Skunk”
    Another thing about digital information vis-a-vis memory is that like a bad smell, it lingers unpredictably, including when it’s not
    necessarily supposed to, or when its lingering is not actually a good thing.

  12. Photo: Robert Hoge, “Drip”
    CC-BY, cropped
    And digital data leaks out into the world.

    A lot.

    Ask Equifax. Or, closer to home, a whole lot of research institutions, apparently. I’m at an institution that got spear-phished by
    the Mabna Institute; maybe you are too? Anyway, the longer digital data lingers, the more likely it is to leak.

  13. Photo: Natalie R., “1/9/10,”
    CC-BY, cropped
    And because digital data packs up way smaller than paper—just LOOK at this photo, how many petabytes’ worth of hard drives
    could I fit into the space those books are taking up?—we as societies are remembering a lot more data about people than we
    used to, and grabbing more any way we can think of.

    Not just in biomedical research, though y’all are certainly busy little data-grabbers, but in lots of OTHER kinds of research, AND
    in government, not to say law enforcement, AND in marketing, AND in education, AND online. And giant wodges of this data
    about people end up—one way or another—with companies called “data brokers” who are in business to aggregate and sell it.

    And in a truly weird parallel with print books, we never seem to want to throw any of the data away. So our memory of people—
    individual people—has expanded VASTLY in the last quarter-century or so. We have NEVER BEFORE in our history as a species
    been able to remember so much about so many different people!

  14. Photo: Neal Jennings, “So What?”
    CC-BY, cropped/brightened
    So what? What is the practical upshot of this vastly expanded digital memory, inside and outside biomedical research, for
    research ethics, and for you as researchers and health entrepreneurs?

  15. Photo: Parker Knight, “Run Like Hell 2015 800”
    CC-BY, cropped, animated
    One really big problem: your anonymized data isn’t. Period, exclamation point, anonymization is OVER. There’s just too much data available
    about too many people for standard anonymization techniques to work, and I’m frankly not entirely convinced by the newfangled fuzzing
    techniques I’m reading about.
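
    Why standard anonymization breaks down can be shown with a toy count. The sketch below uses made-up records and
    hypothetical field names. It drops names but keeps a few ordinary attributes (often called quasi-identifiers) and measures
    what share of records are the only ones with their combination of values. Anyone holding a second dataset with the same
    attributes plus names could link those records back to individual people.

```python
from collections import Counter

# Made-up "anonymized" records: no names, but ordinary attributes remain.
RECORDS = [
    {"zip": "53703", "birth_year": 1972, "sex": "F", "diagnosis": "A"},
    {"zip": "53703", "birth_year": 1972, "sex": "F", "diagnosis": "B"},
    {"zip": "53703", "birth_year": 1985, "sex": "M", "diagnosis": "C"},
    {"zip": "53711", "birth_year": 1990, "sex": "F", "diagnosis": "A"},
]


def group_sizes(records, quasi_identifiers):
    """For each record, count how many records share its attribute values."""
    keys = [tuple(record[field] for field in quasi_identifiers) for record in records]
    counts = Counter(keys)
    return [counts[key] for key in keys]


def unique_fraction(records, quasi_identifiers):
    """Fraction of records that are alone in their group (size 1)."""
    sizes = group_sizes(records, quasi_identifiers)
    return sum(1 for size in sizes if size == 1) / len(sizes)


if __name__ == "__main__":
    print(unique_fraction(RECORDS, ["zip"]))
    print(unique_fraction(RECORDS, ["zip", "birth_year", "sex"]))
```

    In this toy data, one record in four is unique on ZIP code alone, and half are unique once birth year and sex are added.
    Each extra attribute that gets remembered about a person shrinks the crowd that person can hide in.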

    *CLICK* I can be picked right out of a crowd, any crowd, based on data about me. So can you, and so can anybody else. Not just that, you
    and I don’t know when some health-related or even NON-health-related service we use is going to completely roll over on our health data,
    making it way easier to pick us out of a crowd or use our data against us—just ask any HIV-positive person who’s used Grindr.

    Given that, it’s not just you in your research lab or startup I have to worry about. It’s everybody ELSE in THEIR research labs. It’s Google, and
    news websites, and Facebook and Cambridge Analytica, and Acxiom, and random hackers. It’s the NSA and FBI and ICE and their analogues
    across the world. It’s health-care providers and HMOs and insurance companies and their craptastically insecure computer systems. It’s Fitbit
    and Strava and Under Armour and the rest of the Internet of Craptastically Insecure Health Surveillance Things. It’s my very own employer,
    who would love to find a data-driven excuse to kick me off their health-insurance rolls because—well, take one look at me, we all know why.
    Any insurer or HMO who doesn’t try to kick out fat people like me has shareholders to answer to.

    So if harm could come to someone if the data your research collects about them is tied to them—and that’s exactly the assumption on which
    anonymization as a harm reducer is based—then y’all have a PROBLEM, because every single one of the people you’re working with is at
    major risk of exactly that harm. We. Let. Computers. Remember. Too. Much. Data. About. People.

  16. Story from MIT Technology Review.
    Fair use asserted.
    But data and algorithms coming from modern biomedical research couldn’t possibly be used to harm or limit individuals unfairly,
    could they? Surely there’s no potential for abuse.

    (Sorry, let me wipe the sarcasm off my keyboard here. If you sense that I am angry about this? Yes. I am ANGRY about this.)

    Like, I hope we all know this isn’t even the most exploitable example I could have picked. Privacy isn’t just a librarian buzzword;
    it’s an umbrella defense against known ways people get hurt, okay?

  17. The death of anonymization is just one problem created by digital memory. I’ll return to Puerto Rico to discuss a second problem. Anybody
    recognize either of these folks? (Margaret Sanger, Clarence “Procter and” Gamble)

    So Sanger, she was a racist, classist, ableist eugenicist. I’m not gonna soften that! She honestly, deeply believed that there needed to be
    fewer people in general, especially if they were poor, and fewer people with DISABILITIES and people of COLOR in particular. So she rustled
    up startup money from rich white racist society friends of hers for testing birth control pills in Puerto Rico.

    (And if you think I’m saying something about where startup money is coming from today and what the ethical implications of THAT are, gold
    star! I am.)

    I don’t have time to tell you even half the grossly unethical shenanigans Gamble got up to, but just as one example: He and the pharma
    company Searle purposefully chose poor women from a housing project in Rio Piedras to test the Pill on. Some of the women ended up in a
    nearby hospital with severe side effects. What did our boy Clarence do about that? He shrugged and kept testing.

    Sanger and Gamble couldn’t see Puerto Rican women as people. Which leads me to ask y’all a couple of rhetorical questions: The Big Data
    datapoints that your algorithms are running on, do you see them as people? The numbers in your computer, do you remember that they’re

  18. Photo: Hey Paul Studios, “Uterus Art,”
    https://www.flickr.com/photos/hey__paul/5870794493/ CC-BY
    Photo: Nate Grigg, “Not 100% Effective,”
    https://www.flickr.com/photos/nateone/2713580189/ CC-BY
    But look, where I am as an individual in all this… I’m not of Puerto Rican ancestry myself, as far as I know. (Do I trust 23andMe?
    Ahahahahaha no.) So besides having worked on that grant project, here’s the connection. Well, several connections.

    *CLICK* First, I have in the course of my life relied on Planned Parenthood, which Margaret Sanger founded, for reproductive health care.
    *CLICK* Second, I spent a few years taking birth control pills to control my fertility. And when the pill gave me high blood pressure, that was a
    known side effect and my doctors knew how to deal with it. *CLICK* Third, when I decided I absolutely did not want to bear children, I had
    what in Puerto Rico they still call “la operación” myself—I had a tubal ligation, which went fine and has done its job.

    So I benefited directly from the unconscionable things Sanger and Gamble did in Puerto Rico. I am complicit. What’s more, the biomedical
    research enterprise along with money from creepy rich people MADE ME COMPLICIT in systematic dehumanization of, and direct harm to,
    Puerto Rican women. And I am HORRIFIED by that.

    I can’t just un-know this, somehow remove it from my memory; I sure hope now YOU can’t either. It is part of our collective memory now, as it
    should be. So how do I trust the industry that made me complicit in this? How does anybody trust you? Why should we?

  19. Photo: Erokism, “Nope in Manchester,”
    CC-BY, cropped/brightened
    Now here’s an interesting thing I read in López’s book. One of Sanger’s first attempts to arrange a test of the Pill in Puerto Rico
    failed. Why? Because the women she tried to recruit were educated, had some social standing. They figured out there were
    shenanigans going on, and they felt confident enough to tell Sanger NOPE, step off, lady.

    So what did Sanger and Gamble do? Well, I told you already: find uneducated women in poverty-stricken areas to test on, of
    course. Pieces of WORK, those two.

    So yeah. Me consenting to give you my health data for your algorithms amounts to letting you make me complicit in whatever
    shenanigans you get up to with it. AND whatever shenanigans anybody ELSE gets up to when you let them analyze it or
    combine it with THEIR data. AND whatever shenanigans happen when you leak or abandon or sell or bury your data and
    somebody else takes it over. And I don’t want to be complicit in any more racism, any more ableism, any more dehumanization
    than I already am!

  20. And do I have evidence that research and practice based on Big Data and algorithms have been sexist, racist, classist, ableist,
    and dehumanizing? For pity’s sake, you heard Dr. Melissa Creary. How much of it has been anything ELSE?! I don’t have time to
    even get started on this, but hey, this is a bit of my Pinboard where I keep all my web bookmarks, y’all can check it out.

    So I? Am a biomedical research refusenik. I WILL NOT knowingly be a subject of biomedical research, I WILL NOT be part of
    personalized medicine, and I WILL NOT just give you my data, about my health or anything else. NOPE. Y’all can step RIGHT off,
    I will not be complicit in y’all’s shenanigans.

  21. Photo: Bride of Frankenstein, “Lucky”
    CC-BY, cropped
    And I’m lucky, at the moment. I’m not depending on current biomedical research or researchers just to go on living. But if any of
    you is thinking right now, “but my research is saving my research subjects’ lives!” let me just say, studying vulnerable people
    who are utterly dependent on your work does NOT exempt you from the responsibility to keep them safe from abuse, including
    abuse of the data you collect and analyze about them.

    It actually means you have MORE responsibility, because those folks don’t have the luxury I have of telling you…

  22. Photo: Erokism, “Nope in Manchester,”
    CC-BY, cropped/brightened
    … NOPE, step off.

    Now. I’m an educator. I teach people about data, algorithms, surveillance, reidentification, and their personal and societal
    implications. So, finally, here’s your second problem: if I’m successful at my job, and I sure hope I am, at some point y’all are
    gonna have a real hard time finding people who will sign a consent form or download your app or release control over their data.
    My students will react like that first group of Puerto Rican women: NOPE! Step off!

  23. Photo: From the Chronicle of Higher Education, https://www.chronicle.com/article/Chapel-Hill-Researcher-Fights/124821.
    Fair use asserted.
    Leaky data is a third problem y’all have. Who’s this, anybody know? This is Doctor Bonnie Yankaskas, formerly a cancer researcher at North
    Carolina, and I’m sorry if any of y’all know her, but this memory of research is important. Yankaskas’s lab computers burbled away on the
    open Internet unsecured and unpatched, so they were predictably hacked. And those computers had a bunch of human-subject data for
    breast-cancer research, including most subjects’ social-security numbers, so most of the women were absolutely individually identifiable.
    Today, with all the data out there, I’m betting just about all of them would be identifiable. About a hundred-eighty thousand women
    represented in that data.

    Oh, and I want you all to know this if you don’t already: health data is SUPER sought-after in the black market for personal data. Lots of
    identifiers, lots of financial and personal and family details, it’s juicy stuff for identity thieves and social engineers.

    When the hack was finally noticed, did everybody involved feel appropriate horror, make amends, and do better? Yeah, no. There was this
    ghastly public slapfight in which Yankaskas blamed U-N-C, which turned around and blamed HER, and then she blamed the systems
    administrator that SHE HERSELF HIRED, and so on. So NOBODY AT ALL in this fiasco TOOK ANY RESPONSIBILITY FOR THOSE WOMEN
    AND THEIR DATA, much less for fixing things to try to prevent the next such fiasco.

    Sure. That increases my trust in digital biomedicine and research memory a WHOLE lot.

  24. Photo: Martin Cooper, “IRB,” https://www.flickr.com/photos/m-a-r-t-i-n/15766212458/, CC-BY
    Let’s say one of y’all happens to find Yankaskas’s leaked data, those hundred-eighty thousand de-anonymizable women, on the
    open web somewhere and you decide to use the data in your research. What’s your IRB gonna say? The data are openly
    available, no matter how they got to be that way, so your IRB is probably gonna say “public data, not our problem. Harm’s
    already been done, EXEMPT!” Y’all, that is NOT the answer that makes data less leaky, and it is NOT the answer that makes all
    y’all trustworthy!

    I mean, if I’m a REAL jerk researcher, I’m going to hack some other researcher’s data and accidentally-on-purpose leak it on the
    open web, because then it’s olly olly oxen free! (Possibly I should not have said that out loud.) And of course “public data” is the
    exact rationale researchers are using to con IRBs into letting them ferret out sensitive health data from social media. Without any
    consent process whatever, much less INFORMED consent.

    So the Common Rule not having anything helpful to say about use of so-called public data, or subsequent uses of already-
    collected data, is definitely part of the problem here. In a time of vastly expanded digital memory, not all harm to human subjects
    happens at the data-collection stage!

  25. Story: Misha Angrist, “Do You Belong To You?” 2 January 2018. http://genomemag.com/do-you-belong-to-you/
    Fair use asserted.
    But look, some of y’all don’t even PRETEND to care about any of this. This here is from a recent news story about genomic data getting
    reused in subsequent research without its donors’ knowledge or consent. “Eh,” says Dr. Jorge Contreras of the University of Utah. “Everyone
    benefits from medical research.” So whatever, right? Leaks, dehumanizing shenanigans, unconsented data collection and reuse, whatevs,
    anything goes, because everyone benefits from medical research.

    Well. I tell you what. I carry in my memory of research a lot of women in Puerto Rico and North Carolina who did not benefit. I wonder if Dr.
    Contreras knows about them?

    And look, this Contreras dude? He’s a big wheel! He’s on the Scientific Advisory Board of the Utah Genome Project, among plenty of other
    high-profile work in science data policy. This dude right here is deciding what a whole lot of people just like you should be able to do with the
    intensely personal, intensely risk-laden health data of people just like me. I have SO MANY PROBLEMS with that.

    Biomedical researchers and entrepreneurs CANNOT be trusted, not with digital memory as sharp and as extensive as it is, until somebody
    makes this Contreras and everybody like him write out longhand “HUMAN PRIVACY, DIGNITY, AND SAFETY ARE MORE IMPORTANT THAN
    YOUR RESEARCH.” A hundred times, a thousand, however many times it takes to commit that to memory.

  26. Otherwise we get this. Which is so obscene I cannot even find the words.

    We all know this isn’t even the latest Facebook-gets-sketchy-with-health-data story, right? Right? Good.

  27. With thanks to Ryan Baumann on Twitter.
    But hey, I could live in the UK where the National Health Service illegally handed a crapton of health data to Google! No
    notification, no consent. But I’m here in the US, so everything’s shiny, right?

  28. Photo: Daniel X. O’Neil, “1910 Census for 2000 Block of St. Paul” https://www.flickr.com/photos/juggernautco/6063339595/
    CC-BY, cropped
    So, you might be wondering at this point, do I fill out my census forms? I get why you’re curious. The Census Bureau has a lot of
    sensitive data, obviously, and its history is NOT pretty. It rolled over on Japanese-Americans in World War II, as well as Muslim-
    Americans in two-thousand-four. And now somebody wants to add a citizenship question to the twenty-twenty census? NOPE,
    talk about things I don’t want to be complicit in.

    But. In the Census Bureau’s minimal defense, it solved the random-third-party-data-exploitation problem ages ago. How’s that,
    you say? By only allowing access to individual records when over seventy years have passed since data collection.

    Seventy years. Are y’all ready to do that? If not, how am I supposed to trust you, considering vastly expanded digital memory?
    How is anybody else supposed to trust you either?

  29. Photo: Sean MacEntee, “privacy,” https://www.flickr.com/photos/smemon/4592915995/
    As I told you at the beginning of my talk, I’m a librarian and I’m totally Team Librarian. So understand how hard it is for me to say
    this: I am fighting tooth and nail with my own much-loved profession right now over surveillance, digital surveillance in particular.
    We librarians talk REAL BIG about privacy, but we are also showing our underwear in some very major ways, and I’m not happy
    with that so I’m trying to do something about it. My profession, so it’s my mess, and I’m trying to clean it up.

    But here’s the thought this librarian wants to leave you with. Biomedicine folks, never mind web companies, aren’t even up to the
    privacy and ethics-of-care standards around data and memory that we librarians adopted for ourselves, lobbied to get made into
    law, and mostly held to in our ANALOG days. You didn’t and still don’t self-regulate and lobby to protect people the way we did.
    You adopted the Common Rule kicking and screaming, and some of you are trying to tear it down instead of building it up to
    where it needs to be to truly protect people. Some of you don’t even understand why you SHOULD self-regulate, much less be
    regulated. And some of you either don’t understand or don’t CARE about the damage digital research memory can do to the
    people who, conspicuously unlike me, are trusting you enough to let you study them.

  30. And the endgame there is that all of you as researchers and entrepreneurs, and all of your projects, suffer because some of you
    are a Margaret-Sanger-Clarence-Gamble level of untrustworthy or clueless.

  31. Photo: @markheybo, “Don’t Do It.”
    https://www.flickr.com/photos/cybercafe/6623373705/ CC-BY
    So I’ll tell you the same thing I’m telling my fellow librarians right now about digital surveillance: Don’t. Don’t be like that. Don’t
    do it. If Margaret Sanger and Clarence Gamble would have loved to use your data to further their racist, classist, ableist eugenics,
    seriously, don’t. Don’t admire Mark Zuckerberg and Sergey Brin and Larry Page and Jorge Contreras, much less emulate them,
    MUCH LESS GIVE THEM DATA. Don’t do it.


  32. Photo: @markheybo, “Don’t Do It.”
    https://www.flickr.com/photos/cybercafe/6623373705/ CC-BY, darkened
    Thank you.
    This presentation is copyright 2018 by Dorothea Salo.
    It is available under a Creative Commons Attribution 4.0
    International license.
    Dorothea Salo
    Information School
    University of Wisconsin-Madison