Patron Data, Patron Peril

Dorothea Salo
November 10, 2020

Given for the University of Iowa Libraries.

Transcript

  1. Patron Data, Patron Peril? Keeping ourselves and our patrons safe

    Dorothea Salo Information School University of Wisconsin-Madison
  2. Apologies, first ✦ This talk is whomped up out of

    an amalgam of ✦ course slidedecks ✦ earlier talks ✦ a forthcoming article ✦ ongoing research (datadoubles.org, and for clarity, I do not represent this project or its other investigators today) ✦ It won’t hold together as well as I like my talks to do. It certainly doesn’t have pretty slides! ✦ I’m sorry. I ask for and appreciate your patience. ✦ Silver lining: I don’t mind tangents! They can’t interrupt a flow that doesn’t exist! So ask all the questions you like whenever you like.
  3. Pivot, second ✦ The request for this talk came from

    a learner in my Information Security and Privacy course. I was originally asked to catalog privacy dangers and demonstrate threat models. ✦ I don’t want to do that right now, though. I’m raw and tired, and I know I’m not the only one. ✦ Recommended, if you want this: Morrone et al.’s https://dataprivacyproject.org/learning-modules/risk-assessment/ ✦ So, instead, here’s my plan: ✦ Foundations: why privacy in libraries? ✦ Situation report: what are today’s threats to library privacy specifically? (spoiler: there are lots!) ✦ Blameless post-mortem: how did we let this happen? ✦ Testing a heuristic: “physical-equivalent privacy.” How can we think differently so that this stops happening?
  4. Physical-equivalent privacy? ✦ Yes. Article with this title forthcoming in

    a privacy-themed issue of Serials Review. I don’t know exactly when. ✦ I can’t make it open-access until publication. Honestly, I’m chewing my fingernails about that. But as soon as it goes live, I’ll put my accepted manuscript in MINDS@UW. ✦ I also have no room to criticize the publication schedule, because I turned in my manuscript a month late! (Love you, SR editors!) ✦ But if you want a preview (beyond this talk)… ✦ … go look at the slides from my NASIG 2015 keynote, especially the slide about video surveillance, because that’s where the idea began. ✦ https://speakerdeck.com/dsalo/aint-nobodys-business-if-i-do-read-serials-with-notes
  5. Ethics codes ✦ IFLA: “… respect for personal privacy, protection

    of personal data, and confidentiality in the relationship between the user and library…” ✦ https://www.ifla.org/publications/node/10056 and it’s excellent, the best and most situationally-aware document libraries have ✦ ALA: “We protect each library user's right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.” ✦ ACRL: “The privacy of library users is and must be inviolable.”
  6. The thing about ethics codes is… ✦ … they’re largely

    deontological. Here Are Your Principles, Go Forth And Observe Them. ✦ Fine as far as it goes… but doesn’t explain why why WHY these are the principles! ✦ Much less how to operationalize them. (Which, fair: operationalization changes constantly, but ethics codes shouldn’t.) ✦ Or what to do when principles collide. Which principle wins? ✦ I mention this because in my estimation, privacy has been taking a back seat to several other principles lately. I don’t approve. ✦ Allows empty lip service.
  7. So why privacy, then? ✦ Really excellent read, highly recommended:

    ✦ Steve Witt. “The Evolution of Privacy within the American Library Association, 1906–2002.” Library Trends 65:4 (2017). ✦ My next five slides derive entirely from this piece. ✦ Turns out to be pragmatic consequentialism: without privacy, patrons got in trouble… and so did libraries. ✦ 1906: Immigrant Henry Melnek, suspected of anarchism, arrested. Chief Librarian helped with the arrest, even testified against Melnek in court, disclosing his library information habits! ✦ Russian czarist agents were also involved (weird echoes today, right?). And a newspaper called libraries “schools of anarchism” for having anarchist materials available. Criticism of libraries went on for years!
  8. ALA president Arthur Bostwick, 1911: ✦ “In [the library’s] registration

    files it has a valuable selected list of names and addresses which may be of service in various ways either as a MAILING-LIST or as a DIRECTORY. ✦ “Probably there are no two opinions regarding the impropriety of allowing the list to be used for COMMERCIAL PURPOSES along either line. ✦ (Me, today: … really? I wish there weren’t!) ✦ “The use as a directory may occasionally be legitimate and is allowable after investigation and report to someone in authority. ✦ (Me, today: really? when? what investigation? which authorities?)
  9. Arthur Bostwick, 1911: ✦ “I have known of recourse to

    library registration lists ✦ by the police, to find a fugitive from justice; ✦ by private detectives, ostensibly on the same errand; ✦ by a wife, looking for her runaway husband; ✦ by persons searching for lost relatives; ✦ and by creditors on the trail of debtors in hiding. ✦ (Take a moment. How many of these scenarios matter today? Which do you trust? Not trust?) ✦ (Definitely notice Bostwick’s “ostensibly.” Today I’d extend this to the other points too! People and organizations LIE OFTEN and CHANGE THEIR STORIES about why they want data and how they use it!)
  10. Arthur Bostwick, 1911: ✦ “One thing is certain: except in

    obedience to an order of court, it is not only unjust, but ENTIRELY INEXPEDIENT from the library’s standpoint to betray to anyone a user’s whereabouts against that user's wishes or even where there is a mere possibility of his objection. ✦ (Me, today: just whereabouts? much more is knowable!) ✦ “If it were clearly understood that such consequences might follow the holding of a library card, we should doubtless LOSE MANY READERS that we especially desire to attract and hold.” ✦ (Me, today: Is this still true? I believe it is, but I don’t have an all-encompassing answer. That’s part of why I signed on to Data Doubles.)
  11. 1939: Code of Ethics ✦ Why? Because it was the

    Great Depression, and librarian labor was suffering. ✦ Response: demonstrate that not just anybody could be a librarian! ✦ This gets deeper into questions of how professions work than I want to get, fascinating though I find labor history. ✦ But ethics codes were definitely a step toward professioning up. ✦ I mention this because protecting one another as workers is also deeply salient today! Can we use privacy as something that sets us apart? ✦ To do so, we’d have to be actively protecting it, of course! It won’t help us to trumpet promises we aren’t keeping.
  12. Privacy: not a slam dunk! ✦ As you can imagine,

    drafting the Code was not a one-and-done thing. Editing by committee! ✦ Privacy and confidentiality all but disappeared from some drafts. ✦ There was debate within the profession over privacy! Many librarians believed turning in anarchists was the right thing to do, for example. ✦ Relevant to today? Yes, absolutely. ✦ Privacy versus security ✦ Privacy versus “customer relationship management” ✦ Privacy versus assessment and analytics ✦ Privacy versus improved (?) service ✦ I’m hardcore about this: PRIVACY SHOULD WIN, hands down and without question. But not every librarian today is me!
  13. We value privacy BY CHOICE. Some of us wish to

    choose otherwise. Some of us have already chosen otherwise, with words and actions.
  14. That’s a very big statement. Let’s see some reasons I

    say that. (this will be very incomplete; see also the work of e.g. DLF Privacy and Ethics in Technology group, Digital Shred, Alison Macrina/Library Freedom Project, Yasmeen Shorish, Sarah Lamdan, Scott Young, Heather Shipman, Melissa Morrone, Kyle M. L. Jones and collaborators, and so many, many more)
  15. “Packet sniffer”: copies and saves network traffic ✦ Works on local networks, wifi

    ✦ Captures all text, images from insecure (HTTP, not HTTPS) websites
  16. Okaaaaaay… ✦ What can we do about this? ✦ Serve

    all library websites and services over HTTPS, not HTTP. ✦ Prefer wired to wifi access on in-library patron and staff machines. Secure wifi as best we can. ✦ What have we done about this? ✦ Breeding 2018: 7.9% of academic libraries and 18.3% of public libraries serve HTTP websites, not HTTPS. WE ARE BEHIND. ✦ Wifi protection in libraries: no systematic investigation I know of, so we don’t know much, but I’m not sanguine. ✦ (It doesn’t help that wifi protocols leak privacy like sieves presently. This will change, but not as quickly as I’d like.)
  17. (we would be here all day if I started in

    on Google’s milliard privacy failures) Our ol’ pal Google
  18. Okaaaaaay… ✦ What could we do about this? ✦ Default

    our in-library browsers away from Google toward DuckDuckGo or Qwant or searX. ✦ Stop using other Google services, especially YouTube and Google Analytics (use Matomo or another privacy-aware alternative instead). ✦ Dump Facebook. (At least stop advertising it!) ✦ Educate and advocate. ✦ What are we doing about this? Nothing.
  19. Okaaaaaaay… ✦ What could we do about this? ✦ Install

    tracker-blockers in browsers on in-library machines. ✦ Refuse facial recognition and other biometrics outright. ✦ Academic libraries: refuse ID-card tracking outright. ✦ Refuse the Internet of Things outright. It’s not secure! It’s not private! ✦ Educate and advocate. ✦ What are we doing about this? Nothing. ✦ (with the exception of a few — too few! — advocates and educators)
  20. Failures of data minimization* ✦ *DATA MINIMIZATION: collecting and storing only

    data absolutely required for unquestionably necessary operations.** ✦ **I do not believe assessment is unquestionably necessary. I am, however, unusual in that.
  21. Okaaaaaay… ✦ What could we do about this? ✦ Don’t

    collect data! Don’t store data! Don’t keep data! Delete data! ✦ Privacy policies with teeth, fully enforced. I dig San Francisco Public Library’s: https://sfpl.org/about/privacy-policy ✦ ALA privacy audits. This is what they’re designed for! ✦ Riding herd on ILS vendors, content vendors, etc. ✦ What are we doing about this? Not a lot! ✦ I have a friend who is a programmer for an ILS. Horror stories about libraries asking (asking!) to store e.g. driver’s-license image scans. ✦ When was the last time you deleted your proxy-server logs? ✦ The UW-Madison Libraries do not have a comprehensive privacy policy. The only unit that does is the Digital Collections Center.
  22. One LA (learning analytics) project, identified (!) data on all undergraduates: ✦ # library (e)book checkouts ✦ # and date(s) of library-computer logins

    ✦ # library databases accessed ✦ # academic journals accessed ✦ Appointments with peer tutors ✦ Chat reference transactions ✦ Interlibrary loan transactions ✦ # of classes attended with library instruction
  23. Okaaaaaaay… ✦ What could we do about this? ✦ Be

    very, very clear about what “confidential” means. I see too many librarians extending it past all sense: “patron data are still confidential because I decided they could have it!” for many values of “they.” ✦ (Several privacy interpretations of library ethics codes fall into this trap. I’d like to see that fixed. Simple heuristic, for starters: if the data’s seeing use outside the library, IT AIN’T CONFIDENTIAL!) ✦ Train our people better. All our people. It’s not enough for me to yell at my students (though I do!). Not all library employees have ALA-accredited degrees, and “not having the degree” is no excuse for this. ✦ Stop letting unethical patron-data use in research, both internal and for publication, slide by. ✦ Refuse to add patron data to campus or municipal data warehouses. ✦ What are we doing about this? Not half enough.
  24. Okaaaaaaaay… ✦ What could we do about this? ✦ Guidelines.

    License terms. Model licenses, model license language. ✦ Stop letting NISO write these! Stop letting NISO say it speaks for libraries! NISO is not a library organization; it is also underwritten by vendors. This is an inherent, structural conflict of interest. ✦ Audit vendors. They have to do accessibility VPATs; why don’t we have a privacy analogue to VPATs? ✦ Educate and advocate. ✦ What are we doing? Nothing.
  25. If you admit that privacy is an obstacle to what

    you’re doing, consider… not doing it! Oakleaf, Megan. 2018. “Library integration in institutional learning analytics.” https://library.educause.edu/-/media/files/library/2018/11/liila.pdf
  26. Me and Minnesota… ✦ (my paraphrase, obviously, and I am

    obviously biased) ✦ Me: *gives keynote at MnLA Annual 2019* ✦ Me: *brings evidence of poor privacy practices in specific libraries/consortia in Minnesota* ✦ Keynote: *goes over like a lead balloon* (they can’t all be winners) ✦ WiLS: “Hey, Dorothea, favor? Would you give this talk as a webinar for us?” Me: “Sure.” ✦ A Minnesota librarian: “Hey, WiLS, Dorothea brought evidence! It was awful!” ✦ WiLS: “Hey, Dorothea… no evidence from specific libraries/consortia in your webinar, plzkthx.” ✦ Me: “I withdraw the webinar.” ✦ Me: *posts slides to SpeakerDeck anyway, because why not*
  27. Bluntly: This ain’t it, librarians. We can’t fix what we

    won’t even discuss. We can’t do right if pointing out wrong is worse than doing wrong.
  28. This space is hard to parse. ✦ It took us

    literal, actual DECADES to figure out privacy around physical libraries and materials. ✦ We’re not even done figuring it out yet! Though we have a (curiously implicit, often) shared understanding of best practices. ✦ No surprise we haven’t figured it out for online yet. It’s a lot to get our heads around! ✦ That said, I could wish we’d put a lot more effort toward it, as a profession… but that’s water under the bridge. ✦ I have an idea about how to make it more tractable. Hold that thought; I’ll get to it.
  29. We’re not being told what we need to know. ✦

    “Dark [design] patterns” underlie a lot of privacy dangers, online and off-, in and outside libraries. ✦ Intentionally misleading/deceptive/untransparent design choices ✦ Secrecy and outright lies from Big Tech ✦ Secrecy and outright lies from Big Data pushers ✦ Secrecy and outright lies from Big Content ✦ among whom I count many library content and service vendors ✦ Secrecy and outright lies from government agencies ✦ It’s a complicated environment! Transparency would sure help!
  30. We don’t have enough experts. ✦ We do have some!

    Becky Yoose, in addition to folks I’ve previously mentioned. ✦ LDH Consulting Services: https://ldhconsultingservices.com/ ✦ I’m trying. So are Alison Macrina, Digital Shred, Melissa Morrone, ALA OIF/Erin Berman, DLF… ✦ But the intersection of privacy, technology, and libraries is hideously complicated. “Expert” is a legitimately hard place to reach! ✦ I’m not sure I’m there, and I both research and teach this stuff! ✦ I do know I can’t get somebody there in the fourteen weeks of a three-credit no-tech-prereqs course. Don’t come at me with “it’s all LIS education’s fault!” You will not like my answer.
  31. Our environments are “Big Data? READY, FIRE, AIM!” ✦ I

    feel this especially hard as an educator right now. The situation with pandemic exam proctoring is just appalling. ✦ All praise to Z Smith Reynolds Library at Wake Forest University! ✦ Real thing I heard from a real librarian once about patron-data analytics: “Finally I can speak to my administrators in language they understand!” ✦ The environments libraries exist in do not usually share or even understand library ethics! ✦ The people and services libraries rely on (IT, vendors, standards bodies) do not usually share or even understand library ethics!
  32. We’re scared. ✦ The Library Value Agenda, the CRM movement…

    they come from a place of (real, justified) fear. ✦ We are afraid of being disintermediated, erased and made invisible… and let’s be blunt: fired. ✦ We’re grasping at anything and everything to prevent that… and surveillance / data analysis is hot right now. ✦ This is one place where a clash of deontological principles turns up. ✦ Accountability is also a principle we believe in! What happens when that appears to mean compromising on privacy?
  33. We want to do right by patrons… ✦ … and

    that can be a trap. ✦ Deontological principle clash, again! ✦ (with an apologetic nod to Scott Young, who points out that “service” is not actually an ethical principle, but a practice) ✦ If we posit that surveilling patron behavior and analyzing patron data are the best/only ways to learn how to serve them… how do we decide not to do that? ✦ Now, that’s a really big “if” there — I don’t actually believe it for an instant! The evidence base for service interventions based on surveillance and Big Data is absolutely ABYSMAL. ✦ But that still leaves “if it DOES work, does that mean we should?”
  34. We’re being used. ✦ RA21 / Seamless Access / SSO

    ✦ very, very “about us without us” (RA21: zero librarians until the comment stage. Seamless Access: tokenized librarians) ✦ very, very dangerous (to more than privacy!) ✦ some very, very untrustworthy people and organizations involved ✦ the Sci-Hub wars ✦ I do not like what I see out of this SNSI thing. ✦ CRM: OrangeBoy, OCLC WISE, Gale Analytics… ✦ Open access → patron data exploitation ✦ Sam Popowich has a devastating piece on this. Recommended. ✦ https://journals.library.ualberta.ca/jcie/index.php/JCIE/article/view/29410
  35. Here’s my idea. ✦ Online privacy dangers tend to be

    out-of-sight, out-of-mind… unlike (most) physical privacy dangers. ✦ Libraries have fairly solid best practices around the privacy of using information in physical carriers. ✦ I’m not claiming perfection! I’m claiming thought and procedure. ✦ So… maybe it makes sense to figure out what the physical analogue to online patron-data capture/storage/use looks like? ✦ To make it easier to evaluate whether we’re okay with it?
  36. Or, formalized: ✦ [T]he PRIVACY of an e-resource may be

    considered PHYSICAL-EQUIVALENT only when a patron using an information-equivalent physical resource would enjoy no more privacy than the same patron using the e-resource. ✦ (The distinction is really online/offline, not physical/digital. I know this, okay? I wanted the alliteration. Nitpickers step off, please.)
  37. Warning: PEP is messy. ✦ I don’t pretend it’s ironclad,

    waterproof, or free of weird edge cases. It’s not! ✦ That’s okay, though. I’m not trying for that! ✦ In my Twitter bio: “Ethicists are scalpels. I am a buster sword.” ✦ I’m trying for a quick-and-dirty thought process (based on long-standing, time-tested practices) that librarians can use as a handy yardstick. ✦ Term of art for this, from psychology and neuroscience: “HEURISTIC.”
  38. Okaaaaaaay, so how…? ✦ Step 1: Figure out what patron

    data is captured/stored/analyzed/used/shared/sold around a given online information use. ✦ This is definitely the hard part, not least because of all the secrecy and lies around it. ✦ I suggest methods in my forthcoming article, but for today’s exercises I’ll just be giving you this up-front! ✦ Step 2: What would have to happen for this amount of data to be captured (etc.) about a patron using an analogous physical object? ✦ Step 3: Is that scenario okay? If not, the analogous online scenario probably isn’t either.
  39. Three examples! (if we have time) ✦ Insecure (non-HTTPS) OPAC

    ✦ Adobe 2014 ✦ University of Minnesota learning analytics ✦ Which I called all the way out in the aforementioned keynote. ✦ Was I right? Was I wrong? You make the call. ✦ (I’ve been wrong before. I think I’m also Data Doubles’s biggest privacy hawk; even my co-investigators don’t always agree with me!)
  40. Insecure OPAC ✦ Makes available to anyone packet-sniffing (e.g. with

    Wireshark) on the same local network: ✦ Full content of all OPAC pages browsed, including search-results pages and individual-item pages ✦ All URLs browsed (this is actually true of securely-served OPACs too! it makes me rethink OPAC item permalinks…) ✦ All search terms entered into search forms (or in URL query strings, which frankly no library web tool should be using in 2020) ✦ All items requested via holds, delivery, or save-this-for-later features ✦ Easily traceable to the device being used (including devices belonging to and used by only one patron, like a phone). ✦ Okay. Capture this amount of info about a patron browsing the card catalog and library shelves. Go!
  41. Adobe 2014 ✦ “Adobe Digital Editions:” common ebook-reading software, including

    for library ebooks. ✦ In 2014, caught sending the following user information across the Internet, sniff-vulnerable: ✦ user and device identifiers ✦ each ebook accessed ✦ length of time spent reading the ebook ✦ percentage of ebook read ✦ exact pages viewed ✦ Capture this information about a patron reading a physical book. Leak the info equally broadly. ✦ Wherever the patron does the reading! In-library or out of it!
  42. What did Adobe do? ✦ Encrypted communication between Adobe Digital

    Editions and Adobe servers. ✦ No more sniffing! ✦ That’s it. ✦ As far as we know, they’re still collecting the data. ✦ We still don’t know what they did or are doing with it. ✦ Did I mention that Adobe is a major data broker? ✦ And an Adobe partner/subsidiary (Mobilewalla) published a report geolocating and tracking George Floyd protesters?
  43. University of Minnesota ✦ Remember that list of undergraduate library-use

    data points I had up earlier? It was from… ✦ UMinnesota’s library learning analytics project. ✦ I based the list on their own public publications! No inside intel! ✦ They did not notify students. There was no opt-out, much less actual informed consent. ✦ The library-use data was combined with identified demographic, GPA, transcript, and other university data. ✦ And in C&RL, some of the published statistics are for very low-n populations, raising the chances of individual reidentification. (I’m pretty sure I could do it, and I’m not experienced at reidentification.) ✦ C&RL was told of this and chose to do nothing. NOT OKAY, C&RL.
  44. Collect this (identified!!!) data on physical library users. ✦ # library book checkouts ✦ # and date(s) of library-computer logins

    ✦ # library databases accessed ✦ # academic journals accessed ✦ Appointments with peer tutors ✦ Reference transactions ✦ Interlibrary loan transactions ✦ # of classes attended with library instruction
  45. This slidedeck copyright 2020 by Dorothea Salo. It is available

    under a Creative Commons Attribution 4.0 International license. Reach me at [email protected].
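
Slide 16’s first recommendation, serving all library websites and services over HTTPS, starts with knowing which addresses still use plain HTTP. Below is a minimal Python sketch, assuming a hand-maintained inventory of service URLs (the `library.example.org` hostnames are hypothetical). It checks only each URL’s scheme and contacts no servers, so it cannot detect mixed content or HTTPS pages that redirect back to HTTP.

```python
from urllib.parse import urlsplit


def insecure_urls(urls):
    """Return the URLs that use plain HTTP, whose traffic a packet sniffer can read.

    Checks the scheme only; no network requests are made.
    """
    return [url for url in urls if urlsplit(url.strip()).scheme.lower() == "http"]


# Hypothetical inventory of library service URLs.
services = [
    "https://catalog.library.example.org/",
    "http://proxy.library.example.org/login",
    "HTTP://databases.library.example.org/a-z",
]

for url in insecure_urls(services):
    print("Not HTTPS:", url)
```

Scheme matching is case-insensitive, so `HTTP://` is caught too.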
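
Slide 21 asks, “When was the last time you deleted your proxy-server logs?” One way to make deletion routine is a scheduled job that keeps only recent lines. Below is a minimal Python sketch, assuming a hypothetical log format in which each line begins with an ISO 8601 timestamp; real proxy-server logs use their own formats. The seven-day window is an arbitrary placeholder, not a recommendation from the talk.

```python
from datetime import datetime, timedelta, timezone

# Placeholder retention window; the right value belongs in a library's privacy policy.
RETENTION = timedelta(days=7)


def purge_old_lines(lines, now, retention=RETENTION):
    """Keep only log lines whose leading ISO 8601 timestamp falls inside the retention window.

    Assumes a hypothetical format: '<timestamp> <client> <request...>'.
    Lines whose timestamp cannot be parsed are dropped, not kept.
    """
    cutoff = now - retention
    kept = []
    for line in lines:
        stamp, _, _rest = line.partition(" ")
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # unparseable: when in doubt, delete
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)  # treat naive timestamps as UTC
        if when >= cutoff:
            kept.append(line)
    return kept


# Hypothetical proxy-log lines.
log_lines = [
    "2020-11-01T09:15:00+00:00 10.0.0.5 GET /login?url=https://journal.example.com/",
    "2020-11-09T14:02:00+00:00 10.0.0.7 GET /login?url=https://journal.example.com/",
    "not a log line",
]
now = datetime(2020, 11, 10, tzinfo=timezone.utc)
print(purge_old_lines(log_lines, now))  # only the 2020-11-09 line survives
```

Dropping unparseable lines is a deliberate design choice in keeping with “Don’t keep data!” A retention job only shortens how long data lives. It does not settle whether the data should be collected at all.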
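
The three PEP steps from slides 36 and 38 can be roughed out as a set comparison. Step 1 lists what the online service captures. Step 2 lists what would be captured about a patron using the analogous physical object. The difference between them is what Step 3 has to judge. This is a toy model, not the method from the forthcoming article. It assumes privacy can be approximated by an inventory of captured data points, and both the labels and the print-checkout inventory below are illustrative assumptions.

```python
def pep_gaps(online_captured, physical_captured):
    """Toy model of the PEP heuristic: which data points does the online scenario
    capture that the information-equivalent physical scenario would not?

    An empty set means 'physical-equivalent' under this (very simplified) model.
    """
    return set(online_captured) - set(physical_captured)


# Step 1 (illustrative labels): Adobe Digital Editions as caught in 2014, per slide 41.
ade_2014 = {"patron id", "device id", "item id",
            "reading time", "percent read", "pages viewed"}

# Step 2 (assumption): what a typical print checkout records about the same reading.
print_checkout = {"patron id", "item id"}

# Step 3 stays human: is each remaining gap acceptable in the physical scenario?
for gap in sorted(pep_gaps(ade_2014, print_checkout)):
    print("No physical equivalent:", gap)
```

The model also ignores who can see each data point. Slide 41’s “leak the info equally broadly” shows that breadth of exposure matters too, so a fuller treatment would need to account for it.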
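
Slide 40 notes that an insecure OPAC exposes search terms in URL query strings to anyone sniffing the local network. Once a plain-HTTP request line has been captured (for instance, with Wireshark), reading those terms takes very little effort, as this minimal Python sketch shows. The path `/search` and the parameters `q` and `type` are hypothetical and do not reflect any particular OPAC’s URL scheme.

```python
from urllib.parse import parse_qs, urlsplit


def params_from_request_line(request_line):
    """Pull query-string parameters out of a captured plain-HTTP request line,
    e.g. 'GET /search?q=... HTTP/1.1'. Over plain HTTP this line crosses the
    network unencrypted, so anyone capturing the traffic can do the same."""
    _method, target, _version = request_line.split(" ", 2)
    return parse_qs(urlsplit(target).query)


# Hypothetical OPAC search request, as a packet capture might record it.
captured = "GET /search?q=coping+with+debt&type=keyword HTTP/1.1"
print(params_from_request_line(captured))
# {'q': ['coping with debt'], 'type': ['keyword']}
```

`parse_qs` decodes the `+` signs back into spaces, so the patron’s search arrives as readable text.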