
Technology procurement

For LIS 644, "Digital Tools, Trends, and Debates"


Dorothea Salo

December 01, 2019


  1. Buying gadgets

  2. Be restrained about it. •The environmental load of making computing hardware is HORRENDOUS. •The load of disposing of it is even worse. •Not least because wealthy developed countries dump technotrash on poorer countries. It’s just gross. •Feel guilty when you buy gadgets, please! •“Replacement cycles” should be as long as practical. •Try to buy fixable stuff. (It’s hard! Check out iFixit, though.) •Other things to consider: •K-12/youth folks: Kids break, lose, and get bored with stuff. Buy carefully. Try before you buy.
  3. On BYOD •“Bring your own device” •Commonest with mobile phones

    •Happens also with web-based services, sometimes •It can make sense, but be aware: •Your organization is now subject to whatever security risks device owners introduce. •This sometimes means that orgs will demand the ability to know what’s on your device (privacy? what privacy?), and even remote-wipe it. •On balance, I don’t think BYOD is a good idea. •If work wants you available by mobile, work should buy it.
  4. Server-type gadgets •(for websites, internal database systems, whatever) •Back in the day: you bought a server, racked it, cooled it, installed everything on it… •It can be more effective (and environmentally friendly due to economies of scale) not to do this. Buy server capacity, not the actual server. •I’ll talk more about Software as a Service in a bit, but I do want to mention some pitfalls to watch out for: •SECURITY. SECURITY SECURITY SECURITY. I don’t even bookmark “somebody had their business database hanging in the breeze on an Amazon server” stories. There are too many! •Control. •Terms of Service policies (along several axes, but privacy is an important one)
  5. Some systems in your professional neighborhood(s)

  6. Libraries •ILS: “Integrated Library System” •Catalog (front- and back-end), circulation

    management, acquisitions, etc. •“OPAC” means “Online Public Access Catalog.” Which sounds quaint these days—what other kind of catalog would there be? But you’ll still hear people say it, so. •“Discovery layer:” search tool that bridges the catalog and article-level e-resource databases •ERM(S): E-Resource Management (System) •License handling, catalog-record handling, usage reports, database lists… •“Link resolver:” given a citation/link, get a patron to full text. •“Proxy server:” for off-campus access to e-resources •The vendor needs to know that the patron is entitled to access!
  7. Archives and museums •Finding-aid/accessioning system •There isn’t a catchy name for these that I’m aware of! Archivematica is one, though, and so is ArchivesSpace. •Helps with accessioning, finding-aid construction, patron search and browse, sometimes MARC record construction (where that’s needed) •“Collections management” software •What it sounds like. Keep track of your (physical) stuff!
  8. Digital-collections systems •“Content Management System” (CMS) •Usually for handling Stuff That Goes On The Website •But can be expanded into e.g. document management, policies/procedures management, knowledgebases, etc. •“Digital Asset Management System” (DAMS) •Grew out of advertising agencies, but becoming common in a lot of industries that produce a lot of Digital Stuff that needs to be Managed in some way •I mean, the alternative is “people leave stuff lying around on their hard drives and random servers and Dropbox and…” •Digital collections •Usually for showing digitized (sometimes born-digital) content online. •Preservation system •What it says on the box! Designed for ensuring that digital stuff persists into the future. •WARNING: CMSes, DAMS, and digital collections systems are NOT PRESERVATION SYSTEMS, okay?
  9. Records/info-governance •This one’s hard, and getting harder. Electronic records are

    a huge challenge! •Some systems I’m seeing: •Preservation systems, naturally, but with the added wrinkle that stuff must be deleted on schedule •Email management •“E-discovery:” given all the places information relevant to a given Legal Situation may be lurking, find it! •Automated(ish) e-records(ish) transfer(ish) systems •A whole lot of scrambling, ad-hoc roll-your-own “systems”
  10. Infosec systems •Network-monitoring systems •“Endpoint”-monitoring systems •endpoint: any gadget an

    end-user is using (desktop, laptop, server, tablet, phone, etc) •Firewalls, anti-malware, other defensive tools •“Security Incident and Event Management” systems (SIEM) •Gather up All The Logs from All The Endpoints (and the network) in one place
  11. None of this stuff “runs itself.” •I’m embarrassed I even

    have to say this, but I have heard it from high-level academic-library administrators who should know better. •I’ve also heard vendors say it. They shouldn’t. •No software “runs itself.” No human process involving software “runs itself.” This is not a thing! Please don’t expect it!
  12. Thanks!

  13. How info agencies (often) pick software

  14. Friendly word of advice: PICK SOFTWARE LAST. Photo: “Briana Calderon; future educator of america.” http://www.flickr.com/photos/46132085@N03/4703617843/ Arielle Calderon / CC-BY 2.0
  15. It’s not what the software does that’ll kill you. IT’S WHAT THE SOFTWARE WON’T DO. Photo: “Briana Calderon; future educator of america.” http://www.flickr.com/photos/46132085@N03/4703617843/ Arielle Calderon / CC-BY 2.0
  16. First: know what you want •Jargon for this: “requirements gathering” •What does the software need to do? (ASK PEOPLE who interact with the software about this. Don’t just assume.) Don’t forget legal issues such as user-interface accessibility. •What is the maximum load on the software likely to be? (So you can ask whether it will handle that.) •What support and training will your organization need? What is the maximum staff time you can give to installing and supporting this software? •Prioritization: which features are dealbreakers, and which are just nice-to-have? •Smart to do: ask around (quietly) for other people’s and agencies’ experiences •Off the record. We’re usually too nice to go on the record with negative experiences.
  17. Second: write RFP •“Request for Proposal.” Bureaucratic red tape document

    containing your requirements. •Lets software companies “bid” to serve your needs. •Note that the RFP process heavily disadvantages open-source software, because it assumes every software package has a vendor behind it. This is often a bad thing. •If you don’t have to go through the whole RFP rigmarole, you evaluate software yourself based on the requirements you discovered. •DO NOT evaluate or adopt software without figuring out your requirements first! Doing this is the NUMBER ONE reason we end up with bad software. •Be ESPECIALLY careful on conference exhibit floors!
  18. RFPs and innovation •Many software vendors do not innovate unless

    forced… by RFPs. •Real-world case: NCSU and Endeca • Several innovations: record deduping, LCSH drilldown, relevance ranking of results (!), faceted browsing • But Endeca was not a library vendor! Why not? NCSU couldn’t get library vendors to do what they wanted! • But after everybody oohed and aahed over the Endeca catalog, RFPs started to include Endeca-ish features, so ILS/discovery-layer vendors had to build them in order to be competitive. •Today, ILS vendors: “We’ll do linked data when our customers ask us to.” That means RFPs. •THE RFP PROCESS IS NEARLY THE ONLY TIME YOU HAVE POWER OVER VENDORS. • Use that power wisely and well, okay?
  19. Thanks!

  20. Open- and closed-source software

  21. Brief digression: open source, open standard, open access •Open source:

    refers to SOFTWARE •Open standard: refers to RULES/SPECIFICATIONS for protocols, file formats, software, etc. •“Reference implementation:” software that shows how software that complies with a particular standard should work •Open access: refers to SCHOLARLY LITERATURE •Please don’t confuse these. Thanks. •Yeah, yeah, everybody else does. Well, they’re ignorant and you’re not. •And beware of “openwashing!”
  22. Use the source, Luke! •“Source code” = the instructions that humans write for computers to follow •“Compiled code” or “binary code” = source code that has been munged to be directly understandable by the computer •Not interpretable by humans any more! •This is (usually) the only form in which proprietary software is distributed, which is why you can’t peek under its hood. •“Compiler,” “interpreter,” and “virtual machine” are all bits and pieces of the source-code-to-compiled-code transformation. •“API” = hooks a program offers to hang custom code on. •So you can do things the original developers didn’t envision.
  23. Open-source software •The source code is open! •You can (legally)

    download and install it without paying. •You can (legally) read the code. •You can (legally) change the code. •You can (legally) resell it (sometimes with caveats). •Developers “license” their code under one of a number of open-source licenses •Commonest: GNU General Public License (GPL), which has a resharing sting in its tail •Also notable: BSD license, Artistic License •http://opensource.org/ maintains a vetted list of open-source licenses. If you care. •Overhead: open-source software organizations •Often ask for dues; sometimes sell services
  24. I’m not a programmer. Why should I care about the

    source? •Do you benefit when other people hack on the software? •With open source, quite possibly yes. •If there’s a good API or plugin infrastructure, quite possibly yes. •With API-less proprietary software, rarely and only indirectly. •What happens when a software company goes out of business? Or kills a product? •Proprietary software: decay and obsolescence. •Open-source software: new companies, forks, options. •Security, maybe •Security-through-obscurity doesn’t work. No software is perfectly secure, but OSS has a good track record of fast patches. •However, Heartbleed and Shellshock are worth thinking about. “Many eyes make bugs shallow” only works if the eyes actually exist.
  25. Should I use open-source or proprietary software, Dorothea? •It depends.

    There are tradeoffs. •$$$ vs. staff time/expertise: “free as in kittens” (consider also the cost of supporting an OSS community) •Ease of use/installation vs. control •Professional support vs. ad-hoc online communities •You can’t always know what your experience will be. •Some vendor support is horrible. Some is great. Some online communities are horrible. Some are great. •Some open-source projects move fast. Some don’t. Some vendors move fast. Most don’t (most can’t!). •Only you understand your workplace’s situation. •ASK AROUND before you invest, either way.
  26. Thanks!

  27. Software development and purchasing models and why they matter

  28. Software development and purchasing models: why you care •You will

    be involved in software choice for your employer. •How your software was built affects: •how much you pay for it, up-front and ongoing (“TCO”) •which chunk of budget those costs come from •how much you can do with and to it •how much it will cost to support and train people on it •how much control you have over your data and how your data are presented to your patrons •how good it is •There is no one right answer. There are only tradeoffs, which you need to understand.
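A rough way to compare options on “TCO” is to add the up-front cost to the ongoing costs (fees plus staff time) over the years you expect to keep the software. The sketch below is only an illustration: the function name, the cost categories, and every figure in it are invented, and a real comparison would include training, migration, and exit costs too.

```python
def total_cost_of_ownership(upfront, annual_fees, annual_staff_hours,
                            hourly_staff_cost, years):
    """Up-front cost plus ongoing fees and staff time over the whole period."""
    ongoing_per_year = annual_fees + annual_staff_hours * hourly_staff_cost
    return upfront + ongoing_per_year * years


# All figures are made up for illustration only.
high_fee_low_labor = total_cost_of_ownership(
    upfront=20_000, annual_fees=15_000,
    annual_staff_hours=100, hourly_staff_cost=40, years=5)

low_fee_high_labor = total_cost_of_ownership(
    upfront=0, annual_fees=2_000,
    annual_staff_hours=500, hourly_staff_cost=40, years=5)

print(high_fee_low_labor, low_fee_high_labor)
```

With these invented numbers the two options land close together, which is the point: a low sticker price can be offset by staff time, and the costs may come out of different chunks of the budget.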
  29. Building it yourself •Some orgs deliberately and intentionally develop their

    own software. Go them! •Some orgs do it by accident! •One bright tinkerer whomps something up. •The library/archives comes to depend on it. •... and then the tinkerer leaves. Oops. •... or the computing world changes such that the whomped-up thing no longer works. Oops. •... or the library/archives misses a chance to adopt a better tool. Oops. •Tinkerers are great. But make them document. And have a plan for transitioning off or supporting the continued development of the whomped-up thing! •This is particularly common in webspace. Make SURE you know what your library’s website is built on.
  30. Off-the-shelf software •What you buy from Microsoft, Apple, Adobe… •Made

    by for-profit companies •Though small developers and shareware makers are still out there! Especially on mobile! •Certain expectations of performance, stability, polish, documentation •May vary somewhat depending on customer base •May rely on proprietary file formats for customer lock-in •Less likely these days, but it does still happen. •Pricing: usually “per seat” or “site licensed” •TRACK YOUR SOFTWARE LICENSES. ALL OF THEM. •If you are ever audited, you NEED this documentation.
  31. Vendor software •Usually springs up in niches where off-the-shelf software

    can’t sell enough “seats” to make money. •... e.g. ILS software for libraries! Also learning-management systems! •Starting to turn up in digital-preservation space. •You pay to run the software AND for a certain level of customer service. •Installation help •Employee training, user groups, conferences •Technical support (up to and including Software as a Service) •You’ll still need local tech staff, often! •Installing and customizing these things is a HASSLE. •The larger your userbase, the more localfolk have to tweak to scale. •Make sure you take localfolk into account when determining TCO. •But there will be strict limits on what you can do.
  32. Software as a Service •You don’t run anything locally. Everything

    happens on your vendor’s servers and in your web browser. •Can be a godsend for small organizations with minimal (or uncooperative) IT staff. • Many libraries and archives borrow IT from parent organization; these folks are generally not attuned to info-org-specific needs. •Can also be a jail cell. Have an exit strategy! • Make SURE you can get your data out! By testing, not by trusting a vendor’s assertion! •Web/CMS, digital-library, IR, ILS software available this way. •I don’t need to tell you to take privacy and security issues seriously, do I? Good. Didn’t think so.
  33. Hybrid: “cloud computing” •You’re still running software. •If the code

    breaks, or needs patching, or whatever, it’s your problem, not the software vendor’s or cloud vendor’s. •But instead of running software on your own server machine, you’re running it on Amazon’s or Google’s or Microsoft’s. •Similar to but slightly different from “server virtualization.” •Question is, who’s readying the server to run your software? With server virtualization, it’s you; with cloud computing, it’s the vendor. •Code often needs some rewriting to run in the cloud. •Can protect a web app from traffic spikes •Can cost more than running one’s own server, though. •Security/privacy questions, too; data is leaving your local space!
  34. Open-source software •(explored in detail in separate lecture) •Can be

    acquired/used under any of the previous purchasing models! •You can build it yourself! And quite a few libraries and archives do. •You can buy/download it off the shelf, e.g. some Linux OS distributions, and run it yourself. •Some vendors build or rely on it, e.g. Equinox for Evergreen ILS. •It can appear as Software as a Service, e.g. omeka.net. •It can be run in the cloud. •“We don’t do open source around here” is obtuse obstructionism, also probably untrue. •Unfortunately, that doesn’t stop some people...
  35. Thanks!

  36. Lipstick on a Pig: Integrated Library Systems

  37. What’s an ILS? •Integrated Library System •THE system that handles

    library operations around analog materials and patrons. •Archives: you may catalog collections into one, or use one for your circulating materials if any. •“Modules” •Acquisitions •Cataloguing •OPAC •Circulation/patron management •Also (mostly academic libraries): serials, metasearch, e-resource managers (sometimes), link resolvers, ILL... separately or bundled •Underneath: enormous relational database! •Which means ALL THE HEADACHES with MARC data.
  38. State of the market •Big consolidations in mid-2000s •Players: Endeavor

    (Voyager), Ex Libris (Alma), Sirsi/Dynix (Horizon) •Open-source ILSes •Koha: geared toward individual public libraries •Evergreen: geared toward library consortia, building code for academic libraries (e.g. serials management) •Software as a Service •WorldCat Local •LibraryThing for Libraries •The discovery-layer thing •Primo Central, EBSCO Discovery Service (EDS), Serials Solutions Summon, VuFind, Blacklight •Typical ILS replacement cycle: 5 to 10 years
  39. History: lipsticking the pig •Mid 2000s: OPACs so bad that

    libraries turned to outside vendors, homegrown solutions •NCSU: contracted with Endeca, who are a web-commerce firm •UVa: Solr/Flare/Blacklight (ha ha ha) •Scriblio, VuFind, etc. •What were they looking for? •USABILITY! •Faceted searching/browsing •Better associations among records (quasi-FRBRization) •Better correlation between user language and controlled vocabularies •Generally: making the data work harder!
  40. Catalog vs. “resource discovery” •What’s actually in an ILS/OPAC? •Print

    books, maps, sheet music •Title-level journals/magazines/newspapers (“serials”) •Maybe govdocs, theses/dissertations, collection records for stuff in special collections and/or archives •What’s not? •The rest of the information world! Including digital collections, stuff on the web, article-level access to serials, finding aids... •The information world is bigger than it was! •So is the ILS/OPAC an INVENTORY tool, or a DISCOVERY tool? •If the latter, can we compete with Google? On what basis? •And what is our inventory, really?
  41. First-cut solution: Metasearch (image courtesy Angela Pratesi and Kalsang)

  42. First-cut solution: Metasearch •How many databases are you willing to

    search? With all their different interfaces? •Metasearch to the rescue! or something. •Single search interface presented to the user. •Sends user’s query to various databases; receives, processes (deduping, relevance ranking), and presents the results. •Some databases use search protocols like Z39.50 and SRU/SRW. Others have to be screenscraped. •Lousy solution •Slow, not always good at processing results, deduping is chancy, coverage not always the best, advanced-search functions gone.
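The metasearch flow on this slide (one query fanned out to several databases, results merged, deduped, and ranked) can be sketched as follows. This is a toy under stated assumptions: the two "databases" are hypothetical in-memory stand-ins rather than real Z39.50/SRU targets, and the dedup key and relevance score are deliberately crude, which is roughly why the slide calls deduping "chancy."

```python
import re


def normalize(title):
    """Lowercase and strip punctuation so near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()


# Hypothetical stand-ins for remote databases. Real ones would be queried
# over Z39.50 or SRU, or screenscraped; these ignore the query entirely.
def search_catalog(query):
    return [{"title": "Library Trends", "year": 2008, "source": "catalog"}]


def search_articles(query):
    return [
        {"title": "Library trends.", "year": 2008, "source": "articles"},
        {"title": "Trends in Automation", "year": 2005, "source": "articles"},
    ]


def metasearch(query, sources):
    """Send the query to every source, dedupe, then rank the merged results."""
    seen, merged = set(), []
    for source in sources:
        for record in source(query):
            key = (normalize(record["title"]), record["year"])
            if key in seen:  # same normalized title and year: treat as duplicate
                continue
            seen.add(key)
            merged.append(record)

    query_words = normalize(query).split()

    def score(record):
        # Crude relevance: how many query words appear in the title.
        title_words = normalize(record["title"]).split()
        return sum(word in title_words for word in query_words)

    return sorted(merged, key=lambda r: (-score(r), -r["year"]))


results = metasearch("library trends", [search_catalog, search_articles])
for record in results:
    print(record["title"], record["year"], record["source"])
```

Note that the whole search is only as fast as its slowest source, and a dedup key this simple will both miss real duplicates and merge things that are not.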
  43. Next try: Building local index for search (image courtesy Angela Pratesi and Kalsang)
  44. Next try: Building local index for search •Tricky to do!

    • Which data sources can you legally build your index from? • Of those, how many have an API to their metadata? Or will you be stuck screenscraping HTML? • Or do you have to work with your link resolver? • Is the metadata any good? Will it play nicely with other metadata? (Hint: Often not!) • Mind you, the software to do this is open-source: Blacklight, Umlaut. The problem is harvesting decent metadata legally. •See also: Google Scholar • Essentially this is what GS does. They make special arrangements to crawl publisher sites, even behind firewalls.
  45. Now: “web-scale” discovery, “discovery layers” •OPAC layers (or ILS replacements,

    or ILS add-ins) that purport to offer one-stop shopping: OPAC, digital collections, serials, etc. •Serials Solutions: Summon •WorldCat Local •Ex Libris: Primo Central •EBSCO: EBSCO Discovery Service (EDS) •First question: is this a SEARCH TOOL or a CONTENT/METADATA DATABASE or both? •Next question: coverage? •Players VERY close-mouthed about serials coverage. •For now, this is an academic-library thing.
  46. More pieces: Link resolvers and OpenURL •You have a citation. How do you find out if the library has the article among its e-resources? •It may be in multiple databases, full text or not... •OpenURL: protocol for checking citation information against a library’s list of vendor-provided e-journals and article databases •Pack citation info into a URL or a teeny XML document •Link resolver: gizmo that takes in an OpenURL and returns a list of available copies. •SFX (Ex Libris) is the current market leader
  47. An example of OpenURL •“Can I have Library Trends from 2008?” •http://muse.jhu.edu.ezproxy.library.wisc.edu/cgi-bin/resolve_openurl.cgi?genre=&eissn=1559-0682&issn=0024-2594&date=2008 •EISSN: International Standard Serial Number, electronic •ISSN: regular ISSN •date •Lots more you can pack in! •Author, article title, journal title •Several of these are string matches, so they fail a lot. (No authority control in this environment yet!)
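To make "pack citation info into a URL" concrete, here is a minimal sketch that builds a query string like the one in the example and unpacks it again, using only Python's standard library. The resolver address is a hypothetical placeholder, and the helper names are made up for illustration. The keys (genre, eissn, issn, date) are the ones shown on the slide.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical resolver address, for illustration only.
RESOLVER_BASE = "https://resolver.example.edu/openurl"


def build_openurl(base, citation):
    """Pack citation fields into the query string of a resolver URL."""
    return base + "?" + urlencode(citation)


def parse_openurl(url):
    """Unpack the citation fields again, keeping empty ones such as genre=."""
    query = urlsplit(url).query
    return {key: values[0]
            for key, values in parse_qs(query, keep_blank_values=True).items()}


citation = {"genre": "", "eissn": "1559-0682",
            "issn": "0024-2594", "date": "2008"}
url = build_openurl(RESOLVER_BASE, citation)
print(url)
print(parse_openurl(url))
```

The identifier fields (ISSN, EISSN) survive this round trip exactly. Free-text fields such as author or article title would be percent-encoded by `urlencode`, and the resolver would still be matching them as strings on the other end.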
  48. Still more pieces: e-resource management •You just bought a huge

    bundle of journals from a publisher. How do you update holdings and URLs in your OPAC? How do you update your link resolver to know what you have? •How do you keep track of who bought what out of which fund? Or who to call when something breaks? Or usage stats? •Market leader: Serials Solutions •Service (auto-holdings-updating), not just product. •These are starting to be built into ILSes rather than sold separately.
  49. Thanks!

  50. Happening now: RDA, FRBR, BIBFRAME, linked data

  51. The future of MARC •Bluntly: it doesn’t have one. •As a file format, it’s LONG past its sell-by date. •Does not fit into the web universe at all. •Making it work with current-gen technology is a tremendous resource drain. •Decisions made so that MARC could easily output human-readable catalog cards are hurting us badly now that catalog cards aren’t what we want any more, and machines need to understand our data. •That said, libraries have a lot of data in MARC. Many archives do too! •If you become a cataloger, you will be involved in a mass data migration. Have fun! (Believe me, I feel your pain.) •Migration to what? Well, that’s the question. •The answer is probably multiple, which is scary all by itself. But RDA is part of the answer. So is linked data/RDF.
  52. What is RDA? •Resource Description and Access •the next analogue

    to AACR2 •Does not assume MARC or ISBD underneath! •Diane Hillmann, others actively working on linked-data/RDF expressions. They… sort of work. Hold that thought. •Claimed benefits •Expand the universe of what is describable •Spend less time on rules pilpul, punctuation (ISBD), and other cruft •Less emphasis on “record,” more on linkages •Ability to make our records work with/for outside world •FRBRization
  53. Right, so what’s FRBR? •Functional Requirements for Bibliographic Records •Relational

    data model for catalog records. •“Relational” as in “relational database.” •Recognizes that not all parts of a bibliographic record describe the same thing •Author: of a “work” •Page count: of an “edition” •“FRBRizing” a catalog means drawing all those relationship arrows between records, and then doing something with them for patrons. •We can do this mechanically. Sort of. Some of it.
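The work/edition split, and the "we can do this mechanically, sort of" caveat, can be sketched with two small record types. This is an illustration only: the class and field names are hypothetical, the sample records are invented, and the full FRBR model has more entities than the two named on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Edition:
    """Things that belong to one particular edition."""
    publisher: str
    year: int
    page_count: int


@dataclass
class Work:
    """Things that belong to the work, whatever the edition."""
    author: str
    title: str
    editions: list = field(default_factory=list)


def frbrize(flat_records):
    """Group flat catalog records into works by a crude author/title match."""
    works = {}
    for rec in flat_records:
        key = (rec["author"].strip().lower(), rec["title"].strip().lower())
        if key not in works:
            works[key] = Work(rec["author"], rec["title"])
        works[key].editions.append(
            Edition(rec["publisher"], rec["year"], rec["pages"]))
    return list(works.values())


# Invented sample records, for illustration only.
records = [
    {"author": "Example, Ann", "title": "A Sample Novel",
     "publisher": "First Press", "year": 1990, "pages": 312},
    {"author": "Example, Ann", "title": "A sample novel",
     "publisher": "Second Press", "year": 2005, "pages": 298},
    {"author": "Other, Bo", "title": "Another Book",
     "publisher": "First Press", "year": 2001, "pages": 150},
]

works = frbrize(records)
for work in works:
    print(work.author, "/", work.title, "/", len(work.editions), "edition(s)")
```

The grouping key here is a string match on author and title, so a variant spelling or a translated title would land in a separate "work." That is the "sort of, some of it" part.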
  54. What is BIBFRAME? •Effort by Library of Congress to replace

    MARC with a new standard data model. •NISO being dragged along, rather unwillingly. •Based on linked data/RDF •Strawman model and related tools available, currently being beaten on and fixed/changed
  55. What is linked data? •A way of reducing data to

    its smallest, most computer-digestible parts… in such a way that the data can be published, read, queried, and reused across a global network. •RDF: Resource Description Framework, the major W3C standard underlying linked data. (There are other relevant standards!) •Imagine a single world-spanning database, with data from everywhere. That’s the idea behind linked data. •Relies heavily on globally-unique identifiers… •… for people, places, things, concepts… •… often aggregated into “(linked-data) vocabularies.” •Identifiers are often (though not quite always) URLs.
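One way to picture "smallest, most computer-digestible parts" is as three-part statements (subject, predicate, object) in which as many parts as possible are identifiers. The sketch below uses plain Python tuples rather than an RDF library. The `example.org` identifiers are hypothetical; the Dublin Core and FOAF property URLs are real vocabulary terms; the `match` helper is invented for illustration and is far simpler than a real query language.

```python
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"  # hypothetical namespace for this sketch

# Each statement is (subject, predicate, object).
TRIPLES = [
    (EX + "book/1", DCTERMS + "title", "A Sample Novel"),
    (EX + "book/1", DCTERMS + "creator", EX + "person/ann-example"),
    (EX + "book/2", DCTERMS + "creator", EX + "person/ann-example"),
    (EX + "person/ann-example", FOAF + "name", "Ann Example"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None means 'anything'."""
    return [
        (s, p, o) for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# "Which things were created by this person?" asked by identifier, not by name.
books_by_ann = [
    s for s, _, _ in match(TRIPLES,
                           predicate=DCTERMS + "creator",
                           obj=EX + "person/ann-example")
]
print(books_by_ann)
```

Because the person is an identifier rather than a name string, anyone else's statements about the same identifier can be added to the same pile and queried the same way.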
  56. Lingering problem: “things not strings!” •MARC and AACR2 lived in a universe of human-readable “strings” of letters and numbers. Human language, in other words. •Computers do not understand human languages. •Yes, yes, natural-language processing. It doesn’t work for library purposes. Hush. •Computers understand “this is a thing, reliably and permanently identified with this unique identifier.” •To function in a computerized environment, library data needs to IDENTIFY ALL ITS THINGS. •And honestly, given our history of authority control, you’d think we’d be cool with that! Apparently not so much...
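A tiny demonstration of why strings fail where identifiers do not: three records name the same author three different ways, so an exact string match finds only one of them, while a match on a shared identifier finds all three. The identifier and the field names are hypothetical, made up for this sketch.

```python
# Hypothetical identifier for one person, for illustration only.
PERSON_ID = "http://example.org/person/0001"

records = [
    {"title": "Adventures of Huckleberry Finn",
     "creator_string": "Twain, Mark, 1835-1910", "creator_id": PERSON_ID},
    {"title": "The Innocents Abroad",
     "creator_string": "Mark Twain", "creator_id": PERSON_ID},
    {"title": "Life on the Mississippi",
     "creator_string": "Clemens, Samuel Langhorne", "creator_id": PERSON_ID},
]


def titles_by_string(records, name):
    """Exact string match on the name as transcribed."""
    return [r["title"] for r in records if r["creator_string"] == name]


def titles_by_id(records, identifier):
    """Match on the identifier, whatever the name string says."""
    return [r["title"] for r in records if r["creator_id"] == identifier]


print(titles_by_string(records, "Mark Twain"))
print(titles_by_id(records, PERSON_ID))
```

Authority control is the human-maintained version of the second function. The open question on the slide is getting those identifiers into the data where machines can use them.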
  57. Next problem: Who owns our records? •Is a MARC record

    copyrightable? •Does it pass copyright’s originality test? •What about a collection of records? Compilation copyright? •Best current guess: transcribed fields no, other fields... maybe, compilation... maybe •Contract law with cataloging vendors can limit what libraries do, even with their own records! •What cataloging vendors don’t want: “their” recordbase on the open Web •Disintermediation! •Though the LoC’s records are more or less freely available, so I’m not sure how much I can endorse this argument...
  58. Next problem: Who owns our records? •OCLC controls union catalog

    in the US. •But OCLC didn’t author most of the records! •Flap in early 2010s about who can use/remix those records, with or without permission. •Open-records initiatives sprang up in protest •Open Library, Michigan •National Library of Sweden broke its OCLC cataloging contract over this. •“We have to share MARC records with the libraries that depend on us. It’s a lot of our reason for existing!” •To be clear: legal restrictions on reuse and mashups damage librarianship’s presence online. We can’t afford not to settle this.
  59. Last problem: How does our data fit into the Web?

    •This is not entirely a catalog problem. •What about our digitized collections? Born-digital holdings? Finding aids? Usage data? Authority data? •What are our APIs? •To what extent do we NEED local catalogs? •Uncomfortable but necessary question! Do we need to reinvent Google? If so, how do we exchange records for stuff that isn’t in our ILS? •Are we overinvested in the ILS? •How do we facilitate appropriate reuse of our data? Do we/can we bar inappropriate reuse?
  60. Thanks!