After Cancellation: Reconstructing Knowledge Bases With OpenRefine

Scarlet Galvan
June 23, 2018

ALCTS Electronic Resources Interest Group, ALA Annual 2018

Transcript

  1. After Cancellation: Reconstructing Knowledge Bases With OpenRefine

    ALCTS Electronic Resources Interest Group, ALA Annual 2018
    Angela Galvan | Electronic Resources Manager | Brown University
    @panoptigoth | asgalvan.com
  2. The first problem

    • Significant changes to our ScienceDirect agreement.
    • Loss of institutional memory about serials/database relationships.
    • ARL privilege = reactive response to cuts.
    • Lost access to the Freedom Collection, a package of about 2k titles.
    • Me: “Great! I’ll uncheck the box in Serials Solutions for that database.”
  3. ARL privilege

    One time money/end of year funds ‘solve’ problems.
    Collections are static, not living.
    eResources are “on or off” without sense of underlying workflow/data.
  4. Four goals

    I like to share things, so it was important to develop a solution with these goals in mind:
    1. Should be portable to other libraries with Serials Solutions.
    2. Use tools available to most library workers.
    3. Free, with well-supported communities like OpenRefine, MarcEdit, or similar.
    4. Workflow contained within a single department/person.
  5. Bespoke, artisanal Serials Solutions workflow

    • Previous documentation noted ScienceDirect was “heavily customized, use ScienceDirect database and not the smaller Freedom Collection.”
    • This means staff were previously checking thousands of titles by hand between the knowledge base and the titles list attached to our license, and adding them manually to the ScienceDirect database in Serials Solutions.
  6. The second problem

    • Me: “I’ll deliver a Knowledge Bases and Related Tools (KBART) file from Elsevier to Serials Solutions to resolve our ScienceDirect entitlements.”
  7. KBART in Serials Solutions

    KBART uploads “on the roadmap” for Serials Solutions.
    ProQuest endorsed the KBART standard in 2010.
    Most of the KBART to ODSE (Offline Date and Status Editor) conversion is header changes…except dates.
  8. KBART in OpenRefine

    • Pick an identifier for cell.cross function or use VIB-BIT extension.
    • Dates by far the biggest issue, because Serials Solutions rejects KBART formatted dates.
    • value.toString()
    • value.replace('Jan ', '01/').replace('Feb ', '02/')….
    • Change headers to match ODSE
  9. Future prevention

    • Writing documentation as case studies, instead of “how to.”
    • Surfacing complexities of eResources work whenever possible.
    • Fully transparent work log.
    • Eventually: python script.
  10. Thank you!

    • I can’t do an OpenRefine workshop in 15 minutes but you can send me your questions at: [email protected]
    • Full steps and JSON available at asgalvan.com following #alaac18
    • Cassie Schmitt for this post on dates in OpenRefine: https://icantiemyownshoes.wordpress.com/2014/04/24/clean-up-dates-and-openrefine/
    • LITA/PLA AvramCamp for supporting my attendance.
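The KBART to ODSE steps on slide 8 (match on an identifier, reformat dates, rename headers) could look roughly like the following in the Python script slide 9 anticipates. This is a minimal sketch under stated assumptions, not the workflow from the talk: the ODSE column names on the right-hand side of HEADER_MAP are hypothetical placeholders, the MM/DD/YYYY target format is assumed from the replace() chain on slide 8, and padding a year-only or year-month date with "01" is an illustrative choice that may understate end dates. The KBART column names and the tab-delimited layout come from the KBART recommended practice.

```python
import csv
import io
import re

# KBART column -> ODSE column. The ODSE names here are HYPOTHETICAL
# placeholders; substitute the headers your ODSE export actually uses.
HEADER_MAP = {
    "publication_title": "Title",
    "print_identifier": "ISSN",
    "online_identifier": "eISSN",
    "date_first_issue_online": "DateStart",
    "date_last_issue_online": "DateEnd",
}
DATE_COLUMNS = ("date_first_issue_online", "date_last_issue_online")

# KBART dates are ISO 8601: YYYY, YYYY-MM, or YYYY-MM-DD.
KBART_DATE = re.compile(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?")


def kbart_date_to_us(value: str) -> str:
    """Convert a KBART date to MM/DD/YYYY (assumed target format).

    Missing month or day is padded with "01", which is an assumption.
    Blank cells stay blank.
    """
    value = value.strip()
    if not value:
        return ""
    match = KBART_DATE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a KBART-style date: {value!r}")
    year, month, day = match.groups()
    return f"{month or '01'}/{day or '01'}/{year}"


def kbart_to_odse(kbart_text: str) -> list[dict[str, str]]:
    """Read tab-delimited KBART text and return rows with renamed headers."""
    reader = csv.DictReader(io.StringIO(kbart_text), delimiter="\t")
    rows = []
    for record in reader:
        out = {}
        for kbart_name, odse_name in HEADER_MAP.items():
            cell = record.get(kbart_name) or ""
            if kbart_name in DATE_COLUMNS:
                cell = kbart_date_to_us(cell)
            out[odse_name] = cell
        rows.append(out)
    return rows


def entitled_rows(rows: list[dict[str, str]], licensed_issns: set[str]) -> list[dict[str, str]]:
    """Keep rows whose ISSN or eISSN appears in the licensed title list.

    A stand-in for an identifier lookup between two lists, not a
    reimplementation of OpenRefine's cell.cross.
    """
    return [
        row for row in rows
        if row["ISSN"] in licensed_issns or row["eISSN"] in licensed_issns
    ]
```

To try it, pass the contents of a KBART file to kbart_to_odse() and the set of ISSNs from the license title list to entitled_rows(). Raising on an unrecognized date, rather than passing it through, is deliberate: a rejected upload is harder to diagnose than a failed script.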