Finnish National Bibliography Fennica as Linked Data

This presentation originally introduced the Finnish national bibliography Fennica as Linked Data. It was given at the SWIB17 conference in Hamburg, Germany, on 6 December 2017 (which also happened to be the 100th anniversary of Finland's independence).

Video recording: https://www.youtube.com/watch?v=sLMxALIQKmQ

Abstract:

The National Library of Finland is making our national bibliography Fennica available as Linked Open Data. We are converting the data from 1 million MARC bibliographic records first to BIBFRAME 2.0 and then further to the Schema.org data model. In the process, we are clustering works extracted from the bibliographic records, reconciling entities against internal and external authorities, cleaning up many aspects of the data and linking it to further resources. The Linked Data set is CC0 licensed and served using HDT technology.

The publishing of Linked Data supports other aspects of metadata development at the National Library. For some aspects of the Linked Data, we are relying on the RDA conversion of MARC records that was completed in early 2016. The work clustering methods, and their limitations, inform the discussions about potentially establishing a work authority, which is a prerequisite for real RDA cataloguing.

This presentation will discuss lessons learned during the publishing process, including the selection and design of the data model, the construction of the conversion pipeline using pre-existing tools, the methods used for work clustering, reconciliation and linking as well as the infrastructure for publishing the data and keeping it up to date.
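
As a rough illustration of the work clustering step mentioned in the abstract, the sketch below groups records that share a normalized key. The key format (normalized title plus author), the record field names and the helper functions `normalize`, `work_key` and `cluster_works` are assumptions made for this example only. The actual pipeline generates and merges work keys with SPARQL, so this is a simplified stand-in and not the project's implementation.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip diacritics and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", no_marks.lower())
    return " ".join(cleaned.split())


def work_key(title, author):
    """Build a hypothetical work key from a title and an author name."""
    return f"{normalize(title)}/{normalize(author)}"


def cluster_works(records):
    """Group record ids by work key; records with equal keys form one work."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[work_key(rec["title"], rec["author"])].append(rec["id"])
    return dict(clusters)


# Made-up sample records (hypothetical ids, not taken from Fennica data).
records = [
    {"id": "rec1", "title": "Seitsemän veljestä", "author": "Kivi, Aleksis"},
    {"id": "rec2", "title": "Seitsemän veljestä.", "author": "Kivi, Aleksis"},
    {"id": "rec3", "title": "Kalevala", "author": "Lönnrot, Elias"},
]
clusters = cluster_works(records)
```

Here `rec1` and `rec2` end up in the same cluster because their titles differ only in trailing punctuation. A key this simple also shows why clustering has limits: any inconsistency in the title or author fields that normalization does not remove splits one work into several clusters.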

Google Slides: https://tinyurl.com/fennica-ld

Osma Suominen

December 06, 2017

Transcript

  1. Why?
     1. Making our data more visible, also internationally
     2. Improving the quality and interoperability of our metadata
     3. Building competency for the future
     4. Why not? :)
  2. [Diagram: bib records and auth records; entities: Work, Instance, Person, Subject, Place, Organization; counts: 1M bib records, 125k person names, 40k corporate names, 35k subjects (YSA)]
  3. [Diagram: bib records and auth records; entities: Work, Instance, Person, Subject; counts: 1M bib records, 125k person names, 40k corporate names, 35k subjects (YSA). Image credit: MaryMaking blog]
  4. “From MARC silos to Linked Data silos”

     As seen in: SWIB16 talk, DCMI webinar, o-bib journal article
  5. [Schema.org] with separate Works and Instances like BIBFRAME, as enabled by the bibliographic extensions, because it allows us to describe our resources from a common-sense, Web user perspective (and we get a metadata haircut for free!)

     Special thanks to Richard Wallis for help with applying schema.org!
  6. [Pipeline diagram]

     Data stages: MARC / Aleph seq; MARCXML; BIBFRAME RDF; Schema.org RDF; Linked to external URIs; Work keys; With deduplicated works; Agent keys; With deduplicated agents; RDF store

     Processing steps:
     - Convert & clean using Catmandu
     - Convert using marc2bibframe2
     - Convert to Schema.org using SPARQL CONSTRUCT
     - Link against controlled vocabularies using SPARQL (YSA subjects; YSO subjects; Corporate names; RDA Media, Content, Carrier)
     - Generate work keys for merging using SPARQL
     - Merge works using SPARQL
     - Merge agents (person, org) using SPARQL

     https://github.com/NatLibFi/bib-rdf-pipeline
  7. Publishing as Linked Open Data for human & machine access

     [Architecture diagram labels: MARC records; RDF N-Triples; RDF store; HDT; Data dump downloads; Jena Fuseki; SPARQL; Linked Data Fragments server; LDF; bib-lod-ui Flask app; HTML+JSON-LD; OpenSearch API; Linked Data; RDF]
  8. [Same architecture diagram labels as slide 7, without the slide title]
  9. Work extraction
     1. Extract works from MARC records
     2. Create a work authority
     3. Use and maintain it for cataloging
     4. ???
     5. Profit!

     Not so easy in practice. Lots of problems in the metadata that cause inconsistencies in the output.
  10. Linking

      [Diagram: entities Work, Instance, Person, Subject, Place, Organization; link targets: LCSH, Finnish Place Name Registry, Wikidata, WorldCat, other national libraries, WorldCat Works, LIBRIS XL, ISNI, VIAF]
  11. 1. Findable: URIs as identifiers, with rich metadata
      2. Accessible: URI lookup, SPARQL and LDF endpoints, downloadable data dumps
      3. Interoperable: RDF representation using Schema.org and a little bit of RDAu
      4. Reusable: CC0 license. Entities that are also referenced from other metadata
  12. What next?
      1. Enriching and cleaning the RDF data, e.g. using subclasses like Map
      2. More links to other Linked Data sets
      3. Expanding to new data sets: Viola discography, Arto article database
  13. The Finnish Declaration of Independence was adopted by the Parliament of Finland on 6 December 1917

      My birthday present
  14. Thank you! Questions?

      [email protected] - @OsmaSuominen

      http://data.nationallibrary.fi - @NatLibFiData

      Code: https://github.com/NatLibFi/bib-rdf-pipeline https://github.com/NatLibFi/bib-lod-ui

      These slides: http://tinyurl.com/fennica-ld
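
Slide 5 describes modelling with separate Works and Instances in Schema.org. A minimal sketch of what such a pair could look like as JSON-LD follows, using only the Python standard library. The base URI, the identifiers `W1` and `I1`, the function name and the choice of `CreativeWork` and `Book` as types are hypothetical and not taken from the published Fennica data. The `workExample` and `exampleOfWork` properties are the Schema.org terms for linking a work to its editions and back.

```python
import json

# Hypothetical base URI for illustration only; not the real Fennica namespace.
BASE = "http://example.org/bib/"


def work_and_instance(work_id, instance_id, title, author_name, language):
    """Return a JSON-LD document with a linked Work and Instance pair."""
    work_uri = BASE + work_id
    instance_uri = BASE + instance_id
    return {
        "@context": "http://schema.org/",
        "@graph": [
            {
                # The abstract work: carries the author and points to editions.
                "@id": work_uri,
                "@type": "CreativeWork",
                "name": title,
                "author": {"@type": "Person", "name": author_name},
                "workExample": {"@id": instance_uri},
            },
            {
                # A concrete edition: points back to the work it embodies.
                "@id": instance_uri,
                "@type": "Book",
                "name": title,
                "inLanguage": language,
                "exampleOfWork": {"@id": work_uri},
            },
        ],
    }


doc = work_and_instance("W1", "I1", "Kalevala", "Lönnrot, Elias", "fi")
serialized = json.dumps(doc, indent=2)
```

Keeping the author on the work and edition-level details on the instance means that several instances can share one work description, which is what makes the work clustering discussed in slide 9 useful.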