Linked Open Realities: The Joys and Pains of Using LOD for Research

I presented this talk at the June 1, 2016 meeting of the CESTA (https://cesta.stanford.edu/) Graduate Fellows at Stanford University. What follows here is the translation of some very loose and informal notes into something more resembling a blog post.

Full text viewable at http://matthewlincoln.net/2016/06/06/linked-open-realities-the-joys-and-pains-of-using-lod-for-research.html

Matthew Lincoln

June 06, 2016

Transcript

  1. Linked Open Realities: The Joys and Pains of Using Linked Open Data for Research

    Matthew Lincoln, PhD, University of Maryland (@matthewdlincoln)
    June 1, 2016, CESTA, Stanford University
  2. Use Case: Print Production Networks

    Joys and pains of LOD:
    1. Joy: SPARQL encourages complex data expression and queries. Pain: Composing SPARQL for an unfamiliar repo is challenging.
    2. Joy: LOD is addressable. Pain: LOD is just data on someone else’s computer.
    3. Joy: Different LOD repositories are linkable. Pain: Content producers rarely use existing authorities/thesauri.
    At the end of the day, I just need data.
  3. “Sculptura in Æs”, from Jan van der Straet’s Nova Reperta.

    Published by Philips Galle, c. 1588-1605. The Metropolitan Museum of Art.
  4. Jan van der Straet (designer), Jan Collaert I (engraver), Philips Galle (publisher)

    between 1588-1605
  5. Example networks: 1580-1590 and 1640-1650

    [Figure: two network graphs]
    Names labelled in the first network: Jacob Matham, Claes Jansz. Visscher, Hendrick Goltzius, Karel van Mander I, Jacques de Gheyn II, Bartholomeus Spranger, Jan Harmensz. Muller, Harmen Jansz. Muller, Jan van Londerseel, Dirck Barendsz.
    Names labelled in the second network: Claes Jansz. Visscher, Gerard van Honthorst, Theodor Matham, Frederick de Wit, Cornelis Danckerts I, Jonas Suyderhoef, Cornelis Visscher, Michiel Mosyn, Clement de Jonghe, Cornelis van Dalen I
    Accepted for Journal of Digital Art History (summer 2016)
  6. Between 1550-1750: Mining the museum for data

    British Museum: 49,306 prints; 3,592 nodes; 76,697 edges
    Rijksmuseum: 46,730 prints; 5,644 nodes; 94,282 edges
    Nodes: distinct designers, printmakers, and publishers
    Edges: connections inferred from co-participation in an object
    Accepted for Journal of Digital Art History (summer 2016)
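The slide says edges are inferred from co-participation in an object. A minimal sketch of that idea follows, assuming each print is reduced to a list of credited names. The `prints` records and the `co_participation_edges` helper are hypothetical illustrations, not the author's pipeline, and only the first record mirrors a print named in this talk.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: each print lists the people credited on it.
# Only "print_a" mirrors the Nova Reperta example; the rest is invented.
prints = {
    "print_a": ["Jan van der Straet", "Jan Collaert I", "Philips Galle"],
    "print_b": ["Jan van der Straet", "Philips Galle"],
    "print_c": ["Hendrick Goltzius"],
}


def co_participation_edges(objects):
    """Count how often each unordered pair of people shares an object."""
    edges = Counter()
    for people in objects.values():
        # sorted(set(...)) drops duplicate credits and gives every pair
        # a single canonical order, so (a, b) and (b, a) are not split.
        for a, b in combinations(sorted(set(people)), 2):
            edges[(a, b)] += 1
    return edges


edges = co_participation_edges(prints)
nodes = {person for people in prints.values() for person in people}
```

A print with a single credited person contributes a node but no edge, which is one reason node and edge counts can diverge sharply between collections.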
  8. Unknown unknowns in graph databases: the case of the nationless Anthony van Dyck (1599-1641)
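One way such a gap can stay hidden: a query that requires a nationality silently returns no row for a subject that lacks one. The sketch below imitates that behaviour in plain Python over toy triples. The `ex:` predicate names and the nationality value are invented for illustration and do not come from any real repository.

```python
# Toy triples; predicates and values are assumptions made for this sketch.
triples = [
    ("ex:goltzius", "ex:name", "Hendrick Goltzius"),
    ("ex:goltzius", "ex:nationality", "Dutch"),
    ("ex:van_dyck", "ex:name", "Anthony van Dyck"),
    # Deliberately no ex:nationality triple for ex:van_dyck.
]


def values(subject, predicate):
    """Return every object stored for a subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]


subjects = sorted({s for s, _, _ in triples})

# Required pattern: like a plain SPARQL triple pattern, a subject without
# the predicate yields no row at all, so its absence goes unnoticed.
required = [
    (values(s, "ex:name")[0], nat)
    for s in subjects
    for nat in values(s, "ex:nationality")
]

# Optional pattern: like SPARQL OPTIONAL, the subject is kept with None.
optional = [
    (values(s, "ex:name")[0], nat)
    for s in subjects
    for nat in (values(s, "ex:nationality") or [None])
]
```

Comparing the length of `required` with the length of `optional` is a quick check for subjects that a stricter query would have dropped.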
  10. SPARQL endpoint / JSON API: difficult to bulk download, too unreliable

    Scrape/download and create local copies
    Processing: 1. cleaning, 2. reshaping, 3. network analyses, 4. simulation
    Derivative datasets: 1. cleaned and filtered data, 2. analysis results
    Visualization & interpretation
    Disseminate
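The "download and create local copies" step could look roughly like the sketch below: page through a query with LIMIT/OFFSET and cache each page on disk so an unreliable endpoint is only asked once per page. Everything here is an assumption for illustration: `download_all`, `paged_query`, and the injected `fetch` callable are hypothetical names, and `fake_fetch` stands in for a real HTTP call to an endpoint.

```python
import json
import tempfile
from pathlib import Path


def paged_query(base_query, page_size, page):
    """Append LIMIT/OFFSET so a large result set is fetched in chunks.

    The base query should carry an ORDER BY; without one, paging order
    is not guaranteed to be stable between requests.
    """
    return f"{base_query}\nLIMIT {page_size} OFFSET {page * page_size}"


def download_all(base_query, fetch, cache_dir, page_size=1000):
    """Fetch every page once, keeping a local JSON copy of each page."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    rows, page = [], 0
    while True:
        cache_file = cache_dir / f"page_{page:05d}.json"
        if cache_file.exists():
            batch = json.loads(cache_file.read_text())
        else:
            batch = fetch(paged_query(base_query, page_size, page))
            cache_file.write_text(json.dumps(batch))
        rows.extend(batch)
        if len(batch) < page_size:  # a short page means the data ran out
            return rows
        page += 1


# Stand-in for a real endpoint: serves slices of a fake 5-row result set.
FAKE_ROWS = [{"id": i} for i in range(5)]
calls = []


def fake_fetch(query):
    calls.append(query)
    limit, offset = (int(x) for x in query.split("LIMIT ")[1].split(" OFFSET "))
    return FAKE_ROWS[offset:offset + limit]


QUERY = "SELECT ?s WHERE { ?s ?p ?o } ORDER BY ?s"
with tempfile.TemporaryDirectory() as tmp:
    first = download_all(QUERY, fake_fetch, tmp, page_size=2)
    second = download_all(QUERY, fake_fetch, tmp, page_size=2)  # from cache
```

Passing the fetcher in as an argument keeps the caching logic testable without network access. The second call reads only the cached pages, which is the behaviour wanted when the remote source is flaky.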
  12. What LOD/Semantic Web promised the cultural heritage world

    The LOD that the cultural heritage world has produced (at least so far)
    Image: Manu Cornet (www.bonkersworld.net)
    Image: Richard Cyganiak (www.lod-cloud.net)
  14. LOD desiderata from a researcher perspective:

    •  Visual interface for grokking an unfamiliar LOD repository (or remembering how a familiar one works)
    •  Guided SPARQL query construction (caution: might not be able to cater to the full range of SPARQL complexity)
    •  Repo audit: for a given rdf:type of subject, what predicates are usually present? When are these predicates missing for highly-connected subjects?
    •  OpenRefine, but for fuzzy matching against different authorities
    Image: @LegoAcademics CC-BY-NC-SA
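The "repo audit" item can be sketched over a local copy of the data. The code below is one possible reading of it, not an existing tool: `predicate_coverage` reports what share of subjects of a given rdf:type carry each predicate, and `missing_by_degree` lists the subjects lacking a predicate, most-connected first. The function names, the `ex:` vocabulary, and the toy triples are all hypothetical.

```python
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"


def predicate_coverage(triples, rdf_type):
    """For subjects of one rdf:type, return the share carrying each predicate."""
    members = {s for s, p, o in triples if p == RDF_TYPE and o == rdf_type}
    seen = defaultdict(set)
    for s, p, _ in triples:
        if s in members and p != RDF_TYPE:
            seen[p].add(s)
    return {p: len(subs) / len(members) for p, subs in seen.items()}


def missing_by_degree(triples, rdf_type, predicate):
    """List typed subjects lacking a predicate, most-connected first."""
    members = {s for s, p, o in triples if p == RDF_TYPE and o == rdf_type}
    have = {s for s, p, _ in triples if p == predicate}
    degree = Counter()
    for s, _, o in triples:
        degree[s] += 1  # count appearances as subject
        degree[o] += 1  # and as object
    return sorted(members - have, key=lambda s: (-degree[s], s))


# Invented toy data: four people, two of them without a nationality.
toy = [
    ("ex:p1", RDF_TYPE, "ex:Person"), ("ex:p1", "ex:name", "A"),
    ("ex:p1", "ex:nationality", "X"),
    ("ex:p2", RDF_TYPE, "ex:Person"), ("ex:p2", "ex:name", "B"),
    ("ex:p2", "ex:nationality", "Y"),
    ("ex:p3", RDF_TYPE, "ex:Person"), ("ex:p3", "ex:name", "C"),
    ("ex:p4", RDF_TYPE, "ex:Person"), ("ex:p4", "ex:name", "D"),
    ("ex:print1", "ex:designer", "ex:p4"),
    ("ex:print2", "ex:designer", "ex:p4"),
]

coverage = predicate_coverage(toy, "ex:Person")
gaps = missing_by_degree(toy, "ex:Person", "ex:nationality")
```

Sorting the gaps by degree puts the subjects whose missing data would distort a network analysis the most at the top of the list.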