Linked Open Realities: The Joys and Pains of Using LOD for Research

I presented this talk at the June 1, 2016 meeting of the CESTA (https://cesta.stanford.edu/) Graduate Fellows at Stanford University. What follows here is the translation of some very loose and informal notes into something more closely resembling a blog post.

Full text viewable at http://matthewlincoln.net/2016/06/06/linked-open-realities-the-joys-and-pains-of-using-lod-for-research.html

Matthew Lincoln

June 06, 2016

Transcript

  1. Linked Open Realities: The Joys and Pains of Using Linked Open Data for Research

    Matthew Lincoln, PhD, University of Maryland, @matthewdlincoln. June 1, 2016, CESTA, Stanford University
  2. Image: @LegoAcademics CC-BY-NC-SA

  3. Use Case: Print Production Networks

    LOD joys / LOD pains:
    1. SPARQL encourages complex data expression and queries / Composing SPARQL for an unfamiliar repo is challenging
    2. LOD is addressable / LOD is just data on someone else’s computer
    3. Different LOD repositories are linkable / Content producers rarely use existing authorities/thesauri
    At the end of the day, I just need data
    @matthewdlincoln
  4. Use case: Print Production Networks @matthewdlincoln

  5. “Sculptura in Æs”, from Jan van der Straet’s Nova Reperta.

    Published by Philips Galle, c. 1588-1605. The Metropolitan Museum of Art. @matthewdlincoln
  6. Jan van der Straet (designer), Jan Collaert I (engraver), Philips Galle (publisher)

    between 1588-1605 @matthewdlincoln
  7. Example networks

    1580-1590: Jacob Matham, Claes Jansz. Visscher, Hendrick Goltzius, Karel van Mander I, Jacques de Gheyn II, Bartholomeus Spranger, Jan Harmensz. Muller, Harmen Jansz. Muller, Jan van Londerseel, Dirck Barendsz.
    1640-1650: Claes Jansz. Visscher, Gerard van Honthorst, Theodor Matham, Frederick de Wit, Cornelis Danckerts I, Jonas Suyderhoef, Cornelis Visscher, Michiel Mosyn, Clement de Jonghe, Cornelis van Dalen I
    Accepted for Journal of Digital Art History (summer 2016)
    @matthewdlincoln
  8. Between 1550-1750: Mining the museum for data

              British Museum   Rijksmuseum
    prints    49,306           46,730
    nodes     3,592            5,644
    edges     76,697           94,282

    nodes: distinct designers, printmakers, and publishers
    edges: connections inferred from co-participation in an object
    Accepted for Journal of Digital Art History (summer 2016)
    @matthewdlincoln
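
A minimal sketch of how edges might be inferred from co-participation in an object, assuming each print record is simply a list of credited names. The first record uses the three names from slide 6; the other records, the placeholder names "Designer A" and "Engraver B", and the function name `co_participation_edges` are hypothetical, and this is not necessarily how the networks in the talk were built.

```python
from collections import Counter
from itertools import combinations

# Toy records: print id -> people credited on that print.
# Only "print-1" reflects the slides; the rest is invented for illustration.
prints = {
    "print-1": ["Jan van der Straet", "Jan Collaert I", "Philips Galle"],
    "print-2": ["Designer A", "Philips Galle"],
    "print-3": ["Designer A", "Engraver B", "Philips Galle"],
}


def co_participation_edges(prints):
    """Count, for every unordered pair of people, the prints they share."""
    edges = Counter()
    for people in prints.values():
        # sorted(set(...)) removes duplicate credits and gives each pair
        # one canonical order, so (a, b) and (b, a) are the same edge.
        for a, b in combinations(sorted(set(people)), 2):
            edges[(a, b)] += 1
    return edges


edges = co_participation_edges(prints)
nodes = {person for pair in edges for person in pair}

for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b} (shared prints: {weight})")
```

Each edge weight here is the number of shared prints, so a pair that collaborates repeatedly ends up with a heavier tie than a one-off pairing.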
  9. Use Case: Print Production Networks

    LOD joys / LOD pains:
    1. SPARQL encourages complex data expression and queries / Composing SPARQL for an unfamiliar repo is challenging
    2. LOD is addressable / LOD is just data on someone else’s computer
    3. Different LOD repositories are linkable / Content producers rarely use existing authorities/thesauri
    At the end of the day, I just need data
    @matthewdlincoln
  10. JSON vs. RDF/LOD @matthewdlincoln

  11. @matthewdlincoln SPARQL Protocol And RDF Query Language

  12. @matthewdlincoln http://programminghistorian.org/lessons/graph-databases-and-SPARQL

  13. @matthewdlincoln http://programminghistorian.org/lessons/graph-databases-and-SPARQL

    SELECT ?artist ?painting
    WHERE {
      ?artist <has nationality> <Dutch> .
      ?painting <was created by> ?artist .
    }
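
To make the query on this slide concrete, here is a rough Python sketch of what its two triple patterns do: the first pattern binds `?artist`, and the second reuses that binding to find `?painting`. The in-memory list of triples and the function name `select_dutch_paintings` are assumptions for illustration; a real SPARQL endpoint evaluates the query against a full graph store rather than a Python list.

```python
# Toy triples in (subject, predicate, object) form, using the slide's
# pseudo-URI predicate names. The data set is a small illustrative sample.
triples = [
    ("Rembrandt", "has nationality", "Dutch"),
    ("Vermeer", "has nationality", "Dutch"),
    ("Rubens", "has nationality", "Flemish"),
    ("The Night Watch", "was created by", "Rembrandt"),
    ("Girl with a Pearl Earring", "was created by", "Vermeer"),
    ("Samson and Delilah", "was created by", "Rubens"),
]


def select_dutch_paintings(triples):
    """Mimic: SELECT ?artist ?painting WHERE {
         ?artist <has nationality> <Dutch> .
         ?painting <was created by> ?artist . }"""
    # Pattern 1: every subject with the nationality "Dutch" binds ?artist.
    dutch_artists = {
        s for s, p, o in triples if p == "has nationality" and o == "Dutch"
    }
    # Pattern 2: keep only paintings whose creator is one of those bindings.
    return sorted(
        (o, s)
        for s, p, o in triples
        if p == "was created by" and o in dutch_artists
    )


for artist, painting in select_dutch_paintings(triples):
    print(artist, "|", painting)
```

Because both patterns must match, an artist with no nationality triple at all drops out of the results without any warning.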
  14. Writing SPARQL @matthewdlincoln

  15. @matthewdlincoln

  16. @matthewdlincoln

  17. Unknown unknowns in graph databases

    The case of the nationless Anthony van Dyck (1599-1641) @matthewdlincoln
  18. Use Case: Print Production Networks

    LOD joys / LOD pains:
    1. SPARQL encourages complex data expression and queries / Composing SPARQL for an unfamiliar repo is challenging
    2. LOD is addressable / LOD is just data on someone else’s computer
    3. Different LOD repositories are linkable / Content producers rarely use existing authorities/thesauri
    At the end of the day, I just need data
    @matthewdlincoln
  19. SPARQL endpoint, JSON API: difficult to bulk download, too unreliable

    Scrape/download and create local copies
    Processing: 1. cleaning 2. reshaping 3. network analyses 4. simulation
    Derivative datasets: 1. Cleaned and filtered data 2. Analysis results
    Visualization & interpretation
    Disseminate
    @matthewdlincoln
  20. @matthewdlincoln

  21. @matthewdlincoln Image: Randall Munroe (www.xkcd.com/908) CC-BY-NC

  22. @matthewdlincoln

  23. Use Case: Print Production Networks

    LOD joys / LOD pains:
    1. SPARQL encourages complex data expression and queries / Composing SPARQL for an unfamiliar repo is challenging
    2. LOD is addressable / LOD is just data on someone else’s computer
    3. Different LOD repositories are linkable / Content producers rarely use existing authorities/thesauri
    At the end of the day, I just need data
    @matthewdlincoln
  24. What LOD/Semantic Web promised the cultural heritage world

    The LOD that the cultural heritage world has produced (at least so far)
    Image: Manu Cornet (www.bonkersworld.net)
    Image: Richard Cyganiak (www.lod-cloud.net)
    @matthewdlincoln
  25. None
  26. Use Case: Print Production Networks

    LOD joys / LOD pains:
    1. SPARQL encourages complex data expression and queries / Composing SPARQL for an unfamiliar repo is challenging
    2. LOD is addressable / LOD is just data on someone else’s computer
    3. Different LOD repositories are linkable / Content producers rarely use existing authorities/thesauri
    At the end of the day, I just need data
    @matthewdlincoln
  27. Image: @LegoAcademics CC-BY-NC-SA @matthewdlincoln

  28. LOD desiderata from a researcher perspective:

    • Visual interface for grokking an unfamiliar LOD repository (or remembering how a familiar one works)
    • Guided SPARQL query construction (caution: might not be able to cater to full range of SPARQL complexity)
    • Repo audit: for a given rdf:type of subject, what predicates are usually present? When are these predicates missing for highly-connected subjects?
    • Google OpenRefine, but for fuzzy matching against different authorities
    Image: @LegoAcademics CC-BY-NC-SA @matthewdlincoln
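
The "repo audit" item could look something like the sketch below: for every subject of a given rdf:type, measure how often each predicate appears, then list the subjects that lack one. The toy triples, the predicate names, and the functions `predicate_coverage` and `missing_predicate` are all hypothetical; against a real repository the same questions would be asked with SPARQL over the endpoint.

```python
from collections import Counter

# Hypothetical toy triples; predicate names are placeholders, not a real
# vocabulary. "van Dyck" deliberately has no nationality triple.
triples = [
    ("Rembrandt", "rdf:type", "Person"),
    ("Rembrandt", "has name", "Rembrandt van Rijn"),
    ("Rembrandt", "has nationality", "Dutch"),
    ("Rubens", "rdf:type", "Person"),
    ("Rubens", "has name", "Peter Paul Rubens"),
    ("Rubens", "has nationality", "Flemish"),
    ("van Dyck", "rdf:type", "Person"),
    ("van Dyck", "has name", "Anthony van Dyck"),
]


def subjects_of_type(triples, rdf_type):
    """All subjects declared to be of the given type."""
    return {s for s, p, o in triples if p == "rdf:type" and o == rdf_type}


def predicate_coverage(triples, rdf_type):
    """Share of typed subjects that carry each predicate at least once."""
    subjects = subjects_of_type(triples, rdf_type)
    if not subjects:
        return {}
    # Count each (subject, predicate) pair once, however many objects it has.
    pairs = {(s, p) for s, p, o in triples if s in subjects and p != "rdf:type"}
    counts = Counter(p for _, p in pairs)
    return {p: n / len(subjects) for p, n in counts.items()}


def missing_predicate(triples, rdf_type, predicate):
    """Typed subjects that never appear with the given predicate."""
    subjects = subjects_of_type(triples, rdf_type)
    have = {s for s, p, o in triples if p == predicate}
    return sorted(subjects - have)


coverage = predicate_coverage(triples, "Person")
nationless = missing_predicate(triples, "Person", "has nationality")
print(coverage)
print(nationless)
```

A coverage value below 1.0 flags a predicate that a query cannot safely require, which is the kind of gap the nationless van Dyck example on slide 17 points to.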
  29. Image: @LegoAcademics CC-BY-NC-SA Matthew Lincoln matthewlincoln.net @matthewdlincoln