Linked Open Realities: The Joys and Pains of Using LOD for Research

I presented this talk at the June 1, 2016 meeting of the CESTA (https://cesta.stanford.edu/) Graduate Fellows at Stanford University. What follows here is the translation of some very loose and informal notes into something more resembling a blog post.

Full text viewable at http://matthewlincoln.net/2016/06/06/linked-open-realities-the-joys-and-pains-of-using-lod-for-research.html

Matthew Lincoln

June 06, 2016

Transcript

  1. Linked Open Realities: The Joys and Pains of Using Linked Open Data for Research

    Matthew Lincoln, PhD, University of Maryland (@matthewdlincoln)
    June 1, 2016, CESTA, Stanford University
  2. Use Case: Print Production Networks

    LOD joys and LOD pains:
    1. Joy: SPARQL encourages complex data expression and queries. Pain: Composing SPARQL for an unfamiliar repo is challenging.
    2. Joy: LOD is addressable. Pain: LOD is just data on someone else’s computer.
    3. Joy: Different LOD repositories are linkable. Pain: Content producers rarely use existing authorities/thesauri.
    At the end of the day, I just need data.
  3. “Sculptura in Æs”, from Jan van der Straet’s Nova Reperta

    Published by Philips Galle, c. 1588-1605. The Metropolitan Museum of Art.
  4. Jan van der Straet (designer), Jan Collaert I (engraver), Philips Galle (publisher)

    Between 1588-1605
  5. Example networks: 1580-1590 and 1640-1650

    Labeled nodes:
    Jacob Matham, Claes Jansz. Visscher, Hendrick Goltzius, Karel van Mander I, Jacques de Gheyn II, Bartholomeus Spranger, Jan Harmensz. Muller, Harmen Jansz. Muller, Jan van Londerseel, Dirck Barendsz.
    Claes Jansz. Visscher, Gerard van Honthorst, Theodor Matham, Frederick de Wit, Cornelis Danckerts I, Jonas Suyderhoef, Cornelis Visscher, Michiel Mosyn, Clement de Jonghe, Cornelis van Dalen I
    Accepted for Journal of Digital Art History (summer 2016)
  6. Between 1550-1750: Mining the museum for data

    British Museum: 49,306 prints; 3,592 nodes; 76,697 edges
    Rijksmuseum: 46,730 prints; 5,644 nodes; 94,282 edges
    Nodes: distinct designers, printmakers, and publishers
    Edges: connections inferred from co-participation in an object
    Accepted for Journal of Digital Art History (summer 2016)
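
The edge definition on this slide (connections inferred from co-participation in an object) can be sketched roughly as follows. This is an illustration under assumptions, not the pipeline used for the talk: the record layout and the `build_edges` helper are hypothetical, and the three sample records are invented for the example, borrowing only the names credited on the Nova Reperta print.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: one entry per print, listing everyone credited on it.
# These are illustrative, not rows from either museum's data.
prints = [
    {"id": "print-1", "people": ["Jan van der Straet", "Jan Collaert I", "Philips Galle"]},
    {"id": "print-2", "people": ["Jan Collaert I", "Philips Galle"]},
    {"id": "print-3", "people": ["Hendrick Goltzius"]},
]


def build_edges(records):
    """Count undirected edges between people who appear on the same object."""
    edges = Counter()
    for record in records:
        # Deduplicate and sort so each pair has a single canonical ordering.
        people = sorted(set(record["people"]))
        for a, b in combinations(people, 2):
            edges[(a, b)] += 1
    return edges


edges = build_edges(prints)
nodes = {person for record in prints for person in record["people"]}

for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b} (shared prints: {weight})")
```

A print with a single credited person contributes a node but no edges, which is one reason node and edge counts can diverge so much between collections.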
  7. Use Case: Print Production Networks (LOD joys and pains overview, repeated from slide 2)
  8. Unknown unknowns in graph databases: the case of the nationless Anthony van Dyck (1599-1641)
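
One way to read this slide: when a graph pattern requires a predicate, any subject lacking that predicate simply fails to match and drops out of the results without an error. The sketch below mimics that behavior with plain Python tuples rather than a real triple store. The predicate names, the `artist-a` record, and its values are hypothetical, and the assumption that the missing predicate is a nationality comes only from the slide title.

```python
# Toy triple store: (subject, predicate, object). Predicate names are hypothetical.
triples = [
    ("artist-a", "name", "Artist A"),
    ("artist-a", "nationality", "nationality-x"),
    ("Anthony van Dyck", "name", "Anthony van Dyck"),
    # No nationality triple for van Dyck: the "unknown unknown".
]

subjects = sorted({s for s, _, _ in triples})


def objects(subject, predicate):
    """All objects recorded for a subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def required_match():
    """Like a basic graph pattern: subjects without a nationality never appear."""
    return [(s, n) for s in subjects for n in objects(s, "nationality")]


def optional_match():
    """Like SPARQL OPTIONAL: keep every subject, with None for a missing value."""
    return [(s, (objects(s, "nationality") or [None])[0]) for s in subjects]


print(required_match())
print(optional_match())
```

The required version returns fewer rows and gives no hint that anything was left out; only the optional version makes the gap visible.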
  9. Use Case: Print Production Networks (LOD joys and pains overview, repeated from slide 2)
  10. SPARQL endpoint, JSON API: difficult to bulk download, too unreliable

    •  Scrape/download and create local copies
    •  Processing: (1) cleaning, (2) reshaping, (3) network analyses, (4) simulation
    •  Derivative datasets: (1) cleaned and filtered data, (2) analysis results
    •  Visualization & interpretation
    •  Disseminate
  11. Use Case: Print Production Networks (LOD joys and pains overview, repeated from slide 2)
  12. What LOD/Semantic Web promised the cultural heritage world, and the LOD that the cultural heritage world has produced (at least so far)

    Images: Manu Cornet (www.bonkersworld.net); Richard Cyganiak (www.lod-cloud.net)
  13. Use Case: Print Production Networks (LOD joys and pains overview, repeated from slide 2)
  14. LOD desiderata from a researcher perspective:

    •  Visual interface for grokking an unfamiliar LOD repository (or remembering how a familiar one works)
    •  Guided SPARQL query construction (caution: might not be able to cater to full range of SPARQL complexity)
    •  Repo audit: for a given rdf:type of subject, what predicates are usually present? When are these predicates missing for highly-connected subjects?
    •  Google OpenRefine, but for fuzzy matching against different authorities

    Image: @LegoAcademics CC-BY-NC-SA
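
The repo audit item in this list could start from something like the sketch below: for every subject of a given rdf:type, measure how often each predicate is present, then list the subjects that lack one. It runs over a hard-coded list of triples with made-up `ex:` identifiers, and both helper functions are hypothetical; a real audit would read triples from the repository itself.

```python
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"

# Made-up triples standing in for a repository dump.
triples = [
    ("ex:p1", RDF_TYPE, "ex:Person"),
    ("ex:p1", "ex:name", "A"),
    ("ex:p1", "ex:nationality", "X"),
    ("ex:p2", RDF_TYPE, "ex:Person"),
    ("ex:p2", "ex:name", "B"),
    ("ex:o1", RDF_TYPE, "ex:Print"),
    ("ex:o1", "ex:title", "T"),
]


def predicate_coverage(triples, rdf_type):
    """Return {predicate: share of subjects of rdf_type that carry it}."""
    predicates_by_subject = defaultdict(set)
    for s, p, _ in triples:
        predicates_by_subject[s].add(p)
    members = {s for s, p, o in triples if p == RDF_TYPE and o == rdf_type}
    if not members:
        return {}
    counts = Counter(
        p for s in members for p in predicates_by_subject[s] if p != RDF_TYPE
    )
    return {p: counts[p] / len(members) for p in counts}


def subjects_missing(triples, rdf_type, predicate):
    """Subjects of rdf_type that have no triple with the given predicate."""
    members = {s for s, p, o in triples if p == RDF_TYPE and o == rdf_type}
    having = {s for s, p, _ in triples if p == predicate}
    return sorted(members - having)


coverage = predicate_coverage(triples, "ex:Person")
print(coverage)
print(subjects_missing(triples, "ex:Person", "ex:nationality"))
```

Ranking the missing subjects by how many edges they take part in would address the second half of the audit question, the highly-connected subjects with gaps.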