Linked Open Realities: The Joys and Pains of Using LOD for Research

I presented this talk at the June 1, 2016 meeting of the CESTA (https://cesta.stanford.edu/) Graduate Fellows at Stanford University. What follows here is the translation of some very loose and informal notes into something more resembling a blog post.

Full text viewable at http://matthewlincoln.net/2016/06/06/linked-open-realities-the-joys-and-pains-of-using-lod-for-research.html

Matthew Lincoln

June 06, 2016

Transcript

  1. Linked Open Realities:
    The Joys and Pains of Using Linked Open Data for Research
    Matthew Lincoln, PhD
    University of Maryland
    @matthewdlincoln
    June 1, 2016
    CESTA, Stanford University

  2. Image: @LegoAcademics CC-BY-NC-SA

  3. Use Case: Print Production Networks
    1. Joy: SPARQL encourages complex data expression and queries
       Pain: Composing SPARQL for an unfamiliar repo is challenging
    2. Joy: LOD is addressable
       Pain: LOD is just data on someone else’s computer
    3. Joy: Different LOD repositories are linkable
       Pain: Content producers rarely use existing authorities/thesauri
    At the end of the day, I just need data

  4. Use case: Print Production Networks

  5. “Sculptura in Æs”, from Jan van der Straet’s Nova Reperta.
    Published by Philips Galle, c. 1588-1605. The Metropolitan Museum of Art.

  6. Designer: Jan van der Straet
    Engraver: Jan Collaert I
    Publisher: Philips Galle, between 1588-1605

  7. Example networks
    1580-1590:
      Jacob Matham
      Claes Jansz. Visscher
      Hendrick Goltzius
      Karel van Mander I
      Jacques de Gheyn II
      Bartholomeus Spranger
      Jan Harmensz. Muller
      Harmen Jansz. Muller
      Jan van Londerseel
      Dirck Barendsz.
    1640-1650:
      Claes Jansz. Visscher
      Gerard van Honthorst
      Theodor Matham
      Frederick de Wit
      Cornelis Danckerts I
      Jonas Suyderhoef
      Cornelis Visscher
      Michiel Mosyn
      Clement de Jonghe
      Cornelis van Dalen I
    Accepted for Journal of Digital Art History (summer 2016)

  8. Between 1550-1750:
    Mining the museum for data
                  British Museum   Rijksmuseum
    Prints        49,306           46,730
    Nodes         3,592            5,644
    Edges         76,697           94,282
    Nodes: distinct designers, printmakers, and publishers
    Edges: connections inferred from co-participation in an object
    Accepted for Journal of Digital Art History (summer 2016)
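
The edge definition on slide 8 ("connections inferred from co-participation in an object") can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure used for the article: the record layout and the `co_participation_edges` helper are assumptions, the first record reuses the three names from slide 6, and the other two records are invented. Whether the real networks were weighted or undirected is likewise an assumption here.

```python
from collections import Counter
from itertools import combinations

# Toy records: each print lists the people credited on it.
# print-1 reuses the names from slide 6; print-2 and print-3 are invented.
prints = [
    {"id": "print-1", "people": ["Jan van der Straet", "Jan Collaert I", "Philips Galle"]},
    {"id": "print-2", "people": ["Hendrick Goltzius", "Jacob Matham"]},
    {"id": "print-3", "people": ["Jan Collaert I", "Philips Galle"]},
]


def co_participation_edges(records):
    """Count how many objects each unordered pair of people shares."""
    edges = Counter()
    for record in records:
        # sorted(set(...)) drops repeated credits and fixes the pair order,
        # so (a, b) and (b, a) are counted as the same undirected edge.
        people = sorted(set(record["people"]))
        for a, b in combinations(people, 2):
            edges[(a, b)] += 1
    return edges


edges = co_participation_edges(prints)
nodes = {person for pair in edges for person in pair}
```

Here `edges` maps each pair to the number of prints the two people share, and `nodes` is the set of distinct people, which corresponds to the "nodes" and "edges" rows reported on the slide.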

  9. Use Case: Print Production Networks
    1. Joy: SPARQL encourages complex data expression and queries
       Pain: Composing SPARQL for an unfamiliar repo is challenging
    2. Joy: LOD is addressable
       Pain: LOD is just data on someone else’s computer
    3. Joy: Different LOD repositories are linkable
       Pain: Content producers rarely use existing authorities/thesauri
    At the end of the day, I just need data

  10. JSON vs. RDF/LOD

  11. SPARQL: SPARQL Protocol And RDF Query Language

  12. http://programminghistorian.org/lessons/graph-databases-and-SPARQL

  13. http://programminghistorian.org/lessons/graph-databases-and-SPARQL
    SELECT ?artist ?painting
    WHERE {
      ?artist .
      ?painting ?artist .
    }
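
The query on slide 13 joins two triple patterns on the shared variable `?artist`. The predicate and object terms are not preserved in the query as it appears here, so the names `has_nationality`, `Dutch`, and `was_created_by` below are placeholders rather than the slide's actual terms. The following is a rough Python sketch of how a SPARQL engine might evaluate such a basic graph pattern; the `match` and `select` helpers are hypothetical and ignore nearly everything a real endpoint does (URIs, prefixes, optimization).

```python
# Toy triples as (subject, predicate, object); all predicate names are placeholders.
triples = [
    ("Rembrandt", "has_nationality", "Dutch"),
    ("Vermeer", "has_nationality", "Dutch"),
    ("Rubens", "has_nationality", "Flemish"),
    ("The Night Watch", "was_created_by", "Rembrandt"),
    ("The Milkmaid", "was_created_by", "Vermeer"),
    ("Samson and Delilah", "was_created_by", "Rubens"),
]


def match(pattern, triple, bindings):
    """Unify one triple pattern with one triple.

    Terms starting with "?" are variables. Returns the extended bindings,
    or None if the triple does not fit the pattern.
    """
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against an earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def select(variables, patterns, data):
    """Evaluate a list of triple patterns by joining them left to right."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in data
            if (extended := match(pattern, triple, bindings)) is not None
        ]
    return [tuple(solution[v] for v in variables) for solution in solutions]


rows = select(
    ["?artist", "?painting"],
    [
        ("?artist", "has_nationality", "Dutch"),
        ("?painting", "was_created_by", "?artist"),
    ],
    triples,
)
```

With this toy data, `rows` holds one `(artist, painting)` pair for each painting whose creator also matched the first pattern; the Rubens painting drops out because `?artist` was never bound to him.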

  14. Writing SPARQL

  17. Unknown unknowns in graph databases: the case of the nationless
    Anthony van Dyck (1599-1641)

  18. Use Case: Print Production Networks
    1. Joy: SPARQL encourages complex data expression and queries
       Pain: Composing SPARQL for an unfamiliar repo is challenging
    2. Joy: LOD is addressable
       Pain: LOD is just data on someone else’s computer
    3. Joy: Different LOD repositories are linkable
       Pain: Content producers rarely use existing authorities/thesauri
    At the end of the day, I just need data

  19. SPARQL endpoint; JSON API
    difficult to bulk download; too unreliable
    scrape/download and create local copies
    Processing
      1. cleaning
      2. reshaping
      3. network analyses
      4. simulation
    Derivative datasets
      1. Cleaned and filtered data
      2. Analysis results
    Visualization & interpretation
    Disseminate

  21. Image: Randall Munroe (www.xkcd.com/908) CC-BY-NC

  23. Use Case: Print Production Networks
    1. Joy: SPARQL encourages complex data expression and queries
       Pain: Composing SPARQL for an unfamiliar repo is challenging
    2. Joy: LOD is addressable
       Pain: LOD is just data on someone else’s computer
    3. Joy: Different LOD repositories are linkable
       Pain: Content producers rarely use existing authorities/thesauri
    At the end of the day, I just need data

  24. What LOD/Semantic Web promised the cultural heritage world
    The LOD that the cultural heritage world has produced (at least so far)
    Image: Manu Cornet (www.bonkersworld.net)
    Image: Richard Cyganiak (www.lod-cloud.net)

  26. Use Case: Print Production Networks
    1. Joy: SPARQL encourages complex data expression and queries
       Pain: Composing SPARQL for an unfamiliar repo is challenging
    2. Joy: LOD is addressable
       Pain: LOD is just data on someone else’s computer
    3. Joy: Different LOD repositories are linkable
       Pain: Content producers rarely use existing authorities/thesauri
    At the end of the day, I just need data

  27. Image: @LegoAcademics CC-BY-NC-SA

  28. Image: @LegoAcademics CC-BY-NC-SA
    LOD desiderata from a researcher perspective:
    •  Visual interface for grokking an unfamiliar LOD repository (or remembering how a familiar one works)
    •  Guided SPARQL query construction (caution: might not be able to cater to full range of SPARQL complexity)
    •  Repo audit: for a given rdf:type of subject, what predicates are usually present? When are these predicates missing for highly-connected subjects?
    •  Google OpenRefine, but for fuzzy matching against different authorities
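
The "repo audit" item in the list above could look something like the following Python sketch, run against a local copy of a repository's triples. Everything here is an assumption made for illustration: the `ex:` identifiers and predicates are invented, and `predicate_coverage` and `missing_predicate` are hypothetical helpers, not an existing tool. The toy data deliberately leaves one person without a nationality, echoing the case named on slide 17.

```python
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"

# Toy triples; the ex: identifiers and predicates are invented for illustration.
triples = [
    ("ex:goltzius", RDF_TYPE, "ex:Person"),
    ("ex:goltzius", "ex:name", "Hendrick Goltzius"),
    ("ex:goltzius", "ex:nationality", "Dutch"),
    ("ex:vandyck", RDF_TYPE, "ex:Person"),
    ("ex:vandyck", "ex:name", "Anthony van Dyck"),
    ("ex:print1", RDF_TYPE, "ex:Print"),
    ("ex:print1", "ex:creator", "ex:vandyck"),
    ("ex:print2", RDF_TYPE, "ex:Print"),
    ("ex:print2", "ex:creator", "ex:vandyck"),
]


def predicate_coverage(data, rdf_class):
    """Share of subjects of rdf_class that carry each predicate at least once."""
    subjects = {s for s, p, o in data if p == RDF_TYPE and o == rdf_class}
    present = defaultdict(set)
    for s, p, o in data:
        if s in subjects and p != RDF_TYPE:
            present[p].add(s)
    return {p: len(who) / len(subjects) for p, who in present.items()}


def missing_predicate(data, rdf_class, predicate):
    """Subjects of rdf_class lacking the predicate, most-referenced first."""
    subjects = {s for s, p, o in data if p == RDF_TYPE and o == rdf_class}
    has_it = {s for s, p, o in data if p == predicate}
    # In-degree: how often each subject appears as the object of a triple.
    in_degree = Counter(o for s, p, o in data if o in subjects)
    return sorted(subjects - has_it, key=lambda s: (-in_degree[s], s))


coverage = predicate_coverage(triples, "ex:Person")
gaps = missing_predicate(triples, "ex:Person", "ex:nationality")
```

`coverage` answers the first audit question (which predicates are usually present for a type), and `gaps` answers the second by listing the subjects that lack a predicate, ordered by how many other records point at them.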

  29. Image: @LegoAcademics CC-BY-NC-SA
    Matthew Lincoln
    matthewlincoln.net
    @matthewdlincoln
