SPARQL PONIES 2!!!!

An introduction to complex SPARQL queries, from LIS 652 "XML and Linked Data" at the UW-Madison iSchool.

Dorothea Salo

April 30, 2015

Transcript

  1. SPARQL PONIES 2!!! LIS 652 Dorothea Salo
    Cecilia Teodomira Márquez, “My Little Pony - Blind Bag Figurines (FiM) “Better Quality Pic”” https://www.flickr.com/photos/rjrgmc28/6903687042/ CC-BY-SA
  2. Review: what we know
    • SPARQL queries run through a URL called a “SPARQL endpoint.”
    • They can use PREFIXes, like Turtle, to abbreviate URIs.
    • They are commonly of the form SELECT ?thing WHERE { ?thing ?p ?o. }
    • Any part of the triple could be substituted with ?thing.
    • There can be more than one triple in the curly braces.
    • You can also DESCRIBE a URI to find out what the endpoint knows about it.
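
A minimal sketch of what "queries run through a URL" can look like when scripted: under the SPARQL Protocol, a GET request carries the query text in a parameter named `query`. The endpoint address below is a made-up placeholder (not the BNB's), and the helper name `query_url` is hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical endpoint address, for illustration only (not a real service).
ENDPOINT = "http://example.org/sparql"

# A PREFIX declaration followed by the basic SELECT ... WHERE shape.
QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?thing WHERE { ?thing a foaf:Person. } LIMIT 5"""


def query_url(endpoint, query):
    """Return a GET URL carrying the query text in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = query_url(ENDPOINT, QUERY)
print(url)
```

Pasting a URL like this into a browser is roughly what a query form does on your behalf when you press its run button.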
  3. What we don’t (yet) know
    • How to write a SPARQL query that spans two or more triples.
    • I have yet to see a SPARQL tutorial that talks through the thought process here.
    • So I’m going to try to! We’ll see how it goes!
    • Open the BNB’s SPARQL endpoint in your browser. It’s what we’ll be querying today.
    • http://bnb.data.bl.uk/flint-sparql
  4. Before you query: steps 1 through 3
    • Step 1: Phrase your question in plain language.
    • “What books did Ian Rankin write?”
    • Step 2: Circle all the nouns in your question.
    • “books” “Ian Rankin”
    • Step 3: Underline the word(s)/phrase(s) that represent what you want to know.
    • “what books”
    • These will turn into variables after SELECT. Might as well name them now and write out the SELECT. In this case, SELECT ?book or perhaps SELECT ?title.
    • (Yes, singular. Think of it as “select any book/title.”)
    • Make a note of whether you want a URI or literal as your end result. This time? Literal.
  5. Before you query: step 4
    • Step 4: Put a square around the verb(s).
    • “write”
    • Helping verbs (“is/was” “did”) are not helpful here. They’re probably part of a phrase (“is/was verbed” “is/was adverb” “did… verb”); put your square around the rest of the phrase instead.
    • Chances are your verb will map onto an RDF property/predicate.
    • If you don’t know what the URI is for it yet, you’ll have to figure that out, too.
    • (The other possibility is that there’s some kind of RDF class in the middle somewhere.)
  6. Before you query: step 5
    • Step 5: Ask yourself “how would linked data represent the nouns in circles?”
    • Decide right now: does this probably have a URI? Is it an RDF class?
    • Ian Rankin: is a SPECIFIC thing-in-the-world, so he probably has a URI. If you don’t already know it, your query will have to find it out based on the literal “Ian Rankin.”
    • He may also be a member of an RDF class. Guess about what it might be.
    • books: probably an RDF class. Plan to put in a triple of the form ?s a <class URI>. Individual books will have their own URIs; the book title will be some kind of literal.
  7. Try it!
    • Run steps 2 through 5 on the following questions:
    • What works did William Shakespeare write?
    • When was the author Mary Shelley born?
    • Choose a SLIS faculty member (other than me). What has s/he written or otherwise created that’s been cataloged in the BNB?
    • What is Frankenstein about? (THINK HARD about how linked data might represent this!)
  8. Step 6: Find the documentation!
    • (you had to know I’d say this, right?)
    • If you’re lucky, there’s either:
    • A diagram of the data model somewhere
    • Information on the SPARQL endpoint page
    • BNB has both. Find both.
    • Now go back to your questions and start writing down URIs (prefixes are fine) for classes and properties you know you need.
    • If you can work out whole triples that will be part of your query, do that too… but we’ll get to that more formally in a minute.
  9. Step 6bis: Argh, no documentation! Exploratory queries, then.
    • Lots of linked datasets don’t have nice diagrams or helpful info at their SPARQL endpoints.
    • When that’s the case, you have to run some queries to figure out what’s going on in the data.
    • Run the SPARQL queries from last week that get you lists of properties and classes.
    • A trick: after your }, add ORDER BY ?class or ORDER BY ?p (whichever you’re looking for).
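
Last week's queries are not reproduced in this deck, so the two query strings below are an assumed reconstruction of the usual way to list classes and properties, not necessarily the ones used in class. The Python around them mimics, on a handful of made-up triples, what each query asks an endpoint to do.

```python
# Assumed reconstructions of the "list classes" and "list properties" queries.
CLASS_QUERY = "SELECT DISTINCT ?class WHERE { ?s a ?class. } ORDER BY ?class"
PROPERTY_QUERY = "SELECT DISTINCT ?p WHERE { ?s ?p ?o. } ORDER BY ?p"

# Made-up toy triples (subject, predicate, object); "a" stands for rdf:type.
TRIPLES = [
    ("ex:person1", "a", "foaf:Person"),
    ("ex:person1", "foaf:familyName", "Rankin"),
    ("ex:book1", "a", "bibo:Book"),
    ("ex:book1", "rdfs:label", "A toy title"),
    ("ex:person2", "a", "foaf:Person"),
]

# DISTINCT behaves like a set; ORDER BY behaves like sorted().
classes = sorted({o for s, p, o in TRIPLES if p == "a"})
properties = sorted({p for s, p, o in TRIPLES})

print(classes)     # ['bibo:Book', 'foaf:Person']
print(properties)  # ['a', 'foaf:familyName', 'rdfs:label']
```

Sorting matters mostly for human eyes: related classes and properties from the same vocabulary end up next to each other in the list.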
  10. Step 7: What’s in a class?
    • Pick one of the classes you know you need.
    • We’ll use foaf:Person as an example, since several of our questions are about authors. Don’t forget to declare the foaf prefix!
    • Run this query:
    • SELECT DISTINCT ?person WHERE { ?person a foaf:Person. } LIMIT 20
    • You should get 20 URIs back. DESCRIBE two or three of them.
    • Would any of the result triples start to answer your question, if you had the URI for the person you actually care about?
    • Good! Copy them somewhere as patterns.
    • Repeat for other classes you’re interested in. Now try properties!
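
A small sketch of the two chores on this slide: declaring the prefix before the query, and writing a DESCRIBE for a URI that came back. The FOAF namespace URI is the standard one; the helper names and the example.org URI are hypothetical stand-ins.

```python
# Namespace URI for FOAF; other prefixes would be added the same way.
PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}


def with_prefixes(body, prefixes):
    """Put one PREFIX line per namespace in front of a query body."""
    lines = [f"PREFIX {name}: <{uri}>" for name, uri in prefixes.items()]
    return "\n".join(lines + [body])


def describe(uri):
    """Build a DESCRIBE query for one full URI."""
    return f"DESCRIBE <{uri}>"


query = with_prefixes(
    "SELECT DISTINCT ?person WHERE { ?person a foaf:Person. } LIMIT 20",
    PREFIXES,
)
print(query)
# Hypothetical URI standing in for one of the 20 results.
print(describe("http://example.org/person/1"))
```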
  11. Taking a step back
    • You now have:
    • Your SELECT clause
    • Some sense of how the dataset is put together
    • Thinking of your query as a jigsaw puzzle made of triples: possibly some individual puzzle pieces
    • You still need to:
    • Get from your initial information, which is a literal, to the URI that represents it.
    • Possibly go in the other direction, if you want a literal result.
    • Put the puzzle pieces together!
  12. URIs from literals
    • Literals are ALWAYS ALWAYS ALWAYS triple objects!!!!!!!!!!!
    • We can use this to our advantage!
    • SELECT ?uri ?p WHERE { ?uri ?p 'My Literal'. } Try this with Ian Rankin!
    • If this doesn’t work:
    • Add a language tag (“@en”) to the literal. Irritatingly, if the RDF specifies a language, a SPARQL query has to too.
    • Dear SPARQL developers: Write out Postel’s Law 100 times. In blood. This was a bad design decision that confuses SPARQL users. Love, me.
    • If this STILL doesn’t work…
    • … the dataset has formatted the literal in some way you don’t know about yet. Argh.
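
To make the language-tag point concrete, here is a hedged sketch that writes the same lookup twice, once with a plain literal and once with "@en" attached. The function names are hypothetical; the only SPARQL assumed is the plain "quoted string plus optional @lang" literal syntax.

```python
def literal(text, lang=None):
    """Format a plain SPARQL string literal, optionally with a language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    quoted = f'"{escaped}"'
    return f"{quoted}@{lang}" if lang else quoted


def find_uri_query(text, lang=None):
    """Ask which subjects and properties point at the given literal."""
    return f"SELECT ?uri ?p WHERE {{ ?uri ?p {literal(text, lang)} . }}"


print(find_uri_query("Ian Rankin"))
print(find_uri_query("Ian Rankin", "en"))
```

The two queries are not interchangeable: a dataset that stores the name with a language tag will typically match only the second.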
  13. Brute-forcing with literals
    • What you need at this point is a query that doesn’t need the EXACT literal, because you can’t figure out what it is.
    • See if your property list offers you anything that will help.
    • BNB has foaf:familyName. Try SELECT ?uri WHERE { ?uri foaf:familyName 'Rankin'. } Eyeball the results for the name you want.
    • What if you’re trying to script this, though? Computers can’t eyeball!
    • You have another piece of name information. There’s a foaf property for it. Plug it into the curly brackets as another triple!
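
If you do script this, many endpoints can return results in the W3C SPARQL JSON results layout, which a script can read instead of eyeballing. Whether a given endpoint offers JSON is an assumption to check; the response text below is made up, and `values` is a hypothetical helper.

```python
import json

# Made-up response text in the W3C SPARQL JSON results layout.
RESPONSE = """
{
  "head": {"vars": ["uri"]},
  "results": {"bindings": [
    {"uri": {"type": "uri", "value": "http://example.org/person/1"}},
    {"uri": {"type": "uri", "value": "http://example.org/person/2"}}
  ]}
}
"""


def values(response_text, variable):
    """Pull the plain values bound to one variable out of a JSON result."""
    data = json.loads(response_text)
    return [
        row[variable]["value"]
        for row in data["results"]["bindings"]
        if variable in row
    ]


uris = values(RESPONSE, "uri")
print(uris)
```

Two URIs coming back is the scripted version of "too many Rankins": the signal that the query needs another triple to narrow it down.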
  14. • SELECT ?uri WHERE { ?uri foaf:familyName 'Rankin'. ?uri foaf:givenName 'Ian'. }
    • THERE WE GO! And now you’ve written your first query with multiple triples in it.
    • (For the sake of argument, see if you can make foaf:name work.)
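
Why the second triple narrows things down: both patterns use the same variable, so only subjects that satisfy both survive. A toy illustration on made-up triples (the ex: identifiers and the second person are invented for the example):

```python
# Made-up toy triples: two people share a family name.
TRIPLES = [
    ("ex:p1", "foaf:familyName", "Rankin"),
    ("ex:p1", "foaf:givenName", "Ian"),
    ("ex:p2", "foaf:familyName", "Rankin"),
    ("ex:p2", "foaf:givenName", "Robert"),
]


def subjects(predicate, obj):
    """All subjects having the given predicate and object."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


# One triple pattern: everyone with the family name Rankin.
rankins = subjects("foaf:familyName", "Rankin")
# Two patterns sharing ?uri: only subjects that satisfy both.
ian_rankins = rankins & subjects("foaf:givenName", "Ian")

print(sorted(rankins))      # ['ex:p1', 'ex:p2']
print(sorted(ian_rankins))  # ['ex:p1']
```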
  15. If all else fails…
    • There’s a way. Sort of.
    • SELECT ?uri WHERE { ?uri ?p ?name. FILTER (regex(str(?name), 'Rankin')). } Don’t break your brain trying to figure this one out.
    • Guess what? This query broke BNB! Even when I added more triples to try to narrow it down some!
    • So if you can’t find a property workaround that gives you your URI… at most SPARQL endpoints you’ll be hosed. Sorry.
    • If the endpoint runs SPARQL 1.1:
    • SELECT ?uri WHERE { ?uri ?p ?name. FILTER (CONTAINS(STR(?name), 'Rankin')). }
    • But BNB doesn’t, so we’re hosed again. Sorry.
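
One plausible reason a query like this is so heavy: with all three positions left as variables, nothing narrows the search, so the substring test has to be tried against every object in the dataset. A toy imitation on made-up triples, counting how many get checked (this shows the shape of the work, not how any real endpoint is implemented):

```python
# Made-up toy triples; objects may be literals or URIs.
TRIPLES = [
    ("ex:p1", "foaf:name", "Ian Rankin"),
    ("ex:p1", "foaf:familyName", "Rankin"),
    ("ex:p2", "foaf:name", "Mary Shelley"),
    ("ex:b1", "dct:creator", "ex:p1"),
]


def contains_filter(needle):
    """Mimic ?uri ?p ?name with a substring FILTER: test every object."""
    checked = 0
    hits = set()
    for s, p, o in TRIPLES:
        checked += 1
        if needle in o:
            hits.add(s)
    return hits, checked


hits, checked = contains_filter("Rankin")
print(sorted(hits), checked)  # ['ex:p1'] 4
```

Four checks is nothing; the same scan over a national bibliography is another matter.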
  16. Okay! We have our URI! Now what?
    • If you have a model diagram:
    • Your starting point in the graph is the “instance” representing your initial piece of information. Where would you plug Ian Rankin into the graph?
    • Your ending point is the spot in the graph that represents what you actually want to know. Where is a book title in the graph?
    • The triples in your { } are the path in-between. Trace that path!
    • Once you have the path, follow it by writing triples from one end to the other.
  17. Okay! We have our URI! Now what?
    • If you don’t have a model diagram:
    • You’ll have to follow your nose.
    • Start with your triples that give you Ian Rankin’s URI; put them in your { }.
    • DESCRIBE that URI and look for a property that gets you closer to your destination. Add another triple with Ian Rankin’s URI as subject, that property, and a named ?var as object.
    • Look in your RDF class list for the URI for a class that ?var might be part of. Run SELECT ?var WHERE { ?var a <URI>. } LIMIT 20.
    • DESCRIBE one of the results. Look for a property that gets you closer to your desired answer.
    • Lather, rinse, repeat!
  18. (leaving PREFIXes out…)
    SELECT ?title WHERE {
      ?author foaf:familyName 'Rankin'.
      ?author foaf:givenName 'Ian'.
      ?author blt:hasCreated ?book.
      ?book a bibo:Book.
      ?book rdfs:label ?title.
    }
    Does anything different happen if you leave out the ?book a bibo:Book triple? When will something different happen?
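
One way to explore the question about the ?book a bibo:Book triple without touching the endpoint is a toy matcher over made-up triples. Everything here is an illustration: the data is invented (including a created thing that is not typed as a book), and `match` is a hypothetical, deliberately naive stand-in for what a SPARQL engine does with a group of triple patterns.

```python
# Made-up toy triples; "a" stands for rdf:type. ex:m1 is created by the
# author but is not typed as a bibo:Book.
TRIPLES = [
    ("ex:p1", "foaf:familyName", "Rankin"),
    ("ex:p1", "foaf:givenName", "Ian"),
    ("ex:p1", "blt:hasCreated", "ex:b1"),
    ("ex:p1", "blt:hasCreated", "ex:m1"),
    ("ex:b1", "a", "bibo:Book"),
    ("ex:b1", "rdfs:label", "Toy book title"),
    ("ex:m1", "rdfs:label", "Toy map title"),
]


def match(patterns, binding=None):
    """Yield every variable binding that satisfies all patterns in order."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in TRIPLES:
        new = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable must keep the value it was first bound to.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, new)


PATH = [
    ("?author", "foaf:familyName", "Rankin"),
    ("?author", "foaf:givenName", "Ian"),
    ("?author", "blt:hasCreated", "?book"),
    ("?book", "a", "bibo:Book"),
    ("?book", "rdfs:label", "?title"),
]

with_type = sorted(b["?title"] for b in match(PATH))
without_type = sorted(b["?title"] for b in match(PATH[:3] + PATH[4:]))
print(with_type)     # ['Toy book title']
print(without_type)  # ['Toy book title', 'Toy map title']
```

In the toy data the class triple only changes the answer because the author created something that is not a book; whether that is true of any given author in the BNB is for the real query to find out.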
  19. Try it!
    • Okay, give these a try now.
    • What works did William Shakespeare write?
    • When was the author Mary Shelley born?
    • Choose a SLIS faculty member (other than me). What has s/he written or otherwise created that’s been cataloged in the BNB?
    • What is Frankenstein about? (THINK HARD about how linked data might represent this!)
  20. No answer?
    • Your query might be wrong.
    • In this case, try the query with just the first triple (changing the SELECT clause to match). If that works, add the second triple (again changing the SELECT clause). Keep going until it breaks.
    • The dataset might not know the answer.
    • No real way to troubleshoot this; a dataset can’t really tell you what it doesn’t know!
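
The add-one-triple-at-a-time routine can be sketched as a small generator of test queries. Note the swapped-in shortcut: SELECT * returns every variable, which avoids rewriting the SELECT clause at each step as the slide describes. PREFIX declarations are left out here too, and the function name is hypothetical.

```python
PATTERNS = [
    "?author foaf:familyName 'Rankin'.",
    "?author foaf:givenName 'Ian'.",
    "?author blt:hasCreated ?book.",
    "?book rdfs:label ?title.",
]


def incremental_queries(patterns):
    """Build one query per step, each adding the next triple pattern."""
    queries = []
    for count in range(1, len(patterns) + 1):
        body = " ".join(patterns[:count])
        # SELECT * sidesteps rewriting the SELECT clause at each step.
        queries.append(f"SELECT * WHERE {{ {body} }} LIMIT 20")
    return queries


for query in incremental_queries(PATTERNS):
    print(query)
```

Run them in order; the first one that comes back empty points at the triple that needs another look.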
  21. No homework this week.
    This deck is licensed under a Creative Commons Attribution 4.0 International license.
    Cecilia Teodomira Márquez, “My Little Pony - Blind Bag Figurines (FiM) “Better Quality Pic”” https://www.flickr.com/photos/rjrgmc28/6903687042/ CC-BY-SA