SPARQL PONIES!!!!!

A quick-and-dirty introduction to SPARQL for students in LIS 652 "XML and Linked Data."

Dorothea Salo

April 22, 2015

Transcript

  1. SPARQL PONIES!!!! LIS 652, Dorothea Salo
    Image: Cecilia Teodomira Márquez, “My Little Pony - Blind Bag Figurines (FiM) “Better Quality Pic”” https://www.flickr.com/photos/rjrgmc28/6903687042/ CC-BY-SA
  2. So you have linked data. So what?
    • Can you get information out of triples on the Web?
    • Can you use linked data to answer questions?
    • Can you boil a big complicated set of triples down into just the triples you need?
  3. So you have a database. So what?
    • Can you get information out of it?
    • Can you use it to answer questions?
    • Can you boil a big complicated set of tables down into just the information you need?
  4. • SQL: Structured Query Language, how to get information out of a relational database
    • We teach SQL, among other things, in LIS 751.
    • SPARQL: SPARQL Protocol and RDF Query Language (cute, right?), how to get information out of a triplestore
    • You do not need to know SQL to learn SPARQL.
    • As someone who’s pretty decent with SQL, let me say that knowing SQL may actually get in the way!
  5. How it works (over the web)
    • (If you have an in-house triplestore, you can probably SPARQL it without all the intermediate web nonsense.)
    • The website containing the triples must implement a “SPARQL endpoint.”
    • At minimum, this means “a special URL that you can add a SPARQL query onto and get an answer back.” This is for computers, not humans!
    • Commonly, though, there’s a human-readable web-form query page also.
    • … for certain values of “human-readable.” (It’s not that much worse than the appspot things we’ve been using.)
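To make “a special URL that you can add a SPARQL query onto” concrete, here is a minimal sketch of how a program might build such a URL. The endpoint address is a made-up placeholder (example.org), not a real service; the one thing taken from the SPARQL Protocol is that the query travels in a URL parameter named `query`. Nothing is sent over the network here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint address, for illustration only.
ENDPOINT = "https://example.org/sparql"

QUERY = "SELECT * WHERE { ?s ?p ?o . } LIMIT 10"


def build_query_url(endpoint, query):
    """Return the endpoint URL with the query attached as the 'query' parameter."""
    # urlencode percent-encodes the braces, question marks and spaces for us.
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL's query string gives the original SPARQL text back.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```

A program would then fetch that URL and parse whatever comes back; the web-form query pages do essentially the same thing on your behalf.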
  6. There are not many SPARQL endpoints.
    • Nearly all websites that publish linked data don’t have one.
    • Why don’t they?
    • Because it’s not a standard part of the web-server software stack. It’s hugely easier to publish triples to the Web than to figure out how to put up and maintain a SPARQL endpoint.
    • Because triplestores are not NEARLY as slick, fast, and optimized as relational DBs.
    • Because answering SPARQL queries is computationally intensive (and malicious queries that eat CPUs for breakfast are trivial to write). It’s EASY to knock over a poor web server just trying to handle SPARQL.
    • Frankly, this is yet another case of “the standards nerds didn’t think this through real well.”
  7. Without SPARQL endpoints, how does anybody use linked data?
    • Downloading triple dumps (like LoC’s)
    • Grabbing linked data embedded in web pages
    • (via slurping up microdata or following invisible links to RDF)
    • You can basically do this by hacking a web crawler a bit; that’s what Google did! Web crawlers are very familiar technology, unlike SPARQL.
    • This is, how shall I say, SUPER-KLUDGY.
    • Linked-data infrastructure needs better than this! Among other things, it needs a standard change-notification mechanism something fierce.
  8. Enough prologue. Time to SPARQL!
    Image: Cecilia Teodomira Márquez, “My Little Pony - Blind Bag Figurines (FiM) “Better Quality Pic”” https://www.flickr.com/photos/rjrgmc28/6903687042/ CC-BY-SA
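  9. Before the beginning: URI prefixes
    • Remember how in Turtle you declared @prefix dct: <http://purl.org/dc/terms/> . (with a period at the end)?
    • Yeah, well, it’s slightly different in SPARQL. This is drop-dead stupid, sorry.
    • PREFIX dct: <http://purl.org/dc/terms/>
    • No @ and no final period. All-caps is not required (SPARQL keywords are case-insensitive), but it is the usual convention.
    • Mind the trailing slash in the namespace URI: the prefix is glued directly onto the local name, so dct:title only expands to the right URI if the slash is there.
    • They probably did it to be more like SQL. I think that was the wrong decision.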
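If it helps to see what a prefix declaration actually does, here is a rough sketch that reads the PREFIX lines out of a query and expands a prefixed name by plain string concatenation. The helper names `read_prefixes` and `expand` are invented for this example; a real SPARQL processor does this internally and handles many cases this sketch ignores. Note that the Dublin Core terms namespace ends in a slash, which is what makes the concatenation come out right.

```python
import re

QUERY = """
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?title WHERE { ?book dct:title ?title . }
"""

# SPARQL keywords are case-insensitive, hence re.IGNORECASE.
PREFIX_RE = re.compile(r"PREFIX\s+(\w*):\s*<([^>]*)>", re.IGNORECASE)


def read_prefixes(query):
    """Collect {prefix: namespace URI} from the PREFIX lines of a query."""
    return {name: uri for name, uri in PREFIX_RE.findall(query)}


def expand(prefixed_name, prefixes):
    """Turn something like dct:title into a full URI by simple concatenation."""
    prefix, _, local = prefixed_name.partition(":")
    return prefixes[prefix] + local


prefixes = read_prefixes(QUERY)
print(expand("dct:title", prefixes))
```

Because expansion is nothing more than gluing two strings together, a namespace URI that is off by one character (a missing slash, say) silently produces URIs that match nothing.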
  10. (Europeana making this easier… just click on the prefix at bottom to add its declaration to the query.)
  11. Exploring a triplestore, Part 1
    • “What do you know about <URI>?”
    • DESCRIBE <URI>
    • SELECT * WHERE { <URI> ?p ?o. } (This one means “give me every triple where <URI> is the subject.”)
    • Try this!
    • Go to http://bnb.data.bl.uk/flint-sparql
    • Click on any of the sample queries (just to put all the prefixes in the query window at left).
    • Now scroll to the bottom and delete EVERYTHING that isn’t a prefix declaration.
    • Try DESCRIBE <http://bnb.data.bl.uk/id/person/RankinIan> (you’ll want to copy the results to a text editor to read them…) Read the triples together at your table.
    • Now try SELECT * WHERE { <http://bnb.data.bl.uk/id/person/RankinIan> ?p ?o. } Compare the result sets. Can you figure out how this query means what it means?
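One way to think about how a query like SELECT * WHERE { <URI> ?p ?o. } “means what it means” is as a template held up against every triple in the store. The sketch below is a toy illustration of that idea only: the triples are made up (example.org URIs, not BNB data), and the `match` function is a hypothetical helper, not how any real triplestore is implemented.

```python
# Made-up triples for illustration; these are NOT actual BNB data.
EX = "http://example.org/"
TRIPLES = [
    (EX + "author1", EX + "name", "Example Author"),
    (EX + "author1", EX + "type", EX + "Person"),
    (EX + "book1", EX + "creator", EX + "author1"),
    (EX + "book1", EX + "title", "An Example Novel"),
]


def match(pattern, triples):
    """Yield one dict of variable bindings per triple that fits the pattern."""
    for triple in triples:
        bindings = {}
        for want, have in zip(pattern, triple):
            if want.startswith("?"):
                # A variable fits anything, but must stay consistent
                # if the same variable appears twice in one pattern.
                if bindings.setdefault(want, have) != have:
                    break
            elif want != have:
                # A fixed term has to match exactly.
                break
        else:
            yield bindings


# Toy version of: SELECT * WHERE { <http://example.org/author1> ?p ?o . }
results = list(match((EX + "author1", "?p", "?o"), TRIPLES))
for row in results:
    print(row["?p"], row["?o"])
```

Only the two triples whose subject is author1 survive; the triple where author1 shows up as an object does not, because the pattern pins the URI to the subject slot.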
  12. You try it!
    • See if you can get the BNB to describe your favorite author. (You may have to experiment a bit to get the right URI.)
    • BIG HINT: Find the person’s authority-control string in LoC or VIAF…
    • See if you can get the BNB to describe me!
    • Spoiler: yes, there are triples about me in BNB.
    • Yes, I know I’m old.
    • Now try your favorite movie director. Now try a Western classical-music composer.
    • At least once, do this with SELECT instead of DESCRIBE.
  13. Exploring a triplestore, Part 2
    • “What kinds of things-in-the-world (people, places, things, concepts, whatever) does this triplestore talk about?”
    • How does linked data represent things-in-the-world?
    • Put another way, where would you expect to see this kind of information in a bunch of RDF triples?
    • If you’re having trouble with this, look at the results of your prior queries. Where is there a triple that says what kind of thing-in-the-world Ian Rankin is?
    • (There may well be more than one; in that case, what do those triples have in common?)
  14. RDF classes
    • … the good old <URI> a <ClassURI>. pattern! The <ClassURI>s are what we want!
    • So what we need to get SPARQL to do is tell us all the triple objects in triples where the property/predicate is “a”.
    • Do we care what the subject is? Nah, not really. We just want that class list.
  15. Starting from SELECT
    • SELECT = “Show me!”
    • It and DESCRIBE are not the only options in SPARQL. There is also CONSTRUCT, which we may or may not get to eventually (and ASK, which just answers yes or no).
    • Problem: any given class is probably the object of lots of triples! (Think how many foaf:Persons are in the BNB dataset!)
    • So we also need to tell SPARQL “only list each class once.”
    • SELECT DISTINCT = “Show me (deduplicated)!”
    • So this is how we solve the challenge of getting SPARQL just to show us each class once, no matter how many triples contain it.
  16. After SELECT: what you want to know
    • * means “everything you got.”
    • That is, it returns every variable used in the WHERE clause. With a pattern like ?s ?p ?o, that amounts to whole triples.
    • Otherwise, express this in the form of a variable, which in SPARQL looks like ?var.
    • More than one bit of info you want back? No problem. List them and separate them with spaces (no commas): SELECT ?var1 ?var2
    • Don’t even think about “where this is in the triplestore” yet. JUST ASK YOURSELF “what do I want to know?” and give it a name.
    • The answer to Life, the Universe, and Everything is 42. But SPARQL won’t tell you that, so hush.
  17. Example: “what are the classes in this triplestore?”
    • What should the SELECT look like?
    • SELECT DISTINCT ?class
  18. WHERE
    • Comes after SELECT is done with.
    • SPARQL needs to understand how to recognize the information you want.
    • You tell it by showing it the pattern of triples that limits it to showing you ONLY the results you want.
    • WHERE { <subject> <property> <value>. }
    • You can have as many triples as you need inside those curly braces; the syntax works just like Turtle.
    • You can substitute variables for any part of a triple, if you don’t have something specific in mind.
  19. RDF classes
    • … the good old <URI> a <ClassURI>. pattern! The <ClassURI>s are what we want!
    • So what we need to get SPARQL to do is tell us all the triple objects in triples where the property/predicate is “a”.
    • Do we care what the subject is? Nah, not really. We just want that class list.
  20. Example: “what are the classes in this triplestore?”
    • So what we need to get SPARQL to do is tell us all the triple objects in triples where the property/predicate is “a”.
    • Do we care what the subject is? Nah, not really. We just want that class list.
    • SELECT DISTINCT ?class WHERE {}
    • Try to fill in the braces.
  21. Example: “what are the classes in this triplestore?”
    • So what we need to get SPARQL to do is tell us all the triple objects in triples where the property/predicate is “a”.
    • Do we care what the subject is? Nah, not really. We just want that class list.
    • SELECT DISTINCT ?class WHERE {?s a ?class.}
    • Because we don’t actually care what the subject of any of these triples is, we just pop in another variable to hold the subject place in the triple.
    • (By convention, people often use ?s ?p and ?o when doing this. Sometimes, though, it’s easier if you give placeholders more meaningful names, so that’s fine too.)
    • Try this in BNB!
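To see why DISTINCT matters for the class list, here is a small sketch comparing the two queries on a handful of made-up triples (example.org subjects, not BNB data). The function names are invented for illustration. The only fact borrowed from the RDF vocabulary is that “a” is shorthand for the rdf:type property.

```python
# "a" in Turtle and SPARQL abbreviates this property URI.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"  # made-up subjects, for illustration only

TRIPLES = [
    (EX + "author1", RDF_TYPE, FOAF + "Person"),
    (EX + "author2", RDF_TYPE, FOAF + "Person"),
    (EX + "author1", RDF_TYPE, FOAF + "Agent"),
    (EX + "author1", FOAF + "name", "Example Author"),
]


def select_class(triples):
    """Toy version of: SELECT ?class WHERE { ?s a ?class . }"""
    # Keep the object of every triple whose predicate is rdf:type;
    # the subject is matched but thrown away, like ?s in the query.
    return [o for s, p, o in triples if p == RDF_TYPE]


def select_distinct_class(triples):
    """Toy version of: SELECT DISTINCT ?class WHERE { ?s a ?class . }"""
    # dict.fromkeys drops repeats while keeping first-seen order.
    return list(dict.fromkeys(select_class(triples)))


print(select_class(TRIPLES))           # Person shows up once per person
print(select_distinct_class(TRIPLES))  # each class listed once
```

Without DISTINCT you get one row per matching triple, so a store with a million people hands back foaf:Person a million times. A real endpoint makes no promise about the order of results unless you ask for one.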
  22. Try it again!
    • What kind(s) of thing(s) is Ian Rankin, as far as the BNB knows?
    • Remember, it’s FINE for one URI to belong to multiple classes! You can be a Person and an Agent and an Author and…
    • Do you need DISTINCT here?
  23. Try it again!
    • What kind(s) of thing(s) is Ian Rankin, as far as the BNB knows?
    • Remember, it’s FINE for one URI to belong to multiple classes! You can be a Person and an Agent and an Author and…
    • Do you need DISTINCT here?
    • Probably not (why would you write the same triple twice?), but it’s harmless.
    • SELECT ?class WHERE {<http://bnb.data.bl.uk/id/person/RankinIan> a ?class.}
    • Try this for your movie director and composer; see if you get anything different.
  24. Exploring a triplestore, Part 3
    • “What does this triplestore say about things?”
    • Make a SPARQL query that makes a deduped list of all the properties in the BNB triplestore.
    • Be kind to the website! Put LIMIT 100 at the end of your query, please.
    • Queries with looooooooooooots of answers are one way to drive a SPARQL endpoint to its knees. Many triplestores use boatloads of properties.
  25. Try these on another triplestore!
    • (with a different form interface, sorry)
    • Europeana!
    • http://europeana.ontotext.com/sparql
    • What classes does Europeana recognize? How about properties (LIMIT 100 again)?
    • Pick a class you’re curious about. What are 100 URIs belonging to that class?
    • Pick one of those URIs. What does Europeana know about it?
  26. If we have time… more complex WHEREs
    • Go back to BNB.
    • Try and work out how to ask “What are the titles of books that Ian Rankin wrote?”
    • It may help you to look at the BNB’s data model: http://www.bl.uk/bibliographic/pdfs/bldatamodelbook.pdf
    • Hint: you need more than one triple in the WHERE clause.
    • Hint: Try getting the URIs for the book(s) before you go after their title(s).
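For a feel of how more than one triple in a WHERE clause works together, here is a toy sketch: each pattern is matched in turn, and a variable that appears in two patterns has to take the same value in both. Everything here is invented for illustration (example.org URIs, the predicates writtenBy and label, the helper functions), so it deliberately does not give away which properties the BNB uses; check the data model for those.

```python
# Made-up data and predicates; NOT the BNB's actual vocabulary.
EX = "http://example.org/"
TRIPLES = [
    (EX + "book1", EX + "writtenBy", EX + "author1"),
    (EX + "book1", EX + "label", "First Example Book"),
    (EX + "book2", EX + "writtenBy", EX + "author2"),
    (EX + "book2", EX + "label", "Second Example Book"),
    (EX + "book3", EX + "writtenBy", EX + "author1"),
    (EX + "book3", EX + "label", "Third Example Book"),
]


def match(pattern, triples, bindings):
    """Yield extended copies of `bindings` for each triple fitting `pattern`."""
    for triple in triples:
        new = dict(bindings)
        for want, have in zip(pattern, triple):
            if want.startswith("?"):
                # An already-bound variable must keep its earlier value.
                if new.setdefault(want, have) != have:
                    break
            elif want != have:
                break
        else:
            yield new


def where(patterns, triples):
    """Apply patterns one after another, carrying variable bindings forward."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [new for old in solutions
                     for new in match(pattern, triples, old)]
    return solutions


# Toy version of:
#   SELECT ?name WHERE { ?thing ex:writtenBy ex:author1 .
#                        ?thing ex:label ?name . }
patterns = [
    ("?thing", EX + "writtenBy", EX + "author1"),
    ("?thing", EX + "label", "?name"),
]
names = [s["?name"] for s in where(patterns, TRIPLES)]
print(names)
```

The first pattern finds the things written by author1; the second looks up a label for exactly those things, because ?thing is shared. That is the “get the URIs first, then go after their titles” hint in miniature.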
  27. No homework this week.
    This deck is licensed under a Creative Commons Attribution 4.0 International license.
    Image: Cecilia Teodomira Márquez, “My Little Pony - Blind Bag Figurines (FiM) “Better Quality Pic”” https://www.flickr.com/photos/rjrgmc28/6903687042/ CC-BY-SA