RDF/XML

An introduction to XML well-formedness syntax and RDF/XML. Assumes you already understand RDF and its Turtle syntax.

Dorothea Salo

March 03, 2016

Transcript

  1. RDF/XML
    How the Semantic Web was lost


  2. XML syntax rules


  3. Remember when I said...
    • … RDF/XML was terrible?
    • RDF/XML is terrible.
    • It is awful. Horrendously awful. Really-truly awful. It is
    awful and you will hate it. I do!
    • That awfulness contributed a great deal to lack of RDF
    uptake among web developers.
    • Lesson: COMPREHENSIBILITY AND
    USABILITY MATTER to a standard.
    • If we expect real people to use it, anyway.
    • And we do… especially as we help information work (e.g.
    metadata creation) penetrate our patron communities.


  4. Unfortunately, you still
    have to learn RDF/XML.
    • Who adopted XML? We did. We’re chumps.
    • It’s lurking beneath a lot of our metadata and digital
    infrastructure.
    • Don’t get me wrong, XML is great for some things. It’s…
    currently used in contexts where it doesn’t shine, though.
    • So when we use RDF, it’s often incorporated
    into XML documents. Which means RDF/XML.
    • I’m sorry. I really am. I would prefer not to
    teach RDF/XML at all!


  5. Examples
    • Fedora Commons repository software
    • Internal “FOXML” metadata format includes a lot of RDF/XML.
    • (May be replaced by PCDM. That would be a blessing.)
    • METS-using systems sometimes include some
    RDF/XML.
    • DPLA and Europeana metadata-exchange
    formats use RDF/XML. (Sometimes, at least.)
    • Will near-future systems use more RDF/XML?
    Open question.
    • I lean toward “no, they’ll express RDF as N-Triples/JSON-LD
    dumps or HTML microdata, sometimes over actual
    triplestores.” But that’s a total guess on my part.


  6. Getting started with XML
    • Starting this class with microdata-in-
    HTML5: there is method to my madness.
    • XML uses a lot of the same constructs
    you’re used to from HTML5.
    • It is a LOT STRICTER about some syntax
    things than you’re used to from HTML5…
    • … and some syntax rules are a bit different.
    • Still. We’ll start with what we know.


  7. Optional, but the norm.
    Very first thing in the document,
    if it’s there at all.


    some text here
    more text


    XML declaration
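
A minimal sketch of the "very first thing" rule, using Python's standard-library parser as the checker. The element name `tagname` and the helper `parses` are placeholders made up for this illustration, not taken from the original slide.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if Python's XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Declaration is the very first thing in the document: fine.
with_declaration = '<?xml version="1.0"?>\n<tagname>some text here</tagname>'

# No declaration at all: also fine, because it is optional.
without_declaration = '<tagname>some text here</tagname>'

# A blank line sneaks in ahead of the declaration: rejected.
late_declaration = '\n<?xml version="1.0"?>\n<tagname>some text here</tagname>'

results = {
    "with_declaration": parses(with_declaration),
    "without_declaration": parses(without_declaration),
    "late_declaration": parses(late_declaration),
}
```

Only the last document should fail; the declaration is optional, but when it appears nothing may come before it.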


  8. <tagname>Element content</tagname>
    Start tag
    • Start tags may contain attributes, in the form
    attname="attvalue".
    • Difference from HTML5: attribute values are REQUIRED. The
    itemscope trick doesn’t work in XML; it would have to be
    itemscope="".
    • STRAIGHT QUOTES. STRAAAAAAIGHT QUOOOOOOTES. DO
    NOT use Microsoft Word to write XML. This is why!
    • You can’t repeat an attribute name in the same start tag. You
    can have infinite attributes with different attribute names in
    one start tag, though.
    • END TAGS ARE NOT OPTIONAL.
    End tag
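
The start-tag and end-tag rules above can be checked mechanically. This is a rough sketch using Python's standard-library parser; the helper name `well_formed` and the sample strings are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

def well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Attribute in the form attname="attvalue", with a matching end tag.
ok = well_formed('<tagname attname="attvalue">Element content</tagname>')

# HTML5-style bare attribute: not allowed in XML.
bare_attribute = well_formed('<div itemscope>text</div>')

# Same attribute with an (empty) value: allowed.
empty_value = well_formed('<div itemscope="">text</div>')

# Curly "smart" quotes, as a word processor might insert them.
curly_quotes = well_formed('<tagname attname=\u201cattvalue\u201d>text</tagname>')

# The same attribute name twice in one start tag.
repeated_name = well_formed('<tagname a="1" a="2">text</tagname>')

# Missing end tag.
no_end_tag = well_formed('<tagname attname="attvalue">text')
```

Only `ok` and `empty_value` should come back True.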


  9. <tagname>Element content</tagname>
    Element
    • Element name in above example: “tagname”
    • Element value/content: “Element content”
    • (that is, the text between the tags)
    • THE ELEMENT IS THE WHOLE THING, okay?
    • Don’t mix it up with “start tag.”
    • Why this mixup happens: “element” is also shorthand for
    the abstraction “all tags with the same name,” e.g. “the
    blockquote element in HTML can contain…”.



  10. Empty tag
    • Empty tags don’t have element content.
    • Though empty tags can still have attributes.
    • Difference from HTML5: you have to put a
    slash / in before the closing >.
    • Convention: put a space before the slash. Why? Long story.
    • <tagname></tagname> is also okay, but there
    can’t be ANYTHING between the tags.
    • Not even whitespace!
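
To see why whitespace matters, here is a small sketch comparing the two empty forms with a not-quite-empty one. The names `tagname` and `attname` are placeholders, and Python's standard-library parser stands in for whatever tool you actually use.

```python
import xml.etree.ElementTree as ET

# Empty tag with the slash before the closing >.
slash_form = ET.fromstring('<tagname attname="attvalue" />')

# Start tag immediately followed by end tag: equivalent.
pair_form = ET.fromstring('<tagname attname="attvalue"></tagname>')

# One space between the tags: no longer empty, the space is content.
spaced = ET.fromstring('<tagname attname="attvalue"> </tagname>')

contents = [slash_form.text, pair_form.text, spaced.text]
attributes_survive = slash_form.get("attname")
```

The first two elements have no content at all (`None` in this library), while the third carries a one-space string as its content.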


  11. Good XML:
    Not actually XML at all:
    Why?


    some text here
    more text



    some text here
    more text


  12. “Root element”
    • It is NOT an XML document unless the whole thing
    (minus the XML declaration, if present) is enclosed inside ONE
    element.
    • … that is, ONE start-tag/end-tag pair.
    • That outermost element enclosing everything else is
    the “root element” of the document.
    • HTML5 has a consistent root element in every
    document. What is it?
    • (Some XML languages have more than one element allowed to be
    the root element. RDF/XML is one such language. HTML is not.)
    • NOTHING BUT WHITESPACE (spaces, tabs, hard
    returns) after the root element’s end tag… and some
    XML checkers will even complain about whitespace.
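
The root-element rule is easy to test. In this sketch the element names `doc`, `a`, and `b` are invented for illustration, and the checker is Python's standard-library parser, which happens to tolerate trailing whitespace (other checkers may be fussier, as noted above).

```python
import xml.etree.ElementTree as ET

def well_formed(text):
    """Return True if the text parses as a single XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Everything enclosed inside ONE element: an XML document.
one_root = '<doc><a>some text here</a><b>more text</b></doc>'

# Two sibling elements with nothing enclosing them: not XML at all.
two_roots = '<a>some text here</a><b>more text</b>'

# Whitespace after the root element's end tag: accepted by this parser.
trailing_whitespace = one_root + '\n\n'

# Anything else after the root element's end tag: rejected.
trailing_text = one_root + 'oops'

checks = [well_formed(d) for d in
          (one_root, two_roots, trailing_whitespace, trailing_text)]
```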


  13. Okay. Now we can start
    learning RDF/XML.
    Yay?


  14. Root elements
    • As I mentioned, RDF/XML allows different root
    elements.
    • Which one you use depends on whether you’re
    describing JUST ONE THING or MORE THAN
    ONE THING in your RDF/XML document.
    • Just one thing: root element can be <rdf:Description>.
    • One or (especially!) more things: root element is <rdf:RDF>.
    As you’d expect, each thing you describe then gets its own
    <rdf:Description> element.
    • (Nested “things,” like the ones we saw doing microdata, don’t
    count here.)
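
One way to handle both root-element cases in code is sketched below. It assumes the two roots are `<rdf:Description>` and `<rdf:RDF>`; the function name `described_things`, the `ex:wrote` property, and all `example.com` URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DESCRIPTION = "{" + RDF + "}Description"
RDF_ROOT = "{" + RDF + "}RDF"

def described_things(doc):
    """Return the top-level rdf:Description elements of an RDF/XML string."""
    root = ET.fromstring(doc)
    if root.tag == DESCRIPTION:
        return [root]  # just one thing, and it is the root itself
    if root.tag == RDF_ROOT:
        # Direct children only, so nested "things" don't count here.
        return root.findall(DESCRIPTION)
    raise ValueError("unexpected root element: " + root.tag)

single = """<rdf:Description
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    rdf:about="http://example.com/people/isabel-allende" />"""

several = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ex="http://example.com/terms/">
  <rdf:Description rdf:about="http://example.com/people/isabel-allende">
    <ex:wrote>
      <rdf:Description rdf:about="http://example.com/books/eva-luna" />
    </ex:wrote>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.com/books/paula" />
</rdf:RDF>"""

counts = (len(described_things(single)), len(described_things(several)))
```

The nested description of Eva Luna is deliberately not counted as a top-level thing.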


  15. Prefixes to namespaces
    • RDF/XML declares URI prefixes too, as XML
    attributes. XML calls these “namespaces.”
    • Same URIs, same conventional prefixes, just
    different syntax.
    • @prefix ex: <http://example.com/> .
    • xmlns:ex="http://example.com/"
    • No space before the prefix in XML! The colon moved too!
    • By convention, put all xmlns attributes on the
    root element, whichever one you’re using.
    • (This often isn’t strictly necessary per XML rules, but most
    real-world XML does it and it always works, so.)
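
A quick sketch of why the declaration is not optional: a prefix with no matching xmlns attribute makes the document unparseable. The element name `tagname` is a placeholder, and Python's standard-library parser is only one possible checker.

```python
import xml.etree.ElementTree as ET

def well_formed(text):
    """Return True if the text parses as namespace-aware XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Prefix declared on the root element, XML-style.
declared = ('<ex:tagname xmlns:ex="http://example.com/">'
            'some text here</ex:tagname>')

# Same document with the xmlns attribute left off.
undeclared = '<ex:tagname>some text here</ex:tagname>'

declared_ok = well_formed(declared)
undeclared_ok = well_formed(undeclared)
```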


  16. Where does the
    namespace prefix go?
    • (once you’ve declared it, that is)
    • In front of tag names, colon-separated.
    • <ex:tagname>some text here</ex:tagname>
    • In front of attribute names, again with
    the colon.

    • It looks like prefixes did in Turtle, mostly.
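
As in Turtle, a prefix is only shorthand for the full URI. This sketch shows what a parser makes of prefixed tag and attribute names; `tagname`, `attname`, and the `zz` prefix are invented for illustration, and the `{uri}name` notation is how Python's standard library writes expanded names.

```python
import xml.etree.ElementTree as ET

doc = (
    '<ex:tagname xmlns:ex="http://example.com/" ex:attname="attvalue">'
    'some text here'
    '</ex:tagname>'
)
root = ET.fromstring(doc)

# Prefix in front of the tag name, expanded by the parser.
element_name = root.tag

# Prefix in front of the attribute name, expanded the same way.
attribute_names = list(root.attrib)

# A different prefix bound to the same URI gives the same expanded name.
other = ET.fromstring(
    '<zz:tagname xmlns:zz="http://example.com/">some text here</zz:tagname>'
)
same_name = other.tag == root.tag
```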


  17. Example

    xmlns:foaf="http://xmlns.com/foaf/0.1/">

    female



  18. Subject, property,
    value?


  19. Subject, property, value?
    • In predictable places, but non-trivial to locate.
    • The more deeply-nested the RDF/XML gets,
    the harder it is to disentangle individual
    triples.
    • I know. I’m sorry.
    • We’ll do this a bit at a time.


  20. (slide convention note)
    • The prefix/namespace declarations are
    visual clutter. In the examples that
    follow, I am leaving them off.
    • Pretend they’re there, okay? And don’t
    try this yourself. The declarations are not
    optional!


  21. Subject first!
    • <rdf:Description rdf:about="{subject URI}">{more
    RDF/XML}</rdf:Description>
    • Put another way: if you see an rdf:about
    attribute, the attribute value is ALWAYS the
    subject of one or more triples.
    • You can NEVER abbreviate the URI here, sorry.
    • Dumb and inconsistent? Yeah, but it’s the rule.
    • Knowing this makes reading RDF/XML, like,
    1000% easier.
    • I’m embarrassed to tell you how long it was before I figured
    this out.
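
The "find the rdf:about, find the subject" trick can be sketched in a few lines. The `example.com` URIs are made up for illustration (the original slides' URIs did not survive extraction), and the namespace declarations are written out in full because the parser needs them.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DESCRIPTION = "{" + RDF + "}Description"
ABOUT = "{" + RDF + "}about"

doc = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://example.com/people/isabel-allende">
    <foaf:gender>female</foaf:gender>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.com/books/eva-luna" />
</rdf:RDF>"""

root = ET.fromstring(doc)

# Every rdf:about value is the subject of one or more triples.
subjects = [thing.get(ABOUT) for thing in root.iter(DESCRIPTION)]
```

Note that the attribute values are full, unabbreviated URIs, exactly as the rule requires.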


  22. Blank-node subjects
    • <rdf:Description> with no rdf:about
    attribute. Probably commonest.
    • <rdf:Description> with one of those weird
    blank-node identifiers as value of an
    rdf:nodeID attribute. (Per the RDF/XML spec,
    rdf:about is for URIs only.)


  23. Properties





    • Usually (not always!) subelements of <rdf:Description>.
    • In other words: the property goes inside <rdf:Description>
    with the property name as tag name.
    • In other other words: if you see a namespace/URI prefix
    that’s not rdf:, it’s usually on a property.
    • Some properties we’ve used with Isabel Allende:
    • leaving them as empty tags and ignoring values for now


  24. Property with URI value
    • Empty subelement of <rdf:Description>
    • Property is the subelement’s tag name, as usual.
    • URI is value of an rdf:resource attribute on the subelement.
    • NOT THE ELEMENT CONTENT. NOT EVER! NO URIs AS ELEMENT CONTENT!
    • And you can’t abbreviate the URI here either. Abbreviation only works on tag
    names!
    • Mnemonic: the value of rdf:resource is ALWAYS a triple value.
    • E.g. Isabel Allende wrote Eva Luna.
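
Here is a hedged sketch of pulling a URI-valued triple out of RDF/XML. The property `ex:wrote`, its namespace, and both `example.com` URIs are hypothetical stand-ins, since the slide's own example was lost in extraction.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = "{" + RDF + "}about"
RESOURCE = "{" + RDF + "}resource"

doc = """<rdf:Description
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ex="http://example.com/terms/"
    rdf:about="http://example.com/people/isabel-allende">
  <ex:wrote rdf:resource="http://example.com/books/eva-luna" />
</rdf:Description>"""

thing = ET.fromstring(doc)
subject = thing.get(ABOUT)

# Property = subelement's tag name; value = its rdf:resource attribute.
uri_triples = [
    # "{namespace}local" becomes the full property URI "namespacelocal".
    (subject, prop.tag[1:].replace("}", "", 1), prop.get(RESOURCE))
    for prop in thing
    if prop.get(RESOURCE) is not None
]
```

The subelement is empty: the URI lives in the attribute, never in the element content.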




  25. Property with literal value 1
    • Subelement of <rdf:Description>
    • Property is the subelement’s tag name.
    • Literal is the text of the subelement.
    • If a subelement of <rdf:Description> has text, that text is ALWAYS
    A LITERAL. Never ever ever put a URI there! You just de-URIzed it!
    • Yes, this is stupid and I’m sorry!
    • E.g. Isabel Allende is female.

    <foaf:gender>female</foaf:gender>


  26. Property with literal value 2
    • Attribute on <rdf:Description>.
    • This is the exception to “property is a subelement.”
    • Property is the attribute name.
    • Literal is the attribute value.
    • E.g. Isabel Allende is female.
    <rdf:Description rdf:about="…" foaf:gender="female">
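
The two ways of writing a literal, as a subelement's text or as an attribute on `<rdf:Description>`, should produce the same triple. A rough sketch, with a made-up subject URI and hypothetical helper names `full_uri` and `literal_triples`:

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = "{" + RDF + "}about"

def full_uri(name):
    """Turn the parser's "{namespace}local" form into one URI string."""
    return name[1:].replace("}", "", 1)

def literal_triples(doc):
    """Collect literal-valued triples from a single rdf:Description root."""
    thing = ET.fromstring(doc)
    subject = thing.get(ABOUT)
    found = set()
    # Literal value 1: property is a subelement, literal is its text.
    for prop in thing:
        found.add((subject, full_uri(prop.tag), prop.text))
    # Literal value 2: property is an attribute that is not in rdf:.
    for name, value in thing.attrib.items():
        if not name.startswith("{" + RDF + "}"):
            found.add((subject, full_uri(name), value))
    return found

as_subelement = """<rdf:Description
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    rdf:about="http://example.com/people/isabel-allende">
  <foaf:gender>female</foaf:gender>
</rdf:Description>"""

as_attribute = """<rdf:Description
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    rdf:about="http://example.com/people/isabel-allende"
    foaf:gender="female" />"""

same_triples = literal_triples(as_subelement) == literal_triples(as_attribute)
```

This only covers literals; it would misread a property whose value is a URI or a nested description.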


  27. Value is more triples
    • (Acts kind of like microdata!)
    • Value is an <rdf:Description> subelement of the
    property element.
    • This value can then be the subject of more triples...
    • Infinitely nestable. Infinitely confusing to read.



    {more RDF/XML here, talking about Isabel Allende}


    {more RDF/XML here, talking about Eva Luna}
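
Disentangling nested RDF/XML is a natural job for recursion. The sketch below handles only the patterns covered so far (rdf:about subjects, rdf:resource values, literal text, nested `<rdf:Description>`), and assumes every description has an rdf:about. The `ex:` properties and `example.com` URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DESCRIPTION = "{" + RDF + "}Description"
ABOUT = "{" + RDF + "}about"
RESOURCE = "{" + RDF + "}resource"

def triples(description):
    """Yield (subject, property, value) tuples, recursing into nested things."""
    subject = description.get(ABOUT)
    for prop in description:
        name = prop.tag[1:].replace("}", "", 1)  # full property URI
        nested = prop.find(DESCRIPTION)
        if nested is not None:
            # The nested thing is this triple's value...
            yield (subject, name, nested.get(ABOUT))
            # ...and then the subject of more triples.
            yield from triples(nested)
        elif prop.get(RESOURCE) is not None:
            yield (subject, name, prop.get(RESOURCE))
        else:
            yield (subject, name, prop.text)

doc = """<rdf:Description
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:ex="http://example.com/terms/"
    rdf:about="http://example.com/people/isabel-allende">
  <foaf:gender>female</foaf:gender>
  <ex:wrote>
    <rdf:Description rdf:about="http://example.com/books/eva-luna">
      <ex:title>Eva Luna</ex:title>
    </rdf:Description>
  </ex:wrote>
</rdf:Description>"""

all_triples = list(triples(ET.fromstring(doc)))
```

Three triples come out: two about Isabel Allende, one about Eva Luna. For real work a dedicated RDF library is the safer choice; this is only meant to show where the triples hide.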


  28. There’s more.
    • There are ways to “abbreviate” more stuff,
    make lists, etc.
    • Since almost all RDF/XML is machine-
    generated, though, what I’ve just shown
    you is 90%+ of what you’ll see in the wild.
    • Life is too short for the rest of it.
    • Which is another way of saying “if you need more, you’ll
    pick it up when you see it.”
