Remember when I said... • … RDF/XML was terrible? • RDF/XML is terrible. • It is awful. Horrendously awful. Really-truly awful. It is awful and you will hate it. I do! • That awfulness contributed a great deal to lack of RDF uptake among web developers. • Lesson: COMPREHENSIBILITY AND USABILITY MATTER to a standard. • If we expect real people to use it, anyway. • And we do… especially as we help information work (e.g. metadata creation) penetrate our patron communities.
Unfortunately, you still have to learn RDF/XML. • Who adopted XML? We did. We’re chumps. • It’s lurking beneath a lot of our metadata and digital infrastructure. • Don’t get me wrong, XML is great for some things. It’s… currently used in contexts where it doesn’t shine, though. • So when we use RDF, it’s often incorporated into XML documents. Which means RDF/XML. • I’m sorry. I really am. I would prefer not to teach RDF/XML at all!
Examples • Fedora Commons repository software • Internal “FOXML” metadata format includes a lot of RDF/XML. • (May be replaced by PCDM. That would be a blessing.) • METS-using systems sometimes include some RDF/XML. • DPLA and Europeana metadata-exchange formats use RDF/XML. (Sometimes, at least.) • Will near-future systems use more RDF/XML? Open question. • I lean toward “no, they’ll express RDF as N-Triples/JSON-LD dumps or HTML microdata, sometimes over actual triplestores.” But that’s a total guess on my part.
Getting started with XML • Starting this class with microdata-in-HTML5: there is method to my madness. • XML uses a lot of the same constructs you’re used to from HTML5. • It is a LOT STRICTER about some syntax things than you’re used to from HTML5… • … and some syntax rules are a bit different. • Still. We’ll start with what we know.
<tagname>Element content</tagname> • Start tag: <tagname> • Start tags may contain attributes, in the form attname="attvalue". • Difference from HTML5: attribute values are REQUIRED. The itemscope trick doesn’t work in XML; it would have to be itemscope="". • STRAIGHT QUOTES. STRAAAAAAIGHT QUOOOOOOTES. DO NOT use Microsoft Word to write XML. This is why! • You can’t repeat an attribute name in the same start tag. You can have infinite attributes with different attribute names in one start tag, though. • End tag: </tagname> • END TAGS ARE NOT OPTIONAL.
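A quick way to convince yourself of the start-tag rules above: feed a few snippets to an XML parser and see which ones it refuses. This is a minimal sketch using Python's standard-library parser; the helper name is_well_formed and the sample snippets are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the standard-library XML parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Illustrative snippets only; tagname/attname/attvalue are placeholders.
samples = {
    "attribute with a value": '<tagname attname="attvalue">Element content</tagname>',
    "attribute without a value": '<tagname itemscope>Element content</tagname>',
    "empty-string value": '<tagname itemscope="">Element content</tagname>',
    "curly quotes": '<tagname attname=“attvalue”>Element content</tagname>',
    "repeated attribute name": '<tagname attname="a" attname="b">Element content</tagname>',
    "missing end tag": '<tagname attname="attvalue">Element content',
}

for label, text in samples.items():
    print(label, "->", "OK" if is_well_formed(text) else "REJECTED")
```

Only the first and third snippets should come back OK. The other four break exactly the rules on this slide.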
Element: <tagname>Element content</tagname> • Element name in above example: “tagname” • Element value/content: “Element content” • (that is, the text between the tags) • THE ELEMENT IS THE WHOLE THING, okay? • Don’t mix it up with “start tag.” • Why this mixup happens: “element” is also shorthand for the abstraction “all tags with the same name,” e.g. “the blockquote element in HTML can contain…”.
Empty tag: <tagname /> • Empty tags don’t have element content. • Though empty tags can still have attributes. • Difference from HTML5: you have to put a slash / in before the closing >. • Convention: put a space before the slash. Why? Long story. • <tagname></tagname> is also okay, but there can’t be ANYTHING between the tags. • Not even whitespace!
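To see why “not even whitespace” matters, here is a small sketch (again Python's standard-library parser, placeholder tag names) that prints what the parser thinks each element contains. The helper content_of is made up for this example.

```python
import xml.etree.ElementTree as ET


def content_of(xml_text):
    """Parse a one-element document and return its text content (None if empty)."""
    return ET.fromstring(xml_text).text


for text in [
    "<tagname />",
    "<tagname/>",
    '<tagname attname="attvalue" />',
    "<tagname></tagname>",
    "<tagname> </tagname>",  # one space between the tags
]:
    print(repr(text), "has content", repr(content_of(text)))
```

The first four report no content at all. The last one is still well-formed XML, but the parser reports a one-space string as its content, so it is no longer an empty element.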
“Root element” • It is NOT an XML document unless the whole thing (minus the XML declaration, if present) is enclosed inside ONE element. • … that is, ONE start-tag/end-tag pair. • That outermost element enclosing everything else is the “root element” of the document. • HTML5 has a consistent root element in every document. What is it? • (Some XML languages have more than one element allowed to be the root element. RDF/XML is one such language. HTML is not.) • NOTHING BUT WHITESPACE (spaces, tabs, hard returns) after the root element’s end tag… and some XML checkers will even complain about whitespace.
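The one-root-element rule is easy to test for yourself. A minimal sketch, assuming Python's standard-library parser; the element names root, a, and b and the helper parses are invented for the example.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if xml_text is accepted as a complete XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


one_root = '<?xml version="1.0"?>\n<root><a /><b /></root>\n'  # declaration, one root, trailing newline
two_roots = "<a /><b />"                                        # two top-level elements
text_after_root = "<root />oops"                                # stray text after the root's end

print("one root:", parses(one_root))
print("two roots:", parses(two_roots))
print("text after root:", parses(text_after_root))
```

This particular parser tolerates the trailing newline after the root element. As the slide says, not every checker is that forgiving.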
Root elements • As I mentioned, RDF/XML allows different root elements. • Which one you use depends on whether you’re describing JUST ONE THING or MORE THAN ONE THING in your RDF/XML document. • Just one thing: root element can be <rdf:Description>. • One or (especially!) more things: root element is <rdf:RDF>. As you’d expect, each thing you describe then gets its own <rdf:Description> element. • (Nested “things,” like the ones we saw doing microdata, don’t count here.)
Prefixes to namespaces • RDF/XML declares URI prefixes too, as XML attributes. XML calls these “namespaces.” • Same URIs, same conventional prefixes, just different syntax. • Turtle: @prefix ex: <http://example.com/> . • XML: xmlns:ex="http://example.com/" • No space before the prefix in XML! The colon moved too! • By convention, put all xmlns attributes on the root element, whichever one you’re using. • (This often isn’t strictly necessary per XML rules, but most real-world XML does it and it always works, so.)
Where does the namespace prefix go? • (once you’ve declared it, that is) • In front of tag names, colon-separated. • <ex:tagname>some text here</ex:tagname> • In front of attribute names, again with the colon. • <ex:tagname ex:attname="attvalue" /> • It looks like prefixes did in Turtle, mostly.
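If you want to see what a prefix actually does, a namespace-aware parser will show you: it swaps the prefix for the full URI. A rough sketch with Python's standard-library parser; ex:, tagname, and attname are placeholders, and the helper parses is made up for the example.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if xml_text is accepted by the parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


declared = (
    '<ex:tagname xmlns:ex="http://example.com/" ex:attname="attvalue">'
    "some text here</ex:tagname>"
)
undeclared = '<ex:tagname ex:attname="attvalue">some text here</ex:tagname>'

root = ET.fromstring(declared)
# ElementTree writes an expanded name as {namespace-URI}local-name.
print(root.tag)
print(root.attrib)
print("prefix used without a declaration parses?", parses(undeclared))
```

The tag prints as {http://example.com/}tagname, and the attribute name gets the same treatment. The undeclared version is rejected outright, which is why the xmlns declarations are not optional.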
Subject, property, value? • In predictable places, but non-trivial to locate. • The more deeply-nested the RDF/XML gets, the harder it is to disentangle individual triples. • I know. I’m sorry. • We’ll do this a bit at a time.
(slide convention note) • The prefix/namespace declarations are visual clutter. In the examples that follow, I am leaving them off. • Pretend they’re there, okay? And don’t try this yourself. The declarations are not optional!
Subject first! • <rdf:Description rdf:about="{subject URI}">{more RDF/XML}</rdf:Description> • Put another way: if you see an rdf:about attribute, the attribute value is ALWAYS the subject of one or more triples. • You can NEVER abbreviate the URI here, sorry. • Dumb and inconsistent? Yeah, but it’s the rule. • Knowing this makes reading RDF/XML, like, 1000% easier. • I’m embarrassed to tell you how long it was before I figured this out.
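The “find the rdf:about, find the subject” trick can be automated. Here is a minimal sketch that pulls every rdf:about value out of a small document with Python's standard-library parser. The two example.com URIs are invented; only the rdf: and foaf: namespace URIs are the real ones.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical document: the example.com subject URIs are made up.
doc = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://example.com/isabel-allende">
    <foaf:gender>female</foaf:gender>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.com/eva-luna" />
</rdf:RDF>"""

root = ET.fromstring(doc)

# Every rdf:about value is the subject of one or more triples.
subjects = [
    description.get(f"{{{RDF}}}about")
    for description in root.iter(f"{{{RDF}}}Description")
]
for subject in subjects:
    print(subject)
```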
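Blank-node subjects • <rdf:Description> with no rdf:about attribute. Probably commonest. • <rdf:Description> with one of those weird blank-node identifiers (minus the _: prefix) as value of an rdf:nodeID attribute. That’s the attribute the RDF/XML spec defines for blank-node identifiers; rdf:about expects a URI.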
Properties • Usually (not always!) subelements of <rdf:Description>. • In other words: the property goes inside <rdf:Description> with the property name as tag name. • In other other words: if you see a namespace/URI prefix that’s not rdf:, it’s usually on a property. • Some properties we’ve used with Isabel Allende (leaving them as empty tags and ignoring values for now):
Property with URI value • Empty subelement of <rdf:Description> • Property is the subelement’s tag name, as usual. • URI is value of an rdf:resource attribute on the subelement. • NOT THE ELEMENT CONTENT. NOT EVER! NO URIs AS ELEMENT CONTENT! • And you can’t abbreviate the URI here either. Abbreviation only works on tag names! • Mnemonic: the value of rdf:resource is ALWAYS a triple value. • E.g. Isabel Allende wrote Eva Luna.
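Here is roughly what “Isabel Allende wrote Eva Luna” could look like, plus a few lines that read the triple back out. This is a sketch only: the example.com URIs and the ex:wrote property are invented for illustration, and the hand-rolled extraction handles just this one pattern.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical URIs and a made-up ex:wrote property.
doc = """<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:ex="http://example.com/terms/"
                 rdf:about="http://example.com/isabel-allende">
  <ex:wrote rdf:resource="http://example.com/eva-luna" />
</rdf:Description>"""


def expand(name):
    """Turn ElementTree's {namespace}local form into a plain URI string."""
    return name[1:].replace("}", "", 1)


node = ET.fromstring(doc)
subject = node.get(f"{{{RDF}}}about")

triples = []
for prop in node:
    value = prop.get(f"{{{RDF}}}resource")  # URI values live here, never in the text
    if value is not None:
        triples.append((subject, expand(prop.tag), value))

for triple in triples:
    print(triple)
```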
Property with literal value 1 • Subelement of <rdf:Description> • Property is the subelement’s tag name. • Literal is the text of the subelement. • If a subelement of <rdf:Description> has text, that text is ALWAYS A LITERAL. Never ever ever put a URI there! You just de-URIzed it! • Yes, this is stupid and I’m sorry! • E.g. Isabel Allende is female.
Property with literal value 2 • Attribute on <rdf:Description>. • This is the exception to “property is a subelement.” • Property is the attribute name. • Literal is the attribute value. • E.g. Isabel Allende is female. <rdf:Description rdf:about="{subject URI}" foaf:gender="female">
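The two literal forms say the same thing, and you can check that. The sketch below writes “Isabel Allende is female” both ways and reads each back with Python's standard-library parser. The subject URI is invented, and literal_triples is a hand-rolled helper for this example, not a real RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NAMESPACES = (
    'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:foaf="http://xmlns.com/foaf/0.1/"'
)
SUBJECT = "http://example.com/isabel-allende"  # made-up URI

# Form 1: the literal is the text of a property subelement.
element_form = (
    f'<rdf:Description {NAMESPACES} rdf:about="{SUBJECT}">'
    "<foaf:gender>female</foaf:gender>"
    "</rdf:Description>"
)
# Form 2: the literal is an attribute value on rdf:Description itself.
attribute_form = (
    f'<rdf:Description {NAMESPACES} rdf:about="{SUBJECT}" foaf:gender="female" />'
)


def expand(name):
    """Turn ElementTree's {namespace}local form into a plain URI string."""
    return name[1:].replace("}", "", 1)


def literal_triples(xml_text):
    """Collect (subject, property, literal) triples from one rdf:Description."""
    node = ET.fromstring(xml_text)
    subject = node.get(f"{{{RDF}}}about")
    triples = set()
    for prop in node:  # form 1: subelements with text
        if prop.text and prop.text.strip():
            triples.add((subject, expand(prop.tag), prop.text))
    for name, value in node.attrib.items():  # form 2: non-rdf: attributes
        if not name.startswith("{" + RDF + "}"):
            triples.add((subject, expand(name), value))
    return triples


print(literal_triples(element_form))
print(literal_triples(attribute_form))
print("same triple?", literal_triples(element_form) == literal_triples(attribute_form))
```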
Value is more triples • (Acts kind of like microdata!) • Value is an <rdf:Description> subelement of the property element. • This value can then be the subject of more triples... • Infinitely nestable. Infinitely confusing to read.
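Nesting is easier to follow once you flatten it back into triples. The sketch below walks a nested document recursively with Python's standard-library parser. Assumptions: the example.com URIs and the ex:wrote and ex:title properties are invented, blank nodes get made-up _:b labels, and attribute-form literals are skipped to keep it short. It is a reading aid, not a conforming RDF/XML parser.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DESCRIPTION = "{" + RDF + "}Description"
ABOUT = "{" + RDF + "}about"
RESOURCE = "{" + RDF + "}resource"

# Hypothetical data: URIs and ex: properties are made up.
doc = """<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:ex="http://example.com/terms/"
                 rdf:about="http://example.com/isabel-allende">
  <ex:wrote>
    <rdf:Description rdf:about="http://example.com/eva-luna">
      <ex:title>Eva Luna</ex:title>
    </rdf:Description>
  </ex:wrote>
</rdf:Description>"""


def expand(name):
    """Turn ElementTree's {namespace}local form into a plain URI string."""
    return name[1:].replace("}", "", 1)


def walk(node, triples, counter):
    """Append the triples found under one rdf:Description; return its subject."""
    subject = node.get(ABOUT) or "_:b" + str(next(counter))  # no rdf:about: blank node
    for prop in node:
        nested = prop.find(DESCRIPTION)
        if nested is not None:
            value = walk(nested, triples, counter)  # the value is itself a subject
        elif prop.get(RESOURCE) is not None:
            value = prop.get(RESOURCE)              # URI value
        else:
            value = prop.text                       # literal value
        triples.append((subject, expand(prop.tag), value))
    return subject


triples = []
walk(ET.fromstring(doc), triples, itertools.count())
for triple in triples:
    print(triple)
```

Two triples come out: Isabel Allende wrote Eva Luna, and Eva Luna has the title “Eva Luna”. The inner one is listed first because the walk finishes the nested description before recording the outer property.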
There’s more. • There are ways to “abbreviate” more stuff, make lists, etc. • Since almost all RDF/XML is machine-generated, though, what I’ve just shown you is 90%+ of what you’ll see in the wild. • Life is too short for the rest of it. • Which is another way of saying “if you need more, you’ll pick it up when you see it.”