Introduction slides to Linked Open Data

Data → open and linked
Pieter Colpaert

Programme
1. The basics: Data → Open Data → Linked Data
2. Linked Open Data: How to publish data?

Data
Wikipedia says:
English (disambiguation): data is uninterpreted information.
English (computing): data is any sequence of symbols given meaning by specific acts of interpretation.
Dutch: data is the plural of datum, which is an observation of a fact.

What’s data quality?

What’s interoperability?

Levels of interoperability (diagram labels): querying, syntactic, object, semantic, technical, legal, process
Could the data governance processes be merged?
Are you legally allowed to merge 2 datasets?
Can you connect the communication channels? E.g., merge a dataset published on a CD with a dataset published on a floppy disk.
How easy is it to ask certain questions across the borders of the datasets?
How interoperable are the serialisation formats? E.g., JSON vs. PDF.
What can you request from the server?
Do the words in the one dataset mean the same as the words in the other?

Open Data
Because non-personal data increases in value when others reuse it

Data on the web falls into three zones: reuse is allowed, reuse in a gray zone, unauthorised reuse

OpenDefinition.org

How can we find open data?
It’s made available through open data portals: http://data.gov.uk, http://datahub.io, http://open-data.europa.eu, http://data.gent.be, …
Via links in existing datasets, e.g., http://dbpedia.org/resource/Ghent

Linked Data
Because it is impossible to store all the world’s knowledge on one machine

Serialisations
Table / CSV / Spreadsheet:
name    type     same as  location
iMinds  company  IBBT     Gaston Crommenlaan 8
JSON:
{ "iMinds" : { "type" : "company", "same as" : "IBBT", "location" : "Gaston Crommenlaan 8" } }
XML:
<iMinds> <type>company</type> <sameas>IBBT</sameas> <location>Gaston Crommenlaan 8</location> </iMinds>

Triple structure
Table / CSV / Spreadsheet:
name    type     same as  location
iMinds  company  IBBT     Gaston Crommenlaan 8
Triples:
<iMinds> <type> <company> .
<iMinds> <sameas> <IBBT> .
<iMinds> <location> "Gaston Crommenlaan 8" .
JSON:
{ "iMinds" : { "type" : "company", "same as" : "IBBT", "location" : "Gaston Crommenlaan 8" } }
XML:
<iMinds> <type>company</type> <sameas>IBBT</sameas> <location>Gaston Crommenlaan 8</location> </iMinds>
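The step from a table row to triples can be sketched in a few lines of Python. This is only an illustration: the function name `row_to_triples` and the choice of the `name` column as the subject are assumptions, not something the slides prescribe.

```python
# One spreadsheet row about iMinds, as shown on the slide.
row = {
    "name": "iMinds",
    "type": "company",
    "same as": "IBBT",
    "location": "Gaston Crommenlaan 8",
}


def row_to_triples(row, subject_column="name"):
    """Turn one table row into (subject, predicate, object) triples.

    Assumption for this sketch: one column identifies the thing the row
    is about, and every other column becomes a predicate.
    """
    subject = row[subject_column]
    return [
        (subject, column, value)
        for column, value in row.items()
        if column != subject_column
    ]


triples = row_to_triples(row)
for triple in triples:
    print(triple)
```

Each remaining column yields exactly one triple, so a row with three value columns becomes three facts that can be stored or published separately.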

Linked data
Facts spread over Machine 1, Machine 2 and Machine 3 on the World Wide Web:
iMinds same as IBBT
iMinds is a company
IBBT located at Gaston Crommenlaan 8

Problem
The word company is ambiguous. What about “is a”? And what about “iMinds”?
How can we make sure that machines understand each other? → semantic interoperability

Solution: Uniform Resource Identifiers (URIs)
iMinds → http://data.kbodata.be/organisation/0866_386_380#id
is a → http://www.w3.org/1999/02/22-rdf-syntax-ns#type
Company → http://www.w3.org/ns/regorg#RegisteredOrganization
A triple is an atomic piece of data (a datum or a fact) that cannot be misunderstood at machine level in a Web context.
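As a rough sketch, the substitution of words by URIs can be written out in Python. The three URIs are the ones from the slide; the lookup table `URIS` and the helper `to_ntriple` are hypothetical names, and the output line follows the N-Triples convention of wrapping each URI in angle brackets and ending with a dot.

```python
# Term-to-URI mapping taken from the slide.
URIS = {
    "iMinds": "http://data.kbodata.be/organisation/0866_386_380#id",
    "is a": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "Company": "http://www.w3.org/ns/regorg#RegisteredOrganization",
}


def to_ntriple(subject, predicate, obj):
    """Replace each ambiguous word by its URI and emit one N-Triples line."""
    return f"<{URIS[subject]}> <{URIS[predicate]}> <{URIS[obj]}> ."


line = to_ntriple("iMinds", "is a", "Company")
print(line)
```

A word that has no URI in the table raises a `KeyError`, which mirrors the point of the slide: a term without an agreed identifier cannot be exchanged unambiguously.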

iMinds → is a → company
iMinds → http://data.kbodata.be/organisation/0866_386_380#id
is a → http://www.w3.org/1999/02/22-rdf-syntax-ns#type
Company → http://www.w3.org/ns/regorg#RegisteredOrganization

Company register
iMinds → is a → company
Open Knowledge Belgium, TVH, Maes, …

company register address database … Government Service X

The Linked Open Data cloud

Summary
New terms: data quality, data interoperability, triples, open data, linked open data cloud
Linked Open Data means: making your data more interoperable with other datasets on the web by using URIs as identifiers and triples as atomic building blocks

Data publishing
iMinds → http://data.kbodata.be/organisation/0866_386_380#id
is a → http://www.w3.org/1999/02/22-rdf-syntax-ns#type
Company → http://www.w3.org/ns/regorg#RegisteredOrganization
E.g., visit the links above.

Linked Data principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When looking up URIs, provide useful information
4. Include links to other URIs for discoverability
Only important if you’re defining new URIs. Not important if you’re publishing facts by reusing identifiers.

E.g., I’m launching a new company
{mynewcompany} → http://{mynewcompany}.be/#org
is a → http://www.w3.org/1999/02/22-rdf-syntax-ns#type
Company → http://www.w3.org/ns/regorg#RegisteredOrganization
http://{mynewcompany}.be/#org is an identifier for your company. The semantics are controlled by you.

Mind the ambiguity

E.g., I’m launching a new company
{mynewcompany} → http://{mynewcompany}.be/#org
is a → http://www.w3.org/1999/02/22-rdf-syntax-ns#type
Company → http://www.w3.org/ns/regorg#RegisteredOrganization
{mynewcompany} → http://{mynewcompany}.be/#org
has a homepage → http://xmlns.com/foaf/0.1/homepage
http://{mynewcompany}.be/
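One way to read the ambiguity warning is that the company and its homepage need different identifiers, even though both are served from the same address. The sketch below illustrates this with Python's standard `urllib.parse.urldefrag`; the domain `mynewcompany.example` is a hypothetical stand-in for `{mynewcompany}.be`.

```python
from urllib.parse import urldefrag

# Hypothetical stand-ins for http://{mynewcompany}.be/#org and its homepage.
company_uri = "http://mynewcompany.example/#org"
homepage_uri = "http://mynewcompany.example/"

# The fragment (#org) is not part of the document address that gets fetched,
# so looking up the company URI retrieves the homepage document.
document, fragment = urldefrag(company_uri)
print(document)  # address of the document that is actually fetched
print(fragment)  # the part that names the company inside that document

# Two facts, about two different things: the organisation and its web page.
facts = [
    (company_uri,
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://www.w3.org/ns/regorg#RegisteredOrganization"),
    (company_uri,
     "http://xmlns.com/foaf/0.1/homepage",
     homepage_uri),
]
for subject, predicate, obj in facts:
    print(f"<{subject}> <{predicate}> <{obj}> .")
```

Because the two identifiers differ only by the fragment, a statement such as "is a registered organisation" lands on the company rather than on the web page.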

What URIs should I use?
http://lov.okfn.org
The place to find easy-to-reuse open vocabularies

Publishing methods
A couple of examples:
1. Data dumps
2. Triples within HTML pages
3. JSON → JSON-LD web services
4. Triple pattern fragments

Data dumps
1 big file with a list of triples
http://wiki.dbpedia.org/Downloads2014 → all facts in 1 file
Pro: can be imported into your own system without trouble
Contra: hard to keep up to date
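Importing such a dump can be sketched as reading one triple per line. The example below assumes the dump uses an N-Triples-like syntax and handles only the simplest cases (URIs in angle brackets, plain quoted literals without escapes, language tags or datatypes). The two sample lines and the predicate `http://example.org/location` are made up for illustration; a real import would use a proper RDF parser.

```python
import re

# Simplified pattern: <subject> <predicate> (<object> | "literal") .
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(<[^>]*>|"[^"]*")\s*\.\s*$')

# Hypothetical two-line dump, reusing the URIs from the earlier slides.
dump = """\
<http://data.kbodata.be/organisation/0866_386_380#id> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/regorg#RegisteredOrganization> .
<http://data.kbodata.be/organisation/0866_386_380#id> <http://example.org/location> "Gaston Crommenlaan 8" .
"""


def load_dump(lines):
    """Parse dump lines into (subject, predicate, object) tuples."""
    triples = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"unsupported line: {line!r}")
        subject, predicate, obj = match.groups()
        # Drop the surrounding <...> or "..." (this sketch does not keep
        # track of whether the object was a URI or a literal).
        triples.append((subject, predicate, obj[1:-1]))
    return triples


triples = load_dump(dump.splitlines())
print(len(triples), "triples loaded")
```

The downside named on the slide shows up directly: the list is a snapshot, and staying current means downloading and parsing the whole file again.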

Triples within HTML
Annotate HTML with RDFa

Triples within HTML
Pro: after the data has been crawled, the data is machine interpretable

JSON API
Old manner: GET a URL, and the data is immediately available for your app
Pro: easy to refresh → real-time data
http://{address to API document on Empire State}

JSON-LD API
Add “context”:
Each word is mapped to a URI
Each fact becomes a triple
Pro: API responses are disambiguated
Pro: 2 APIs with a similar context are semantically interoperable
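What the context does can be sketched by hand in Python. This is a deliberately naive expansion, far simpler than the actual JSON-LD processing algorithm: it only looks up each top-level key in `@context` and uses `@id` as the subject. The document, the `mynewcompany.example` domain and the function name `to_triples` are hypothetical.

```python
# A hypothetical API response with a JSON-LD style context.
doc = {
    "@context": {
        "type": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
        "homepage": "http://xmlns.com/foaf/0.1/homepage",
    },
    "@id": "http://mynewcompany.example/#org",
    "type": "http://www.w3.org/ns/regorg#RegisteredOrganization",
    "homepage": "http://mynewcompany.example/",
}


def to_triples(doc):
    """Map each plain key to its URI from the context; one fact per key."""
    context = doc["@context"]
    subject = doc["@id"]
    return [
        (subject, context[key], value)
        for key, value in doc.items()
        if not key.startswith("@")  # skip @context and @id themselves
    ]


triples = to_triples(doc)
for triple in triples:
    print(triple)
```

Two APIs that map their own key names to the same URIs would produce identical predicates here, which is the sense in which a shared context makes them semantically interoperable.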

Triple Pattern Fragments server
Allows basic questions: ?subject → ?predicate → ?object, e.g., iMinds → is a → company
Pro: allows apps to ask complex queries over the Web of data
e.g., http://fragments.dbpedia.org
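The kind of basic question such a server answers can be imitated in memory. The sketch below is not the HTTP interface of a Triple Pattern Fragments server; it only shows the matching idea, where every position of the pattern is either a fixed value or a variable (here `None`). The function name `match_pattern` and the sample data are assumptions for illustration.

```python
# Sample facts, using the plain words from the earlier slides.
DATA = [
    ("iMinds", "is a", "company"),
    ("iMinds", "same as", "IBBT"),
    ("IBBT", "located at", "Gaston Crommenlaan 8"),
]


def match_pattern(triples, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a variable."""
    pattern = (subject, predicate, obj)
    return [
        triple
        for triple in triples
        if all(p is None or p == value for p, value in zip(pattern, triple))
    ]


# ?subject → is a → company
companies = match_pattern(DATA, predicate="is a", obj="company")
# iMinds → ?predicate → ?object
about_iminds = match_pattern(DATA, subject="iMinds")
print(companies)
print(about_iminds)
```

A client can combine the answers to several such simple patterns itself, for example by feeding the subjects found by one pattern into the next, which keeps the server's job small.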

Triple Pattern Fragments clients
Demo: http://fragments.dbpedia.org

Questions?
