Linked Open Data - A Step Towards A Semantic Web

Bebo White, SLAC National Accelerator Laboratory
Linked Open Data - A Step Towards a Semantic Web
Web Camp III, Stanford, July 2011

“Developments in science and information processing have changed the meaning
of the verb, ‘to know.’ It used to mean ‘having information stored in one’s memory.’ It now means the process of having access to information and knowing how to use it.” ---Herbert Simon

Status of the Semantic Web
• For years we have heard about it! Is it real?
• Where are the applications for it? Is there a “killer app”?
• Who is using it?
• What role (if any) will it play in the Future Web?
The Semantic Web is alive and well in Linked Open Data (LOD)!

Perceptions of Web Content
• The Web is generally thought of as being composed of pages and documents
• We have been able to insert some data
  • Images: <img src="…">
  • Multimedia
• Web 2.0 mashups provided a new way of thinking about a “Web of Data”, but the data was awkward to obtain
  • APIs
  • “Screen-scraping”

The Web of Documents   Analogy   A global filesystem
  Designed for   Human consumption   Primary objects   Documents (or sub-parts of)   Links between   Documents (or sub-parts of)   Degree of structure in objects   Fairly low   Semantics of content and links   Implicit

The Web of Documents: Issues
• Simplicity
  • Loosely structured data, untyped links, disconnected data
• Integration
  • “Show me all the publications from HKU PhD students in Computer Science”
• Querying
  • “Which papers have I written with colleagues outside the US?”

The Web of Linked Documents

“Data Silos” on the Web

A World Wide Network of Data Silos

The Web of Linked Data   Analogy   A global
database   Designed for   Machines first, humans later   Primary objects   Things (or descriptions of things)   Links between   Things   Degree of structure in (descriptions of ) things   High   Semantics of content and links   Explicit

The Web of Linked Data
Don’t just link the documents, link the things

Linked Data is
• A way of publishing data on the Web that
  • Encourages reuse
  • Reduces redundancy
  • Maximizes its (real and potential) inter-connectedness
  • Enables network effects to add value to data

Linked Data Technology Stack
• URIs – Uniform Resource Identifiers
• HTTP – Hypertext Transfer Protocol
• RDF – Resource Description Framework
• (RDFS/OWL) – RDF Schema/Web Ontology Language

URIs – Not Just for Web Pages
• “A Uniform Resource Identifier (URI) provides a simple and extensible means for identifying a resource” – RFC 3986
• Many different schemes – http://, ftp://, tel:, urn:, mailto:
• Some URIs for “real world” things
  • http://www.bebowhite.com/
  • http://dbpedia.org/page/University_of_Hong_Kong
  • http://sws.geonames.org/1819729/
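To make the "many different schemes" point concrete, here is a minimal sketch that splits a few URIs into their scheme using Python's standard library. The two http:// URIs come from the slide; the tel:, mailto: and urn: values are made-up placeholders chosen only to illustrate those schemes.

```python
from urllib.parse import urlsplit

# The http:// URIs appear on the slide; the remaining values are
# hypothetical placeholders that only illustrate the other schemes.
uris = [
    "http://www.bebowhite.com/",
    "http://sws.geonames.org/1819729/",
    "tel:+1-555-0100",
    "mailto:someone@example.org",
    "urn:isbn:3642203914",
]


def scheme_of(uri: str) -> str:
    """Return the scheme component (the part before the first ':')."""
    return urlsplit(uri).scheme


schemes = [scheme_of(u) for u in uris]
for uri, scheme in zip(uris, schemes):
    print(f"{scheme:8} {uri}")
```

Only the http:// URIs can be looked up with a Web client, which is why the Linked Data principles later in the deck ask for HTTP URIs specifically.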

HTTP   Data access mechanism   Using http:// URIs to
identify things allows people to reference these things
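As a rough illustration of HTTP as the data access mechanism, the sketch below builds (but does not send) a request that asks for RDF instead of HTML through the Accept header. Whether a given server honours this kind of content negotiation is an assumption; the helper name rdf_request is hypothetical, and the URI is one of those listed earlier in the deck.

```python
from urllib.request import Request


def rdf_request(uri: str) -> Request:
    """Build a GET request that asks the server for an RDF/XML representation.

    Assumption: the server supports content negotiation for this URI.
    Nothing is sent over the network here.
    """
    return Request(uri, headers={"Accept": "application/rdf+xml"})


req = rdf_request("http://sws.geonames.org/1819729/")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would perform the actual lookup; that step is left out so the sketch runs without network access.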

RDF   A data format for describing things and their
interrelationships   Standardized (XML)   Easily parsed by machines

FOAF: Friend of a Friend
• An RDF vocabulary for describing people:
  • Identities
  • Interests
  • Affiliations
  • Social networks
  • Etc.

Try it yourself: http://www.ldodds.com/foaf/foaf-a-matic

Imagine…
• A “Web” where
  • Documents are available for download on the Internet
  • But there would be no hyperlinks among them

And the Problem is Real

Data on the Web is Not Enough
• Need a proper infrastructure for a real Web of Data
  • Data is available on the Web
    • Accessible via standard Web technologies
  • Data are interlinked over the Web
    • i.e., data can be integrated over the Web
• This is where Semantic Web technologies come in

A Web of Data Supports Innovation

A Nice Usage of UK Government Data

Start with a Book

Simplified Bookstore Data

ID              | Author | Title                                                     | Publisher | Year
ISBN 3642203914 | id_xyz | Social Media Tools and Platforms in Learning Environments | id_qpr    | 2011

ID     | Name        | Homepage
id_xyz | White, Bebo | http://www.bebowhite.com

ID     | Publisher’s name | City
id_qpr | Springer         | New York

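Exported Data as a Set of Relations
[Figure: graph around the book node http://…isbn/3642203914, with edges labelled a:title, a:year, a:author, a:name, a:homepage, a:p_name and a:city leading to the values “Social Media Tools…”, 2011, “White, Bebo”, http://www.bebowhite.com, “Springer” and “New York”]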

Notes on Exporting the Data (1)
• Relations form a graph
  • The nodes refer to the “real” data or contain some literal
  • How the graph is represented in the machine is immaterial, to a first approximation

Notes on Exporting the Data (2)
• Data export does not necessarily mean physical conversion of the data
  • Relations can be generated on-the-fly at query time
    • Via SQL “bridges”
    • Scraping HTML pages
    • Extracting data from Excel sheets
    • etc.
• One can export part of the data
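The idea of generating relations on the fly through an SQL "bridge" can be sketched as follows: the bookstore rows stay in a relational database, and a generator turns query results into (subject, property, object) relations only when asked. This is an illustration under stated assumptions, not the method used in the talk: the table layout mirrors the simplified bookstore data, the a:… property names are taken from the slides, and the example.org base URIs are placeholders because the real ones are elided in the deck.

```python
import sqlite3

# In-memory stand-in for the bookstore database (layout assumed from the slides).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (isbn TEXT, author TEXT, title TEXT, publisher TEXT, year INTEGER);
CREATE TABLE authors (id TEXT, name TEXT, homepage TEXT);
INSERT INTO books VALUES ('3642203914', 'id_xyz', 'Social Media Tools and Platforms in Learning Environments', 'id_qpr', 2011);
INSERT INTO authors VALUES ('id_xyz', 'White, Bebo', 'http://www.bebowhite.com');
""")

# Placeholder URI prefixes; the deck does not give the real ones.
BOOK_BASE = "http://example.org/isbn/"
AUTHOR_BASE = "http://example.org/author/"


def book_relations(db):
    """Yield relations at query time; no converted copy of the data is stored."""
    query = """
        SELECT b.isbn, b.title, b.year, a.id, a.name, a.homepage
        FROM books AS b JOIN authors AS a ON b.author = a.id
    """
    for isbn, title, year, author_id, name, homepage in db.execute(query):
        book = BOOK_BASE + isbn
        author = AUTHOR_BASE + author_id
        yield (book, "a:title", title)
        yield (book, "a:year", year)
        yield (book, "a:author", author)
        yield (author, "a:name", name)
        yield (author, "a:homepage", homepage)


relations = list(book_relations(conn))
for relation in relations:
    print(relation)
```

Exporting only part of the data, as the last bullet notes, amounts to narrowing the SELECT statement.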

RDF Triples (1)
• Formalize the data about the book
  • We “connected” the data…
  • But a simple connection is not enough… data should be named somehow
  • Hence the RDF Triples: a labelled connection between two resources

RDF Triples (2)
• An RDF Triple (s,p,o) is such that:
  • “s”, “p” are URI-s, i.e., resources on the Web; “o” is a URI or a literal
  • “s”, “p”, and “o” stand for “subject”, “property”, and “object”
  • here is the complete triple: (<http://…isbn…6682>, <http://…/original>, <http://…isbn…409X>)
• RDF is a general model for such triples
• With machine readable formats like RDF/XML, Turtle, N3, RDFa, …
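A triple as defined above can be modelled directly as a small record. The sketch below is one possible representation, not a library API: the Triple class and to_ntriples helper are hypothetical names, the example.org URIs stand in for the elided ones on the slide, and the output follows the simple one-line-per-triple N-Triples layout with only basic escaping of literals.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str                  # a URI
    predicate: str                # a URI (the "property" on the slide)
    obj: str                      # a URI or a literal value
    obj_is_literal: bool = False


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as a single N-Triples style line (basic escaping only)."""
    if t.obj_is_literal:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


# Placeholder URIs: the deck elides the real ones.
link = Triple(
    "http://example.org/isbn/2020386682",
    "http://example.org/terms/original",
    "http://example.org/isbn/3642203914",
)
title = Triple(
    "http://example.org/isbn/2020386682",
    "http://example.org/terms/titre",
    "Outils de Medias Sociaux",
    obj_is_literal=True,
)

print(to_ntriples(link))
print(to_ntriples(title))
```

The flag on the object is the only structural difference between the two triples, which matches the rule that "s" and "p" are always URIs while "o" may be either.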

RDF Triples (3)
• Resources can use any URI
  • http://www.example.org/file.html#home
  • http://www.example.org/file2.xml#xpath(//q[@a=b])
  • http://www.example.org/form?a=b&c=d
• RDF triples form a directed, labeled graph (the best way to think about them!)

A Simple RDF Example (in RDF/XML)
<rdf:Description rdf:about="http://…/isbn/2020386682">
  <f:titre xml:lang="fr">Outils de Medias Sociaux</f:titre>
  <f:original rdf:resource="http://…/isbn/3642203914"/>
</rdf:Description>
(Note: namespaces are used to simplify the URI-s)
[Figure: the node http://…isbn/2020386682 linked to the literal “Outils de Medias Sociaux…” and to the node http://…isbn/3642203914]
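To show how a machine might read the RDF/XML example, here is a minimal sketch using only Python's standard XML parser. It handles just the flat rdf:Description pattern shown on the slide and is not a general RDF/XML parser; a dedicated RDF library would be the usual choice. The example.org URIs and the f: namespace URI are placeholders, since the slide elides the real ones.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Complete document built around the slide's fragment; URIs are placeholders.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:f="http://example.org/terms/">
  <rdf:Description rdf:about="http://example.org/isbn/2020386682">
    <f:titre xml:lang="fr">Outils de Medias Sociaux</f:titre>
    <f:original rdf:resource="http://example.org/isbn/3642203914"/>
  </rdf:Description>
</rdf:RDF>"""


def simple_triples(rdf_xml: str):
    """Yield (subject, property, object) for flat rdf:Description elements only."""
    root = ET.fromstring(rdf_xml)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes tags as "{namespace}local"; join them into one URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            resource = prop.get(f"{{{RDF_NS}}}resource")
            # A property either points to another resource or holds a literal.
            obj = resource if resource is not None else prop.text
            yield (subject, predicate, obj)


triples = list(simple_triples(DOC))
for triple in triples:
    print(triple)
```

The two child elements become two edges leaving the same subject node, which is the small graph the slide draws.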

RDF in Programming Practice
• For example, using Java+Jena (HP’s Bristol Lab):
  • A “Model” object is created
  • The RDF file is parsed and results stored in the Model
  • The Model offers methods to retrieve:
    • triples
    • (property,object) pairs for a specific subject
    • (subject,property) pairs for a specific object
    • etc.
  • The rest is conventional programming…
• Similar tools exist in Python, PHP, etc.
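The retrieval methods listed above can be imitated with a toy in-memory store. The class below is a hypothetical illustration of the idea only; it is not Jena's API, and its method names are invented for this sketch. The example.org URIs are placeholders.

```python
class Model:
    """Toy triple store offering the three kinds of lookup named on the slide."""

    def __init__(self):
        self._triples = set()

    def add(self, subject: str, prop: str, obj: str) -> None:
        self._triples.add((subject, prop, obj))

    def triples(self):
        """All triples, sorted for stable output."""
        return sorted(self._triples)

    def properties_of(self, subject: str):
        """(property, object) pairs for a specific subject."""
        return sorted((p, o) for s, p, o in self._triples if s == subject)

    def pointing_to(self, obj: str):
        """(subject, property) pairs for a specific object."""
        return sorted((s, p) for s, p, o in self._triples if o == obj)


# Placeholder URIs; property names follow the a:… labels used in the slides.
BOOK = "http://example.org/isbn/3642203914"
AUTHOR = "http://example.org/author/id_xyz"

model = Model()
model.add(BOOK, "a:title", "Social Media Tools and Platforms in Learning Environments")
model.add(BOOK, "a:author", AUTHOR)
model.add(AUTHOR, "a:name", "White, Bebo")

print(model.properties_of(BOOK))
print(model.pointing_to(AUTHOR))
```

Once the triples are in such a structure, the rest is indeed conventional programming: loops, filters and joins over tuples.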

The Rough Structure of Data Integration
• Map the various data onto an abstract data representation
  • Make the data independent of its internal representation…
• Merge the resulting representations
• Start making queries on the whole!
  • Queries not possible on the individual data sets
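The map, merge and query steps above can be sketched with two small sets of triples that share one URI. Everything here is illustrative: the data sets reuse the book examples from earlier slides, the example.org URIs are placeholders, and translations_by is a hypothetical helper rather than a standard query language such as SPARQL.

```python
# Placeholder URIs; the deck elides the real ones.
BOOK_EN = "http://example.org/isbn/3642203914"
BOOK_FR = "http://example.org/isbn/2020386682"
AUTHOR = "http://example.org/author/id_xyz"

# Data set 1: the English bookstore, already mapped to triples.
english = {
    (BOOK_EN, "a:title", "Social Media Tools and Platforms in Learning Environments"),
    (BOOK_EN, "a:author", AUTHOR),
    (AUTHOR, "a:name", "White, Bebo"),
}

# Data set 2: a French catalogue that links back to the original via its URI.
french = {
    (BOOK_FR, "f:titre", "Outils de Medias Sociaux"),
    (BOOK_FR, "f:original", BOOK_EN),
}

# Merging is a plain set union because both sides use the same URI for the book.
merged = english | french


def translations_by(author_name: str, graph) -> list:
    """French titles of books whose original was written by the named author."""
    authors = {s for s, p, o in graph if p == "a:name" and o == author_name}
    originals = {s for s, p, o in graph if p == "a:author" and o in authors}
    translated = {s for s, p, o in graph if p == "f:original" and o in originals}
    return sorted(o for s, p, o in graph if p == "f:titre" and s in translated)


print(translations_by("White, Bebo", merged))   # answered only by the merged graph
print(translations_by("White, Bebo", english))  # empty: no French titles here
print(translations_by("White, Bebo", french))   # empty: no author names here
```

The query needs the author name from one source and the translated title from the other, so neither data set can answer it alone.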

Data Merging with RDF
• Mix schemas/vocabularies within one document
• Less painful data merging
• Mashups that work the way they’re supposed to!

Linked Data Principles
• Use URIs as names of things
  • Anything, not just documents
  • You are not your homepage
  • Information resources and non-information resources
• Use HTTP URIs
  • Globally unique names, distributed ownership
  • Allows people to look up those names
• Provide useful information in RDF
  • When someone looks up a URI
• Include RDF links to other URIs
  • To enable discovery of related information

Why Publish Linked Data?
• Ease of discovery
• Ease of consumption
• Standards-based data sharing
• Reduced redundancy
• Added value
• Build ecosystems around your data/content

The Linking Open Data Project

The Linking Open Data Project   Community project with W3C
support   Take existing open data sets   Make them available on the Web in RDF   Interlink them with other data sets   Began in early 2007

The Linked Open Data Cloud
180 datasets, 20 billion RDF triples

Why is Linked Open Data Important?
• Because in many cases it’s our data!
• Efficiency, reducing redundancy
• Promotes a digital society
• Opens the door to data innovation and discovery
• Holds the promise of creating from data
  • Knowledge
  • Wisdom
  • Benefit for all

Thanks for Your Attention! Questions? Comments?
