Slide 1

Slide 1 text

SEMANTIC WEB: Mastering Linked Data with Python
PYDATA BERLIN
Speaker: Valerio Maggio • [email protected] • @leriomaggio • +ValerioMaggio

Slide 2

Slide 2 text

This is me… …only half of me, btw… • I’m Valerio (Maggio) • Ph.D. in Computational Science • @University of Naples • PostDoc Researcher • @University of Salerno

Slide 3

Slide 3 text

(Big) Data Analysis • Machine Learning • Text Mining • Natural Language Processing • Software Maintenance • Information Retrieval • Linked Data & Semantic Web

Slide 4

Slide 4 text

PYDATA BERLIN • SEMANTIC WEB: Mastering Linked Data with Python

Slide 5

Slide 5 text

WWW: WORLD WIDE WEB
The Web Shape: A BIG BOW TIE

Slide 6

Slide 6 text

WWW: WORLD WIDE WEB
The Web Shape: A BIG BOW TIE
• SCC (Core): ~30%
• Origin Region: ~24%
• Termination Region: ~24%
• Disconnected Pages: ~22%

Slide 7

Slide 7 text

THE WORLD WIDE WEB
• WWW is full of data
• Data published in different formats, e.g., PDF, TIFF, TXT
• Linked to/by HTML pages (and other docs)
• Data that you can link to

Slide 8

Slide 8 text

LIMITATIONS
• Data formats are designed for human consumption
• Specialized algorithms (and tools) are needed to access, search, and reuse data
• This is when (and why) Linked Data comes in!

Slide 9

Slide 9 text

LINKED DATA
Linked Data refers to a set of principles (and best practices) for publishing and connecting structured data on the Web using international standards of the W3C.
• Exploiting the basis of the WWW resource model:
• URIs (Uniform Resource Identifiers): uniquely identify resources
• Hyperlinks: interconnect resources
[Figure: interlinked documents Doc1 to Doc6]

Slide 10

Slide 10 text

LINKED DATA: THE FIVE STARS PRINCIPLE ★★★★★
★: Data is available on the Web, in whatever format
★★: Data is available as machine-readable structured data
★★★: Data is available in a non-proprietary format
★★★★: Data is published using open data standards
★★★★★: All of the above apply, plus links to other data

Slide 11

Slide 11 text

LINKED DATA EXAMPLES: Textual Data • Linked Data • RDFa

Slide 12

Slide 12 text

BBC NATURE
http://en.wikipedia.org/wiki/Python_(genus)
http://dbpedia.org/page/Python_(genus)
http://www.bbc.co.uk/nature/life/Python_(genus)
http://rs.tdwg.org/dwc/terms/class “Reptile”

Slide 13

Slide 13 text

LINKED DATA 2007

Slide 14

Slide 14 text

LINKED DATA 2008

Slide 15

Slide 15 text

LINKED DATA 2009

Slide 16

Slide 16 text

LINKED DATA 2011

Slide 17

Slide 17 text

PYDATA BERLIN • SEMANTIC WEB: Mastering Linked Data with Python

Slide 18

Slide 18 text

SEMANTIC WEB: A SET OF PRINCIPLES
• A system that enables machines to "understand" and respond to complex human requests based on their meaning. (Tim Berners-Lee)
• A set of standards and best practices for sharing data and the semantics of that data over the Web for use by applications. (Bob DuCharme)

Slide 19

Slide 19 text

SEMANTIC WEB STANDARD STACK

Slide 20

Slide 20 text

SEMANTIC WEB STANDARD STACK

Slide 21

Slide 21 text

RESOURCE DESCRIPTION FRAMEWORK
• RDF is not a data format
• RDF: data model to express relationships between (arbitrary) data elements
• RDF graphs can be serialized in multiple formats: Turtle, N3, RDF/XML
• Stored in files and databases (triple stores)

Slide 22

Slide 22 text

RDF DATA MODEL
• Graph-based model: a set of triples (subject, predicate, object)
• subject is a Resource
• predicate is a Resource
• object is a Resource (or a literal value)
• Resources are identified by URIref(s)

Slide 23

Slide 23 text

BLANK NODES
Anonymous nodes, used when you don’t know the URI of the thing you would like to reference.
(_:ax1, "weblog", "http://blog.johndoe.name")
(_:ax1, "secondName", "Doe")
(_:ax1, "firstName", "John")
(_:ax1, "knows", _:zb7)
(_:zb7, "secondName", "Taylor")
(_:zb7, "firstName", "Steve")
(_:zb7, "email", "[email protected]")
Literal values may optionally have: • language • type

Slide 24

Slide 24 text

SERIALIZATION FORMATS
RDF allows for different serialisation formats:
• N-Triples: simple format, very readable
• N3: simple & compact
• RDF/XML: classic & original W3C Recommendation
• RDFa: compact & integrable within other formats
• RDF/JSON: useful in case of RESTful APIs

Slide 25

Slide 25 text

EXAMPLE dbpedia.org/page/Python_(programming_language)

Slide 26

Slide 26 text

N-TRIPLES
@prefix dbpedia-owl: .
@prefix dbpedia: .
dbpedia-owl:designer dbpedia:Guido_van_Rossum.
dbpedia-owl:developer dbpedia:Python_Software_Foundation.
dbpedia-owl:influenced .
dbpedia-owl:influenced .
dbpedia-owl:influenced <http://dbpedia.org/resource/Ruby_(programming_language)>.

Slide 27

Slide 27 text

N-3
@prefix dbpedia-owl: .
@prefix dbpedia: .
dbpedia-owl:designer dbpedia:Guido_van_Rossum;
dbpedia-owl:developer dbpedia:Python_Software_Foundation;
dbpedia-owl:influenced , , .

Slide 28

Slide 28 text

RDF/XML

Slide 29

Slide 29 text

RDF VOCABULARIES
An (RDF) vocabulary is a defined set of predicates that can be used in an application.
http://lov.okfn.org/dataset/lov/

Slide 30

Slide 30 text

LINKED RDF VOCABULARIES

Slide 31

Slide 31 text

SPARQL PROTOCOL AND RDF QUERY LANGUAGE
• SPARQL: standard query language for RDF graphs
• SPARQL attempts to match patterns in the graph
• It binds wildcard variables to find a solution

Slide 32

Slide 32 text

SPARQL EXAMPLES

Slide 33

Slide 33 text

PYDATA BERLIN • SEMANTIC WEB: Mastering Linked Data with Python

Slide 34

Slide 34 text

TAMING THE SEMANTIC WEB: RDFLIB (PYTHON)
github.com/RDFLib

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

PERSISTENCE IN RDFLIB
• rdflib-sqlite
• rdflib-sqlalchemy
• rdflib-postgresql
• rdflib-mysql
• rdflib-sparqlstore
• rdflib-rdfjson
• rdflib-zodb
• rdflib-leveldb

Slide 37

Slide 37 text

PERSISTENCE IN RDFLIB: rdflib-sqlite (store, load)

Slide 38

Slide 38 text

RDFLIB & SPARQL

Slide 39

Slide 39 text

RDFLIB in action (Semantic) Social Network Analysis

Slide 40

Slide 40 text

THE DATA: FOAF
w3.org/People/Berners-Lee/card#i foaf:knows dbpedia.org/resource/John_Markoff “Tim Berners-Lee” foaf:name foaf:Person

Slide 41

Slide 41 text

THE DATA: SKOS
catalog:KnowledgeRepresentation skos:broader catalog:DescriptionLogic “Knowledge Representation” skos:prefLabel skos:Concept “Artificial Intelligence” skos:prefLabel skos:narrower catalog:ArtificialIntelligence “Artificial Intelligence” skos:prefLabel

Slide 42

Slide 42 text

THE DATA: SIOC
ns:blogpost39 sioc:has_creator w3.org/People/Berners-Lee/card#i “title” sioc:title sioc:Post foaf:Person sioc:topic skos:Concept

Slide 43

Slide 43 text

SOCIAL NETWORK ANALYSIS (SNA) ON RDF
• Who are the most connected people?
• Who are the most influential people?
• Where are the cliques?

Slide 44

Slide 44 text

LOAD & PARSE

Slide 45

Slide 45 text

BUILD SOCIAL GRAPH SPARQL Query

Slide 46

Slide 46 text

SOCIAL GRAPH ANALYSIS
• Find Cliques: groups of vertices in which every vertex is directly connected to every other
• Betweenness Centrality: for a given node, the number of shortest paths between all pairs of vertices that pass through that node
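Both measures can be sketched in plain Python on a small undirected graph. This is not the code used in the talk: it uses the standard library only, with a basic Bron-Kerbosch search for maximal cliques and Brandes' algorithm for unnormalized betweenness. The four-person graph is hypothetical.

```python
from collections import deque

# Hypothetical undirected social graph as an adjacency map.
adj = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob", "dave"},
    "dave": {"carol"},
}


def maximal_cliques(adj):
    """Bron-Kerbosch (no pivoting): enumerate all maximal cliques."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}  # predecessors on shortest paths
        order = []
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):     # accumulate dependencies backwards
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted twice (once per direction).
    return {v: c / 2 for v, c in score.items()}


print(maximal_cliques(adj))
print(betweenness(adj))
```

In this graph carol is the only node lying on shortest paths between other people, which is why she alone gets a non-zero score.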

Slide 47

Slide 47 text

EXTENDING THE NETWORK OF FRIENDS
Leverage SPARQL's flexibility to look for new friends

Slide 48

Slide 48 text

INFERENCE

Slide 49

Slide 49 text

WHAT’S NEXT • OWL-RL: https://github.com/RDFLib/OWL-RL • FuXi Reasoner: https://github.com/RDFLib/FuXi • Python 3 Support keeps improving

Slide 50

Slide 50 text

SUGGESTED READINGS

Slide 51

Slide 51 text

THANKS FOR YOUR KIND ATTENTION [email protected] @leriomaggio +ValerioMaggio