Mastering Linked Data with Python @PyData Berlin 2014

In this talk, a general overview of the main features provided by the
rdflib package will be presented.
To this end, several code examples based on the DBpedia project will be discussed, along with a case study concerning the analysis of a (semantic) social graph.

The case study will be focused on the integration between the
networkx (http://networkx.github.io) module and the rdflib library
in order to crawl, access (via SPARQL), and analyze a
Social Linked Data Graph represented using the FOAF (Friend of a Friend) schema (http://xmlns.com/foaf/0.1/).

Valerio Maggio

July 26, 2014

Transcript

  1. SEMANTIC WEB: Mastering Linked Data with Python • PyData Berlin • Speaker: Valerio Maggio ([email protected], @leriomaggio, +ValerioMaggio)
  2. This is me… …only half of me, btw… • I’m Valerio (Maggio) • Ph.D. in Computational Science @University of Naples • PostDoc Researcher @University of Salerno
  3. (Big) Data Analysis • Machine Learning • Text Mining • Natural Language Processing • Software Maintenance • Information Retrieval • Linked Data & Semantic Web
  4. WWW (WORLD WIDE WEB): A BIG BOW TIE • The Web Shape
  5. WWW (WORLD WIDE WEB): A BIG BOW TIE • The Web Shape • SCC (Core): ~30% • Origin Region: ~24% • Termination Region: ~24% • Disconnected Pages: ~22%
  6. THE WORLD WIDE WEB • WWW is full of data • Data published in different formats • e.g., PDF, TIFF, TXT • Linked to/by HTML pages (and other docs) • data that you can link to
  7. LIMITATIONS • Data format is for human consumption • Specialized algos (and tools) are needed to access, search, and reuse data • This is when (and why) Linked Data comes in!
  8. LINKED DATA • Exploiting the basis of the WWW resource model: • URIs (Uniform Resource Identifiers): uniquely identify resources • Hyperlinks: interconnect resources • Linked Data refers to a set of principles (and best practices) for publishing and connecting structured data on the Web using international standards of the W3C. [Figure: interlinked documents Doc1 to Doc6]
  9. LINKED DATA: THE FIVE STARS PRINCIPLE • ★: Data is available on the Web, in whatever format • ★★: Data is available as machine-readable structured data • ★★★: Data is available in a non-proprietary format • ★★★★: Data is published using open data standards • ★★★★★: All of the above apply, plus links to other data
  10. SEMANTIC WEB: A SET OF PRINCIPLES • A system that enables machines to "understand" and respond to complex human requests based on their meaning. (Tim Berners-Lee) • A set of standards and best practices for sharing data and the semantics of that data over the Web for use by applications. (Bob DuCharme)
  11. RESOURCE DESCRIPTION FRAMEWORK • RDF is not a data format • RDF: Data Model to express relationships between (arbitrary) data elements • RDF data can be serialized in multiple formats: Turtle, N3, RDF/XML • Stored in files & databases (triple stores)
  12. RDF DATA MODEL • Graph-based model: a set of triples (subject, predicate, object) • <subj1, pred1, obj1> <subj1, pred2, obj2> <subj2, pred1, obj1> <subj3, pred2, obj3> … • Resources, identified by URIref(s): subject is a Resource, predicate is a Resource, object is a Resource
  13. BLANK NODES • Anonymous nodes, used when you don’t know the URI of the thing you would like to reference • Literal values may optionally have: language, type

    (_:ax1, "weblog", "http://blog.johndoe.name")
    (_:ax1, "secondName", "Doe")
    (_:ax1, "firstName", "John")
    (_:ax1, "knows", _:zb7)
    (_:zb7, "secondName", "Taylor")
    (_:zb7, "firstName", "Steve")
    (_:zb7, "email", "[email protected]")
  14. SERIALIZATION FORMATS • RDF allows for different serialisation formats • N-Triples: simple format, very readable • N3: simple & compact • RDF/XML: classic & original W3C Recommendation • RDFa: compact & integrable within other formats • RDF/JSON: useful in case of RESTful APIs
  15. N-TRIPLES

    @prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
    @prefix dbpedia: <http://dbpedia.org/resource/> .

    <http://dbpedia.org/resource/Python_(programming_language)> dbpedia-owl:designer dbpedia:Guido_van_Rossum .
    <http://dbpedia.org/resource/Python_(programming_language)> dbpedia-owl:developer dbpedia:Python_Software_Foundation .
    <http://dbpedia.org/resource/Python_(programming_language)> dbpedia-owl:influenced <http://dbpedia.org/resource/Go_(programming_language)> .
    <http://dbpedia.org/resource/Python_(programming_language)> dbpedia-owl:influenced <http://dbpedia.org/resource/Julia_(programming_language)> .
    <http://dbpedia.org/resource/Python_(programming_language)> dbpedia-owl:influenced <http://dbpedia.org/resource/Ruby_(programming_language)> .
  16. N-3

    @prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
    @prefix dbpedia: <http://dbpedia.org/resource/> .

    <http://dbpedia.org/resource/Python_(programming_language)>
        dbpedia-owl:designer dbpedia:Guido_van_Rossum ;
        dbpedia-owl:developer dbpedia:Python_Software_Foundation ;
        dbpedia-owl:influenced <http://dbpedia.org/resource/Go_(programming_language)> ,
            <http://dbpedia.org/resource/Julia_(programming_language)> ,
            <http://dbpedia.org/resource/Ruby_(programming_language)> .
  17. RDF VOCABULARIES • An (RDF) vocabulary is a defined set of predicates that can be used in an application. • http://lov.okfn.org/dataset/lov/
  18. SPARQL PROTOCOL AND RDF QUERY LANGUAGE • SPARQL: standard query language for RDF Graphs • SPARQL attempts to match patterns in the graph • binds wildcard variables to find a solution
  19. THE DATA: SKOS • [Figure: a SKOS graph with the nodes catalog:KnowledgeRepresentation, catalog:DescriptionLogic and catalog:ArtificialIntelligence, the literals “Knowledge Representation” and “Artificial Intelligence”, and edges labelled skos:broader, skos:narrower, skos:prefLabel and skos:concept]
  20. SOCIAL NETWORK ANALYSIS (SNA, RDF) • Who are the most connected people? • Who are the most influential people? • Where are the cliques?
  21. SOCIAL GRAPH ANALYSIS • Find Cliques: sets of vertices in which every pair is directly connected • Betweenness Centrality: for a given node, the number of shortest paths between all pairs of vertices that pass through that node.
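Both measures are available in networkx. The sketch below runs them on a small hand-made social graph; the names and edges are invented for illustration. `nx.find_cliques` returns the maximal cliques, and `normalized=False` makes `nx.betweenness_centrality` report raw counts of shortest paths passing through each node.

```python
import networkx as nx

# Made-up social graph: a triangle (alice, bob, carol) with a tail to erin.
social = nx.Graph()
social.add_edges_from([
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("carol", "dave"),
    ("dave", "erin"),
])

# Maximal cliques: groups in which everybody knows everybody else.
cliques = sorted(sorted(clique) for clique in nx.find_cliques(social))

# Raw (unnormalized) betweenness: number of shortest paths between
# other pairs of nodes that pass through each node.
betweenness = nx.betweenness_centrality(social, normalized=False)
most_central = max(betweenness, key=betweenness.get)

print(cliques)
print(betweenness)
print(most_central)
```

In this graph the node bridging the triangle and the tail gets the highest betweenness, even though it is not the only node with many contacts, which is why betweenness is often read as a measure of influence over information flow.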