Public PhD defense

Pieter Colpaert
September 27, 2017

Introduction slides to Linked Open Data and a short overview of my research.

Each red arrow in the slides marks a point where a demo was given on the other screen. The demos shown were:
* Real-time query log visualizer of the iRail API: https://irail.github.io/RealtimeQueriesMap/
* Finding an existing HTTP URI in Linked Open Vocabularies: http://lov.okfn.org
* Calculating the nearest Belgian railway station in your browser: https://codepen.io/pietercolpaert/pen/BQrJGv
* Calculating a shortest path through the Belgian railway network with Linked Connections: http://linkedconnections.org
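
The nearest-station demo does its calculation on the client: the station list is downloaded once and the distance is computed locally, so the user's position never has to be sent to a server. The following Python snippet is only a minimal sketch of that idea, not the code behind the demo. The station coordinates are approximate, illustrative values, and the names `haversine_km` and `nearest_station` are made up for this example.

```python
import math

# Approximate, illustrative (latitude, longitude) pairs; NOT taken from the iRail data.
STATIONS = {
    "Gent-Sint-Pieters": (51.036, 3.711),
    "Brussel-Zuid": (50.836, 4.336),
    "Antwerpen-Centraal": (51.217, 4.421),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_station(lat, lon, stations=STATIONS):
    """Return the name of the station closest to (lat, lon); runs entirely on the client."""
    return min(stations, key=lambda name: haversine_km(lat, lon, *stations[name]))
```

For a position in the centre of Ghent, `nearest_station(51.05, 3.72)` returns `"Gent-Sint-Pieters"`. Because the station list is the same document for every user, it can be cached by any HTTP cache along the way.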

Transcript

  1. Publishing Transport Data for Maximum Reuse (Dutch title: Het publiceren van datasets in het transportdomein voor maximaal hergebruik). Department ELIS, research group Internet & Data Lab. Drs. Pieter Colpaert
  2. Where can you get in what amount of time? Taking into account: multiple modes, crime statistics, subscriptions, how fast you can walk...
  3. Sharing data between 2 systems: your system and a third-party system. Make agreements. Ask specific questions
  4. Sharing data on the Web: your system and unknown third parties (? ? ?). Maximizing reuse → increasing interoperability
  5. Try it yourself with: BeTrains (Android), HyperRail (Android), RailerApp (iPhone), iRail.be (website)
  6. Content: 1. Publishing data: a story of raising interoperability. 2. The research: a. Three use cases; b. A lightweight interface for public transit route planning
  7. A story of raising interoperability (interoperability layers: legal, technical, syntactic, semantic, querying). When I have 2 datasets, how easy is it to turn them into 1 dataset?
  8. A story of raising interoperability (interoperability layers: legal, technical, syntactic, semantic, querying). When I have 2 datasets, how easy is it to turn them into 1 dataset? → Open Definition & open licenses
  9. A story of raising interoperability (interoperability layers: legal, technical, syntactic, semantic, querying). When I have 2 datasets, how easy is it to turn them into 1 dataset? → Open Definition & open licenses → The Internet: exchanging data world-wide
  10. Serializations: the same record as a table, as JSON and as XML (interoperability layers: legal, technical, syntactic, semantic, querying). Table / CSV / Spreadsheet: columns name, type, city, population; row StP-Plein, Parking, Gent, 257k. JSON: { "StP-Plein": { "type": "Parking", "city": "Gent", "population": "257k" } }. XML: <StP-Plein> <type>Parking</type> <city>Gent</city> <population>257k</population> </StP-Plein>
  11. The same record in four forms. XML: <StP-Plein> <type>Parking</type> <city>Gent</city> <population>257k</population> </StP-Plein>. JSON: { "StP-Plein": { "type": "Parking", "city": "Gent", "population": "257k" } }. Table / CSV / Spreadsheet: columns name, type, city, population; row StP-Plein, Parking, Gent, 257k. 3 triples: <StP-Plein> <type> <Parking> . <StP-Plein> <city> <Gent> . <Gent> <population> "257k" .
  12. Triples: <subject> <predicate> <object> . Example: <StP-Plein> <type> <Parking> . <StP-Plein> <city> <Gent> . <Gent> <population> "257k" . Serializations are standardized by the RDF 1.1 W3C specification (interoperability layers: legal, technical, syntactic, semantic, querying)
  13. A story of raising interoperability (interoperability layers: legal, technical, syntactic, semantic, querying). When I have 2 datasets, how easy is it to turn them into 1 dataset? → Open Definition & open licenses → The Internet: exchanging data world-wide → RDF serializations
  14. World Wide Web: decentralized publishing. Statements spread over three HTTP machines (Machine 1, Machine 2, Machine 3): (St-P Plein, city, Gent); (St Pietersplein, type, Parking); (Gent, population, 257k). A user agent visiting each machine knows more than any of the machines independently
  15. Solution: Uniform Resource Identifiers (URIs). Sint Pietersplein → https://stad.gent/id/parking/P10; is a → http://www.w3.org/1999/02/22-rdf-syntax-ns#type; Parking → http://vocab.datex.org/terms#UrbanParkingSite
  16. A story of raising interoperability (interoperability layers: legal, technical, syntactic, semantic, querying). When I have 2 datasets, how easy is it to turn them into 1 dataset? → Open Definition & open licenses → The Internet: exchanging data world-wide → RDF serializations → URIs help to discuss the semantic interoperability
  17. A story of raising interoperability (interoperability layers: legal, technical, syntactic, semantic, querying). When I have 2 datasets, how easy is it to turn them into 1 dataset? → Open Definition & open licenses → The Internet: exchanging data world-wide → RDF serializations → URIs help to discuss the semantic interoperability → International, regional and local domain models
  18. A story of raising interoperability (interoperability layers: legal, technical, syntactic, semantic, querying). When I have 2 datasets, how easy is it to turn them into 1 dataset? → Open Definition & open licenses → The Internet: exchanging data world-wide → RDF serializations → URIs help to discuss the semantic interoperability → International, regional and local domain models → How do we solve questions?
  19. Let’s design an HTTP interface* to retrieve the closest station: https://example.org/stations?nearby=3.14159265;51.312345
  20. Let’s design an HTTP interface* to retrieve the closest station: https://example.org/stations?nearby=3.14159265;51.312345. Problems: What about privacy? What about caching? What about federated querying?
  21. Let’s design an HTTP interface* to retrieve the closest station: https://example.org/stations. “We should build smarter agents, not smarter servers”
  22. How to allow for asking any kind of query? Two options between your system and 3rd parties (? ? ?): a data dump, or a system that lets them ask any question
  23. Content: 1. Publishing data: a story of raising interoperability. 2. The research: a. Three use cases; b. A lightweight interface for public transit route planning
  24. Open Data: research question. How can the data source interoperability of public transport datasets be raised? Publisher and re-user: cost vs. benefit? → The cost of adoption needs to be lowered: automation of data integration. Maximizing reuse of a dataset; raising interoperability
  25. Studying 3 cases in Flanders: 1. Metadata in Data Portals. 2. Open Data at the Dep. for Transport and Public Works. 3. Local Decisions as Linked Open Data
  26. Let’s Web-engineer route planning! REST for a high user-perceived performance, caching and cost-efficiency. Hypermedia for enabling intelligent agents. Linked Data for semantic interoperability
  27. Connection Scan Algorithm ~ creating a minimum spanning tree through a sorted directed acyclic graph. Squares are connections, laid out along a time axis
  28. Resource 1 → hydra:next → Resource 2 → hydra:next → Resource ... → Resource X, ordered along a time axis. X requests needed instead of just 1
  29. Three set-ups: a. A query server. b. Linked Connections with only one user agent over the entire Web. c. Linked Connections with always unique user agents. Real cost-efficiency of Linked Connections will be found in-between
  30. Results: 1. CPU time on the server. 2. Average time spent by the client per connection = an indication of the user-perceived performance. 3. Non-measured benefits
  31. Results: 1. CPU time on the server. 2. Average time spent by the client per connection = an indication of the user-perceived performance. 3. Other benefits
  32. Linked Connections is more cost-efficient. The real world lies between these 2 values. Scanning a connection becomes more cost-efficient for data publishers when publishing cacheable fragments (~78% hit-rate) instead of solving all queries on one machine
  33. Other benefits. Privacy by design: you do not send your user’s profile to a third party. The client can execute the algorithm in any way, e.g., you can only transfer at places with an elevator? And...
  34. Conclusion: a new trade-off established for cost-efficiently maximizing possible reuse of public transport data. On an axis from data publishing to data services: data dumps; Linked Connections (http://{myhost}/{datafragmentid}, average cache hit-rate of 78%); answer any question on the server; route planning algorithms as a service (http://api.{myapp}/?from={A}&to={B})
  35. Lessons learned building Open Data interfaces (guidelines for every new data publishing strategy): 1. Fragment your datasets over HTTP. 2. Summaries, fragments or functionality for faster queries. 3. Caching headers on each document. 4. Hypermedia descriptions between fragments. 5. A web address (HTTP URI) per object. 6. Link to an Open Data license. 7. Cross-Origin Resource Sharing. 8. Increase discoverability with DCAT descriptions
  36. Open research questions. Linked Data Interfaces: trade-off studies. Fragmentation strategies for Linked Data Interfaces in different domains: geo-spatial fragmentation; multi-dimensional time-series; full-text search (a suffix tree may allow for federation). Transport data: how to design “contraction hierarchies” over decentralized Linked Transport Datasets? Organizational challenges: which Linked Data Interfaces must be hosted by whom in a decentralized Open Data strategy?
  37. Imagine a world where knowledge creates power for the many, not the few. Department ELIS, research group Internet & Data Lab. Drs. Pieter Colpaert
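
Slides 27 and 28 describe the client side of Linked Connections: connections are published in pages sorted by departure time, each page links to the next one (hydra:next), and the client scans them with the Connection Scan Algorithm. The Python sketch below illustrates that combination under several assumptions. The `PAGES` dictionary is a hypothetical in-memory stand-in for HTTP documents, the field names (`from`, `to`, `dep`, `arr`, `next`) and times are invented for the example, and the scan is a basic earliest-arrival variant without transfer times, not the implementation evaluated in the thesis.

```python
import math

# Hypothetical stand-in for cacheable HTTP documents. Each page lists
# connections sorted by departure time and links to the following page,
# in the spirit of hydra:next.
PAGES = {
    "page1": {
        "connections": [
            {"from": "A", "to": "B", "dep": 10, "arr": 20},
            {"from": "A", "to": "C", "dep": 12, "arr": 40},
        ],
        "next": "page2",
    },
    "page2": {
        "connections": [
            {"from": "B", "to": "C", "dep": 25, "arr": 30},
            {"from": "C", "to": "D", "dep": 35, "arr": 50},
        ],
        "next": None,
    },
}


def iter_connections(first_page, pages=PAGES):
    """Yield connections in departure order by following the 'next' links."""
    url = first_page
    while url is not None:
        page = pages[url]  # a real client would issue an HTTP GET here
        yield from page["connections"]
        url = page["next"]


def earliest_arrival(connections, origin, destination, depart_at):
    """Scan connections sorted by departure time; return the earliest arrival or None."""
    earliest = {origin: depart_at}
    for c in connections:
        # Later departures cannot improve an arrival time we already have.
        if c["dep"] >= earliest.get(destination, math.inf):
            break
        reachable = earliest.get(c["from"], math.inf) <= c["dep"]
        if reachable and c["arr"] < earliest.get(c["to"], math.inf):
            earliest[c["to"]] = c["arr"]
    return earliest.get(destination)
```

With this toy data, `earliest_arrival(iter_connections("page1"), "A", "D", 0)` returns 50, reached via A → B → C → D. Because the generator is lazy, pages are only requested while the scan still needs them, and every client asking about the same time window fetches the same pages, which is what makes them cacheable.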