Public PhD defense

Pieter Colpaert
September 27, 2017

Introduction slides to Linked Open Data and a short overview of my research.

Each red arrow in the slides marks a demo that was given on the other screen. The demos shown were:
* Real-time query log visualizer of the iRail API: https://irail.github.io/RealtimeQueriesMap/
* Finding an existing HTTP URI in Linked Open Vocabularies: http://lov.okfn.org
* Calculating the nearest Belgian railway station in your browser: https://codepen.io/pietercolpaert/pen/BQrJGv
* Calculating a shortest path through the Belgian railway network with Linked Connections: http://linkedconnections.org

Transcript

  1. Publishing Transport Data for Maximum Reuse
    Dutch title: "Het publiceren van datasets in het transportdomein voor maximaal hergebruik" (Publishing datasets in the transport domain for maximum reuse)
    Department ELIS, Research Group Internet & Data Lab
    Drs. Pieter Colpaert
  2. How far do you live from here?

  3. Kilometers or Minutes?

  4. Where can you get in what amount of time?
    Taking into account: multiple modes, criminality statistics, subscriptions, how fast you can walk...
  5. Instead of a mathematical problem, route planning today is an organizational data sharing problem.
  6. Sharing data between 2 systems
    Your system and a third party system: make agreements, ask specific questions
  7. Sharing data on the Web
    Your system and unknown third parties (? ? ? ? ? ?)
    Maximizing reuse → increasing interoperability
  8. Try it yourself with:
    • BeTrains (Android)
    • HyperRail (Android)
    • RailerApp (iPhone)
    • iRail.be (website)
  9. Content
    1. Publishing data: A story of raising interoperability
    2. The research
      a. Three use cases
      b. A lightweight interface for public transit route planning
  10. A story of raising interoperability
    When I have 2 datasets, how easy is it to turn them into 1 dataset?
    Interoperability layers: legal, technical, syntactic, semantic, querying
  11. OpenDefinition.org

  12. A story of raising interoperability
    When I have 2 datasets, how easy is it to turn them into 1 dataset?
    Interoperability layers: legal, technical, syntactic, semantic, querying
    → Open Definition & open licenses
  13. A story of raising interoperability
    When I have 2 datasets, how easy is it to turn them into 1 dataset?
    Interoperability layers: legal, technical, syntactic, semantic, querying
    → Open Definition & open licenses
    → The Internet: exchanging data world-wide
  14. Serializations
    Table / CSV / Spreadsheet: columns name, type, city, population; row StP-Plein, Parking, Gent, 257k
    JSON: { "StP-Plein" : { "type": "Parking", "city": "Gent", "population" : "257k" } }
    XML: <StP-Plein> <type>Parking</type> <city>Gent</city> <population>257k</population> </StP-Plein>
    Interoperability layers: legal, technical, syntactic, semantic, querying
  15. The same data as a table, as JSON, as XML and as 3 triples
    Table / CSV / Spreadsheet: columns name, type, city, population; row StP-Plein, Parking, Gent, 257k
    JSON: { "StP-Plein" : { "type": "Parking", "city": "Gent", "population" : "257k" } }
    XML: <StP-Plein> <type>Parking</type> <city>Gent</city> <population>257k</population> </StP-Plein>
    3 triples:
      <StP-Plein> <type> <Parking> .
      <StP-Plein> <city> <Gent> .
      <Gent> <population> "257k" .
  16. Triples: <subject> <predicate> <object> .
      <StP-Plein> <type> <Parking> .
      <StP-Plein> <city> <Gent> .
      <Gent> <population> "257k" .
    Serializations standardized by the RDF 1.1 W3C specification
    Interoperability layers: legal, technical, syntactic, semantic, querying
  17. A story of raising interoperability
    When I have 2 datasets, how easy is it to turn them into 1 dataset?
    Interoperability layers: legal, technical, syntactic, semantic, querying
    → Open Definition & open licenses
    → The Internet: exchanging data world-wide
    → RDF serializations
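As a minimal sketch of the triple notation shown on the preceding slides, the three statements can be held as (subject, predicate, object) tuples and written out one statement per line. The `Triple` type and the helper `to_statement` are hypothetical names, and the output only mimics the bracketed notation used on the slides; this is not a complete RDF 1.1 serializer.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    literal: bool = False  # True when the object is a plain value, not a resource


# The three statements from the slides.
triples: List[Triple] = [
    Triple("StP-Plein", "type", "Parking"),
    Triple("StP-Plein", "city", "Gent"),
    Triple("Gent", "population", "257k", literal=True),
]


def to_statement(t: Triple) -> str:
    """Render one triple in the '<s> <p> <o> .' notation of the slides."""
    obj = f'"{t.obj}"' if t.literal else f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


lines = [to_statement(t) for t in triples]
print("\n".join(lines))
```

Because every statement stands on its own line, turning two such datasets into one is, at this syntactic level, a matter of concatenating the lines.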
  18. World Wide Web: decentralized publishing
    HTTP Machine 1: St-P Plein city Gent
    HTTP Machine 2: St Pietersplein type Parking
    HTTP Machine 3: Gent population 257k
    A user agent visiting each machine knows more than any of the machines independently
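The claim that a visiting user agent knows more than any single machine can be sketched as a set union followed by a two-step lookup. This sketch assumes all three machines use the same identifier `StP-Plein` for the parking site, which the slide's labels do not; the function names are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Each machine publishes only its own part of the data.
machine_1: Set[Triple] = {("StP-Plein", "city", "Gent")}
machine_2: Set[Triple] = {("StP-Plein", "type", "Parking")}
machine_3: Set[Triple] = {("Gent", "population", "257k")}


def objects(graph: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """All objects of triples matching the given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def population_of_city_of(graph: Set[Triple], thing: str) -> Set[str]:
    """Follow thing -> city -> population across the graph."""
    result: Set[str] = set()
    for city in objects(graph, thing, "city"):
        result |= objects(graph, city, "population")
    return result


# The user agent visits every machine and merges what it finds.
merged = machine_1 | machine_2 | machine_3

print(population_of_city_of(machine_1, "StP-Plein"))  # machine 1 alone cannot answer
print(population_of_city_of(merged, "StP-Plein"))     # the merged view can
```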
  19. Problem: Sint-Pietersplein is a Parking Site ?
    Interoperability layers: legal, technical, syntactic, semantic, querying
  20. Problem: Sint-Pietersplein is a Parking Site ?
    Interoperability layers: legal, technical, syntactic, semantic, querying
  21. Solution: Uniform Resource Identifiers (URIs)
    Sint Pietersplein → https://stad.gent/id/parking/P10
    is a → http://www.w3.org/1999/02/22-rdf-syntax-ns#type
    Parking → http://vocab.datex.org/terms#UrbanParkingSite
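With the three URIs from this slide, the statement "Sint Pietersplein is a Parking" can be written as a single triple of globally unique identifiers. The sketch below only assembles that line and checks that each term is an absolute HTTP(S) URI; `is_http_uri` is a hypothetical helper, not part of any RDF library.

```python
from urllib.parse import urlparse

subject = "https://stad.gent/id/parking/P10"
predicate = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
obj = "http://vocab.datex.org/terms#UrbanParkingSite"


def is_http_uri(value: str) -> bool:
    """True when the value is an absolute http:// or https:// URI."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Every term can be looked up over HTTP, so its meaning can be discussed and documented.
assert all(is_http_uri(term) for term in (subject, predicate, obj))

statement = f"<{subject}> <{predicate}> <{obj}> ."
print(statement)
```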
  22. A story of raising interoperability
    When I have 2 datasets, how easy is it to turn them into 1 dataset?
    Interoperability layers: legal, technical, syntactic, semantic, querying
    → Open Definition & open licenses
    → The Internet: exchanging data world-wide
    → RDF serializations
    → URIs help to discuss the semantic interoperability
  23. What discussing semantics looks like today

  24. A story of raising interoperability
    When I have 2 datasets, how easy is it to turn them into 1 dataset?
    Interoperability layers: legal, technical, syntactic, semantic, querying
    → Open Definition & open licenses
    → The Internet: exchanging data world-wide
    → RDF serializations
    → URIs help to discuss the semantic interoperability
    → International, regional and local domain models
  25. A story of raising interoperability
    When I have 2 datasets, how easy is it to turn them into 1 dataset?
    Interoperability layers: legal, technical, syntactic, semantic, querying
    → Open Definition & open licenses
    → The Internet: exchanging data world-wide
    → RDF serializations
    → URIs help to discuss the semantic interoperability
    → International, regional and local domain models
    → How do we solve questions?
  26. Let’s design an HTTP interface* to retrieve the closest station
    https://example.org/stations?nearby=3.14159265;51.312345
  27. Let’s design an HTTP interface* to retrieve the closest station
    https://example.org/stations?nearby=3.14159265;51.312345
    Problems:
    • What about privacy?
    • What about caching?
    • What about federated querying?
  28. Let’s design an HTTP interface* to retrieve the closest station
    https://example.org/stations
    “We should build smarter agents, not smarter servers”
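A "smarter agent" for this case could download the full station list once and find the closest station itself, so the user's location never leaves the client and the list stays cacheable. The sketch below inlines a tiny station list instead of fetching https://example.org/stations; the coordinates are approximate and for illustration only, and the function names are hypothetical.

```python
import math
from typing import Dict, List

# Stand-in for the document a client would fetch once from the stations resource.
# Coordinates are approximate and only meant as illustration.
stations: List[Dict] = [
    {"name": "Gent-Sint-Pieters", "lat": 51.0357, "lon": 3.7108},
    {"name": "Brussel-Centraal", "lat": 50.8455, "lon": 4.3571},
    {"name": "Antwerpen-Centraal", "lat": 51.2172, "lon": 4.4211},
]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two WGS84 points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_station(candidates: List[Dict], lat: float, lon: float) -> Dict:
    """Pick the station with the smallest distance to the given position."""
    return min(candidates, key=lambda s: haversine_km(lat, lon, s["lat"], s["lon"]))


# A position in the centre of Ghent; it is only ever used locally.
print(nearest_station(stations, 51.05, 3.72)["name"])
```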
  29. How to allow for asking any kind of query?
    Interface spectrum: from a data dump to "ask any question"
    Diagram labels: Your system, 3rd party; Your system, ? ? ? ? ? ?
  30. Content
    1. Publishing data: A story of raising interoperability
    2. The research
      a. Three use cases
      b. A lightweight interface for public transit route planning
  31. Open Data: research question
    Re-user and publisher: cost vs. benefit?
    → The cost of adoption needs to come down: automation of data integration
    Maximizing reuse of a dataset, raising interoperability
    How can the data source interoperability of public transport datasets be raised?
  32. Studying 3 cases in Flanders
    1. Metadata in Data Portals
    2. Open Data at the Dep. for Transport and Public Works
    3. Local Decisions as Linked Open Data
  33. How can we integrate all this data into a route planning query?
  34. Let’s Web-engineer route planning!
    • REST for a high user-perceived performance, caching and cost-efficiency
    • Hypermedia for enabling intelligent agents
    • Linked Data for semantic interoperability
  35. Can we decouple data publishing from the execution of the algorithm?
  36. Let’s have a look at the data
    departureStop, departureTime, arrivalStop, arrivalTime = a connection (diagram label: Vehicle)
  37. Connection Scan Algorithm ~ creating a minimum spanning tree through a sorted directed acyclic graph
    Diagram: squares are connections, laid out along a time axis
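The idea behind the Connection Scan Algorithm can be sketched as a single pass over connections sorted by departure time, keeping the earliest known arrival time per stop. This is a simplified earliest-arrival variant under stated assumptions: times are minutes since midnight, stops `A`, `B` and `C` are hypothetical, and transfer times and footpaths are ignored.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class Connection(NamedTuple):
    departure_stop: str
    arrival_stop: str
    departure_time: int  # minutes since midnight
    arrival_time: int    # minutes since midnight


def earliest_arrival(
    connections: Iterable[Connection], source: str, target: str, start_time: int
) -> Optional[int]:
    """Scan connections (sorted by departure_time) once; return arrival time at target."""
    earliest: Dict[str, int] = {source: start_time}
    for c in connections:
        # Later departures cannot improve an arrival that already happened.
        if target in earliest and c.departure_time >= earliest[target]:
            break
        reached_at = earliest.get(c.departure_stop)
        # The connection is usable only if we are at its departure stop in time.
        if reached_at is not None and reached_at <= c.departure_time:
            if c.arrival_time < earliest.get(c.arrival_stop, float("inf")):
                earliest[c.arrival_stop] = c.arrival_time
    return earliest.get(target)


# Hypothetical timetable, already sorted by departure time.
timetable = [
    Connection("A", "B", 480, 510),  # 08:00 -> 08:30
    Connection("A", "C", 490, 570),  # 08:10 -> 09:30 (slow direct)
    Connection("B", "C", 520, 540),  # 08:40 -> 09:00
]

print(earliest_arrival(timetable, "A", "C", 475))
```

Since the scan only needs the connections in departure-time order, it does not matter whether they come from a local array or from a sequence of downloaded documents.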
  38. Connections fragmented over time: Resource 1, Resource 2, Resource ..., Resource X, each linked to the next with hydra:next
    X requests needed instead of just 1
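A client that follows `hydra:next` links from one resource to the next can be sketched as a generator. The page shape below is a simplified assumption rather than the actual Linked Connections document format, the example.org URLs are placeholders, and `fetch` reads from an in-memory dictionary instead of performing HTTP requests.

```python
from typing import Callable, Dict, Iterator, List, Optional

# In-memory stand-in for the Web: each URL maps to one time-ordered page.
PAGES: Dict[str, dict] = {
    "https://example.org/connections?page=1": {
        "connections": [{"departureTime": 480}, {"departureTime": 490}],
        "hydra:next": "https://example.org/connections?page=2",
    },
    "https://example.org/connections?page=2": {
        "connections": [{"departureTime": 520}],
        "hydra:next": None,
    },
}

request_log: List[str] = []


def fetch(url: str) -> dict:
    """Pretend HTTP GET; records every request so they can be counted."""
    request_log.append(url)
    return PAGES[url]


def iterate_connections(
    start_url: str, get: Callable[[str], dict] = fetch
) -> Iterator[dict]:
    """Yield connections page by page, following hydra:next until it runs out."""
    url: Optional[str] = start_url
    while url is not None:
        page = get(url)
        yield from page["connections"]
        url = page.get("hydra:next")


departure_times = [
    c["departureTime"]
    for c in iterate_connections("https://example.org/connections?page=1")
]
print(departure_times, "in", len(request_log), "requests")
```

Because the generator is lazy, a route planner consuming it can stop early and never request the later pages.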
  39. Make your browser calculate a route for you LinkedConnections.org

  40. Evaluation
    Is this new data interface more cost-efficient?
    How much slower is it for the data reuser?
  41. Three set-ups
    a. A query server
    b. Linked Connections with only one user agent over the entire Web
    c. Linked Connections with always unique user agents
    Real cost-efficiency of Linked Connections will be found in-between
  42. Three set-ups

  43. Query mix Open Data at https://api.irail.be/logs

  44. Results
    1. CPU time on the server
    2. Average time spent by the client per connection = an indication of the user-perceived performance
    3. Non-measured benefits
  45. Linked Connections is more cost-efficient
    Chart annotation: real world between these 2 values
  46. Results
    1. CPU time on the server
    2. Average time spent by the client per connection = an indication of the user-perceived performance
    3. Other benefits
  47. Linked Connections is more cost-efficient
    Chart annotation: real world between these 2 values
    Scanning a connection becomes more cost-efficient for data publishers when publishing cacheable fragments (~78% hit rate) instead of solving all queries on one machine
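To make the hit rate concrete: with a given cache hit rate, only the misses reach the publisher's own server. The sketch below is plain arithmetic on the ~78% figure from the slide; the request count of 1000 is an arbitrary illustration, not a measurement, and `origin_requests` is a hypothetical helper.

```python
def origin_requests(total_requests: int, hit_rate: float) -> int:
    """Number of requests that miss the cache and reach the origin server."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return round(total_requests * (1.0 - hit_rate))


# Illustration only: out of 1000 fragment requests at a 78% hit rate,
# the remaining 22% have to be answered by the origin server.
print(origin_requests(1000, 0.78))
```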
  48. Other benefits
    • Privacy by design: you do not send your user’s profile to a third party
    • The client can execute the algorithm in any way, e.g., you can only transfer at places with an elevator?
    • And...
  49. Federated route planning becomes straightforward
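One way to see why federation becomes straightforward: if every operator publishes its connections sorted by departure time, a client can merge the streams lazily and scan the result as if it were one timetable. The operators, stops and times below are hypothetical, and `merge_streams` is an illustrative helper built on the standard library's `heapq.merge`.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Connection(NamedTuple):
    operator: str
    departure_stop: str
    arrival_stop: str
    departure_time: int  # minutes since midnight
    arrival_time: int    # minutes since midnight


def merge_streams(*streams: Iterable[Connection]) -> Iterator[Connection]:
    """Lazily merge streams that are each sorted by departure time."""
    return heapq.merge(*streams, key=lambda c: c.departure_time)


# Two hypothetical operators, each publishing its own sorted connections.
rail = [
    Connection("rail", "A", "B", 480, 510),
    Connection("rail", "B", "C", 540, 570),
]
bus = [
    Connection("bus", "B", "D", 515, 530),
]

merged = list(merge_streams(rail, bus))
for c in merged:
    print(c.operator, c.departure_stop, "->", c.arrival_stop, c.departure_time)
```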

  50. Conclusion
    New trade-off established for cost-efficiently maximizing possible reuse of public transport data
    Spectrum: data dumps, Linked Connections, answer any question on the server (route planning algorithms as a service)
    Data publishing: http://{myhost}/{datafragmentid}
    Data services: http://api.{myapp}/?from={A}&to={B}
    Average cache hit rate of 78%
  51. Lessons learned building Open Data interfaces
    1. Fragment your datasets over HTTP
    2. Summaries, fragments or functionality for faster queries
    3. Caching headers on each document
    4. Hypermedia descriptions between fragments
    5. A web address (HTTP URI) per object
    6. Link to an Open Data license
    7. Cross Origin Resource Sharing
    8. Increase discoverability with DCAT descriptions
    Guidelines for every new data publishing strategy
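Several of these lessons translate directly into HTTP response headers on each fragment. The sketch below shows one possible set, assuming a publisher that expresses the license and next-fragment links through the `Link` header; the function name, the 30-second lifetime and the URLs are illustrative assumptions, not the configuration used in the research.

```python
from typing import Dict, Optional


def fragment_headers(
    max_age_seconds: int, license_url: str, next_url: Optional[str] = None
) -> Dict[str, str]:
    """Build response headers for one data fragment."""
    links = [f'<{license_url}>; rel="license"']          # lesson 6: link to a license
    if next_url is not None:
        links.append(f'<{next_url}>; rel="next"')        # lesson 4: link between fragments
    return {
        "Cache-Control": f"public, max-age={max_age_seconds}",  # lesson 3: caching headers
        "Access-Control-Allow-Origin": "*",                     # lesson 7: CORS
        "Link": ", ".join(links),
    }


headers = fragment_headers(
    max_age_seconds=30,
    license_url="https://creativecommons.org/publicdomain/zero/1.0/",
    next_url="https://example.org/connections?page=2",
)
for name, value in headers.items():
    print(f"{name}: {value}")
```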
  52. Open research questions
    Linked Data Interfaces: trade-off studies. Fragmentation strategies for Linked Data Interfaces in different domains:
    • Geo-spatial fragmentation
    • Multi-dimensional time series
    • Full-text search (a suffix tree may allow for federation)
    Transport data: how to design “contraction hierarchies” over decentralized Linked Transport Datasets?
    Organizational challenges: which Linked Data Interfaces must be hosted by whom in a decentralized Open Data strategy?
  53. Imagine a world where knowledge creates power for the many, not the few
    Department ELIS, Research Group Internet & Data Lab
    Drs. Pieter Colpaert