Slide 1

Slide 1 text

Maximizing the reuse of Open Transport Data
Pieter Colpaert, Ghent University - iMinds/imec

Slide 2

Slide 2 text

How far do you live from work?

Slide 3

Slide 3 text

km or min?

Slide 4

Slide 4 text

Imagine a program calculating distance in minutes.
What data would you need?

Slide 5

Slide 5 text

Transport has become a data sharing problem.
How can we fix it?

Slide 6

Slide 6 text

Sharing data between 2 systems
Your system ↔ Third party system
Agree on a protocol: it will determine which questions can be answered in a timely fashion
The third party system can ask questions to your system as previously agreed

Slide 7

Slide 7 text

Sharing data on the Web
Your system ↔ ? ? ? ? ? ?
Maximizing reuse → need to raise the interoperability

Slide 8

Slide 8 text

Layers of interoperability: legal, technical, syntactic, semantic, querying
When I have got 2 datasets, how easy is it to use them as if they were 1?

Slide 9

Slide 9 text

Layers of interoperability: legal, technical, syntactic, semantic, querying
legal → OpenDefinition.org

Slide 10

Slide 10 text

Documents on the web:
reuse is allowed
reuse in a gray zone
unauthorised reuse

Slide 11

Slide 11 text

A story of raising interoperability
When I have 2 datasets, how easy is it to turn them into 1 dataset?
Layers: legal, technical, syntactic, semantic, querying
legal → Open Definition & open licenses

Slide 12

Slide 12 text

A story of raising interoperability
When I have 2 datasets, how easy is it to turn them into 1 dataset?
Layers: legal, technical, syntactic, semantic, querying
legal → Open Definition & open licenses
technical → The Internet: exchanging data world-wide

Slide 13

Slide 13 text

A story of raising interoperability
When I have 2 datasets, how easy is it to turn them into 1 dataset?
Layers: legal, technical, syntactic, semantic, querying
legal → Open Definition & open licenses
technical → The Internet: exchanging data world-wide
syntactic → JSON, XML, CSV, … Open Standards

Slide 14

Slide 14 text

Serialisations
Table / CSV / Spreadsheet:
name | type | same as | location
iMinds | company | IBBT | Gaston Crommenlaan 8
JSON:
{ "iMinds" : { "type" : "company", "same as" : "IBBT", "location" : "Gaston Crommenlaan 8" } }
XML: the same values (company, IBBT, Gaston Crommenlaan 8) as elements
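As a minimal sketch of the point above, the same record can be written out in all three serialisations with Python's standard library. The XML element names (`organisation`, `type`, `sameAs`, `location`) and the helper names are assumptions chosen for illustration; the slide does not specify them.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The record from the slide, held once in memory.
record = {
    "name": "iMinds",
    "type": "company",
    "same as": "IBBT",
    "location": "Gaston Crommenlaan 8",
}


def to_csv(rec):
    """Serialise the record as a header row plus one data row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rec), lineterminator="\n")
    writer.writeheader()
    writer.writerow(rec)
    return buf.getvalue()


def to_json(rec):
    """Serialise the record keyed by its name, as on the slide."""
    body = {key: value for key, value in rec.items() if key != "name"}
    return json.dumps({rec["name"]: body})


def to_xml(rec):
    """Serialise the record as XML; the element names are illustrative."""
    root = ET.Element("organisation", {"name": rec["name"]})
    for key, tag in (("type", "type"), ("same as", "sameAs"), ("location", "location")):
        ET.SubElement(root, tag).text = rec[key]
    return ET.tostring(root, encoding="unicode")


csv_text = to_csv(record)
json_text = to_json(record)
xml_text = to_xml(record)
```

All three strings carry the same information; only the syntax differs, which is why syntactic interoperability alone does not tell a machine what "company" means.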

Slide 15

Slide 15 text

Triple structure
Table / CSV / Spreadsheet:
name | type | same as | location
iMinds | company | IBBT | Gaston Crommenlaan 8
JSON:
{ "iMinds" : { "type" : "company", "same as" : "IBBT", "location" : "Gaston Crommenlaan 8" } }
XML: the same values (company, IBBT, Gaston Crommenlaan 8) as elements
Triples:
iMinds type company .
iMinds same as IBBT .
iMinds location "Gaston Crommenlaan 8" .
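To make the triple structure concrete, the sketch below flattens the nested record into (subject, predicate, object) tuples. The function name `to_triples` is hypothetical, and the identifiers are still plain local strings at this stage.

```python
def to_triples(document):
    """Flatten {subject: {predicate: object}} into a list of triples."""
    triples = []
    for subject, properties in document.items():
        for predicate, obj in properties.items():
            triples.append((subject, predicate, obj))
    return triples


document = {
    "iMinds": {
        "type": "company",
        "same as": "IBBT",
        "location": "Gaston Crommenlaan 8",
    }
}

triples = to_triples(document)

# Print one statement per line, each terminated by a dot.
for subject, predicate, obj in triples:
    print(f"{subject} {predicate} {obj} .")
```

Because every triple is a self-contained statement, two datasets in this shape can be combined by simply concatenating their lists.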

Slide 16

Slide 16 text

Linked data on the World Wide Web
Statements spread over Machine 1, Machine 2 and Machine 3:
iMinds same as IBBT
iMinds is a company
IBBT located at Gaston Crommenlaan 8

Slide 17

Slide 17 text

Problem: semantic interoperability
The word “company” is ambiguous. How can we make sure that machines understand each other?
What about “is a”? And what about “iMinds”?

Slide 18

Slide 18 text

Solution: Uniform Resource Identifiers (URIs)
iMinds → http://data.kbodata.be/organisation/0866_386_380#id
is a → http://www.w3.org/1999/02/22-rdf-syntax-ns#type
Company → http://www.w3.org/ns/regorg#RegisteredOrganization
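A small sketch of what this substitution could look like in practice: local terms are replaced by the URIs listed on this slide. The `TERM_TO_URI` table and the `expand` and `to_ntriples_line` helpers are hypothetical names; only the three URIs come from the slide. Terms without a known mapping are left untouched.

```python
# Mapping from local, ambiguous terms to global identifiers (URIs from the slide).
TERM_TO_URI = {
    "iMinds": "http://data.kbodata.be/organisation/0866_386_380#id",
    "is a": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "company": "http://www.w3.org/ns/regorg#RegisteredOrganization",
}


def expand(triple):
    """Replace each local term with its URI when a mapping is known."""
    return tuple(TERM_TO_URI.get(term, term) for term in triple)


def to_ntriples_line(triple):
    """Write a triple of three URIs as one N-Triples style statement."""
    subject, predicate, obj = triple
    return f"<{subject}> <{predicate}> <{obj}> ."


local_triple = ("iMinds", "is a", "company")
global_triple = expand(local_triple)
line = to_ntriples_line(global_triple)
```

After expansion, any machine that encounters the statement can look up each URI to find out what is meant, instead of guessing from a local label.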

Slide 19

Slide 19 text

E.g., Linked Datex and Linked GTFS
Vocabularies at http://vocab.datex.org/terms and http://vocab.gtfs.org/terms
E.g., searching for parking facilities with Linked Data thanks to “rich snippets”
But is that it?

Slide 20

Slide 20 text

A story of raising interoperability
When I have 2 datasets, how easy is it to turn them into 1 dataset?
Layers: legal, technical, syntactic, semantic, querying
legal → Open Definition & open licenses
technical → The Internet: exchanging data world-wide
syntactic → JSON, XML, CSV, … Open Standards
semantic → using URIs instead of local identifiers

Slide 21

Slide 21 text

Where can you get to, and in what amount of time, under specific conditions?
Taking into account: multimodality, criminality, your subscriptions, what you’re carrying, disabilities, etc.

Slide 22

Slide 22 text

Data dump
Route planning algorithms as a service

Slide 23

Slide 23 text

A long tail for transport data services ...
Hard to guess which kind of queries will be needed

Slide 24

Slide 24 text

Can we find a way to publish for example public transport data while minimizing federated reuse cost?

Slide 25

Slide 25 text

Data needed for algorithm
a connection: departureTime + departureStop, arrivalTime + arrivalStop
another connection: departureTime + departureStop, arrivalTime + arrivalStop

Slide 26

Slide 26 text

And this is the algorithm*
~ creating a minimum spanning tree through a sorted directed acyclic graph
Squares are connections, laid out along a time axis
* The Connection Scan Algorithm (CSA)
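A minimal sketch of an earliest-arrival connection scan, assuming the connections are already sorted by departure time and ignoring transfer times and footpaths. The `Connection` class, the stop names, and the times (minutes since midnight) are made up for illustration and are not taken from the talk.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    departure_stop: str
    departure_time: int  # minutes since midnight
    arrival_stop: str
    arrival_time: int  # minutes since midnight


def earliest_arrival(connections, source, target, start_time):
    """Scan connections once, in departure order; return the earliest
    arrival time at target, or None when target is unreachable."""
    earliest = {source: start_time}
    for c in connections:
        # Later departures cannot improve on an arrival we already have.
        if c.departure_time > earliest.get(target, math.inf):
            break
        # A connection is usable if we can be at its departure stop in time.
        can_board = earliest.get(c.departure_stop, math.inf) <= c.departure_time
        if can_board and c.arrival_time < earliest.get(c.arrival_stop, math.inf):
            earliest[c.arrival_stop] = c.arrival_time
    return earliest.get(target)


connections = [
    Connection("A", 480, "B", 490),
    Connection("A", 485, "C", 520),
    Connection("B", 495, "C", 510),
    Connection("C", 515, "D", 530),
    Connection("C", 525, "D", 540),
]

arrival = earliest_arrival(connections, "A", "D", 475)
```

The only data the scan touches are the four fields per connection, which is why a sorted list of connections is enough as a building block.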

Slide 27

Slide 27 text

Resource 1 → nextPage → Resource 2 → nextPage → Resource ... → Resource X (ordered by time)
When published in pages on the Web, route planning will need X requests instead of 1
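The sketch below simulates this paging without touching the network: `PAGES` is an in-memory stand-in for Web resources, and the URLs, the field names (`connections`, `nextPage`), and the data are all hypothetical rather than the actual Linked Connections format. A client follows the `nextPage` links, scans each page in order, and counts how many requests the query cost.

```python
import math

# In-memory stand-in for paged Web resources, ordered by departure time.
# Each connection is (departure_stop, departure_time, arrival_stop, arrival_time).
PAGES = {
    "https://example.org/connections?page=1": {
        "connections": [("A", 480, "B", 490), ("A", 485, "C", 520)],
        "nextPage": "https://example.org/connections?page=2",
    },
    "https://example.org/connections?page=2": {
        "connections": [("B", 495, "C", 510), ("C", 515, "D", 530)],
        "nextPage": "https://example.org/connections?page=3",
    },
    "https://example.org/connections?page=3": {
        "connections": [("C", 525, "D", 540), ("D", 535, "E", 550)],
        "nextPage": None,
    },
}


def fetch(url):
    """Pretend HTTP GET; a real client would download and parse the page."""
    return PAGES[url]


def plan(first_page, source, target, start_time):
    """Scan pages in order; return (arrival time at target, requests made)."""
    earliest = {source: start_time}
    requests_made = 0
    url = first_page
    while url is not None:
        page = fetch(url)
        requests_made += 1
        for dep_stop, dep_time, arr_stop, arr_time in page["connections"]:
            # Stop as soon as later departures can no longer help.
            if dep_time > earliest.get(target, math.inf):
                return earliest[target], requests_made
            can_board = earliest.get(dep_stop, math.inf) <= dep_time
            if can_board and arr_time < earliest.get(arr_stop, math.inf):
                earliest[arr_stop] = arr_time
        url = page.get("nextPage")
    return earliest.get(target), requests_made


result_d = plan("https://example.org/connections?page=1", "A", "D", 475)
result_c = plan("https://example.org/connections?page=1", "A", "C", 475)
```

In this toy data the trip to D costs three requests and the trip to C two, so the number of requests grows with the time span the journey covers, while each page stays a plain, cacheable document.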

Slide 28

Slide 28 text

Try it yourself at http://LinkedConnections.org

Slide 29

Slide 29 text

Striking the golden mean?
Data publishing (cheap/reliable) ↔ Data services (rather expensive/unreliable)
Data dumps: smart agents
Dataset split in fragments
Algorithms as a service
Entire query languages over HTTP: smart servers

Slide 30

Slide 30 text

Global interoperability for Route Planners?
Layers: legal, technical, syntactic, semantic, querying
legal → Open Definition & open licenses
technical → The Internet: exchanging data world-wide
syntactic → JSON, XML, CSV, … Open Standards
semantic → using URIs instead of local identifiers
querying → Work in progress: linkedconnections.org

Slide 31

Slide 31 text

Checklist Open (Transport) Data
Do you have an open license on your data?
Is it shared publicly on the Web in an open format (html/css/xml/json…)?
Do you identify things in a globally interoperable way?
How easy is it to include your dataset in a federated query?
Are you exposing basic reusable building blocks for your dataset?

Slide 32

Slide 32 text

A world where knowledge creates power for the many, not the few
Questions?
@pietercolpaert
http://pieter.pm