Data for maximum reuse at Solvay Brussels School

Open Data is only a legal definition. The goal behind Open Data, however, is to maximize reuse. In this talk I explain the theory of maximizing the interoperability between open datasets and hint at possible business models built on Open Data today.

Pieter Colpaert

March 17, 2017

Transcript

  1. Data for maximum reuse @pietercolpaert Trying to maximise the reuse

    of your datasets Reusing open data to enrich your own business model
  2. Open Data in the world For example Data Portal from

    Worldbank http://data.worldbank.org
  3. Open Data in Europe Public Sector Information INSPIRE directive PSI

    directive
  4. PSI & Open Data? Open Data PSI

  5. Status of e.g., Public Transit in BE?

                 SNCB     STIB    De Lijn  TEC
    Schedules    shared   shared  shared   open
    Real-time    planned  shared  shared   planned
    Tickets      no       no      yes      no
    Historic     no       no      no       open
  6. Open Data vs. data sharing?

  7. Sharing data between 2 systems Your system Third party system

    Agree on a protocol Will determine which questions can be answered in a timely fashion Can ask questions to your system as previously agreed
  8. Sharing data on the Web Your system ? ? ?

    ? ? ? Maximizing reuse → need to raise the interoperability
  9. Costs Benefits

  10. ↓ Querying syntactic semantic technical legal When I have got

    2 datasets, how easy is it to use them as if they were 1?
  11. As a reuser, you need certainty that you won’t get

    sued https://github.com/iRail/stations
  12. OpenDefinition.org ↓ Querying syntactic semantic technical legal

  13. reuse is allowed Documents on the web reuse in a

    gray zone unauthorised reuse
  14. A story of raising interoperability ↓ Querying syntactic semantic technical

    legal When I have 2 datasets, how easy is it to turn them into 1 dataset? → Open Definition & open licenses → The Internet: exchanging data world-wide → JSON, XML, CSV, HTML… Open Standards
  15. Serialisations

    Table / CSV / Spreadsheet:
      name  type     same as  location
      imec  company  iMinds   X

    JSON:
      { "imec" : { "type" : "company", "same as" : "iMinds", "location" : "X" } }

    XML:
      <imec> <type>company</type> <sameas>iMinds</sameas> <location> X </location> </imec>
  16. Triple structure

    Table / CSV / Spreadsheet:
      name  type     same as  location
      imec  company  iMinds   X

    Triples:
      <imec> <type> <company> .
      <imec> <sameas> <iMinds> .
      <imec> <location> "X" .

    XML:
      <imec> <type>company</type> <sameas>iMinds</sameas> <location> X </location> </imec>

    JSON:
      { "imec" : { "type" : "company", "same as" : "iMinds", "location" : "X" } }
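A minimal Python sketch of the idea on this slide: the table row and the JSON object flatten into the same three triples, taking imec as the subject as in the table. The function names are illustrative assumptions, and the plain string tuples are a simplification of real RDF terms.

```python
import json

# The record from the slide, as it would arrive in a JSON file.
record_json = '{"imec": {"type": "company", "same as": "iMinds", "location": "X"}}'

# The same record as a CSV-style header and row.
header = ["name", "type", "same as", "location"]
row = ["imec", "company", "iMinds", "X"]


def json_to_triples(text):
    """Flatten {subject: {property: value}} into (subject, property, value) triples."""
    data = json.loads(text)
    return [
        (subject, prop, value)
        for subject, props in data.items()
        for prop, value in props.items()
    ]


def csv_row_to_triples(header, row):
    """Use the first column as the subject and every other column as a property."""
    subject = row[0]
    return [(subject, prop, value) for prop, value in zip(header[1:], row[1:])]


triples_from_json = json_to_triples(record_json)
triples_from_csv = csv_row_to_triples(header, row)

# Both serialisations carry the same statements.
for subject, prop, value in triples_from_json:
    print(subject, "|", prop, "|", value)
```

Once both sources are reduced to triples, the serialisation they came from no longer matters; what remains open is whether two publishers mean the same thing by "type" or "iMinds".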
  17. World Wide Web imec same as iMinds imec is a

    company iMinds located at X Machine 1 Machine 2 Machine 3 Linked data Solving semantic interoperability?
  18. Solution: Uniform Resource Identifiers (URIs)

    iMinds → http://data.kbodata.be/organisation/0866_386_380#id
    is a → http://www.w3.org/1999/02/22-rdf-syntax-ns#type
    Company → http://www.w3.org/ns/regorg#RegisteredOrganization
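To illustrate why URIs help, here is a small Python sketch in which two machines use different local labels for the same statement. The three URIs come from the slide; the second machine's labels and all function names are hypothetical. After each machine maps its labels to the shared URIs, merging the two datasets is a plain set union.

```python
# URIs taken from the slide.
IMINDS = "http://data.kbodata.be/organisation/0866_386_380#id"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
REGISTERED_ORG = "http://www.w3.org/ns/regorg#RegisteredOrganization"

# Each machine keeps its own mapping from local labels to shared URIs.
MACHINE_1_TERMS = {"iMinds": IMINDS, "is a": RDF_TYPE, "Company": REGISTERED_ORG}
# Hypothetical labels for a second publisher describing the same thing.
MACHINE_2_TERMS = {"iminds": IMINDS, "type": RDF_TYPE, "company": REGISTERED_ORG}


def expand(triple, terms):
    """Replace every local label in a triple by its URI."""
    return tuple(terms[label] for label in triple)


def to_ntriples(triple):
    """Write one all-URI triple in N-Triples syntax."""
    subject, predicate, obj = triple
    return f"<{subject}> <{predicate}> <{obj}> ."


machine_1 = [("iMinds", "is a", "Company")]
machine_2 = [("iminds", "type", "company")]

# With shared identifiers, two datasets become one by taking the union.
merged = {expand(t, MACHINE_1_TERMS) for t in machine_1} | {
    expand(t, MACHINE_2_TERMS) for t in machine_2
}

for triple in sorted(merged):
    print(to_ntriples(triple))
```

The two differently labelled statements collapse into a single triple, which is the sense in which the datasets can be used "as if they were 1".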
  19. A story of raising interoperability ↓ Querying syntactic semantic technical

    legal When I have 2 datasets, how easy is it to turn them into 1 dataset? → Open Definition & open licenses → The Internet: exchanging data world-wide → JSON, XML, CSV, HTML… Open Standards → Linked Data: work in progress
  20. Open Data is only a legal definition… but: The 5

    stars of Linked Open Data 5stardata.info
  21. Sharing data between 2 systems Your system Third party system

    Agree on a protocol Will determine which questions can be answered in a timely fashion Can ask questions to your system as previously agreed
  22. Sharing data on the Web Your system ? ? ?

    ? ? ? Maximizing reuse → need to raise the interoperability
  23. data dump Ask any question How to allow for asking

    any kind of query? Your system 3rd party Your system ? ? ? ? ? ?
  24. data dump Ask any question Asking questions Your system 3rd

    party Your system ? ? ? ? ? ? Data publishing: scalable; every request is cacheable; dataset split in fragments
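One possible reading of "dataset split in fragments", sketched in Python under loud assumptions: the base URL, the page size, the field names, and the sample departures are all invented for illustration, and a dictionary lookup stands in for an HTTP GET. The publisher only serves fixed pages, each with a stable URL and a link to the next one; the reuser answers its own question by walking those pages.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

BASE = "https://example.org/departures"  # hypothetical endpoint


@dataclass
class Fragment:
    url: str
    items: List[dict]
    next_url: Optional[str]


def publish(departures: List[dict], page_size: int) -> Dict[str, Fragment]:
    """Split time-sorted departures into fixed-size fragments with next links."""
    ordered = sorted(departures, key=lambda d: d["time"])
    pages = [ordered[i:i + page_size] for i in range(0, len(ordered), page_size)]
    fragments = {}
    for n, items in enumerate(pages):
        url = f"{BASE}?page={n}"
        next_url = f"{BASE}?page={n + 1}" if n + 1 < len(pages) else None
        fragments[url] = Fragment(url, items, next_url)
    return fragments


def departures_from(fragments: Dict[str, Fragment], station: str) -> List[dict]:
    """Client-side query: follow the next links and filter locally."""
    found = []
    # The lookup stands in for an HTTP GET that any cache could answer.
    fragment = fragments.get(f"{BASE}?page=0")
    while fragment is not None:
        found.extend(d for d in fragment.items if d["from"] == station)
        fragment = fragments.get(fragment.next_url) if fragment.next_url else None
    return found


# Invented sample data, for illustration only.
departures = [
    {"time": "2017-03-17T09:30", "from": "Gent", "to": "Brussel"},
    {"time": "2017-03-17T09:00", "from": "Brussel", "to": "Leuven"},
    {"time": "2017-03-17T09:10", "from": "Gent", "to": "Antwerpen"},
    {"time": "2017-03-17T09:45", "from": "Leuven", "to": "Brussel"},
    {"time": "2017-03-17T09:20", "from": "Brussel", "to": "Gent"},
]

fragments = publish(departures, page_size=2)
for d in departures_from(fragments, "Gent"):
    print(d["time"], d["from"], "→", d["to"])
```

Every client fetches the same small set of URLs whatever question it asks, so responses can be cached and extra users add little load on the publisher; the filtering work moves to the reuser.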
  25. A long tail for e.g., transport data services ...

    Hard to guess which kind of queries will be needed … More specific features Size of audience Google maps Proximus CityMapper Go-OV Ally Transit App NextTrain smartwatch
  26. Proposal http://api.{mycompany}/?from={A}&to={B} &departuretime=2016-10-16T14:45.024Z &wheelchairaccessible=true &transit_modes=plane,railway,bus,car &algoritm_mode=shortest ... Yet this interface

    will need to answer all questions for all third party apps…
  27. data dump Route planning algorithms as a service Asking questions

    Your system 3rd party Your system ? ? ? ? ? ? Does not scale: extra users come with extra load. Does not give the necessary flexibility to companies
  28. Discover all the necessary data on the Web Just like

    websites, we want your data to be highly available
  29. API fanboys Real data reusers Need Open Data Want services

    on top of data What we ask data owners
  30. What we ask data owners Data dumps Smart servers Data

    publishing (cheap/reliable) Data services (rather expensive/unreliable) Entire query languages over HTTP Dataset split in fragments Smart agents algorithms as a service Read more at http://linkeddatafragments.org API fanboys Open Data
  31. Business model? API fanboys Real data reusers Need Open Data

    Need services on top of data Business opportunity?
  32. Servers publishing Open Data e.g., • all the planned and

    actual arrivals and departures • the network of roads in a certain region worldwide web-services e.g., • a route planner: from → to • the closest station to your current location? Scalable businesses $$$ $ $ $$$ end-users
  33. We want a world where knowledge creates power for the

    many, not the few.