The impact of an extra feature on the scalability of Linked Connections

A presentation on a paper at the Consuming Linked Data workshop at ISWC2016: https://www.dcc.uchile.cl/cold2016/#accepted

Check out the demo as well: http://linkedconnections.org

Pieter Colpaert

October 18, 2016

Transcript

  1. The impact of an extra feature on the scalability of Linked Connections

    Pieter Colpaert, Sander Ballieu, Ruben Verborgh, Erik Mannens
  2. How to publish public transport data for everyone?

  3. Proposal: http://api.{mycompany}/?from={A}&to={B}&departuretime=2016-10-16T14:45.024Z&wheelchairaccessible=true&transit_modes=plane,railway,bus,car&algoritm_mode=shortest ...

  4. Proposal: http://api.{mycompany}/?from={A}&to={B}&departuretime=2016-10-16T14:45.024Z&wheelchairaccessible=true&transit_modes=plane,railway,bus,car&algoritm_mode=shortest ...

    One service for everything/everyone: unscalable
  5. Linked Connections publishes a paged collection of departure/arrival objects (connections) instead

    (Diagram labels: GTFS data dump; route planning algorithms as a service.)
  6. When published in pages, route planning needs i requests instead of 1

    Pages 1, 2, 3, ..., i are ordered in time and linked through hydra:next: http://data.{yourcompany}/?page={i}
  7. Demo at ISWC 2015: http://linkedconnections.org

  8. Open source LC client code: Let’s build specialized user-agents*

    * A user-agent can be anything, also a third-party API. (Diagram labels: LC server; HTTP cache; HTTP cache; fetch pages; LC client; private API.)
  9. Can we extend this interface with an extra feature? E.g., for people in a wheelchair?

    http://data.{yourcompany}/?page={i}&wheelchair={true/false} Hypothesis: faster query response times when the server helps filter the connections. (Diagram labels: GTFS data dump; route planning algorithms as a service.)
  10. Wheelchair accessibility feature

    Step 1, trip-based filtering: get all wheelchair-accessible connections, ordered in time. Step 2, stop-based filtering: when getting on, getting off or transferring at a stop, the stop itself must also be wheelchair accessible. In the Linked Connections framework, only step 1 could be done on the server.
  11. Evaluation

    Each time x times the normal load, with x: 0.5, 1, 2, 4, 8, 12 and 16. Grab this query mix over here: https://github.com/linkedconnections/belgianrail-query-load Re-playing all queries from a route planning API for the Belgian railways for 15 min.
  12. Experiment 1: client filters trips and stops

    http://data.{yourcompany}/?page={i} (Diagram labels: LC server; HTTP cache; HTTP cache; fetch pages; LC client + trips and stops filter; accessible trips database; accessible stops database.)
  13. Experiment 2: server filters trips

    http://data.{yourcompany}/?page={i}&wheelchair={true/false} (Diagram labels: LC server + trips filter; HTTP cache; HTTP cache; fetch pages; LC client + stops filter; accessible trips database; accessible stops database.)
  14. Results

    (1) Difference in cache hit-rate; (2) difference in CPU use on the server; (3) difference in CPU use on the client; (4) average time to relax one connection
  15. The cache performance drops by 3-6% with an extra boolean filter on the server
  16. Results

    (1) Difference in cache hit-rate; (2) difference in CPU use on the server; (3) difference in CPU use on the client; (4) average time to relax one connection
  17. CPU usage on the server increases when filtering on the server

  18. Results

    (1) Difference in cache hit-rate; (2) difference in CPU use on the server; (3) difference in CPU use on the client; (4) average time to relax one connection
  19. The CPU usage of the client is higher when filtering on the server
  20. Results

    (1) Difference in cache hit-rate; (2) difference in CPU use on the server; (3) difference in CPU use on the client; (4) average time to relax one connection
  21. Scanning a connection takes fewer ms (under lower load) when filtering on the client
  22. Conclusion

    For Linked Connections: the hypothesis was wrong: more CPU is needed for slower overall query times. For the Linked Data consumer/publisher community: adding server functionality when publishing data for maximum reuse does not always mean helping user-agents. Let’s enable the power of datasets published on the Web for the many, not the few → http://linkedconnections.org
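
The two experiments can be illustrated with a minimal Python sketch of a client that follows next-page links and relaxes connections in departure-time order, with the trip filter (step 1) and stop filter (step 2) applied on the client. This is not the authors' client code: the page shape (`connections`, `next`), the field names, the in-memory `PAGES` data and the `fetch_page` helper are all assumptions made for illustration, with `next` standing in for `hydra:next`.

```python
from typing import Callable, Iterable, Iterator, Optional, Set

INF = float("inf")

# Hypothetical in-memory pages; times are minutes, "next" stands in for hydra:next.
PAGES = {
    "page1": {"connections": [
        {"from": "A", "to": "B", "departure": 0, "arrival": 10, "trip": "t1"},
        {"from": "A", "to": "C", "departure": 5, "arrival": 15, "trip": "t2"},
    ], "next": "page2"},
    "page2": {"connections": [
        {"from": "B", "to": "C", "departure": 12, "arrival": 30, "trip": "t1"},
        {"from": "B", "to": "C", "departure": 15, "arrival": 20, "trip": "t3"},
    ], "next": None},
}


def fetch_page(url: str) -> dict:
    """Stand-in for an HTTP GET on a page URL (assumption: no real network)."""
    return PAGES[url]


def iter_connections(fetch: Callable[[str], dict], first_url: str) -> Iterator[dict]:
    """Yield connections page by page; a page is fetched only when needed."""
    url: Optional[str] = first_url
    while url is not None:
        page = fetch(url)
        yield from page["connections"]
        url = page.get("next")


def earliest_arrival(connections: Iterable[dict], origin: str, destination: str,
                     depart_at: int,
                     accessible_trips: Optional[Set[str]] = None,
                     accessible_stops: Optional[Set[str]] = None) -> Optional[int]:
    """Relax every connection once, in departure-time order.

    accessible_trips=None means no client-side trip filter (for example
    because the server already applied step 1); accessible_stops=None
    means no stop filter (step 2).
    """
    earliest = {origin: depart_at}  # stop -> earliest known arrival time
    boarded = set()                 # trips the traveller can already be on
    for c in connections:
        if c["departure"] >= earliest.get(destination, INF):
            break  # later connections cannot improve the answer
        if accessible_trips is not None and c["trip"] not in accessible_trips:
            continue  # step 1: trip-based filtering
        stop_ok = accessible_stops is None or c["from"] in accessible_stops
        can_board = stop_ok and earliest.get(c["from"], INF) <= c["departure"]
        if c["trip"] not in boarded and not can_board:
            continue
        boarded.add(c["trip"])
        if accessible_stops is not None and c["to"] not in accessible_stops:
            continue  # step 2: cannot get off here, but may stay on the trip
        if c["arrival"] < earliest.get(c["to"], INF):
            earliest[c["to"]] = c["arrival"]
    return earliest.get(destination)


if __name__ == "__main__":
    stream = iter_connections(fetch_page, "page1")
    print(earliest_arrival(stream, "A", "C", 0,
                           accessible_trips={"t1", "t3"},
                           accessible_stops={"A", "B", "C"}))
```

In this sketch, Experiment 1 corresponds to passing both `accessible_trips` and `accessible_stops`, while Experiment 2 corresponds to fetching pages that the server has already filtered by trip and passing only `accessible_stops`. The trade-off measured in the slides (cache hit-rate and CPU use) is not modelled here.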