The impact of an extra feature on the scalability of Linked Connections

A presentation on a paper at the Consuming Linked Data workshop at ISWC2016: https://www.dcc.uchile.cl/cold2016/#accepted

Check out the demo as well: http://linkedconnections.org

Pieter Colpaert

October 18, 2016

Transcript

  1. The impact of an extra feature on the scalability of Linked Connections

    Pieter Colpaert, Sander Ballieu, Ruben Verborgh, Erik Mannens
  2. Pages ordered in time, each linking to the next with hydra:next: Page 1, Page 2, Page 3, ..., Page i

    When published in pages, route planning needs i requests instead of 1.
    URL template: http://data.{yourcompany}/?page={i}
  3. Open source LC client code: let’s build specialized user-agents*

    * A user-agent can be anything, including a third-party API.
    Diagram: LC server, HTTP cache, HTTP cache, LC client (fetches pages), private API
  4. Can we extend this interface with an extra feature? E.g., for people in a wheelchair?

    Diagram labels: GTFS data dump; route planning algorithms as a service
    URL template: http://data.{yourcompany}/?page={i}&wheelchair={true/false}
    Hypothesis: faster query response times when the server helps filter the connections
  5. Wheelchair accessibility feature

    Step 1: trip-based filtering. Get all wheelchair-accessible connections, ordered in time.
    Step 2: stop-based filtering. When getting on, getting off or transferring at a stop, the stop itself must also be wheelchair accessible.
    In the Linked Connections framework, only step 1 could be done on the server.
  6. Evaluation

    Re-playing all queries from a route planning API for Belgian railways for 15 min.
    Each time at x times the normal load, with x: 0.5, 1, 2, 4, 8, 12 and 16.
    Grab this query mix over here: https://github.com/linkedconnections/belgianrail-query-load
  7. Experiment 1: client filters trips and stops

    Diagram: LC server, HTTP cache, HTTP cache, LC client + trips and stops filter (fetches pages), accessible trips database, accessible stops database
    URL template: http://data.{yourcompany}/?page={i}
  8. Experiment 2: server filters trips

    Diagram: LC server + trips filter, HTTP cache, HTTP cache, LC client + stops filter (fetches pages), accessible trips database, accessible stops database
    URL template: http://data.{yourcompany}/?page={i}&wheelchair={true/false}
  9. Results

    1. Difference in cache hit-rate
    2. Difference in CPU use on the server
    3. Difference in CPU use on the client
    4. Average time to relax one connection
  13. Conclusion

    For Linked Connections: the hypothesis was wrong. More CPU was needed for slower overall query times.
    For the Linked Data consumer/publisher community: adding server functionality when publishing data for maximum reuse does not always mean helping user-agents.
    Let’s enable the power of datasets published on the Web for the many, not the few → http://linkedconnections.org
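
The paging idea from slide 2 (i requests instead of 1, with each page pointing to the next through hydra:next) might be sketched as follows. This is only an illustration under assumptions: the page layout (a `connections` list next to a `hydra:next` key), the `example.org` URLs and the in-memory `PAGES` dictionary are hypothetical stand-ins, not the actual Linked Connections response format or client code.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical in-memory stand-in for a paged server. A real client would
# issue one HTTP GET per page; the JSON layout here is an assumption.
PAGES: Dict[str, dict] = {
    "http://data.example.org/?page=1": {
        "connections": [{"id": "c1", "departureTime": 1},
                        {"id": "c2", "departureTime": 2}],
        "hydra:next": "http://data.example.org/?page=2",
    },
    "http://data.example.org/?page=2": {
        "connections": [{"id": "c3", "departureTime": 3}],
        "hydra:next": "http://data.example.org/?page=3",
    },
    "http://data.example.org/?page=3": {
        "connections": [{"id": "c4", "departureTime": 4}],
        # No hydra:next key: this is the last page.
    },
}

requests_made: List[str] = []


def counting_fetch(url: str) -> dict:
    """Return the page for `url` and record that a request was made."""
    requests_made.append(url)
    return PAGES[url]


def iter_connections(start_url: str,
                     fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield connections page by page, following hydra:next links."""
    url = start_url
    while url is not None:
        page = fetch_page(url)
        yield from page["connections"]
        url = page.get("hydra:next")  # None when there is no next page


ids = [c["id"] for c in iter_connections("http://data.example.org/?page=1",
                                         counting_fetch)]
print(ids, len(requests_made))
```

Because the generator is lazy, a route planner that finds its answer early can stop iterating and never request the remaining pages.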
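
The two filtering steps from slide 5 could look roughly like this. It is a minimal sketch, assuming a simplified `Connection` record and plain sets of accessible trip and stop identifiers; all names and sample data are hypothetical. For brevity the stop check is applied to a finished journey, whereas a route planner would more likely apply it while scanning connections.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Set


@dataclass(frozen=True)
class Connection:
    trip: str
    departure_stop: str
    arrival_stop: str
    departure_time: int
    arrival_time: int


def filter_by_trip(connections: Iterable[Connection],
                   accessible_trips: Set[str]) -> List[Connection]:
    """Step 1: keep connections of accessible trips, ordered in time."""
    kept = [c for c in connections if c.trip in accessible_trips]
    return sorted(kept, key=lambda c: c.departure_time)


def journey_is_accessible(journey: Sequence[Connection],
                          accessible_stops: Set[str]) -> bool:
    """Step 2: every stop used to get on, get off or transfer must be accessible."""
    if not journey:
        return True
    used = {journey[0].departure_stop, journey[-1].arrival_stop}
    for prev, nxt in zip(journey, journey[1:]):
        if prev.trip != nxt.trip:  # changing trips means a transfer
            used.add(prev.arrival_stop)
            used.add(nxt.departure_stop)
    return used <= accessible_stops


# Hypothetical sample data: trip t2 and stop B are not accessible.
connections = [
    Connection("t1", "A", "B", 10, 20),
    Connection("t2", "A", "C", 5, 15),
    Connection("t1", "B", "C", 21, 30),
    Connection("t3", "C", "D", 35, 45),
]
accessible_trips = {"t1", "t3"}
accessible_stops = {"A", "C", "D"}

step1 = filter_by_trip(connections, accessible_trips)
print([c.trip for c in step1])
print(journey_is_accessible(step1, accessible_stops))
print(journey_is_accessible(step1[:1], accessible_stops))
```

In this sample, stop B is only passed while staying on trip t1, so the full journey from A to D remains accessible, while a journey that ends at B does not.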