CloudEast: How Shutl uses Neo4j to deliver even faster

An introduction to graph databases and Cypher

Volker Pacher

May 24, 2013

Transcript

  1. Tuesday, 28 May 13

  3. How Neo4j helps Shutl to deliver even faster...
  4. Volker Pacher, senior developer @shutl, @vpacher, http://github.com/vpacher
  14. • SaaS platform
    • we provide an API for carriers and merchants
    • shutl.it C2C platform
    • customers can choose between a delivery either:
        within 90 minutes of purchase
        or a 1 hour window of their choice (same day or any day)
    • fastest delivery to date: 15:00 min
    • SOA with services built using JRuby, Sinatra, MongoDB and Neo4j
  16. Problems?

  17. http://xkcd.com/287/

  22. problems with our previous attempt (v1):
    • exponential growth of joins in MySQL with added features
    • code base too complex and unmaintainable
    • API response time growing too large the more data was added
    • our fastest delivery was quicker than our slowest query!
  29. The case for graph databases:
    • relationships are explicitly stored (RDBMSs lack relationships)
    • domain modelling is simplified because adding new ‘subgraphs’ doesn’t affect the existing structure and queries (additive model)
    • whiteboard friendly
    • schema-less
    • db performance remains relatively constant because each query is localized to its portion of the graph: O(1) for the same query as the graph grows
    • traversals of relationships are easy and very fast
  30. What is a graph anyway?
    a collection of vertices (nodes) connected by edges (relationships)
    [Diagram: Node 1, Node 2, Node 3, Node 4]
  31. a short history: Leonhard Euler, the seven bridges of Königsberg (1735)
  32. directed graph
    each relationship has a direction, i.e. one start node and one end node
    [Diagram: Node 1, Node 2, Node 3, Node 4]
  33. property graph
    • nodes contain properties (key, value)
    • relationships have a type and are always directed
    • relationships can contain properties too
    [Diagram: nodes name: Volker, name: Sam, name: Megan, name: Paul; relationships :friends, :friends, :knows (since: 2005), :knows, :works_for]
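
The property graph model can be sketched in a few lines of plain Ruby. This is an illustrative in-memory structure, not the neo4j gem's API: the names `Node`, `Relationship` and `PropertyGraph` are hypothetical, and the edges created at the bottom are assumptions, because the transcript does not preserve which nodes the slide's relationships connect.

```ruby
# Minimal in-memory property graph (illustrative only, not Neo4j's API).
Node = Struct.new(:id, :properties)
Relationship = Struct.new(:type, :start_node, :end_node, :properties)

class PropertyGraph
  attr_reader :nodes, :relationships

  def initialize
    @nodes = []
    @relationships = []
  end

  # Nodes carry arbitrary key/value properties.
  def add_node(properties = {})
    node = Node.new(@nodes.size + 1, properties)
    @nodes << node
    node
  end

  # Relationships always have a type and a direction, plus optional properties.
  def relate(start_node, type, end_node, properties = {})
    rel = Relationship.new(type, start_node, end_node, properties)
    @relationships << rel
    rel
  end

  # Follow outgoing relationships of a given type from a node.
  def outgoing(node, type)
    @relationships
      .select { |r| r.start_node.equal?(node) && r.type == type }
      .map(&:end_node)
  end
end

graph  = PropertyGraph.new
volker = graph.add_node(name: 'Volker')
sam    = graph.add_node(name: 'Sam')
megan  = graph.add_node(name: 'Megan')

# Assumed edges, chosen only for illustration.
graph.relate(volker, :friends, sam)
knows = graph.relate(volker, :knows, megan, since: 2005)

names = graph.outgoing(volker, :knows).map { |n| n.properties[:name] }
puts names.inspect
```

A real graph database stores each node's relationships next to the node, so following them does not require scanning every relationship as `outgoing` does here.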
  34. a graph is its own index (constant query performance)
  40. the case for Neo4j
    • we can run it embedded in the same JVM
    • we can use JRuby, as we already know Ruby very well
    • lots of good Ruby libraries are available; we chose the neo4j gem by Andreas Ronge (https://github.com/andreasronge/neo4j)
    • it speaks Cypher
    • the guys from neotech are awesome
  41. some graph dbs available:
    • neo4j (jvm)
    • flockdb (jvm)
    • DEX (c++)
    • OrientDB (jvm)
    • Sones GraphDB (c#)
  42. embedded vs. standalone
    embedded, pros:
      • better performance
      • transaction support
      • neo4j gem is available
      • we can use cypher and traversal
    embedded, cons:
      • only the code running the db has access to the db
    standalone, pros:
      • access via REST API and cypher
      • language independent and code doesn’t need to run on the JVM
    standalone, cons:
      • not as performant
      • only works with cypher
      • transaction is on a per-query basis
      • need to write model wrappers for ourselves
  49. gotchas and other stuff to consider:
    • testing proved to be difficult and we had to write our own tools
    • migrations of schemaless dbs are more difficult to stay on top of and require special solutions in the case of graph dbs
    • seeding an embedded database is hard
    • graph db partitioning is almost impossible and the whole graph needs to be in memory
    • encoding dates and times that are stored in UTC and work across timezones is non-trivial
    • nested data structures (hashes and arrays) can’t be stored and need to be converted to JSON
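
The last two gotchas can be illustrated with a small encode/decode pair. This is a minimal sketch using only Ruby's standard library, not Shutl's code: the helper names `encode_properties` and `decode_properties`, the `booking` hash and the choice of ISO 8601 strings for timestamps are all assumptions made for the example.

```ruby
require 'json'
require 'time'

# Flatten a Ruby hash into simple property values: nested hashes and arrays
# become JSON strings, Time values become UTC ISO 8601 strings.
def encode_properties(attrs)
  attrs.each_with_object({}) do |(key, value), props|
    props[key.to_s] =
      case value
      when Hash, Array then JSON.generate(value)
      when Time        then value.getutc.iso8601
      else value
      end
  end
end

# Reverse the encoding for the keys known to hold JSON or timestamps.
def decode_properties(props, json_keys: [], time_keys: [])
  props.each_with_object({}) do |(key, value), attrs|
    attrs[key] =
      if json_keys.include?(key) then JSON.parse(value)
      elsif time_keys.include?(key) then Time.iso8601(value)
      else value
      end
  end
end

# Hypothetical record: placed at 12:30 local time in a +01:00 zone.
booking = {
  reference: 'ABC123',
  window:    { 'from' => '10:00', 'to' => '11:00' },
  placed_at: Time.new(2013, 5, 24, 12, 30, 0, '+01:00')
}

stored   = encode_properties(booking)
restored = decode_properties(stored, json_keys: ['window'], time_keys: ['placed_at'])

puts stored['placed_at'] # UTC timestamp string
puts stored['window']    # JSON string
```

Storing the instant in UTC keeps comparisons simple; converting back to a customer's local zone is left to the caller.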
  56. Querying the graph: Cypher
    • declarative query language specific to Neo4j
    • easy to learn and intuitive
    • enables the user to specify patterns to query for (something that looks like ‘this’)
    • inspired partly by SQL (WHERE and ORDER BY) and SPARQL (pattern matching)
    • focuses on what to query for and not how to query for it
    • the switch from a MySQL world is made easier by using Cypher instead of having to learn a traversal framework straight away
  57. cypher clauses
    • START: Starting points in the graph, obtained via index lookups or by element IDs.
    • MATCH: The graph pattern to match, bound to the starting points in START.
    • WHERE: Filtering criteria.
    • RETURN: What to return.
    • CREATE: Creates nodes and relationships.
    • DELETE: Removes nodes, relationships and properties.
    • SET: Set values to properties.
    • FOREACH: Performs updating actions once per element in a list.
    • WITH: Divides a query into multiple, distinct parts.
  58. an example graph
    nodes: Node 1 (me), Node 2 (Steve), Node 3 (Sam), Node 4 (David), Node 5 (Megan)
    me - [:knows] -> Steve - [:knows] -> David
    me - [:knows] -> Sam - [:knows] -> Megan
    Megan - [:knows] -> David
  59. the query

    START me=node(1)
    MATCH me-[:knows]->()-[:knows]->fof
    RETURN fof
  60. START me=node(1)
    MATCH me-[:knows*2..]->fof
    WHERE fof.name =~ 'Da.*'
    RETURN fof
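
What these two queries ask for can be sketched as a plain Ruby traversal over the example graph. This is not how Neo4j executes Cypher; the `KNOWS` adjacency list and the `reachable` helper are hypothetical names introduced for the illustration, and the 4-hop cap stands in for the unbounded `*2..` because this small graph has no longer paths.

```ruby
# The example graph as an adjacency list: name => names reached via :knows.
KNOWS = {
  'me'    => %w[Steve Sam],
  'Steve' => %w[David],
  'Sam'   => %w[Megan],
  'Megan' => %w[David],
  'David' => []
}.freeze

# End nodes of all :knows paths that are between min_hops and max_hops long.
# One entry is returned per path, so a node reachable twice appears twice.
def reachable(graph, start, min_hops, max_hops)
  results = []
  frontier = [start]
  (1..max_hops).each do |depth|
    frontier = frontier.flat_map { |name| graph.fetch(name, []) }
    results.concat(frontier) if depth >= min_hops
  end
  results
end

# me-[:knows]->()-[:knows]->fof
friends_of_friends = reachable(KNOWS, 'me', 2, 2)

# me-[:knows*2..]->fof WHERE fof.name =~ 'Da.*'
# The regular expression is anchored to match the whole name.
da_matches = reachable(KNOWS, 'me', 2, 4).select { |name| name.match?(/\ADa.*\z/) }

puts friends_of_friends.inspect
puts da_matches.uniq.inspect
```

David is found twice by the second traversal, once through Steve and once through Sam and Megan, which is why the result is de-duplicated before printing.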
  61. a good place to try it out: http://console.neo4j.org/
  62. representing dates/times
    [Diagram: a tree from root (0) through year (2013, 2014), month (01, 05, 06) and day (24, 25, 26) levels; Event 1, Event 2 and Event 3 are attached to days by ‘happens’ relationships]
  63. find all events on a specific day

    START root=node(0)
    MATCH root-[:`2013`]-()-[:`05`]-()-[:`24`]-()-[:happens]-event
    RETURN event
  64. representing dates/times
    [Diagram: the same date tree, with ‘next’ relationships added between day nodes]
  65. find all events for a given range

    START root=node(0)
    MATCH root-[:`2013`]-()-[:`05`]-()-[:`24`]-start,
          root-[:`2013`]-()-[:`05`]-()-[:`26`]-end,
          start-[:next*0..]-middle-[:next*0..]-end,
          middle-[:happens]-event
    RETURN event
  66. representing dates/times
    [Diagram: the same date tree, with Event 1 shown as node (20)]
  67. does an event happen on a certain date?

    START event=node(20)
    MATCH event-[:`24`]-()-[:`05`]-()-[:`2013`]-()
    RETURN event
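
The date tree behind these three queries can be approximated in plain Ruby. This is a minimal sketch and not Shutl's implementation: `DayNode` and `DateTree` are hypothetical names, nested hashes stand in for the year, month and day relationships, and the assignment of events to days is assumed, since the diagram's layout is not preserved in the transcript.

```ruby
# A day node holds its events and a :next link to the following day.
DayNode = Struct.new(:label, :events, :next_day)

# root -> year -> month -> day, keyed by the same strings the Cypher
# queries use as relationship types ('2013', '05', '24').
class DateTree
  def initialize
    @root = {}
  end

  # Create (or reuse) the day node for a date and attach an event to it.
  def add_event(year, month, day, event)
    day_node(year, month, day).events << event
  end

  def day_node(year, month, day)
    days = ((@root[year] ||= {})[month] ||= {})
    days[day] ||= DayNode.new("#{year}-#{month}-#{day}", [], nil)
  end

  # Chain the existing day nodes in calendar order with :next links.
  # Zero-padded string keys sort in the same order as the dates.
  def link_days!
    days = @root.sort_by { |year, _| year }.flat_map do |_, months|
      months.sort_by { |month, _| month }.flat_map do |_, day_nodes|
        day_nodes.sort_by { |day, _| day }.map { |_, node| node }
      end
    end
    days.each_cons(2) { |earlier, later| earlier.next_day = later }
  end

  # Find all events on a specific day by walking year, month, day.
  def events_on(year, month, day)
    node = @root.dig(year, month, day)
    node ? node.events : []
  end

  # Walk :next links from the start day to the end day, collecting events.
  def events_between(from, to)
    node = @root.dig(*from)
    last = @root.dig(*to)
    events = []
    while node
      events.concat(node.events)
      break if node.equal?(last)
      node = node.next_day
    end
    events
  end

  # Does an event happen on a certain date?
  def happens_on?(event, year, month, day)
    events_on(year, month, day).include?(event)
  end
end

tree = DateTree.new
tree.add_event('2013', '05', '24', 'Event 1') # assumed placement
tree.add_event('2013', '05', '25', 'Event 2') # assumed placement
tree.add_event('2013', '05', '26', 'Event 3') # assumed placement
tree.add_event('2013', '06', '24', 'Event 4') # extra event outside the range
tree.link_days!

puts tree.events_on('2013', '05', '24').inspect
puts tree.events_between(%w[2013 05 24], %w[2013 05 26]).inspect
```

Each lookup touches only the handful of nodes on the path from the root to the day, regardless of how many other dates and events the tree holds.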
  68. testing and importing:
    • we are using rspec for all tests on the API and practice TDD/BDD
    • setting up ‘scenarios’ for an integration test was difficult and slow with existing tools
    • we decided to build our own DSL based on the geoff notation developed by Nigel Small, to allow for the setting up of scenarios and for the import of data from MySQL
  69. geoff: developed by Nigel Small (@technige, http://geoff.nigelsmall.net/)
    allows modelling of graphs in a human-readable form:

      (A) {"name": "Alice"}
      (B) {"name": "Bob"}
      (A)-[:KNOWS]->(B)

    and provides a Java interface to insert them into an existing graph
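
Text in this notation is simple enough to generate from Ruby data. This is a minimal sketch that covers only the two line shapes shown on the slide; it is not the geoff or geoff-importer gem's API, and the `to_geoff` helper and its argument layout are hypothetical.

```ruby
require 'json'

# Render nodes and relationships as text in the notation shown on the slide.
#   nodes:         { 'A' => { 'name' => 'Alice' } }
#   relationships: [['A', 'KNOWS', 'B']]
def to_geoff(nodes, relationships)
  node_lines = nodes.map do |id, props|
    pairs = props.map { |key, value| "#{key.to_s.to_json}: #{value.to_json}" }
    "(#{id}) {#{pairs.join(', ')}}"
  end
  rel_lines = relationships.map do |from, type, to|
    "(#{from})-[:#{type}]->(#{to})"
  end
  (node_lines + rel_lines).join("\n")
end

nodes = {
  'A' => { 'name' => 'Alice' },
  'B' => { 'name' => 'Bob' }
}
relationships = [%w[A KNOWS B]]

geoff_text = to_geoff(nodes, relationships)
puts geoff_text
```

Generating the text from plain hashes keeps test scenarios readable and lets the same data feed either an importer or an assertion.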
  70. geoff-importer gem (https://github.com/shutl/geoff-importer)
    • imports any geoff file into a neo4j db
    • it is open source
  71. geoff gem (https://github.com/shutl/geoff)
    • provides a DSL for creating a graph and inserting it into the db
    • it is open source
    • it works together with FactoryGirl (https://github.com/thoughtbot/factory_girl)
    • it supports only the graph structure of the neo4j gem at the moment
    • we haven’t solved all the issues with event listeners yet
  72. geoff gem (https://github.com/shutl/geoff)

    Geoff(Company, Person) do
      company 'Acme' do
        address "13 Something Road"
        outgoing :employees do
          person 'Geoff'
          person 'Nigel' do
            name 'Nigel Small'
          end
        end
      end

      company 'Github' do
        outgoing :customers do
          person 'Tom'
          person 'Dick'
          person 'Harry'
        end
      end

      person 'Harry' do
        incoming :customers do
          company 'NeoTech'
        end
      end
    end
  73. [Diagram: the resulting graph. Labels shown: root node; :company and :person; companies Acme (13 Something Road), NeoTech and GitHub; people Geoff, Nigel Small, Tom, Dick and Harry; relationships :all, :employees and :customers]
  74. QUESTIONS? Volker Pacher, volker@shutl.com, www.shutl.com