CloudEast: How Shutl uses Neo4j to deliver even faster

An introduction to graph databases and Cypher

Volker Pacher

May 24, 2013

Transcript

  8. • SaaS platform
    • we provide an API for carriers and merchants
    • shutl.it C2C platform
    • customers can choose between a delivery either within 90 minutes of purchase or in a 1-hour window of their choice (same day or any day)
    • fastest delivery to date: 15:00 min
    • SOA with services built using JRuby, Sinatra, MongoDB and Neo4j
    Tuesday, 28 May 13
  12. problems with our previous attempt (v1):
    • exponential growth of joins in MySQL with added features
    • code base too complex and unmaintainable
    • API response times grew too large as more data was added
    • our fastest delivery was quicker than our slowest query!
  18. The case for graph databases:
    • relationships are stored explicitly (relational databases have no first-class relationships; they are reconstructed at query time with joins)
    • domain modelling is simplified, because adding new 'subgraphs' doesn't affect the existing structure and queries (additive model)
    • whiteboard friendly
    • schema-less
    • database performance remains relatively constant as the data grows, because queries are localized to their portion of the graph: roughly O(1) in the total size of the graph for the same query
    • traversals of relationships are easy and very fast
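To make the "queries are localized" point concrete, here is a minimal in-memory sketch in plain Ruby (no Neo4j involved; `GraphNode` and `two_hop` are hypothetical names). Each node keeps direct references to its outgoing neighbours, so a two-hop traversal from a fixed start node visits the same number of nodes no matter how many unrelated nodes are added to the graph.

```ruby
# Hypothetical sketch (plain Ruby, no Neo4j): every node keeps direct
# references to its outgoing neighbours, so a traversal only touches
# the nodes it actually walks through.
class GraphNode
  attr_reader :name, :out

  def initialize(name)
    @name = name
    @out = []
  end

  # Adds a directed edge self -> other and returns other for chaining.
  def connect(other)
    @out << other
    other
  end
end

# Returns the names of nodes exactly two hops away, plus how many nodes were visited.
def two_hop(start)
  visited = 0
  names = []
  start.out.each do |first|
    visited += 1
    first.out.each do |second|
      visited += 1
      names << second.name
    end
  end
  [names, visited]
end

me = GraphNode.new('me')
me.connect(GraphNode.new('steve')).connect(GraphNode.new('david'))
before = two_hop(me)

# Grow the graph with 10,000 unrelated nodes; the traversal from `me` is unaffected.
others = Array.new(10_000) { |i| GraphNode.new("other#{i}") }
others.each_cons(2) { |a, b| a.connect(b) }
after = two_hop(me)

p before # prints [["david"], 2]
p after  # prints [["david"], 2]
```

A relational equivalent would join through tables whose size grows with the whole dataset; here the cost depends only on the start node's neighbourhood.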
  19. What is a graph anyway?
    a collection of vertices (nodes) connected by edges (relationships)
    [diagram: four nodes, Node 1 to Node 4, connected by edges]
  20. directed graph
    each relationship has a direction, i.e. one start node and one end node
    [diagram: Node 1 to Node 4 connected by directed edges]
  21. property graph
    • nodes contain properties (key, value)
    • relationships have a type and are always directed
    • relationships can contain properties too
    [diagram: nodes with name: Volker, name: Sam, name: Megan and name: Paul, connected by :friends, :knows and :works_for relationships; one :knows relationship carries the property since: 2005]
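The property graph model above can be sketched with two plain Ruby structs (hypothetical names, not the neo4j gem's API): nodes carry a property hash, and relationships carry a type, exactly one start node, exactly one end node and their own properties. Which person is connected to whom is illustrative only, since the diagram doesn't make the exact wiring clear.

```ruby
# Hypothetical property-graph sketch in plain Ruby (not the neo4j gem API).
PropNode = Struct.new(:id, :props)
# Relationships have a type, are always directed (one start node, one end node)
# and can carry their own properties.
PropRel = Struct.new(:type, :start_node, :end_node, :props)

class PropertyGraph
  attr_reader :nodes, :rels

  def initialize
    @nodes = []
    @rels = []
  end

  def add_node(props = {})
    node = PropNode.new(@nodes.size, props)
    @nodes << node
    node
  end

  def relate(from, type, to, props = {})
    rel = PropRel.new(type, from, to, props)
    @rels << rel
    rel
  end

  # Outgoing relationships of a node, optionally restricted to one type.
  def outgoing(node, type = nil)
    @rels.select { |r| r.start_node.equal?(node) && (type.nil? || r.type == type) }
  end
end

g = PropertyGraph.new
volker = g.add_node(name: 'Volker')
sam = g.add_node(name: 'Sam')
megan = g.add_node(name: 'Megan')
g.relate(volker, :friends, sam)
g.relate(volker, :knows, megan, since: 2005)

rel = g.outgoing(volker, :knows).first
puts "#{rel.start_node.props[:name]} knows #{rel.end_node.props[:name]} since #{rel.props[:since]}"
# prints "Volker knows Megan since 2005"
```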
  26. the case for Neo4j
    • we can run it embedded in the same JVM
    • we can use JRuby, as we already know Ruby very well
    • lots of good Ruby libraries are available; we chose the neo4j gem by Andreas Ronge (https://github.com/andreasronge/neo4j)
    • it speaks Cypher
    • the guys from Neo Technology are awesome
  27. some graph dbs available:
    • Neo4j (JVM)
    • FlockDB (JVM)
    • DEX (C++)
    • OrientDB (JVM)
    • Sones GraphDB (C#)
  28. embedded vs. standalone
    embedded
      pros: better performance; transaction support; the neo4j gem is available; we can use both Cypher and traversals
      cons: only the code running the db has access to the db
    standalone
      pros: access via the REST API and Cypher; language independent, so the code doesn't need to run on the JVM
      cons: not as performant; only works with Cypher; transactions are on a per-query basis; we need to write model wrappers ourselves
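With a standalone server, the application talks to Neo4j over HTTP instead. A sketch using only Ruby's standard library: the endpoint `/db/data/cypher` on the default port 7474 and the `{"query": ..., "params": ...}` body are assumptions based on the Neo4j 1.x REST API, so check the documentation for your server version. The request is built but not sent, since sending needs a running server.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Assumption: Neo4j 1.x-style REST endpoint for Cypher on the default port.
CYPHER_URI = URI('http://localhost:7474/db/data/cypher')

# Builds (but does not send) a POST request carrying a parameterised Cypher query.
def cypher_request(query, params = {})
  req = Net::HTTP::Post.new(CYPHER_URI)
  req['Content-Type'] = 'application/json'
  req['Accept'] = 'application/json'
  req.body = JSON.generate('query' => query, 'params' => params)
  req
end

req = cypher_request('START n=node({id}) RETURN n', 'id' => 0)
puts req.path # prints /db/data/cypher

# Sending it requires a running server, e.g.:
#   res = Net::HTTP.start(CYPHER_URI.host, CYPHER_URI.port) { |http| http.request(req) }
#   JSON.parse(res.body)
```

This also shows the trade-off from the slide: every query is a separate HTTP round trip (and, on that API, its own transaction), and results come back as JSON that we have to map onto our own model wrappers.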
  34. gotchas and other stuff to consider:
    • testing proved to be difficult and we had to write our own tools
    • migrations of schema-less dbs are more difficult to stay on top of, and in the case of graph dbs they require special solutions
    • seeding an embedded database is hard
    • partitioning a graph db is almost impossible, and the whole graph needs to be in memory
    • encoding dates and times so that they are stored in UTC and work across time zones is non-trivial
    • nested data structures (hashes and arrays) can't be stored as properties and need to be converted to JSON
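The last two gotchas can be handled with a small conversion layer applied before values are written as properties. A minimal sketch, assuming properties may only hold primitives and flat arrays of primitives; `to_property` and `from_time_property` are hypothetical helpers, and storing times as integer seconds since the Unix epoch is one possible UTC encoding, not necessarily the one Shutl used.

```ruby
require 'json'
require 'time'

# Assumption: property values are limited to primitives and flat arrays of them.
PRIMITIVE_TYPES = [String, Integer, Float, TrueClass, FalseClass].freeze

def primitive?(value)
  PRIMITIVE_TYPES.any? { |type| value.is_a?(type) }
end

# Hypothetical helper: converts a Ruby value into a storable property value.
def to_property(value)
  case value
  when Time
    value.to_i # seconds since the Unix epoch, independent of the time zone
  when Hash
    JSON.generate(value) # nested hashes are stored as a JSON string
  when Array
    # flat arrays of primitives are stored as-is; nested ones become JSON
    value.all? { |v| primitive?(v) } ? value : JSON.generate(value)
  else
    value
  end
end

# Hypothetical helper: turns a stored timestamp back into a Time at the given UTC offset.
def from_time_property(seconds, utc_offset)
  Time.at(seconds).getlocal(utc_offset)
end

slot_start = Time.new(2013, 5, 24, 14, 0, 0, '+01:00') # 14:00 at UTC+1
stored = to_property(slot_start)
puts stored                                       # prints 1369400400
puts from_time_property(stored, '+02:00').iso8601 # prints 2013-05-24T15:00:00+02:00
puts to_property({ 'window' => [14, 15] })        # prints {"window":[14,15]}
```

Because the stored value is a plain integer, range comparisons on it are also straightforward regardless of where the reading code runs.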
  39. Querying the graph: Cypher
    • declarative query language specific to Neo4j
    • easy to learn and intuitive
    • lets the user specify patterns to query for (something that looks like 'this')
    • inspired partly by SQL (WHERE and ORDER BY) and SPARQL (pattern matching)
    • focuses on what to query for, not how to query for it
    • the switch from a MySQL world is made easier by using Cypher instead of having to learn a traversal framework straight away
  40. Cypher clauses
    • START: starting points in the graph, obtained via index lookups or by element IDs
    • MATCH: the graph pattern to match, bound to the starting points in START
    • WHERE: filtering criteria
    • RETURN: what to return
    • CREATE: creates nodes and relationships
    • DELETE: removes nodes, relationships and properties
    • SET: sets values of properties
    • FOREACH: performs updating actions once per element in a list
    • WITH: divides a query into multiple, distinct parts
  41. an example graph
    [diagram: Node 1 (me), Node 2 (Steve), Node 3 (Sam), Node 4 (David) and Node 5 (Megan), connected by :knows relationships]
    me-[:knows]->Steve-[:knows]->David
    me-[:knows]->Sam-[:knows]->Megan
    Megan-[:knows]->David
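A typical question for this graph is "who do the people I know know?", which in Cypher is a two-hop pattern such as `me-[:knows]->()-[:knows]->fof`. The same pattern can be sketched in plain Ruby over the example graph; the `KNOWS` adjacency hash and the `friends_of_friends` helper are hypothetical, for illustration only.

```ruby
# The example graph as an adjacency hash of outgoing :knows relationships.
KNOWS = {
  'me'    => %w[Steve Sam],
  'Steve' => %w[David],
  'Sam'   => %w[Megan],
  'Megan' => %w[David],
  'David' => []
}.freeze

# Follows exactly two outgoing :knows hops from the start node,
# i.e. the pattern start-[:knows]->()-[:knows]->fof.
def friends_of_friends(graph, start)
  graph.fetch(start, [])
       .flat_map { |friend| graph.fetch(friend, []) }
       .uniq
end

p friends_of_friends(KNOWS, 'me')  # prints ["David", "Megan"]
p friends_of_friends(KNOWS, 'Sam') # prints ["David"]
```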
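  42. representing dates/times
    [diagram: a date tree. The root node (0) links to year nodes (2013, 2014), year nodes link to month nodes (e.g. 05, 06, 01), and month nodes link to day nodes (24, 25, 26); each relationship is typed with the year, month or day value, and Event 1 to Event 3 are attached to day nodes via :happens relationships]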
  43. find all events on a specific day
    START root=node(0)
    MATCH root-[:`2013`]-()-[:`05`]-()-[:`24`]-()-[:happens]-event
    RETURN event
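The date tree and this lookup can be mimicked in plain Ruby to show why the query is cheap: each level is reached by following a single relationship named after the year, month or day. A sketch with hypothetical names (`TreeNode`, `day_node!`, `events_on`), not Shutl's code:

```ruby
# Hypothetical time-tree sketch: each node maps relationship names
# ('2013', '05', '24', ...) to child nodes; day nodes collect events.
class TreeNode
  attr_reader :children, :events

  def initialize
    @children = {}
    @events = []
  end

  # Follows the named relationship, creating the child node if it is missing.
  def child!(rel_name)
    @children[rel_name] ||= TreeNode.new
  end
end

# Finds or creates root-[:year]-()-[:month]-()-[:day]-day and returns the day node.
def day_node!(root, year, month, day)
  root.child!(year).child!(month).child!(day)
end

# Walks the same path read-only and returns the events attached to that day.
def events_on(root, year, month, day)
  node = [year, month, day].reduce(root) { |current, rel| current && current.children[rel] }
  node ? node.events : []
end

root = TreeNode.new
day_node!(root, '2013', '05', '24').events << 'Event 1'
day_node!(root, '2013', '05', '25').events << 'Event 2'

p events_on(root, '2013', '05', '24') # prints ["Event 1"]
p events_on(root, '2014', '01', '01') # prints []
```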
  44. representing dates/times
    [diagram: the same date tree, with the day nodes 24, 25 and 26 additionally linked in order by :next relationships]
  45. find all events for a given range
    START root=node(0)
    MATCH root-[:`2013`]-()-[:`05`]-month,
          month-[:`24`]-start,
          month-[:`26`]-end,
          start-[:next*0..]-middle-[:next*0..]-end,
          middle-[:happens]-event
    RETURN event
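The range query depends on day nodes being chained by :next relationships, so gathering the events between two days amounts to walking that chain. A plain-Ruby sketch of the walk, assuming the chain runs forward in date order; the `Day` struct, `chain` and `events_between` are hypothetical names.

```ruby
# Hypothetical sketch: day nodes linked in date order by :next relationships.
Day = Struct.new(:label, :events, :next_day)

# Builds a chain of day nodes (a-[:next]->b for consecutive days) and returns them in order.
def chain(labels)
  days = labels.map { |label| Day.new(label, [], nil) }
  days.each_cons(2) { |a, b| a.next_day = b }
  days
end

# Walks start-[:next*0..]->...->end and collects the events of every day on the way.
# If end_day is never reached, the walk stops at the end of the chain.
def events_between(start_day, end_day)
  events = []
  day = start_day
  while day
    events.concat(day.events)
    break if day.equal?(end_day)
    day = day.next_day
  end
  events
end

d24, d25, d26, d27 = chain(%w[2013-05-24 2013-05-25 2013-05-26 2013-05-27])
d24.events << 'Event 1'
d25.events << 'Event 2'
d26.events << 'Event 3'
d27.events << 'Event 4'

p events_between(d24, d26) # prints ["Event 1", "Event 2", "Event 3"]
```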
  46. representing dates/times
    [diagram: the same date tree with :next relationships, where Event 1 is node (20)]
  47. does an event happen on a certain date?
    START event=node(20)
    MATCH event-[:happens]-()-[:`24`]-()-[:`05`]-()-[:`2013`]-()
    RETURN event
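Checking whether a given event happens on a date works in the opposite direction: start from the event and walk up through the day, month and year relationships. A minimal sketch, assuming each node keeps a reference to its parent and the name of the relationship it hangs from, and that the event is attached to a single day (hypothetical `TimeNode` and `happens_on?`):

```ruby
# Hypothetical sketch: every node remembers its parent and the name of the
# relationship that connects them ('2013', '05', '24' or :happens).
TimeNode = Struct.new(:rel, :parent)

# Walks event-[:happens]-day-[:day]-month-[:month]-year-[:year]-root upwards.
def happens_on?(event, year, month, day)
  return false unless event.rel == :happens

  path = []
  node = event.parent
  while node && node.parent
    path << node.rel
    node = node.parent
  end
  path == [day, month, year]
end

root = TimeNode.new(nil, nil)
year = TimeNode.new('2013', root)
month = TimeNode.new('05', year)
day = TimeNode.new('24', month)
event = TimeNode.new(:happens, day)

p happens_on?(event, '2013', '05', '24') # prints true
p happens_on?(event, '2013', '05', '25') # prints false
```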
  48. testing and importing:
    • we use RSpec for all tests on the API and practice TDD/BDD
    • setting up 'scenarios' for an integration test was difficult and slow with existing tools
    • we decided to build our own DSL, based on the Geoff notation developed by Nigel Small, for setting up scenarios and for importing data from MySQL
  49. geoff: developed by Nigel Small (@technige, http://geoff.nigelsmall.net/)
    • allows modelling of graphs in a human-readable form:
      (A) {"name": "Alice"}
      (B) {"name": "Bob"}
      (A)-[:KNOWS]->(B)
    • provides a Java interface to insert them into an existing graph
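To show how little is needed to read this notation, here is a sketch that parses only the two line shapes from the example above: a node with a JSON property map, and a relationship between two named nodes. `parse_geoff` is a hypothetical subset parser for illustration, not Nigel Small's Geoff implementation, and it rejects anything else in the notation.

```ruby
require 'json'

NODE_LINE = /\A\((\w+)\)\s*(\{.*\})?\s*\z/
REL_LINE = /\A\((\w+)\)-\[:(\w+)\]->\((\w+)\)\s*\z/

# Parses a tiny subset of Geoff: "(A) {json}" node lines and "(A)-[:TYPE]->(B)" lines.
def parse_geoff(text)
  nodes = {}
  rels = []
  text.each_line do |line|
    line = line.strip
    next if line.empty?

    if (m = REL_LINE.match(line))
      rels << { from: m[1], type: m[2], to: m[3] }
    elsif (m = NODE_LINE.match(line))
      nodes[m[1]] = m[2] ? JSON.parse(m[2]) : {}
    else
      raise ArgumentError, "unsupported Geoff line: #{line}"
    end
  end
  { nodes: nodes, rels: rels }
end

graph = parse_geoff(<<~GEOFF)
  (A) {"name": "Alice"}
  (B) {"name": "Bob"}
  (A)-[:KNOWS]->(B)
GEOFF

puts graph[:nodes]['A']['name'] # prints Alice
puts graph[:rels].first[:type]  # prints KNOWS
```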
  50. geoff-importer gem (https://github.com/shutl/geoff-importer)
    • imports any Geoff file into a Neo4j db
    • it is open source
  51. geoff gem (https://github.com/shutl/geoff)
    • provides a DSL for creating a graph and inserting it into the db
    • it is open source
    • it works together with FactoryGirl (https://github.com/thoughtbot/factory_girl)
    • at the moment it supports only the graph structure of the neo4j gem
    • we haven't solved all the issues with event listeners yet
  52. geoff gem (https://github.com/shutl/geoff)
    Geoff(Company, Person) do
      company 'Acme' do
        address "13 Something Road"
        outgoing :employees do
          person 'Geoff'
          person 'Nigel' do
            name 'Nigel Small'
          end
        end
      end
      company 'Github' do
        outgoing :customers do
          person 'Tom'
          person 'Dick'
          person 'Harry'
        end
      end
      person 'Harry' do
        incoming :customers do
          company 'NeoTech'
        end
      end
    end
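Block-based DSLs like this one are commonly built on `instance_eval`: each block is evaluated against a builder object whose methods (`company`, `person`, `outgoing`, ...) record nodes and relationships. The `GraphBuilder` below is a hypothetical, heavily simplified builder written only to illustrate that technique; it is not the geoff gem's implementation and ignores the `Company`/`Person` classes and the neo4j gem entirely.

```ruby
# Hypothetical builder illustrating the instance_eval technique behind
# block-based DSLs. It is NOT the geoff gem's implementation.
class GraphBuilder
  attr_reader :nodes, :rels

  def initialize
    @nodes = {}    # "kind:label" => properties hash
    @rels = []     # [from_key, rel_type, to_key]
    @current = nil # key of the node whose block is being evaluated
    @pending = nil # [direction, rel_type] set by outgoing/incoming
  end

  # Evaluates the block with self set to the builder.
  def build(&block)
    instance_eval(&block)
    self
  end

  def company(label, &block)
    node(:company, label, &block)
  end

  def person(label, &block)
    node(:person, label, &block)
  end

  def address(value)
    set_property(:address, value)
  end

  def name(value)
    set_property(:name, value)
  end

  def outgoing(type)
    with_pending([:out, type]) { yield }
  end

  def incoming(type)
    with_pending([:in, type]) { yield }
  end

  private

  # Registers a node, links it to the enclosing node if a direction is pending,
  # then evaluates its block with that node as the current one.
  def node(kind, label, &block)
    key = "#{kind}:#{label}"
    @nodes[key] ||= {}
    link_to(key)
    saved = [@current, @pending]
    @current, @pending = key, nil
    instance_eval(&block) if block
    @current, @pending = saved
    key
  end

  def link_to(key)
    return unless @current && @pending

    direction, type = @pending
    @rels << (direction == :out ? [@current, type, key] : [key, type, @current])
  end

  def set_property(key, value)
    @nodes.fetch(@current)[key] = value
  end

  def with_pending(pending)
    saved = @pending
    @pending = pending
    yield
  ensure
    @pending = saved
  end
end

g = GraphBuilder.new.build do
  company 'Acme' do
    address '13 Something Road'
    outgoing :employees do
      person 'Geoff'
      person 'Nigel' do
        name 'Nigel Small'
      end
    end
  end
  person 'Harry' do
    incoming :customers do
      company 'NeoTech'
    end
  end
end

p g.rels.length # prints 3
```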
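  53. [diagram: the resulting graph. The root node links to a :company node and a :person node, which link via :all relationships to the companies (Acme, with the address 13 Something Road; NeoTech; GitHub) and the people (Geoff, Nigel Small, Tom, Dick, Harry); :employees and :customers relationships connect the companies to the people]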