CloudEast: How Shutl uses Neo4j to deliver even faster

Tuesday, 28 May 13

How Neo4j helps Shutl to deliver even faster...

Volker Pacher
senior developer @shutl
@vpacher
http://github.com/vpacher


• SaaS platform
• we provide an API for carriers and merchants
• shutl.it C2C platform
• customers can choose between a delivery either:
  within 90 minutes of purchase
  or a 1 hour window of their choice (same day or any day)
• fastest delivery to date 15:00 min
• SOA with services built using JRuby, Sinatra, MongoDB and Neo4j


Problems?

http://xkcd.com/287/

problems with our previous attempt (v1):

• exponential growth of joins in MySQL with added features
• code base too complex and unmaintainable
• API response time growing too large the more data was added
• our fastest delivery was quicker than our slowest query!

The case for graph databases:

• relationships are stored explicitly (an RDBMS has no first-class relationships, only foreign keys and joins)
• domain modelling is simplified because adding new ‘subgraphs’ doesn’t affect the existing structure and queries (additive model)
• whiteboard friendly
• schema-less
• db performance remains relatively constant as data is added because a query is localized to its portion of the graph: roughly O(1) for the same query
• traversals of relationships are easy and very fast

What is a graph anyway?

a collection of vertices (nodes) connected by edges (relationships)

[Diagram: Node 1, Node 2, Node 3, Node 4]

a short history

Leonhard Euler, the seven bridges of Königsberg (1735)

directed graph

each relationship has a direction, i.e. one start node and one end node

[Diagram: Node 1, Node 2, Node 3, Node 4 connected by directed relationships]

property graph

• nodes contain properties (key, value)
• relationships have a type and are always directed
• relationships can contain properties too

[Diagram: nodes name: Volker, name: Sam, name: Megan, name: Paul; relationships :friends, :knows (since: 2005), :works_for]
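The three rules of a property graph can be sketched in a few lines of plain Ruby. This is only an illustration: `Node`, `Relationship` and `PropertyGraph` are made-up names, not the neo4j gem API, and the way the people are wired together is an assumption, since the diagram does not survive in the transcript.

```ruby
# Minimal in-memory property graph (illustrative only, not the neo4j gem API).
Node = Struct.new(:id, :properties)
Relationship = Struct.new(:type, :start_node, :end_node, :properties)

class PropertyGraph
  attr_reader :nodes, :relationships

  def initialize
    @nodes = []
    @relationships = []
  end

  # Nodes carry key/value properties.
  def add_node(properties = {})
    node = Node.new(@nodes.size + 1, properties)
    @nodes << node
    node
  end

  # Every relationship has a type, a direction (start -> end)
  # and optional properties of its own.
  def relate(start_node, type, end_node, properties = {})
    rel = Relationship.new(type, start_node, end_node, properties)
    @relationships << rel
    rel
  end

  # Follow outgoing relationships of one type from a node.
  def outgoing(node, type)
    @relationships
      .select { |r| r.start_node == node && r.type == type }
      .map(&:end_node)
  end
end

graph  = PropertyGraph.new
volker = graph.add_node(name: 'Volker')
sam    = graph.add_node(name: 'Sam')
megan  = graph.add_node(name: 'Megan')

# Assumed wiring: the original diagram is not recoverable from the transcript.
graph.relate(volker, :friends, sam)
graph.relate(volker, :knows, megan, since: 2005)

puts graph.outgoing(volker, :knows).map { |n| n.properties[:name] }.inspect
```

A real graph database stores the relationships next to the nodes, so following one does not need the linear scan used in `outgoing` here.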

a graph is its own index (constant query performance)

the case for Neo4j

• we can run it embedded in the same JVM
• we can use JRuby as we know Ruby very well already
• lots of good Ruby libraries are available, we chose the neo4j gem by Andreas Ronge (https://github.com/andreasronge/neo4j)
• it speaks cypher
• the guys from neotech are awesome

some graph dbs available:

• neo4j (jvm)
• flockdb (jvm)
• DEX (c++)
• OrientDB (jvm)
• Sones GraphDB (c#)

embedded vs. standalone

embedded
  pros:
  • better performance
  • transaction support
  • neo4j gem is available
  • we can use cypher and traversal
  cons:
  • only the code running the db has access to the db

standalone
  pros:
  • access via REST API and cypher
  • language independent and code doesn’t need to run on JVM
  cons:
  • not as performant
  • only works with cypher
  • transaction is on a per query basis
  • need to write model wrappers for ourselves

gotchas and other stuff to consider:

• testing proved to be difficult and we had to write our own tools
• migrations of schemaless dbs are more difficult to stay on top of and require special solutions in the case of graph dbs
• seeding an embedded database is hard
• graph db partitioning is almost impossible and the whole graph needs to be in memory
• encoding dates and times that are stored in UTC and work across timezones is non-trivial
• nested data structures (hashes and arrays) can’t be stored and need to be converted to JSON
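The last two gotchas can be illustrated with a small plain-Ruby sketch. `encode_property` and `decode_time` are hypothetical helpers invented for this example, not part of the neo4j gem; they assume times are stored as epoch seconds and nested structures as JSON strings, which is one possible convention rather than the one Shutl necessarily used.

```ruby
require 'json'
require 'time'

# Hypothetical helper: flatten a value into something a node property can hold.
def encode_property(value)
  case value
  when Time        then value.to_i             # epoch seconds are timezone independent
  when Hash, Array then JSON.generate(value)   # nested structures become a JSON string
  else value
  end
end

# Hypothetical helper: rebuild a Time for display in a given UTC offset.
def decode_time(epoch_seconds, utc_offset = '+00:00')
  Time.at(epoch_seconds).getlocal(utc_offset)
end

booked_at = Time.parse('2013-05-28 15:00:00 +01:00')

props = {
  booked_at: encode_property(booked_at),
  parcel:    encode_property({ 'weight_kg' => 2, 'tags' => ['fragile'] })
}

puts decode_time(props[:booked_at]).hour            # hour in UTC
puts decode_time(props[:booked_at], '+01:00').hour  # hour in the booking's own offset
puts JSON.parse(props[:parcel]).inspect
```

Storing a single integer keeps comparisons and range queries simple; the offset is applied only when the value is read back for display.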

Querying the graph: Cypher

• declarative query language specific to neo4j
• easy to learn and intuitive
• enables the user to specify patterns to query for (something that looks like ‘this’)
• inspired partly by SQL (WHERE and ORDER BY) and SPARQL (pattern matching)
• focuses on what to query for and not how to query for it
• the switch from a MySQL world is made easier by the use of cypher instead of having to learn a traversal framework straight away

cypher clauses

• START: Starting points in the graph, obtained via index lookups or by element IDs.
• MATCH: The graph pattern to match, bound to the starting points in START.
• WHERE: Filtering criteria.
• RETURN: What to return.
• CREATE: Creates nodes and relationships.
• DELETE: Removes nodes, relationships and properties.
• SET: Set values to properties.
• FOREACH: Performs updating actions once per element in a list.
• WITH: Divides a query into multiple, distinct parts.

an example graph

Node 1: me
Node 2: Steve
Node 3: Sam
Node 4: David
Node 5: Megan

me - [:knows] -> Steve - [:knows] -> David
me - [:knows] -> Sam - [:knows] -> Megan
Megan - [:knows] -> David

the query

START me=node(1)
MATCH me-[:knows]->()-[:knows]->fof
RETURN fof

START me=node(1)
MATCH me-[:knows*2..]->fof
WHERE fof.name =~ 'Da.*'
RETURN fof
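To see what the two queries select on the example graph, here is a plain-Ruby sketch that walks the same :knows relationships by hand. It is not how Neo4j evaluates Cypher; the `KNOWS` table and both method names are made up for illustration.

```ruby
# Adjacency list for the example graph: each key :knows the listed people.
KNOWS = {
  'me'    => %w[Steve Sam],
  'Steve' => %w[David],
  'Sam'   => %w[Megan],
  'Megan' => %w[David],
  'David' => []
}.freeze

# me-[:knows]->()-[:knows]->fof : exactly two hops away.
def friends_of_friends(start)
  KNOWS.fetch(start).flat_map { |friend| KNOWS.fetch(friend) }
end

# me-[:knows*2..]->fof : the end point of every path of two or more hops.
# The example graph has no cycles, so the plain recursion terminates.
def reachable_from(start, min_hops, depth = 0)
  KNOWS.fetch(start).flat_map do |nxt|
    hits = depth + 1 >= min_hops ? [nxt] : []
    hits + reachable_from(nxt, min_hops, depth + 1)
  end
end

p friends_of_friends('me')

# WHERE fof.name =~ 'Da.*' : the regular expression must match the whole name.
p reachable_from('me', 2).grep(/\ADa.*\z/)
```

The second call lists David once per path that reaches him (via Steve, and via Sam and Megan), which mirrors a pattern match producing one result per matched path; call `uniq` on the result for distinct people.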

a good place to try it out: http://console.neo4j.org/

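representing dates/times

[Diagram: root (0) linked to year nodes (Year: 2013, Year: 2014), each year to month nodes (Month: 05, Month: 06, Month: 01), each month to day nodes (Day: 24, Day: 25, Day: 26); the relationships are named after the value they lead to (2013, 05, 24, ...); Event 1, Event 2 and Event 3 are attached to day nodes by "happens" relationships]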

find all events on a specific day

START root=node(0)
MATCH root-[:`2013`]-()-[:`05`]-()-[:`24`]-()-[:happens]-event
RETURN event

representing dates/times

[Diagram: the same date tree, with consecutive day nodes (Day: 24, Day: 25, Day: 26) additionally linked by "next" relationships]

find all events for a given range

START root=node(0)
MATCH root-[:`2013`]-()-[:`05`]-()-[:`24`]-start,
      root-[:`2013`]-()-[:`05`]-()-[:`26`]-end,
      start-[:next*0..]-middle-[:next*0..]-end,
      middle-[:happens]-event
RETURN event

representing dates/times

[Diagram: the same date tree with "next" relationships between days; Event 1 is node 20]

does an event happen on a certain date?

START event=node(20)
MATCH event-[:`24`]-()-[:`05`]-()-[:`2013`]-()
RETURN event
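The idea behind the date tree can be sketched without a database. In this plain-Ruby stand-in, nested hashes play the role of the year, month and day nodes; `TimeTree` and its methods are hypothetical names, and the day each event is attached to is an assumption because the diagram does not survive in the transcript. The range lookup steps through a Ruby Date range instead of following :next relationships between day nodes.

```ruby
require 'date'

# Hypothetical stand-in for the date tree: nested hashes play the role of the
# year, month and day nodes; each day holds the events that "happen" on it.
class TimeTree
  def initialize
    @years = {}
  end

  def add_event(date, event)
    months = (@years[date.year] ||= {})
    days   = (months[date.month] ||= {})
    (days[date.day] ||= []) << event
  end

  # Walk root -> year -> month -> day, then collect the day's events.
  def events_on(date)
    @years.dig(date.year, date.month, date.day) || []
  end

  # The slides follow :next relationships between day nodes; here a Date
  # range supplies the "next day" step instead.
  def events_between(first, last)
    (first..last).flat_map { |date| events_on(date) }
  end

  def happens_on?(event, date)
    events_on(date).include?(event)
  end
end

tree = TimeTree.new
tree.add_event(Date.new(2013, 5, 24), 'Event 1') # placement of events is assumed
tree.add_event(Date.new(2013, 5, 25), 'Event 2')
tree.add_event(Date.new(2013, 5, 26), 'Event 3')

p tree.events_on(Date.new(2013, 5, 24))
p tree.events_between(Date.new(2013, 5, 24), Date.new(2013, 5, 26))
p tree.happens_on?('Event 1', Date.new(2013, 5, 24))
```

Each lookup touches only the handful of nodes on the path to one day, which is why this layout stays fast regardless of how many events the graph holds overall.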

testing and importing:

• we are using rspec for all tests on the API and practice TDD/BDD
• setting up ‘scenarios’ for an integration test was difficult and slow with existing tools
• we decided to build our own DSL based on the geoff notation developed by Nigel Small to allow for the setting up of scenarios and for the import of data from MySQL

geoff:

developed by Nigel Small (@technige, http://geoff.nigelsmall.net/)

allows modelling of graphs in a human readable form

(A) {"name": "Alice"}
(B) {"name": "Bob"}
(A)-[:KNOWS]->(B)

and provides a java interface to insert them into an existing graph
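To show how little structure the notation needs, here is a plain-Ruby sketch that reads the three example lines above into nodes and relationships. It handles only the two line shapes shown on the slide and is not the real geoff parser or the geoff-importer gem; `parse_geoff` and both regular expressions are made up for this illustration.

```ruby
require 'json'

# Only the two line shapes from the slide are supported (assumption):
#   (A) {"name": "Alice"}      a node with optional JSON properties
#   (A)-[:KNOWS]->(B)          a directed, typed relationship
NODE_LINE = /\A\((\w+)\)\s*(\{.*\})?\z/
REL_LINE  = /\A\((\w+)\)-\[:(\w+)\]->\((\w+)\)\s*(\{.*\})?\z/

def parse_geoff(text)
  nodes = {}
  relationships = []

  text.each_line do |raw|
    line = raw.strip
    next if line.empty?

    if (m = REL_LINE.match(line))
      props = m[4] ? JSON.parse(m[4]) : {}
      relationships << { from: m[1], type: m[2], to: m[3], properties: props }
    elsif (m = NODE_LINE.match(line))
      nodes[m[1]] = m[2] ? JSON.parse(m[2]) : {}
    else
      raise ArgumentError, "unsupported line: #{line}"
    end
  end

  { nodes: nodes, relationships: relationships }
end

graph = parse_geoff(<<~GEOFF)
  (A) {"name": "Alice"}
  (B) {"name": "Bob"}
  (A)-[:KNOWS]->(B)
GEOFF

p graph[:nodes]
p graph[:relationships]
```

The node names in parentheses (A, B) are only handles inside the file; the properties are what would end up on the stored nodes.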

geoff-importer gem (https://github.com/shutl/geoff-importer)

• imports any geoff file into a neo4j db
• it is open source

geoff gem (https://github.com/shutl/geoff)

• provides a DSL for creating a graph and inserting it into the db
• it is open source
• it works together with FactoryGirl (https://github.com/thoughtbot/factory_girl)
• it supports only the graph structure of the neo4j gem at the moment
• we haven’t solved all the issues with event listeners yet

geoff gem (https://github.com/shutl/geoff)

Geoff(Company, Person) do
  company 'Acme' do
    address "13 Something Road"
    outgoing :employees do
      person 'Geoff'
      person 'Nigel' do
        name 'Nigel Small'
      end
    end
  end

  company 'Github' do
    outgoing :customers do
      person 'Tom'
      person 'Dick'
      person 'Harry'
    end
  end

  person 'Harry' do
    incoming :customers do
      company 'NeoTech'
    end
  end
end

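[Diagram: the graph created by the DSL above. The root node links to a :company node and a :person node; these link via :all relationships to the companies (Acme with address 13 Something Road, GitHub, NeoTech) and the people (Geoff, Nigel Small, Tom, Dick, Harry); companies link to people via :employees and :customers relationships]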

QUESTIONS?

Volker Pacher
[email protected]
www.shutl.com
