Upgrade to Pro — share decks privately, control downloads, hide ads and more …

DeCYPHERing Graph Databases

DeCYPHERing Graph Databases

What if you could design a database on paper, and translate that design directly into your application without having to make any changes? What if you could sit down and design that database with a product owner or business analyst, and they could understand that design without any training? That's the future that graph databases has to offer.

In this talk, Stuart will walk you through a real-life graph database that powers one of his client's sites. He'll show you how he built it in Neo4J, how to query it using CYPHER - Neo4J's query language and secret weapon - and how to hook it all up to a PHP web-app.

Presented at PHPEm on 5th April 2018.

Stuart Herbert

April 05, 2018
Tweet

More Decks by Stuart Herbert

Other Decks in Programming

Transcript

  1. Industry veteran: architect, engineer, leader, manager, mentor F/OSS contributor since

    1994 Talking and writing about PHP since 2004 Chief Software Archaeologist @GanbaroDigital About Stuart
  2. @GanbaroDigital ?? ?? What if we could use the same

    model for the business and the database?
  3. @GanbaroDigital ?? ?? What if we could use the same

    model for the business and the database?
  4. @GanbaroDigital Same Model • Same entities • ... with the

    same names • ... and the same terms for connections
  5. @GanbaroDigital I don’t mean use an ORM to pretend it’s

    the same. I mean the database itself uses the same model that the business does.
  6. @GanbaroDigital I don’t mean use an ORM to pretend it’s

    the same. I mean the database itself uses the same model that the business does.
  7. @GanbaroDigital In This Talk 1. 100 Years Of History 2.

    Querying Graphs With CYPHER 3. BOLTing On PHP
  8. @GanbaroDigital ChelseaFan12 Fanzone • Fixture list back to 1905 •

    Key events from each match • Squad data to match • Rich data on each squad member • + data on Chelsea opponents
  9. @GanbaroDigital Underlying Dataset • 420,000+ records • 3.2 million+ foreign

    keys • average of nearly 8 foreign keys per record
  10. @GanbaroDigital How To Read CYPHER • Anything in brackets is

    a record
 (a node in graph terms) • Anything in square brackets is a foreign key
 (a relationship in graph terms) • Relationships can have direction
  11. @GanbaroDigital How To Read CYPHER • Anything in brackets is

    a record
 (a node in graph terms) • Anything in square brackets is a foreign key
 (a relationship in graph terms) • Relationships can have direction
  12. @GanbaroDigital How To Read CYPHER • Anything in brackets is

    a record
 (a node in graph terms) • Anything in square brackets is a foreign key
 (a relationship in graph terms) • Relationships can have direction
  13. @GanbaroDigital CYPHER MATCH • ‘a’ and ‘b’ are named results

    • if you want to use it later in your query,
 give it a name • think of labels as record types / table names
  14. @GanbaroDigital MATCH (a:label)-[r:label]->(b:label)<-[r2:label]-(c:label)
 WHERE a.field = value AND c.field =

    value WITH b, c MATCH (c)-[r3:label]->(d:label)<-[r4:label]-(b) RETURN c, d
  15. @GanbaroDigital MATCH (a:label)-[r:label]->(b:label)<-[r2:label]-(c:label)
 WHERE a.field = value AND c.field =

    value WITH b, c MATCH (c)-[r3:label]->(d:label)<-[r4:label]-(b) RETURN c, d
  16. @GanbaroDigital MATCH (a:label)-[r:label]->(b:label)<-[r2:label]-(c:label)
 WHERE a.field = value AND c.field =

    value WITH b, c MATCH (c)-[r3:label]->(d:label)<-[r4:label]-(b) RETURN c, d
  17. @GanbaroDigital MATCH (a:label)-[r:label]->(b:label)<-[r2:label]-(c:label)
 WHERE a.field = value AND c.field =

    value WITH b, c MATCH (c)-[r3:label]->(d:label)<-[r4:label]-(b) RETURN c, d
  18. @GanbaroDigital 3 Things About This Query • Text searching •

    Duplicate rows • Pre-calculated sort order
  19. @GanbaroDigital 3 Things About This Query • Text searching •

    Duplicate rows • Pre-calculated sort order
  20. @GanbaroDigital Players may have played for the club multiple times.

    Use DISTINCT() to avoid returning duplicate rows.
  21. @GanbaroDigital 3 Things About This Query • Text searching •

    Duplicate rows • Pre-calculated sort order
  22. @GanbaroDigital We can do all this in an RDBMS. It’s

    just so much easier to do it in a Graph database.
  23. @GanbaroDigital Graph Relationships • No schema changes / migrations •

    No extra columns for the foreign keys • No NULLs for empty foreign keys • Just get on and do it :)
  24. @GanbaroDigital Shipped and deployed an average of 3 updated schemas*

    per month for 12 months. * equivalent to RDBMS
  25. @GanbaroDigital MATCH (t:team)-[:played_for]->(p:person)
 WHERE t.name =~ “(?i)chelsea”
 WITH DISTINCT(p), t

    OPTIONAL MATCH (p)-[:profile_pic]->(m:media)-[:team]->(t)
 WHERE m.category = “thumbnail” RETURN p, COLLECT([m]) as profile_pics
 ORDER BY p.sortname ASC
  26. @GanbaroDigital MATCH (t:team)-[:played_for]->(p:person)
 WHERE t.name =~ “(?i)chelsea”
 WITH DISTINCT(p), t

    OPTIONAL MATCH (p)-[:profile_pic]->(m:media)-[:team]->(t)
 WHERE m.category = “thumbnail” RETURN p, COLLECT([m]) as profile_pics
 ORDER BY p.sortname ASC
  27. @GanbaroDigital The second query can reduce the size of the

    final result set. It can also add new records to the final result set.
  28. @GanbaroDigital MATCH (t:team)-[:played_for]->(p:person)
 WHERE t.name =~ “(?i)chelsea”
 WITH DISTINCT(p), t

    OPTIONAL MATCH (p)-[:profile_pic]->(m:media)-[:team]->(t)
 WHERE m.category = “thumbnail” RETURN p, COLLECT([m]) as profile_pics
 ORDER BY p.sortname ASC
  29. @GanbaroDigital MATCH (t:team)-[:played_for]->(p:person)
 WHERE t.name =~ “(?i)chelsea”
 WITH DISTINCT(p), t

    OPTIONAL MATCH (p)-[:profile_pic]->(m:media)-[:team]->(t)
 WHERE m.category = “thumbnail” RETURN p, COLLECT([m]) as profile_pics
 ORDER BY p.sortname ASC
  30. @GanbaroDigital In this case, my client doesn’t have a profile

    picture for every player who has played for a given team.
  31. @GanbaroDigital MATCH (t:team)-[:played_for]->(p:person)
 WHERE t.name =~ “(?i)chelsea”
 WITH DISTINCT(p), t

    OPTIONAL MATCH (p)-[:profile_pic]->(m:media)-[:team]->(t)
 WHERE m.category = “thumbnail” RETURN p, COLLECT([m]) as profile_pics
 ORDER BY p.sortname ASC
  32. @GanbaroDigital MATCH (t:team)-[:played_for]->(p:person)
 WHERE t.name =~ “(?i)chelsea”
 WITH DISTINCT(p), t

    OPTIONAL MATCH (p)-[:profile_pic]->(m:media)-[:team]->(t)
 WHERE m.category = “thumbnail” RETURN p, COLLECT([m]) as profile_pics
 ORDER BY p.sortname ASC
  33. @GanbaroDigital Note how the name of the relationship tells us

    what kind of image the ‘media’ record is.
  34. @GanbaroDigital The list of possible addresses would be the records.

    The type of address - the context - would be the relationship name.
  35. @GanbaroDigital MATCH (t:team)-[:played_for]->(p:person)
 WHERE t.name =~ “(?i)chelsea”
 WITH DISTINCT(p), t

    OPTIONAL MATCH (p)-[:profile_pic]->(m:media)-[:team]->(t)
 WHERE m.category = “thumbnail” RETURN p, COLLECT([m]) as profile_pics
 ORDER BY p.sortname ASC
  36. @GanbaroDigital Data dimensions are a classic solution to making records

    discoverable. And they are perfectly suited to graph databases :)
  37. @GanbaroDigital You can use whatever datasets you want as data

    dimensions. The point is to standardise them, and preload them in so that you can link to them.
  38. @GanbaroDigital We link each ‘season’ to a ‘start year’ and

    an ‘end year’ to make them easy to find.
  39. @GanbaroDigital MATCH (s:season)-[:started_on_year]->(y:year)
 WHERE y.year = 2016
 WITH s
 MATCH

    (t:team)<-[:home_team|away_team]-(m:match) -[:competition]->(c:competition)
 WHERE c.name =~ “(?i)Premier League” AND t.name =~ “(?i)Chelsea”
 AND (m)-[:season]->(s)
  40. @GanbaroDigital MATCH (s:season)-[:started_on_year]->(y:year)
 WHERE y.year = 2016
 WITH s
 MATCH

    (t:team)<-[:home_team|away_team]-(m:match) -[:competition]->(c:competition)
 WHERE c.name =~ “(?i)Premier League” AND t.name =~ “(?i)Chelsea”
 AND (m)-[:season]->(s)
  41. @GanbaroDigital MATCH (s:season)-[:started_on_year]->(y:year)
 WHERE y.year = 2016
 WITH s
 MATCH

    (t:team)<-[:home_team|away_team]-(m:match) -[:competition]->(c:competition)
 WHERE c.name =~ “(?i)Premier League” AND t.name =~ “(?i)Chelsea”
 AND (m)-[:season]->(s)
  42. @GanbaroDigital MATCH (s:season)-[:started_on_year]->(y:year)
 WHERE y.year = 2016
 WITH s
 MATCH

    (t:team)<-[:home_team|away_team]-(m:match) -[:competition]->(c:competition)
 WHERE c.name =~ “(?i)Premier League” AND t.name =~ “(?i)Chelsea”
 AND (m)-[:season]->(s)
  43. @GanbaroDigital We’ve Covered ... • Finding by relationships • Filtering

    by content & relationships • Aggregate queries • Identity by context • Discoverability by data dimensions
  44. @GanbaroDigital How do we get at all this graph goodness

    from the world’s best back-office programming language?
  45. @GanbaroDigital // connect to Neo4J using the BOLT protocol //

    it’s a little faster than the HTTP API use GraphAware\Neo4j\Client\Client; use GraphAware\Neo4j\Client\ClientBuilder; $client = ClientBuilder::create() ->addConnection(‘default’, ‘bolt://neo4j:7687') ->build();
  46. @GanbaroDigital // build the query using a HEREDOC // use

    { } as placeholders for query parameters
 $query = <<<EOS MATCH (t:team)-[:current_squad]->(p:person)
 WHERE t.name =~ {teamname}
 RETURN p
 ORDER BY p.sortname ASC EOS;
  47. @GanbaroDigital // run the query $result = $client->run($query, $queryParams); //

    the records don’t come back as an assoc array :( $records = $result->getRecords();