A presentation by @stuherbert
for @GanbaroDigital
DeCYPHERing
Graph Databases
With PHP And Neo4J
Slide 2
Slide 2 text
Industry veteran: architect, engineer,
leader, manager, mentor
F/OSS contributor since 1994
Talking and writing about PHP
since 2004
Chief Software Archaeologist
@GanbaroDigital
About Stuart
Slide 3
Slide 3 text
Follow me
I do tweet a lot about
non-tech stuff though :)
@stuherbert
@GanbaroDigital
ChelseaFan12 Fanzone
• Fixture list back to 1905
• Key events from each match
• Squad data to match
• Rich data on each squad member
• + data on Chelsea opponents
Slide 50
Slide 50 text
@GanbaroDigital
Underlying Dataset
• 420,000+ records
• 3.2 million+ foreign keys
• average of nearly 8 foreign keys per record
Slide 51
Slide 51 text
@GanbaroDigital
Person
Spell Team
Match
Season Goal
Media
Biography
Competition
Slide 52
Slide 52 text
@GanbaroDigital
We’ll use this real dataset
to learn how to write queries.
Slide 53
Slide 53 text
@GanbaroDigital
Querying Graphs
With CYPHER
Slide 54
Slide 54 text
@GanbaroDigital
CYPHER
is Neo4J’s query language
Slide 55
Slide 55 text
@GanbaroDigital
CYPHER
is awesome*
Slide 56
Slide 56 text
@GanbaroDigital
What does CYPHER
look like?
Slide 57
Slide 57 text
@GanbaroDigital
With CYPHER
your queries
look like a graph diagram.
Slide 58
Slide 58 text
@GanbaroDigital
MATCH (a)-[r]->(b)
Slide 59
Slide 59 text
@GanbaroDigital
How To Read CYPHER
• Anything in brackets is a record
(a node in graph terms)
• Anything in square brackets is a foreign key
(a relationship in graph terms)
• Relationships can have direction
Slide 60
Slide 60 text
@GanbaroDigital
MATCH (a)-[r]->(b)
Slide 61
Slide 61 text
@GanbaroDigital
How To Read CYPHER
• Anything in brackets is a record
(a node in graph terms)
• Anything in square brackets is a foreign key
(a relationship in graph terms)
• Relationships can have direction
Slide 62
Slide 62 text
@GanbaroDigital
MATCH (a)-[r]->(b)
Slide 63
Slide 63 text
@GanbaroDigital
How To Read CYPHER
• Anything in brackets is a record
(a node in graph terms)
• Anything in square brackets is a foreign key
(a relationship in graph terms)
• Relationships can have direction
Slide 64
Slide 64 text
@GanbaroDigital
MATCH (a)-[r]->(b)
Slide 65
Slide 65 text
@GanbaroDigital
How do we find records
using CYPHER?
Slide 66
Slide 66 text
@GanbaroDigital
MATCH (a:label)-[r:label]->(b:label)
Slide 67
Slide 67 text
@GanbaroDigital
MATCH (a:label)-[r:label]->(b:label)
Slide 68
Slide 68 text
@GanbaroDigital
MATCH (a:label)-[r:label]->(b:label)
{
from
Slide 69
Slide 69 text
@GanbaroDigital
MATCH (a:label)-[r:label]->(b:label)
{
to
Slide 70
Slide 70 text
@GanbaroDigital
MATCH (a:label)-[r:label]->(b:label)
{
connected by
Slide 71
Slide 71 text
@GanbaroDigital
MATCH (a:label)-[r:label]->(b:label)
{
to
{
from
{
connected by
Slide 72
Slide 72 text
@GanbaroDigital
MATCH (a:label)-[r:label]->(b:label)
Slide 73
Slide 73 text
@GanbaroDigital
MATCH (a:label)-[r:label]->(b:label)
Slide 74
Slide 74 text
@GanbaroDigital
MATCH (a:label)-[r:label]->(b:label)
Slide 75
Slide 75 text
@GanbaroDigital
MATCH (a:label)-[r:label]->(b:label)
Slide 76
Slide 76 text
@GanbaroDigital
CYPHER MATCH
• ‘a’ and ‘b’ are named results
• if you want to use it later in your query,
give it a name
• think of labels as record types / table names
Slide 77
Slide 77 text
@GanbaroDigital
“
With CYPHER,
you find records by relationships
and filter them by content
Slide 78
Slide 78 text
@GanbaroDigital
MATCH (a:label)-[r:label]->(b:label)
WHERE a.field = value
Slide 79
Slide 79 text
@GanbaroDigital
MATCH (a:label)-[r:label]->(b:label)
WHERE a.field = value
Slide 80
Slide 80 text
@GanbaroDigital
Tell CYPHER what data
to return from the query.
Slide 81
Slide 81 text
@GanbaroDigital
MATCH (a:label)-[r:label]->(b:label)
WHERE a.field = value
RETURN b
Slide 82
Slide 82 text
@GanbaroDigital
MATCH (a:label)-[r:label]->(b:label)
WHERE a.field = value
RETURN b
Slide 83
Slide 83 text
@GanbaroDigital
MATCH can take multiple
relationships.
Slide 84
Slide 84 text
@GanbaroDigital
MATCH (a:label)-[r:label]->(b:label)<-[r2:label]-[c:label]
WHERE a.field = value AND c.field = value
RETURN b
Slide 85
Slide 85 text
@GanbaroDigital
MATCH (a:label)-[r:label]->(b:label)<-[r2:label]-[c:label]
WHERE a.field = value AND c.field = value
RETURN b
@GanbaroDigital
MATCH (a:label)-[r:label]->(b:label)<-[r2:label]-(c:label)
WHERE a.field = value AND c.field = value
WITH b, c
MATCH (c)-[r3:label]->(d:label)<-[r4:label]-(b)
RETURN c, d
Slide 88
Slide 88 text
@GanbaroDigital
MATCH (a:label)-[r:label]->(b:label)<-[r2:label]-(c:label)
WHERE a.field = value AND c.field = value
WITH b, c
MATCH (c)-[r3:label]->(d:label)<-[r4:label]-(b)
RETURN c, d
Slide 89
Slide 89 text
@GanbaroDigital
MATCH (a:label)-[r:label]->(b:label)<-[r2:label]-(c:label)
WHERE a.field = value AND c.field = value
WITH b, c
MATCH (c)-[r3:label]->(d:label)<-[r4:label]-(b)
RETURN c, d
Slide 90
Slide 90 text
@GanbaroDigital
MATCH (a:label)-[r:label]->(b:label)<-[r2:label]-(c:label)
WHERE a.field = value AND c.field = value
WITH b, c
MATCH (c)-[r3:label]->(d:label)<-[r4:label]-(b)
RETURN c, d
@GanbaroDigital
MATCH (t:team)-[:current_squad]->(p:person)
WHERE t.name =~ “(?i)chelsea”
RETURN p
ORDER BY p.sortname ASC
Slide 131
Slide 131 text
@GanbaroDigital
MATCH (t:team)-[:current_squad]->(p:person)
WHERE t.name =~ “(?i)chelsea”
RETURN p
ORDER BY p.sortname ASC
Slide 132
Slide 132 text
@GanbaroDigital
We can do all this in an RDBMS.
It’s just so much easier
to do it in a Graph database.
Slide 133
Slide 133 text
@GanbaroDigital
Graph Relationships
• No schema changes / migrations
• No extra columns for the foreign keys
• No NULLs for empty foreign keys
• Just get on and do it :)
Slide 134
Slide 134 text
@GanbaroDigital
Shipped and deployed
an average of
3 updated schemas*
per month for 12 months.
* equivalent to RDBMS
Slide 135
Slide 135 text
@GanbaroDigital
Zero schema migrations
required :-)
Slide 136
Slide 136 text
@GanbaroDigital
Query #2
Profile Cards
Slide 137
Slide 137 text
@GanbaroDigital
How do we build
each player’s profile card
for the A-Z page?
Slide 138
Slide 138 text
@GanbaroDigital
Slide 139
Slide 139 text
@GanbaroDigital
Person
Spell Team
Match
Season Goal
Media
Biography
Competition
Slide 140
Slide 140 text
@GanbaroDigital
Person
Spell
Match
Season Goal
Media
Biography
Competition
Team
Slide 141
Slide 141 text
@GanbaroDigital
MATCH (t:team)-[:played_for]->(p:person)
WHERE t.name =~ “(?i)chelsea”
WITH DISTINCT(p), t
OPTIONAL MATCH (p)-[:profile_pic]->(m:media)-[:team]->(t)
WHERE m.category = “thumbnail”
RETURN p, COLLECT([m]) as profile_pics
ORDER BY p.sortname ASC
Slide 142
Slide 142 text
@GanbaroDigital
MATCH (t:team)-[:played_for]->(p:person)
WHERE t.name =~ “(?i)chelsea”
WITH DISTINCT(p), t
OPTIONAL MATCH (p)-[:profile_pic]->(m:media)-[:team]->(t)
WHERE m.category = “thumbnail”
RETURN p, COLLECT([m]) as profile_pics
ORDER BY p.sortname ASC
Slide 143
Slide 143 text
@GanbaroDigital
Use WITH
to feed the result rows
into a second query.
Slide 144
Slide 144 text
@GanbaroDigital
The second query can reduce
the size of the final result set.
It can also add new records
to the final result set.
Slide 145
Slide 145 text
@GanbaroDigital
Person
Spell Team
Match
Season Goal
Media
Biography
Competition
Slide 146
Slide 146 text
@GanbaroDigital
MATCH (t:team)-[:played_for]->(p:person)
WHERE t.name =~ “(?i)chelsea”
WITH DISTINCT(p), t
OPTIONAL MATCH (p)-[:profile_pic]->(m:media)-[:team]->(t)
WHERE m.category = “thumbnail”
RETURN p, COLLECT([m]) as profile_pics
ORDER BY p.sortname ASC
Slide 147
Slide 147 text
@GanbaroDigital
MATCH (t:team)-[:played_for]->(p:person)
WHERE t.name =~ “(?i)chelsea”
WITH DISTINCT(p), t
OPTIONAL MATCH (p)-[:profile_pic]->(m:media)-[:team]->(t)
WHERE m.category = “thumbnail”
RETURN p, COLLECT([m]) as profile_pics
ORDER BY p.sortname ASC
Slide 148
Slide 148 text
@GanbaroDigital
Use OPTIONAL MATCH
when querying
incomplete datasets.
Slide 149
Slide 149 text
@GanbaroDigital
In this case,
my client doesn’t have
a profile picture
for every player
who has played for a given team.
Slide 150
Slide 150 text
@GanbaroDigital
MATCH (t:team)-[:played_for]->(p:person)
WHERE t.name =~ “(?i)chelsea”
WITH DISTINCT(p), t
OPTIONAL MATCH (p)-[:profile_pic]->(m:media)-[:team]->(t)
WHERE m.category = “thumbnail”
RETURN p, COLLECT([m]) as profile_pics
ORDER BY p.sortname ASC
Slide 151
Slide 151 text
@GanbaroDigital
MATCH (t:team)-[:played_for]->(p:person)
WHERE t.name =~ “(?i)chelsea”
WITH DISTINCT(p), t
OPTIONAL MATCH (p)-[:profile_pic]->(m:media)-[:team]->(t)
WHERE m.category = “thumbnail”
RETURN p, COLLECT([m]) as profile_pics
ORDER BY p.sortname ASC
Slide 152
Slide 152 text
@GanbaroDigital
Note how the name
of the relationship
tells us
what kind of image
the ‘media’ record is.
Slide 153
Slide 153 text
@GanbaroDigital
“In real life, identity often
depends on context.
Graphs make it easy
for us to model that.
Slide 154
Slide 154 text
@GanbaroDigital
For example,
UK addresses can be
many different things
all at once.
Slide 155
Slide 155 text
@GanbaroDigital
The list of possible addresses
would be the records.
The type of address
- the context -
would be the relationship name.
Slide 156
Slide 156 text
@GanbaroDigital
CYPHER returns result rows.
Use COLLECT to
put multiple records
into a single result row.
Slide 157
Slide 157 text
@GanbaroDigital
MATCH (t:team)-[:played_for]->(p:person)
WHERE t.name =~ “(?i)chelsea”
WITH DISTINCT(p), t
OPTIONAL MATCH (p)-[:profile_pic]->(m:media)-[:team]->(t)
WHERE m.category = “thumbnail”
RETURN p, COLLECT([m]) as profile_pics
ORDER BY p.sortname ASC
Slide 158
Slide 158 text
@GanbaroDigital
Query #3:
Matches In A Season
Slide 159
Slide 159 text
@GanbaroDigital
Which matches were played
in a season?
Slide 160
Slide 160 text
@GanbaroDigital
Person
Spell Team
Match
Season Goal
Media
Biography
Competition
Slide 161
Slide 161 text
@GanbaroDigital
Person
Spell Team
Match
Season Goal
Media
Biography
Competition
Slide 162
Slide 162 text
@GanbaroDigital
“Find every league match
played by Chelsea
in the 2016/2017 season”
Slide 163
Slide 163 text
@GanbaroDigital
“Find every league match
played by Chelsea
in the 2016/2017 season”
Slide 164
Slide 164 text
@GanbaroDigital
MATCH (m:match)-[:competition]->(c:competition)
WHERE c.name = “Premier League”
Slide 165
Slide 165 text
@GanbaroDigital
“Find every league match
played by Chelsea
in the 2016/2017 season”
Slide 166
Slide 166 text
@GanbaroDigital
MATCH (t:team)<-[:home_team|away_team]-(m:match)
-[:competition]->(c:competition)
WHERE c.name =~ “(?i)Premier League”
AND t.name =~ “(?i)Chelsea”
Slide 167
Slide 167 text
@GanbaroDigital
MATCH (t:team)<-[:home_team|away_team]-(m:match)
-[:competition]->(c:competition)
WHERE c.name =~ “(?i)Premier League”
AND t.name =~ “(?i)Chelsea”
Slide 168
Slide 168 text
@GanbaroDigital
“Find every league match
played by Chelsea
in the 2016/2017 season”
Slide 169
Slide 169 text
@GanbaroDigital
??
??
How do we find
the season?
Slide 170
Slide 170 text
@GanbaroDigital
Football seasons
start in one year
and finish in the next.
Slide 171
Slide 171 text
@GanbaroDigital
“
Design data sets
to make records
discoverable.
Slide 172
Slide 172 text
@GanbaroDigital
Data dimensions
are a classic solution
to making records discoverable.
And they are perfectly
suited to graph databases :)
Slide 173
Slide 173 text
@GanbaroDigital
https://www.kimballgroup.com
Slide 174
Slide 174 text
@GanbaroDigital
Data dimensions are
standardised data sets
that you can connect
different record types to.
Slide 175
Slide 175 text
@GanbaroDigital
For example,
the list of UK addresses
could be a data dimension.
Slide 176
Slide 176 text
@GanbaroDigital
Use relationships
into the data dimension
(e.g. “delivery address”,
“billing address”)
to provide identity.
Slide 177
Slide 177 text
@GanbaroDigital
You can use whatever datasets
you want as data dimensions.
The point is to standardise them,
and preload them in
so that you can link to them.
Slide 178
Slide 178 text
@GanbaroDigital
Different aspects of time
are a classic set
of data dimensions.
Slide 179
Slide 179 text
@GanbaroDigital
We link each ‘season’
to a ‘start year’ and
an ‘end year’
to make them easy to find.
Slide 180
Slide 180 text
@GanbaroDigital
Person
Spell Team
Match
Season Goal
Media
Year
Competition
Slide 181
Slide 181 text
@GanbaroDigital
MATCH (s:season)-[:started_on_year]->(y:year)
WHERE y.year = 2016
Slide 182
Slide 182 text
@GanbaroDigital
“Find every league match
played by Chelsea
in the 2016/2017 season”
Slide 183
Slide 183 text
@GanbaroDigital
The query could search for things
in the same order
that we’d say it out loud.
Slide 184
Slide 184 text
@GanbaroDigital
Sometimes, it’s better
to change the query order
to find the smaller result sets
first.
Slide 185
Slide 185 text
@GanbaroDigital
MATCH (s:season)-[:started_on_year]->(y:year)
WHERE y.year = 2016
WITH s
MATCH (t:team)<-[:home_team|away_team]-(m:match)
-[:competition]->(c:competition)
WHERE c.name =~ “(?i)Premier League”
AND t.name =~ “(?i)Chelsea”
AND (m)-[:season]->(s)
Slide 186
Slide 186 text
@GanbaroDigital
MATCH (s:season)-[:started_on_year]->(y:year)
WHERE y.year = 2016
WITH s
MATCH (t:team)<-[:home_team|away_team]-(m:match)
-[:competition]->(c:competition)
WHERE c.name =~ “(?i)Premier League”
AND t.name =~ “(?i)Chelsea”
AND (m)-[:season]->(s)
Slide 187
Slide 187 text
@GanbaroDigital
MATCH (s:season)-[:started_on_year]->(y:year)
WHERE y.year = 2016
WITH s
MATCH (t:team)<-[:home_team|away_team]-(m:match)
-[:competition]->(c:competition)
WHERE c.name =~ “(?i)Premier League”
AND t.name =~ “(?i)Chelsea”
AND (m)-[:season]->(s)
Slide 188
Slide 188 text
@GanbaroDigital
MATCH (s:season)-[:started_on_year]->(y:year)
WHERE y.year = 2016
WITH s
MATCH (t:team)<-[:home_team|away_team]-(m:match)
-[:competition]->(c:competition)
WHERE c.name =~ “(?i)Premier League”
AND t.name =~ “(?i)Chelsea”
AND (m)-[:season]->(s)
Slide 189
Slide 189 text
@GanbaroDigital
In CYPHER,
we can use relationships
between records
in WHERE clauses too.
Slide 190
Slide 190 text
@GanbaroDigital
We’ve Covered ...
• Finding by relationships
• Filtering by content & relationships
• Aggregate queries
• Identity by context
• Discoverability by data dimensions
Slide 191
Slide 191 text
@GanbaroDigital
BOLTing On PHP
Slide 192
Slide 192 text
@GanbaroDigital
How do we get at
all this graph goodness
from the world’s best
back-office programming language?
@GanbaroDigital
// connect to Neo4J using the BOLT protocol
// it’s a little faster than the HTTP API
use GraphAware\Neo4j\Client\Client;
use GraphAware\Neo4j\Client\ClientBuilder;
$client = ClientBuilder::create()
->addConnection(‘default’, ‘bolt://neo4j:7687')
->build();
Slide 195
Slide 195 text
@GanbaroDigital
// build the query using a HEREDOC
// use { } as placeholders for query parameters
$query = <<(p:person)
WHERE t.name =~ {teamname}
RETURN p
ORDER BY p.sortname ASC
EOS;
Slide 196
Slide 196 text
@GanbaroDigital
// build the query parameters list
$queryParams = [
‘teamname’ => ‘(?i)chelsea’,
];
Slide 197
Slide 197 text
@GanbaroDigital
// run the query
$result = $client->run($query, $queryParams);
// the records don’t come back as an assoc array :(
$records = $result->getRecords();