The Ubiquitous Graph
Use cases from the real world
Tareq Abedrabbo - Big Data eXchange 2014
Slide 2
Slide 2 text
About me
• CTO at OpenCredo
• Working on graph applications for 4 years on a
number of different projects
• Co-author of Neo4j in Action (Manning)
Slide 3
Slide 3 text
This talk is about the
data rather than the
technology
Slide 4
Slide 4 text
Use cases:
- Impact Analysis
- Flow Optimisation
Slide 5
Slide 5 text
Use case 1:
Network Impact
Analysis
Slide 6
Slide 6 text
Domain: a telco network.
Millions of connected network
components, services and
customers
Slide 7
Slide 7 text
No content
Slide 8
Slide 8 text
Requirement: Identify
the impact of failing
components
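As a rough illustration of this requirement, here is a minimal sketch that walks a dependency map downstream from a failed component. The component names, the `DEPENDENTS` map and the `impacted_by` helper are all hypothetical and not taken from the talk.

```python
from collections import deque

# Hypothetical telco fragment: each key feeds the items in its list.
DEPENDENTS = {
    "router-1": ["switch-1", "switch-2"],
    "switch-1": ["service-voice"],
    "switch-2": ["service-broadband"],
    "service-voice": ["customer-a"],
    "service-broadband": ["customer-a", "customer-b"],
}

def impacted_by(failed, dependents=DEPENDENTS):
    """Return every node reachable downstream of the failed component."""
    seen = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

This simple traversal treats every downstream node as impacted; it does not account for redundancy or alternative routes.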
Slide 9
Slide 9 text
No content
Slide 10
Slide 10 text
No content
Slide 11
Slide 11 text
Requirement: Identify
interesting patterns, such
as single points of failure
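One way to make "single point of failure" concrete, sketched under assumptions: remove each component in turn and check whether a customer can still reach a core node. The `UPSTREAM` map, the node names and both helper functions are hypothetical, and this brute-force check is only practical on small graphs.

```python
# Hypothetical links: each node lists the upstream nodes it can be served by.
UPSTREAM = {
    "customer-a": ["switch-1", "switch-2"],
    "customer-b": ["switch-2"],
    "switch-1": ["core"],
    "switch-2": ["core"],
}

def reaches(start, target, upstream, removed=None):
    """Depth-first search from start to target, skipping the removed node."""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in upstream.get(node, []):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def single_points_of_failure(customer, target="core", upstream=UPSTREAM):
    """Components whose removal alone cuts the customer off from the target."""
    candidates = set(upstream) | {n for ns in upstream.values() for n in ns}
    candidates -= {customer, target}
    return {c for c in candidates
            if not reaches(customer, target, upstream, removed=c)}
```

In this toy network, customer-a has two routes to the core, while customer-b depends entirely on switch-2.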
Slide 12
Slide 12 text
No content
Slide 13
Slide 13 text
Observations
Slide 14
Slide 14 text
The network is
semi-structured
Slide 15
Slide 15 text
Labelled property
graph is a natural fit
for the model
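To show what "labelled property graph" means as a data structure, here is a minimal in-memory sketch. The class names, labels and properties are assumptions chosen for illustration; this is not the Neo4j API.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    labels: set                      # e.g. {"Component", "Router"}
    properties: dict = field(default_factory=dict)

@dataclass
class Relationship:
    start: Node
    rel_type: str                    # e.g. "DEPENDS_ON"
    end: Node
    properties: dict = field(default_factory=dict)

# Nodes of different kinds carry different properties (semi-structured data).
router = Node({"Component", "Router"}, {"name": "router-1", "vendor": "acme"})
service = Node({"Service"}, {"name": "broadband"})
link = Relationship(service, "DEPENDS_ON", router, {"since": 2013})

def with_label(nodes, label):
    """Select the nodes that carry the given label."""
    return [n for n in nodes if label in n.labels]
```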
Slide 16
Slide 16 text
Additional dimensions can be added
to capture abstract concepts:
network redundancy, load-balancing
Slide 17
Slide 17 text
No content
Slide 18
Slide 18 text
Neo4j Cypher queries
are a natural solution
to the problem
Slide 19
Slide 19 text
Possible evolution:
- Multiple starting points
- Impact on quality of service
- Abstraction of repeatable patterns
Slide 20
Slide 20 text
Why should I care?
Slide 21
Slide 21 text
Variation:
Supply Chain
Management
Slide 22
Slide 22 text
No content
Slide 23
Slide 23 text
Use case 2:
Oil Flow Optimisation
Slide 24
Slide 24 text
Domain: an oil extraction
network. Hundreds of
connected components with
complex configuration options
Slide 25
Slide 25 text
No content
Slide 26
Slide 26 text
Requirement: Identify
potential configurations
to maximise flow
Slide 27
Slide 27 text
Interlude
= Genetic Algorithms =
Slide 28
Slide 28 text
Search heuristic that
mimics the process of
natural selection - Wikipedia
Slide 29
Slide 29 text
Recipe
Slide 30
Slide 30 text
1. Start from an initial population of candidate
solutions
2. Assess each solution using a fitness function
3. Apply genetic operators to derive a new and
potentially fitter generation
4. Rinse and repeat!
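The four steps above can be sketched as a small loop. This is a toy version under stated assumptions: candidates are bit lists, and the population size, mutation rate and the `evolve` function itself are illustrative choices rather than anything prescribed by the talk.

```python
import random

def evolve(fitness, genome_length=12, population_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    # 1. Initial population of random candidate solutions.
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # 2. Assess each solution with the fitness function.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: population_size // 2]
        # 3. Derive a new generation; the current best is kept unchanged.
        children = [ranked[0]]
        while len(children) < population_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_length)
            child = mum[:cut] + dad[cut:]        # cross-breeding
            if rng.random() < 0.1:               # occasional mutation
                i = rng.randrange(genome_length)
                child[i] = 1 - child[i]
            children.append(child)
        # 4. Repeat with the new generation.
        population = children
    return max(population, key=fitness)
```

For example, `evolve(sum)` searches for a bit list with as many ones as possible. Because the best candidate is always carried over, the best score never decreases between generations.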
Slide 31
Slide 31 text
No content
Slide 32
Slide 32 text
In more detail...
Slide 33
Slide 33 text
1. Start from an initial population of candidate solutions
(individuals or phenotypes)
ideally random, diverse and large
2. Attribute a score to each solution using a fitness function
(the only place with specific business knowledge)
3. Apply genetic operators to create a new generation
- Cross-breeding to retain best characteristics from each
parent
- Mutation to maintain diversity and to avoid converging
to a local optimum too quickly
Slide 34
Slide 34 text
Fitness function
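A fitness function for the oil flow case might look like the sketch below. The network is entirely hypothetical (three wells feeding one shared pipeline), as are the rates, the capacity and the shape of `config`; a real function would encode far more domain knowledge.

```python
# Hypothetical network: three wells feed one shared pipeline.
WELL_MAX_RATE = {"well-1": 40.0, "well-2": 25.0, "well-3": 35.0}
PIPELINE_CAPACITY = 80.0

def fitness(config):
    """Score a configuration: total flow delivered, capped by the pipeline.

    config maps each well to a valve opening between 0.0 and 1.0;
    wells missing from config are treated as closed.
    """
    produced = sum(WELL_MAX_RATE[w] * config.get(w, 0.0) for w in WELL_MAX_RATE)
    return min(produced, PIPELINE_CAPACITY)
```

Opening every valve fully does not help beyond the pipeline capacity, which is the kind of constraint the fitness function has to capture.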
Slide 35
Slide 35 text
No content
Slide 36
Slide 36 text
Crossbreeding
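One common form of crossbreeding is single-point crossover, sketched here for list-encoded candidates. The `crossover` function and its list encoding are assumptions for illustration; the talk does not specify which variant was used.

```python
import random

def crossover(parent_a, parent_b, rng=random):
    """Single-point crossover: head of one parent, tail of the other."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must share a length of at least 2")
    cut = rng.randrange(1, len(parent_a))   # cut strictly inside the genome
    return parent_a[:cut] + parent_b[cut:]
```

The child always takes at least one gene from each parent, and the parents themselves are left unmodified.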
Slide 37
Slide 37 text
No content
Slide 38
Slide 38 text
Mutation
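Mutation can be sketched as an independent bit flip per gene. The `mutate` function, the bit-list encoding and the default rate are illustrative assumptions.

```python
import random

def mutate(genome, rate=0.05, rng=random):
    """Return a copy with each bit flipped independently with probability rate."""
    return [1 - g if rng.random() < rate else g for g in genome]
```

A low rate keeps most of a candidate intact while still injecting some diversity into the population.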
Slide 39
Slide 39 text
No content
Slide 40
Slide 40 text
There are other genetic operators
- Copy n fittest solutions unchanged
- Carry over n unfit candidates
- Carry over n randomly chosen candidates
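The three operators listed above can be combined into one selection step, sketched below. The `carry_over` function and its parameter names are hypothetical, and the sketch assumes the population is larger than `n_fittest + n_unfit`.

```python
import random

def carry_over(population, fitness, n_fittest=2, n_unfit=1, n_random=1, rng=random):
    """Pick candidates that pass unchanged into the next generation."""
    ranked = sorted(population, key=fitness, reverse=True)
    fittest = ranked[:n_fittest]                          # best n, unchanged
    unfit = ranked[len(ranked) - n_unfit:] if n_unfit else []   # worst n
    middle = ranked[n_fittest:len(ranked) - n_unfit]      # everyone else
    lucky = rng.sample(middle, min(n_random, len(middle)))  # random picks
    return fittest + unfit + lucky
```

Keeping a few unfit or random candidates is one way to preserve diversity alongside mutation.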
Slide 41
Slide 41 text
Pros:
- All domain knowledge is in one place
- Explore interesting solutions including
counterintuitive ones
- Tweak parameters to generate different
solutions
- Stop when you want
Slide 42
Slide 42 text
Cons:
- Fitness function can become really complex and
slow
- Resulting solutions are not guaranteed to be
practical or pretty
- Solutions can get worse as the fitness function
improves
- There is almost always a better solution
Slide 43
Slide 43 text
Observations
Slide 44
Slide 44 text
Simply connected
network with complex
components
Slide 45
Slide 45 text
Is this even a valid use
case for a graph database?
(hint: yes)
Slide 46
Slide 46 text
Persist and share
calculated solutions
Slide 47
Slide 47 text
Inspect intermediary
steps
Slide 48
Slide 48 text
Use Cypher queries to
interrogate generated
solutions
Slide 49
Slide 49 text
Possible evolution:
- Identify the most practical and valuable
adjustments to the network
Slide 50
Slide 50 text
Why should I care?
Slide 51
Slide 51 text
Variation:
Resource Allocation
Optimisation
Slide 52
Slide 52 text
No content
Slide 53
Slide 53 text
To summarise...
Slide 54
Slide 54 text
Graphs
are
everywhere
Slide 55
Slide 55 text
...but each graph is
different
Slide 56
Slide 56 text
Data-centric
vs
Domain-centric
Slide 57
Slide 57 text
Graph           | Domain-centric           | Data-centric
Data model      | Well-defined             | Complex
Data structure  | Flexible but predictable | Potentially unpredictable
Data sources    | Mostly the application   | Multiple external sources
Design approach | Top-down                 | Bottom-up
Slide 58
Slide 58 text
Graphs are data-driven
Slide 59
Slide 59 text
Things to do with a graph:
- Traversal and pattern matching: Cypher
- Graph algorithms: shortest path, disconnected
components, graph saliency, etc.
- Optimisation algorithms
- Graph analytics
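As an example of one graph algorithm from the list above, here is a minimal sketch that finds the disconnected components of an undirected graph. The `connected_components` function and its edge-list input format are assumptions for illustration.

```python
def connected_components(edges, nodes):
    """Group nodes of an undirected graph into connected components."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from start.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components
```

More than one component in the result means parts of the graph cannot reach each other.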
Slide 60
Slide 60 text
Listen to the data,
know the domain
Slide 61
Slide 61 text
Think in
graphs
Slide 62
Slide 62 text
When designing your
data model
Slide 63
Slide 63 text
But also, when testing
your graph
applications
Slide 64
Slide 64 text
Links
OpenCredo: http://www.opencredo.com
Neo4j in Action: http://www.manning.com/partner/
Twitter: @tareq_abedrabbo
Personal blog: http://www.terminalstate.net
Thank you! Any questions?