Slide 1

Slide 1 text

The Ubiquitous Graph: Use cases from the real world
Tareq Abedrabbo - Big Data eXchange 2014

Slide 2

Slide 2 text

About me
• CTO at OpenCredo
• Working on graph applications for 4 years on a number of different projects
• Co-author of Neo4j in Action (Manning)

Slide 3

Slide 3 text

This talk is about the data rather than the technology

Slide 4

Slide 4 text

Use cases:
- Impact Analysis
- Flow Optimisation

Slide 5

Slide 5 text

Use case 1: Network Impact Analysis

Slide 6

Slide 6 text

Domain: a telco network. Millions of connected network components, services and customers

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Requirement: Identify the impact of failing components
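
Impact analysis of this kind comes down to a downstream traversal from the failed component. The following is a minimal sketch in Python, not the implementation used in the project: the component names and the `DEPENDENTS` mapping are invented for illustration, and an edge A -> B is assumed to mean "B depends on A".

```python
from collections import deque

# Hypothetical toy network: an edge A -> B means "B depends on A".
DEPENDENTS = {
    "router-1": ["switch-1", "switch-2"],
    "switch-1": ["broadband"],
    "switch-2": ["voip"],
    "broadband": ["alice", "bob"],
    "voip": ["bob"],
}

def impacted_by(failed, dependents):
    """Return everything reachable downstream from the failed component."""
    seen = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

impact = impacted_by("switch-1", DEPENDENTS)
print(sorted(impact))  # ['alice', 'bob', 'broadband']
```

In a graph database the same question would be a variable-length path query starting at the failed node; the breadth-first search above only mirrors that idea in memory.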

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Requirement: Identify interesting patterns, such as single points of failure
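
One simple way to look for single points of failure is to remove each component in turn and check whether a customer can still reach the core. The sketch below assumes a small, made-up undirected topology (`LINKS`) and is meant only to illustrate the pattern, not a production approach for millions of nodes.

```python
# Hypothetical undirected topology: each key lists its direct links.
LINKS = {
    "core": ["agg-1", "agg-2"],
    "agg-1": ["core", "access-1"],
    "agg-2": ["core", "access-1"],
    "access-1": ["agg-1", "agg-2", "customer"],
    "customer": ["access-1"],
}

def reachable(links, start, goal, removed=None):
    """Depth-first search that pretends `removed` is not in the network."""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in links.get(node, []):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def single_points_of_failure(links, source, target):
    """Components whose individual removal cuts `target` off from `source`."""
    candidates = [n for n in links if n not in (source, target)]
    return sorted(n for n in candidates
                  if not reachable(links, source, target, removed=n))

spofs = single_points_of_failure(LINKS, "core", "customer")
print(spofs)  # ['access-1']
```

The two aggregation nodes back each other up, so only the access node is reported.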

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

Observations

Slide 14

Slide 14 text

The network is semi-structured

Slide 15

Slide 15 text

Labelled property graph is a natural fit for the model

Slide 16

Slide 16 text

Additional dimensions can be added to capture abstract concepts: network redundancy, load-balancing
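
As an illustration of how redundancy changes impact analysis, the sketch below assumes a hypothetical model in which each component lists groups of interchangeable providers and stays up as long as one provider per group is up. The names and the `REQUIRES` structure are assumptions made for this example.

```python
# Hypothetical model: each component lists the groups it requires.
# A group is a list of interchangeable (redundant) providers; the
# component stays up as long as one provider per group is up.
REQUIRES = {
    "power-a": [],
    "power-b": [],
    "router": [["power-a", "power-b"]],   # redundant power feeds
    "dns": [["router"]],
    "web": [["router"], ["dns"]],
}

def failed_components(requires, initially_failed):
    """Propagate failures until no further component goes down."""
    down = set(initially_failed)
    changed = True
    while changed:
        changed = False
        for node, groups in requires.items():
            if node in down:
                continue
            # The node fails if any required group has no live provider.
            if any(all(p in down for p in group) for group in groups):
                down.add(node)
                changed = True
    return down

print(sorted(failed_components(REQUIRES, {"power-a"})))
print(sorted(failed_components(REQUIRES, {"power-a", "power-b"})))
```

Losing one power feed affects nothing else, while losing both takes down every dependent component.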

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

Neo4j Cypher queries are a natural solution to the problem

Slide 19

Slide 19 text

Possible evolution:
- Multiple starting points
- Impact on quality of service
- Abstraction of repeatable patterns

Slide 20

Slide 20 text

Why should I care?

Slide 21

Slide 21 text

Variation: Supply Chain Management

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

Use case 2: Oil Flow Optimisation

Slide 24

Slide 24 text

Domain: an oil extraction network. Hundreds of connected components with complex configuration options

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

Requirement: Identify potential configurations to maximise flow

Slide 27

Slide 27 text

Interlude: Genetic Algorithms

Slide 28

Slide 28 text

"Search heuristic that mimics the process of natural selection" (Wikipedia)

Slide 29

Slide 29 text

Recipe

Slide 30

Slide 30 text

1. Start from an initial population of candidate solutions
2. Assess each solution using a fitness function
3. Apply genetic operators to derive a new and potentially fitter generation
4. Rinse and repeat!

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

In more detail...

Slide 33

Slide 33 text

1. Start from an initial population of candidate solutions (individuals or phenotypes), ideally random, diverse and large
2. Attribute a score to each solution using a fitness function (the only place with specific business knowledge)
3. Apply genetic operators to create a new generation
   - Cross-breeding to retain the best characteristics from each parent
   - Mutation to maintain diversity and to avoid converging to a local optimum too quickly
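
The steps above can be sketched as a small Python loop. This is a toy under stated assumptions, not the algorithm used in the project: candidates are lists of 0/1 "switches", the fitness function simply counts the switches that are on, and all constants (`GENES`, `POPULATION`, `GENERATIONS`, `MUTATION_RATE`) are arbitrary values chosen for the example.

```python
import random

GENES = 20            # length of each candidate (a list of 0/1 switches)
POPULATION = 30
GENERATIONS = 40
MUTATION_RATE = 0.02

def fitness(candidate):
    """Toy stand-in for business knowledge: count the switches that are on."""
    return sum(candidate)

def crossover(mother, father, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, GENES)
    return mother[:point] + father[point:]

def mutate(candidate, rng):
    """Flip each gene with a small probability to keep diversity."""
    return [1 - g if rng.random() < MUTATION_RATE else g for g in candidate]

def evolve(seed=0):
    rng = random.Random(seed)
    # Step 1: random initial population.
    population = [[rng.randint(0, 1) for _ in range(GENES)]
                  for _ in range(POPULATION)]
    for _ in range(GENERATIONS):
        # Step 2: rank by fitness and breed from the fitter half.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:POPULATION // 2]
        # Step 3: cross-breed and mutate to build the next generation.
        population = [mutate(crossover(rng.choice(parents),
                                       rng.choice(parents), rng), rng)
                      for _ in range(POPULATION)]
    return max(population, key=fitness)

best = evolve()
print(fitness(best), best)
```

Only `fitness` knows anything about the problem; swapping it for a domain-specific score leaves the rest of the loop unchanged.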

Slide 34

Slide 34 text

Fitness function
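
To make the idea concrete, here is a hedged sketch of what a fitness function for a flow problem might look like. Everything in it is hypothetical: three wells feeding one shared pipeline, a configuration expressed as one valve opening per well, and an arbitrary penalty for exceeding pipeline capacity. The real fitness function for an oil network would be far more involved.

```python
# Hypothetical figures: maximum rate of each well and the capacity of
# the shared pipeline they feed, in arbitrary units.
WELL_MAX_RATE = [40.0, 25.0, 60.0]
PIPELINE_CAPACITY = 90.0
OVERLOAD_PENALTY = 2.0   # cost per unit pushed above capacity

def fitness(valve_openings):
    """Score a configuration: one opening in [0, 1] per well."""
    requested = sum(rate * opening
                    for rate, opening in zip(WELL_MAX_RATE, valve_openings))
    delivered = min(requested, PIPELINE_CAPACITY)
    overload = max(0.0, requested - PIPELINE_CAPACITY)
    return delivered - OVERLOAD_PENALTY * overload

print(fitness([1.0, 1.0, 1.0]))   # everything open overloads the pipeline
print(fitness([1.0, 1.0, 0.4]))   # throttling the third well scores higher
```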

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

Crossbreeding

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

Mutation

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

There are other genetic operators:
- Copy n fittest solutions unchanged
- Carry over n unfit candidates
- Carry over n randomly chosen candidates
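
These three operators can be combined into one selection step. The sketch below is one possible reading of them, with a hypothetical `carry_over` helper and integer candidates whose fitness is their own value; the parameter names are assumptions made for the example.

```python
import random

def carry_over(population, fitness, n_fittest, n_unfit, n_random, rng):
    """Pick the candidates that pass unchanged into the next generation."""
    ranked = sorted(population, key=fitness, reverse=True)
    elite = ranked[:n_fittest]                  # copy n fittest unchanged
    rest = ranked[n_fittest:]
    cut = len(rest) - min(n_unfit, len(rest))
    unfit = rest[cut:]                          # carry over n least fit
    middle = rest[:cut]
    lucky = rng.sample(middle, min(n_random, len(middle)))  # n random picks
    return elite + unfit + lucky

# Hypothetical population: integers scored by their own value.
population = list(range(10))
survivors = carry_over(population, lambda c: c, n_fittest=2, n_unfit=1,
                       n_random=2, rng=random.Random(0))
print(survivors)
```

The remaining slots in the next generation would then be filled by cross-breeding and mutation.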

Slide 41

Slide 41 text

Pros:
- All domain knowledge is in one place
- Explore interesting solutions including counterintuitive ones
- Tweak parameters to generate different solutions
- Stop when you want

Slide 42

Slide 42 text

Cons:
- Fitness function can become really complex and slow
- Resulting solutions are not guaranteed to be practical or pretty
- Solutions can get worse as the fitness function improves
- There is almost always a better solution

Slide 43

Slide 43 text

Observations

Slide 44

Slide 44 text

Simply connected network with complex components

Slide 45

Slide 45 text

Is this even a valid use case for a graph database? (hint: yes)

Slide 46

Slide 46 text

Persist and share calculated solutions

Slide 47

Slide 47 text

Inspect intermediary steps

Slide 48

Slide 48 text

Use Cypher queries to interrogate generated solutions

Slide 49

Slide 49 text

Possible evolution:
- Identify the most practical and valuable adjustments to the network

Slide 50

Slide 50 text

Why should I care?

Slide 51

Slide 51 text

Variation: Resource Allocation Optimisation

Slide 52

Slide 52 text

No content

Slide 53

Slide 53 text

To summarise...

Slide 54

Slide 54 text

Graphs are everywhere

Slide 55

Slide 55 text

...but each graph is different

Slide 56

Slide 56 text

Data-centric vs Domain-centric

Slide 57

Slide 57 text

Graph            | Domain-centric            | Data-centric
Data model       | Well-defined              | Complex
Data structure   | Flexible but predictable  | Potentially unpredictable
Data sources     | Mostly the application    | Multiple external sources
Design approach  | Top-down                  | Bottom-up

Slide 58

Slide 58 text

Graphs are data-driven

Slide 59

Slide 59 text

Things to do with a graph:
- Traversal and pattern matching: Cypher
- Graph algorithms: shortest path, disconnected components, graph saliency, etc.
- Optimisation algorithms
- Graph analytics
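
Two of the graph algorithms named above, shortest path and finding disconnected components, fit in a few lines of Python. The sketch assumes a small, invented undirected graph stored as an adjacency mapping (`GRAPH`); it shows the ideas only and is not tied to any particular graph library.

```python
from collections import deque

# Hypothetical undirected graph as an adjacency mapping.
GRAPH = {
    "a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"],
    "x": ["y"], "y": ["x"],
}

def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of nodes or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None

def components(graph):
    """Group nodes into connected components."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph.get(node, []))
        seen |= group
        groups.append(group)
    return groups

print(shortest_path(GRAPH, "a", "d"))  # ['a', 'b', 'd']
print(len(components(GRAPH)))          # 2
```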

Slide 60

Slide 60 text

Listen to the data, know the domain

Slide 61

Slide 61 text

Think in graphs

Slide 62

Slide 62 text

When designing your data model

Slide 63

Slide 63 text

But also, when testing your graph applications

Slide 64

Slide 64 text

Links
OpenCredo: http://www.opencredo.com
Neo4j in Action: http://www.manning.com/partner/
Twitter: @tareq_abedrabbo
Personal blog: http://www.terminalstate.net

Thank you! Any questions?