The Ubiquitous Graph: Two Use Cases from the Real World

Data Science London Meetup

Tareq Abedrabbo

December 19, 2013

Transcript

  1. The Ubiquitous Graph
    Two Use Cases from the Real World
    Tareq Abedrabbo - Data Science London
    December 2013

  2. About me
    • CTO at OpenCredo
    • Working with Neo4j for (almost) 3 years on a
    number of different projects
    • Co-author of Neo4j in Action (Manning)

  3. “If I'm to believe Twitter, half of the
    earth's population are importing
    Wikipedia into Neo4j, for very
    obscure reasons.”

  4. Agenda
    • Graph applications
    • Use cases
    • Best practices

  5. What type of applications
    can be built with a graph
    database?

  6. Domain-centric
    applications

  7. • Well-defined data model
    • Data changes through user interactions
    • Flexible but predictable data structure(s)
    • Recommendation engines, social networks, etc…
    • Top-down design

  8. Data-centric
    applications

  9. • Complex connected data that typically models real
    world networks
    • Integrated from a variety of different sources
    • Data can be unpredictable
    • Telco networks, utility networks, etc…
    • Bottom-up design

  10. Typically, applications fall
    somewhere between
    these two types

  11. How can I use the
    information available in
    my graph?

  12. • Search and pattern-matching
        • Find a recommendation based on behaviour
    • Graph algorithms
        • Shortest path, disconnected components
    • Optimisation
        • Maximise oil flow while minimising water
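
As an illustration of the "graph algorithms" bullet, here is a minimal Python sketch of a shortest-path search (breadth-first, fewest hops) and a connected-components scan. The graph data and the function names `shortest_path` and `components` are assumptions made for this example; they do not come from the talk, which works with Neo4j rather than in-memory dictionaries.

```python
from collections import deque

# Hypothetical undirected graph as an adjacency mapping (illustrative data only).
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c"},
    "e": {"f"},
    "f": {"e"},
}


def shortest_path(graph, start, goal):
    """Return one shortest path (fewest hops) from start to goal, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor chain back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def components(graph):
    """Group nodes into connected components (sets of mutually reachable nodes)."""
    seen = set()
    result = []
    for node in sorted(graph):
        if node in seen:
            continue
        component = set()
        stack = [node]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph[current] - component)
        seen |= component
        result.append(component)
    return result
```

More than one component in the result means the graph has disconnected parts, which is the situation the slide refers to.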

  13. Graphs are naturally
    data-driven

  14. Use case 1:
    Network Impact Analysis

  15. Domain: a telco network. Millions
    of connected network
    components, services and
    customers

  17. Requirement: Identify the
    impact of failing
    components
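
One way to picture this requirement is as a downstream traversal: starting from the failed component, collect everything that depends on it, directly or transitively. The deck does not show the actual model or queries, so the Python sketch below is only an illustration; the `DEPENDENTS` data and the `impacted_by` function are hypothetical, and the sketch deliberately ignores redundancy (it treats every dependency as critical).

```python
# Hypothetical dependency edges: each key is depended upon by the listed items.
DEPENDENTS = {
    "router-1": ["switch-1", "switch-2"],
    "switch-1": ["service-voice"],
    "switch-2": ["service-broadband"],
    "service-voice": ["customer-alice"],
    "service-broadband": ["customer-alice", "customer-bob"],
}


def impacted_by(failed, dependents):
    """Return everything reachable downstream of the failed component."""
    impacted = set()
    stack = [failed]
    while stack:
        current = stack.pop()
        for item in dependents.get(current, []):
            if item not in impacted:
                impacted.add(item)
                stack.append(item)
    return impacted
```

For example, `impacted_by("switch-1", DEPENDENTS)` collects the voice service and the customer who uses it.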

  20. Requirement: Identify
    interesting patterns, such
    as single points of failure
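
A single point of failure can be read as a node whose removal cuts a customer off from the rest of the network. The brute-force Python sketch below checks each intermediate node in turn; the `LINKS` topology and all function names are assumptions for illustration, not the approach used in the talk. On a network of millions of components a dedicated algorithm (for example, articulation-point detection) would likely be needed instead.

```python
# Hypothetical undirected network links (names are illustrative only).
LINKS = [
    ("core", "agg-1"), ("core", "agg-2"),
    ("agg-1", "access-1"), ("agg-2", "access-1"),
    ("access-1", "customer"),
]


def neighbours(links):
    """Build an undirected adjacency mapping from a list of links."""
    adjacency = {}
    for left, right in links:
        adjacency.setdefault(left, set()).add(right)
        adjacency.setdefault(right, set()).add(left)
    return adjacency


def reachable(adjacency, source, target, removed):
    """Depth-first search that treats the `removed` node as down."""
    seen = {source}
    stack = [source]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in adjacency[node]:
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def single_points_of_failure(links, source, target):
    """Return intermediate nodes whose loss disconnects source from target."""
    adjacency = neighbours(links)
    candidates = set(adjacency) - {source, target}
    return sorted(
        node for node in candidates
        if not reachable(adjacency, source, target, node)
    )
```

In the sample topology the two aggregation nodes back each other up, so only the access node is reported.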

  22. The network is “semi-
    structured”

  23. Labelled property graph
    is a natural fit for the
    model

  24. Additional “dimensions” can be
    added to capture abstract concepts:
    network redundancy, load-balancing

  25. Cypher queries are a
    natural way to deliver
    the different requirements

  26. • Other requirements
        • Multiple starting points
        • Impact on quality of service
        • Abstraction of repeatable patterns

  27. Use case 2:
    Oil flow optimisation

  28. Domain: an oil extraction network.
    Hundreds of connected
    components with complex
    configuration options

  30. Requirement: Identify
    candidate configurations
    to maximise flow

  31. Interlude: Genetic
    Algorithms

  32. “Search heuristic that
    mimics the process of
    natural selection” - Wikipedia

  33. 1. Start from an initial population of candidate
    solutions
    2. Assess each solution using a fitness function
    3. Apply genetic operators to derive a new and
    potentially fitter generation
    4. Rinse and repeat!

  35. In more detail…

  36. • Start from an initial population of candidate solutions
    (individuals or phenotypes), ideally random and large
    • Attribute a score to each solution using a fitness function
        • The only place with specific business knowledge
    • Apply genetic operators to create a new generation
        • Crossbreeding to retain the best characteristics from each parent
        • Mutation to maintain diversity and to avoid converging to a local optimum too quickly
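
The steps above can be sketched in a few lines of Python. This is a toy example under stated assumptions, not the implementation from the talk: a candidate is a list of open/closed valve settings, the `VALVES` oil and water figures are invented, the fitness function is simply oil minus water, and the selection scheme (keep the fitter half as parents) is one choice among many.

```python
import random

# Hypothetical per-valve contributions (oil, water) when the valve is open.
VALVES = [(5, 1), (3, 4), (8, 2), (2, 6), (7, 3), (4, 5)]


def fitness(config):
    """Score a configuration: total oil minus total water over open valves."""
    return sum(
        oil - water
        for (oil, water), is_open in zip(VALVES, config)
        if is_open
    )


def crossover(mother, father, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(mother))
    return mother[:point] + father[point:]


def mutate(config, rng, rate=0.1):
    """Flip each valve with a small probability to maintain diversity."""
    return [(not gene) if rng.random() < rate else gene for gene in config]


def evolve(generations=40, size=20, seed=0):
    """Run the loop: score, select parents, breed, mutate, repeat."""
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in VALVES] for _ in range(size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: size // 2]  # keep the fitter half unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)
```

Note that all domain knowledge sits in `fitness`; `crossover`, `mutate` and `evolve` know nothing about oil or water, which matches the point made on the slide.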

  37. Fitness function

  39. Crossbreeding

  41. Mutation

  43. • There are other genetic operators
        • Copy n fittest solutions unchanged
        • Carry over n unfit candidates
        • Carry over n randomly chosen candidates
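
These three operators can be combined into a single survivor-selection step. The Python sketch below is one possible reading of the slide; the function name `survivors` and its parameters are hypothetical, and taking the least fit candidates as the "unfit" ones is an assumption.

```python
import random


def survivors(population, fitness, n_fittest, n_unfit, n_random, rng):
    """Pick candidates carried over to the next generation unchanged."""
    ranked = sorted(population, key=fitness, reverse=True)
    elite = ranked[:n_fittest]                    # the n fittest solutions
    rest = ranked[n_fittest:]
    unfit = rest[max(0, len(rest) - n_unfit):]    # the n least fit candidates
    middle = rest[: len(rest) - len(unfit)]
    # n randomly chosen candidates from whatever is left
    lucky = rng.sample(middle, min(n_random, len(middle)))
    return elite + unfit + lucky
```

Keeping a few unfit or random candidates serves the same purpose as mutation: it preserves diversity in the population.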

  44. • Pros!
        • All domain knowledge is encapsulated in one place
        • Generate interesting solutions including counterintuitive ones
        • Stop when you want!
    • Cons!
        • Fitness function can become really complex
        • Generated solutions are not guaranteed to be practical or pretty

  45. Simply connected graph
    with complex
    components

  46. Is this even a use
    case for Neo4j?

  47. Persist and share
    calculated solutions

  48. Inspect intermediary
    steps

  49. Use Cypher queries to
    interrogate solutions

  50. • Other requirements
        • Identify the most practical and valuable adjustments to the network

  51. Distilled Best
    Practices

  52. • Know your domain
    • Test non-functional aspects
    • Write code that can handle semi-structured data
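
To make the last point concrete, here is a small Python sketch of code that tolerates missing or inconsistently typed properties when reading node data integrated from different sources. The `NODES` records, the property names and the `normalise` function are all invented for this example.

```python
# Hypothetical node records with inconsistent properties, as they might
# arrive from different source systems.
NODES = [
    {"id": "sw-1", "type": "switch", "capacity_mbps": 1000},
    {"id": "sw-2", "type": "switch"},        # capacity missing
    {"id": "x-9", "capacity_mbps": "100"},   # type missing, capacity as text
]


def normalise(node):
    """Return a record with predictable keys, tolerating gaps in the input."""
    raw_capacity = node.get("capacity_mbps")
    try:
        capacity = int(raw_capacity) if raw_capacity is not None else None
    except (TypeError, ValueError):
        capacity = None  # unparseable values are treated as unknown
    return {
        "id": node["id"],
        "type": node.get("type", "unknown"),
        "capacity_mbps": capacity,
    }
```

The only property treated as mandatory here is `id`; everything else falls back to an explicit "unknown" value rather than raising an error.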

  53. Links
    • Twitter: @tareq_abedrabbo
    • Blog: http://www.terminalstate.net
    • OpenCredo: http://www.opencredo.com
    Thank you! Questions?
