The Ubiquitous Graph: Two Use Cases from the Real World

Data Science London Meetup

Tareq Abedrabbo

December 19, 2013

Transcript

  Slide 1

    The Ubiquitous Graph: Two Use Cases from the Real World

    Tareq Abedrabbo, Data Science London, December 2013
  Slide 2

    About me
    • CTO at OpenCredo
    • Working with Neo4j for (almost) 3 years on a number of different projects
    • Co-author of Neo4j in Action (Manning)
  Slide 3

    “If I'm to believe Twitter, half of the earth's population are importing Wikipedia into Neo4j, for very obscure reasons.”
  Slide 7

    • Well-defined data model
    • Data changes through user interactions
    • Flexible but predictable data structure(s)
    • Examples: recommendation engines, social networks, etc.
    • Top-down design
  Slide 9

    • Complex connected data that typically models real-world networks
    • Integrated from a variety of different sources
    • Data can be unpredictable
    • Examples: telco networks, utility networks, etc.
    • Bottom-up design
  Slide 12

    • Search and pattern-matching
      • Find a recommendation based on behaviour
    • Graph algorithms
      • Shortest path, disconnected components
    • Optimisation
      • Maximise oil flow while minimising water
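
The graph-algorithm bullet can be illustrated on a toy network. The sketch below is a minimal, dependency-free Python example assuming an undirected, unweighted graph stored as an adjacency dict; the `NETWORK` data and the node names are hypothetical, not taken from the talk. In Neo4j itself you would more typically express a shortest-path query with Cypher's built-in `shortestPath` function rather than hand-writing the traversal.

```python
from collections import deque

# Hypothetical toy network: undirected adjacency list (node -> neighbours).
NETWORK = {
    "exchange": ["cabinet-1", "cabinet-2"],
    "cabinet-1": ["exchange", "home-a"],
    "cabinet-2": ["exchange", "home-b"],
    "home-a": ["cabinet-1"],
    "home-b": ["cabinet-2"],
    "isolated-1": ["isolated-2"],
    "isolated-2": ["isolated-1"],
}


def shortest_path(graph, start, goal):
    """Unweighted shortest path via breadth-first search; None if unreachable."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def connected_components(graph):
    """Group nodes into components; more than one group means the network is disconnected."""
    seen = set()
    components = []
    for root in graph:
        if root in seen:
            continue
        component = []
        stack = [root]
        seen.add(root)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph.get(node, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


print(shortest_path(NETWORK, "home-a", "home-b"))
# ['home-a', 'cabinet-1', 'exchange', 'cabinet-2', 'home-b']
print(len(connected_components(NETWORK)))
# 2
```

Breadth-first search is enough here because every edge has the same cost; a weighted network (for example, pipe capacities in the oil-flow case) would need something like Dijkstra's algorithm instead.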
  Slide 26

    • Other requirements
      • Multiple starting points
      • Impact on quality of service
      • Abstraction of repeatable patterns
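
The slide gives no detail on these requirements. One plausible reading, offered here only as an assumption, is impact analysis in a utility-style network: several nodes fail at once (multiple starting points) and you need every downstream node affected (impact on quality of service). A single breadth-first search seeded with all starting points covers that in one pass. The `FEEDS` data and the `impacted_nodes` function are hypothetical.

```python
from collections import deque

# Hypothetical directed supply network: each node lists the nodes it feeds.
FEEDS = {
    "substation-1": ["feeder-a", "feeder-b"],
    "substation-2": ["feeder-c"],
    "feeder-a": ["customer-1", "customer-2"],
    "feeder-b": ["customer-3"],
    "feeder-c": ["customer-4"],
}


def impacted_nodes(graph, starting_points):
    """Multi-source BFS: return {node: hops from the nearest starting point}."""
    distance = {node: 0 for node in starting_points}
    queue = deque(starting_points)
    while queue:
        node = queue.popleft()
        for downstream in graph.get(node, []):
            if downstream not in distance:
                distance[downstream] = distance[node] + 1
                queue.append(downstream)
    return distance


failed = ["feeder-a", "substation-2"]
impact = impacted_nodes(FEEDS, failed)
affected_customers = sorted(n for n in impact if n.startswith("customer-"))
print(affected_customers)
# ['customer-1', 'customer-2', 'customer-4']
```

Seeding the queue with every starting point at distance 0 avoids running one traversal per failure and gives each affected node its distance from the nearest failure.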
  Slide 33

    1. Start from an initial population of candidate solutions
    2. Assess each solution using a fitness function
    3. Apply genetic operators to derive a new and potentially fitter generation
    4. Rinse and repeat!
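
The four steps map directly onto a loop. The sketch below assumes a toy bit-string problem ("OneMax", where fitness is simply the number of 1-bits) and simple choices for selection, crossover and mutation; none of these come from the talk, where the fitness function encoded the real domain. All names and sizes are hypothetical.

```python
import random

GENOME_LENGTH = 20    # hypothetical problem size
POPULATION_SIZE = 30  # hypothetical population size


def fitness(genome):
    # Hypothetical "OneMax" fitness: the number of 1-bits.
    # In a real system this is where the business knowledge lives.
    return sum(genome)


def random_genome(rng):
    return [rng.randint(0, 1) for _ in range(GENOME_LENGTH)]


def next_generation(population, rng):
    # Rank by fitness and use the fitter half as parents.
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[: len(ranked) // 2]
    children = [ranked[0]]  # keep the current best unchanged (elitism)
    while len(children) < len(population):
        mum, dad = rng.sample(parents, 2)
        cut = rng.randrange(1, GENOME_LENGTH)
        child = mum[:cut] + dad[cut:]  # single-point crossover builds a new list
        i = rng.randrange(GENOME_LENGTH)
        child[i] = 1 - child[i]  # mutation: flip one randomly chosen bit
        children.append(child)
    return children


def evolve(generations=50, seed=42):
    rng = random.Random(seed)
    # Step 1: random initial population.
    population = [random_genome(rng) for _ in range(POPULATION_SIZE)]
    best_per_generation = []
    for _ in range(generations):  # Step 4: rinse and repeat.
        # Step 2: assess every candidate with the fitness function.
        best_per_generation.append(max(fitness(g) for g in population))
        # Step 3: derive a new, potentially fitter generation.
        population = next_generation(population, rng)
    return max(population, key=fitness), best_per_generation


best, history = evolve()
print(fitness(best), history)
```

Only `fitness` knows anything about the problem; replacing it with a domain-specific scoring function leaves the rest of the loop unchanged. Because the best individual is always carried forward, the best score per generation never decreases.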
  Slide 36

    • Start from an initial population of candidate solutions (individuals or phenotypes), ideally random and large
    • Attribute a score to each solution using a fitness function
      • The only place with specific business knowledge
    • Apply genetic operators to create a new generation
      • Cross-breeding to retain the best characteristics of each parent
      • Mutation to maintain diversity and avoid converging to a local optimum too quickly
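
A minimal sketch of the two operators named above, assuming bit-string genomes. Uniform crossover and a per-bit mutation rate are one common choice among many, not necessarily what the talk used; the function names are hypothetical.

```python
import random


def uniform_crossover(mum, dad, rng):
    # Each gene is copied from one parent or the other with equal probability,
    # so the child mixes characteristics of both.
    return [m if rng.random() < 0.5 else d for m, d in zip(mum, dad)]


def mutate(genome, rate, rng):
    # Flip each bit independently with probability `rate`.
    return [1 - g if rng.random() < rate else g for g in genome]


rng = random.Random(7)
parent = [1, 0, 1, 1, 0]

# Identical parents: crossover alone can only reproduce what they share.
print(uniform_crossover(parent, parent, rng))  # [1, 0, 1, 1, 0]

# Mutation can still introduce values that neither parent had.
print(mutate(parent, rate=1.0, rng=rng))  # [0, 1, 0, 0, 1]
```

The identical-parents case shows why mutation matters for diversity: once a population has converged, crossover has nothing new to recombine, and only mutation can move it away from a local optimum.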
  Slide 43

    • There are other genetic operators
      • Copy the n fittest solutions unchanged
      • Carry over n unfit candidates
      • Carry over n randomly chosen candidates
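
The three carry-over operators can be combined into one selection step. This is a sketch under stated assumptions: the slide lists the operators without parameters, so the `carry_over` function, its signature and the integer "individuals" in the example are hypothetical.

```python
import random


def carry_over(population, fitness, n_fittest, n_unfit, n_random, rng):
    """Pick individuals that pass into the next generation unchanged.

    Requires n_fittest + n_unfit + n_random <= len(population).
    """
    ranked = sorted(population, key=fitness, reverse=True)
    fittest = ranked[:n_fittest]                    # copy the n fittest (elitism)
    unfit = ranked[len(ranked) - n_unfit:]          # the n least fit
    rest = ranked[n_fittest:len(ranked) - n_unfit]  # everything not yet chosen
    randoms = rng.sample(rest, n_random)            # n random picks from the rest
    return fittest + unfit + randoms


# Toy population: individuals are plain integers scored by their own value.
population = list(range(10))
survivors = carry_over(population, fitness=lambda x: x,
                       n_fittest=2, n_unfit=2, n_random=2,
                       rng=random.Random(1))
print(survivors[:4])  # [9, 8, 1, 0]
```

The remaining slots in the new generation would then be filled by crossover and mutation. Carrying some unfit or random candidates forward is one way to preserve diversity, in the same spirit as mutation.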
  Slide 44

    • Pros!
      • All domain knowledge is encapsulated in one place
      • Generates interesting solutions, including counterintuitive ones
      • Stop when you want!
    • Cons!
      • The fitness function can become really complex
      • Generated solutions are not guaranteed to be practical or pretty
  Slide 52

    • Know your domain
    • Test non-functional aspects
    • Write code that can handle semi-structured data