The Ubiquitous Graph: Two Use Cases from the Real World

Data Science London Meetup

Tareq Abedrabbo

December 19, 2013

Transcript

  1. The Ubiquitous Graph: Two Use Cases from the Real World
     Tareq Abedrabbo, Data Science London, December 2013
  2. About me
     • CTO at OpenCredo
     • Working with Neo4j for (almost) 3 years on a number of different projects
     • Co-author of Neo4j in Action (Manning)
  3. “If I'm to believe Twitter, half of the earth's population are importing Wikipedia into Neo4j, for very obscure reasons.”
  4. • Well-defined data model
     • Data changes through user interactions
     • Flexible but predictable data structure(s)
     • Recommendation engines, social networks, etc.
     • Top-down design
  5. • Complex connected data that typically models real-world networks
     • Integrated from a variety of sources
     • Data can be unpredictable
     • Telco networks, utility networks, etc.
     • Bottom-up design
  6. • Search and pattern-matching
       • Find a recommendation based on behaviour
     • Graph algorithms
       • Shortest path, disconnected components
     • Optimisation
       • Maximise oil flow while minimising water
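Slide 6 names shortest path and disconnected components as typical graph algorithms. As a minimal sketch only, assuming an unweighted, undirected network stored as a plain Python adjacency dict (this does not use Neo4j or Cypher), breadth-first search finds a fewest-hops path and a simple traversal groups nodes into components. The network and its node names are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one fewest-hops path from start to goal, or None if unreachable.

    Breadth-first search over an unweighted adjacency dict.
    """
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def connected_components(graph):
    """Group nodes into components; more than one group means the network is disconnected."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        component = []
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


# Hypothetical pipe network: node names are illustrative only.
network = {
    "well_a": ["junction_1"],
    "junction_1": ["well_a", "junction_2", "plant"],
    "junction_2": ["junction_1", "plant"],
    "plant": ["junction_1", "junction_2"],
    "well_b": ["junction_3"],
    "junction_3": ["well_b"],
}

print(shortest_path(network, "well_a", "plant"))
print(connected_components(network))
```

Here `well_b` and `junction_3` form a second component, so no path exists from `well_a` to `well_b`. Weighted costs (pipe length, pressure loss) would call for Dijkstra's algorithm rather than plain BFS.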
  7. • Other requirements
       • Multiple starting points
       • Impact on quality of service
       • Abstraction of repeatable patterns
  8. 1. Start from an initial population of candidate solutions
     2. Assess each solution using a fitness function
     3. Apply genetic operators to derive a new and potentially fitter generation
     4. Rinse and repeat!
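The four steps on slide 8 map directly onto a loop. This is a minimal sketch on a hypothetical toy problem ("OneMax": maximise the number of 1 bits in a fixed-length list), not the optimisation problem from the talk. To keep it short, the generation step is deliberately simple (keep the fitter half, refill with mutated copies, no crossover), and it stops after a fixed number of generations.

```python
import random

GENOME_LENGTH = 20  # hypothetical toy problem: maximise the number of 1 bits


def fitness(individual):
    # The only problem-specific code: here it simply counts the 1s.
    return sum(individual)


def random_individual(rng):
    return [rng.randint(0, 1) for _ in range(GENOME_LENGTH)]


def mutate(individual, rng, rate=0.05):
    # Flip each bit independently with probability `rate`.
    return [1 - bit if rng.random() < rate else bit for bit in individual]


def run(population_size=30, generations=60, seed=42):
    rng = random.Random(seed)
    # 1. Start from an initial population of random candidate solutions.
    population = [random_individual(rng) for _ in range(population_size)]
    for _ in range(generations):
        # 2. Assess each solution with the fitness function and rank them.
        ranked = sorted(population, key=fitness, reverse=True)
        # 3. Derive a new generation: keep the fitter half unchanged and
        #    refill the population with mutated copies of those parents.
        parents = ranked[: population_size // 2]
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(population_size - len(parents))]
        population = parents + children
    # 4. "Rinse and repeat" ends here after a fixed number of generations.
    return max(population, key=fitness)


best = run()
print(fitness(best), best)
```

Because the fitter half is carried over unchanged, the best score never decreases from one generation to the next, which is what makes "stop when you want" safe: any iteration can be interrupted and its best candidate used.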
  9. • Start from an initial population of candidate solutions (individuals or phenotypes), ideally random and large
     • Attribute a score to each solution using a fitness function
       • The only place with specific business knowledge
     • Apply genetic operators to create a new generation
       • Cross-breeding to retain the best characteristics of each parent
       • Mutation to maintain diversity and to avoid converging to a local optimum too quickly
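The two operators on slide 9 can be sketched for candidates encoded as lists of bits, an encoding assumed here purely for illustration. Single-point crossover builds each child from a prefix of one parent and the suffix of the other; bit-flip mutation changes each gene with a small probability. These are common textbook operators, not necessarily the ones used in the talk, and other encodings need different operators.

```python
import random


def single_point_crossover(parent_a, parent_b, rng):
    """Return two children, each mixing a prefix of one parent with the suffix of the other."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must have the same length (at least 2)")
    # Cut strictly inside the list so both parents contribute to each child.
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]


def bit_flip_mutation(individual, rng, rate=0.01):
    """Flip each gene independently with probability `rate` to keep the population diverse."""
    return [1 - gene if rng.random() < rate else gene for gene in individual]


rng = random.Random(7)
parent_a = [1, 1, 1, 1, 1, 1]
parent_b = [0, 0, 0, 0, 0, 0]
child_1, child_2 = single_point_crossover(parent_a, parent_b, rng)
print(child_1, child_2)
print(bit_flip_mutation(child_1, rng, rate=0.2))
```

Crossover alone can only recombine genes already present in the population; mutation is what reintroduces lost values, which is why it guards against premature convergence.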
  10. • There are other genetic operators
        • Copy the n fittest solutions unchanged
        • Carry over n unfit candidates
        • Carry over n randomly chosen candidates
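The extra operators on slide 10 decide which candidates survive unchanged into the next generation. A minimal sketch, assuming the population is already sorted fittest first and that a caller-supplied, hypothetical `breed` function produces each new child (for example via crossover and mutation):

```python
import random


def next_generation(ranked, breed, rng, n_fittest=2, n_unfit=1, n_random=1):
    """Build a same-size generation from `ranked` (sorted fittest first).

    `breed(rng)` is a hypothetical caller-supplied function returning one new child.
    """
    size = len(ranked)
    if n_fittest + n_unfit + n_random > size:
        raise ValueError("carry-over counts exceed the population size")
    # Copy the n fittest solutions unchanged (elitism).
    survivors = list(ranked[:n_fittest])
    # Carry over the n least fit candidates (guard: ranked[-0:] would be the whole list).
    if n_unfit:
        survivors += ranked[-n_unfit:]
    # Carry over n randomly chosen candidates from those not already kept.
    middle = ranked[n_fittest:size - n_unfit]
    survivors += rng.sample(middle, n_random)
    # Fill the remaining slots with freshly bred children.
    children = [breed(rng) for _ in range(size - len(survivors))]
    return survivors + children


rng = random.Random(1)
# Hypothetical population of labelled candidates, already sorted fittest first.
ranked = ["best", "second", "third", "fourth", "fifth", "worst"]
new_generation = next_generation(ranked, lambda _rng: "child", rng)
print(new_generation)
```

The random carry-overs are drawn only from the middle of the ranking so that no candidate is copied twice; drawing from the whole population would be an equally reasonable choice.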
  11. • Pros!
        • All domain knowledge is encapsulated in one place
        • Generate interesting solutions, including counterintuitive ones
        • Stop when you want!
      • Cons!
        • The fitness function can become really complex
        • Generated solutions are not guaranteed to be practical or pretty
  12. • Know your domain
      • Test non-functional aspects
      • Write code that can handle semi-structured data
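One way to act on the advice about semi-structured data is to normalise each incoming record defensively instead of assuming every source supplies the same fields. The sketch below is hypothetical: the field names `id`, `type` and `capacity` are invented for illustration and are not taken from the talk.

```python
def normalise_asset(record):
    """Map a loosely structured record (a dict) onto a predictable shape."""
    known = {"id", "type", "capacity"}
    capacity = record.get("capacity")
    if isinstance(capacity, str):
        # Some sources send numbers as text; unparseable values become None.
        try:
            capacity = float(capacity)
        except ValueError:
            capacity = None
    return {
        "id": str(record["id"]),  # the one field we insist on (KeyError if absent)
        "type": record.get("type", "unknown"),  # explicit default instead of KeyError
        "capacity": capacity,
        # Keep unexpected properties aside rather than silently dropping them.
        "extra": {k: v for k, v in record.items() if k not in known},
    }


records = [
    {"id": 1, "type": "pipe", "capacity": "12.5"},
    {"id": "v-7", "valve_state": "open"},
]
for record in records:
    print(normalise_asset(record))
```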