The Graph Processing Atelier

Slides used to introduce the graph processing atelier held at eurucamp 2014.

Pere Urbón

August 01, 2014

Transcript

  1. Graph Processing Workshop
    Pere Urbón-Bayes

  2. We are hiring!
    Guess what…

  3. $ whoami
    Software Engineer 10+ years
    Data Centric and Graph Analytics
    GraphDevRoom FOSDEM
    Movies and Series geek, runner,
    doing everything I can to enjoy my
    life

  4. baggy-eyed man

  6. One Ring to rule them all

  7. Workshop structure
    How to power your graph
    Walking the graph
    Practice time
    Use cases

  8. How to power your graph

  9. The Graph
    We have nodes, edges, and attributes on both.
    Adam Baldwin played in Full Metal Jacket.
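
A minimal sketch of the slide's example as a property graph in plain Python. The class names, the `PLAYED_IN` relationship type, and the `role` attribute on the edge are assumptions added for illustration; only the actor, the movie, and the "played in" relation come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    rel_type: str
    properties: dict = field(default_factory=dict)


# Nodes: one person and one movie, each with its own attributes.
nodes = {
    "adam": Node("adam", "Person", {"name": "Adam Baldwin"}),
    "fmj": Node("fmj", "Movie", {"title": "Full Metal Jacket"}),
}

# Edges carry attributes too; the role is an illustrative addition,
# it is not mentioned on the slide.
edges = [Edge("adam", "fmj", "PLAYED_IN", {"role": "Animal Mother"})]


def describe(edge: Edge) -> str:
    """Render an edge as a readable sentence using node attributes."""
    src = nodes[edge.source].properties
    dst = nodes[edge.target].properties
    verb = edge.rel_type.replace("_", " ").lower()
    return f"{src['name']} {verb} {dst['title']}"


print(describe(edges[0]))  # Adam Baldwin played in Full Metal Jacket
```

Both the nodes and the edge hold a dictionary of attributes, which is the "attributes on both" part of the model.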

  10. A graph in math
    Adjacency matrix example
    Graphs are usually represented:
    by node adjacency
    by edge incidence
    and modeled with matrices and lists.
    What do you think? Isn't this hard?
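
To make the three representations concrete, here is a small sketch in plain Python. The four nodes and four edges are hypothetical; the graph is treated as undirected.

```python
# A small undirected graph with hypothetical nodes and edges.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
index = {name: i for i, name in enumerate(nodes)}

# Adjacency matrix: cell [i][j] is 1 when nodes i and j share an edge.
adjacency_matrix = [[0] * len(nodes) for _ in nodes]
for u, v in edges:
    adjacency_matrix[index[u]][index[v]] = 1
    adjacency_matrix[index[v]][index[u]] = 1

# Adjacency list: each node maps to the list of its neighbours.
adjacency_list = {name: [] for name in nodes}
for u, v in edges:
    adjacency_list[u].append(v)
    adjacency_list[v].append(u)

# Incidence matrix: one row per node, one column per edge;
# cell [i][k] is 1 when node i is an endpoint of edge k.
incidence_matrix = [[0] * len(edges) for _ in nodes]
for k, (u, v) in enumerate(edges):
    incidence_matrix[index[u]][k] = 1
    incidence_matrix[index[v]][k] = 1

for row in adjacency_matrix:
    print(row)
```

The adjacency matrix grows with the square of the node count even when most cells are 0, which is one reason lists are often preferred for sparse graphs.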

  11. The Graph in relational databases

  12. Graph as JSON documents

  13. Using a graph database?

  14. Graph databases
    In computing, a graph database is a database
    that uses graph structures with nodes, edges,
    and properties to represent and store data.
    Wikipedia

  15. Graph processing
    A large-scale graph processing framework
    is a specialized engine used to run graph-like
    computations across multiple computers.

  16. Use Cases

  17. Recommenders
    A recommender is a system that seeks to predict the
    rating or preference of a user for a given item.
    Collaborative filtering: if you are similar to other users,
    you are likely to “like” the same items.
    Content-based filtering: if you like one item, you are more
    likely to like similar items.

  18. Recommenders II
    Recommenders usually rely on matrices
    and “complex” algebra, which can be
    hard to understand, but we can
    use a graph database instead.
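
A rough sketch of the graph view of collaborative filtering, in plain Python rather than a real graph database. The users, items, and the path-counting score are assumptions chosen for illustration: a candidate item scores higher the more "user likes item, other user likes the same item" paths lead to it.

```python
from collections import Counter

# Hypothetical "likes" edges between users and items.
likes = {
    "ana": {"alien", "blade_runner", "matrix"},
    "ben": {"alien", "matrix", "dune"},
    "cat": {"matrix", "dune", "arrival"},
    "dan": {"amelie"},
}


def recommend(user: str, top_n: int = 3) -> list:
    """Walk user -> item -> other user -> item and count the paths."""
    seen = likes[user]
    scores = Counter()
    for other, items in likes.items():
        if other == user:
            continue
        shared = len(seen & items)  # number of user -> item <- other paths
        if shared == 0:
            continue  # no path connects the two users
        for item in items - seen:
            scores[item] += shared
    # Sort by score (descending), then by name for a stable order.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend("ana"))  # ['dune', 'arrival']
```

In a graph database the same idea would typically be a short traversal query; the dictionary of sets here only stands in for the stored edges.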

  19. Fraud detection
    A fraud detection system is a system
    used by banks, insurers, notaries, etc.
    to detect fraudulent transactions
    and minimise losses.
    Fraud rings are usually organized as a set
    of fraudulent identities, or actions, that share
    one or a few carefully hidden real items.
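
One way to picture the shared-item idea is as connected components in a graph that links identities to the contact details they use. The sketch below assumes hypothetical account records and a simple breadth-first search; it illustrates the structure and is not a production fraud detector.

```python
from collections import defaultdict, deque

# Hypothetical account records: each identity lists the contact details it uses.
accounts = {
    "acct_1": {"phone:555-0100", "addr:1 Main St"},
    "acct_2": {"phone:555-0100", "addr:9 Oak Ave"},
    "acct_3": {"phone:555-0199", "addr:9 Oak Ave"},
    "acct_4": {"phone:555-0142", "addr:3 Elm Rd"},
}


def find_rings(records: dict, min_size: int = 2) -> list:
    """Group identities that are linked through shared attribute nodes."""
    # Build an undirected bipartite graph: identity <-> attribute.
    graph = defaultdict(set)
    for identity, attributes in records.items():
        for attribute in attributes:
            graph[identity].add(attribute)
            graph[attribute].add(identity)

    visited = set()
    rings = []
    for start in sorted(records):
        if start in visited:
            continue
        # Breadth-first search collects one connected component.
        component = set()
        queue = deque([start])
        visited.add(start)
        while queue:
            node = queue.popleft()
            if node in records:
                component.add(node)
            for neighbour in graph[node]:
                if neighbour not in visited:
                    visited.add(neighbour)
                    queue.append(neighbour)
        if len(component) >= min_size:
            rings.append(sorted(component))
    return rings


print(find_rings(accounts))  # [['acct_1', 'acct_2', 'acct_3']]
```

Here `acct_1` and `acct_3` never share an item directly; they end up in the same ring only through `acct_2`, which is the kind of indirect link that is awkward to spot in flat tables.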

  20. Fraud detection
    Traditionally, artificial intelligence methods such as neural networks,
    decision trees, classification, or genetic programming have been used.

  21. Social Analytics
    Q: We want to know a node's relevance, or
    importance, within its network.
    We can use this technique to find out, for example:
    • The relevance of a software developer's curriculum.
    • The importance of train stations within the network.
    • The influence of a professor within a university.
    • …

  22. Social Analytics
    One way to solve this is to use a centrality
    measure such as betweenness centrality.
    BC: the number of times a node acts as a
    bridge along the shortest path between
    two other nodes.
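
The definition above can be computed with Brandes' algorithm, sketched here for unweighted, undirected graphs. The five-node network is hypothetical, and when several shortest paths exist between a pair, each path contributes a fraction rather than a full count.

```python
from collections import deque


def betweenness(graph: dict) -> dict:
    """Brandes' algorithm for unweighted, undirected graphs."""
    centrality = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search counting shortest paths from source.
        order = []
        predecessors = {node: [] for node in graph}
        paths = {node: 0 for node in graph}
        distance = {node: -1 for node in graph}
        paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in graph[node]:
                if distance[neighbour] < 0:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    paths[neighbour] += paths[node]
                    predecessors[neighbour].append(node)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {node: 0.0 for node in graph}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = paths[pred] / paths[node]
                dependency[pred] += share * (1.0 + dependency[node])
            if node != source:
                centrality[node] += dependency[node]
    # Each undirected pair was counted from both endpoints.
    return {node: value / 2.0 for node, value in centrality.items()}


# Hypothetical network: "c" bridges the a-b group and the d-e chain.
network = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
scores = betweenness(network)
print(scores)  # c scores 4.0, d scores 3.0, the rest 0.0
```

Node `c` lies on the shortest paths for the pairs a-d, a-e, b-d, and b-e, hence its score of 4.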

  23. Graph Databases

  24. Graph Databases
    ● Embedded database
    ● REST API
    ● 100% ACID
    ● High availability
    ● Query language for graphs
    ● Drivers for many programming languages
    ● Backup, monitoring, security, …

  25. Graph Databases
    • Embedded database
    • Java, C++ core, REST, Gremlin, Blueprints
    • Fully atomic && from ACID to relaxed!
    • Lock server distribution
    • Backup and replication
    • Graph navigation API plus a query language
    • Free (EULA) and commercial license

  26. Graph Databases
    Apache License, Version 2.0
    • Fully ACID
    • Eventual consistency
    • Data distribution and fault tolerance
    • Exchangeable storage backend
    (Apache Cassandra, Apache HBase, Oracle
    BerkeleyDB, Akiban Persistit)
    • Search via ElasticSearch and/or Lucene

  27. Large scale graph processing

  28. Graph Processing
    Apache License, Version 2.0
    ● Open source implementation of Google Pregel
    ● Based on top of Apache Hadoop
    ● Integrated with the Apache Hadoop ecosystem
    ● Java API
    ● Initiated by Facebook to power its Graph
    Search, now used by companies like
    Oracle.

  29. Graph Processing
    Apache License, Version 2.0
    • Processing framework for graph
    algorithms created at the UZH
    • Java and Scala APIs
    • Based on a message-passing-like idea
    between nodes
    • Synchronous and asynchronous modes
    • Automatic convergence detection
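
To give a feel for message passing between nodes with convergence detection, here is a single-machine sketch of a synchronous loop in plain Python. It does not use the API of any framework from these slides; the function name, the "keep the largest value" task, and the four-node chain are assumptions for illustration.

```python
def propagate_max(graph: dict, values: dict, max_supersteps: int = 100):
    """Synchronous vertex-centric loop: every node keeps the largest value seen."""
    state = dict(values)
    # Superstep 0: every node sends its value to all neighbours.
    inbox = {node: [] for node in graph}
    for node, neighbours in graph.items():
        for neighbour in neighbours:
            inbox[neighbour].append(state[node])

    supersteps = 0
    # Convergence: stop as soon as no node has a pending message.
    while any(inbox.values()) and supersteps < max_supersteps:
        supersteps += 1
        outbox = {node: [] for node in graph}
        for node, messages in inbox.items():
            if not messages:
                continue
            best = max(messages)
            if best > state[node]:
                # Value changed: update and notify neighbours next round.
                state[node] = best
                for neighbour in graph[node]:
                    outbox[neighbour].append(best)
        inbox = outbox
    return state, supersteps


# Hypothetical chain a - b - c - d with one starting value per node.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
start = {"a": 3, "b": 6, "c": 2, "d": 1}
final_state, rounds = propagate_max(chain, start)
print(final_state, rounds)  # every node ends at 6 after 3 supersteps
```

A node only speaks when its value changes, so the computation stops by itself once the messages dry up. An asynchronous mode would deliver messages as they are produced instead of waiting for the end of each round.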

  30. Multi model databases

  31. Walking the graph

  32. Walking the graph
    From now on we aim to introduce you to the very basic
    operations within a graph database. For this task we will use
    Neo4jrb, a great gem created by Andreas Ronge that
    makes the Neo4j database Ruby-friendly.
    https://github.com/andreasronge/neo4j
    https://github.com/purbon/neo4j/wiki

  33. No Pain, No Gain
    Live Coding

  34. Going further
    ● Neo4j internals
    http://www.slideshare.net/thobe/an-overview-of-neo4j-internals
    ● DEX, high performance graph database
    https://www.dama.upc.edu/technology-transfer/files/p573-martinez.pdf
    ● A discussion on the design of benchmarks
    http://www.tpc.org/tpctc/tpctc2010/tpctc2010-03.pdf

  35. Questions
    Thanks, by the way!
