The Graph Processing Atelier

Slides used to introduce the graph processing atelier held at eurucamp 2014.

Pere Urbón

August 01, 2014

Transcript

  1. $ whoami

    Software engineer, 10+ years. Data-centric and graph analytics. GraphDevRoom, FOSDEM. Movies and series geek, runner, doing everything I can to enjoy my life.
  2. The Graph

    We have nodes, edges, and attributes on both. Example: Adam Baldwin played in Full Metal Jacket.
  3. A graph in math

    Adjacency matrix example. Graphs used to be represented by node adjacency or by edge incidence, and modeled with matrices and lists. What do you think? Isn't this hard?
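The three classic representations named on this slide can be sketched in a few lines of Python. The four-node graph, its labels, and the function names below are made up for illustration; they do not come from the slides.

```python
# Hypothetical undirected graph used only for illustration.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]


def adjacency_matrix(nodes, edges):
    """Square node-by-node matrix: cell [i][j] is 1 when i and j share an edge."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1  # undirected, so mirror the entry
    return matrix


def incidence_matrix(nodes, edges):
    """Node-by-edge matrix: cell [i][j] is 1 when node i is an endpoint of edge j."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(edges) for _ in nodes]
    for j, (u, v) in enumerate(edges):
        matrix[index[u]][j] = 1
        matrix[index[v]][j] = 1
    return matrix


def adjacency_list(nodes, edges):
    """Map each node to the list of its neighbours."""
    neighbours = {name: [] for name in nodes}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    return neighbours


if __name__ == "__main__":
    for row in adjacency_matrix(nodes, edges):
        print(row)
    print(adjacency_list(nodes, edges))
```

The matrices grow with the square of the node count (or nodes times edges), while the list only stores edges that exist, which is one reason list-like structures are usually preferred for sparse graphs.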
  4. Graph databases

    "In computing, a graph database is a database that uses graph structures with nodes, edges, and properties to represent and store data." (Wikipedia)
  5. Graph processing

    A large-scale graph processing framework is a specialized engine that runs graph-like computations across multiple computers.
  6. Use cases: Recommenders

    A recommender is a system that seeks to predict the rating or preference of a user for a given item.
    • Collaborative filtering: if you are similar to other users, you are likely to "like" the same items.
    • Content-based filtering: if you like one item, you are more likely to like similar items.
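As a rough illustration of the collaborative filtering idea, the sketch below treats users and items as a small bipartite graph and walks from a user to similar users and on to their items. The user names, items, the choice of Jaccard similarity, and the `recommend` helper are all assumptions made for this example, not something taken from the slides.

```python
# Hypothetical "likes" edges between users and items.
likes = {
    "ana": {"alien", "blade_runner", "matrix"},
    "bob": {"alien", "blade_runner", "dune"},
    "cat": {"amelie"},
}


def jaccard(a, b):
    """Similarity of two sets: size of the overlap divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, likes):
    """Rank items liked by similar users that `user` has not liked yet."""
    own = likes[user]
    scores = {}
    for other, items in likes.items():
        if other == user:
            continue
        similarity = jaccard(own, items)
        if similarity == 0:
            continue  # no shared items, so this user tells us nothing
        for item in items - own:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))


if __name__ == "__main__":
    print(recommend("ana", likes))
```

In a graph database the same walk (user, liked items, other users who liked them, their other items) would be expressed as a traversal rather than as set arithmetic.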
  7. Use cases: Recommenders II

    Recommenders are usually built with matrices and "complex" algebra, which can be hard to understand, but we can use a graph database instead.
  8. Use cases: Fraud detection

    A fraud detection system is used by banks, insurers, notaries, etc. to detect fraudulent transactions and minimise losses. Fraud rings are usually organized as a set of fraudulent identities, or actions, that share one or a few carefully hidden real items.
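One way to picture the "shared real items" idea is as a graph search: link every identity to the items it uses, then group identities that are connected through any shared item. The sketch below is a minimal version under that assumption; the records, the item labels, and the `find_rings` helper are hypothetical, and a shared item only marks a group as worth reviewing, not as proven fraud.

```python
from collections import defaultdict

# Hypothetical records: identity -> items it was registered with.
identities = {
    "id1": {"phone:555-0100", "addr:1 Main St"},
    "id2": {"phone:555-0100", "addr:9 Oak Ave"},
    "id3": {"addr:9 Oak Ave"},
    "id4": {"phone:555-0199"},
}


def find_rings(identities, min_size=2):
    """Group identities connected, directly or transitively, by shared items."""
    by_item = defaultdict(set)
    for identity, items in identities.items():
        for item in items:
            by_item[item].add(identity)

    seen = set()
    rings = []
    for start in sorted(identities):
        if start in seen:
            continue
        seen.add(start)
        ring, stack = set(), [start]
        while stack:  # depth-first walk over identity -> item -> identity
            identity = stack.pop()
            ring.add(identity)
            for item in identities[identity]:
                for other in by_item[item]:
                    if other not in seen:
                        seen.add(other)
                        stack.append(other)
        if len(ring) >= min_size:
            rings.append(sorted(ring))
    return rings


if __name__ == "__main__":
    print(find_rings(identities))
```

Here `id1` and `id3` never share an item directly, yet they end up in the same candidate group through `id2`, which is the kind of indirect link that is awkward to find with table joins.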
  9. Use cases: Fraud detection

    Traditionally, artificial intelligence methods such as neural networks, decision trees, classification, or genetic programming have been used.
  10. Use cases: Social Analytics

    Q: We want to know a node's relevance, or importance, within its network. We can use this technique to find, for example:
    • The relevance of a software developer's curriculum.
    • The importance of train stations within the network.
    • The influence of a professor within a university.
    • …
  11. Use cases: Social Analytics

    One way to solve this is to use a centrality measure such as betweenness centrality. BC: the number of times a node acts as a bridge along the shortest path between two other nodes.
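Betweenness centrality can be computed with Brandes' algorithm, which runs one breadth-first search per node. The sketch below assumes an unweighted, undirected graph given as an adjacency dictionary; the sample graph and function name are made up. When several shortest paths connect a pair, each path contributes a fraction, which slightly generalises the "number of times" wording above.

```python
from collections import deque


def betweenness(adj):
    """Betweenness centrality for an unweighted, undirected graph (Brandes)."""
    scores = {v: 0.0 for v in adj}
    for source in adj:
        order = []                       # nodes in order of increasing distance
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        paths = {v: 0 for v in adj}      # number of shortest paths from source
        dist = {v: -1 for v in adj}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:          # first time we reach w
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Each unordered pair was counted from both ends, so halve the totals.
    return {v: score / 2 for v, score in scores.items()}


if __name__ == "__main__":
    # Hypothetical network: "hub" bridges every pair of the other nodes.
    star = {"hub": ["x", "y", "z"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
    print(betweenness(star))
```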
  12. Graph Databases

    • Embedded database
    • REST API
    • 100% ACID
    • High availability
    • Query language for graphs
    • Drivers for many programming languages
    • Backup, monitoring, security, …
  13. Graph Databases

    • Embedded database
    • Java, C++ core, REST, Gremlin, Blueprints
    • Fully atomic && from ACID to relaxed!
    • Lock server distribution
    • Backup and replication
    • Graph navigation API plus a query language
    • Free (EULA) and commercial license
  14. Graph Databases

    Apache License, Version 2.0
    • Fully ACID
    • Eventual consistency
    • Data distribution and fault tolerance
    • Exchangeable storage backend (Apache Cassandra, Apache HBase, Oracle BerkeleyDB, Akiban Persistit)
    • Search via Elasticsearch and/or Lucene
  15. Graph Processing

    Apache License, Version 2.0
    • Open source implementation of Google Pregel
    • Built on top of Apache Hadoop
    • Integrated with the Apache Hadoop ecosystem
    • Java API
    • Initiated by Facebook to power its Graph Search, now used by companies like Oracle
  16. Graph Processing

    Apache License, Version 2.0
    • Processing framework for graph algorithms created at UZH
    • Java + Scala APIs
    • Based on a message-passing-like idea between nodes
    • Synchronous and asynchronous modes
    • Automatic convergence detection
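To give a feel for message passing between nodes in synchronous mode, here is a toy Python loop that labels connected components by spreading the smallest node id. It is not the API of any framework listed in these slides; the function name, the graph, and the "stop when no messages are in flight" rule are assumptions chosen to mimic convergence detection in a single process.

```python
def propagate_min_label(adj, max_steps=100):
    """Synchronous message passing: every node adopts the smallest label it hears."""
    state = {v: v for v in adj}  # each node starts labelled with its own id

    # Initial round: every node signals its label to all neighbours.
    inbox = {v: [] for v in adj}
    for v in adj:
        for w in adj[v]:
            inbox[w].append(state[v])

    steps = 0
    # Converged when no node has pending messages.
    while any(inbox.values()) and steps < max_steps:
        steps += 1
        outbox = {v: [] for v in adj}
        for v, messages in inbox.items():
            if not messages:
                continue
            best = min(messages)
            if best < state[v]:          # collect: keep the smaller label
                state[v] = best
                for w in adj[v]:         # signal: only changed nodes speak
                    outbox[w].append(best)
        inbox = outbox                   # barrier between steps
    return state, steps


if __name__ == "__main__":
    # Hypothetical graph with two components: {1, 2, 3} and {4, 5}.
    graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
    labels, steps = propagate_min_label(graph)
    print(labels, steps)
```

Nodes that end with the same label belong to the same component. In a distributed engine the inboxes would live on different machines and the barrier would be a cluster-wide synchronisation point; an asynchronous mode would drop that barrier and deliver messages as they arrive.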
  17. Walking the graph

    From now on we aim to introduce you to the very basic operations of a graph database. For this task we will use Neo4jrb, a great gem created by Andreas Ronge that makes the Neo4j database Ruby-friendly.
    https://github.com/andreasronge/neo4j
    https://github.com/purbon/neo4j/wiki
  18. Going further

    • Neo4j internals: http://www.slideshare.net/thobe/an-overview-of-neo4j-internals
    • DEX, high performance graph database: https://www.dama.upc.edu/technology-transfer/files/p573-martinez.pdf
    • A discussion on the design of benchmarks: http://www.tpc.org/tpctc/tpctc2010/tpctc2010-03.pdf