$ whoami ● Software engineer, 10+ years ● Data-centric and graph analytics ● GraphDevRoom, FOSDEM ● Movies and series geek, runner, doing everything I can to enjoy my life
A graph in math: (Figure: adjacency matrix example) Graphs are usually represented either by node adjacency or by edge incidence, and modeled with matrices and lists. What do you think? Isn’t this hard?
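To make the two classic representations concrete, here is a minimal sketch for a small, hypothetical undirected graph with nodes A, B, C, D and edges A-B, A-C, B-C, C-D. It builds the node adjacency matrix, the edge incidence matrix and the adjacency list. The node and edge names are illustrative only.

```python
# Hypothetical example graph: 4 nodes, 4 undirected edges.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
index = {n: i for i, n in enumerate(nodes)}

# Adjacency matrix (n x n): adjacency[i][j] == 1 when nodes i and j share an edge.
adjacency = [[0] * len(nodes) for _ in nodes]
for u, v in edges:
    adjacency[index[u]][index[v]] = 1
    adjacency[index[v]][index[u]] = 1  # undirected, so the matrix is symmetric

# Incidence matrix (n x m): incidence[i][k] == 1 when node i is an endpoint of edge k.
incidence = [[0] * len(edges) for _ in nodes]
for k, (u, v) in enumerate(edges):
    incidence[index[u]][k] = 1
    incidence[index[v]][k] = 1

# Adjacency list: each node maps directly to its neighbours.
adjacency_list = {n: [] for n in nodes}
for u, v in edges:
    adjacency_list[u].append(v)
    adjacency_list[v].append(u)

for row in adjacency:
    print(row)
print(adjacency_list)  # {'A': ['B', 'C'], 'B': ['A', 'C'], 'C': ['A', 'B', 'D'], 'D': ['C']}
```

The matrices grow as n² and n·m cells and are mostly zeros for sparse graphs. The adjacency list only stores the edges that actually exist.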
Graph databases: “In computing, a graph database is a database that uses graph structures with nodes, edges, and properties to represent and store data.” (Wikipedia)
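As a rough illustration of “nodes, edges, and properties”, here is a toy in-memory property graph. It is a sketch under simple assumptions, not how any particular graph database stores data. Both nodes and edges carry key/value properties, and each node keeps its outgoing edges next to it, so a traversal starts from a node instead of scanning a global edge table. The class names, labels and sample data (Alice, Bob, Heat) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: int
    target: int
    type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory property graph: nodes and edges both carry key/value properties."""

    def __init__(self):
        self.nodes = {}      # node id -> Node
        self.out_edges = {}  # node id -> list of outgoing Edge objects

    def add_node(self, node_id, *labels, **properties):
        self.nodes[node_id] = Node(node_id, set(labels), properties)
        self.out_edges.setdefault(node_id, [])
        return self.nodes[node_id]

    def add_edge(self, source, target, edge_type, **properties):
        edge = Edge(source, target, edge_type, properties)
        self.out_edges[source].append(edge)
        return edge

    def neighbours(self, node_id, edge_type=None):
        """Follow outgoing edges from a node, optionally filtered by edge type."""
        return [self.nodes[e.target] for e in self.out_edges[node_id]
                if edge_type is None or e.type == edge_type]


g = PropertyGraph()
g.add_node(1, "Person", name="Alice")
g.add_node(2, "Person", name="Bob")
g.add_node(3, "Movie", title="Heat")
g.add_edge(1, 2, "KNOWS", since=2010)
g.add_edge(1, 3, "LIKES", rating=5)
print([n.properties["name"] for n in g.neighbours(1, "KNOWS")])  # ['Bob']
```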
Use cases / Recommenders: A recommender is a system that seeks to predict the rating or preference of a user for a given item. ● Collaborative filtering: if you are similar to other users, you are likely to “like” the same items. ● Content-based filtering: if you like one item, you are likely to like similar items.
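Seen as a graph, collaborative filtering becomes a short walk: user → liked item → other user → that user’s items. The sketch below assumes a hypothetical bipartite “likes” graph. It also assumes a deliberately simple similarity, the number of shared items. Each unseen item is scored by the summed similarity of the users who like it. This is one basic user-based variant, not the only way to build a recommender. Content-based filtering would walk item → shared attribute → item instead.

```python
from collections import Counter

# Hypothetical "likes" edges: user -> set of liked items (a bipartite user/item graph).
likes = {
    "ana":   {"matrix", "alien", "heat"},
    "bob":   {"matrix", "alien", "brazil"},
    "carla": {"matrix", "alien", "heat", "up"},
    "dan":   {"up", "cars"},
}


def recommend(user, likes, top_n=3):
    """User-based collaborative filtering as a walk user -> item -> other user -> item.

    Other users are weighted by how many items they share with `user` (an assumed,
    simple similarity). Their items that `user` has not liked yet are scored by the
    summed weights, and the best-scoring items are returned."""
    mine = likes[user]
    scores = Counter()
    for other, theirs in likes.items():
        if other == user:
            continue
        shared = len(mine & theirs)
        if shared == 0:
            continue  # not reachable through any common item
        for item in theirs - mine:
            scores[item] += shared
    return [item for item, _ in scores.most_common(top_n)]


print(recommend("ana", likes))  # ['up', 'brazil']
```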
Fraud detection: A fraud detection system is used by banks, insurance companies, notaries, etc. to detect fraudulent transactions and minimise losses. Fraud rings are usually organized as a set of fraudulent identities, or actions, that share one or a few carefully hidden real items.
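Because ring members share a few real items, connecting identities through those items exposes the ring as one connected group. This is a minimal sketch under stated assumptions. The identities and items (phone numbers, addresses, SSNs, emails) are made up. A “ring” is taken to be a connected component of the identity ↔ item graph, computed here with union-find. The `min_size` threshold is an arbitrary choice for the example.

```python
from collections import defaultdict

# Hypothetical applications: each identity lists the "real" items it uses.
identities = {
    "id1": {"phone:555-0101", "addr:12 Elm St"},
    "id2": {"phone:555-0101", "ssn:123"},
    "id3": {"ssn:123", "email:x@example.com"},
    "id4": {"phone:555-0199"},
    "id5": {"email:y@example.com"},
}


def find(parent, x):
    # Path halving keeps the union-find trees shallow.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def fraud_rings(identities, min_size=3):
    """Group identities connected (directly or transitively) through shared items."""
    parent = {i: i for i in identities}
    first_owner = {}  # item -> first identity seen using it
    for ident, items in identities.items():
        for item in items:
            if item in first_owner:
                # Shared item: merge this identity's group with the earlier owner's group.
                parent[find(parent, ident)] = find(parent, first_owner[item])
            else:
                first_owner[item] = ident
    groups = defaultdict(set)
    for ident in identities:
        groups[find(parent, ident)].add(ident)
    return [group for group in groups.values() if len(group) >= min_size]


print(fraud_rings(identities))
```

Here id1, id2 and id3 never share every item. They are still chained together through one phone number and one SSN, which is exactly the kind of hidden link a row-by-row check tends to miss.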
Fraud detection: Traditionally, artificial intelligence methods such as neural networks, decision trees, classification or genetic programming have been used.
Social Analytics: Q: We want to know a node’s relevance, or importance, within its network. This technique can tell us, for example: • The relevance of a software developer’s CV. • The importance of train stations within a rail network. • The influence of a professor within a university. • …
Social Analytics: One way to solve this is to use a centrality measure such as betweenness centrality. BC: the number of times a node acts as a bridge along the shortest path between two other nodes.
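The usual general form of this definition is BC(v) = Σ σ_st(v) / σ_st, summed over all pairs s ≠ v ≠ t. Here σ_st is the number of shortest paths from s to t, and σ_st(v) is the number of those paths that pass through v. When shortest paths are unique, this reduces to “the number of times v acts as a bridge”. The sketch below implements Brandes’ algorithm for unweighted, undirected graphs. The small “train network” is hypothetical: two clusters of stations joined by the single link C-D.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted graphs.

    adj: dict node -> list of neighbours (undirected, so each edge is listed both ways).
    Returns unnormalized betweenness. Every unordered pair is visited from both
    endpoints, so the totals are halved at the end."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating pair dependencies (delta).
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2 for v, c in bc.items()}


# Hypothetical rail network: stations A, B, C and D, E, F linked only through C-D.
stations = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}
print(betweenness_centrality(stations))
# {'A': 0.0, 'B': 0.0, 'C': 6.0, 'D': 6.0, 'E': 0.0, 'F': 0.0}
```

The two bridge stations C and D each lie on the shortest path of the 6 pairs that cross between the clusters. This is the “importance of train stations” question from the previous slide.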
Graph Databases: ● Embedded database ● REST API ● 100% ACID ● High availability ● Graph query language ● Drivers for many programming languages ● Backup, monitoring, security, …
• Embedded database • C++ core; Java, REST, Gremlin and Blueprints interfaces • Fully atomic, from ACID to relaxed • Lock server distribution • Backup and replication • Graph navigation API plus a query language • Free (EULA) and commercial licenses
Graph Processing: Apache License, Version 2.0 ● Open-source implementation of Google Pregel ● Built on top of Apache Hadoop ● Integrated with the Apache Hadoop ecosystem ● Java API ● Adopted by Facebook to power its Graph Search, and now also used by companies such as Oracle.
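Pregel-style systems think “like a vertex”. Computation runs in supersteps. In each superstep, every vertex that received messages updates its value and sends messages along its edges. Vertices with no incoming messages stay halted, and the job ends when a superstep produces no messages. The sketch below is a toy, single-process simulation of that model for single-source shortest paths. It is not Giraph’s Java API, and the weighted graph is hypothetical.

```python
import math

# Hypothetical weighted directed graph: vertex -> {out-neighbour: edge weight}.
graph = {
    "a": {"b": 4, "c": 1},
    "b": {"d": 1},
    "c": {"b": 2, "d": 5},
    "d": {},
}


def pregel_sssp(graph, source):
    """Single-source shortest paths in a Pregel-like, vertex-centric style.

    Each superstep, every vertex with incoming messages keeps the smallest distance
    it was offered. If that improves its value, it sends value + weight to each
    out-neighbour. Vertices without messages are treated as halted; the run stops
    when a superstep sends no messages. Returns (distances, supersteps)."""
    value = {v: math.inf for v in graph}
    inbox = {source: [0]}
    supersteps = 0
    while inbox:
        outbox = {}
        for vertex, messages in inbox.items():
            best = min(messages)
            if best < value[vertex]:
                value[vertex] = best
                for neighbour, weight in graph[vertex].items():
                    outbox.setdefault(neighbour, []).append(best + weight)
        inbox = outbox  # messages are delivered at the next superstep (the barrier)
        supersteps += 1
    return value, supersteps


print(pregel_sssp(graph, "a"))  # ({'a': 0, 'b': 3, 'c': 1, 'd': 4}, 4)
```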
Graph Processing: Apache License, Version 2.0 • Processing framework for graph algorithms created at the University of Zurich (UZH) • Java and Scala APIs • Based on a message-passing style of computation between nodes • Synchronous and asynchronous modes • Automatic convergence detection
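A toy sketch of the asynchronous mode with automatic convergence detection, using PageRank as the algorithm. This plain Python is an assumption-laden illustration and not the framework’s Java/Scala API. A scheduled vertex recomputes its rank from its in-neighbours’ latest ranks. Only when the rank moved by more than `threshold` does it notify its out-neighbours. There are no global supersteps, and the run has converged once nothing is left scheduled. It uses the unnormalized PageRank variant, where ranks sum to the number of vertices. The graphs are hypothetical.

```python
from collections import deque


def async_pagerank(out_edges, damping=0.85, threshold=1e-6):
    """Asynchronous, vertex-centric PageRank with automatic convergence detection.

    out_edges: dict vertex -> list of out-neighbours (every vertex must be a key).
    Returns an (unnormalized) rank per vertex."""
    in_edges = {v: [] for v in out_edges}
    for u, targets in out_edges.items():
        for v in targets:
            in_edges[v].append(u)

    rank = {v: 1 - damping for v in out_edges}
    scheduled = deque(out_edges)  # work queue: every vertex starts scheduled
    pending = set(out_edges)      # avoids queueing the same vertex twice
    while scheduled:
        v = scheduled.popleft()
        pending.discard(v)
        # Each in-neighbour u contributes its latest rank split over its out-edges.
        new_rank = (1 - damping) + damping * sum(
            rank[u] / len(out_edges[u]) for u in in_edges[v])
        if abs(new_rank - rank[v]) > threshold:
            rank[v] = new_rank
            for w in out_edges[v]:  # only vertices that changed notify others
                if w not in pending:
                    scheduled.append(w)
                    pending.add(w)
    return rank  # empty queue: no vertex changed enough, so the run has converged


ring = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(async_pagerank(ring))  # every rank ends close to 1.0
```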
Walking the graph: From now on we introduce the very basic operations of a graph database. For this task we will use Neo4jrb, a great gem created by Andreas Ronge that makes the Neo4j database Ruby-friendly. https://github.com/andreasronge/neo4j https://github.com/purbon/neo4j/wiki
Going further: ● Neo4j internals: http://www.slideshare.net/thobe/an-overview-of-neo4j-internals ● DEX, a high-performance graph database: https://www.dama.upc.edu/technology-transfer/files/p573-martinez.pdf ● A discussion on the design of benchmarks: http://www.tpc.org/tpctc/tpctc2010/tpctc2010-03.pdf