Using Graph Databases to Operationalize Insights from Big Data

Emil Eifrem – CEO @ Neo Technology
Tim Williamson – Data Scientist @ Monsanto

Why are we here Today?
1. What is a Graph?
2. Graphs in Real-Time
3. Graphs are Feeding the World

Data Management in 1980
• Paper Forms
• Tiny RAM
• Spinning Platters (Low Capacity / Sequential I/O)
@TimWilliate

Traditional DBMS Technology

Data Management in 2016
• Dynamic Real-World Systems
• SSD/Flash (High-Capacity Storage & Ultra-Fast Random I/O)
• Abundant RAM

A Way of Representing Data

A Way of Representing Data
Relational Database (1980s)
Good for:
• Well-understood data structures that don’t change too frequently
• Known problems involving discrete parts of the data, or minimal connectivity

A Way of Representing Data
Graph Database (2016)
Good for:
• Dynamic systems: where the data topology is difficult to predict
• Dynamic requirements: that evolve with the business
• Problems where the relationships in data contribute meaning & value
Relational Database (1980s)
Good for:
• Well-understood data structures that don’t change too frequently
• Known problems involving discrete parts of the data, or minimal connectivity

A Graph Is
• NODE
• RELATIONSHIP (e.g. KNOWS)
• PROPERTIES (e.g. NAME: ANN, AGE: 32)

Describing Graphs
Business Domain: Dan loves Ann
Graph Data Model: (Dan) -[:LOVES]-> (Ann)
Cypher Query
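
The node, relationship, and property vocabulary from these slides can be made concrete with a minimal in-memory sketch in Python. It only illustrates the property graph model and says nothing about how Neo4j stores data. The names `Node`, `Relationship`, `Graph`, and `match` are hypothetical, and the direction Dan LOVES Ann is an assumption read from the slide's pattern. Dan has no properties because the slides give none.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node with free-form key/value properties."""
    node_id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed connection between two nodes."""
    start: Node
    rel_type: str
    end: Node
    properties: dict = field(default_factory=dict)


class Graph:
    """Hypothetical in-memory property graph (illustration only)."""

    def __init__(self):
        self.nodes = []
        self.relationships = []

    def add_node(self, **properties):
        node = Node(node_id=len(self.nodes) + 1, properties=properties)
        self.nodes.append(node)
        return node

    def relate(self, start, rel_type, end, **properties):
        rel = Relationship(start, rel_type, end, properties)
        self.relationships.append(rel)
        return rel

    def match(self, rel_type):
        """Return (start, end) pairs for every relationship of the given type,
        loosely analogous to the pattern ()-[:TYPE]->()."""
        return [(r.start, r.end) for r in self.relationships if r.rel_type == rel_type]


graph = Graph()
ann = graph.add_node(name="Ann", age=32)  # properties taken from the slide
dan = graph.add_node(name="Dan")
graph.relate(dan, "LOVES", ann)           # assumed direction: Dan LOVES Ann

pairs = [(s.properties["name"], e.properties["name"]) for s, e in graph.match("LOVES")]
print(pairs)
```

The relationship is a first-class object with its own type and optional properties, which is the main difference from modelling the same fact as a foreign key.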

Cypher Example
HR Query in SQL
The Same Query using Cypher
MATCH (boss)-[:MANAGES*0..3]->(sub),
      (sub)-[:MANAGES*1..3]->(report)
WHERE boss.name = "John Doe"
RETURN sub.name AS Subordinate, count(report) AS Total

Project Impact
Less time writing queries:
• More time understanding the answers
• Leaving time to ask the next question
Less time debugging queries:
• More time writing the next piece of code
• Improved quality of overall code base
Code that’s easier to read:
• Faster ramp-up for new project members
• Improved maintainability & troubleshooting
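
To show what the variable-length patterns `*0..3` and `*1..3` in the Cypher query compute, here is a rough Python equivalent over a plain dictionary. It is a sketch under assumptions: the org chart is invented for illustration, the hierarchy is a strict tree (each person has one manager), and the helper names `within_levels` and `project_impact` are hypothetical.

```python
from collections import deque

# Hypothetical org chart: manager -> direct reports (not from the slides).
MANAGES = {
    "John Doe": ["Alice", "Bob"],
    "Alice": ["Carol", "Dave"],
    "Carol": ["Erin"],
    "Bob": [],
}


def within_levels(start, min_depth, max_depth):
    """Return people reachable from start via min_depth..max_depth MANAGES hops."""
    found = []
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth >= min_depth:
            found.append(person)
        if depth < max_depth:
            for report in MANAGES.get(person, []):
                queue.append((report, depth + 1))
    return found


def project_impact(boss):
    """For each sub within 0..3 levels of boss, count the reports
    within 1..3 levels below that sub."""
    totals = {}
    for sub in within_levels(boss, 0, 3):
        count = len(within_levels(sub, 1, 3))
        if count:  # like MATCH, drop subordinates with no matching reports
            totals[sub] = count
    return totals


print(project_impact("John Doe"))
```

The two nested traversals are what the single `MATCH` clause expresses declaratively; the depth bounds in the query map directly onto `min_depth` and `max_depth`.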

Users Love Cypher

openCypher

Low Latency Query Performance
“We found Neo4j to be literally thousands of times faster than our prior MySQL solution, with queries that require 10-100 times less code. Today, Neo4j provides eBay with functionality that was previously impossible.”
- Volker Pacher, Senior Developer
“Minutes to milliseconds” performance
Queries up to 1000x faster than RDBMS or other NoSQL

Fastest Growing Category in Big Data
[Chart: DB-Engines popularity changes by database category, Jan 2013 to Sep 2015. Categories shown: Graph database, Relational database, Wide column stores, RDF stores, Document stores, Search engines, Native XML DBMS, Key-value stores, Object oriented DBMS, Multivalue DBMS, Time Series DBMS. © DB-Engines.com 2015]

Popular Graph Database Use Cases
• Real-Time Recommendations
• Fraud Detection
• Network & IT Operations
• Master Data Management
• Graph-Based Search
• Identity & Access Management

What is Real-Time?

Real-Time When Emil Was in School
“A system is said to be real-time if the total correctness of an operation depends not only upon its logical correctness, but also upon the time limit in which it is performed.”
Shin, K.G.; Ramanathan, P. (Jan 1994). “Real-time computing: a new discipline of computer science and engineering”. Proceedings of the IEEE.

Real-Time In Web 2.0
“My focus will be companies exploiting ‘real-time data,’ which is ‘the next billion dollar market opportunity.’”
Ron Conway, angel investor, godfather of Silicon Valley
Interview in TechCrunch, 2009

Real-Time Emil and Tim’s Definition of Real-Time Data

Graphs Are Feeding the World

Improving Genetics has Scaled Agricultural Output for Millennia

Modern Breeding Techniques Accelerated this Gain
Source: http://www.ers.usda.gov/data-products/feed-grains-database/feed-grains-yearbook-tables.aspx

Selecting Better Plants via Field Trial

Rapid Breeding Improvement Derives from Cycling

The Operational Uses for Ancestry are Numerous
§ Which crosses are predicted to be the most effective?
§ Where in the pipeline are the descendants of a cross?
§ Are the results of high-throughput genotyping correct?
§ What is the frequency of commercial success?
§ Etc…
Questions like these are asked from applications across the pipeline, all serving scientists expecting to make rapid decisions

Operationalizing Ancestry Requires Low-Latency Reads
A population at the “advancing” horizon of the pipeline can easily have an ancestry > 50 levels deep
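
To give a feel for what "more than 50 levels deep" means for a read, the sketch below measures ancestry depth over a synthetic pedigree in Python. Everything here is an assumption for illustration: the single-line chain, the `parents` mapping, and the function names are hypothetical and not taken from the Monsanto system. In a relational parent table, a fixed-depth lookup like this would typically need one self-join per level or a recursive query, whereas a traversal simply follows parent links.

```python
def build_chain(generations):
    """Build a synthetic single-line pedigree: 1 <- 2 <- ... <- generations + 1.
    Maps each population id to a list of its parent ids."""
    parents = {1: []}
    for pop_id in range(2, generations + 2):
        parents[pop_id] = [pop_id - 1]
    return parents


def ancestry_depth(parents, pop_id):
    """Return the number of levels above pop_id, using an explicit stack
    rather than recursion so deep pedigrees cannot hit the recursion limit."""
    deepest = 0
    stack = [(pop_id, 0)]
    while stack:
        current, depth = stack.pop()
        deepest = max(deepest, depth)
        for parent in parents.get(current, []):
            stack.append((parent, depth + 1))
    return deepest


pedigree = build_chain(60)
print(ancestry_depth(pedigree, 61))
```

This version revisits shared ancestors once per path, so a real pedigree with heavy inbreeding would want memoization; the chain used here has no shared ancestors.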

Low Latency Reads + Fresh Data = Real-Time Data

Accessing Genetic Ancestry in a RESTful Style
§ Domain-centric API
§ ~ 40 API resources
§ ~ 20 query grammar elements

/population/5/ancestors
{"nodes": [
   {"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}, {"id": 5}
 ],
 "relationships": [
   {"from": 1, "to": 3, "parental_role": "female"},
   {"from": 2, "to": 3, "parental_role": "male"},
   {"from": 3, "to": 4, "parental_role": "female"},
   {"from": 4, "to": 5, "parental_role": "female"}
 ]}

/population/5/binary-cross
{ "female": {"id": 1}, "male": {"id": 2} }
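
The two sample responses can be reproduced from the parent links alone, which is sketched below in Python. This is not the service's implementation. The function names are hypothetical, and the rule used for `binary_cross` (walk up single-parent steps until a population with two parents is found) is an assumption inferred from the sample payloads, not something the slides state.

```python
# Sample pedigree copied from the JSON on the slides: (from, to, parental_role).
RELATIONSHIPS = [
    (1, 3, "female"),
    (2, 3, "male"),
    (3, 4, "female"),
    (4, 5, "female"),
]


def parents_of(pop_id):
    """Return the relationships whose 'to' end is pop_id."""
    return [rel for rel in RELATIONSHIPS if rel[1] == pop_id]


def ancestors(pop_id):
    """Build a /population/<id>/ancestors style payload by walking parent links."""
    seen = {pop_id}
    edges = []
    stack = [pop_id]
    while stack:
        current = stack.pop()
        for rel in parents_of(current):
            edges.append(rel)
            if rel[0] not in seen:
                seen.add(rel[0])
                stack.append(rel[0])
    return {
        "nodes": [{"id": i} for i in sorted(seen)],
        "relationships": [
            {"from": f, "to": t, "parental_role": role} for f, t, role in sorted(edges)
        ],
    }


def binary_cross(pop_id):
    """Assumed rule: walk up single-parent steps until a population
    with exactly two parents is found, then return those parents by role."""
    current = pop_id
    while True:
        rels = parents_of(current)
        if len(rels) == 2:
            return {role: {"id": parent} for parent, _, role in rels}
        if len(rels) != 1:
            return None  # founder or unexpected shape: no binary cross found
        current = rels[0][0]


print(ancestors(5))
print(binary_cross(5))
```

Both functions assume the pedigree is acyclic, which holds for ancestry data since a population cannot be its own ancestor.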

An Ops View of Ancestry-as-a-Service
§ 2 years continuous production operation
§ > 200 application and data scientist users
§ Store Size
  - ~ 800 million nodes
  - ~ 1.3 billion relationships
  - ~ 1.8 billion properties
Continuous and peaky mixed read/write load

The Ultimate Value of Ancestry is Realized in the Biological Information it Allows to be Linked

Corn Parent Galaxy
The complete genetic history of every corn parent at Monsanto

Selecting Better Plants via Genome Wide Selection

Thank You!
