Slide 1

Slide 1 text

TITAN: THE RISE OF BIG GRAPH DATA
MARKO A. RODRIGUEZ
MATTHIAS BROECHELER
http://THINKAURELIUS.COM

Slide 2

Slide 2 text

ABSTRACT
A graph is a data structure composed of vertices/dots and edges/lines. A graph database is a software system used to persist and process graphs. The common conception in today's database community is that there is a tradeoff between the scale of data and the complexity/interlinking of data. To challenge this understanding, Aurelius has developed Titan under the liberal Apache 2 license. Titan supports both the size of modern data and the modeling power of graphs to usher in the era of Big Graph Data. Novel techniques in edge compression, data layout, and vertex-centric indices that exploit significant orders are used to facilitate the representation and processing of a single atomic graph structure across a multi-machine cluster. To ensure ease of adoption by the graph community, Titan natively implements the TinkerPop 2 Blueprints API. This presentation will review the graph landscape, Titan's techniques for scale by distribution, and a collection of satellite graph technologies to be released by Aurelius in the coming summer months of 2012.

Slide 3

Slide 3 text

SPEAKER BIOGRAPHIES
Dr. Marko A. Rodriguez is the founder of the graph consulting firm Aurelius. He has focused his academic and commercial career on the theoretical and applied aspects of graphs. Marko is a cofounder of TinkerPop and the primary developer of the Gremlin graph traversal language.
Dr. Matthias Broecheler has been researching and developing large-scale graph database systems for many years in both academia and in his role as a cofounder of the Aurelius graph consulting firm. He is the primary developer of the distributed graph database Titan. Matthias focuses most of his time and effort on novel OLTP and OLAP graph processing solutions.

Slide 4

Slide 4 text

SPONSORS
As the leading education services company, Pearson is serious about evolving how the world learns. We apply our deep education experience and research, invest in innovative technologies, and promote collaboration throughout the education ecosystem. Real change is our commitment and its results are delivered through connecting capabilities to create actionable, scalable solutions that improve access, affordability, and achievement.
Aurelius is a team of software engineers and scientists committed to applying graph theory and network science to problems in numerous domains. Aurelius develops the theory and technology whereby graphs can be used to model, understand, predict, and influence the behavior of complex, interrelated social, economic, and physical networks.
Jive is the pioneer and world's leading provider of social business solutions. Our products apply powerful technology that helps people connect, communicate and collaborate to get more work done and solve their biggest business challenges. Millions of users and many of the world's most successful companies rely on Jive day in and day out to get work done, serve their customers and stay ahead of their competitors.

Slide 5

Slide 5 text

OUTLINE
1. THE GRAPH LANDSCAPE
   An introduction to graph computing.
   Graph technologies on the market today.
2. INTRODUCTION TO TITAN
   Getting up and running with Titan.
   Titan's techniques for scalability.
3. THE FUTURE OF AURELIUS
   Satellite technologies and the OLAP story.
   The graph landscape reprise.

Slide 6

Slide 6 text

PART 1: THE GRAPH LANDSCAPE
MARKO A. RODRIGUEZ

Slide 7

Slide 7 text

GRAPH

Slide 8

Slide 8 text

VERTEX EDGE GRAPH

Slide 9

Slide 9 text

VERTEX EDGE GRAPH G = (V, E) Graph Vertices Edges

Slide 10

Slide 10 text

G = (V, E) Classic Textbook Graph Structure

Slide 11

Slide 11 text

V A homogeneous set of vertices...

Slide 12

Slide 12 text

E ...connected by a homogeneous set of edges.

Slide 13

Slide 13 text

RESTRICTED MODELING People and follows relationships...

Slide 14

Slide 14 text

RESTRICTED MODELING People and follows relationships... ...xor webpages and citations.

Slide 15

Slide 15 text

AN INTEGRATED MODEL IS TYPICALLY DESIRED mentions follows references references createdBy references follows

Slide 16

Slide 16 text

AN INTEGRATED MODEL IS USEFUL
Allows for more interesting/novel algorithms (beyond "textbook" graph algorithms).
Allows for a universal model of things and their relationships (a single, unified model of a domain of interest).
mentions follows references references createdBy references follows

Slide 17

Slide 17 text

THE PROPERTY GRAPH Current Popular Graph Structure G = (V, E, λ) * Directed, attributed, edge-labeled graph * Multi-relational graph with key/value pairs on the elements

Slide 18

Slide 18 text

VERTEX

Slide 19

Slide 19 text

VERTEX name:hercules PROPERTIES

Slide 20

Slide 20 text

VERTEX name:hercules PROPERTIES KEY VALUE

Slide 21

Slide 21 text

name:hercules

Slide 22

Slide 22 text

name:hercules mother name:alcmene type:human

Slide 23

Slide 23 text

name:hercules mother name:alcmene type:human EDGE LABEL

Slide 24

Slide 24 text

name:hercules mother name:alcmene type:human

Slide 25

Slide 25 text

name:hercules mother name:alcmene type:human name:jupiter type:god father

Slide 26

Slide 26 text

name:hercules mother name:alcmene type:human name:jupiter type:god father IS HERCULES A DEMIGOD? DEMIGOD = HALF HUMAN + HALF GOD

Slide 27

Slide 27 text

name:hercules mother name:alcmene type:human name:jupiter type:god father gremlin> hercules ==>v[0]

Slide 28

Slide 28 text

name:hercules mother name:alcmene type:human name:jupiter type:god father gremlin> hercules.out('mother','father') ==>v[1] ==>v[2]

Slide 29

Slide 29 text

name:hercules mother name:alcmene type:human name:jupiter type:god father gremlin> hercules.out('mother','father').type ==>human ==>god DEMIGOD = HALF HUMAN + HALF GOD

Slide 30

Slide 30 text

name:hercules type:demigod mother name:alcmene type:human name:jupiter type:god father gremlin> hercules.type = 'demigod' ==>demigod DEMIGOD = HALF HUMAN + HALF GOD
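
The property graph and the demigod check from the preceding slides can be sketched in plain Java. This is only an illustration under stated assumptions: the classes SketchVertex, SketchEdge, and PropertyGraphSketch are hypothetical names invented here and are not the Blueprints or Titan API. The vertex ids follow the Gremlin output on the slides (v[0], v[1], v[2]).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: a tiny in-memory property graph, not the Blueprints API.
class SketchVertex {
    final long id;
    final Map<String, Object> properties = new HashMap<>(); // key/value pairs
    final List<SketchEdge> outEdges = new ArrayList<>();

    SketchVertex(long id, String name, String type) {
        this.id = id;
        properties.put("name", name);
        if (type != null) properties.put("type", type);
    }

    // Follow outgoing edges whose label matches any of the given labels.
    List<SketchVertex> out(String... labels) {
        List<SketchVertex> result = new ArrayList<>();
        for (SketchEdge e : outEdges) {
            for (String label : labels) {
                if (e.label.equals(label)) { result.add(e.head); break; }
            }
        }
        return result;
    }
}

class SketchEdge {
    final SketchVertex tail; // the edge leaves this vertex
    final SketchVertex head; // the edge points to this vertex
    final String label;      // the edge label

    SketchEdge(SketchVertex tail, String label, SketchVertex head) {
        this.tail = tail;
        this.label = label;
        this.head = head;
        tail.outEdges.add(this); // register the edge on its tail vertex
    }
}

class PropertyGraphSketch {
    // Builds the hercules/alcmene/jupiter example and returns hercules.
    static SketchVertex herculesExample() {
        SketchVertex hercules = new SketchVertex(0, "hercules", null);
        SketchVertex alcmene = new SketchVertex(1, "alcmene", "human");
        SketchVertex jupiter = new SketchVertex(2, "jupiter", "god");
        new SketchEdge(hercules, "mother", alcmene);
        new SketchEdge(hercules, "father", jupiter);
        return hercules;
    }

    // DEMIGOD = HALF HUMAN + HALF GOD: one human parent and one god parent.
    static boolean isDemigod(SketchVertex v) {
        Set<Object> parentTypes = new HashSet<>();
        for (SketchVertex parent : v.out("mother", "father")) {
            parentTypes.add(parent.properties.get("type"));
        }
        return parentTypes.contains("human") && parentTypes.contains("god");
    }
}
```

Calling PropertyGraphSketch.isDemigod on the hercules vertex mirrors the traversal hercules.out('mother','father').type followed by the manual check on the slide.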

Slide 31

Slide 31 text

STRUCTURE PROCESS COMPUTING

Slide 32

Slide 32 text

GRAPH TRAVERSAL STRUCTURE PROCESS COMPUTING

Slide 33

Slide 33 text

GRAPH TRAVERSAL STRUCTURE PROCESS COMPUTING GRAPH-BASED COMPUTING

Slide 34

Slide 34 text

WHY GRAPH-BASED COMPUTING?

Slide 35

Slide 35 text

WHY GRAPH-BASED COMPUTING?
INTUITIVE MODELING

Slide 36

Slide 36 text

WHY GRAPH-BASED COMPUTING?
INTUITIVE MODELING
EXPRESSIVE QUERYING

Slide 37

Slide 37 text

WHY GRAPH-BASED COMPUTING?
INTUITIVE MODELING
EXPRESSIVE QUERYING
NUMEROUS ANALYSES: Centrality, Mixing Patterns, Geodesics, Path Expressions, Ranking, Inference, Motifs, Scoring

Slide 38

Slide 38 text

f( )ˠ ANALYSES ARE THE EPIPHENOMENA OF TRAVERSAL

Slide 39

Slide 39 text

WHAT IS THE SIGNIFICANCE OF GRAPH ANALYSIS?

Slide 40

Slide 40 text

ANALYSES YIELD INSIGHTS ABOUT THE MODEL = DATA PRODUCTS DATA-DRIVEN DECISION SUPPORT

Slide 41

Slide 41 text

RECOMMENDATION People you may know. Products you might like. Movies you should watch and the friends you should watch them with. SOCIAL GRAPH RATINGS GRAPH SOCIAL+RATINGS GRAPH

Slide 42

Slide 42 text

0 2 hercules 1 3 5 4 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows WHO ELSE MIGHT HERCULES KNOW?

Slide 43

Slide 43 text

0 2 hercules 1 3 5 4 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows gremlin> hercules ==>v[0]

Slide 44

Slide 44 text

0 2 hercules 1 3 5 4 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows gremlin> hercules.out('knows') ==>v[1] ==>v[2] ==>v[3]

Slide 45

Slide 45 text

0 2 hercules 1 3 5 4 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows gremlin> hercules.out('knows').out('knows') ==>v[4] ==>v[5] ==>v[5] ==>v[6] ==>v[5]

Slide 46

Slide 46 text

0 2 hercules 1 3 5 4 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows gremlin> hercules.out('knows').out('knows').groupCount.cap ==>v[4]=1 ==>v[5]=3 ==>v[6]=1
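
The two-step traversal with groupCount can be sketched in plain Java as a count of two-hop paths. The adjacency below is partly an assumption: the slides show that hercules (v[0]) knows v[1], v[2], and v[3], and that the second hop reaches v[4] once, v[5] three times, and v[6] once, but they do not state which of the three friends contributes which edge. FriendOfFriendSketch is a hypothetical name used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FriendOfFriendSketch {
    // 'knows' adjacency by vertex id. How the five second-hop edges are split
    // among vertices 1, 2 and 3 is an assumption; only the totals are given.
    static Map<Integer, List<Integer>> knows() {
        Map<Integer, List<Integer>> g = new LinkedHashMap<>();
        g.put(0, List.of(1, 2, 3));
        g.put(1, List.of(4, 5));
        g.put(2, List.of(5, 6));
        g.put(3, List.of(5));
        return g;
    }

    // Counts the number of two-step 'knows' paths from start to each vertex,
    // which is what out('knows').out('knows').groupCount computes.
    static Map<Integer, Integer> twoHopCounts(Map<Integer, List<Integer>> g, int start) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        for (int friend : g.getOrDefault(start, List.of())) {
            for (int friendOfFriend : g.getOrDefault(friend, List.of())) {
                counts.merge(friendOfFriend, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

The vertex with the highest count (v[5], reached by three paths) is the strongest recommendation, which is the reasoning behind "Hercules probably knows Neptune."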

Slide 47

Slide 47 text

0 2 hercules 1 3 5 4 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows knows HERCULES PROBABLY KNOWS NEPTUNE

Slide 48

Slide 48 text

0 2 hercules 1 3 5 4 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows knows HERCULES PROBABLY KNOWS NEPTUNE THIS IS A "TEXTBOOK STYLE" GRAPH

Slide 49

Slide 49 text

0 2 hercules 1 3 5 4 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother ...PROBABLY MORE SO WHEN OTHER TYPES OF EDGES ARE ANALYZED HERCULES PROBABLY KNOWS NEPTUNE

Slide 50

Slide 50 text

0 2 hercules 1 3 5 4 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother

Slide 51

Slide 51 text

0 2 hercules 1 3 5 4 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother likes

Slide 52

Slide 52 text

0 2 hercules 1 3 5 4 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother likes SOCIAL GRAPH

Slide 53

Slide 53 text

0 2 hercules 1 3 5 4 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother 7 human flesh likes SOCIAL GRAPH

Slide 54

Slide 54 text

0 2 hercules 1 3 5 4 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother 7 human flesh likes likes likes SOCIAL GRAPH

Slide 55

Slide 55 text

0 2 hercules 1 3 5 4 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother 7 human flesh likes likes likes 8 tartarus SOCIAL GRAPH

Slide 56

Slide 56 text

0 2 hercules 1 3 5 4 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother 7 human flesh likes likes likes 8 likes likes tartarus dislikes SOCIAL GRAPH

Slide 57

Slide 57 text

0 2 hercules 1 3 5 4 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother 7 human flesh likes likes likes 8 likes likes tartarus dislikes SOCIAL GRAPH RATINGS GRAPH

Slide 58

Slide 58 text

0 2 hercules 1 3 5 4 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother 7 human flesh likes likes likes composedOf 8 likes likes tartarus dislikes NEMEAN MIGHT LIKE TARTARUS smellsOf SOCIAL GRAPH RATINGS GRAPH PRODUCT GRAPH * Collaborative Filtering + Content-Based Recommendation

Slide 59

Slide 59 text

PATH FINDING How is this person related to this film? Which authors of this book also wrote a New York Times bestseller? Which movies are based on a book by a New York Times bestseller? MOVIE GRAPH BOOK GRAPH MOVIE+BOOK GRAPH

Slide 60

Slide 60 text

0 hercules arnold schwarzenegger hasActor 7 hercules in new york depictedIn 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn WHO PLAYED HERCULES IN WHAT MOVIE?

Slide 61

Slide 61 text

0 hercules arnold schwarzenegger hasActor 7 hercules in new york depictedIn gremlin> hercules ==>v[0] 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn WHO PLAYED HERCULES IN WHAT MOVIE?

Slide 62

Slide 62 text

0 hercules arnold schwarzenegger hasActor 7 hercules in new york depictedIn gremlin> hercules.out('depictedIn') ==>v[7] 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn WHO PLAYED HERCULES IN WHAT MOVIE?

Slide 63

Slide 63 text

0 hercules arnold schwarzenegger hasActor 7 hercules in new york depictedIn gremlin> hercules.out('depictedIn').as('movie') ==>v[7] 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn movie WHO PLAYED HERCULES IN WHAT MOVIE?

Slide 64

Slide 64 text

0 hercules arnold schwarzenegger hasActor 7 hercules in new york depictedIn gremlin> hercules.out('depictedIn').as('movie').out('hasActor') ==>v[8] ==>v[10] 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn movie WHO PLAYED HERCULES IN WHAT MOVIE?

Slide 65

Slide 65 text

0 hercules arnold schwarzenegger hasActor 7 hercules in new york depictedIn gremlin> hercules.out('depictedIn').as('movie').out('hasActor') .out('role') ==>v[0] ==>v[6] 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn movie WHO PLAYED HERCULES IN WHAT MOVIE?

Slide 66

Slide 66 text

0 hercules arnold schwarzenegger hasActor 7 hercules in new york depictedIn gremlin> hercules.out('depictedIn').as('movie').out('hasActor') .out('role').retain(hercules) ==>v[0] 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn movie WHO PLAYED HERCULES IN WHAT MOVIE?

Slide 67

Slide 67 text

0 hercules arnold schwarzenegger hasActor 7 hercules in new york depictedIn gremlin> hercules.out('depictedIn').as('movie').out('hasActor') .out('role').retain(hercules).back(2) ==>v[8] 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn movie WHO PLAYED HERCULES IN WHAT MOVIE?

Slide 68

Slide 68 text

0 hercules arnold schwarzenegger hasActor 7 hercules in new york depictedIn gremlin> hercules.out('depictedIn').as('movie').out('hasActor') .out('role').retain(hercules).back(2).out('actor') ==>v[9] 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn movie WHO PLAYED HERCULES IN WHAT MOVIE?

Slide 69

Slide 69 text

0 hercules arnold schwarzenegger hasActor 7 hercules in new york depictedIn gremlin> hercules.out('depictedIn').as('movie').out('hasActor') .out('role').retain(hercules).back(2).out('actor') .as('star') ==>v[9] 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn movie star WHO PLAYED HERCULES IN WHAT MOVIE?

Slide 70

Slide 70 text

0 hercules arnold schwarzenegger hasActor 7 hercules in new york depictedIn gremlin> hercules.out('depictedIn').as('movie').out('hasActor') .out('role').retain(hercules).back(2).out('actor') .as('star').select ==>[movie:v[7], star:v[9]] 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn movie star WHO PLAYED HERCULES IN WHAT MOVIE?

Slide 71

Slide 71 text

0 hercules arnold schwarzenegger hasActor 7 hercules in new york depictedIn gremlin> hercules.out('depictedIn').as('movie').out('hasActor') .out('role').retain(hercules).back(2).out('actor') .as('star').select{it.name} ==>[movie:hercules in new york, star:arnold schwarzenegger] 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn movie star WHO PLAYED HERCULES IN WHAT MOVIE?

Slide 72

Slide 72 text

0 hercules arnold schwarzenegger hasActor 7 hercules in new york depictedIn 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn

Slide 73

Slide 73 text

0 hercules arnold schwarzenegger hasActor 7 hercules in new york depictedIn 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn

Slide 74

Slide 74 text

0 hercules arnold schwarzenegger hasActor 7 hercules in new york depictedIn 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn 12 the arms of hercules depictedIn

Slide 75

Slide 75 text

0 hercules arnold schwarzenegger hasActor 7 hercules in new york depictedIn 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn 12 the arms of hercules fred saberhagen 13 depictedIn writtenBy

Slide 76

Slide 76 text

0 hercules arnold schwarzenegger hasActor 7 hercules in new york depictedIn 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn 12 the arms of hercules fred saberhagen 13 depictedIn writtenBy 14 albuquerque livesIn

Slide 77

Slide 77 text

0 hercules arnold schwarzenegger hasActor 7 hercules in new york depictedIn 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn 12 the arms of hercules fred saberhagen 13 depictedIn writtenBy 14 albuquerque livesIn 15 25-North santa fe

Slide 78

Slide 78 text

0 hercules arnold schwarzenegger hasActor 7 hercules in new york depictedIn 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn 12 the arms of hercules fred saberhagen 13 depictedIn writtenBy 14 albuquerque livesIn 15 25-North santa fe 16 marko rodriguez livesIn

Slide 79

Slide 79 text

0 hercules arnold schwarzenegger hasActor 7 hercules in new york depictedIn 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn 12 the arms of hercules fred saberhagen 13 depictedIn writtenBy 14 albuquerque livesIn 15 25-North santa fe 16 marko rodriguez livesIn thinksHeIs

Slide 80

Slide 80 text

0 hercules arnold schwarzenegger hasActor 7 hercules in new york depictedIn 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn 12 the arms of hercules fred saberhagen 13 depictedIn writtenBy 14 albuquerque livesIn 15 25-North santa fe 16 marko rodriguez livesIn thinksHeIs MOVIE GRAPH BOOK GRAPH TRANSPORTATION GRAPH PROFILE GRAPH

Slide 81

Slide 81 text

SOCIAL INFLUENCE Who are the most influential people in java, mathematics, art, surreal art, politics, ...? Which region of the social graph will propagate this advertisement the furthest? Which 3 experts should review this submitted article? Which people should I talk to at the upcoming conference and what topics should I talk to them about? SOCIAL + COMMUNICATION + EXPERTISE + EVENT GRAPH

Slide 82

Slide 82 text

PATTERN IDENTIFICATION This connectivity pattern is a sign of financial fraud. When this motif is found, a red flag will be raised. Healthy discourse is typified by a discussion board with a branch factor in this range and a concept clique score in this range. TRANSACTION GRAPH DISCUSSION GRAPH

Slide 83

Slide 83 text

KNOWLEDGE DISCOVERY The terms "ice," "fans," and "stanley cup" are classified as "sports." Given that all identified birds fly, it can be deduced that all birds fly. If contrary evidence is provided, then this "fact" can be retracted. WIKIPEDIA GRAPH EVIDENTIAL LOGIC GRAPH

Slide 84

Slide 84 text

WORLD MODEL

Slide 85

Slide 85 text

WORLD MODEL WORLD PROCESSES

Slide 86

Slide 86 text

WORLD MODEL WORLD PROCESSES A single world model and various types of traversers moving through that model to solve problems.

Slide 87

Slide 87 text

GRAPH TRAVERSAL STRUCTURE PROCESS COMPUTING GRAPH-BASED COMPUTING

Slide 88

Slide 88 text

GRAPH COMPUTING ENGINES

Slide 89

Slide 89 text

MEMORY-BASED GRAPHS Application iGraph http://igraph.sourceforge.net/ NetworkX http://networkx.lanl.gov/ JUNG http://jung.sourceforge.net/ Graph Framework

Slide 90

Slide 90 text

Application Application DISK-BASED GRAPHS Application Neo4j http://neo4j.org/ OrientDB http://orientdb.org Graph Database InfiniteGraph http://objectivity.com DEX http://www.sparsity-technologies.com/dex

Slide 91

Slide 91 text

CLUSTER-BASED GRAPHS Hama http://incubator.apache.org/hama/ Giraph http://incubator.apache.org/giraph/ GoldenOrb http://goldenorbos.org/ Application 3 Application 2 Application 1 Bulk Synchronous Parallel Processing * In the same spirit as Google's Pregel

Slide 92

Slide 92 text

MEMORY-BASED GRAPHS
Graph size is constrained by local machine's RAM.
Rich graph algorithm and visualization packages.
Oriented towards "textbook-style" graphs.
* Based on typical behavior

Slide 93

Slide 93 text

MEMORY-BASED GRAPHS
Graph size is constrained by local machine's RAM.
Rich graph algorithm and visualization packages.
Oriented towards "textbook-style" graphs.
DISK-BASED GRAPHS
Graph size is constrained by local disk.
Optimized for local graph algorithms.
Oriented towards property graphs.
* Based on typical behavior

Slide 94

Slide 94 text

MEMORY-BASED GRAPHS
Graph size is constrained by local machine's RAM.
Rich graph algorithm and visualization packages.
Oriented towards "textbook-style" graphs.
DISK-BASED GRAPHS
Graph size is constrained by local disk.
Optimized for local graph algorithms.
Oriented towards property graphs.
CLUSTER-BASED GRAPHS
Graph size is constrained to cluster's total RAM.
Optimized for global graph algorithms.
Oriented towards "textbook-style" graphs.
* Based on typical behavior

Slide 95

Slide 95 text

TINKERPOP Open source graph product group Support for various graph vendors Provides a vendor-agnostic graph framework * Encompassing the various graph computing styles Simple, well-defined products * Based on future directions http://tinkerpop.com

Slide 96

Slide 96 text

TINKERPOP Generic Graph API Dataflow Processing Traversal Language Object-Graph Mapper Graph Algorithms Graph Server http://tinkerpop.com http://${project.name}.tinkerpop.com

Slide 97

Slide 97 text

TINKERPOP INTEGRATION http://tinkerpop.com

Slide 98

Slide 98 text

AND NOW THERE IS ANOTHER...

Slide 104

Slide 104 text

TITAN

Slide 105

Slide 105 text

PART 2: INTRODUCTION TO TITAN MATTHIAS BROECHELER

Slide 106

Slide 106 text

WHY CREATE TITAN?
A number of Aurelius' clients...
...need to represent and process graphs at the 100+ billion edge scale w/ thousands of concurrent transactions.
...desire a free, open source distributed graph database.
...need both local graph traversals (OLTP) and batch graph processing (OLAP).

Slide 107

Slide 107 text

TITAN'S KEY FEATURES
Titan provides...
..."infinite size" graphs and "unlimited" users by means of a distributed storage engine.
...distribution via the liberal, free, open source Apache2 license.
...real-time local traversals (OLTP) and support for global batch processing via Hadoop (OLAP).

Slide 108

Slide 108 text

matthias$

Slide 109

Slide 109 text

matthias$ wget http://thinkaurelius/titan.zip % Total % Received % Xferd Average Speed Time Time 100 99999 0 99999 0 0 11078 0 --:--:-- 0:01:01 matthias$

Slide 110

Slide 110 text

matthias$ wget http://thinkaurelius/titan.zip % Total % Received % Xferd Average Speed Time Time 100 99999 0 99999 0 0 11078 0 --:--:-- 0:01:01 matthias$ unzip titan.zip Archive: titan.zip creating: titan/ ... matthias$

Slide 111

Slide 111 text

matthias$ wget http://thinkaurelius/titan.zip % Total % Received % Xferd Average Speed Time Time 100 99999 0 99999 0 0 11078 0 --:--:-- 0:01:01 matthias$ unzip titan.zip Archive: titan.zip creating: titan/ ... matthias$ cd titan titan$

Slide 112

Slide 112 text

matthias$ wget http://thinkaurelius/titan.zip
  % Total % Received % Xferd Average Speed Time Time
100 99999 0 99999 0 0 11078 0 --:--:-- 0:01:01
matthias$ unzip titan.zip
Archive: titan.zip
   creating: titan/
   ...
matthias$ cd titan
titan$ bin/gremlin.sh
         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin>

Slide 113

Slide 113 text

gremlin> g = TitanFactory.open('/tmp/local-titan') ==>titangraph[local:/tmp/local-titan]

Slide 114

Slide 114 text

gremlin> g = TitanFactory.open('/tmp/local-titan') ==>titangraph[local:/tmp/local-titan] LOCAL MACHINE MODE

Slide 115

Slide 115 text

gremlin> g.createKeyIndex('name',Vertex.class)
==>null
gremlin> g.stopTransaction(SUCCESS)
==>null

Slide 116

Slide 116 text

gremlin> g.loadGraphML('data/graph-of-the-gods.xml') ==>null name:tartarus type:location name:pluto type:god lives brother name:jupiter type:god brother name:neptune type:god pet name:cerberus type:monster lives father name:saturn type:titan brother name:sea type:location lives name:sky type:location lives father battled name:hercules type:demigod name:hydra type:monster battled name:nemean type:monster battled name:alcmene type:human mother time:1 time:2 time:12 * The Graph of the Gods is a toy dataset distributed with Titan

Slide 117

Slide 117 text

gremlin> hercules = g.V('name','hercules').next() ==>v[24] name:tartarus type:location name:pluto type:god lives brother name:jupiter type:god brother name:neptune type:god pet name:cerberus type:monster lives father name:saturn type:titan brother name:sea type:location lives name:sky type:location lives father battled name:hercules type:demigod name:hydra type:monster battled name:nemean type:monster battled name:alcmene type:human mother time:1 time:2 time:12

Slide 118

Slide 118 text

gremlin> hercules.out('mother','father') ==>v[44] ==>v[16] name:tartarus type:location name:pluto type:god lives brother name:jupiter type:god brother name:neptune type:god pet name:cerberus type:monster lives father name:saturn type:titan brother name:sea type:location lives name:sky type:location lives father battled name:hercules type:demigod name:hydra type:monster battled name:nemean type:monster battled name:alcmene type:human mother time:1 time:2 time:12

Slide 119

Slide 119 text

gremlin> hercules.out('mother','father').name ==>alcmene ==>jupiter name:tartarus type:location name:pluto type:god lives brother name:jupiter type:god brother name:neptune type:god pet name:cerberus type:monster lives father name:saturn type:titan brother name:sea type:location lives name:sky type:location lives father battled name:hercules type:demigod name:hydra type:monster battled name:nemean type:monster battled name:alcmene type:human mother time:1 time:2 time:12

Slide 120

Slide 120 text

THAT WAS TITAN LOCAL. NEXT IS TITAN DISTRIBUTED. Broecheler, M., Pugliese, A., Subrahmanian, V.S., "COSI: Cloud Oriented Subgraph Identification in Massive Social Networks," Proceedings of the 2010 International Conference on Advances in Social Networks Analysis and Mining, pp. 248-255, 2010. http://www.knowledgefrominformation.com/2010/08/01/cosi-cloud-oriented-subgraph-identification-in-massive-social-networks/

Slide 121

Slide 121 text

-OR- BACKEND AGNOSTIC

Slide 122

Slide 122 text

TITAN DISTRIBUTED VIA CASSANDRA
titan$ bin/gremlin.sh
         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> conf = new BaseConfiguration();
==>org.apache.commons.configuration.BaseConfiguration@763861e6
gremlin> conf.setProperty("storage.backend","cassandra");
gremlin> conf.setProperty("storage.hostname","77.77.77.77");
gremlin> g = TitanFactory.open(conf);
==>titangraph[cassandra:77.77.77.77]
gremlin>
* There are numerous graph configurations: https://github.com/thinkaurelius/titan/wiki/Graph-Configuration

Slide 123

Slide 123 text

INHERITED FEATURES
Continuously available with no single point of failure.
No write bottlenecks to the graph as there is no master/slave architecture.
Elastic scalability allows for the introduction and removal of machines.
Caching layer ensures that continuously accessed data is available in memory.
Built-in replication ensures data is available during machine failure.
* Cassandra available at http://cassandra.apache.org/

Slide 124

Slide 124 text

TITAN DISTRIBUTED VIA HBASE
titan$ bin/gremlin.sh
         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> conf = new BaseConfiguration();
==>org.apache.commons.configuration.BaseConfiguration@763861e6
gremlin> conf.setProperty("storage.backend","hbase");
gremlin> conf.setProperty("storage.hostname","77.77.77.77");
gremlin> g = TitanFactory.open(conf);
==>titangraph[hbase:77.77.77.77]
gremlin>
* There are numerous graph configurations: https://github.com/thinkaurelius/titan/wiki/Graph-Configuration

Slide 125

Slide 125 text

INHERITED FEATURES
Linear scalability with the addition of machines.
Strictly consistent reads and writes.
HDFS-based data replication.
Base classes for backing Hadoop MapReduce jobs with HBase tables.
Generally good integration with the tools in the Hadoop ecosystem.
* HBase available at http://hbase.apache.org/

Slide 126

Slide 126 text

TITAN AND THE CAP THEOREM Consistency Partitionability Availability

Slide 127

Slide 127 text

Titan is all about ...

Slide 128

Slide 128 text

Titan is all about numerous concurrent users...

Slide 129

Slide 129 text

Titan is all about numerous concurrent users... high availability...

Slide 130

Slide 130 text

Titan is all about numerous concurrent users... high availability... dynamic scalability...

Slide 131

Slide 131 text

EDGE COMPRESSION VERTEX-CENTRIC INDICES DATA MANAGEMENT THE HOW OF TITAN

Slide 132

Slide 132 text

DATA MANAGEMENT THE HOW OF TITAN

Slide 133

Slide 133 text

DATA MANAGEMENT MAIN DESIGN PRINCIPLES Optimistic Concurrency Control Fine-Grained Locking Control Immutable, Atomic Edges battled hercules cerberus battled hercules time:12 cerberus battled hercules time:12 successful:true cerberus 1 2 3 + + + + + -

Slide 134

Slide 134 text

DATA MANAGEMENT
TYPE DEFINITION
Datatype Constraints
TitanKey timeKey = g.makeType().name("time")
    .dataType(Integer.class)
time:12 time:"twelve"
Functional Declarations
TitanLabel father = g.makeType().name("father")
    .functional()
hercules jupiter father father mars
Edge Label Signatures
TitanLabel battled = g.makeType().name("battled")
    .signature(timeKey)
battled hercules time:12 cerberus
Data management configurations allow Titan to optimize how information is stored/retrieved from disk.
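
What a datatype constraint and a functional declaration enforce can be sketched in plain Java. This is an assumption-laden illustration, not Titan's implementation: TypeConstraintSketch and its methods are hypothetical names, and since the slide does not say how Titan reacts to a violation, the sketch simply rejects the offending write.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of schema checks; not the Titan type system.
class TypeConstraintSketch {
    private final Map<String, Class<?>> keyTypes = new HashMap<>();
    private final Set<String> functionalLabels = new HashSet<>();
    // Remembers the single outgoing edge per (tail, functional label) pair.
    private final Map<String, String> functionalEdges = new HashMap<>();

    void defineKey(String key, Class<?> dataType) {
        keyTypes.put(key, dataType);
    }

    void defineFunctionalLabel(String label) {
        functionalLabels.add(label);
    }

    // True if the value matches the declared data type of the key
    // (keys without a declared type accept any value in this sketch).
    boolean acceptsProperty(String key, Object value) {
        Class<?> type = keyTypes.get(key);
        return type == null || type.isInstance(value);
    }

    // Adds an edge and returns true, or returns false if a functional label
    // already has an outgoing edge from the same tail vertex.
    boolean addEdge(String tail, String label, String head) {
        if (functionalLabels.contains(label)) {
            String slot = tail + "|" + label;
            if (functionalEdges.containsKey(slot)) {
                return false; // a vertex may have at most one such edge
            }
            functionalEdges.put(slot, head);
        }
        return true;
    }
}
```

With "time" declared as Integer, the value 12 is accepted and "twelve" is not; with "father" declared functional, hercules can point to jupiter but not additionally to mars.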

Slide 135

Slide 135 text

DATA MANAGEMENT
TYPE DEFINITION
Unique Property Key/Value Pairs
TitanKey status = g.makeType().name("status")
    .unique()
name:jupiter status:king of the gods name:neptune status:king of the gods
Endogenous Indices
g.createKeyIndex("name",Vertex.class)
name:hercules name:hermes name:jupiter
Data management configurations allow Titan to optimize how information is stored/retrieved from disk.

Slide 136

Slide 136 text

DATA MANAGEMENT
LOCKING SYSTEM
Ensures consistency over non-consistent storage backends.
1. Acquire lock at the end of the transaction.
   - Locking mechanism depends on storage layer consistency guarantees.
2. Verify original read.
3. Fail transaction if any precondition is violated.
hercules neptune father father jupiter hercules write write father jupiter hercules
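
The three locking steps can be sketched as an optimistic commit in plain Java. This is a single-JVM illustration under stated assumptions: OptimisticStoreSketch is a hypothetical class, and a local ReentrantLock stands in for whatever locking mechanism the storage layer would provide in a distributed deployment.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of optimistic concurrency; not Titan's locking code.
class OptimisticStoreSketch {
    private final Map<String, String> data = new HashMap<>();
    private final ReentrantLock lock = new ReentrantLock();

    // A transaction reads without holding a lock across its lifetime.
    String read(String key) {
        lock.lock();
        try {
            return data.get(key);
        } finally {
            lock.unlock();
        }
    }

    // Commits newValue only if the key still holds the value that the
    // transaction originally read; otherwise the transaction fails.
    boolean commit(String key, String originalRead, String newValue) {
        lock.lock();                                   // 1. acquire lock at the end
        try {
            if (!Objects.equals(data.get(key), originalRead)) {
                return false;                          // 3. precondition violated
            }                                          // 2. original read verified
            data.put(key, newValue);
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

If two transactions both read an empty father slot for hercules and then try to write jupiter and neptune respectively, the first commit succeeds and the second fails because its original read no longer holds.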

Slide 137

Slide 137 text

DATA MANAGEMENT ID MANAGEMENT Global ID Pool Maintained by Storage Engine [0,1,2,3,4,5,6,7,8,9,10,11]

Slide 138

Slide 138 text

DATA MANAGEMENT ID MANAGEMENT Pool Subsets Assigned to Individual Instances Global ID Pool Maintained by Storage Engine [0,1,2] [3,4,5] [6,7,8] [9,10,11] [0,1,2,3,4,5,6,7,8,9,10,11]
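
The idea of handing out subsets of a global ID pool to individual instances can be sketched as block allocation in plain Java. IdPoolSketch and InstanceIdAllocator are hypothetical names, the block size of 3 matches the slide's picture, and what happens when an instance exhausts its block (claiming a fresh one) is an assumption, since the slide does not cover it.

```java
// Hypothetical illustration of block-based ID assignment; not Titan's code.
class IdPoolSketch {
    private long next = 0;        // first id not yet assigned to any block
    private final long blockSize;

    IdPoolSketch(long blockSize) {
        this.blockSize = blockSize;
    }

    // Claims the next contiguous block and returns {start, endExclusive}.
    // Only this call needs coordination between instances.
    synchronized long[] claimBlock() {
        long start = next;
        next += blockSize;
        return new long[] { start, next };
    }
}

class InstanceIdAllocator {
    private final IdPoolSketch pool;
    private long current = 0;      // next id to hand out locally
    private long endExclusive = 0; // end of the locally owned block

    InstanceIdAllocator(IdPoolSketch pool) {
        this.pool = pool;
    }

    // Returns a globally unique id, claiming a new block when the local
    // block is empty or used up (the refill behavior is an assumption).
    long nextId() {
        if (current == endExclusive) {
            long[] block = pool.claimBlock();
            current = block[0];
            endExclusive = block[1];
        }
        return current++;
    }
}
```

Because each instance draws ids from its own block, instances only contend on the shared pool once per block rather than once per id.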

Slide 139

Slide 139 text

EDGE COMPRESSION THE HOW OF TITAN

Slide 140

Slide 140 text

EDGE COMPRESSION Natural graphs have a small world, community/cluster property. Watts, D. J., Strogatz, S. H., "Collective Dynamics of 'Small-World' Networks," Nature 393 (6684), pp. 440–442, 1998. Community 1 Community 2 High intra-connectivity within a community and low inter-connectivity between communities.

Slide 141

Slide 141 text

EDGE COMPRESSION

Slide 142

Slide 142 text

EDGE COMPRESSION 12345678 12345683 knows

Slide 143

Slide 143 text

EDGE COMPRESSION 12345678 12345683 knows

Slide 144

Slide 144 text

EDGE COMPRESSION 12345678 12345683 knows 12345678 9 12345683 24 bytes

Slide 145

Slide 145 text

EDGE COMPRESSION 12345678 12345683 knows 12345678 9 12345683 24 bytes 12345678 9 +5

Slide 146

Slide 146 text

EDGE COMPRESSION 12345678 12345683 knows 12345678 9 12345683 24 bytes 12345678 9 +5 12345678 9 + 5 7 bytes
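
The compression idea (store the adjacent vertex as a small delta and write numbers in variable-length form) can be sketched in plain Java. The slides do not specify Titan's exact byte layout, so this sketch assumes a common 7-bits-per-byte variable-length encoding with a ZigZag-encoded delta; EdgeDeltaSketch is a hypothetical name. For the slide's example this particular encoding needs 6 bytes rather than the 7 shown, which only reflects the differing layout.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration of delta + variable-length encoding of an edge.
class EdgeDeltaSketch {
    // Writes a non-negative long using 7 payload bits per byte; the high bit
    // of each byte signals that more bytes follow.
    static void writeVarLong(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // ZigZag maps small signed numbers to small unsigned numbers.
    static long zigZag(long n) {
        return (n << 1) ^ (n >> 63);
    }

    static long unZigZag(long n) {
        return (n >>> 1) ^ -(n & 1);
    }

    // Encodes (tail id, label id, head id) with the head stored as a delta.
    static byte[] encode(long tailId, long labelId, long headId) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarLong(out, tailId);
        writeVarLong(out, labelId);
        writeVarLong(out, zigZag(headId - tailId));
        return out.toByteArray();
    }

    // Decodes back to {tail id, label id, head id}.
    static long[] decode(byte[] bytes) {
        long[] fields = new long[3];
        int pos = 0;
        for (int i = 0; i < 3; i++) {
            long value = 0;
            int shift = 0;
            while (true) {
                byte b = bytes[pos++];
                value |= (long) (b & 0x7F) << shift;
                if ((b & 0x80) == 0) break;
                shift += 7;
            }
            fields[i] = value;
        }
        fields[2] = fields[0] + unZigZag(fields[2]); // restore the head id
        return fields;
    }
}
```

The saving depends on adjacent vertices having nearby ids, which is why the community structure of natural graphs matters: if ids are assigned so that neighbors sit close together, most deltas fit in one or two bytes.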

Slide 147

Slide 147 text

VERTEX-CENTRIC INDICES THE HOW OF TITAN

Slide 148

Slide 148 text

VERTEX-CENTRIC INDICES
THE SUPER NODE PROBLEM
Natural, real-world graphs contain vertices of high degree.
Even if rare, their degree ensures that they exist on many paths.
Traversing a high degree vertex means touching numerous incident edges and potentially touching most of the graph in only a few steps.

Slide 149

Slide 149 text

VERTEX-CENTRIC INDICES
A SUPER NODE SOLUTION
A "super node" only exists from the vantage point of classic "textbook style" graphs.
In the world of property graphs, intelligent disk-level filtering can interpret a "super node" as a more manageable low-degree vertex.
Vertex-centric querying utilizes B-Trees and sort orders for speedy lookup of incident edges with particular qualities.

Slide 150

Slide 150 text

VERTEX-CENTRIC INDICES PUSHDOWN PREDICATES knows knows knows likes likes likes likes likes stars:5 stars:3 stars:3 stars:2 stars:2 vertex.query() 8 edges

Slide 151

Slide 151 text

VERTEX-CENTRIC INDICES PUSHDOWN PREDICATES knows knows likes likes likes likes likes stars:5 stars:3 stars:3 stars:2 stars:2 vertex.query().direction(OUT) 7 edges

Slide 152

Slide 152 text

VERTEX-CENTRIC INDICES PUSHDOWN PREDICATES likes likes likes likes likes stars:5 stars:3 stars:3 stars:2 stars:2 vertex.query().direction(OUT) .labels("likes") 5 edges

Slide 153

Slide 153 text

VERTEX-CENTRIC INDICES PUSHDOWN PREDICATES likes stars:5 1 edge vertex.query().direction(OUT) .labels("likes").has("stars",5)

Slide 154

Slide 154 text

VERTEX-CENTRIC INDICES PUSHDOWN PREDICATES Query Query.direction(Direction) Query Query.labels(String... labels) Query Query.has(String, Object, Compare) Query Query.has(String, Object) Query Query.range(String, Object, Object) Iterable Query.vertices() Iterable Query.edges() PREDICATES GETTERS
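
The narrowing from 8 edges to 1 in the preceding slides can be reproduced with a small in-memory model. This is only a sketch of the query semantics: `VertexQuerySketch` and its nested types are hypothetical, it filters a Java list rather than pushing predicates down to disk as Titan does, and the choice of which "knows" edge is incoming is an assumption (the slides only show that one of the three is).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical in-memory model of a fluent vertex query; not the Blueprints implementation. */
final class VertexQuerySketch {
    enum Direction { IN, OUT }

    /** One incident edge as seen from the queried vertex. */
    static final class Edge {
        final Direction direction;
        final String label;
        final Map<String, Object> properties;

        Edge(Direction direction, String label, Map<String, Object> properties) {
            this.direction = direction;
            this.label = label;
            this.properties = properties;
        }
    }

    private final List<Edge> incident;
    private Predicate<Edge> filter = e -> true;

    VertexQuerySketch(List<Edge> incident) {
        this.incident = incident;
    }

    VertexQuerySketch direction(Direction wanted) {
        filter = filter.and(e -> e.direction == wanted);
        return this;
    }

    VertexQuerySketch labels(String... wanted) {
        List<String> labels = Arrays.asList(wanted);
        filter = filter.and(e -> labels.contains(e.label));
        return this;
    }

    VertexQuerySketch has(String key, Object value) {
        filter = filter.and(e -> value.equals(e.properties.get(key)));
        return this;
    }

    /** Applies all accumulated predicates and returns the matching edges. */
    List<Edge> edges() {
        List<Edge> result = new ArrayList<>();
        for (Edge e : incident) {
            if (filter.test(e)) {
                result.add(e);
            }
        }
        return result;
    }

    private static Edge likes(int stars) {
        return new Edge(Direction.OUT, "likes",
                Collections.<String, Object>singletonMap("stars", stars));
    }

    private static Edge knows(Direction d) {
        return new Edge(d, "knows", Collections.<String, Object>emptyMap());
    }

    /** The eight incident edges drawn on the slides. */
    static List<Edge> slideEdges() {
        return Arrays.asList(
                knows(Direction.IN), knows(Direction.OUT), knows(Direction.OUT),
                likes(5), likes(3), likes(3), likes(2), likes(2));
    }
}
```

Chaining `direction(OUT)`, `labels("likes")` and `has("stars", 5)` on this model yields 7, 5 and 1 edges in turn, matching the counts on the slides. In Titan the same predicates are evaluated against the sorted edge list on disk, so the edges that do not match are never loaded.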

Slide 155

Slide 155 text

VERTEX-CENTRIC INDICES battled battled battled knows knows time:1 time:2 time:12 DISK-LEVEL SORTING/INDEXING

Slide 156

Slide 156 text

VERTEX-CENTRIC INDICES battled battled battled knows knows battled knows time:1 time:2 time:12 DISK-LEVEL SORTING/INDEXING

Slide 157

Slide 157 text

VERTEX-CENTRIC INDICES DISK-LEVEL SORTING/INDEXING battled battled battled knows knows time:1 time:2 time:12 battled w/ time 1-5 battled w/ time 5-10 knows TitanLabel battled = g.makeType().name("battled").primaryKey(time).makeEdgeLabel();
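
Why a sort order helps can be shown with a sorted map: once a vertex's edges are ordered by label and then by the primary key (`time` here), a time window becomes a contiguous slice instead of a full scan. The sketch below is illustrative only. `SortedEdgeListSketch` is a hypothetical class, the target names `v1` to `v4` are placeholders, it keeps at most one edge per label and time value for brevity, and the half-open `[from, to)` window is a choice made for the example, not a statement about Titan's `range` semantics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical model of one vertex's edge list, sorted by label and then by time. */
final class SortedEdgeListSketch {
    // label -> (time -> target vertex), each level kept in sorted order
    private final NavigableMap<String, NavigableMap<Long, String>> byLabel = new TreeMap<>();

    void addEdge(String label, long time, String target) {
        byLabel.computeIfAbsent(label, k -> new TreeMap<>()).put(time, target);
    }

    /** Targets of edges with the given label and a time in [from, to), read as one sorted slice. */
    List<String> range(String label, long from, long to) {
        NavigableMap<Long, String> sorted = byLabel.get(label);
        if (sorted == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(sorted.subMap(from, true, to, false).values());
    }
}
```

Loading the slide's three "battled" edges (time 1, 2 and 12) and asking for times 1 to 5 returns the first two; asking for 5 to 10 returns nothing, and neither lookup touches the "knows" edges.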

Slide 158

Slide 158 text

VERTEX-CENTRIC INDICES DISK-LEVEL SORTING/INDEXING father battled knows brother mother

Slide 159

Slide 159 text

VERTEX-CENTRIC INDICES DISK-LEVEL SORTING/INDEXING father battled knows brother mother

Slide 160

Slide 160 text

VERTEX-CENTRIC INDICES DISK-LEVEL SORTING/INDEXING father battled knows brother mother family TypeGroup family = TypeGroup.of(2,"family"); TitanLabel father = g.makeType().name("father") .group(family).makeEdgeLabel(); TitanLabel mother = g.makeType().name("mother") .group(family).makeEdgeLabel(); TitanLabel brother = g.makeType().name("brother") .group(family).makeEdgeLabel();

Slide 161

Slide 161 text

VERTEX-CENTRIC INDICES DISK-LEVEL SORTING/INDEXING father battled knows brother mother family vertex.query().group("family")...

Slide 162

Slide 162 text

EDGE COMPRESSION VERTEX-CENTRIC INDICES DATA MANAGEMENT THAT IS HOW TITAN WORKS

Slide 163

Slide 163 text

WHAT IF YOU WANTED TO CREATE TWITTER FROM SCRATCH? SIMULATING TWITTER

Slide 164

Slide 164 text

3 BILLION EDGES 100 MILLION VERTICES 10000 CONCURRENT USERS 50 MACHINES 1 GRAPH DATABASE COMING JULY 2012

Slide 165

Slide 165 text

PART 3: THE FUTURE OF AURELIUS MATTHIAS BROECHELER MARKO A. RODRIGUEZ

Slide 166

Slide 166 text

AURELIUS' GRAPH COMPUTING STORY Titan as the highly scalable, distributed graph database solution. OLTP

Slide 167

Slide 167 text

AURELIUS' GRAPH COMPUTING STORY Titan as the highly scalable, distributed graph database solution. Titan as the source (and potential sink) for other graph processing solutions. OLTP OLAP

Slide 168

Slide 168 text

FAUNUS GOD OF HERDS

Slide 169

Slide 169 text

FAUNUS PATH ALGEBRA FOR HADOOP hercules battled battled theseus cretan bull theseus hercules ally Derived graphs are single-relational and are typically much smaller than their multi-relational source. Therefore, derived graphs can be subjected to "textbook-style" graph algorithms in both a meaningful and efficient manner. WHO IS THE MOST CENTRAL ALLY? A · A ◦ n(I)

Slide 170

Slide 170 text

FAUNUS PATH ALGEBRA FOR HADOOP ally ally ally ally ally ally ally ally ally ally ally ally ally B · B ◦ n(I) "My allies' allies are my allies." B = A · A ◦ n(I) (A · A)² ◦ n(I)
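
The derivation of an "ally" graph from "battled" edges can be worked through on small dense matrices. The sketch below rests on stated assumptions: A is taken to be the "battled" adjacency matrix, the second factor is taken to be its transpose (two heroes are allies when they battled the same opponent), ◦ is read as the element-wise product, and n(I) as the complement of the identity, which removes self-loops. The class `PathAlgebraSketch` is hypothetical, and Faunus performs this as Map/Reduce jobs rather than in-memory arrays.

```java
/** Hypothetical dense-matrix sketch of deriving a single-relational graph; not the Faunus API. */
final class PathAlgebraSketch {

    /** Ordinary matrix product; entry [i][j] counts two-step paths from i to j. */
    static int[][] multiply(int[][] x, int[][] y) {
        int n = x.length;
        int[][] out = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                for (int j = 0; j < n; j++) {
                    out[i][j] += x[i][k] * y[k][j];
                }
            }
        }
        return out;
    }

    /** Reverses the direction of every edge. */
    static int[][] transpose(int[][] x) {
        int n = x.length;
        int[][] out = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                out[j][i] = x[i][j];
            }
        }
        return out;
    }

    /** Element-wise product with n(I): keeps off-diagonal entries and zeroes the diagonal. */
    static int[][] dropSelfLoops(int[][] x) {
        int n = x.length;
        int[][] out = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                out[i][j] = (i == j) ? 0 : x[i][j];
            }
        }
        return out;
    }

    /** Two vertices become allies when both battled the same opponent. */
    static int[][] deriveAllies(int[][] battled) {
        return dropSelfLoops(multiply(battled, transpose(battled)));
    }
}
```

With vertices 0 = hercules, 1 = theseus and 2 = cretan bull, and "battled" edges 0→2 and 1→2, the derived matrix has a single pair of entries linking hercules and theseus, as on the slide. Applying `dropSelfLoops(multiply(b, b))` to a derived ally matrix `b` gives the "my allies' allies" graph from the following slide.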

Slide 171

Slide 171 text

FAUNUS PATH ALGEBRA FOR HADOOP Implements the multi-relational path algebra as a collection of Map/Reduce operations. Rodriguez, M.A., Shinavier, J., “Exposing Multi-Relational Networks to Single-Relational Network Analysis Algorithms,” Journal of Informetrics, 4(1), pp. 29-41, 2009. http://arxiv.org/abs/0806.2274 Support for "HadoopGraph" and HDFS file formats. Project codename: TinkerPoop. Reduce a massive property graph into a smaller, semantically rich single-relational graph. Used for global graph operations.

Slide 172

Slide 172 text

FULGORA GODDESS OF LIGHTNING

Slide 173

Slide 173 text

FULGORA AN EFFICIENT IN-MEMORY GRAPH ENGINE Non-transactional, in-memory graph engine. It is not a "database." Process ~90 billion edges in 68 GB of RAM, assuming a small-world topology. Perform complex graph algorithms in-memory. global graph analysis multi-relational graph analysis Similar in spirit to Twitter's Cassovary: https://github.com/twitter/cassovary

Slide 174

Slide 174 text

Stores a massive-scale property graph Generates a large-scale single-relational graph Analyzes compressed, large-scale single or multi-relational graphs in memory Map/Reduce Load into RAM on a single machine Update element properties with algorithm results THE AURELIUS OLAP FLOW to a stats package Update graph with derived edges

Slide 175

Slide 175 text

Stores a massive-scale property graph Generates a large-scale single-relational graph Analyzes compressed, large-scale single or multi-relational graphs in memory Map/Reduce Load into RAM on a single machine THE AURELIUS OLAP FLOW to a stats package theseus hercules ally hercules ally_centrality:0.0123

Slide 176

Slide 176 text

Stores a massive-scale property graph Generates a large-scale single-relational graph Analyzes compressed, large-scale single or multi-relational graphs in memory THE AURELIUS OLAP FLOW to a stats package

Slide 177

Slide 177 text

AURELIUS' USE OF BLUEPRINTS Aurelius products use the Blueprints API so any graph product can communicate with any other graph product. The code for graph databases, frameworks, algorithms, and batch processing is written in terms of the Blueprints API. Aurelius encourages developers to use Blueprints/TinkerPop in order to grow a rich ecosystem of interoperable graph technologies.

Slide 178

Slide 178 text

THE GRAPH LANDSCAPE REPRISE Speed of Traversal/Process Size of Graph/Structure * Not to scale. Did not want to overlap logos.

Slide 179

Slide 179 text

NEXT STEPS http://thinkaurelius.com http://thinkaurelius.github.com/titan/ Learn about applying graph theory and network science. Make use of and/or contribute to the free, open source Titan product.

Slide 180

Slide 180 text

THANK YOU

Slide 181

Slide 181 text

CREDITS PRESENTERS MARKO A. RODRIGUEZ MATTHIAS BROECHELER FINANCIAL SUPPORT PEARSON EDUCATION AURELIUS LOCATION PROVISIONS JIVE SOFTWARE MANY THANKS TO DAN LAROCQUE TINKERPOP COMMUNITY STEPHEN MALLETTE BOBBY NORTON KETRINA YIM