TITAN
MARKO A. RODRIGUEZ
MATTHIAS BROECHELER
http://THINKAURELIUS.COM
THE RISE OF BIG GRAPH DATA
Slide 2
Slide 2 text
ABSTRACT
A graph is a data structure composed of vertices/dots and
edges/lines. A graph database is a software system used to
persist and process graphs. The common conception in today's
database community is that there is a tradeoff between the
scale of data and the complexity/interlinking of data. To
challenge this understanding, Aurelius has developed Titan
under the liberal Apache 2 license. Titan supports both the size
of modern data and the modeling power of graphs to usher in
the era of Big Graph Data. Novel techniques in edge
compression, data layout, and vertex-centric indices that
exploit significant orders are used to facilitate the
representation and processing of a single atomic graph
structure across a multi-machine cluster. To ensure ease of
adoption by the graph community, Titan natively implements
the TinkerPop 2 Blueprints API. This presentation will review
the graph landscape, Titan's techniques for scale by
distribution, and a collection of satellite graph technologies to
be released by Aurelius in the coming summer months of 2012.
Slide 3
Slide 3 text
SPEAKER BIOGRAPHIES
Dr. Marko A. Rodriguez is the founder of the graph consulting firm Aurelius.
He has focused his academic and commercial career on the theoretical
and applied aspects of graphs. Marko is a cofounder of TinkerPop and the
primary developer of the Gremlin graph traversal language.
Dr. Matthias Broecheler has been researching and developing large-scale
graph database systems for many years in both academia and in his role
as a cofounder of the Aurelius graph consulting firm. He is the primary
developer of the distributed graph database Titan. Matthias focuses most
of his time and effort on novel OLTP and OLAP graph processing
solutions.
Slide 4
Slide 4 text
SPONSORS
As the leading education services company, Pearson is serious about evolving how
the world learns. We apply our deep education experience and research, invest in
innovative technologies, and promote collaboration throughout the education
ecosystem. Real change is our commitment and its results are delivered through
connecting capabilities to create actionable, scalable solutions that improve access,
affordability, and achievement.
Aurelius is a team of software engineers and scientists committed to applying
graph theory and network science to problems in numerous domains. Aurelius
develops the theory and technology whereby graphs can be used to model,
understand, predict, and influence the behavior of complex, interrelated
social, economic, and physical networks.
Jive is the pioneer and world's leading provider of social business solutions. Our products
apply powerful technology that helps people connect, communicate and collaborate to get
more work done and solve their biggest business challenges. Millions of users and many
of the world's most successful companies rely on Jive day in and day out to get work
done, serve their customers and stay ahead of their competitors.
Slide 5
Slide 5 text
OUTLINE
1. THE GRAPH LANDSCAPE
An introduction to graph computing.
Graph technologies on the market today.
2. INTRODUCTION TO TITAN
Getting up and running with Titan.
Titan's techniques for scalability.
3. THE FUTURE OF AURELIUS
Satellite technologies and the OLAP story.
The graph landscape reprise.
Slide 6
Slide 6 text
PART 1:
THE GRAPH LANDSCAPE
MARKO A. RODRIGUEZ
Slide 7
Slide 7 text
GRAPH
Slide 8
Slide 8 text
VERTEX
EDGE
GRAPH
Slide 9
Slide 9 text
VERTEX
EDGE
GRAPH
G = (V, E)
Graph Vertices Edges
Slide 10
Slide 10 text
G = (V, E)
Classic Textbook Graph Structure
Slide 11
Slide 11 text
V
A homogeneous set of vertices...
Slide 12
Slide 12 text
E
...connected by a homogeneous set of edges.
Slide 13
Slide 13 text
RESTRICTED MODELING
People and follows relationships...
Slide 14
Slide 14 text
RESTRICTED MODELING
People and follows relationships... ...xor webpages and citations.
Slide 15
Slide 15 text
AN INTEGRATED MODEL
IS TYPICALLY DESIRED
[Figure: a single graph whose edges are labeled mentions, follows, references, and createdBy.]
Slide 16
Slide 16 text
AN INTEGRATED MODEL
IS USEFUL
[Figure: a single graph whose edges are labeled mentions, follows, references, and createdBy.]
Allows for more interesting/novel algorithms (beyond "textbook" graph algorithms).
Allows for a universal model of things and their relationships (a single, unified model of a domain of interest).
Slide 17
Slide 17 text
THE PROPERTY GRAPH
Current Popular Graph Structure
G = (V, E, λ)
* Directed, attributed, edge-labeled graph
* Multi-relational graph with key/value pairs on the elements
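The definition above can be made concrete with a small data-structure sketch. It is illustrative only: the class and method names (PropertyGraphSketch, addVertex, addEdge, out) and the sample data are assumptions made for this example and are not the Blueprints or Titan API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of G = (V, E, λ): directed edges, one label per edge,
// and key/value properties on vertices and edges. Not the Blueprints API.
final class PropertyGraphSketch {

    static final class Vertex {
        final long id;
        final Map<String, Object> properties = new HashMap<>();
        final List<Edge> outEdges = new ArrayList<>();

        Vertex(long id) {
            this.id = id;
        }
    }

    static final class Edge {
        final Vertex tail;   // vertex the edge leaves
        final Vertex head;   // vertex the edge points to
        final String label;  // the value λ assigns to this edge
        final Map<String, Object> properties = new HashMap<>();

        Edge(Vertex tail, String label, Vertex head) {
            this.tail = tail;
            this.label = label;
            this.head = head;
        }
    }

    private final Map<Long, Vertex> vertices = new HashMap<>();
    private long nextId = 0;

    Vertex addVertex(String name) {
        Vertex v = new Vertex(nextId++);
        v.properties.put("name", name);
        vertices.put(v.id, v);
        return v;
    }

    Edge addEdge(Vertex tail, String label, Vertex head) {
        Edge e = new Edge(tail, label, head);
        tail.outEdges.add(e);
        return e;
    }

    // Heads of the outgoing edges of v that carry the given label.
    static List<Vertex> out(Vertex v, String label) {
        List<Vertex> result = new ArrayList<>();
        for (Edge e : v.outEdges) {
            if (e.label.equals(label)) {
                result.add(e.head);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PropertyGraphSketch g = new PropertyGraphSketch();
        Vertex person = g.addVertex("a person");
        Vertex colleague = g.addVertex("a colleague");
        Vertex page = g.addVertex("a webpage");
        // People and webpages live in one graph; the edge label tells them apart.
        g.addEdge(person, "follows", colleague).properties.put("since", 2012);
        g.addEdge(person, "references", page);
        System.out.println(out(person, "follows").get(0).properties.get("name"));
    }
}
```

Keeping the label on the edge is what lets one structure hold both "follows" and "references" relationships, which the single-relational textbook graph cannot do.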
ANALYSES YIELD
INSIGHTS ABOUT THE MODEL
=
DATA
PRODUCTS
DATA-DRIVEN
DECISION SUPPORT
Slide 41
Slide 41 text
RECOMMENDATION
People you may know.
Products you might like.
Movies you should watch and
the friends you should watch them with.
SOCIAL GRAPH
RATINGS GRAPH
SOCIAL+RATINGS
GRAPH
PATH FINDING
How is this person related to this film?
Which authors of this book also
wrote a New York Times bestseller?
Which movies are based on a book by a
New York Times bestseller?
MOVIE GRAPH
BOOK GRAPH
MOVIE+BOOK
GRAPH
Slide 60
Slide 60 text
WHO PLAYED HERCULES IN WHAT MOVIE?
[Figure: movie graph with vertices 0 (hercules), 6, 7 (hercules in new york), 8, 9, 10, and 11; the vertex names also include jupiter, arnold schwarzenegger, and ernest graves; the edges are labeled depictedIn, hasActor, role, and actor.]
Slide 61
Slide 61 text
WHO PLAYED HERCULES IN WHAT MOVIE?
[Figure: the movie graph (vertices 0, 6, 7, 8, 9, 10, 11).]
gremlin> hercules
==>v[0]
Slide 62
Slide 62 text
WHO PLAYED HERCULES IN WHAT MOVIE?
[Figure: the movie graph (vertices 0, 6, 7, 8, 9, 10, 11).]
gremlin> hercules.out('depictedIn')
==>v[7]
Slide 63
Slide 63 text
WHO PLAYED HERCULES IN WHAT MOVIE?
[Figure: the movie graph (vertices 0, 6, 7, 8, 9, 10, 11), annotated "movie".]
gremlin> hercules.out('depictedIn').as('movie')
==>v[7]
Slide 64
Slide 64 text
WHO PLAYED HERCULES IN WHAT MOVIE?
[Figure: the movie graph (vertices 0, 6, 7, 8, 9, 10, 11), annotated "movie".]
gremlin> hercules.out('depictedIn').as('movie').out('hasActor')
==>v[8]
==>v[10]
Slide 65
Slide 65 text
WHO PLAYED HERCULES IN WHAT MOVIE?
[Figure: the movie graph (vertices 0, 6, 7, 8, 9, 10, 11), annotated "movie".]
gremlin> hercules.out('depictedIn').as('movie').out('hasActor')
         .out('role')
==>v[0]
==>v[6]
Slide 66
Slide 66 text
WHO PLAYED HERCULES IN WHAT MOVIE?
[Figure: the movie graph (vertices 0, 6, 7, 8, 9, 10, 11), annotated "movie".]
gremlin> hercules.out('depictedIn').as('movie').out('hasActor')
         .out('role').retain(hercules)
==>v[0]
Slide 67
Slide 67 text
WHO PLAYED HERCULES IN WHAT MOVIE?
[Figure: the movie graph (vertices 0, 6, 7, 8, 9, 10, 11), annotated "movie".]
gremlin> hercules.out('depictedIn').as('movie').out('hasActor')
         .out('role').retain(hercules).back(2)
==>v[8]
Slide 68
Slide 68 text
WHO PLAYED HERCULES IN WHAT MOVIE?
[Figure: the movie graph (vertices 0, 6, 7, 8, 9, 10, 11), annotated "movie".]
gremlin> hercules.out('depictedIn').as('movie').out('hasActor')
         .out('role').retain(hercules).back(2).out('actor')
==>v[9]
Slide 69
Slide 69 text
WHO PLAYED HERCULES IN WHAT MOVIE?
[Figure: the movie graph (vertices 0, 6, 7, 8, 9, 10, 11), annotated "movie" and "star".]
gremlin> hercules.out('depictedIn').as('movie').out('hasActor')
         .out('role').retain(hercules).back(2).out('actor')
         .as('star')
==>v[9]
Slide 70
Slide 70 text
WHO PLAYED HERCULES IN WHAT MOVIE?
[Figure: the movie graph (vertices 0, 6, 7, 8, 9, 10, 11), annotated "movie" and "star".]
gremlin> hercules.out('depictedIn').as('movie').out('hasActor')
         .out('role').retain(hercules).back(2).out('actor')
         .as('star').select
==>[movie:v[7], star:v[9]]
Slide 71
Slide 71 text
WHO PLAYED HERCULES IN WHAT MOVIE?
[Figure: the movie graph (vertices 0, 6, 7, 8, 9, 10, 11), annotated "movie" and "star".]
gremlin> hercules.out('depictedIn').as('movie').out('hasActor')
         .out('role').retain(hercules).back(2).out('actor')
         .as('star').select{it.name}
==>[movie:hercules in new york, star:arnold schwarzenegger]
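The traversal built up step by step on these slides can also be read as nested loops over labeled edges. The following is a minimal plain-Java sketch of that reading, not Gremlin itself: the class HerculesTraversalSketch and its methods are hypothetical, the edge list is reconstructed from the console output shown on the slides, and the entries marked "assumed" (the names of vertices 6 and 11 and the actor edge from 10 to 11) are inferred from the figure labels only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java rendering of the Gremlin walk shown on the slides.
final class HerculesTraversalSketch {
    // vertex id -> edge label -> ids of the vertices the edges point to
    private final Map<Integer, Map<String, List<Integer>>> outEdges = new HashMap<>();
    private final Map<Integer, String> names = new HashMap<>();

    void edge(int from, String label, int to) {
        outEdges.computeIfAbsent(from, k -> new HashMap<>())
                .computeIfAbsent(label, k -> new ArrayList<>())
                .add(to);
    }

    List<Integer> out(int v, String label) {
        Map<String, List<Integer>> byLabel = outEdges.get(v);
        if (byLabel == null) {
            return Collections.<Integer>emptyList();
        }
        List<Integer> targets = byLabel.get(label);
        return targets == null ? Collections.<Integer>emptyList() : targets;
    }

    static HerculesTraversalSketch movieGraph() {
        HerculesTraversalSketch g = new HerculesTraversalSketch();
        g.names.put(0, "hercules");
        g.names.put(7, "hercules in new york");
        g.names.put(9, "arnold schwarzenegger");
        g.names.put(6, "jupiter");         // assumed from the figure labels
        g.names.put(11, "ernest graves");  // assumed from the figure labels
        g.edge(0, "depictedIn", 7);
        g.edge(7, "hasActor", 8);
        g.edge(7, "hasActor", 10);
        g.edge(8, "role", 0);
        g.edge(8, "actor", 9);
        g.edge(10, "role", 6);
        g.edge(10, "actor", 11);           // assumed from the figure labels
        return g;
    }

    // One "movie:..., star:..." row per actor who played the given character.
    List<String> whoPlayed(int character) {
        List<String> rows = new ArrayList<>();
        for (int movie : out(character, "depictedIn")) {         // .out('depictedIn').as('movie')
            for (int casting : out(movie, "hasActor")) {          // .out('hasActor')
                // .out('role').retain(hercules).back(2): keep only the casting
                // vertices whose role is the character we started from.
                if (!out(casting, "role").contains(character)) {
                    continue;
                }
                for (int star : out(casting, "actor")) {          // .out('actor').as('star')
                    rows.add("movie:" + names.get(movie) + ", star:" + names.get(star));
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(movieGraph().whoPlayed(0));
    }
}
```

The filter inside the second loop plays the part of retain followed by back(2): it tests a condition two steps ahead and then resumes from the casting vertex.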
Slide 72
Slide 72 text
[Figure: the movie graph (vertices 0, 6, 7, 8, 9, 10, 11; edges depictedIn, hasActor, role, actor).]
Slide 74
Slide 74 text
[Figure: the movie graph, extended with vertex 12 (the arms of hercules) and another depictedIn edge.]
Slide 75
Slide 75 text
[Figure: the graph of the previous slide, extended with vertex 13 (fred saberhagen) and a writtenBy edge.]
Slide 76
Slide 76 text
[Figure: the graph of the previous slide, extended with vertex 14 (albuquerque) and a livesIn edge.]
Slide 77
Slide 77 text
[Figure: the graph of the previous slide, extended with vertex 15 and the labels 25-North and santa fe.]
Slide 78
Slide 78 text
[Figure: the graph of the previous slide, extended with vertex 16 (marko rodriguez) and a second livesIn edge.]
Slide 79
Slide 79 text
[Figure: the graph of the previous slide, extended with a thinksHeIs edge.]
Slide 80
Slide 80 text
[Figure: the complete graph (vertices 0 and 6 through 16; edges depictedIn, hasActor, role, actor, writtenBy, livesIn, 25-North, and thinksHeIs), annotated with the labels MOVIE GRAPH, BOOK GRAPH, TRANSPORTATION GRAPH, and PROFILE GRAPH.]
Slide 81
Slide 81 text
SOCIAL INFLUENCE
Who are the most influential people in
java, mathematics, art, surreal art, politics, ...?
Which region of the social graph will propagate this
advertisement the furthest?
Which 3 experts should review this submitted article?
Which people should I talk to at the upcoming
conference and what topics should
I talk to them about?
SOCIAL + COMMUNICATION + EXPERTISE + EVENT GRAPH
Slide 82
Slide 82 text
PATTERN IDENTIFICATION
This connectivity pattern is a sign of financial fraud.
When this motif is found, a red flag will be raised.
Healthy discourse is typified by a discussion board
with a branch factor in this range and a concept
clique score in this range.
TRANSACTION GRAPH
DISCUSSION GRAPH
Slide 83
Slide 83 text
KNOWLEDGE DISCOVERY
The terms "ice," "fans," and "stanley cup"
are classified as "sports."
Given that all identified birds fly,
it can be deduced that all birds fly.
If contrary evidence is provided,
then this "fact" can be retracted.
WIKIPEDIA GRAPH
EVIDENTIAL LOGIC GRAPH
Slide 84
Slide 84 text
WORLD MODEL
Slide 85
Slide 85 text
WORLD MODEL
WORLD PROCESSES
Slide 86
Slide 86 text
WORLD MODEL
WORLD PROCESSES
A single world model and various types of traversers
moving through that model to solve problems.
Slide 87
Slide 87 text
GRAPH
TRAVERSAL
STRUCTURE
PROCESS
COMPUTING GRAPH-BASED
COMPUTING
CLUSTER-BASED GRAPHS
Hama
http://incubator.apache.org/hama/
Giraph
http://incubator.apache.org/giraph/
GoldenOrb
http://goldenorbos.org/
Application
3
Application
2
Application
1
Bulk Synchronous Parallel Processing
* In the same spirit as Google's Pregel
Slide 92
Slide 92 text
MEMORY-BASED GRAPHS
Graph size is constrained by local machine's RAM.
Rich graph algorithm and visualization packages.
Oriented towards "textbook-style" graphs.
* Based on typical behavior
Slide 93
Slide 93 text
MEMORY-BASED GRAPHS
DISK-BASED GRAPHS
Graph size is constrained by local machine's RAM.
Rich graph algorithm and visualization packages.
Oriented towards "textbook-style" graphs.
Graph size is constrained by local disk.
Optimized for local graph algorithms.
Oriented towards property graphs.
* Based on typical behavior
Slide 94
Slide 94 text
MEMORY-BASED GRAPHS
DISK-BASED GRAPHS
CLUSTER-BASED GRAPHS
Graph size is constrained by local machine's RAM.
Rich graph algorithm and visualization packages.
Oriented towards "textbook-style" graphs.
Graph size is constrained by local disk.
Optimized for local graph algorithms.
Oriented towards property graphs.
Graph size is constrained by the cluster's total RAM.
Optimized for global graph algorithms.
Oriented towards "textbook-style" graphs.
* Based on typical behavior
Slide 95
Slide 95 text
TINKERPOP
Open source graph product group
Support for various graph vendors
Provides a vendor-agnostic graph framework
* Encompassing the various graph computing styles
Simple, well-defined products
* Based on future directions
http://tinkerpop.com
Slide 96
Slide 96 text
TINKERPOP
Generic
Graph API
Dataflow
Processing
Traversal
Language
Object-Graph
Mapper
Graph
Algorithms
Graph
Server
http://tinkerpop.com
http://${project.name}.tinkerpop.com
Slide 97
Slide 97 text
TINKERPOP INTEGRATION
http://tinkerpop.com
Slide 98
Slide 98 text
AND NOW
THERE IS ANOTHER...
Slide 104
Slide 104 text
TITAN
Slide 105
Slide 105 text
PART 2:
INTRODUCTION TO TITAN
MATTHIAS BROECHELER
Slide 106
Slide 106 text
WHY CREATE TITAN?
A number of Aurelius' clients...
...need to represent and process
graphs at the 100+ billion edge
scale w/ thousands of concurrent
transactions.
...desire a free, open source
distributed graph database.
...need both local graph traversals
(OLTP) and batch graph
processing (OLAP).
Slide 107
Slide 107 text
TITAN'S KEY FEATURES
Titan provides...
..."infinite size" graphs and
"unlimited" users by means of a
distributed storage engine.
...distribution via the liberal, free,
open source Apache 2 license.
...real-time local traversals (OLTP)
and support for global batch
processing via Hadoop (OLAP).
Slide 108
Slide 108 text
matthias$
Slide 109
Slide 109 text
matthias$ wget http://thinkaurelius/titan.zip
% Total % Received % Xferd Average Speed Time Time
100 99999 0 99999 0 0 11078 0 --:--:-- 0:01:01
matthias$
Slide 110
Slide 110 text
matthias$ wget http://thinkaurelius/titan.zip
% Total % Received % Xferd Average Speed Time Time
100 99999 0 99999 0 0 11078 0 --:--:-- 0:01:01
matthias$ unzip titan.zip
Archive: titan.zip
creating: titan/
...
matthias$
Slide 111
Slide 111 text
matthias$ wget http://thinkaurelius/titan.zip
% Total % Received % Xferd Average Speed Time Time
100 99999 0 99999 0 0 11078 0 --:--:-- 0:01:01
matthias$ unzip titan.zip
Archive: titan.zip
creating: titan/
...
matthias$ cd titan
titan$
Slide 112
Slide 112 text
matthias$ wget http://thinkaurelius/titan.zip
% Total % Received % Xferd Average Speed Time Time
100 99999 0 99999 0 0 11078 0 --:--:-- 0:01:01
matthias$ unzip titan.zip
Archive: titan.zip
creating: titan/
...
matthias$ cd titan
titan$ bin/gremlin.sh
\,,,/
(o o)
-----oOOo-(_)-oOOo-----
gremlin>
Slide 113
Slide 113 text
gremlin> g = TitanFactory.open('/tmp/local-titan')
==>titangraph[local:/tmp/local-titan]
Slide 114
Slide 114 text
gremlin> g = TitanFactory.open('/tmp/local-titan')
==>titangraph[local:/tmp/local-titan]
LOCAL MACHINE MODE
THAT WAS TITAN LOCAL.
NEXT IS TITAN DISTRIBUTED.
Broecheler, M., Pugliese, A., Subrahmanian, V.S., "COSI: Cloud Oriented Subgraph Identification in Massive Social Networks,"
Proceedings of the 2010 International Conference on Advances in Social Networks Analysis and Mining, pp. 248-255, 2010.
http://www.knowledgefrominformation.com/2010/08/01/cosi-cloud-oriented-subgraph-identification-in-massive-social-networks/
Slide 121
Slide 121 text
-OR-
BACKEND AGNOSTIC
Slide 122
Slide 122 text
titan$ bin/gremlin.sh
\,,,/
(o o)
-----oOOo-(_)-oOOo-----
gremlin> conf = new BaseConfiguration();
==>org.apache.commons.configuration.BaseConfiguration@763861e6
gremlin> conf.setProperty("storage.backend","cassandra");
gremlin> conf.setProperty("storage.hostname","77.77.77.77");
gremlin> g = TitanFactory.open(conf);
==>titangraph[cassandra:77.77.77.77]
gremlin>
TITAN DISTRIBUTED
VIA CASSANDRA
* There are numerous graph configurations: https://github.com/thinkaurelius/titan/wiki/Graph-Configuration
Slide 123
Slide 123 text
INHERITED FEATURES
Continuously available with no single point of failure.
Cassandra available at http://cassandra.apache.org/
No write bottlenecks to the graph as there is no master/slave architecture.
Elastic scalability allows for the introduction and removal of machines.
Caching layer ensures that continuously accessed data is available in memory.
Built-in replication ensures data is available during machine failure.
Slide 124
Slide 124 text
titan$ bin/gremlin.sh
\,,,/
(o o)
-----oOOo-(_)-oOOo-----
gremlin> conf = new BaseConfiguration();
==>org.apache.commons.configuration.BaseConfiguration@763861e6
gremlin> conf.setProperty("storage.backend","hbase");
gremlin> conf.setProperty("storage.hostname","77.77.77.77");
gremlin> g = TitanFactory.open(conf);
==>titangraph[hbase:77.77.77.77]
gremlin>
TITAN DISTRIBUTED
VIA HBASE
* There are numerous graph configurations: https://github.com/thinkaurelius/titan/wiki/Graph-Configuration
Slide 125
Slide 125 text
INHERITED FEATURES
Linear scalability with the addition of machines.
HBase available at http://hbase.apache.org/
Strictly consistent reads and writes.
HDFS-based data replication.
Base classes for backing Hadoop MapReduce jobs with HBase tables.
Generally good integration with the tools in the Hadoop ecosystem.
Slide 126
Slide 126 text
TITAN AND THE CAP THEOREM
Consistency
Partitionability
Availability
Slide 127
Slide 127 text
Titan is all about ...
Slide 128
Slide 128 text
Titan is all about numerous concurrent users...
Slide 129
Slide 129 text
Titan is all about numerous concurrent users...
high availability...
Slide 130
Slide 130 text
Titan is all about numerous concurrent users...
high availability...
dynamic scalability...
Slide 131
Slide 131 text
EDGE COMPRESSION
VERTEX-CENTRIC INDICES
DATA MANAGEMENT
THE HOW OF TITAN
Slide 132
Slide 132 text
DATA MANAGEMENT
THE HOW OF TITAN
Slide 133
Slide 133 text
DATA MANAGEMENT
MAIN DESIGN PRINCIPLES
Optimistic Concurrency Control
Fine-Grained Locking Control
Immutable, Atomic Edges
[Figure: a battled edge from hercules to cerberus in three numbered steps: (1) without properties, (2) with time:12, (3) with time:12 and successful:true; the steps are annotated with + and - marks.]
Slide 134
Slide 134 text
DATA MANAGEMENT
TYPE DEFINITION
Datatype Constraints
TitanKey timeKey =
    g.makeType().name("time")
     .dataType(Integer.class)
     .makePropertyKey();
[Figure: property values time:12 and time:"twelve".]
Functional Declarations
TitanLabel father =
    g.makeType().name("father")
     .functional()
     .makeEdgeLabel();
[Figure: hercules with father edges to jupiter and mars.]
Edge Label Signatures
TitanLabel battled =
    g.makeType().name("battled")
     .signature(timeKey)
     .makeEdgeLabel();
[Figure: a battled edge from hercules to cerberus with time:12.]
Data management configurations allow Titan to optimize how information is stored/retrieved from disk.
Slide 135
Slide 135 text
DATA MANAGEMENT
TYPE DEFINITION
Endogenous Indices
g.createKeyIndex("name",Vertex.class)
[Figure: vertices labeled name:hercules, name:hermes, and name:jupiter.]
Unique Property Key/Value Pairs
TitanKey status =
    g.makeType().name("status")
     .unique()
[Figure: name:jupiter with status:king of the gods, and name:neptune with status:king of the gods.]
Data management configurations allow Titan to optimize how information is stored/retrieved from disk.
Slide 136
Slide 136 text
DATA MANAGEMENT
Ensures consistency over non-consistent storage backends.
LOCKING SYSTEM
hercules
neptune
father
father jupiter
hercules
write
write
father
jupiter
hercules
1. Acquire lock at the end of the transaction.
- locking mechanism depends on storage
layer consistency guarantees.
2. Verify original read.
3. Fail transaction if any precondition is violated.
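The three steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Titan's implementation: the class OptimisticCommitSketch, the string key "hercules.father", the in-memory map that stands in for the storage backend, and the single coarse lock are all hypothetical. The slide notes that the real locking mechanism depends on the consistency guarantees of the storage layer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: lock at commit time, verify the original reads,
// and fail the transaction if any of them has changed in the meantime.
final class OptimisticCommitSketch {
    // Stands in for the storage backend; values must be non-null.
    private final Map<String, String> store = new ConcurrentHashMap<>();
    // Stands in for the backend-specific locking mechanism.
    private final ReentrantLock commitLock = new ReentrantLock();

    Transaction begin() {
        return new Transaction();
    }

    String get(String key) {
        return store.get(key);
    }

    final class Transaction {
        private final Map<String, String> originalReads = new HashMap<>();
        private final Map<String, String> pendingWrites = new HashMap<>();

        String read(String key) {
            if (pendingWrites.containsKey(key)) {
                return pendingWrites.get(key);
            }
            String value = store.get(key);
            if (!originalReads.containsKey(key)) {
                originalReads.put(key, value);   // remember what was first seen (may be null)
            }
            return value;
        }

        void write(String key, String value) {
            read(key);                           // a write implies a precondition on the old value
            pendingWrites.put(key, value);
        }

        // Returns true if the writes were applied, false if a precondition was violated.
        boolean commit() {
            commitLock.lock();                   // 1. acquire lock at the end of the transaction
            try {
                for (Map.Entry<String, String> r : originalReads.entrySet()) {
                    String current = store.get(r.getKey());
                    if (!Objects.equals(current, r.getValue())) {   // 2. verify original read
                        return false;                               // 3. fail the transaction
                    }
                }
                store.putAll(pendingWrites);
                return true;
            } finally {
                commitLock.unlock();
            }
        }
    }

    public static void main(String[] args) {
        OptimisticCommitSketch db = new OptimisticCommitSketch();
        Transaction t1 = db.begin();
        Transaction t2 = db.begin();
        t1.write("hercules.father", "jupiter");
        t2.write("hercules.father", "neptune");
        System.out.println(t1.commit());   // first writer wins
        System.out.println(t2.commit());   // second writer sees a changed value and fails
        System.out.println(db.get("hercules.father"));
    }
}
```

No lock is held while a transaction is open, so readers and writers do not block each other; the cost is that a conflicting transaction only learns of the conflict when it tries to commit.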
Slide 137
Slide 137 text
DATA MANAGEMENT
ID MANAGEMENT
Global ID Pool Maintained by Storage Engine
[0,1,2,3,4,5,6,7,8,9,10,11]
Slide 138
Slide 138 text
DATA MANAGEMENT
ID MANAGEMENT
Pool Subsets Assigned to Individual Instances
Global ID Pool Maintained by Storage Engine
[0,1,2] [3,4,5]
[6,7,8] [9,10,11]
[0,1,2,3,4,5,6,7,8,9,10,11]
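The block assignment shown above can be sketched as follows. This is an illustration only: the class names (IdPoolSketch, GlobalPool, InstanceAllocator) are hypothetical, the global pool is an in-process counter rather than a storage engine, and the block size of 3 simply mirrors the figure.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a global pool hands out disjoint blocks of ids,
// and each instance serves ids from its own block without coordination.
final class IdPoolSketch {

    // In the slides, the global pool is maintained by the storage engine.
    static final class GlobalPool {
        final long blockSize;
        private final AtomicLong nextUnassigned = new AtomicLong(0);

        GlobalPool(long blockSize) {
            this.blockSize = blockSize;
        }

        // Claims the block [start, start + blockSize) and returns its start.
        long claimBlock() {
            return nextUnassigned.getAndAdd(blockSize);
        }
    }

    // One allocator per database instance.
    static final class InstanceAllocator {
        private final GlobalPool pool;
        private long next = 0;
        private long endExclusive = 0;   // empty block: forces a claim on first use

        InstanceAllocator(GlobalPool pool) {
            this.pool = pool;
        }

        synchronized long nextId() {
            if (next == endExclusive) {              // local block used up
                next = pool.claimBlock();            // the only step that touches the global pool
                endExclusive = next + pool.blockSize;
            }
            return next++;
        }
    }

    public static void main(String[] args) {
        GlobalPool pool = new GlobalPool(3);
        InstanceAllocator first = new InstanceAllocator(pool);
        InstanceAllocator second = new InstanceAllocator(pool);
        System.out.println(first.nextId() + " " + second.nextId() + " " + first.nextId());
    }
}
```

Ids handed out this way are unique across instances but not globally ordered by creation time, since each instance works through its own block independently.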
Slide 139
Slide 139 text
EDGE COMPRESSION
THE HOW OF TITAN
Slide 140
Slide 140 text
EDGE COMPRESSION
Natural graphs have a small world, community/cluster property.
Watts, D. J., Strogatz, S. H., "Collective Dynamics of 'Small-World' Networks,"
Nature 393 (6684), pp. 440–442, 1998.
Community 1 Community 2
High intra-connectivity within a community and
low inter-connectivity between communities.
VERTEX-CENTRIC INDICES
THE SUPER NODE PROBLEM
Natural, real-world graphs contain
vertices of high degree.
Even if rare, their degree ensures that
they exist on many paths.
Traversing a high degree vertex
means touching numerous incident
edges and potentially touching most
of the graph in only a few steps.
Slide 149
Slide 149 text
VERTEX-CENTRIC INDICES
A SUPER NODE SOLUTION
A "super node" only exists from the
vantage point of classic "textbook
style" graphs.
In the world of property graphs,
intelligent disk-level filtering can
interpret a "super node" as a more
manageable low-degree vertex.
Vertex-centric querying utilizes B-Trees
and sort orders for speedy lookup of
incident edges with particular qualities.
VERTEX-CENTRIC INDICES
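DISK-LEVEL SORTING/INDEXING
TitanLabel battled =
    g.makeType().name("battled")
     .primaryKey(time)
     .makeEdgeLabel();
[Figure: a vertex's incident knows and battled edges; the battled edges carry time:1, time:2, and time:12 and are annotated with the ranges "battled w/ time 1-5" and "battled w/ time 5-10".]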
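The idea of sorting a vertex's incident edges by label and then by a primary key such as time can be sketched with a sorted map. This is an analogy rather than Titan's on-disk layout: the class VertexCentricIndexSketch is hypothetical, java.util.TreeMap (a red-black tree) stands in for the B-Tree mentioned on the slides, and the opponents other than cerberus are made-up sample data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of one vertex's incident edges, sorted by label and time.
final class VertexCentricIndexSketch {
    // edge label -> (time -> neighbors reached by edges with that label and time)
    private final Map<String, TreeMap<Integer, List<String>>> incident = new TreeMap<>();

    void addEdge(String label, int time, String neighbor) {
        incident.computeIfAbsent(label, k -> new TreeMap<>())
                .computeIfAbsent(time, k -> new ArrayList<>())
                .add(neighbor);
    }

    // Neighbors over edges with the given label and fromTime <= time < toTime.
    // Only the matching slice of the sorted map is visited, so edges with other
    // labels or times are never touched, however many of them the vertex has.
    List<String> query(String label, int fromTime, int toTime) {
        List<String> result = new ArrayList<>();
        TreeMap<Integer, List<String>> byTime = incident.get(label);
        if (byTime == null) {
            return result;
        }
        for (List<String> group : byTime.subMap(fromTime, true, toTime, false).values()) {
            result.addAll(group);
        }
        return result;
    }

    public static void main(String[] args) {
        VertexCentricIndexSketch hercules = new VertexCentricIndexSketch();
        hercules.addEdge("battled", 1, "nemean lion");   // sample data, not from the slides
        hercules.addEdge("battled", 2, "hydra");         // sample data, not from the slides
        hercules.addEdge("battled", 12, "cerberus");
        hercules.addEdge("knows", 3, "theseus");
        System.out.println(hercules.query("battled", 1, 5));
    }
}
```

With this layout, a query such as "battled with time between 1 and 5" costs a logarithmic lookup plus the size of the answer, which is how a high-degree vertex can behave like a low-degree one for a given query.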
Slide 158
Slide 158 text
VERTEX-CENTRIC INDICES
DISK-LEVEL SORTING/INDEXING
father
battled
knows
brother
mother
Slide 160
Slide 160 text
VERTEX-CENTRIC INDICES
DISK-LEVEL SORTING/INDEXING
father
battled
knows
brother
mother
family
TypeGroup family =
TypeGroup.of(2,"family");
TitanLabel father =
g.makeType().name("father")
.group(family).makeEdgeLabel();
TitanLabel mother =
g.makeType().name("mother")
.group(family).makeEdgeLabel();
TitanLabel brother =
g.makeType().name("brother")
.group(family).makeEdgeLabel();
Slide 161
Slide 161 text
VERTEX-CENTRIC INDICES
DISK-LEVEL SORTING/INDEXING
father
battled
knows
brother
mother
family
vertex.query().group("family")...
Slide 162
Slide 162 text
EDGE COMPRESSION
VERTEX-CENTRIC INDICES
DATA MANAGEMENT
THAT IS HOW TITAN WORKS
Slide 163
Slide 163 text
WHAT IF YOU WANTED TO CREATE
TWITTER FROM SCRATCH?
SIMULATING TWITTER
Slide 164
Slide 164 text
3 BILLION EDGES
100 MILLION VERTICES
10000 CONCURRENT USERS
50 MACHINES
1 GRAPH DATABASE
COMING JULY 2012
Slide 165
Slide 165 text
PART 3:
THE FUTURE OF AURELIUS
MATTHIAS BROECHELER
MARKO A. RODRIGUEZ
Slide 166
Slide 166 text
AURELIUS' GRAPH
COMPUTING STORY
Titan as the highly scalable, distributed graph database solution.
OLTP
Slide 167
Slide 167 text
AURELIUS' GRAPH
COMPUTING STORY
Titan as the highly scalable, distributed graph database solution.
Titan as the source (and potential sink) for other graph
processing solutions.
OLTP OLAP
Slide 168
Slide 168 text
FAUNUS
GOD OF HERDS
Slide 169
Slide 169 text
FAUNUS
PATH ALGEBRA FOR HADOOP
[Figure: hercules and theseus each have a battled edge to cretan bull; a derived ally edge connects theseus and hercules.]
Derived graphs are single-relational and are typically much smaller than
their multi-relational source. Therefore, derived graphs can be subjected to
"textbook-style" graph algorithms in both a meaningful and efficient manner.
WHO IS THE MOST CENTRAL ALLY?
A · A ◦ n(I)
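The derivation of the ally graph can be worked through on a three-vertex example. The sketch below rests on two assumptions that the slide text does not spell out: the second factor is taken to be the transpose of A (the figure joins two battled edges that point into the same vertex), and n(I) is read as the complement of the identity matrix, which removes self-loops. The class PathAlgebraSketch is hypothetical and uses dense in-memory arrays rather than Map/Reduce.

```java
import java.util.Arrays;

// Hypothetical sketch: derive a single-relational "ally" graph from "battled".
final class PathAlgebraSketch {

    static int[][] multiply(int[][] x, int[][] y) {
        int[][] out = new int[x.length][y[0].length];
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < y[0].length; j++) {
                for (int k = 0; k < y.length; k++) {
                    out[i][j] += x[i][k] * y[k][j];
                }
            }
        }
        return out;
    }

    static int[][] transpose(int[][] x) {
        int[][] t = new int[x[0].length][x.length];
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < x[0].length; j++) {
                t[j][i] = x[i][j];
            }
        }
        return t;
    }

    // Entry-wise (Hadamard) product, written as the small circle on the slide.
    static int[][] hadamard(int[][] x, int[][] y) {
        int[][] out = new int[x.length][x[0].length];
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < x[0].length; j++) {
                out[i][j] = x[i][j] * y[i][j];
            }
        }
        return out;
    }

    // Assumed reading of n(I): 1 everywhere except on the diagonal.
    static int[][] notIdentity(int n) {
        int[][] out = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                out[i][j] = (i == j) ? 0 : 1;
            }
        }
        return out;
    }

    // Entry (i, j) counts the opponents that i and j have both battled, for i != j.
    static int[][] allies(int[][] battled) {
        int[][] sharedOpponents = multiply(battled, transpose(battled));
        return hadamard(sharedOpponents, notIdentity(battled.length));
    }

    public static void main(String[] args) {
        // Row/column order: hercules, theseus, cretan bull.
        int[][] battled = {
            {0, 0, 1},   // hercules battled cretan bull
            {0, 0, 1},   // theseus battled cretan bull
            {0, 0, 0}
        };
        System.out.println(Arrays.deepToString(allies(battled)));
    }
}
```

The resulting matrix has a single relation, so a textbook centrality algorithm can be run on it directly to answer the question on the slide.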
Slide 170
Slide 170 text
FAUNUS
PATH ALGEBRA FOR HADOOP
[Figure: a derived single-relational graph in which every edge is labeled ally.]
B · B ◦ n(I)
"My allies' allies are my allies."
B = A · A ◦ n(I)
(A · A)² ◦ n(I)
Slide 171
Slide 171 text
FAUNUS
PATH ALGEBRA FOR HADOOP
Implements the multi-relational path algebra
as a collection of Map/Reduce operations
Rodriguez M.A., Shinavier, J., “Exposing Multi-Relational Networks to
Single-Relational Network Analysis Algorithms,” Journal of Informetrics,
4(1), pp. 29-41, 2009. http://arxiv.org/abs/0806.2274
Support for "HadoopGraph" and HDFS file formats
Project codename: TinkerPoop
Reduce a massive property graph into a smaller
semantically-rich single-relational graph.
Used for global graph operations.
Slide 172
Slide 172 text
FULGORA
GODDESS OF LIGHTNING
Slide 173
Slide 173 text
FULGORA
AN EFFICIENT IN-MEMORY
GRAPH ENGINE
Non-transactional, in-memory graph engine.
It is not a "database."
Process ~90 billion edges in 68 GB of RAM
assuming a small world topology.
Perform complex graph algorithms in-memory.
global graph analysis
multi-relational graph analysis
Similar in spirit to Twitter's Cassovary: https://github.com/twitter/cassovary
Slide 174
Slide 174 text
Stores a massive-scale
property graph
Generates a large-scale
single-relational graph
Analyzes compressed, large-scale
single or multi-relational
graphs in memory
Map/Reduce
Load into RAM
on a single-machine
Update element properties with algorithm results
THE AURELIUS OLAP FLOW
to a stats package
Update graph with derived edges
Slide 175
Slide 175 text
Stores a massive-scale
property graph
Generates a large-scale
single-relational graph
Analyzes compressed, large-scale
single or multi-relational
graphs in memory
Map/Reduce
Load into RAM
on a single-machine
THE AURELIUS OLAP FLOW
to a stats package
theseus
hercules
ally
hercules
ally_centrality:0.0123
Slide 176
Slide 176 text
Stores a massive-scale
property graph
Generates a large-scale
single-relational graph
Analyzes compressed, large-scale
single or multi-relational
graphs in memory
THE AURELIUS OLAP FLOW
to a stats package
Slide 177
Slide 177 text
AURELIUS' USE OF BLUEPRINTS
Aurelius products use the Blueprints API so any
graph product can communicate with any other
graph product.
The code for graph databases, frameworks,
algorithms, and batch processing is written in terms
of the Blueprints API.
Aurelius encourages developers to use Blueprints/
TinkerPop in order to grow a rich ecosystem of
interoperable graph technologies.
Slide 178
Slide 178 text
THE GRAPH LANDSCAPE
REPRISE
Speed of Traversal/Process
Size of Graph/Structure
* Not to scale. Did not want to overlap logos.
Slide 179
Slide 179 text
NEXT STEPS
http://thinkaurelius.com
http://thinkaurelius.github.com/titan/
Learn about applying graph
theory and network science.
Make use of and/or contribute to the
free, open source Titan product.
Slide 180
Slide 180 text
THANK YOU
Slide 181
Slide 181 text
CREDITS
PRESENTERS
MARKO A. RODRIGUEZ
MATTHIAS BROECHELER
FINANCIAL SUPPORT
PEARSON EDUCATION
AURELIUS
LOCATION PROVISIONS
JIVE SOFTWARE
MANY THANKS TO
DAN LAROCQUE
TINKERPOP COMMUNITY
STEPHEN MALLETTE
BOBBY NORTON
KETRINA YIM