Titan: The Journey

Oct 28, 2013
Abhinav Ajgaonkar (@abh1nv)
Kent English (@kentenglish)

Why Titan?

But first of all, why a graph?

Social networks are a canonical example of data that is well suited to graph representation.

CrowdRiff is interested in the relationships established between people through activities.

So a graph is a natural fit, but then this happens…

So why a distributed graph database?

Why Titan specifically?
• Active, responsive and competent team
• Choice of backend storage (more on this later)
• Faunus
• Fulgora (soon)

The Aurelius Team
Dr. Marko A. Rodriguez
• TinkerPop cofounder and lead developer of the Gremlin graph traversal language and Faunus graph analytics engine
Dr. Matthias Broecheler
• Award-winning research includes high performance index structures and query answering algorithms for graph structured data
From http://thinkaurelius.com/team/

What is Faunus?
• Scalable, distributed global graph processing
• Analyzes graphs using a MapReduce implementation of the Gremlin graph traversal language

What is Fulgora (going to be)?
In-memory data processing for low latency query answering of both OLTP (real-time) and OLAP (batch) queries.
(Think Apache Giraph)

Titan is not a Graph Database

A Production Deployment Consists of:
1. Storage Backend
2. Titan Library
3. A server that ties it all together

Pluggable Storage Backends
Image credit: https://github.com/thinkaurelius/titan/wiki/Storage-Backend-Overview

Guess which one we picked?

New in 0.4.0
Image credits: http://www.hazelcast.com/images/logo.png, https://github.com/thinkaurelius/titan/wiki/Using-Persistit

A Production Deployment Consists of:
1. Storage Backend
2. Titan Library
3. A server that ties it all together

Titan Server

Embedded Mode
Rexster + Titan + Cassandra running in a single JVM
Image credit: https://github.com/thinkaurelius/titan/wiki/Using-Cassandra

Remote Cassandra Mode
Rexster + Titan with a remote Cassandra cluster
Image credit: https://github.com/thinkaurelius/titan/wiki/Using-Cassandra

The CrowdRiff Setup
Custom Query Engine with a remote Cassandra cluster (using the Titan Java Driver)

Gotchas

Titan releases are backwards incompatible :( (for now)

Edge Retrievals are not O(1) (by design)
Translation: Don’t even think about creating supernodes.
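To make the supernode warning concrete, here is a toy sketch of why per-vertex edge retrieval gets more expensive as a vertex accumulates edges. It is an in-memory adjacency list, not Titan's storage format or API; `ToyGraph`, `neighbours` and `supernodes` are hypothetical names invented for this illustration.

```python
from collections import defaultdict


class ToyGraph:
    """Toy adjacency-list graph; an illustration only, not Titan's storage format."""

    def __init__(self):
        # vertex -> list of (label, neighbour) pairs
        self.out_edges = defaultdict(list)

    def add_edge(self, src, label, dst):
        self.out_edges[src].append((label, dst))

    def neighbours(self, src, label):
        """Return neighbours reached over `label`.

        Every edge of `src` is scanned, so the cost grows with the
        vertex's degree rather than staying constant.
        """
        return [dst for (lbl, dst) in self.out_edges[src] if lbl == label]

    def supernodes(self, threshold):
        """Return the vertices whose out-degree is at least `threshold`."""
        return sorted(
            v for v, edges in self.out_edges.items() if len(edges) >= threshold
        )
```

A periodic check along the lines of `supernodes(threshold)` is one way to spot vertices that are heading toward supernode territory before traversals over them become slow; the right threshold depends on the data and is left as an assumption here.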

Indexing is tricky (choose your indexes very carefully)
For example:
• A secondary index (via Elasticsearch or Lucene) is required for enabling range queries.
• Elasticsearch requires that the field be unique in some direction and be present in all vertices. This would not be useful for indexing a timestamp field.
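As a rough illustration of why range queries need an ordered index rather than an exact-match lookup, the sketch below keeps (key, vertex id) pairs sorted and answers a range with two binary searches. It says nothing about how Elasticsearch or Lucene work internally; `SortedIndex` and its methods are hypothetical names, and the timestamps in the usage note are made up.

```python
import bisect


class SortedIndex:
    """Toy ordered index over (key, vertex_id) pairs; an illustration only."""

    def __init__(self):
        self._entries = []  # kept sorted by (key, vertex_id)

    def add(self, key, vertex_id):
        # insort keeps the list ordered, so range lookups can binary-search it.
        bisect.insort(self._entries, (key, vertex_id))

    def range(self, low, high):
        """Return vertex ids with low <= key < high, in key order."""
        # A 1-tuple (k,) sorts before every (k, vertex_id) pair, so these
        # two searches bracket exactly the keys in [low, high).
        start = bisect.bisect_left(self._entries, (low,))
        end = bisect.bisect_left(self._entries, (high,))
        return [vertex_id for _, vertex_id in self._entries[start:end]]
```

With a timestamp as the key, `index.range(1382918400, 1383004800)` would return the ids of vertices stamped inside that window. A plain hash map from timestamp to vertex could only answer "exactly this timestamp".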

Property Types are forever (even those you don’t declare explicitly)

Locking Exceptions in a highly concurrent environment (i.e. don’t forget your retry logic)
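The retry logic mentioned above might look something like the following minimal sketch. It does not use Titan's API: `LockingError` is a hypothetical stand-in for whatever locking exception the graph driver raises, and `with_retries`, the attempt count and the backoff delays are illustrative assumptions.

```python
import random
import time


class LockingError(Exception):
    """Hypothetical stand-in for a locking exception raised by the graph driver."""


def with_retries(operation, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Run operation(), retrying with jittered exponential backoff on LockingError.

    Re-raises the last LockingError once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except LockingError:
            if attempt == max_attempts:
                raise
            # Back off exponentially, with jitter so that concurrent
            # writers do not all retry at the same moment.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.5))
```

The `operation` passed in would presumably wrap a whole unit of work (read, mutate, commit), so that each retry starts from fresh state instead of replaying half a transaction.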

Faunus dislikes Supernodes

Questions?
