
Titan: The Journey

Lessons learned while deploying Titan to a production environment.

A few things to keep in mind:

* Everything said here should be taken with a grain of salt

* Our overall message was: "Titan is awesome"

* This was our first foray into the world of graphs; we are not experts by any stretch of the imagination

* These slides were meant as talking points, so they may not come across as intended when taken out of context

* I've been informed that Titan can handle supernodes just fine: http://thinkaurelius.com/2012/10/25/a-solution-to-the-supernode-problem/

Cheers,
--Abhi

Abhinav Ajgaonkar

October 28, 2013

Transcript

  1. Social networks are a canonical example of data that is well suited to graph representation
  2. Why Titan specifically?
     * Active, responsive and competent team
     * Choice of backend storage (more on this later)
     * Faunus
     * Fulgora (soon)
  3. The Aurelius Team
     * Dr. Marko A. Rodriguez: TinkerPop cofounder and lead developer of the Gremlin graph traversal language and the Faunus graph analytics engine
     * Dr. Matthias Broecheler: award-winning research includes high-performance index structures and query-answering algorithms for graph-structured data
     From http://thinkaurelius.com/team/
  4. What is Faunus?
     * Scalable, distributed global graph processing
     * Analyzes graphs using a MapReduce implementation of the Gremlin graph traversal language
  5. What is Fulgora (going to be)?
     In-memory data processing for low-latency query answering of both OLTP (real-time) and OLAP (batch) queries.
     (Think Apache Giraph)
  6. A Production Deployment Consists of:
     1. Storage Backend
     2. Titan Library
     3. A server that ties it all together
  7. Embedded Mode
     Rexster + Titan + Cassandra running in a single JVM
     Image credit: https://github.com/thinkaurelius/titan/wiki/Using-Cassandra
  8. Remote Cassandra Mode
     Rexster + Titan with a remote Cassandra cluster
     Image credit: https://github.com/thinkaurelius/titan/wiki/Using-Cassandra
  9. Indexing is tricky (choose your indexes very carefully)
     For example: a secondary index (via Elasticsearch or Lucene) is required for enabling range queries.
     Elasticsearch requires that the field be unique in some direction and be present in all vertices. This makes it a poor fit for indexing a timestamp field.
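The "What is Faunus?" slide describes global graph analysis expressed as MapReduce. As a rough illustration of that map/shuffle/reduce shape only (not of Faunus itself, which runs on Hadoop with its own input formats and API), here is a minimal Python sketch that computes every vertex's out-degree from an edge list. The edge data and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical edge list for a tiny social graph: (out_vertex, label, in_vertex).
EDGES = [
    ("alice", "follows", "bob"),
    ("alice", "follows", "carol"),
    ("bob", "follows", "carol"),
    ("carol", "follows", "alice"),
]


def map_phase(edge):
    """Emit (vertex, 1) for the edge's outgoing vertex: one unit of out-degree."""
    out_v, _label, _in_v = edge
    yield out_v, 1


def shuffle(pairs):
    """Group intermediate values by key, as a MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the partial counts emitted for a single vertex."""
    return key, sum(values)


def out_degrees(edge_list):
    """Out-degree per vertex. Vertices with no outgoing edges emit nothing
    in the map phase, so they are absent from the result."""
    intermediate = chain.from_iterable(map_phase(e) for e in edge_list)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())
```

Calling `out_degrees(EDGES)` gives `{"alice": 2, "bob": 1, "carol": 1}`. In a real cluster each phase runs in parallel across machines, which is what makes the approach suitable for whole-graph computations that are too large for a single process.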
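The indexing slide's point about range queries comes down to index structure. An index that only answers exact-match lookups cannot serve a predicate like `lo <= timestamp < hi` without checking every key. An ordered index can binary-search to the start of the range and read a contiguous slice, which is roughly why a search backend such as Lucene or Elasticsearch is needed for range queries. The Python sketch below shows that difference with in-memory stand-ins. It does not use Titan, Lucene, or Elasticsearch APIs, and the vertex data and function names are hypothetical.

```python
import bisect
from collections import defaultdict

# Hypothetical vertices as (vertex_id, timestamp) pairs; illustrative data only.
VERTICES = [(3, 300), (1, 100), (4, 400), (2, 200)]

# Equality-only index: exact value -> list of vertex ids, like a hash lookup.
equality_index = defaultdict(list)
for vid, ts in VERTICES:
    equality_index[ts].append(vid)

# Ordered index: (value, vertex_id) pairs kept sorted by value.
ordered_index = sorted((ts, vid) for vid, ts in VERTICES)
ordered_keys = [ts for ts, _ in ordered_index]


def range_query(lo, hi):
    """Ids of vertices with lo <= timestamp < hi, via two binary searches."""
    start = bisect.bisect_left(ordered_keys, lo)
    end = bisect.bisect_left(ordered_keys, hi)
    return [vid for _, vid in ordered_index[start:end]]


def range_query_equality_only(lo, hi):
    """Same query with only an equality index: every distinct key is checked."""
    return sorted(
        vid
        for ts, ids in equality_index.items()
        if lo <= ts < hi
        for vid in ids
    )
```

For example, `range_query(200, 400)` returns `[2, 3]`. The equality-only version returns the same ids, but its cost grows with the number of distinct indexed values rather than with the size of the answer.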