Titan: The Rise of Big Graph Data

A graph is a data structure composed of vertices/dots and edges/lines. A graph database is a software system used to persist and process graphs. The common conception in today's database community is that there is a tradeoff between the scale of data and the complexity/interlinking of data. To challenge this understanding, Aurelius has developed Titan under the liberal Apache 2 license. Titan supports both the size of modern data and the modeling power of graphs to usher in the era of Big Graph Data. Novel techniques in edge compression, data layout, and vertex-centric indices that exploit significant orders are used to facilitate the representation and processing of a single atomic graph structure across a multi-machine cluster. To ensure ease of adoption by the graph community, Titan natively implements the TinkerPop 2 Blueprints API. This presentation will review the graph landscape, Titan's techniques for scale by distribution, and a collection of satellite graph technologies to be released by Aurelius in the coming summer months of 2012.

Marko Rodriguez

June 14, 2012

Transcript

  1. TITAN: THE RISE OF BIG GRAPH DATA

    MARKO A. RODRIGUEZ, MATTHIAS BROECHELER http://THINKAURELIUS.COM
  2. ABSTRACT

    A graph is a data structure composed of vertices/dots and edges/lines. A graph database is a software system used to persist and process graphs. The common conception in today's database community is that there is a tradeoff between the scale of data and the complexity/interlinking of data. To challenge this understanding, Aurelius has developed Titan under the liberal Apache 2 license. Titan supports both the size of modern data and the modeling power of graphs to usher in the era of Big Graph Data. Novel techniques in edge compression, data layout, and vertex-centric indices that exploit significant orders are used to facilitate the representation and processing of a single atomic graph structure across a multi-machine cluster. To ensure ease of adoption by the graph community, Titan natively implements the TinkerPop 2 Blueprints API. This presentation will review the graph landscape, Titan's techniques for scale by distribution, and a collection of satellite graph technologies to be released by Aurelius in the coming summer months of 2012.
  3. SPEAKER BIOGRAPHIES

    Dr. Marko A. Rodriguez is the founder of the graph consulting firm Aurelius. He has focused his academic and commercial career on the theoretical and applied aspects of graphs. Marko is a cofounder of TinkerPop and the primary developer of the Gremlin graph traversal language.

    Dr. Matthias Broecheler has been researching and developing large-scale graph database systems for many years in both academia and in his role as a cofounder of the Aurelius graph consulting firm. He is the primary developer of the distributed graph database Titan. Matthias focuses most of his time and effort on novel OLTP and OLAP graph processing solutions.
  4. SPONSORS

    As the leading education services company, Pearson is serious about evolving how the world learns. We apply our deep education experience and research, invest in innovative technologies, and promote collaboration throughout the education ecosystem. Real change is our commitment and its results are delivered through connecting capabilities to create actionable, scalable solutions that improve access, affordability, and achievement.

    Aurelius is a team of software engineers and scientists committed to applying graph theory and network science to problems in numerous domains. Aurelius develops the theory and technology whereby graphs can be used to model, understand, predict, and influence the behavior of complex, interrelated social, economic, and physical networks.

    Jive is the pioneer and world's leading provider of social business solutions. Our products apply powerful technology that helps people connect, communicate and collaborate to get more work done and solve their biggest business challenges. Millions of users and many of the world's most successful companies rely on Jive day in and day out to get work done, serve their customers and stay ahead of their competitors.
  5. OUTLINE

    1. THE GRAPH LANDSCAPE: An introduction to graph computing. Graph technologies on the market today.
    2. INTRODUCTION TO TITAN: Getting up and running with Titan. Titan's techniques for scalability.
    3. THE FUTURE OF AURELIUS: Satellite technologies and the OLAP story. The graph landscape reprise.
  6. PART 1: THE GRAPH LANDSCAPE MARKO A. RODRIGUEZ

  7. GRAPH

  8. VERTEX EDGE GRAPH

  9. VERTEX EDGE GRAPH G = (V, E) Graph Vertices Edges

  10. G = (V, E) Classic Textbook Graph Structure

  11. V A homogeneous set of vertices...

  12. E ...connected by a homogeneous set of edges.

  13. RESTRICTED MODELING People and follows relationships...

  14. RESTRICTED MODELING People and follows relationships... ...xor webpages and citations.

  15. AN INTEGRATED MODEL IS TYPICALLY DESIRED mentions follows references references

    createdBy references follows
  16. AN INTEGRATED MODEL IS USEFUL mentions follows references references createdBy

    references follows Allows for more interesting/novel algorithms. Allows for a universal model of things and their relationships. (beyond "textbook" graph algorithms) (a single, unified model of a domain of interest)
  17. THE PROPERTY GRAPH Current Popular Graph Structure G = (V,

    E, λ) * Directed, attributed, edge-labeled graph * Multi-relational graph with key/value pairs on the elements
  18. VERTEX

  19. VERTEX name:hercules PROPERTIES

  20. VERTEX name:hercules PROPERTIES KEY VALUE

  21. name:hercules

  22. name:hercules mother name:alcmene type:human

  23. name:hercules mother name:alcmene type:human EDGE LABEL

  24. name:hercules mother name:alcmene type:human

  25. name:hercules mother name:alcmene type:human name:jupiter type:god father

  26. name:hercules mother name:alcmene type:human name:jupiter type:god father IS HERCULES A

    DEMIGOD? DEMIGOD = HALF HUMAN + HALF GOD
  27. name:hercules mother name:alcmene type:human name:jupiter type:god father gremlin> hercules ==>v[0]

  28. name:hercules mother name:alcmene type:human name:jupiter type:god father gremlin> hercules.out('mother','father') ==>v[1]

    ==>v[2]
  29. name:hercules mother name:alcmene type:human name:jupiter type:god father gremlin> hercules.out('mother','father').type ==>human

    ==>god DEMIGOD = HALF HUMAN + HALF GOD
  30. name:hercules type:demigod mother name:alcmene type:human name:jupiter type:god father gremlin> hercules.type

    = 'demigod' ==>demigod DEMIGOD = HALF HUMAN + HALF GOD
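The demigod walk-through on the preceding slides can be mimicked outside of Gremlin. The following is a minimal in-memory sketch in Python, not Titan or Blueprints code: the dictionary layout and the `out` helper are illustrative assumptions, and only the vertex ids, property values, and edge labels come from the slides.

```python
# Minimal in-memory property graph (illustrative; not the Blueprints API).
# Vertex ids 0, 1, 2 follow the slide output v[0], v[1], v[2].
vertices = {
    0: {"name": "hercules"},
    1: {"name": "alcmene", "type": "human"},
    2: {"name": "jupiter", "type": "god"},
}

# Each edge is (tail vertex id, edge label, head vertex id).
edges = [
    (0, "mother", 1),
    (0, "father", 2),
]


def out(vertex_id, *labels):
    """Return ids of vertices reached over outgoing edges with any of the labels."""
    return [head for tail, label, head in edges
            if tail == vertex_id and label in labels]


# Equivalent in spirit to hercules.out('mother','father').type
parents = out(0, "mother", "father")
parent_types = {vertices[v]["type"] for v in parents}

# DEMIGOD = HALF HUMAN + HALF GOD
if parent_types == {"human", "god"}:
    vertices[0]["type"] = "demigod"

print(vertices[0]["type"])
```

The point of the sketch is that the answer is not stored anywhere up front; it is derived by walking labeled edges and reading properties on the vertices reached.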
  31. STRUCTURE PROCESS COMPUTING

  32. GRAPH TRAVERSAL STRUCTURE PROCESS COMPUTING

  33. GRAPH TRAVERSAL STRUCTURE PROCESS COMPUTING GRAPH-BASED COMPUTING

  34. WHY GRAPH-BASED COMPUTING?

  35. WHY GRAPH-BASED COMPUTING? INTUITIVE MODELING

  36. WHY GRAPH-BASED COMPUTING? INTUITIVE MODELING EXPRESSIVE QUERYING

  37. WHY GRAPH-BASED COMPUTING? INTUITIVE MODELING EXPRESSIVE QUERYING NUMEROUS ANALYSES

    Centrality Mixing Patterns Geodesics Path Expressions Ranking Inference Motifs Scoring
  38. f( )ˠ ANALYSES ARE THE EPIPHENOMENA OF TRAVERSAL

  39. WHAT IS THE SIGNIFICANCE OF GRAPH ANALYSIS?

  40. ANALYSES YIELD INSIGHTS ABOUT THE MODEL = DATA PRODUCTS DATA-DRIVEN

    DECISION SUPPORT
  41. RECOMMENDATION People you may know. Products you might like. Movies

    you should watch and the friends you should watch them with. SOCIAL GRAPH RATINGS GRAPH SOCIAL+RATINGS GRAPH
  42. 0 2 hercules 1 3 5 4 6 cerberus nemean

    hydra knows knows knows pluto neptune jupiter knows knows knows knows knows WHO ELSE MIGHT HERCULES KNOW?
  43. 0 2 hercules 1 3 5 4 6 cerberus nemean

    hydra knows knows knows pluto neptune jupiter knows knows knows knows knows gremlin> hercules ==>v[0]
  44. 0 2 hercules 1 3 5 4 6 cerberus nemean

    hydra knows knows knows pluto neptune jupiter knows knows knows knows knows gremlin> hercules.out('knows') ==>v[1] ==>v[2] ==>v[3]
  45. 0 2 hercules 1 3 5 4 6 cerberus nemean

    hydra knows knows knows pluto neptune jupiter knows knows knows knows knows gremlin> hercules.out('knows').out('knows') ==>v[4] ==>v[5] ==>v[5] ==>v[6] ==>v[5]
  46. 0 2 hercules 1 3 5 4 6 cerberus nemean

    hydra knows knows knows pluto neptune jupiter knows knows knows knows knows gremlin> hercules.out('knows').out('knows').groupCount.cap ==>v[4]=1 ==>v[5]=3 ==>v[6]=1
  47. 0 2 hercules 1 3 5 4 6 cerberus nemean

    hydra knows knows knows pluto neptune jupiter knows knows knows knows knows knows HERCULES PROBABLY KNOWS NEPTUNE
  48. 0 2 hercules 1 3 5 4 6 cerberus nemean

    hydra knows knows knows pluto neptune jupiter knows knows knows knows knows knows HERCULES PROBABLY KNOWS NEPTUNE THIS IS A "TEXTBOOK STYLE" GRAPH
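The friend-of-a-friend recommendation built up on these slides amounts to a two-step walk followed by a count. Below is a hedged Python sketch of that idea. The slides fix that hercules (vertex 0) knows vertices 1, 2, and 3 and that the two-hop result is v[4], v[5], v[5], v[6], v[5]; the exact wiring from 1, 2, 3 to 4, 5, 6 is an assumption chosen to reproduce that output.

```python
from collections import Counter

# 'knows' adjacency by vertex id.
# The second-hop wiring is assumed; it reproduces the slide's two-hop output.
knows = {
    0: [1, 2, 3],
    1: [4, 5],
    2: [5],
    3: [6, 5],
}


def out(vertex_ids, adjacency):
    """Yield every vertex one outgoing step away from each input vertex."""
    for v in vertex_ids:
        yield from adjacency.get(v, [])


# hercules.out('knows').out('knows')
two_hops = list(out(out([0], knows), knows))

# .groupCount: how many distinct paths reach each vertex
counts = Counter(two_hops)

# Recommend the most-reached vertex that hercules does not already know.
already_known = set(knows[0]) | {0}
candidates = {v: n for v, n in counts.items() if v not in already_known}
best = max(candidates, key=candidates.get)

print(two_hops, dict(counts), best)
```

Vertex 5 is reached over three distinct paths, which is the basis for the slide's conclusion that hercules probably knows neptune.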
  49. 0 2 hercules 1 3 5 4 6 cerberus nemean

    hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother ...PROBABLY MORE SO WHEN OTHER TYPES OF EDGES ARE ANALYZED HERCULES PROBABLY KNOWS NEPTUNE
  50. 0 2 hercules 1 3 5 4 6 cerberus nemean

    hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother
  51. 0 2 hercules 1 3 5 4 6 cerberus nemean

    hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother likes
  52. 0 2 hercules 1 3 5 4 6 cerberus nemean

    hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother likes SOCIAL GRAPH
  53. 0 2 hercules 1 3 5 4 6 cerberus nemean

    hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother 7 human flesh likes SOCIAL GRAPH
  54. 0 2 hercules 1 3 5 4 6 cerberus nemean

    hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother 7 human flesh likes likes likes SOCIAL GRAPH
  55. 0 2 hercules 1 3 5 4 6 cerberus nemean

    hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother 7 human flesh likes likes likes 8 tartarus SOCIAL GRAPH
  56. 0 2 hercules 1 3 5 4 6 cerberus nemean

    hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother 7 human flesh likes likes likes 8 likes likes tartarus dislikes SOCIAL GRAPH
  57. 0 2 hercules 1 3 5 4 6 cerberus nemean

    hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother 7 human flesh likes likes likes 8 likes likes tartarus dislikes SOCIAL GRAPH RATINGS GRAPH
  58. 0 2 hercules 1 3 5 4 6 cerberus nemean

    hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother 7 human flesh likes likes likes composedOf 8 likes likes tartarus dislikes NEMEAN MIGHT LIKE TARTARUS smellsOf SOCIAL GRAPH RATINGS GRAPH PRODUCT GRAPH * Collaborative Filtering + Content-Based Recommendation
  59. PATH FINDING How is this person related to this film?

    Which authors of this book also wrote a New York Times bestseller? Which movies are based on a book by a New York Times bestseller? MOVIE GRAPH BOOK GRAPH MOVIE+BOOK GRAPH
  60. 0 hercules arnold schwarzenegger hasActor 7 hercules in new york

    depictedIn 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn WHO PLAYED HERCULES IN WHAT MOVIE?
  61. 0 hercules arnold schwarzenegger hasActor 7 hercules in new york

    depictedIn gremlin> hercules ==>v[0] 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn WHO PLAYED HERCULES IN WHAT MOVIE?
  62. 0 hercules arnold schwarzenegger hasActor 7 hercules in new york

    depictedIn gremlin> hercules.out('depictedIn') ==>v[7] 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn WHO PLAYED HERCULES IN WHAT MOVIE?
  63. 0 hercules arnold schwarzenegger hasActor 7 hercules in new york

    depictedIn gremlin> hercules.out('depictedIn').as('movie') ==>v[7] 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn movie WHO PLAYED HERCULES IN WHAT MOVIE?
  64. 0 hercules arnold schwarzenegger hasActor 7 hercules in new york

    depictedIn gremlin> hercules.out('depictedIn').as('movie').out('hasActor') ==>v[8] ==>v[10] 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn movie WHO PLAYED HERCULES IN WHAT MOVIE?
  65. 0 hercules arnold schwarzenegger hasActor 7 hercules in new york

    depictedIn gremlin> hercules.out('depictedIn').as('movie').out('hasActor') .out('role') ==>v[0] ==>v[6] 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn movie WHO PLAYED HERCULES IN WHAT MOVIE?
  66. 0 hercules arnold schwarzenegger hasActor 7 hercules in new york

    depictedIn gremlin> hercules.out('depictedIn').as('movie').out('hasActor') .out('role').retain(hercules) ==>v[0] 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn movie WHO PLAYED HERCULES IN WHAT MOVIE?
  67. 0 hercules arnold schwarzenegger hasActor 7 hercules in new york

    depictedIn gremlin> hercules.out('depictedIn').as('movie').out('hasActor') .out('role').retain(hercules).back(2) ==>v[8] 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn movie WHO PLAYED HERCULES IN WHAT MOVIE?
  68. 0 hercules arnold schwarzenegger hasActor 7 hercules in new york

    depictedIn gremlin> hercules.out('depictedIn').as('movie').out('hasActor') .out('role').retain(hercules).back(2).out('actor') ==>v[9] 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn movie WHO PLAYED HERCULES IN WHAT MOVIE?
  69. 0 hercules arnold schwarzenegger hasActor 7 hercules in new york

    depictedIn gremlin> hercules.out('depictedIn').as('movie').out('hasActor') .out('role').retain(hercules).back(2).out('actor') .as('star') ==>v[9] 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn movie star WHO PLAYED HERCULES IN WHAT MOVIE?
  70. 0 hercules arnold schwarzenegger hasActor 7 hercules in new york

    depictedIn gremlin> hercules.out('depictedIn').as('movie').out('hasActor') .out('role').retain(hercules).back(2).out('actor') .as('star').select ==>[movie:v[7], star:v[9]] 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn movie star WHO PLAYED HERCULES IN WHAT MOVIE?
  71. 0 hercules arnold schwarzenegger hasActor 7 hercules in new york

    depictedIn gremlin> hercules.out('depictedIn').as('movie').out('hasActor') .out('role').retain(hercules).back(2).out('actor') .as('star').select{it.name} ==>[movie:hercules in new york, star:arnold schwarzenegger] 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn movie star WHO PLAYED HERCULES IN WHAT MOVIE?
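The traversal assembled step by step on these slides can be read as nested loops with a filter. The Python sketch below is an illustration of that reading, not Gremlin semantics in full: the edge list is transcribed from the slide diagram, the helper names are hypothetical, and the assignment of "ernest graves" to vertex 11 is inferred from the diagram rather than from any query output.

```python
# Names for the vertices that carry one in the slide diagram.
names = {
    0: "hercules",
    6: "jupiter",
    7: "hercules in new york",
    9: "arnold schwarzenegger",
    11: "ernest graves",  # inferred from the diagram layout
}

# (tail, label, head); vertices 8 and 10 are the casting nodes.
edges = [
    (0, "depictedIn", 7), (6, "depictedIn", 7),
    (7, "hasActor", 8), (7, "hasActor", 10),
    (8, "role", 0), (10, "role", 6),
    (8, "actor", 9), (10, "actor", 11),
]


def out(vertex_id, label):
    """Vertices reached from vertex_id over outgoing edges with this label."""
    return [h for t, l, h in edges if t == vertex_id and l == label]


def who_played(character):
    """Return movie/star pairs for a character vertex."""
    results = []
    for movie in out(character, "depictedIn"):          # .as('movie')
        for casting in out(movie, "hasActor"):
            # .out('role').retain(character).back(2): keep only castings
            # whose role is the character we started from
            if character in out(casting, "role"):
                for star in out(casting, "actor"):      # .as('star')
                    results.append({"movie": names[movie], "star": names[star]})
    return results


print(who_played(0))
```

The `retain` and `back(2)` steps in the Gremlin version play the part of the `if` test here: they discard castings for other roles while keeping the traverser positioned on the casting vertex.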
  72. 0 hercules arnold schwarzenegger hasActor 7 hercules in new york

    depictedIn 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn
  73. 0 hercules arnold schwarzenegger hasActor 7 hercules in new york

    depictedIn 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn
  74. 0 hercules arnold schwarzenegger hasActor 7 hercules in new york

    depictedIn 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn 12 the arms of hercules depictedIn
  75. 0 hercules arnold schwarzenegger hasActor 7 hercules in new york

    depictedIn 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn 12 the arms of hercules fred saberhagen 13 depictedIn writtenBy
  76. 0 hercules arnold schwarzenegger hasActor 7 hercules in new york

    depictedIn 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn 12 the arms of hercules fred saberhagen 13 depictedIn writtenBy 14 albuquerque livesIn
  77. 0 hercules arnold schwarzenegger hasActor 7 hercules in new york

    depictedIn 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn 12 the arms of hercules fred saberhagen 13 depictedIn writtenBy 14 albuquerque livesIn 15 25-North santa fe
  78. 0 hercules arnold schwarzenegger hasActor 7 hercules in new york

    depictedIn 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn 12 the arms of hercules fred saberhagen 13 depictedIn writtenBy 14 albuquerque livesIn 15 25-North santa fe 16 marko rodriguez livesIn
  79. 0 hercules arnold schwarzenegger hasActor 7 hercules in new york

    depictedIn 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn 12 the arms of hercules fred saberhagen 13 depictedIn writtenBy 14 albuquerque livesIn 15 25-North santa fe 16 marko rodriguez livesIn thinksHeIs
  80. 0 hercules arnold schwarzenegger hasActor 7 hercules in new york

    depictedIn 10 8 actor role 9 hasActor 6 role jupiter ernest graves actor 11 depictedIn 12 the arms of hercules fred saberhagen 13 depictedIn writtenBy 14 albuquerque livesIn 15 25-North santa fe 16 marko rodriguez livesIn thinksHeIs MOVIE GRAPH BOOK GRAPH TRANSPORTATION GRAPH PROFILE GRAPH
  81. SOCIAL INFLUENCE Who are the most influential people in java,

    mathematics, art, surreal art, politics, ...? Which region of the social graph will propagate this advertisement the furthest? Which 3 experts should review this submitted article? Which people should I talk to at the upcoming conference and what topics should I talk to them about? SOCIAL + COMMUNICATION + EXPERTISE + EVENT GRAPH
  82. PATTERN IDENTIFICATION This connectivity pattern is a sign of financial

    fraud. When this motif is found, a red flag will be raised. Healthy discourse is typified by a discussion board with a branch factor in this range and a concept clique score in this range. TRANSACTION GRAPH DISCUSSION GRAPH
  83. KNOWLEDGE DISCOVERY The terms "ice", "fans", and "stanley cup" are classified

    as "sports". Given that all identified birds fly, it can be deduced that all birds fly. If contrary evidence is provided, then this "fact" can be retracted. WIKIPEDIA GRAPH EVIDENTIAL LOGIC GRAPH
  84. WORLD MODEL

  85. WORLD MODEL WORLD PROCESSES

  86. WORLD MODEL WORLD PROCESSES A single world model and various

    types of traversers moving through that model to solve problems.
  87. GRAPH TRAVERSAL STRUCTURE PROCESS COMPUTING GRAPH-BASED COMPUTING

  88. GRAPH COMPUTING ENGINES

  89. MEMORY-BASED GRAPHS Application iGraph http://igraph.sourceforge.net/ NetworkX http://networkx.lanl.gov/ JUNG http://jung.sourceforge.net/ Graph

    Framework
  90. Application Application DISK-BASED GRAPHS Application Neo4j http://neo4j.org/ OrientDB http://orientdb.org Graph

    Database InfiniteGraph http://objectivity.com DEX http://www.sparsity-technologies.com/dex
  91. CLUSTER-BASED GRAPHS Hama http://incubator.apache.org/hama/ Giraph http://incubator.apache.org/giraph/ GoldenOrb http://goldenorbos.org/ Application 3

    Application 2 Application 1 Bulk Synchronous Parallel Processing * In the same spirit as Google's Pregel
  92. MEMORY-BASED GRAPHS Graph size is constrained by local machine's RAM.

    Rich graph algorithm and visualization packages. Oriented towards "textbook-style" graphs. * Based on typical behavior
  93. MEMORY-BASED GRAPHS: Graph size is constrained by local machine's RAM. Rich graph algorithm and visualization packages. Oriented towards "textbook-style" graphs.

    DISK-BASED GRAPHS: Graph size is constrained by local disk. Optimized for local graph algorithms. Oriented towards property graphs. * Based on typical behavior
  94. MEMORY-BASED GRAPHS: Graph size is constrained by local machine's RAM. Rich graph algorithm and visualization packages. Oriented towards "textbook-style" graphs.

    DISK-BASED GRAPHS: Graph size is constrained by local disk. Optimized for local graph algorithms. Oriented towards property graphs. CLUSTER-BASED GRAPHS: Graph size is constrained to cluster's total RAM. Optimized for global graph algorithms. Oriented towards "textbook-style" graphs. * Based on typical behavior
  95. TINKERPOP Open source graph product group Support for various graph

    vendors Provides a vendor-agnostic graph framework * Encompassing the various graph computing styles Simple, well-defined products * Based on future directions http://tinkerpop.com
  96. TINKERPOP Generic Graph API Dataflow Processing Traversal Language Object-Graph Mapper

    Graph Algorithms Graph Server http://tinkerpop.com http://${project.name}.tinkerpop.com
  97. TINKERPOP INTEGRATION http://tinkerpop.com

  98. AND NOW THERE IS ANOTHER...

  104. TITAN

  105. PART 2: INTRODUCTION TO TITAN MATTHIAS BROECHELER

  106. WHY CREATE TITAN? A number of Aurelius' clients...

    ...need to represent and process graphs at the 100+ billion edge scale w/ thousands of concurrent transactions. ...desire a free, open source distributed graph database. ...need both local graph traversals (OLTP) and batch graph processing (OLAP).
  107. TITAN'S KEY FEATURES Titan provides...

    ..."infinite size" graphs and "unlimited" users by means of a distributed storage engine. ...distribution via the liberal, free, open source Apache 2 license. ...real-time local traversals (OLTP) and support for global batch processing via Hadoop (OLAP).
  108. matthias$

  109. matthias$ wget http://thinkaurelius/titan.zip % Total % Received % Xferd Average

    Speed Time Time 100 99999 0 99999 0 0 11078 0 --:--:-- 0:01:01 matthias$
  110. matthias$ wget http://thinkaurelius/titan.zip % Total % Received % Xferd Average

    Speed Time Time 100 99999 0 99999 0 0 11078 0 --:--:-- 0:01:01 matthias$ unzip titan.zip Archive: titan.zip creating: titan/ ... matthias$
  111. matthias$ wget http://thinkaurelius/titan.zip % Total % Received % Xferd Average

    Speed Time Time 100 99999 0 99999 0 0 11078 0 --:--:-- 0:01:01 matthias$ unzip titan.zip Archive: titan.zip creating: titan/ ... matthias$ cd titan titan$
  112. matthias$ wget http://thinkaurelius/titan.zip % Total % Received % Xferd Average

    Speed Time Time 100 99999 0 99999 0 0 11078 0 --:--:-- 0:01:01 matthias$ unzip titan.zip Archive: titan.zip creating: titan/ ... matthias$ cd titan titan$ bin/gremlin.sh \,,,/ (o o) -----oOOo-(_)-oOOo----- gremlin>
  113. gremlin> g = TitanFactory.open('/tmp/local-titan') ==>titangraph[local:/tmp/local-titan]

  114. gremlin> g = TitanFactory.open('/tmp/local-titan') ==>titangraph[local:/tmp/local-titan] LOCAL MACHINE MODE

  115. gremlin> g.createKeyIndex('name',Vertex.class) ==>null gremlin> g.stopTransaction(SUCCESS) ==>null

  116. gremlin> g.loadGraphML('data/graph-of-the-gods.xml') ==>null name:tartarus type:location name:pluto type:god lives brother name:jupiter

    type:god brother name:neptune type:god pet name:cerberus type:monster lives father name:saturn type:titan brother name:sea type:location lives name:sky type:location lives father battled name:hercules type:demigod name:hydra type:monster battled name:nemean type:monster battled name:alcmene type:human mother time:1 time:2 time:12 * The Graph of the Gods is a toy dataset distributed with Titan
  117. gremlin> hercules = g.V('name','hercules').next() ==>v[24] name:tartarus type:location name:pluto type:god lives

    brother name:jupiter type:god brother name:neptune type:god pet name:cerberus type:monster lives father name:saturn type:titan brother name:sea type:location lives name:sky type:location lives father battled name:hercules type:demigod name:hydra type:monster battled name:nemean type:monster battled name:alcmene type:human mother time:1 time:2 time:12
  118. gremlin> hercules.out('mother','father') ==>v[44] ==>v[16] name:tartarus type:location name:pluto type:god lives brother

    name:jupiter type:god brother name:neptune type:god pet name:cerberus type:monster lives father name:saturn type:titan brother name:sea type:location lives name:sky type:location lives father battled name:hercules type:demigod name:hydra type:monster battled name:nemean type:monster battled name:alcmene type:human mother time:1 time:2 time:12
  119. gremlin> hercules.out('mother','father').name ==>alcmene ==>jupiter name:tartarus type:location name:pluto type:god lives brother

    name:jupiter type:god brother name:neptune type:god pet name:cerberus type:monster lives father name:saturn type:titan brother name:sea type:location lives name:sky type:location lives father battled name:hercules type:demigod name:hydra type:monster battled name:nemean type:monster battled name:alcmene type:human mother time:1 time:2 time:12
  120. THAT WAS TITAN LOCAL. NEXT IS TITAN DISTRIBUTED. Broecheler, M.,

    Pugliese, A., Subrahmanian, V.S., "COSI: Cloud Oriented Subgraph Identification in Massive Social Networks," Proceedings of the 2010 International Conference on Advances in Social Networks Analysis and Mining, pp. 248-255, 2010. http://www.knowledgefrominformation.com/2010/08/01/cosi-cloud-oriented-subgraph-identification-in-massive-social-networks/
  121. -OR- BACKEND AGNOSTIC

  122. titan$ bin/gremlin.sh \,,,/ (o o) -----oOOo-(_)-oOOo----- gremlin> conf = new

    BaseConfiguration(); ==>org.apache.commons.configuration.BaseConfiguration@763861e6 gremlin> conf.setProperty("storage.backend","cassandra"); gremlin> conf.setProperty("storage.hostname","77.77.77.77"); gremlin> g = TitanFactory.open(conf); ==>titangraph[cassandra:77.77.77.77] gremlin> TITAN DISTRIBUTED VIA CASSANDRA * There are numerous graph configurations: https://github.com/thinkaurelius/titan/wiki/Graph-Configuration
  123. INHERITED FEATURES Continuously available with no single point of failure.

    Cassandra available at http://cassandra.apache.org/ No write bottlenecks to the graph as there is no master/slave architecture. Elastic scalability allows for the introduction and removal of machines. Caching layer ensures that continuously accessed data is available in memory. Built-in replication ensures data is available during machine failure.
  124. titan$ bin/gremlin.sh \,,,/ (o o) -----oOOo-(_)-oOOo----- gremlin> conf = new

    BaseConfiguration(); ==>org.apache.commons.configuration.BaseConfiguration@763861e6 gremlin> conf.setProperty("storage.backend","hbase"); gremlin> conf.setProperty("storage.hostname","77.77.77.77"); gremlin> g = TitanFactory.open(conf); ==>titangraph[hbase:77.77.77.77] gremlin> TITAN DISTRIBUTED VIA HBASE * There are numerous graph configurations: https://github.com/thinkaurelius/titan/wiki/Graph-Configuration
  125. INHERITED FEATURES Linear scalability with the addition of machines. HBase

    available at http://hbase.apache.org/ Strictly consistent reads and writes. HDFS-based data replication. Base classes for backing Hadoop MapReduce jobs with HBase tables. Generally good integration with the tools in the Hadoop ecosystem.
  126. TITAN AND THE CAP THEOREM Consistency Partitionability Availability

  127. Titan is all about ...

  128. Titan is all about numerous concurrent users...

  129. Titan is all about numerous concurrent users... high availability....

  130. Titan is all about numerous concurrent users... high availability.... dynamic

    scalability...
  131. EDGE COMPRESSION VERTEX-CENTRIC INDICES DATA MANAGEMENT THE HOW OF TITAN

  132. DATA MANAGEMENT THE HOW OF TITAN

  133. DATA MANAGEMENT MAIN DESIGN PRINCIPLES Optimistic Concurrency Control Fine-Grained Locking

    Control Immutable, Atomic Edges battled hercules cerberus battled hercules time:12 cerberus battled hercules time:12 successful:true cerberus 1 2 3 + + + + + -
  134. DATA MANAGEMENT hercules jupiter father father mars Functional Declarations Datatype

    Constraints TYPE DEFINITION TitanKey timeKey = g.makeType().name("time") .dataType(Integer.class) time:12 TitanLabel father = g.makeType().name("father") .functional() Edge Label Signatures TitanLabel battled = g.makeType().name("battled") .signature(timeKey) battled hercules time:12 cerberus time:"twelve" Data management configurations allow Titan to optimize how information is stored/retrieved from disk.
  135. DATA MANAGEMENT Unique Property Key/Value Pairs TYPE DEFINITION Endogenous Indices

    g.createKeyIndex("name",Vertex.class) name:hercules name:hermes name:jupiter name:jupiter status:king of the gods name:neptune status:king of the gods TitanKey status = g.makeType().name("status") .unique() Data management configurations allow Titan to optimize how information is stored/retrieved from disk.
  136. DATA MANAGEMENT Ensures consistency over non-consistent storage backends. LOCKING SYSTEM

    hercules neptune father father jupiter hercules write write father jupiter hercules 1. Acquire lock at the end of the transaction. - locking mechanism depends on storage layer consistency guarantees. 2. Verify original read. 3. Fail transaction if any precondition is violated.
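The three locking steps listed on this slide follow the general shape of optimistic concurrency control. The Python sketch below illustrates that general pattern under stated assumptions: `Store`, `Transaction`, and `ConflictError` are hypothetical names, the in-process `threading.Lock` stands in for whatever locking the storage layer provides, and none of this is Titan's actual implementation.

```python
import threading


class ConflictError(Exception):
    """Raised when a value read by a transaction changed before commit."""


class Store:
    """Toy key/value store standing in for a storage backend (an assumption)."""

    def __init__(self):
        self.data = {}
        self.lock = threading.Lock()


class Transaction:
    def __init__(self, store):
        self.store = store
        self.reads = {}   # key -> value observed at first read
        self.writes = {}  # key -> value to apply at commit

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        if key not in self.reads:
            self.reads[key] = self.store.data.get(key)
        return self.reads[key]

    def write(self, key, value):
        self.read(key)  # record what we saw as a precondition
        self.writes[key] = value

    def commit(self):
        with self.store.lock:                      # 1. acquire lock at end of tx
            for key, seen in self.reads.items():   # 2. verify original read
                if self.store.data.get(key) != seen:
                    raise ConflictError(key)       # 3. fail on violated precondition
            self.store.data.update(self.writes)


# Two transactions race to set hercules' functional 'father' edge.
store = Store()
t1, t2 = Transaction(store), Transaction(store)
t1.write(("hercules", "father"), "jupiter")
t2.write(("hercules", "father"), "neptune")
t1.commit()
try:
    t2.commit()
    outcome = "committed"
except ConflictError:
    outcome = "rejected"

print(outcome, store.data)
```

No lock is held while the transactions do their work; the cost of coordination is paid only at commit, and the losing writer is told to retry rather than silently overwriting.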
  137. DATA MANAGEMENT ID MANAGEMENT Global ID Pool Maintained by Storage

    Engine [0,1,2,3,4,5,6,7,8,9,10,11]
  138. DATA MANAGEMENT ID MANAGEMENT Pool Subsets Assigned to Individual Instances

    Global ID Pool Maintained by Storage Engine [0,1,2] [3,4,5] [6,7,8] [9,10,11] [0,1,2,3,4,5,6,7,8,9,10,11]
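The ID management slides describe a global pool carved into subsets that individual instances draw from. A rough Python sketch of block-based allocation follows. The class names, the block size of 3, and the in-process lock are assumptions for illustration; in the slides the pool is maintained by the storage engine, which is not modeled here.

```python
import threading


class GlobalIdPool:
    """Hypothetical stand-in for the pool maintained by the storage engine."""

    def __init__(self, size, block_size):
        self._next = 0
        self._size = size
        self._block_size = block_size
        self._lock = threading.Lock()

    def claim_block(self):
        """Hand out the next unclaimed contiguous range of ids."""
        with self._lock:
            if self._next >= self._size:
                raise RuntimeError("ID pool exhausted")
            start = self._next
            end = min(start + self._block_size, self._size)
            self._next = end
            return range(start, end)


class Instance:
    """A database instance that assigns ids locally from its claimed block."""

    def __init__(self, pool):
        self._pool = pool
        self._block = iter(())

    def next_id(self):
        try:
            return next(self._block)
        except StopIteration:
            # Only here does the instance coordinate with the global pool.
            self._block = iter(self._pool.claim_block())
            return next(self._block)


pool = GlobalIdPool(size=12, block_size=3)
a, b = Instance(pool), Instance(pool)
ids = [a.next_id(), b.next_id(), a.next_id(), a.next_id(), a.next_id()]
print(ids)
```

Instance `a` claims [0,1,2], `b` claims [3,4,5], and `a` returns to the pool only after exhausting its block, so most id assignments need no cross-machine coordination.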
  139. EDGE COMPRESSION THE HOW OF TITAN

  140. EDGE COMPRESSION Natural graphs have a small world, community/cluster property.

    Watts, D. J., Strogatz, S. H., "Collective Dynamics of 'Small-World' Networks," Nature 393 (6684), pp. 440–442, 1998. Community 1 Community 2 High intra-connectivity within a community and low inter-connectivity between communities.
  141. EDGE COMPRESSION

  142. EDGE COMPRESSION 12345678 12345683 knows

  143. EDGE COMPRESSION 12345678 12345683 knows

  144. EDGE COMPRESSION 12345678 12345683 knows 12345678 9 12345683 24 bytes

  145. EDGE COMPRESSION 12345678 12345683 knows 12345678 9 12345683 24 bytes

    12345678 9 +5
  146. EDGE COMPRESSION 12345678 12345683 knows 12345678 9 12345683 24 bytes

    12345678 9 +5 12345678 9 + 5 7 bytes
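The compression steps on these slides combine two ideas: store the neighbor as a small delta from the source vertex id, and store small numbers in fewer bytes. The Python sketch below illustrates both with a generic 7-bit variable-length integer and zigzag encoding. This particular byte layout is an assumption, not Titan's on-disk format, so its compressed size is not expected to match the slide's 7-byte figure exactly.

```python
def zigzag(n):
    """Map signed ints to unsigned so small negative deltas stay small."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def varint(n):
    """Encode a non-negative int in 7-bit groups, low-order group first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_edge(tail_id, label_id, head_id):
    """Store the head vertex as a delta from the tail vertex."""
    delta = head_id - tail_id
    return varint(tail_id) + varint(label_id) + varint(zigzag(delta))


# Slide example: 12345678 --knows(label id 9)--> 12345683
fixed_width = 3 * 8  # three 8-byte longs
compressed = encode_edge(12345678, 9, 12345683)
print(fixed_width, len(compressed))
```

The delta only stays small when connected vertices have nearby ids, which is why the small-world, community structure noted earlier matters: if ids are assigned so that community members sit close together, most edges compress well.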
  147. VERTEX-CENTRIC INDICES THE HOW OF TITAN

  148. VERTEX-CENTRIC INDICES Natural, real-world graphs contain vertices of high degree.

    Even if rare, their degree ensures that they exist on many paths. Traversing a high degree vertex means touching numerous incident edges and potentially touching most of the graph in only a few steps. THE SUPER NODE PROBLEM
  149. VERTEX-CENTRIC INDICES A "super node" only exists from the vantage

    point of classic "textbook style" graphs. In the world of property graphs, intelligent disk-level filtering can interpret a "super node" as a more manageable low-degree vertex. Vertex-centric querying utilizes B-Trees and sort orders for speedy lookup of incident edges with particular qualities. A SUPER NODE SOLUTION
  150. VERTEX-CENTRIC INDICES PUSHDOWN PREDICATES knows knows knows likes likes likes

    likes likes stars:5 stars:3 stars:3 stars:2 stars:2 vertex.query() 8 edges
  151. VERTEX-CENTRIC INDICES PUSHDOWN PREDICATES knows knows likes likes likes likes

    likes stars:5 stars:3 stars:3 stars:2 stars:2 vertex.query().direction(OUT) 7 edges
  152. VERTEX-CENTRIC INDICES PUSHDOWN PREDICATES likes likes likes likes likes stars:5

    stars:3 stars:3 stars:2 stars:2 vertex.query().direction(OUT) .labels("likes") 5 edges
  153. VERTEX-CENTRIC INDICES PUSHDOWN PREDICATES likes stars:5 1 edge vertex.query().direction(OUT) .labels("likes").has("stars",5)

  154. VERTEX-CENTRIC INDICES PUSHDOWN PREDICATES Query Query.direction(Direction) Query Query.labels(String... labels) Query

    Query.has(String, Object, Compare) Query Query.has(String, Object) Query Query.range(String, Object, Object) Iterable<Vertex> Query.vertices() Iterable<Edge> Query.edges() PREDICATES GETTERS
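The narrowing from 8 edges to 1 on the preceding slides can be reproduced with a small fluent query object. The Python sketch below only mirrors the filtering semantics of `direction`, `labels`, and `has`; `VertexQuery` is a hypothetical class, the filtering happens in memory rather than being pushed down to the storage layer, and which of the three `knows` edges is incoming is an assumption consistent with the slide counts.

```python
class VertexQuery:
    """Fluent filter over one vertex's incident edges (illustrative only)."""

    def __init__(self, incident_edges):
        self._edges = incident_edges
        self._filters = []

    def direction(self, d):
        self._filters.append(lambda e: e["direction"] == d)
        return self

    def labels(self, *labels):
        self._filters.append(lambda e: e["label"] in labels)
        return self

    def has(self, key, value):
        self._filters.append(lambda e: e.get(key) == value)
        return self

    def edges(self):
        return [e for e in self._edges if all(f(e) for f in self._filters)]


# 3 'knows' edges (one assumed incoming) and 5 outgoing 'likes' edges.
incident = (
    [{"direction": "IN", "label": "knows"}]
    + [{"direction": "OUT", "label": "knows"} for _ in range(2)]
    + [{"direction": "OUT", "label": "likes", "stars": s} for s in (5, 3, 3, 2, 2)]
)

sizes = [
    len(VertexQuery(incident).edges()),
    len(VertexQuery(incident).direction("OUT").edges()),
    len(VertexQuery(incident).direction("OUT").labels("likes").edges()),
    len(VertexQuery(incident).direction("OUT").labels("likes").has("stars", 5).edges()),
]
print(sizes)
```

In a database the benefit comes from evaluating these predicates next to the data, so that only the matching edges are ever read and returned; the sketch shows what is selected, not how it is retrieved.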
  155. VERTEX-CENTRIC INDICES battled battled battled knows knows time:1 time:2 time:12

    DISK-LEVEL SORTING/INDEXING
  156. VERTEX-CENTRIC INDICES battled battled battled knows knows battled knows time:1

    time:2 time:12 DISK-LEVEL SORTING/INDEXING
  157. VERTEX-CENTRIC INDICES battled battled battled knows knows battled w/ time

    1-5 knows TitanLabel battled = g.makeType().name("battled") .primaryKey(time) time:1 time:2 time:12 battled w/ time 5-10 DISK-LEVEL SORTING/INDEXING
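Sorting a vertex's `battled` edges by `time` is what makes range lookups such as "battled w/ time 1-5" cheap. A hedged Python sketch of the idea follows, using binary search over a sorted list in place of a disk-level index. The pairing of monsters to times follows the Graph of the Gods diagram as read here, and treating the upper bound as exclusive is an assumption.

```python
from bisect import bisect_left

# hercules' 'battled' edges kept sorted by time: (time, opponent).
battled = [(1, "nemean"), (2, "hydra"), (12, "cerberus")]


def battled_between(sorted_edges, low, high):
    """Edges with low <= time < high, located by binary search.

    The one-element tuple (t,) sorts before every (t, name) tuple,
    so bisect_left finds the first edge whose time is >= t.
    """
    start = bisect_left(sorted_edges, (low,))
    stop = bisect_left(sorted_edges, (high,))
    return sorted_edges[start:stop]


early = battled_between(battled, 1, 5)
middle = battled_between(battled, 5, 10)
late = battled_between(battled, 10, 13)
print(early, middle, late)
```

Because the edges are already in order, each lookup touches only the matching slice, which is how a high-degree vertex can behave like a low-degree one for a selective query.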
  158. VERTEX-CENTRIC INDICES DISK-LEVEL SORTING/INDEXING father battled knows brother mother

  159. VERTEX-CENTRIC INDICES DISK-LEVEL SORTING/INDEXING father battled knows brother mother

  160. VERTEX-CENTRIC INDICES DISK-LEVEL SORTING/INDEXING father battled knows brother mother family

    TypeGroup family = TypeGroup.of(2,"family");
    TitanLabel father = g.makeType().name("father").group(family).makeEdgeLabel();
    TitanLabel mother = g.makeType().name("mother").group(family).makeEdgeLabel();
    TitanLabel brother = g.makeType().name("brother").group(family).makeEdgeLabel();
  161. VERTEX-CENTRIC INDICES DISK-LEVEL SORTING/INDEXING father battled knows brother mother family

    vertex.query().group("family")...
  162. EDGE COMPRESSION, VERTEX-CENTRIC INDICES, DATA MANAGEMENT

    THAT IS HOW TITAN WORKS
  163. SIMULATING TWITTER

    WHAT IF YOU WANTED TO CREATE TWITTER FROM SCRATCH?
  164. 3 BILLION EDGES, 100 MILLION VERTICES, 10,000 CONCURRENT USERS, 50 MACHINES, 1 GRAPH DATABASE

    COMING JULY 2012
  165. PART 3: THE FUTURE OF AURELIUS

    MATTHIAS BROECHELER, MARKO A. RODRIGUEZ
  166. AURELIUS' GRAPH COMPUTING STORY Titan as the highly scalable, distributed

    graph database solution. OLTP
  167. AURELIUS' GRAPH COMPUTING STORY Titan as the highly scalable, distributed

    graph database solution. Titan as the source (and potential sink) for other graph processing solutions. OLTP OLAP
  168. FAUNUS GOD OF HERDS

  169. FAUNUS PATH ALGEBRA FOR HADOOP

    Diagram labels: hercules, theseus, cretan bull; edges: battled, battled, ally (between theseus and hercules). Derived graphs are single-relational and are typically much smaller than their multi-relational source. Therefore, derived graphs can be subjected to "textbook-style" graph algorithms in both a meaningful and efficient manner. WHO IS THE MOST CENTRAL ALLY? $A \cdot A^{\top} \circ n(I)$
  170. FAUNUS PATH ALGEBRA FOR HADOOP

    Diagram: a derived graph made up of ally edges. "My allies' allies are my allies." $B = A \cdot A^{\top} \circ n(I)$; $B \cdot B \circ n(I)$; $(A \cdot A^{\top})^{2} \circ n(I)$
  171. FAUNUS PATH ALGEBRA FOR HADOOP Implements the multi-relational path algebra

    as a collection of Map/Reduce operations. Rodriguez, M.A., Shinavier, J., "Exposing Multi-Relational Networks to Single-Relational Network Analysis Algorithms," Journal of Informetrics, 4(1), pp. 29-41, 2009. http://arxiv.org/abs/0806.2274 Support for "HadoopGraph" and HDFS file formats. Project codename: TinkerPoop. Reduce a massive property graph into a smaller, semantically rich single-relational graph. Used for global graph operations.
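The ally derivation on slide 169 can be worked through on a three-vertex example. The sketch below assumes that A is the adjacency matrix of the "battled" relation, that n(I) is the complement of the identity matrix, and that the final product is entrywise, so its only effect is to zero the diagonal and drop self-loops. `AllyPathAlgebraSketch` and its methods are hypothetical names. It uses plain in-memory arrays and is not the Map/Reduce implementation described for Faunus.

```java
// Hypothetical dense-matrix illustration of deriving "ally" edges from "battled" edges.
final class AllyPathAlgebraSketch {
    // Ordinary matrix product x * y.
    static int[][] multiply(int[][] x, int[][] y) {
        int rows = x.length;
        int inner = y.length;
        int cols = y[0].length;
        int[][] out = new int[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                for (int k = 0; k < inner; k++) {
                    out[i][j] += x[i][k] * y[k][j];
                }
            }
        }
        return out;
    }

    static int[][] transpose(int[][] x) {
        int[][] out = new int[x[0].length][x.length];
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < x[0].length; j++) {
                out[j][i] = x[i][j];
            }
        }
        return out;
    }

    // Entrywise product with the complement of the identity: zero the diagonal.
    static int[][] dropSelfLoops(int[][] x) {
        int[][] out = new int[x.length][x.length];
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < x.length; j++) {
                out[i][j] = (i == j) ? 0 : x[i][j];
            }
        }
        return out;
    }

    // Two vertices become allies when they battled a common vertex.
    static int[][] ally(int[][] battled) {
        return dropSelfLoops(multiply(battled, transpose(battled)));
    }

    public static void main(String[] args) {
        // Vertex order: 0 = hercules, 1 = theseus, 2 = cretan bull.
        int[][] battled = {
            {0, 0, 1},  // hercules battled cretan bull
            {0, 0, 1},  // theseus battled cretan bull
            {0, 0, 0}
        };
        int[][] allies = ally(battled);
        System.out.println(java.util.Arrays.deepToString(allies)); // [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
    }
}
```

The result links hercules and theseus in both directions and nothing else, which matches the single derived ally edge on the slide. Without the diagonal step, each hero would also be listed as his own ally.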
  172. FULGORA GODDESS OF LIGHTNING

  173. FULGORA AN EFFICIENT IN-MEMORY GRAPH ENGINE Non-transactional, in-memory graph engine.

    It is not a "database." Process ~90 billion edges in 68 GB of RAM, assuming a small-world topology. Perform complex graph algorithms in memory: global graph analysis and multi-relational graph analysis. Similar in spirit to Twitter's Cassovary: https://github.com/twitter/cassovary
  174. THE AURELIUS OLAP FLOW

    Diagram labels: Stores a massive-scale property graph. Generates a large-scale single-relational graph. Analyzes compressed, large-scale single- or multi-relational graphs in memory. Map/Reduce. Load into RAM on a single machine. To a stats package. Update element properties with algorithm results. Update graph with derived edges.
  175. THE AURELIUS OLAP FLOW

    Diagram labels: Stores a massive-scale property graph. Generates a large-scale single-relational graph. Analyzes compressed, large-scale single- or multi-relational graphs in memory. Map/Reduce. Load into RAM on a single machine. To a stats package. Example labels: theseus, hercules, ally; hercules ally_centrality:0.0123
  176. THE AURELIUS OLAP FLOW

    Diagram labels: Stores a massive-scale property graph. Generates a large-scale single-relational graph. Analyzes compressed, large-scale single- or multi-relational graphs in memory. To a stats package.
  177. AURELIUS' USE OF BLUEPRINTS Aurelius products use the Blueprints API

    so any graph product can communicate with any other graph product. The code for graph databases, frameworks, algorithms, and batch processing is written in terms of the Blueprints API. Aurelius encourages developers to use Blueprints/TinkerPop in order to grow a rich ecosystem of interoperable graph technologies.
  178. THE GRAPH LANDSCAPE REPRISE Speed of Traversal/Process Size of Graph/Structure

    * Not to scale. Did not want to overlap logos.
  179. NEXT STEPS http://thinkaurelius.com http://thinkaurelius.github.com/titan/ Learn about applying graph theory and

    network science. Make use of and/or contribute to the free, open source Titan product.
  180. THANK YOU

  181. CREDITS

    PRESENTERS: MARKO A. RODRIGUEZ, MATTHIAS BROECHELER. FINANCIAL SUPPORT: PEARSON EDUCATION, AURELIUS. LOCATION PROVISIONS: JIVE SOFTWARE. MANY THANKS TO: DAN LAROCQUE, TINKERPOP COMMUNITY, STEPHEN MALLETTE, BOBBY NORTON, KETRINA YIM