Graph Database Patterns in Python

Graph Database Patterns Elizabeth Ramirez @eramirem

What this talk is about
•  Property graph definition
•  Graphs at scale with Titan, Cassandra and Elasticsearch
•  Gremlin Query Language
•  Python Patterns for Titan Models

What this talk is not about
•  Graph Theory
•  Titan Data Model
•  Existing libraries
•  Best practices

The Property Graph Model
G = (V, E, λ)
V: set of vertices
E: set of edges (pairs of vertex identifiers)
λ: set of properties
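
As a minimal sketch of the definition above, a property graph can be held in memory as a set of vertices, a set of edges between vertex identifiers, and a property map on each element. The class and method names below are illustrative assumptions and do not reflect how Titan stores data.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Vertex:
    id: int
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    out_v: int  # identifier of the vertex the edge leaves
    in_v: int   # identifier of the vertex the edge enters
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    vertices: Dict[int, Vertex] = field(default_factory=dict)
    edges: Dict[int, Edge] = field(default_factory=dict)

    def add_vertex(self, vertex_id, **props):
        self.vertices[vertex_id] = Vertex(vertex_id, props)
        return self.vertices[vertex_id]

    def add_edge(self, edge_id, out_v, in_v, label, **props):
        # Both endpoints must already be part of V.
        if out_v not in self.vertices or in_v not in self.vertices:
            raise KeyError("unknown endpoint")
        self.edges[edge_id] = Edge(edge_id, out_v, in_v, label, props)
        return self.edges[edge_id]

    def out(self, vertex_id, label=None):
        # Identifiers of vertices reachable over one outgoing edge.
        return [e.in_v for e in self.edges.values()
                if e.out_v == vertex_id and (label is None or e.label == label)]
```

For example, after adding vertices 27512 and 4 and a 'location' edge between them, `graph.out(27512, 'location')` returns the identifier of the target vertex.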

Why a Graph Database?
•  Semantic Web: Structured Knowledge Representation
•  Index-free adjacency: Like memory pointers, but on disk
•  Navigation between nodes in constant time
•  Graph != No schema

TinkerPop Stack
•  Vendor agnostic
•  Blueprints: Collection of Java interfaces for representing Graphs
•  Pipes: Extension of Iterator, Iterable, chained together (Filter, Aggregation, SideEffect, etc.)
•  Groovy: Superset of Java, exposes full JDK to Gremlin
Blueprints → Pipes → Gremlin

Why Titan?
•  Multiple options for storage backend (Cassandra, HBase, BerkeleyDB)
•  Multiple options for index backend (Lucene, Elasticsearch)
•  Based on Blueprints API and TinkerPop Stack
•  Locking control to ensure consistency
•  Edge compression, Vertex-Centric Indices
•  Expressive querying using Gremlin:
outV = g.v(4);
inV = g.v(400);
g.addEdge(null, outV, inV, 'BT');

Architecture (I)

Architecture (II)
$ nodetool status titan
expr: syntax error
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns (effective)  Host ID                              Rack
UN  10.x.x.x  133.09 MB  256     32.1%             9xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx  1e
UN  10.x.x.x  145.92 MB  256     33.8%             fxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx  1d
UN  10.x.x.x  135.59 MB  256     34.1%             bxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx  1d

Gremlin + Rexster: Remote Query Execution
Gremlin: Graph Query Language. Uses Groovy as host language
Rexster: Graph Server

RexsterClient client = RexsterClientFactory.open("localhost", "titan");
List<Map<String, Object>> results = client.execute("g.v(4).map");

./gremlin.sh
gremlin> g = TitanFactory.open('conf/titan-cassandra.properties');
==>titangraph[cassandrathrift:127.0.0.1]
gremlin> g.v(27512).outE('link')
==>e[1ffB3-79K-aG][27512-link->1497496]
==>e[1ffB5-79K-aG][27512-link->1497500]
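
The Java client above talks to Rexster directly. From Python, one possible route is Rexster's HTTP interface, where a Gremlin script is passed as a `script` query parameter. The sketch below only builds such a URL; the port (8182) and the `/graphs/<graph>/tp/gremlin` path are assumptions about a default Rexster setup and should be checked against the deployment, and `gremlin_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def gremlin_url(host, graph, script, port=8182):
    """Build a URL that submits a Gremlin script to a Rexster server.

    The endpoint layout is an assumption about a default Rexster install.
    """
    query = urlencode({"script": script})  # percent-encodes the script text
    return "http://{0}:{1}/graphs/{2}/tp/gremlin?{3}".format(host, port, graph, query)


url = gremlin_url("localhost", "titan", "g.v(4).map")
```

The resulting URL could then be fetched with any HTTP client; how the response is decoded depends on the server configuration and is left out here.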

Graph of Semantic Knowledge

Simple Traversals
gremlin> g.v(20000).has('namespace', 'concept')
==>v[20000]
gremlin> g.V('concept_name', 'California').has('concept_type', 'nytd_geo')
==>v[23716]
gremlin> g.v(27512).out('location')
==>v[4]
gremlin> g.v(27512).outE
==>e[1ak0F-79K-bI][27512-teragram->1796728]
==>e[1ak0z-79K-bI][27512-teragram->1796712]
==>e[1ak0x-79K-bI][27512-teragram->1804588]
==>e[1ak0D-79K-bI][27512-teragram->1796716]
==>e[1ak0B-79K-bI][27512-teragram->1796720]
==>e[1ak0H-79K-bI][27512-teragram->1796724]
==>e[1c96H-79K-bY][27512-mapping->1536760]
==>e[1c96J-79K-bY][27512-mapping->1655936]
==>e[1c96F-79K-bY][27512-mapping->1536756]
==>e[1e0RP-79K-cm][27512-location->4]
gremlin>

More complex traversals (I)

More complex traversals (II)
gremlin> m=[];
gremlin> g.v(12808).as('x').outE('taxonomy').has('taxonomy_relation', 'BT').inV().store(m).loop('x'){it.object.outE('taxonomy').has('taxonomy_relation', 'BT').inV().count() != 0}.iterate()
==>null
gremlin> m
==>v[21812]
==>v[16492]
==>v[10176]
==>v[19584]
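
The loop above keeps following 'BT' (broader term) edges and stores every vertex it reaches until a vertex has no broader term left. A rough in-memory equivalent in Python is sketched below. The `bt_edges` mapping and the assumption that the four stored vertices form a single chain are illustrative only; the real topology is not shown on the slide. Unlike the Gremlin version, this sketch also guards against cycles.

```python
def broader_terms(bt_edges, start):
    """Collect every vertex reachable from `start` over 'BT' edges.

    `bt_edges` maps a vertex id to the ids of its broader terms.
    """
    collected = []
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for parent in bt_edges.get(current, []):
            if parent not in seen:  # cycle guard, not present in the Gremlin loop
                seen.add(parent)
                collected.append(parent)
                frontier.append(parent)
    return collected


# Hypothetical chain built from the vertex ids printed on the slide.
bt_edges = {12808: [21812], 21812: [16492], 16492: [10176], 10176: [19584]}
ancestors = broader_terms(bt_edges, 12808)
```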

More complex traversals (III)
gremlin> g.V.has('geocode_waypoint', WITHIN, Geoshape.circle(40.714, -74.0059, 1.0))
==>v[4]
==>v[320]
==>v[2756]
==>v[3252]
==>v[1348]
==>v[1084]
==>v[8140]
gremlin> g.V.has('concept_name', CONTAINS, 'Barack').has('concept_name', CONTAINS, 'Obama').filter({it.concept_status=='Active'})
==>v[59360]
==>v[714092]
==>v[1105536]
gremlin>
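
To make the semantics of the `WITHIN` circle query concrete, the sketch below tests whether a point lies inside a circle given as (latitude, longitude, radius). It uses the haversine formula and reads the radius as kilometers; both are assumptions made for illustration, since in Titan the query is answered by the index backend rather than by code like this. The sample points are arbitrary.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_circle(point, center, radius_km):
    """True if `point` is at most `radius_km` away from `center`."""
    return haversine_km(point[0], point[1], center[0], center[1]) <= radius_km


center = (40.714, -74.0059)  # circle center used in the query above
near = within_circle((40.7128, -74.0060), center, 1.0)
far = within_circle((40.7580, -73.9855), center, 1.0)
```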

Pipes Traversal Pattern
•  transform
•  filter
•  sideEffect
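
The three pipe kinds can be mimicked with plain Python generators, each wrapping the iterator before it. This is only an analogy to illustrate the chaining idea; the function names are made up for the sketch and are not part of the Pipes API.

```python
def transform(source, fn):
    """Emit fn(item) for every incoming item."""
    for item in source:
        yield fn(item)


def keep(source, predicate):
    """Filter pipe: emit only the items that satisfy the predicate."""
    for item in source:
        if predicate(item):
            yield item


def side_effect(source, fn):
    """Call fn(item) for its effect, then pass the item through unchanged."""
    for item in source:
        fn(item)
        yield item


seen = []
pipeline = side_effect(
    keep(transform(range(5), lambda x: x * x), lambda x: x % 2 == 0),
    seen.append,
)
result = list(pipeline)  # nothing runs until the pipeline is consumed
```

Like a Gremlin traversal, the chain is lazy: building it does no work, and items flow through all stages one at a time once the result is iterated.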

Pipes and Filters in Python (I)

@classmethod
def v(cls, id):
    cls.pipe = "g.v({0})".format(id)
    return cls

@classmethod
def e(cls, id):
    cls.pipe = "g.e({0})".format(repr(id))
    return cls
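
The same chaining idea can be sketched as a small self-contained builder that accumulates a Gremlin script as a string. Unlike the slide code, which stores the script on the class, this variant returns a new object at every step; that is a design choice of the sketch, not the talk's approach. The `Pipe` class and its methods are hypothetical names, written for Python 3.

```python
class Pipe:
    """Accumulates a Gremlin script one step at a time."""

    def __init__(self, script=""):
        self.script = script

    @classmethod
    def v(cls, vertex_id):
        # Start a traversal at a vertex id.
        return cls("g.v({0})".format(vertex_id))

    def has(self, key, value):
        # repr() quotes Python strings, which reads as a Groovy string literal.
        return Pipe("{0}.has({1!r}, {2!r})".format(self.script, key, value))

    def out(self, label):
        return Pipe("{0}.out({1!r})".format(self.script, label))


query = Pipe.v(20000).has("namespace", "concept")
```

Here `query.script` holds the text of the first traversal from the Simple Traversals slide, ready to be sent to the server.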

Pipes and Filters in Python (II)

@classmethod
def addV(cls, **kwargs):
    cls.pipe = "v = g.addVertex()\n"
    cls.pipe += "v.setProperty('namespace', '{0}')\n".format(cls.namespace)
    for p, v in kwargs.items():
        cls.pipe += "v.setProperty('{0}', {1})\n".format("_".join([cls.namespace, p]), repr(v))
    cls.pipe += "return v"
    return cls
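
To see what such a method produces, the standalone sketch below generates the same kind of script from a namespace and keyword properties. `add_vertex_script` is a hypothetical function written for Python 3, and the property values are sample data taken from the traversal slides.

```python
def add_vertex_script(namespace, **props):
    """Return a Gremlin script that creates a vertex with namespaced properties."""
    lines = [
        "v = g.addVertex()",
        "v.setProperty('namespace', {0!r})".format(namespace),
    ]
    for name, value in props.items():
        key = "_".join([namespace, name])  # e.g. concept + name -> concept_name
        lines.append("v.setProperty({0!r}, {1!r})".format(key, value))
    lines.append("return v")
    return "\n".join(lines)


script = add_vertex_script("concept", name="California", type="nytd_geo")
```

One caveat of this approach: `repr()` happens to yield usable Groovy literals for plain strings and integers, but not for every Python value (`True` and `None`, for instance), so other types would need explicit conversion.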

Pipes and Filters in Python (III)

@classmethod
def addE(cls, outV, inV, **kwargs):
    cls.pipe = "outV = g.v({0}); inV = g.v({1}); ".format(outV, inV)
    cls.pipe += "e = g.addEdge(null, outV, inV, '{0}'); ".format(cls.namespace)
    for p, v in kwargs.iteritems():
        cls.pipe += 'e.setProperty("{0}", {1}); '.format("_".join([cls.namespace, p]), repr(v))
    cls.pipe += 'return e'
    return cls

Factory Pattern (I)

class GraphFactory(type):
    """ Metaclass for graph elements: vertices and edges """
    def __new__(cls, name, bases, dct):
        if 'namespace' not in dct:
            dct['namespace'] = camel_to_snake(name)
        return super(GraphFactory, cls).__new__(cls, name, bases, dct)
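
The slide does not show `camel_to_snake`. One common way to write such a helper is sketched below, together with the metaclass in Python 3 syntax (the slides use Python 2's `__metaclass__`). The regular expressions are an assumption about the intended behavior, namely that a class called `ExtractionRule` gets the namespace `extraction_rule`.

```python
import re


def camel_to_snake(name):
    """Convert a CamelCase class name to snake_case."""
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()


class GraphFactory(type):
    """Metaclass that derives a default namespace from the class name."""

    def __new__(mcs, name, bases, dct):
        dct.setdefault("namespace", camel_to_snake(name))
        return super().__new__(mcs, name, bases, dct)


class ExtractionRule(metaclass=GraphFactory):
    pass


class Concept(metaclass=GraphFactory):
    namespace = "custom"  # an explicit namespace is left untouched
```

With this in place the namespace used in property names such as `extraction_rule_trigger_term` follows from the class name alone.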

Factory Pattern (II)

class VertexElement(object):
    @classmethod
    def v(cls, id):
        cls.pipe = "g.v({0})".format(id)
        return cls

class EdgeElement(object):
    @classmethod
    def e(cls, id):
        cls.pipe = "g.e({0})".format(id)
        return cls

Factory Pattern (III)

class GraphFactory(type):
    def __call__(cls):
        results = execute_query(cls.pipe)
        if isinstance(results, list):
            return map(deserialize, results)
        else:
            return deserialize(results)
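
Because `__call__` is defined on the metaclass, "calling" a model class runs the accumulated query instead of creating an instance. The sketch below shows that mechanism end to end in Python 3. `execute_query` and `deserialize` are not shown on the slides, so the versions here are hypothetical stand-ins that return canned data instead of contacting a server.

```python
def execute_query(script):
    """Hypothetical stand-in for the real query call: returns canned rows."""
    canned = {"g.v(4)": {"_id": 4, "namespace": "concept"}}
    return canned.get(script, [])


def deserialize(row):
    """Hypothetical stand-in: a real version would build a model object."""
    return dict(row)


class GraphFactory(type):
    def __call__(cls):
        # Invoked by `SomeModel()`: run the script built so far.
        results = execute_query(cls.pipe)
        if isinstance(results, list):
            return [deserialize(row) for row in results]
        return deserialize(results)


class VertexElement(metaclass=GraphFactory):
    pipe = ""

    @classmethod
    def v(cls, vertex_id):
        cls.pipe = "g.v({0})".format(vertex_id)
        return cls


vertex = VertexElement.v(4)()  # build the script, then execute it
```

A list comprehension is used instead of `map` because in Python 3 `map` returns a lazy iterator rather than a list.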

Models

class ExtractionRule(VertexElement):
    __metaclass__ = GraphFactory

    @classmethod
    def get_by_id(self, id):
        return self.v(id).has('namespace', EQUAL, self.namespace)()

    @classmethod
    def get_for_variant(self, variant, **filters):
        results = self.V().has('extraction_rule_trigger_term', CONTAINS, *utils.tokenize(variant))
        results.inV('teragram').filter(type=concept_type).dedup().limit()()
        return results

Models (II)

@classmethod
def search(self, id=None, **filters):
    order = ['trigger_term', 'condition', 'descriptor', 'condition_type']
    if id:
        results = self.get_by_id(id)
        return results
    else:
        parsed = OrderedDict(sorted(parsed.items(), key=lambda t: order.index(t[0]), reverse=True))
        index, value = parsed.popitem()
        results = self.V('_'.join([self.namespace, index]), value)
        results.filter(**parsed).limit()
        return results()
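
The ordering step in `search` is easy to misread: the filters are sorted in reverse priority so that `popitem()`, which removes the last entry, yields the highest-priority key for the index lookup, and the rest remain as ordinary filters. The sketch below isolates that step. It assumes `parsed` is derived from the `filters` keyword arguments, which the slide does not show; `split_filters` and the sample values are made up.

```python
from collections import OrderedDict

ORDER = ['trigger_term', 'condition', 'descriptor', 'condition_type']


def split_filters(filters):
    """Return (index_key, index_value, remaining_filters)."""
    parsed = OrderedDict(
        sorted(filters.items(), key=lambda t: ORDER.index(t[0]), reverse=True)
    )
    # The last item after the reverse sort is the earliest key in ORDER.
    index, value = parsed.popitem()
    return index, value, dict(parsed)


index, value, rest = split_filters(
    {'condition_type': 'AND', 'trigger_term': 'obama', 'condition': 'president'}
)
```

Here `trigger_term` is picked for the index lookup because it comes first in `ORDER`, while the other two keys are left for the `filter` step.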

Conclusions
- Factories are the most universal design patterns.
- Don't delegate the creation of types to your code.
- For bulk imports, use a JVM language
- Patterns that don’t do well: SELECT *

Thank You!

The New York Times Developers
