Graph Databases in Python (PyCon Ca 2012)

Since the NoSQL concept burst onto the market, graph databases have traditionally been designed to be used from Java or C. With some honorable exceptions, there isn't an easy way to manage graph databases from Python. In this talk, I will introduce you to some of the tools that you can use today to work with these challenging new databases from our favorite language, Python.

Javier de la Rosa

November 11, 2012

Transcript

  1. GRAPH DATABASES IN PYTHON

    Javier de la Rosa, @versae
    The CulturePlex Lab, Western University, London, ON
    PyCon Canada 2012
  2. WHO I AM

    • Javier de la Rosa
    • versae
    • Computer Scientist and Humanist
    • CulturePlex Lab
  3. FIRST OF ALL

    “You do not really understand something unless you can explain it to your grandmother”
    – (Frequently attributed to) Richard Feynman
  4. DATABASES (in the last 30 years)

    • Data in tables, rows and columns
    • Pretty basic mechanism to make connections:
      – Primary keys, foreign keys, and... that's all
    • Relational, ahem, really?
  5. DATABASES (in the last 30 years)

    • Rigid data schemas
      – Have you ever tried to do a schema migration?
    • Relational Algebra and SQL
      – Terrible for highly interconnected data
      – JOINs can take a lifetime to finish (a bit overdramatized)
  6. NoSQL, Not Only SQL

    • Document
      – MongoDB, CouchDB, etc.
    • Key-value stores
      – Redis, Riak, Voldemort, Dynamo, etc.
    • Big Tables
      – Cassandra, HBase, etc.
    • Analytic
      – Hadoop
    • Graph
      – Neo4j, OrientDB, HyperGraphDB, Titan, etc.
    • Other
      – Objectivity/DB, ZODB, etc.
  7. DATABASES LANDSCAPE

    Source: 451Research, https://451research.com/report-long?icid=2289
  8. WHO IS USING GRAPHS?

    • Mozilla with Pancake and Pacer
      – https://wiki.mozilla.org/Pancake & http://pangloss.github.com/pacer/
    • Twitter with FlockDB
      – https://github.com/twitter/flockdb
    • Facebook with Open Graph
      – https://developers.facebook.com/docs/opengraph/
    • Google with Knowledge Graph
      – http://www.google.ca/insidesearch/.../knowledge.html
  9. WHY GRAPHS?

    • Data is getting more and more connected
      – From text documents, to wikis, to ontologies, to folksonomies, etc.
    • And more semi-structured
      – Think about the decentralization of content generation
    • And more complex
      – Social networks, semantic trending, etc.

    Source: Neo Technology, http://www.slideshare.net/emileifrem/neo4j-the-benefits-of-graph-databases-oscon-2009
  10. A FEW OF THE CURRENT USES

    • Social Networking and Recommendations
    • Network and Cloud Management
    • Master Data Management
    • Geospatial
    • Bioinformatics
    • Content Management, Security and Access Control

    Source: Mashable, http://mashable.com/2012/09/26/graph-databases/
  11. AND WHY ELSE?

    • Because graphs are cool!

    [Figure: Leonhard Euler]
  12. WHAT IS A GRAPH?

    • G = (V, E), where
      – G is a graph
      – V is a set of vertices
      – E is a set of edges

    Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_(mathematics)
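The definition G = (V, E) maps almost directly onto plain Python sets. The following is a minimal sketch, assuming an undirected graph; the vertex names, the edges, and the helper functions `degree` and `neighbours` are invented for illustration and do not come from the talk.

```python
# Illustrative data only: the vertex names and edges below are made up.
V = {"a", "b", "c", "d"}

# Each undirected edge is stored as a frozenset of its two endpoints,
# so ("a", "b") and ("b", "a") are the same edge.
E = {frozenset(pair) for pair in [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]}

G = (V, E)


def degree(graph, vertex):
    """Count the edges of the graph that touch the given vertex."""
    _, edges = graph
    return sum(1 for edge in edges if vertex in edge)


def neighbours(graph, vertex):
    """Return the set of vertices that share an edge with the given vertex."""
    _, edges = graph
    return {other for edge in edges if vertex in edge for other in edge if other != vertex}
```

A directed graph could be sketched the same way by storing each edge as an ordered tuple instead of a frozenset.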
  13. WHAT IS A GRAPH?

    • G = (V, E)
      – Graph, aka network, diagram, etc.
      – Vertex, aka point, dot, node, element, etc.
      – Edge, aka relationship, arc, line, link, etc.
    • Basically, “a graph states that something is related to something else”
      – Svetlana Sicular, Research Director at Gartner

    Source: Gartner, http://blogs.gartner.com/svetlana-sicular/think-graph/
  14. TYPES OF GRAPH

    [Figures: Undirected, Digraph]

    Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_(mathematics)
  15. TYPES OF GRAPH

    [Figures: Multigraph, Hypergraph]

    Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_(mathematics)
  16. SOME GRAPHS EVEN HAVE A NAME

    • Complete graphs

    [Figures: K3, K8, K5]

    Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs
  17. SOME GRAPHS EVEN HAVE A NAME

    • Stars

    [Figure: The star graphs S3, S4, S5 and S6]

    Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs
  18. SOME GRAPHS EVEN HAVE A NAME

    • Snarks

    [Figures: Blanuša (second), Double star, Szekeres]

    Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs
  19. THINGS CAN GET COMPLICATED...

    [Figure: Local McLaughlin graph]

    Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs
  20. DON'T WORRY

    • Just one more type: the Property Graph

    [Figure: four numbered nodes connected by four numbered edges]
  21. THE PROPERTY GRAPH

    • Directed, attributed and multi-relational

    [Figure: the same four-node graph, now with properties. Nodes: Name: John; Name: Javi; Name: David; Title: The Art of Computer Programming, Price: $135. Relationships: Knows (Since: 2009); Knows (Since: 1990); Likes; Likes]
  22. THE PROPERTY GRAPH

    • A set of nodes, and each node has:
      – A unique identifier.
      – A set of outgoing edges.
      – A set of incoming edges.
      – A collection of properties defined by a map from key to value.
    • A set of relationships, and each relationship has:
      – A unique identifier.
      – An outgoing tail vertex.
      – An incoming head vertex.
      – A collection of properties defined by a map from key to value.

    Source: TinkerPop, https://github.com/tinkerpop/gremlin/wiki/Defining-a-Property-Graph
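The property graph definition can be written down as two small Python classes. This is only a sketch of the data model under stated assumptions: the class and field names (`Node`, `Relationship`, `tail`, `head`) are chosen here for illustration and are not the API of Blueprints or of any client library, and the direction of the "Knows" edge between John and Javi is a guess based on the example figure.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Any, Dict, List

_ids = count(1)  # simple in-memory source of unique identifiers


@dataclass(eq=False)  # compare by identity, which avoids recursive comparisons
class Node:
    properties: Dict[str, Any] = field(default_factory=dict)
    id: int = field(default_factory=lambda: next(_ids))
    outgoing: List["Relationship"] = field(default_factory=list)
    incoming: List["Relationship"] = field(default_factory=list)


@dataclass(eq=False)
class Relationship:
    tail: Node  # the vertex the edge leaves from
    head: Node  # the vertex the edge points to
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)
    id: int = field(default_factory=lambda: next(_ids))

    def __post_init__(self):
        # Register the new edge on both of its endpoints.
        self.tail.outgoing.append(self)
        self.head.incoming.append(self)


# Assumed example data, loosely following the figure in the talk.
john = Node({"name": "John"})
javi = Node({"name": "Javi"})
knows = Relationship(john, javi, "Knows", {"since": 2009})
```

Keeping `outgoing` and `incoming` lists on each node is what makes following a relationship a local operation rather than a search over all edges.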
  23. IN SHORT

    • A Property Graph is composed of:
      – A set of nodes
      – A set of relationships
      – Properties and ids on both
    • Sometimes, nodes and relationships can be typed
      – In Blueprints and Neo4j, a label denotes the type of relationship between its two nodes.
  24. GRAPH DATABASES

    • A graph database uses graph structures with nodes, edges, and properties to represent and store data
      – ...but there is not an easy way to visualize this

    Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_database
  25. HOW DOES IT LOOK IN PYTHON?

        # Let's create a graph
        >>> silvester = g.nodes.create(name="Silvester")
  26. HOW DOES IT LOOK IN PYTHON?

        # Let's create a graph
        >>> silvester = g.nodes.create(name="Silvester")

    [Figure: node (Name: Silvester)]
  27. HOW DOES IT LOOK IN PYTHON?

        # Let's create a graph
        >>> silvester = g.nodes.create(name="Silvester")
        >>> arnold = g.nodes.create(name="Arnold")

    [Figure: node (Name: Silvester)]
  28. HOW DOES IT LOOK IN PYTHON?

        # Let's create a graph
        >>> silvester = g.nodes.create(name="Silvester")
        >>> arnold = g.nodes.create(name="Arnold")

    [Figure: nodes (Name: Silvester) and (Name: Arnold)]
  29. HOW DOES IT LOOK IN PYTHON?

        # Let's create a graph
        >>> silvester = g.nodes.create(name="Silvester")
        >>> arnold = g.nodes.create(name="Arnold")
        >>> punch = arnold.punches(silvester)

    [Figure: nodes (Name: Silvester) and (Name: Arnold)]
  30. HOW DOES IT LOOK IN PYTHON?

        # Let's create a graph
        >>> silvester = g.nodes.create(name="Silvester")
        >>> arnold = g.nodes.create(name="Arnold")
        >>> punch = arnold.punches(silvester)

    [Figure: (Name: Arnold) -[punches]-> (Name: Silvester)]
  31. HOW DOES IT LOOK IN PYTHON?

    [Figure: (Name: Arnold) -[punches]-> (Name: Silvester)]
  32. HOW DOES IT LOOK IN PYTHON?

        >>> chuck = g.nodes.create(name="Chuck")

    [Figure: (Name: Arnold) -[punches]-> (Name: Silvester)]
  33. HOW DOES IT LOOK IN PYTHON?

        >>> chuck = g.nodes.create(name="Chuck")

    [Figure: (Name: Arnold) -[punches]-> (Name: Silvester), plus a new node (Name: Chuck)]
  34. HOW DOES IT LOOK IN PYTHON?

        >>> chuck.dropkicks(silvester)
        >>> chuck.dropkicks(arnold)

    [Figure: (Name: Arnold) -[punches]-> (Name: Silvester), plus the node (Name: Chuck)]
  35. HOW DOES IT LOOK IN PYTHON?

        >>> chuck.dropkicks(silvester)
        >>> chuck.dropkicks(arnold)

    [Figure: (Name: Arnold) -[punches]-> (Name: Silvester); (Name: Chuck) -[dropkicks]-> (Name: Silvester); (Name: Chuck) -[dropkicks]-> (Name: Arnold)]
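The slides never say what `g` is, so the calls above read best as an idealized API. As a rough illustration of how such an interface could work, here is a toy in-memory version. Everything in it (the `Graph`, `Nodes` and `Node` classes, and storing edges as tuples) is an assumption made for this sketch, not the implementation of any of the libraries discussed in the talk.

```python
class Node:
    def __init__(self, graph, **properties):
        self.graph = graph
        self.properties = properties

    def __getattr__(self, label):
        # __getattr__ only runs for attributes that do not exist, so any
        # unknown public name (e.g. "punches") is treated as a relationship label.
        if label.startswith("_"):
            raise AttributeError(label)

        def relate(other, **properties):
            edge = (self, label, other, properties)
            self.graph.edges.append(edge)
            return edge

        return relate


class Nodes:
    def __init__(self, graph):
        self.graph = graph
        self.items = []

    def create(self, **properties):
        node = Node(self.graph, **properties)
        self.items.append(node)
        return node


class Graph:
    def __init__(self):
        self.nodes = Nodes(self)
        self.edges = []


g = Graph()
silvester = g.nodes.create(name="Silvester")
arnold = g.nodes.create(name="Arnold")
chuck = g.nodes.create(name="Chuck")
punch = arnold.punches(silvester)
chuck.dropkicks(silvester)
chuck.dropkicks(arnold)
labels = [label for _, label, _, _ in g.edges]
```

The trade-off of the `__getattr__` trick is that a misspelled attribute silently becomes a new relationship type, which is one reason real clients tend to use explicit calls such as `g.edges.create(alice, "knows", bob)`.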
  36. GRAPH DATABASES LANDSCAPE

    | Database      | Data Model       | Query Method               | License          | Python Binding           |
    |---------------|------------------|----------------------------|------------------|--------------------------|
    | Neo4j         | Property Graph   | Cypher, Gremlin, Traversal | GPL, AGPL        | Native, Blueprints, REST |
    | OrientDB      | Property Graph   | Gremlin, Traversal         | Apache 2         | Blueprints               |
    | HyperGraphDB  | Typed Hypergraph | HGQuery, Traversal         | LGPL             | Nope                     |
    | DEX           | Property Graph   | Traversal                  | Commercial       | Blueprints               |
    | Titan         | Property Graph   | Gremlin                    | Apache 2         | Blueprints               |
    | InfoGrid      | Property Graph   | Traversal                  | AGPL, Commercial | Nope                     |
    | InfiniteGraph | Property Graph   | Gremlin                    | Commercial       | Nope                     |

    Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_database
  37. GRAPH DATABASES LANDSCAPE

    And more:
      – AffinityDB
      – YarcData uRiKA
      – Apache Giraph
      – Cassovary
      – StigDB
      – NuvolaBase
      – Pegasus
      – Microsoft Trinity
      – Sherlock
      – And so on
  39. GREMLIN, BLUEPRINTS, WAT?

    Let me introduce you to the TinkerPop Stack

    Source: TinkerPop, http://www.tinkerpop.com/
  40. BLUEPRINTS AND REXSTER

    • Blueprints is a property graph model interface
    • Rexster is a server that exposes any Blueprints graph through REST

    Source: TinkerPop, http://www.tinkerpop.com/
  41. AND WHAT ABOUT PYTHON?

    • Options to connect to a Blueprints Graph Database

    [Diagram labels: Rexster, Blueprints API, Neo4j REST, bulbflow, python-blueprints, pyblueprints, OrientDB, Titan, DEX]
  42. BULBFLOW

    • Create
    • Get
    • Update
    • Delete

        # Create
        >>> alice = g.vertices.create(name="Alice")
        >>> bob = g.vertices.create(name="Bob")
        >>> g.edges.create(alice, "knows", bob)
        # Get
        >>> alice = g.vertices.get(1)
        >>> bob = g.vertices.get(2)
        # Update
        >>> alice.age = 21
        >>> alice.save()
        # Delete
        >>> alice.delete()

    Source: Bulbflow, http://bulbflow.com/docs/
  43. PYBLUEPRINTS

    • Create
    • Get
    • Update
    • Delete

        # Create
        >>> alice = g.addVertex()
        >>> alice.setProperty("name", "Alice")
        >>> bob = g.addVertex()
        >>> bob.setProperty("name", "Bob")
        >>> g.addEdge(alice, bob, "knows")
        # Get
        >>> alice = g.getVertex(1)
        >>> bob = g.getVertex(2)
        # Update
        >>> alice.setProperty("age", 21)
        # Delete
        >>> g.removeVertex(alice.getId())

    Source: PyBlueprints, https://github.com/escalant3/pyblueprints
  44. BUT NEO4J HAS ITS OWN CLIENTS!

    • REST Clients for Neo4j

    [Diagram labels: Rexster, Blueprints API, Neo4j REST, bulbflow, python-blueprints, pyblueprints, OrientDB, Titan, DEX, neo4j-rest-client, py2neo]
  45. HOW CAN I LOOK THINGS UP?

    • An index is a data structure that supports the fast lookup of elements by some key/value pair

    Source: TinkerPop, https://github.com/tinkerpop/blueprints/wiki/Graph-Indices
  46. INDICES

    • In the Python bindings, indices are similar to dict

    – bulbflow

        # bulbflow creates automatic indices to make basic lookups easier
        >>> nodes = g.vertices.index.lookup(name="Alice")
        >>> for node in nodes:
        ...:     print node

    – PyBlueprints

        >>> index = g.getIndex("names", "vertex")
        >>> index.put("name", alice.getProperty("name"), alice)
        >>> nodes = index.get("name", "Alice")
        >>> for node in nodes:
        ...:     print node
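To make the "similar to dict" comparison concrete, here is a toy index built on a dictionary. It mirrors the `put`/`get` shape of the PyBlueprints calls on this slide, but the `Index` class itself is an illustrative assumption, not the real Blueprints index, and elements are represented by plain integer ids.

```python
from collections import defaultdict


class Index:
    """Toy key/value index: maps (key, value) pairs to sets of elements."""

    def __init__(self):
        self._entries = defaultdict(set)

    def put(self, key, value, element):
        # Elements must be hashable; this sketch uses integer ids.
        self._entries[(key, value)].add(element)

    def get(self, key, value):
        # Return a copy so callers cannot modify the index by accident.
        return set(self._entries.get((key, value), ()))


index = Index()
index.put("name", "Alice", 1)
index.put("name", "Bob", 2)
index.put("name", "Alice", 3)
alices = index.get("name", "Alice")
```

Because the lookup is a single dictionary access, its cost does not grow with the number of nodes in the graph, which is the whole point of an index.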
  47. INDICES

    • Some Graph Databases provide full-text queries

    – bulbflow

        >>> nodes = g.vertices.index.query(name="ali*")
        >>> for node in nodes:
        ...:     print node

    – PyBlueprints

        >>> index = g.getIndex("names", "vertex")
        >>> nodes = index.query("name", "ali*")
        >>> for node in nodes:
        ...:     print node
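A query such as `"ali*"` is a wildcard match over indexed values. The sketch below imitates that behaviour with the standard library's `fnmatch` module. It is a stand-in under stated assumptions: real graph databases delegate this to a full-text engine, and the `query` function and the sample entries are invented here for illustration.

```python
from fnmatch import fnmatchcase


def query(entries, key, pattern):
    """Return the elements whose value for `key` matches a wildcard pattern.

    `entries` is a list of ((key, value), element) pairs; matching is
    case-insensitive, so "ali*" also matches "Alice".
    """
    pattern = pattern.lower()
    return [
        element
        for (entry_key, value), element in entries
        if entry_key == key and fnmatchcase(str(value).lower(), pattern)
    ]


# Assumed sample data: integer ids stand in for graph nodes.
entries = [
    (("name", "Alice"), 1),
    (("name", "Alicia"), 2),
    (("name", "Bob"), 3),
    (("city", "Alicante"), 4),
]
matches = query(entries, "name", "ali*")
```

Unlike the exact-match index, this linear scan touches every entry, which is why databases keep a dedicated full-text structure for such queries.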
  48. ...MORE COMPLEX SEARCHES?

    “Without traversals [FlockDB] is only a persisted graph. But not a graph database.”
    – Alex Popescu

    Source: myNoSQL, http://nosql.mypopescu.com/
  49. LET'S TRAVERSE THE GRAPH!

    • “A graph traversal is the problem of visiting all the nodes in a graph in a particular manner”
      – A* search
      – Alpha-beta pruning
      – Breadth-First Search (BFS)
      – Depth-First Search (DFS)
      – Dijkstra's algorithm
      – Floyd-Warshall algorithm
      – Etc.

    Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_traversal
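Of the traversals listed, breadth-first search is probably the easiest to show in a few lines. This is a minimal sketch over a plain adjacency dictionary; the `bfs` function and the small "knows" graph are assumptions made for illustration and are independent of any graph database client.

```python
from collections import deque


def bfs(adjacency, start):
    """Visit every node reachable from `start`, nearest nodes first."""
    visited = [start]   # nodes in the order they were discovered
    seen = {start}      # fast membership test to avoid revisiting
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                visited.append(neighbour)
                queue.append(neighbour)
    return visited


# Assumed example: who knows whom, as outgoing adjacency lists.
adjacency = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}
order = bfs(adjacency, "alice")  # ['alice', 'bob', 'carol', 'dave']
```

Swapping `queue.popleft()` for `queue.pop()` turns the queue into a stack and gives a depth-first style visit instead, which would discover the nodes in a different order.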
  50. NEO4J TRAVERSAL API

    • Python-embedded (native Neo4j Python binding)

        >>> traverser = gdb.traversal()\
        ...:     .relationships('knows').traverse(alice)
        # The graph is traversed as you loop through the result
        >>> for node in traverser.nodes:
        ...:     print node

    • neo4j-rest-client

        >>> traverser = alice.traverse(types=[client.All.knows])
        # The graph is traversed as you loop through the result
        >>> for node in traverser:
        ...:     print node
  51. BLUEPRINTS GREMLIN

    • Gremlin is a domain specific language for traversing property graphs
      – Defines how to do a query based on the graph structure

        >>> gremlin = g.extensions.GremlinPlugin.execute_script
        >>> params = {'alice_id': alice.id}
        >>> script = "g.V(alice_id).out('knows')"
        >>> node = gremlin(script=script, params=params)
        >>> node == bob

    Source: TinkerPop Gremlin, https://github.com/tinkerpop/gremlin/wiki
    Source: Marko Rodríguez, The Graph Traversal Programming Pattern, http://www.slideshare.net/slidarko/graph-windycitydb2010
  52. NEO4J CYPHER QUERY LANGUAGE

    • Declarative graph query language
      – Expressive and efficient querying
      – Focused on expressing what to retrieve from a graph
      – Inspired by SQL
      – Pattern matching expressions from SPARQL

    Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_database
  53. NEO4J CYPHER QUERY LANGUAGE

    • Declarative graph query language
      – Expressive and efficient querying
      – Focused on expressing what to retrieve from a graph
      – Inspired by SQL
      – Pattern matching expressions from SPARQL

    [Figure: node 1 and node 2 joined by a relationship named "label"]

        (1) -[:label]- (2)

    Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_database
  54. NEO4J CYPHER QUERY LANGUAGE

    • Declarative graph query language
      – Expressive and efficient querying
      – Focused on expressing what to retrieve from a graph
      – Inspired by SQL
      – Pattern matching expressions from SPARQL

    [Figure: node 1 and node 2 joined by a relationship named "label"]

        START n=(1), m=(2)
        MATCH n-[r:label]-m
        RETURN r

    Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_database
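The pattern `n-[r:label]-m` has no arrowhead, so it asks for relationships of type `label` between the two nodes in either direction. To show what that means operationally, here is a small Python sketch of the same match over a list of relationships. The `match` function and the dictionary layout are assumptions made for this illustration; this is not how Neo4j evaluates Cypher.

```python
def match(relationships, n, m, label):
    """Return relationships with the given label that connect n and m,
    regardless of direction (like the undirected pattern n-[r:label]-m)."""
    return [
        r
        for r in relationships
        if r["label"] == label and {r["start"], r["end"]} == {n, m}
    ]


# Assumed sample data: node ids 1 and 2, as in the slide's figure.
relationships = [
    {"id": 10, "start": 1, "end": 2, "label": "label"},
    {"id": 11, "start": 2, "end": 1, "label": "label"},
    {"id": 12, "start": 1, "end": 2, "label": "other"},
    {"id": 13, "start": 1, "end": 3, "label": "label"},
]
found = [r["id"] for r in match(relationships, 1, 2, "label")]
```

The declarative query only states the shape to look for; how the database finds it (by index, by scanning, by traversal) is left to the engine.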
  55. PY2NEO CYPHER HELPERS

    • Get or create elements
    • Get counts
    • Delete

        >>> nodes_count = g.get_node_count()
        >>> rels_count = g.get_relationship_count()
        >>> g.delete()
        >>> g.get_or_create_relationships(
        ...:     (bob, "WORKS WITH", carol, {"since": 2004}),
        ...:     (alice, "DISLIKES!", carol, {"reason": "youth"}),
        ...:     (bob, "WORKS WITH", dave, {"since": 2009}),
        ...: )

    Source: py2neo, http://py2neo.org/
  56. NEO4J-REST-CLIENT CYPHER HELPERS

    • Query casting

        >>> q = """start n=node(*) match n-[r:punches]-() """ \
        ...:     """return n, n.name, r, r.since"""
        >>> results = g.query(q, returns=(Node, unicode, Relationship, int))

    • Complex filtering

        lookups = (
            Q("name", exact="Arnold")
            & (Q("surname", istartswith="swar")
               & ~Q("surname", iendswith="chenegger"))
        )
        arnolds = g.nodes.filter(lookups)

    Source: neo4j-rest-client, https://github.com/versae/neo4j-rest-client
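The `Q` objects in the filtering example combine with `&` and `~` because Python lets a class overload those operators. The toy class below shows the general idea on plain dictionaries. It is a sketch under stated assumptions, not the neo4j-rest-client implementation: the real `Q` builds a server-side query, whereas this one evaluates predicates locally, and the sample people are invented.

```python
class Q:
    """Toy composable filter evaluated against property dictionaries."""

    _LOOKUPS = {
        "exact": lambda value, arg: value == arg,
        "istartswith": lambda value, arg: str(value).lower().startswith(arg.lower()),
        "iendswith": lambda value, arg: str(value).lower().endswith(arg.lower()),
    }

    def __init__(self, prop=None, _predicate=None, **lookup):
        if _predicate is not None:
            # Internal path used when combining existing filters.
            self.matches = _predicate
            return
        (name, arg), = lookup.items()  # exactly one lookup per Q
        check = self._LOOKUPS[name]
        self.matches = lambda props: prop in props and check(props[prop], arg)

    def __and__(self, other):
        return Q(_predicate=lambda props: self.matches(props) and other.matches(props))

    def __or__(self, other):
        return Q(_predicate=lambda props: self.matches(props) or other.matches(props))

    def __invert__(self):
        return Q(_predicate=lambda props: not self.matches(props))


lookups = (
    Q("name", exact="Arnold")
    & (Q("surname", istartswith="swar") & ~Q("surname", iendswith="chenegger"))
)

# Assumed sample data for the sketch.
people = [
    {"name": "Arnold", "surname": "Swarz"},
    {"name": "Arnold", "surname": "Swarchenegger"},
    {"name": "Silvester", "surname": "Swarz"},
]
arnolds = [person for person in people if lookups.matches(person)]
```

Each operator returns a new `Q`, so arbitrarily nested expressions stay ordinary Python values that can be stored, passed around and combined later.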
  57. LET'S PLAY!

    • Deploy Neo4j on Heroku or Amazon
    • Use one of the available clients
  58. NEO4J HEROKU ADD-ON

    • Create a Heroku app and add the Neo4j add-on
    • Create a virtualenv with neo4j-rest-client

        $ heroku apps:create pyconca
        $ heroku addons:add neo4j --app pyconca
        $ xdg-open `heroku config:get NEO4J_URL --app pyconca`
        $ export NEO4J_URL=`heroku config:get NEO4J_URL --app pyconca`
        $ mkvirtualenv --no-site-packages pyconca
        $ workon pyconca
        $ pip install ipython neo4jrestclient
        $ ipython
  59. NEO4J HEROKU ADD-ON

    • Run IPython and that's it!

        >>> import os
        >>> NEO4J_URL = os.environ["NEO4J_URL"]
        >>> from neo4jrestclient import client
        >>> gdb = client.GraphDatabase(NEO4J_URL + "/db/data")
        >>> gdb.url
  61. THANKS! Questions?

    Javier de la Rosa, @versae
    The CulturePlex Lab, Western University, London, ON
    PyCon Canada 2012
  62. APPENDIX: DATA MODELS

    • neo4django
      – https://github.com/scholrly/neo4django
    • neomodel
      – https://github.com/robinedwards/neomodel
    • bulbflow models
      – http://bulbflow.com/quickstart/#models
  63. APPENDIX: VISUALIZE YOUR GRAPH

    • Export somehow to .gexf for Gephi
      – http://gephi.org/
    • Use D3.js
      – http://d3js.org/
    • Use sigma.js
      – http://sigmajs.org/
    • Take a look at Max De Marzi's work
      – http://maxdemarzi.com/category/visualization/
    • Use Sylva (for newbies)
      – http://www.sylvadb.com/