Warming Up to Graphs


From PyOhio 2014


Brad Montgomery

July 27, 2014

Transcript

  1. Warming up to Graphs Brad Montgomery

  2. What is a Graph?

  3. Nope. [A chart of percentages]

  4. Nope. [A chart on a 0 to 100 scale, April to July]

  5. Nope. [A chart on a 0 to 100 scale, April to July]

  6. Yes. [Diagram: nodes A, B and C]

  7. “A set of objects…connected by links” (paraphrasing Wikipedia)

  8. Some Graph Theory

  9. Some Graph Theory (Very Little)

  10. Leonhard Euler (image: http://en.wikipedia.org/wiki/Leonhard_Euler#mediaviewer/File:Leonhard_Euler_2.jpg)

  11. Seven Bridges of Königsberg (image: http://en.wikipedia.org/wiki/Leonhard_Euler#mediaviewer/File:Konigsberg_bridges.png)

  12. This is a Graph. [Diagram: nodes A, B and C]

  13. These are Nodes (aka vertices or points).

  14. These are Edges (aka lines or arcs).

  15. Edges may have an explicit direction.

  16. A Directed Graph (aka Digraph).

  17. This is an Undirected Graph.

  18. Edges may also have weights. This becomes a Weighted Graph. [Edge weights shown: 42 and 37]

  19. You can traverse a graph. This is a Path.

  20. This is a Cycle.

  21. This Graph has no cycles. It’s acyclic.

  22. A Directed Acyclic Graph (aka DAG).

  23. This Graph is Connected.

  24. This Graph is not Connected. [Diagram adds nodes D and E]

  25. A complete graph.

  26. A Tree. [Diagram: nodes A through E]
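Several of these definitions translate directly into small checks. The sketch below assumes a graph stored as a plain Python dict mapping each node to its list of successors (a hypothetical representation, not one used in the talk). A depth-first search finds cycles, so a directed graph without one is a DAG, and a breadth-first search over the undirected view tells you whether the graph is connected.

```python
from collections import deque


def has_cycle(graph):
    """True if the directed graph (node -> list of successors) contains a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {}

    def visit(node):
        color[node] = GRAY
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                return True  # back edge: we reached a node already on the current path
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in graph)


def is_connected(graph):
    """True if, ignoring edge direction, every node is reachable from every other."""
    undirected = {}
    for u, successors in graph.items():
        undirected.setdefault(u, set())
        for v in successors:
            undirected[u].add(v)
            undirected.setdefault(v, set()).add(u)
    if not undirected:
        return True
    start = next(iter(undirected))
    seen = {start}
    queue = deque([start])  # breadth-first search from an arbitrary node
    while queue:
        for nxt in undirected[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(undirected)


# Hypothetical example graphs.
dag = {"A": ["B", "C"], "B": ["C"], "C": []}
cycle = {"A": ["B"], "B": ["C"], "C": ["A"]}
split = {"A": ["B"], "B": ["C"], "C": [], "D": ["E"], "E": []}
print(has_cycle(dag), has_cycle(cycle))        # False True
print(is_connected(dag), is_connected(split))  # True False
```

The `split` graph mirrors the "not Connected" slide: D and E form a separate component, so the search started from A never reaches them.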

  27. Graph Algorithms

  28. Graph Algorithms: at the time of this talk, Wikipedia had 94 pages categorized as “Graph Algorithms”.

  29. Shortest Path: Dijkstra’s Algorithm, A* search algorithm
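For graphs whose edge weights are non-negative, Dijkstra's algorithm finds shortest distances by repeatedly expanding the closest unfinished node. This is a minimal stdlib sketch using `heapq`, not the NetworkX or Neo4j implementation. The graph is hypothetical, and the way it places the weights 42 and 37 from the weighted-graph slide is an assumption.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from source.

    graph maps each node to a list of (neighbor, weight) pairs;
    weights must be non-negative.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry: a shorter route was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical weighted, directed graph.
graph = {
    "A": [("B", 42), ("C", 100)],
    "B": [("C", 37)],
    "C": [],
}
print(dijkstra(graph, "A"))  # {'A': 0, 'B': 42, 'C': 79}
```

Going A to B to C (42 + 37 = 79) beats the direct edge of weight 100. A* uses the same loop, but it orders the heap by distance so far plus a heuristic estimate of the remaining distance.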

  30. Search: Depth-first search, Breadth-first search, B-trees

  31. Minimum Spanning Tree: Kruskal's algorithm, Prim's algorithm
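Kruskal's algorithm builds a minimum spanning tree by taking edges cheapest-first and skipping any edge that would close a cycle. A union-find structure performs that cycle check. The sketch below uses only the stdlib and is not NetworkX's implementation. The graph is hypothetical.

```python
def kruskal_mst(nodes, edges):
    """Minimum spanning tree (or forest) via Kruskal's algorithm.

    edges is a list of (weight, u, v) tuples for an undirected graph.
    Returns the chosen edges as (u, v, weight) tuples.
    """
    parent = {n: n for n in nodes}  # union-find forest: each node starts alone

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges):  # cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # different components, so no cycle is formed
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


# Hypothetical undirected, weighted graph.
nodes = ["A", "B", "C", "D"]
edges = [(1, "A", "B"), (3, "B", "C"), (4, "A", "C"), (2, "C", "D")]
mst = kruskal_mst(nodes, edges)
print(mst)                          # [('A', 'B', 1), ('C', 'D', 2), ('B', 'C', 3)]
print(sum(w for _, _, w in mst))    # 6
```

The edge A-C (weight 4) is rejected because A and C are already joined through B.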

  32. Graph Tours: Eulerian path, Hamiltonian path / Traveling Salesman

  33. networkx.github.io

  34. Why a Graph Database?

  35. [Diagram: a graph of nodes A, B, C] !=

  36. Neo4j
    • Commercial software
    • Community Edition is open source (GPL/AGPL)
    • Written in Java
    • Has a RESTful API
    • ACID/Transactions/High Availability*
    • Supports billions of nodes & relationships on a single machine
  37. Neo4j https://github.com/neo4j/neo4j

  38. How to play with Neo4j
    1. Download the current version (2.1.2 at the time of this talk).
    2. Unpack the .tar.gz file.
    3. Install Java, if you don't already have it.
    4. ./neo4j-community-2.1.2/bin/neo4j start
    5. Visit http://localhost:7474/
  39. (no text)
  40. Modeling Data

  41. Modeling Data: Still called a Node (roughly, an Entity), with properties.
  42. Modeling Data: [Node] name: Janet, email: janet@example.com

  43. Modeling Data: [User] name: Janet, email: janet@example.com. Nodes can also have Labels (here, User).
  44. Modeling Data: [User] name: Janet, email: janet@example.com; [Project] name: open-unicorn, website: open-unicorn.org
  45. Modeling Data: the same User and Project nodes. Edges are called “Relationships”.
  46. Modeling Data: Janet (User) CONTRIBUTES_TO open-unicorn (Project, website: open-unicorn.org)
  47. Modeling Data: Janet (User) CONTRIBUTES_TO open-unicorn (Project), and that relationship has a property, first_commit: 2014-07-27. open-unicorn is OWNED_BY a second User, name: Rose, email: rose@example.com.
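The property-graph model on these slides can be pictured as plain data structures. A node has a set of labels plus a property dict. A relationship has a type, a start node, an end node, and its own properties. This is an illustrative sketch with hypothetical `Node`/`Relationship` classes, not how Neo4j stores data. It also makes two assumptions about the diagram: `first_commit` belongs to the CONTRIBUTES_TO relationship, and open-unicorn is OWNED_BY Rose.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set          # e.g. {"User"}
    properties: dict     # e.g. {"name": "Janet"}


@dataclass
class Relationship:
    start: Node
    type: str            # e.g. "CONTRIBUTES_TO"
    end: Node
    properties: dict = field(default_factory=dict)


janet = Node({"User"}, {"name": "Janet", "email": "janet@example.com"})
rose = Node({"User"}, {"name": "Rose", "email": "rose@example.com"})
project = Node({"Project"}, {"name": "open-unicorn", "website": "open-unicorn.org"})

relationships = [
    # Assumption: first_commit is a property of the CONTRIBUTES_TO relationship.
    Relationship(janet, "CONTRIBUTES_TO", project, {"first_commit": "2014-07-27"}),
    # Assumption: the project is OWNED_BY Rose.
    Relationship(project, "OWNED_BY", rose),
]

# Who owns open-unicorn? Follow OWNED_BY relationships out of the project node.
owners = [r.end.properties["name"] for r in relationships
          if r.start is project and r.type == "OWNED_BY"]
print(owners)  # ['Rose']
```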
  48. And now for some Python

  49. Several Python wrappers. I’m using py2neo 1.6.4.

  50. from py2neo import neo4j

    # Connect to a DB.
    db = neo4j.GraphDatabaseService(
        'http://localhost:7474/db/data/'
    )
  51. from py2neo import node

    # Create a Node
    n = node(name="Janet", email="janet@example.com")  # <- Properties
  52. Same code as slide 51: n is An Abstract Node (it is not yet saved to the database).
  53. from py2neo import node, rel

    db.create(
        node(name="Janet", email="janet@example.com"),
        node(name="open-unicorn", website="open-unicorn.org"),
        rel(0, "CONTRIBUTES_TO", 1),
    )
  54. Same code as slide 53. What? The 0 and 1 in rel(0, "CONTRIBUTES_TO", 1) refer to the first and second items passed to the same db.create() call.
  55. from py2neo import node, rel

    # Create some Users
    user_data = [{'name': 'Janet', 'email': 'janet@example.com'}]
    user_nodes = [node(d) for d in user_data]  # Abstract Nodes
    users = db.create(*user_nodes)
  56. Same code as slide 55, followed by:

    for u in users:
        u.add_labels("User")

    Now you can add Labels.
  57. from py2neo import node, rel

    # Create some Projects.
    project_data = [
        {'name': 'open-unicorn', 'website': 'open-unicorn.org'}
    ]
    project_nodes = [node(d) for d in project_data]
    projects = db.create(*project_nodes)

    # Every User contributes to every Project.
    rels = []
    for p in projects:
        p.add_labels("Project")
        for u in users:
            rels.append(
                rel(u, "CONTRIBUTES_TO", p)
            )

    # Save the relationships
    relationships = db.create(*rels)
  58. # Find a User based on their email
    # list() makes the results indexable even if find() yields them lazily
    users = list(db.find(
        "User",
        property_key="email",
        property_value="janet@example.com"
    ))

    print(users[0])
    # (1 {'name': 'Janet', 'email': 'janet@example.com'})
  59. Same code as slide 58; here "User" is the Label.
  60. Same code as slide 58. The leading 1 in the output is the node's ID. Nodes get an ID, but don’t rely on it.
  61. # Accessing Node Attributes
    users = list(db.find(…))
    user = users[0]
    print(user['name'])  # Janet
  62. # Access Labels and additional properties
    print(user.get_labels())
    # {'User'}

    print(user.get_properties())
    # {'name': 'Janet', 'email': 'janet@example.com'}
  63. # Accessing Relationships
    for relationship in user.match_outgoing():
        print(
            relationship.type,
            relationship.end_node['name']
        )
    # CONTRIBUTES_TO open-unicorn
  64. Same code as slide 63, next to a diagram of the User and Project nodes.
  65. for relationship in user.match_incoming():
        print(
            relationship.type,
            relationship.start_node['name']
        )
    # OWNED_BY open-unicorn
  66. Same code as slide 65, next to a diagram of the User and Project nodes.
  67. for relationship in user.match():
        print(
            relationship.start_node['name'],
            relationship.type,
            relationship.end_node['name'],
        )
    # open-unicorn OWNED_BY Janet
    # Janet CONTRIBUTES_TO open-unicorn
  68. Same code as slide 67, next to a diagram of the User and Project nodes.
  69. # Find a project's contributors.
    # 1) get the project node
    projects = list(db.find(
        "Project",
        property_key="name",
        property_value="open-unicorn"
    ))
    p = projects[0]

    # 2) list all contributors
    for r in p.match_incoming(rel_type="CONTRIBUTES_TO"):
        print(r.start_node['name'])  # Janet
  70. More Interesting Queries
    • People that contribute to open-unicorn also contribute to … ?
    • Who contributes to similar projects as me?
    • Six degrees of Guido van Rossum?
  71. Cypher

  72. Cypher
    • Declarative
    • SQL-like
    • Sometimes *looks* like a graph.
  73. MATCH (n) RETURN n

  74. MATCH (n) RETURN n, where (n) is a Node

  75. MATCH (n:User) RETURN n

  76. MATCH (n:User) RETURN n, where :User is a Label

  77. MATCH (n:User) WHERE n.name="Janet" RETURN n

  78. MATCH (p)-[:OWNED_BY]->(u) RETURN p, u

  79. MATCH (p)-[:OWNED_BY]->(u) RETURN p, u, where -[:OWNED_BY]-> is a Relationship

  80. MATCH (p)-[:OWNED_BY]->(u) RETURN p, u

  81. (no text)
  82. MATCH (u:User)-[:CONTRIBUTES_TO]->(p:Project) WHERE u.name="Janet" RETURN p.name ORDER BY p.name
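To make the pattern in slide 82 concrete, here is a rough in-memory equivalent in plain Python. It shows only which rows the MATCH, WHERE and ORDER BY clauses select. Neo4j does not execute Cypher this way. The sample data is hypothetical: the second project, "hypothetical-project", and Rose's relationships are invented for illustration.

```python
# Hypothetical sample data: nodes as id -> (label, properties),
# relationships as (start_id, type, end_id) triples.
nodes = {
    1: ("User", {"name": "Janet"}),
    2: ("Project", {"name": "open-unicorn"}),
    3: ("Project", {"name": "hypothetical-project"}),
    4: ("User", {"name": "Rose"}),
}
rels = [
    (1, "CONTRIBUTES_TO", 2),
    (1, "CONTRIBUTES_TO", 3),
    (4, "CONTRIBUTES_TO", 2),
    (2, "OWNED_BY", 4),
]


def projects_for(user_name):
    """Rough equivalent of:
    MATCH (u:User)-[:CONTRIBUTES_TO]->(p:Project)
    WHERE u.name=<user_name> RETURN p.name ORDER BY p.name
    """
    names = []
    for start, rel_type, end in rels:  # test every relationship against the pattern
        u_label, u_props = nodes[start]
        p_label, p_props = nodes[end]
        if (u_label == "User" and rel_type == "CONTRIBUTES_TO"
                and p_label == "Project" and u_props["name"] == user_name):
            names.append(p_props["name"])
    return sorted(names)  # ORDER BY p.name


print(projects_for("Janet"))  # ['hypothetical-project', 'open-unicorn']
```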

  83. from py2neo import cypher

    # Create a Transaction.
    session = cypher.Session(
        'http://localhost:7474/db/data/'
    )
    tx = session.create_transaction()
  84. query = """
    MATCH (n:User)
    WHERE n.name={name}
    RETURN n
    """
    tx.append(
        query,
        parameters={'name': 'Janet'}
    )
    results = tx.commit()
  85. Same code as slide 84. The {name} placeholder is filled in from the parameters dict: Parameter Substitution.
  86. # Returns a list of Records for each query.
    [
        [
            Record(
                columns=('n',),
                values=(Node(), )
            )
        ],
    ]
  87. Same structure as slide 86. Records may contain Nodes and Relationships.
  88. People that contribute to open-unicorn also contribute to…

  89. query = """
    MATCH
    (p:Project)<-[:CONTRIBUTES_TO]-(u:User)
    -[:CONTRIBUTES_TO]->(o:Project)
    WHERE p.name={name}
    RETURN o.name, count(*)
    ORDER BY count(*) DESC, o.name
    LIMIT {limit}
    """
    # tx is a transaction object
    tx.append(
        query,
        parameters={"name": "open-unicorn", "limit": 5}
    )
    results = tx.commit()
    for record in results[0]:
        name, count = record.values
        print("({0}) {1}".format(count, name))
  90. # o.name               count(*)
    # ------------------------------
    # open-jackrabbit      6
    # flailing-jackrabbit  5
    # secret-butterfly     5
    # tiny-armyant         5
    # flaming-butterfly    3
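Underneath, the query above is a two-hop counting pattern: find everyone who contributes to the given project, then count the other projects those people contribute to. This is a minimal in-memory sketch with `collections.Counter`. It is not how Neo4j evaluates the query. The contribution pairs are hypothetical, and the sketch assumes each user has at most one CONTRIBUTES_TO relationship per project.

```python
from collections import Counter

# Hypothetical (user, project) CONTRIBUTES_TO pairs.
contributions = [
    ("Janet", "open-unicorn"), ("Janet", "open-jackrabbit"),
    ("Rose", "open-unicorn"), ("Rose", "open-jackrabbit"), ("Rose", "tiny-armyant"),
    ("Zoe", "tiny-armyant"),
]


def also_contribute_to(project, limit=5):
    """Other projects worked on by contributors of `project`, sorted like
    ORDER BY count(*) DESC, o.name, then cut off like LIMIT."""
    contributors = {u for u, p in contributions if p == project}
    counts = Counter(p for u, p in contributions
                     if u in contributors and p != project)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


for name, count in also_contribute_to("open-unicorn"):
    print("({0}) {1}".format(count, name))
# (2) open-jackrabbit
# (1) tiny-armyant
```

Zoe does not count toward tiny-armyant here, because she does not contribute to open-unicorn.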
  91. Who contributes to similar projects?

  92. # People who contribute to similar projects as Janet
    query = """
    MATCH
    (a:User)-[:CONTRIBUTES_TO]->(p:Project)
    -[:OWNED_BY]->(u)
    -[:CONTRIBUTES_TO]->(x:Project)
    <-[:CONTRIBUTES_TO]-(people)
    WHERE a.name={name} AND NOT a=people
    RETURN people.name AS name, count(*) AS similar_contribs
    ORDER BY similar_contribs DESC
    LIMIT {limit}
    """
    # tx is a transaction object
    tx.append(
        query,
        parameters={"name": "Janet", "limit": 5}
    )
    results = tx.commit()
    for record in results[0]:
        name, count = record.values
        print("{0} {1}".format(name, count))
  93. # people.name       count(*)
    # ---------------------------
    # Bridget Betty     33
    # Donald Catherine  33
    # Donald Bob        30
    # Frank Chuck       28
    # Bob Brad          27
  94. How am I connected to Guido van Rossum?

  95. # Path between two Users
    query = """
    MATCH
    (a:User),
    (b:User),
    p=shortestPath((a)-[*]->(b))
    WHERE a.name={name_a} AND b.name={name_b}
    RETURN LENGTH(p), p
    """
    params = {'name_a': 'Janet', 'name_b': 'Daisy'}
    tx.append(query, parameters=params)
    results = tx.commit()
    for record in results[0]:
        length, path = record.values
        print("{0} hops".format(length))
        for rel in path.relationships:
            print("({0})-[:{1}]->({2})".format(
                rel.start_node['name'],
                rel.type,
                rel.end_node['name']
            ))
  96. # 6 hops.
    # (Janet)-[:CONTRIBUTES_TO]->(enterprise-grasshopper)
    # (enterprise-grasshopper)-[:OWNED_BY]->(Zoe)
    # (Zoe)-[:CONTRIBUTES_TO]->(open-turtledove)
    # (open-turtledove)-[:OWNED_BY]->(Delia)
    # (Delia)-[:CONTRIBUTES_TO]->(flailing-sealion)
    # (flailing-sealion)-[:OWNED_BY]->(Daisy)
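In Neo4j, `shortestPath` performs this search on the server. When edges have no weights, the underlying idea is a breadth-first search that remembers how it reached each node. Below is a minimal stdlib sketch, not Neo4j's implementation. It reuses the relationships printed in slide 96 and adds a hypothetical dead-end branch ("other-project", "Bob").

```python
from collections import deque

# Relationships copied from the 6-hop path above, plus a hypothetical
# dead-end branch ("other-project", "Bob") that the search also explores.
edges = [
    ("Janet", "CONTRIBUTES_TO", "enterprise-grasshopper"),
    ("enterprise-grasshopper", "OWNED_BY", "Zoe"),
    ("Zoe", "CONTRIBUTES_TO", "open-turtledove"),
    ("open-turtledove", "OWNED_BY", "Delia"),
    ("Delia", "CONTRIBUTES_TO", "flailing-sealion"),
    ("flailing-sealion", "OWNED_BY", "Daisy"),
    ("Janet", "CONTRIBUTES_TO", "other-project"),
    ("other-project", "OWNED_BY", "Bob"),
]


def shortest_path(edges, source, target):
    """Breadth-first search over directed (start, type, end) edges.

    Returns the edges of one shortest path, or None if target is unreachable.
    """
    outgoing = {}
    for edge in edges:
        outgoing.setdefault(edge[0], []).append(edge)
    came_from = {source: None}  # node -> edge used to first reach it
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while came_from[node] is not None:  # walk back to the source
                edge = came_from[node]
                path.append(edge)
                node = edge[0]
            return path[::-1]
        for edge in outgoing.get(node, []):
            if edge[2] not in came_from:
                came_from[edge[2]] = edge
                queue.append(edge[2])
    return None


path = shortest_path(edges, "Janet", "Daisy")
print("{0} hops.".format(len(path)))
for start, rel_type, end in path:
    print("({0})-[:{1}]->({2})".format(start, rel_type, end))
```

Because the search follows edges in their stored direction, asking for a path from Daisy to Janet returns None.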
  97. Object-Graph Mapping (OGM)

  98. class User(object):

        def __init__(self, name=None, email=None):
            self.name = name
            self.email = email


    class Project(object):

        def __init__(self, name=None):
            self.name = name
  99. from py2neo import ogm

    # Create a User & Project
    store = ogm.Store(db)
    u = User("Janet", "janet@example.com")
    p = Project("open-unicorn")

    store.save_unique("User", "email", u.email, u)
    store.save_unique("Project", "name", p.name, p)
    store.relate(u, "CONTRIBUTES_TO", p)
    store.save(u)
  100. # Retrieve a User
    store = ogm.Store(db)
    u = store.load_unique(
        "User", "email", "janet@example.com", User
    )
    p = store.load_unique(
        "Project", "name", "massive-aardvark", Project
    )

    # Get some relationships
    contribs = store.load_related(
        u, "CONTRIBUTES_TO", Project
    )
  101. # User
    u.__node__

    # A Dictionary of Outgoing Relationships
    u.__rel__
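At its core, an object-graph mapper copies an object's attributes into a node's property dict when saving, and copies them back when loading. The toy below shows only that round trip, in memory. `ToyStore` is hypothetical: it mimics the shape of the `save_unique`/`load_unique` calls above, but it is not py2neo's `ogm.Store` and it never talks to Neo4j.

```python
class User(object):
    def __init__(self, name=None, email=None):
        self.name = name
        self.email = email


class ToyStore(object):
    """Hypothetical in-memory stand-in for an object-graph mapper.
    It illustrates the attribute <-> property mapping only."""

    def __init__(self):
        self.nodes = {}  # (index_name, key, value) -> property dict

    def save_unique(self, index_name, key, value, obj):
        # Copy the object's attributes into a node-like property dict.
        self.nodes[(index_name, key, value)] = dict(vars(obj))

    def load_unique(self, index_name, key, value, cls):
        props = self.nodes.get((index_name, key, value))
        if props is None:
            return None
        obj = cls.__new__(cls)      # create the object without calling __init__
        obj.__dict__.update(props)  # copy node properties back onto attributes
        return obj


store = ToyStore()
store.save_unique("User", "email", "janet@example.com",
                  User("Janet", "janet@example.com"))
u = store.load_unique("User", "email", "janet@example.com", User)
print(u.name, u.email)  # Janet janet@example.com
```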
  102. Performance.

  103. Performance: in an RDBMS, an index lookup is O(log n); in Neo4j, reaching a node's immediate relationships is O(1).

  104. Performance: RDBMS traversals are O(m log n); Neo4j traversals are O(m), where m is the number of hops and n the number of rows searched at each hop.
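The argument on these two slides is that a relational traversal pays for an index lookup at every hop, while a graph database follows stored relationships directly. This is a rough in-memory analogy, not a model of how PostgreSQL or Neo4j is implemented. It assumes a sorted list of (source, target) rows searched with `bisect` stands in for an indexed join table, and a dict of neighbor lists stands in for stored relationships. The edges are hypothetical.

```python
import bisect

# Hypothetical directed edges stored as (source, target) rows, sorted so that
# a binary search can stand in for a B-tree index on `source`.
edge_rows = sorted([("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")])


def neighbors_via_index(rows, node):
    """'Relational' style: binary search for the node's rows, O(log n) per lookup."""
    i = bisect.bisect_left(rows, (node,))  # ("A",) sorts just before ("A", ...)
    result = []
    while i < len(rows) and rows[i][0] == node:
        result.append(rows[i][1])
        i += 1
    return result


# 'Graph' style: every node keeps a direct list of its outgoing neighbors.
adjacency = {}
for source, target in edge_rows:
    adjacency.setdefault(source, []).append(target)


def neighbors_via_adjacency(adj, node):
    """Average O(1) dict lookup, independent of the total number of edges."""
    return adj.get(node, [])


def walk(neighbors, start, hops):
    """Follow the first outgoing edge up to `hops` times (m hops)."""
    node, visited = start, [start]
    for _ in range(hops):
        nxt = neighbors(node)
        if not nxt:
            break
        node = nxt[0]
        visited.append(node)
    return visited


print(walk(lambda n: neighbors_via_index(edge_rows, n), "A", 3))      # ['A', 'B', 'C', 'D']
print(walk(lambda n: neighbors_via_adjacency(adjacency, n), "A", 3))  # ['A', 'B', 'C', 'D']
```

Both walks visit the same nodes. The difference is cost: each step of the first does a binary search over all n rows, so m hops cost about O(m log n), while each step of the second is a direct lookup, so m hops cost O(m).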

  105. Scenarios: When is a GraphDB the right tool?

  106. Social Graphs

  107. Fraud Detection

  108. Recommendations

  109. Dependencies

  110. Graph-Like Data (Trees)

  111. When is it a Bad Idea?

  112. Write-Heavy

  113. Tabular Data?

  114. When Postgres Works

  115. More Python!

  116. More Python
    • Neomodel: Neo4j models for Django (built on py2neo)
    • neo4django: Neo4j models for Django
    • bulbflow: Neo4j, OrientDB, Titan
    • neo4j-rest-client: Nice API. Active development.
    • py2neo: Undergoing a major rewrite.
  117. (no text)
  118. (me)-[:THANKS]->(you)

  119. (you)-[:QUESTIONS]->(me)