July 27, 2014

# Warming Up to Graphs

From PyOhio 2014

## Transcript

1. Warming up to Graphs

2. What is a Graph?

3. Nope.
[Chart: shares of 7%, 8%, 10%, 11%, 29%, 35%]

4. Nope.
[Chart: scale 0 to 100; April, May, June, July]

5. Nope.
[Chart: scale 0 to 100; April, May, June, July]

6. Yes.
A
B
C

7. “A set of objects…connected by links”
– paraphrasing Wikipedia

8. Some Graph Theory

9. Some Graph Theory
Very Little

10. http://en.wikipedia.org/wiki/Leonhard_Euler#mediaviewer/File:Leonhard_Euler_2.jpg
Leonhard Euler

11. http://en.wikipedia.org/wiki/Leonhard_Euler#mediaviewer/File:Konigsberg_bridges.png
Seven Bridges of Königsberg

12. C
A
B
This is a Graph

13. C
A
B
These are Nodes
(aka vertices or points)

14. C
A
B
These are Edges
(aka lines or arcs)

15. C
A
B
Edges may have an explicit direction.

16. C
A
B
A Directed Graph (aka Digraph)

17. C
A
B
This is an Undirected Graph.

18. C
A
B
Edges may also have weights.
This becomes a Weighted Graph.
42
37
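
The vocabulary from the last few slides can be sketched with plain Python dictionaries. This is only an illustration: the node names A, B, C come from the slides, but the edges themselves and which edge carries 42 or 37 are assumptions, since the drawings did not survive in the transcript.

```python
# Directed graph: each node maps to the set of nodes it points at.
# (The edges A->B and B->C are assumed for illustration.)
directed = {"A": {"B"}, "B": {"C"}, "C": set()}


def to_undirected(graph):
    """Return a copy in which every edge can be followed both ways."""
    result = {node: set(neighbors) for node, neighbors in graph.items()}
    for node, neighbors in graph.items():
        for neighbor in neighbors:
            result.setdefault(neighbor, set()).add(node)
    return result


undirected = to_undirected(directed)

# Weighted graph: each edge carries a number (assumed placement of 42 and 37).
weighted = {"A": {"B": 42}, "B": {"C": 37}, "C": {}}

print(sorted(undirected["B"]))  # ['A', 'C']
print(weighted["A"]["B"])       # 42
```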

19. C
A
B
You can traverse a graph.
This is a Path.

20. C
A
B
This is a Cycle.

21. C
A
B
This Graph has no cycles.
It’s acyclic.

22. C
A
B
A Directed Acyclic Graph (aka DAG)
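
Whether a directed graph is acyclic can be checked with a depth-first walk that remembers which nodes are still "in progress". The sketch below is one common way to do it, not something shown in the talk; the two example graphs are made up.

```python
def has_cycle(graph):
    """Return True if a directed graph (dict: node -> successors) has a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, in progress, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                return True  # reached a node that is still in progress
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in list(graph))


cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}   # A -> B -> C -> A
dag = {"A": ["B", "C"], "B": ["C"], "C": []}    # no way back to a start

print(has_cycle(cyclic))  # True
print(has_cycle(dag))     # False
```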

23. C
A
B
This Graph is Connected.

24. C
A
B
This Graph is not Connected.
E
D
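
Connectedness is easy to test in code: start anywhere, follow edges, and see whether every node was reached. A minimal sketch for undirected graphs follows; the edges are assumed, and only the node names A to E come from the slides.

```python
from collections import deque


def is_connected(graph):
    """True if every node can be reached from an arbitrary start node.

    graph: dict mapping node -> set of neighbors (undirected, so each
    edge appears in both directions).
    """
    if not graph:
        return True
    start = next(iter(graph))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(graph)


one_piece = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
two_pieces = dict(one_piece, D={"E"}, E={"D"})  # D-E floats on its own

print(is_connected(one_piece))   # True
print(is_connected(two_pieces))  # False
```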

25. C
A
B
A complete graph.

26. C
A
B
A Tree
D E

27. Graph Algorithms

28. Graph Algorithms
Wikipedia has 94 pages
categorized as “Graph Algorithms”

29. Shortest Path
Dijkstra’s Algorithm
A* search algorithm
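
For a feel of what Dijkstra's algorithm does, here is a compact standard-library sketch using a priority queue. The graph is hypothetical: the weights 42 and 37 echo the weighted-graph slide, and the direct A to C edge of 100 is invented to give the algorithm a choice.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distance from source to every reachable node.

    graph: dict mapping node -> {neighbor: non-negative weight}.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


graph = {"A": {"B": 42, "C": 100}, "B": {"C": 37}, "C": {}}
print(dijkstra(graph, "A"))  # {'A': 0, 'B': 42, 'C': 79}
```

Going through B (42 + 37 = 79) beats the direct edge of 100, which is why C ends up at 79.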

30. Search
Depth-first search
B-trees
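
Depth-first search fits in a few lines of Python. The sketch below uses an explicit stack instead of recursion; the tree shape is an assumption loosely modeled on the five-node tree slide.

```python
def depth_first(graph, start):
    """Return nodes in the order an iterative depth-first search visits them."""
    visited = []
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        visited.append(node)
        # Push neighbors reversed so the first-listed neighbor is explored first.
        stack.extend(reversed(graph.get(node, [])))
    return visited


tree = {"A": ["B", "C"], "B": ["D", "E"], "C": [], "D": [], "E": []}
print(depth_first(tree, "A"))  # ['A', 'B', 'D', 'E', 'C']
```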

31. Minimal Spanning Tree
Kruskal's algorithm
Prim's algorithm
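
Kruskal's algorithm is short enough to sketch: sort the edges by weight and keep each one that joins two components that were still separate. The example graph is hypothetical (42 and 37 echo the weighted-graph slide, 50 is invented), and the union-find helper is a simplified version.

```python
def kruskal(nodes, edges):
    """Return (total_weight, chosen_edges) for a minimum spanning tree.

    edges: iterable of (weight, u, v) tuples of an undirected, connected graph.
    """
    parent = {node: node for node in nodes}

    def find(node):
        # Walk up to the component's root, shortening the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    total, chosen = 0, []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two separate components
            parent[root_u] = root_v
            total += weight
            chosen.append((u, v, weight))
    return total, chosen


edges = [(42, "A", "B"), (37, "B", "C"), (50, "A", "C")]
print(kruskal("ABC", edges))  # (79, [('B', 'C', 37), ('A', 'B', 42)])
```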

32. Graph Tours
Eulerian Path
Hamiltonian path / Traveling Salesman
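
Euler's answer to the Königsberg puzzle comes down to counting degrees: a connected graph has a walk that uses every edge exactly once only if zero or two nodes have an odd number of edges. The sketch below applies that test; the letters A to D for the four land masses are labels chosen here, not taken from the talk.

```python
from collections import Counter


def has_eulerian_path(edges):
    """Degree test for a connected undirected multigraph.

    A walk using every edge exactly once exists only when 0 or 2 nodes
    have odd degree. Connectivity is assumed, not checked.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    odd = sum(1 for d in degree.values() if d % 2)
    return odd in (0, 2)


# Seven bridges between four land masses (A is the central island).
konigsberg = [
    ("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]
triangle = [("A", "B"), ("B", "C"), ("C", "A")]

print(has_eulerian_path(konigsberg))  # False
print(has_eulerian_path(triangle))    # True
```

All four land masses have odd degree (5, 3, 3, 3), so no such walk exists.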

33. networkx.github.io

34. Why a
Graph Database?

35. C
A
B
!=

36. Neo4j
• Commercial Software
• Community Edition is Open Source (GPL/AGPL)
• Written in Java
• Has a RESTful API
• ACID/Transactions/High Availability*
• Supports Billions of Nodes & Relationships on a single machine.

37. Neo4j
https://github.com/neo4j/neo4j

38. How to play with Neo4j
2. Unpack the .tar.gz file.
3. Install Java?
4. ./neo4j-community-2.1.2/bin/neo4j start
5. Visit http://localhost:7474/

39. Modeling Data

40. Modeling Data
Still called a Node.
~ An Entity. With properties.

41. Modeling Data
name: Janet
email: [email protected]

42. Modeling Data
User
name: Janet
email: [email protected]
Nodes can also have Labels.

43. Modeling Data
User
name: Janet
email: [email protected]
Project
name: open-unicorn
website: open-unicorn.org

44. Modeling Data
User
name: Janet
email: [email protected]
Project
name: open-unicorn
website: open-unicorn.org
Edges are called “Relationships”

45. Modeling Data
User
name: Janet
email: [email protected]
Project
name: open-unicorn
website: open-unicorn.org
CONTRIBUTES_TO

46. Modeling Data
User
name: Janet
email: [email protected]
Project
name: open-unicorn
website: open-unicorn.org
CONTRIBUTES_TO
first_commit: 2014-07-27
User
name: Rose
email: [email protected]
OWNED_BY
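
The model built up over the last slides (labeled nodes with properties, typed relationships with their own properties) can be mimicked in plain Python to make the pieces concrete. This is a rough in-memory sketch, not how Neo4j stores anything; the numeric ids and the helper name `outgoing` are made up, and email properties are left out.

```python
# Nodes: an id, a set of labels, and a dict of properties.
nodes = {
    1: {"labels": {"User"}, "properties": {"name": "Janet"}},
    2: {"labels": {"User"}, "properties": {"name": "Rose"}},
    3: {
        "labels": {"Project"},
        "properties": {"name": "open-unicorn", "website": "open-unicorn.org"},
    },
}

# Relationships: start node, type, end node, and optional properties.
relationships = [
    {"start": 1, "type": "CONTRIBUTES_TO", "end": 3,
     "properties": {"first_commit": "2014-07-27"}},
    {"start": 3, "type": "OWNED_BY", "end": 2, "properties": {}},
]


def outgoing(node_id, rel_type):
    """Names of the nodes reached from node_id over rel_type relationships."""
    return [
        nodes[r["end"]]["properties"]["name"]
        for r in relationships
        if r["start"] == node_id and r["type"] == rel_type
    ]


print(outgoing(1, "CONTRIBUTES_TO"))  # ['open-unicorn']
print(outgoing(3, "OWNED_BY"))        # ['Rose']
```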

47. And now for some
Python

48. Several Python Wrappers.
I’m using py2neo 1.6.4

49. from py2neo import neo4j

# Connect to a DB.
db = neo4j.GraphDatabaseService(
    'http://localhost:7474/db/data/'
)

50. from py2neo import node

# Create a Node
n = node(name="Janet", email="[email protected]")
Properties

51. from py2neo import node

# Create a Node
n = node(name="Janet", email="[email protected]")
An Abstract Node

52. from py2neo import node, rel

db.create(
    node(name="Janet", email="[email protected]"),
    node(name="open-unicorn", website="open-unicorn.org"),
    rel(0, "CONTRIBUTES_TO", 1),
)

53. from py2neo import node, rel

db.create(
    node(name="Janet", email="[email protected]"),
    node(name="open-unicorn", website="open-unicorn.org"),
    rel(0, "CONTRIBUTES_TO", 1),
)
What?

54. from py2neo import node, rel

# Create some Users
user_data = [{'name': 'Janet', 'email': '[email protected]'}]
user_nodes = [node(d) for d in user_data]  # Abstract Nodes
users = db.create(*user_nodes)

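55. from py2neo import node, rel

# Create some Users
user_data = [{'name': 'Janet', 'email': '[email protected]'}]
user_nodes = [node(d) for d in user_data]  # Abstract Nodes
users = db.create(*user_nodes)

for u in users:
    # Assumed loop body (missing from the slide text): label each node,
    # which the later db.find("User", ...) lookups rely on.
    u.add_labels("User")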

56. from py2neo import node, rel

# Create some Projects.
project_data = [
    {'name': 'open-unicorn', 'website': 'open-unicorn.org'}
]
project_nodes = [node(d) for d in project_data]
projects = db.create(*project_nodes)

# Every User contributes to every Project.
rels = []
for p in projects:
    for u in users:
        rels.append(
            rel(u, "CONTRIBUTES_TO", p)
        )

# Save the relationships
relationships = db.create(*rels)

57. # Find a User based on their email
users = db.find(
    "User",
    property_key="email",
    property_value="[email protected]"
)

print(users[0])
# (1 {'name': 'Janet',
#     'email': '[email protected]'})

58. # Find a User based on their email
users = db.find(
    "User",
    property_key="email",
    property_value="[email protected]"
)

print(users[0])
# (1 {'name': 'Janet',
#     'email': '[email protected]'})
Label

59. # Find a User based on their email
users = db.find(
    "User",
    property_key="email",
    property_value="[email protected]"
)

print(users[0])
# (1 {'name': 'Janet',
#     'email': '[email protected]'})
Nodes get an ID, but don’t rely on it.

60. # Accessing Node Attributes
users = db.find(…)
user = users[0]
print(user['name'])
# Janet

61. # Access Labels and additional properties
print(user.get_labels())
# {'User'}

print(user.get_properties())
# {'name': 'Janet', 'email': '[email protected]'}

62. # Accessing Relationships
for relationship in user.match_outgoing():
    print(
        relationship.type,
        relationship.end_node['name']
    )
# CONTRIBUTES_TO open-unicorn

63. # Accessing Relationships
for relationship in user.match_outgoing():
    print(
        relationship.type,
        relationship.end_node['name']
    )
# CONTRIBUTES_TO open-unicorn
User
Project

64. for relationship in user.match_incoming():
    print(
        relationship.type,
        relationship.start_node['name']
    )
# OWNED_BY open-unicorn

65. for relationship in user.match_incoming():
    print(
        relationship.type,
        relationship.start_node['name']
    )
# OWNED_BY open-unicorn
User
Project

66. for relationship in user.match():
    print(
        relationship.start_node['name'],
        relationship.type,
        relationship.end_node['name'],
    )
# open-unicorn OWNED_BY Janet
# Janet CONTRIBUTES_TO open-unicorn

67. for relationship in user.match():
    print(
        relationship.start_node['name'],
        relationship.type,
        relationship.end_node['name'],
    )
# open-unicorn OWNED_BY Janet
# Janet CONTRIBUTES_TO open-unicorn
User
Project

68. # Find a project's contributors.
# 1) get the project node
projects = db.find(
    "Project",
    property_key="name",
    property_value="open-unicorn"
)
p = projects[0]

# 2) list all contributors
for r in p.match_incoming(rel_type="CONTRIBUTES_TO"):
    print(r.start_node['name'])  # Janet

69. More Interesting Queries
• People that contribute to open-unicorn also contribute to … ?
• Who contributes to similar projects as me?
• Six degrees of Guido van Rossum?

70. Cypher

71. Cypher
• Declarative
• SQL-like
• Sometimes *looks* like a graph.

72. MATCH (n) RETURN n

73. MATCH (n) RETURN n
A Node

74. MATCH (n:User) RETURN n

75. MATCH (n:User) RETURN n
A Label

76. MATCH (n:User)
WHERE n.name="Janet"
RETURN n

77. MATCH (p)-[:OWNED_BY]->(u)
RETURN p, u

78. MATCH (p)-[:OWNED_BY]->(u)
RETURN p, u
A Relationship

79. MATCH (p)-[:OWNED_BY]->(u)
RETURN p, u

80. MATCH
(u:User)-[:CONTRIBUTES_TO]->(p:Project)
WHERE u.name="Janet"
RETURN p.name
ORDER BY p.name

81. from py2neo import cypher

# Create a Transaction.
session = cypher.Session(
    'http://localhost:7474/db/data/'
)
tx = session.create_transaction()

82. query = """
MATCH (n:User)
WHERE n.name={name}
RETURN n
"""
tx.append(
    query,
    parameters={'name': 'Janet'}
)
results = tx.commit()

83. query = """
MATCH (n:User)
WHERE n.name={name}
RETURN n
"""
tx.append(
    query,
    parameters={'name': 'Janet'}
)
results = tx.commit()
Parameter Substitution

84. # Returns a list of Records for each query.
[
    [
        Record(
            columns=('n',),
            values=(Node(), )
        )
    ],
]

85. # Returns a list of Records for each query.
[
    [
        Record(
            columns=('n',),
            values=(Node(), )
        )
    ],
]
May contain Nodes and Relationships

86. People that contribute
to open-unicorn also
contribute to…

87. query = """
MATCH
  (p:Project)<-[:CONTRIBUTES_TO]-(u:User)
  -[:CONTRIBUTES_TO]->(o:Project)
WHERE p.name={name}
RETURN o.name, count(*)
ORDER BY count(*) DESC, o.name
LIMIT {limit}
"""
# tx is a transaction object
tx.append(
    query,
    parameters={"name": "open-unicorn", "limit": 5}
)
results = tx.commit()
for record in results[0]:
    name, count = record.values
    print("({0}) {1}".format(count, name))

88. # o.name               count(*)
# ------------------------------
# open-jackrabbit      6
# flailing-jackrabbit  5
# secret-butterfly     5
# tiny-armyant         5
# flaming-butterfly    3
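
To see what that Cypher pattern computes, here is the same idea over an in-memory list of (user, project) pairs. It is a sketch with hypothetical data (a few project names are borrowed from the result table above) and a made-up function name; it mirrors the query's ordering, count descending and then name.

```python
from collections import Counter

# Hypothetical CONTRIBUTES_TO pairs; each (user, project) pair appears once.
contributes_to = [
    ("Janet", "open-unicorn"), ("Janet", "tiny-armyant"),
    ("Rose", "open-unicorn"), ("Rose", "tiny-armyant"),
    ("Rose", "secret-butterfly"), ("Zoe", "secret-butterfly"),
]


def also_contribute_to(project, pairs, limit=5):
    """Other projects that this project's contributors also work on."""
    contributors = {user for user, proj in pairs if proj == project}
    counts = Counter(
        proj for user, proj in pairs
        if user in contributors and proj != project
    )
    # Same ordering as the Cypher query: count descending, then name.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


print(also_contribute_to("open-unicorn", contributes_to))
# [('tiny-armyant', 2), ('secret-butterfly', 1)]
```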

89. Who contributes to
similar projects?

90. # People who contribute to similar projects as Janet
query = """
MATCH
  (a:User)-[:CONTRIBUTES_TO]->(p:Project)
  -[:OWNED_BY]->(u)
  -[:CONTRIBUTES_TO]->(x:Project)
  <-[:CONTRIBUTES_TO]-(people)
WHERE a.name={name}
RETURN people.name AS name, count(*) AS similar_contribs
ORDER BY similar_contribs DESC
LIMIT {limit}
"""
# tx is a transaction object
tx.append(
    query,
    parameters={"name": "Janet", "limit": 5}
)
results = tx.commit()
for record in results[0]:
    name, count = record.values
    print("{0} {1}".format(name, count))

91. # people.name        count(*)
# ---------------------------
# Bridget Betty      33
# Donald Catherine   33
# Donald Bob         30
# Frank Chuck        28

92. How am I connected to
Guido van Rossum?

93. # Path between two Users
query = """
MATCH
  (a:User), (b:User),
  p=shortestPath((a)-[*]->(b))
WHERE a.name={name_a} AND b.name={name_b}
RETURN LENGTH(p), p
"""
params = {'name_a': 'Janet', 'name_b': 'Daisy'}
tx.append(query, parameters=params)
results = tx.commit()
for record in results[0]:
    length, path = record.values
    print("{0} hops".format(length))
    for rel in path.relationships:
        print("({0})-[:{1}]->({2})".format(
            rel.start_node['name'],
            rel.type,
            rel.end_node['name']
        ))

94. # 6 hops
# (Janet)-[:CONTRIBUTES_TO]->(enterprise-grasshopper)
# (enterprise-grasshopper)-[:OWNED_BY]->(Zoe)
# (Zoe)-[:CONTRIBUTES_TO]->(open-turtledove)
# (open-turtledove)-[:OWNED_BY]->(Delia)
# (Delia)-[:CONTRIBUTES_TO]->(flailing-sealion)
# (flailing-sealion)-[:OWNED_BY]->(Daisy)
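
On an unweighted graph, a fewest-hops search like this amounts to a breadth-first search. The sketch below runs one over an in-memory list of (start, type, end) triples; the first three triples reuse names from the output above, the fourth is invented, and the function name is hypothetical. It says nothing about how Neo4j implements `shortestPath` internally.

```python
from collections import deque

relationships = [
    ("Janet", "CONTRIBUTES_TO", "enterprise-grasshopper"),
    ("enterprise-grasshopper", "OWNED_BY", "Zoe"),
    ("Zoe", "CONTRIBUTES_TO", "open-turtledove"),
    ("Janet", "CONTRIBUTES_TO", "open-unicorn"),  # invented side branch
]


def shortest_path(rels, source, target):
    """Fewest-hop directed path as a list of triples, or None if unreachable."""
    outgoing = {}
    for start, rel_type, end in rels:
        outgoing.setdefault(start, []).append((rel_type, end))

    previous = {source: None}  # node -> (node we came from, relationship type)
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for rel_type, end in outgoing.get(node, []):
            if end not in previous:
                previous[end] = (node, rel_type)
                queue.append(end)

    if target not in previous:
        return None
    path = []
    node = target
    while previous[node] is not None:  # walk back from target to source
        start, rel_type = previous[node]
        path.append((start, rel_type, node))
        node = start
    return list(reversed(path))


path = shortest_path(relationships, "Janet", "open-turtledove")
print("{0} hops".format(len(path)))  # 3 hops
for start, rel_type, end in path:
    print("({0})-[:{1}]->({2})".format(start, rel_type, end))
```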

95. Object-Graph
Mapping (ogm)

96. class User(object):

    def __init__(self, name=None, email=None):
        self.name = name
        self.email = email


class Project(object):

    def __init__(self, name=None):
        self.name = name

97. from py2neo import ogm

# Create a User & Project
store = ogm.Store(db)
u = User("Janet", "[email protected]")
p = Project("open-unicorn")

store.save_unique("User", "email", u.email, u)
store.save_unique("Project", "name", p.name, p)
store.relate(u, "CONTRIBUTES_TO", p)
store.save(u)

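98. # Retrieve a User
store = ogm.Store(db)
# The call names were lost from the slide text; load_unique and load_related
# are the matching py2neo 1.6 ogm.Store methods. The names p and projects are assumed.
u = store.load_unique(
    "User", "email", "[email protected]", User
)
p = store.load_unique(
    "Project", "name", "massive-aardvark", Project
)

# Get some relationships
projects = store.load_related(
    u, "CONTRIBUTES_TO", Project
)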

99. # User
u.__node__

# A Dictionary of Outgoing Relationships
u.__rel__

100. Performance.

101. Performance
RDBMS Index Lookup: O(log n)
Neo4j: Immediate Relationships: O(1)

102. Performance
RDBMS Traversals: O(m log n)
Neo4j: Traversals: O(m)
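
The difference between the two columns can be pictured in plain Python: finding rows through a sorted index takes a binary search per hop, while a node that holds direct references to its neighbors needs no search at all. This is a toy illustration of the big-O claim only, with made-up data and names; it is not how either kind of engine is actually implemented.

```python
import bisect

# Relational style: a join table kept sorted by user id, standing in for an index.
contributions = sorted([
    (1, "open-unicorn"), (1, "tiny-armyant"),
    (2, "open-unicorn"), (3, "secret-butterfly"),
])


def projects_via_index(user_id):
    """Binary-search the sorted rows: O(log n) to locate the matches."""
    lo = bisect.bisect_left(contributions, (user_id, ""))
    hi = bisect.bisect_left(contributions, (user_id + 1, ""))
    return [project for _, project in contributions[lo:hi]]


# Graph style: each node keeps direct references to its neighbors.
class Node(object):
    def __init__(self, name):
        self.name = name
        self.contributes_to = []


janet = Node("Janet")
janet.contributes_to = [Node("open-unicorn"), Node("tiny-armyant")]


def projects_via_adjacency(node):
    """No lookup step: the relationships hang off the node itself."""
    return [project.name for project in node.contributes_to]


print(projects_via_index(1))          # ['open-unicorn', 'tiny-armyant']
print(projects_via_adjacency(janet))  # ['open-unicorn', 'tiny-armyant']
```

Repeating the indexed lookup for each of m hops gives the O(m log n) figure; following m direct references gives O(m).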

103. Scenarios
When is a GraphDB the right tool?

104. Social Graphs

105. Fraud Detection

106. Recommendations

107. Dependencies

108. Graph-Like Data
(Trees)

109. When is it a Bad Idea?

110. Write-Heavy

111. Tabular Data?

112. When Postgres Works

113. More Python!

114. More Python
• Neomodel - Neo4j Models for Django (built on py2neo)
• neo4django - Neo4j Models for Django
• bulbflow - Neo4j, OrientDB, Titan
• neo4j-rest-client - Nice API. Active development.
• py2neo - Undergoing a major rewrite.

115. (me)-[:THANKS]->(you)

116. (you)-[:QUESTIONS]->(me)