Warming Up to Graphs

From PyOhio 2014

Brad Montgomery

July 27, 2014

Transcript

  1. Warming up to Graphs
    Brad Montgomery

  2. What is a Graph?

  3. Nope.
    [Chart: segments labeled 7%, 8%, 10%, 11%, 29%, 35%]

  4. Nope.
    [Chart: y-axis 0 to 100; x-axis April, May, June, July]

  5. Nope.
    [Chart: y-axis 0 to 100; x-axis April, May, June, July]

  6. Yes.
    [Diagram: nodes A, B, C]

  7. “A set of objects…connected by links”
    – paraphrasing Wikipedia

  8. Some Graph Theory

  9. Some Graph Theory
    Very Little

  10. Leonhard Euler
    (image: http://en.wikipedia.org/wiki/Leonhard_Euler#mediaviewer/File:Leonhard_Euler_2.jpg)

  11. Seven Bridges of Königsberg
    (image: http://en.wikipedia.org/wiki/Leonhard_Euler#mediaviewer/File:Konigsberg_bridges.png)

  12. This is a Graph
    [Diagram: nodes A, B, C]

  13. These are Nodes
    (aka vertices or points)
    [Diagram: nodes A, B, C]

  14. These are Edges
    (aka lines or arcs)
    [Diagram: nodes A, B, C]

  15. Edges may have an explicit direction.
    [Diagram: nodes A, B, C]

  16. A Directed Graph (aka Digraph)
    [Diagram: nodes A, B, C]

  17. This is an Undirected Graph.
    [Diagram: nodes A, B, C]

  18. Edges may also have weights.
    This becomes a Weighted Graph.
    [Diagram: nodes A, B, C; edge weights 42 and 37]

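One common way to hold graphs like the ones on slides 15 to 18 in Python is an adjacency mapping from each node to its neighbors. This is an illustrative sketch rather than code from the talk, and the edge list is hypothetical: the slide shows the weights 42 and 37 but not which nodes they join. A directed graph stores each edge once; an undirected graph stores it in both directions.

```python
# Hypothetical weighted edges as (start, end, weight). The slide shows the
# weights 42 and 37, but which nodes they connect is an assumption here.
EDGES = [("A", "B", 42), ("B", "C", 37)]


def build_adjacency(edges, directed=True):
    """Map each node to a dict of {neighbor: weight}."""
    adj = {}
    for start, end, weight in edges:
        adj.setdefault(start, {})[end] = weight
        adj.setdefault(end, {})  # nodes with no outgoing edges still appear
        if not directed:
            # An undirected edge is recorded in both directions.
            adj[end][start] = weight
    return adj


digraph = build_adjacency(EDGES, directed=True)
undirected = build_adjacency(EDGES, directed=False)
print(digraph)     # {'A': {'B': 42}, 'B': {'C': 37}, 'C': {}}
print(undirected)  # {'A': {'B': 42}, 'B': {'A': 42, 'C': 37}, 'C': {'B': 37}}
```

An unweighted graph can use the same shape with every weight set to 1, or a list of neighbors in place of the inner dict.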

  19. You can traverse a graph.
    This is a Path.
    [Diagram: nodes A, B, C]

  20. This is a Cycle.
    [Diagram: nodes A, B, C]

  21. This Graph has no cycles.
    It’s acyclic.
    [Diagram: nodes A, B, C]

  22. A Directed Acyclic Graph (aka DAG)
    [Diagram: nodes A, B, C]

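Whether a directed graph is a DAG can be checked by trying to put its nodes in topological order. The sketch below uses Kahn's algorithm, a standard choice that the talk does not name; both example graphs are hypothetical.

```python
from collections import deque


def topological_order(adj):
    """Return the nodes in topological order, or None if there is a cycle.

    Kahn's algorithm: repeatedly take a node with no remaining incoming
    edges. `adj` maps each node to a list of its successors.
    """
    indegree = {node: 0 for node in adj}
    for successors in adj.values():
        for succ in successors:
            indegree[succ] = indegree.get(succ, 0) + 1

    ready = deque(node for node, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for succ in adj.get(node, ()):
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)

    # Nodes on a cycle never reach in-degree 0, so they are never emitted.
    return order if len(order) == len(indegree) else None


dag = {"A": ["B", "C"], "B": ["C"], "C": []}     # hypothetical DAG
cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}    # hypothetical cycle
print(topological_order(dag))     # ['A', 'B', 'C']
print(topological_order(cyclic))  # None
```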

  23. This Graph is Connected.
    [Diagram: nodes A, B, C]

  24. This Graph is not Connected.
    [Diagram: nodes A, B, C, D, E]

  25. A complete graph.
    [Diagram: nodes A, B, C]

  26. A Tree
    [Diagram: nodes A, B, C, D, E]

  27. Graph Algorithms

  28. Graph Algorithms
    Wikipedia has 94 pages
    categorized as “Graph Algorithms”

  29. Shortest Path
    Dijkstra’s Algorithm
    A* search algorithm

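A compact sketch of Dijkstra's algorithm, using Python's heapq module as the priority queue. It assumes non-negative edge weights and an adjacency mapping of node to {neighbor: weight}; the example graph and the helper names are hypothetical. A* runs the same loop but orders the queue by distance so far plus a heuristic estimate of the remaining distance.

```python
import heapq


def dijkstra(adj, source):
    """Shortest distances from `source`; weights must be non-negative.

    Returns (dist, previous), where `previous` maps each reached node to
    the node before it on a shortest path.
    """
    dist = {source: 0}
    previous = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale entry: a shorter route was already found
        for neighbor, weight in adj.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return dist, previous


def path_to(previous, source, target):
    """Follow `previous` links back from target (assumed reachable)."""
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1]


# Hypothetical graph: A -> B -> C is cheaper than the direct A -> C edge.
graph = {"A": {"B": 37, "C": 100}, "B": {"C": 42}, "C": {}}
dist, previous = dijkstra(graph, "A")
print(dist["C"], path_to(previous, "A", "C"))  # 79 ['A', 'B', 'C']
```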

  30. Search
    Depth-first search
    Breadth-first search
    B-trees

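Minimal iterative versions of the two traversals, assuming a mapping from each node to a list of its neighbors; the example tree is hypothetical. Structurally, the only difference is the container: a FIFO queue gives breadth-first order, a LIFO stack gives depth-first order.

```python
from collections import deque


def bfs(adj, start):
    """Breadth-first: visit nodes in order of hop distance from start."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in adj.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order


def dfs(adj, start):
    """Depth-first: follow one branch as far as possible, then backtrack."""
    seen = set()
    order = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push in reverse so neighbors are explored in their listed order.
        stack.extend(reversed(adj.get(node, ())))
    return order


# Hypothetical tree-shaped graph.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": [], "D": [], "E": []}
print(bfs(tree, "A"))  # ['A', 'B', 'C', 'D', 'E']
print(dfs(tree, "A"))  # ['A', 'B', 'D', 'E', 'C']
```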

  31. Minimum Spanning Tree
    Kruskal's algorithm
    Prim's algorithm

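A sketch of Kruskal's algorithm for an undirected, connected graph: sort the edges by weight and keep each edge that joins two components not yet connected, tracking components with a small union-find. The graph is hypothetical. Prim's algorithm arrives at a tree of the same total weight by growing a single tree outward from one starting node.

```python
def kruskal(nodes, edges):
    """Minimum spanning tree as a list of (u, v, weight).

    `edges` holds (weight, u, v) tuples for an undirected, connected graph.
    """
    parent = {node: node for node in nodes}

    def find(node):
        # Walk up to the component's root, halving the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # different components, so no cycle is formed
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


# Hypothetical weighted graph on four nodes.
nodes = ["A", "B", "C", "D"]
edges = [(1, "A", "B"), (4, "A", "C"), (3, "B", "C"), (2, "C", "D"), (5, "B", "D")]
mst = kruskal(nodes, edges)
print(mst)                        # [('A', 'B', 1), ('C', 'D', 2), ('B', 'C', 3)]
print(sum(w for _, _, w in mst))  # 6
```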

  32. Graph Tours
    Eulerian Path
    Hamiltonian Path / Traveling Salesman

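Euler's answer to the Königsberg problem comes down to counting degrees: a connected graph has a path that uses every edge exactly once if and only if zero or two nodes have an odd number of edges (with zero, the path can end where it started). Below is a sketch of that check applied to the seven bridges. Connectivity is assumed rather than tested, and the land-mass names are labels chosen for this example. No comparably simple test is known for Hamiltonian paths or the Traveling Salesman problem.

```python
from collections import Counter


def eulerian_path_status(edges):
    """Classify a connected, undirected multigraph by Euler's degree rule.

    Returns "circuit" (0 odd-degree nodes), "path" (exactly 2), or "none".
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    odd = [node for node, d in degree.items() if d % 2 == 1]
    if not odd:
        return "circuit"
    if len(odd) == 2:
        return "path"
    return "none"


# The seven bridges: two river banks, the island Kneiphof, and the east side.
bridges = [
    ("North", "Kneiphof"), ("North", "Kneiphof"),
    ("South", "Kneiphof"), ("South", "Kneiphof"),
    ("North", "East"), ("South", "East"), ("Kneiphof", "East"),
]
print(eulerian_path_status(bridges))  # none (all four land masses have odd degree)
```

Removing any one North-Kneiphof bridge leaves exactly two odd-degree land masses, so a walk over every remaining bridge becomes possible.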

  33. networkx.github.io

  34. Why a Graph Database?

  35. [Diagram: nodes A, B, C, next to a "!=" sign]

  36. Neo4j
    • Commercial Software
    • Community Edition is Open Source (GPL/AGPL)
    • Written in Java
    • Has a RESTful API
    • ACID/Transactions/High Availability*
    • Supports Billions of Nodes & Relationships on a single machine.

  37. Neo4j
    https://github.com/neo4j/neo4j

  38. How to play with Neo4j
    1. Download the current version (2.1.2)
    2. Unpack the .tar.gz file.
    3. Install Java?
    4. ./neo4j-community-2.1.2/bin/neo4j start
    5. Visit http://localhost:7474/

  40. Modeling Data

  41. Modeling Data
    Still called a Node.
    ~ An Entity. With properties.

  42. Modeling Data
    name: Janet
    email: [email protected]

  43. Modeling Data
    User
    name: Janet
    email: [email protected]
    Nodes can also have Labels.

  44. Modeling Data
    User
    name: Janet
    email: [email protected]
    Project
    name: open-unicorn
    website: open-unicorn.org

  45. Modeling Data
    User
    name: Janet
    email: [email protected]
    Project
    name: open-unicorn
    website: open-unicorn.org
    Edges are called “Relationships”

  46. Modeling Data
    User
    name: Janet
    email: [email protected]
    Project
    name: open-unicorn
    website: open-unicorn.org
    CONTRIBUTES_TO

  47. Modeling Data
    User
    name: Janet
    email: [email protected]
    Project
    name: open-unicorn
    website: open-unicorn.org
    CONTRIBUTES_TO
    first_commit: 2014-07-27
    User
    name: Rose
    email: [email protected]
    OWNED_BY

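The property-graph model on these slides maps onto two small Python classes. This is a plain-Python sketch, not py2neo. Two details are assumptions because the slide layout does not make them explicit: that first_commit is a property of the CONTRIBUTES_TO relationship, and that OWNED_BY points from the project to Rose. The email values are redacted in the source, so they are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node: a set of labels plus a dict of properties."""
    labels: set
    properties: dict


@dataclass
class Relationship:
    """A typed, directed edge that can carry its own properties."""
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


janet = Node({"User"}, {"name": "Janet"})
rose = Node({"User"}, {"name": "Rose"})
project = Node({"Project"}, {"name": "open-unicorn", "website": "open-unicorn.org"})

relationships = [
    # Assumed: first_commit belongs to the CONTRIBUTES_TO relationship.
    Relationship(janet, "CONTRIBUTES_TO", project, {"first_commit": "2014-07-27"}),
    # Assumed direction: project -> owner.
    Relationship(project, "OWNED_BY", rose),
]

# Which projects does Janet contribute to?
contributes = [
    r.end.properties["name"]
    for r in relationships
    if r.start is janet and r.type == "CONTRIBUTES_TO"
]
print(contributes)  # ['open-unicorn']
```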

  48. And now for some Python

  49. Several Python Wrappers.
    I’m using py2neo 1.6.4

  50. from py2neo import neo4j

    # Connect to a DB.
    db = neo4j.GraphDatabaseService(
        'http://localhost:7474/db/data/'
    )

  51. from py2neo import node

    # Create a Node
    n = node(name="Janet", email="[email protected]")
    Properties

  52. from py2neo import node

    # Create a Node
    n = node(name="Janet", email="[email protected]")
    An Abstract Node

  53. from py2neo import node, rel

    db.create(
        node(name="Janet", email="[email protected]"),
        node(name="open-unicorn", website="open-unicorn.org"),
        rel(0, "CONTRIBUTES_TO", 1),
    )

  54. from py2neo import node, rel

    db.create(
        node(name="Janet", email="[email protected]"),
        node(name="open-unicorn", website="open-unicorn.org"),
        rel(0, "CONTRIBUTES_TO", 1),
    )
    What?

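The integers in rel(0, "CONTRIBUTES_TO", 1) are positions: they refer to the first and second entities passed to the same db.create() call, so the nodes and the relationship between them can be created together. The toy resolver below is a hypothetical illustration of that bookkeeping, not py2neo's code; for simplicity it only resolves references to items that appear earlier in the call.

```python
def create(*items):
    """Toy batch create (hypothetical, not py2neo).

    Nodes are property dicts; relationships are (start, type, end) tuples
    whose start/end may be an int index of an earlier item in this call.
    """
    created = []
    for item in items:
        if isinstance(item, tuple):
            start, rel_type, end = item
            if isinstance(start, int):
                start = created[start]  # position within this same call
            if isinstance(end, int):
                end = created[end]
            created.append((start, rel_type, end))
        else:
            created.append(item)
    return created


janet, project, contributes = create(
    {"name": "Janet"},
    {"name": "open-unicorn", "website": "open-unicorn.org"},
    (0, "CONTRIBUTES_TO", 1),
)
print(contributes[0]["name"], contributes[1], contributes[2]["name"])
# Janet CONTRIBUTES_TO open-unicorn
```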

  55. from py2neo import node, rel

    # Create some Users
    user_data = [{'name': 'Janet', 'email': '[email protected]'}]
    user_nodes = [node(d) for d in user_data]  # Abstract Nodes
    users = db.create(*user_nodes)

  56. from py2neo import node, rel

    # Create some Users
    user_data = [{'name': 'Janet', 'email': '[email protected]'}]
    user_nodes = [node(d) for d in user_data]  # Abstract Nodes
    users = db.create(*user_nodes)

    for u in users:
        u.add_labels("User")

    Now you can add Labels

  57. from py2neo import node, rel

    # Create some Projects.
    project_data = [
        {'name': 'open-unicorn', 'website': 'open-unicorn.org'}
    ]
    project_nodes = [node(d) for d in project_data]
    projects = db.create(*project_nodes)

    # Every User contributes to every Project.
    rels = []
    for p in projects:
        p.add_labels("Project")
        for u in users:
            rels.append(
                rel(u, "CONTRIBUTES_TO", p)
            )

    # Save the relationships
    relationships = db.create(*rels)

  58. # Find a User based on their email
    # list() allows indexing even if find() returns an iterator.
    users = list(db.find(
        "User",
        property_key="email",
        property_value="[email protected]"
    ))

    print(users[0])
    # (1 {'name': 'Janet',
    #     'email': '[email protected]'})

  59. # Find a User based on their email
    # list() allows indexing even if find() returns an iterator.
    users = list(db.find(
        "User",
        property_key="email",
        property_value="[email protected]"
    ))

    print(users[0])
    # (1 {'name': 'Janet',
    #     'email': '[email protected]'})
    Label

  60. # Find a User based on their email
    # list() allows indexing even if find() returns an iterator.
    users = list(db.find(
        "User",
        property_key="email",
        property_value="[email protected]"
    ))

    print(users[0])
    # (1 {'name': 'Janet',
    #     'email': '[email protected]'})
    Nodes get an ID, but don’t rely on it.

  61. # Accessing Node Attributes
    users = list(db.find(…))
    user = users[0]
    print(user['name'])
    # Janet

  62. # Access Labels and additional properties
    print(user.get_labels())
    # {'User'}

    print(user.get_properties())
    # {'name': 'Janet', 'email': '[email protected]'}

  63. # Accessing Relationships
    for relationship in user.match_outgoing():
        print(
            relationship.type,
            relationship.end_node['name']
        )
    # CONTRIBUTES_TO open-unicorn

  64. # Accessing Relationships
    for relationship in user.match_outgoing():
        print(
            relationship.type,
            relationship.end_node['name']
        )
    # CONTRIBUTES_TO open-unicorn
    [Diagram: User and Project nodes]

  65. for relationship in user.match_incoming():
        print(
            relationship.type,
            relationship.start_node['name']
        )
    # OWNED_BY open-unicorn

  66. for relationship in user.match_incoming():
        print(
            relationship.type,
            relationship.start_node['name']
        )
    # OWNED_BY open-unicorn
    [Diagram: User and Project nodes]

  67. for relationship in user.match():
        print(
            relationship.start_node['name'],
            relationship.type,
            relationship.end_node['name'],
        )
    # open-unicorn OWNED_BY Janet
    # Janet CONTRIBUTES_TO open-unicorn

  68. for relationship in user.match():
        print(
            relationship.start_node['name'],
            relationship.type,
            relationship.end_node['name'],
        )
    # open-unicorn OWNED_BY Janet
    # Janet CONTRIBUTES_TO open-unicorn
    [Diagram: User and Project nodes]

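The three match calls differ only in which end of a relationship the node has to sit on. Below is a plain-Python sketch over (start, type, end) triples, using the demo data printed on slides 63 to 68. The function names are hypothetical and are not py2neo's API.

```python
# (start, type, end) triples matching the output shown on these slides.
RELATIONSHIPS = [
    ("Janet", "CONTRIBUTES_TO", "open-unicorn"),
    ("open-unicorn", "OWNED_BY", "Janet"),
]


def match_outgoing(node, rels):
    """Relationships that start at `node`."""
    return [r for r in rels if r[0] == node]


def match_incoming(node, rels):
    """Relationships that end at `node`."""
    return [r for r in rels if r[2] == node]


def match_any(node, rels):
    """Relationships touching `node` in either direction."""
    return [r for r in rels if node in (r[0], r[2])]


for start, rel_type, end in match_outgoing("Janet", RELATIONSHIPS):
    print(rel_type, end)    # CONTRIBUTES_TO open-unicorn
for start, rel_type, end in match_incoming("Janet", RELATIONSHIPS):
    print(rel_type, start)  # OWNED_BY open-unicorn
print(len(match_any("Janet", RELATIONSHIPS)))  # 2
```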

  69. # Find a project's contributors.
    # 1) get the project node
    # list() allows indexing even if find() returns an iterator.
    projects = list(db.find(
        "Project",
        property_key="name",
        property_value="open-unicorn"
    ))
    p = projects[0]

    # 2) list all contributors
    for r in p.match_incoming(rel_type="CONTRIBUTES_TO"):
        print(r.start_node['name'])  # Janet

  70. More Interesting Queries
    • People that contribute to open-unicorn also contribute to … ?
    • Who contributes to similar projects as me?
    • Six degrees of Guido van Rossum?

  71. Cypher

  72. Cypher
    • Declarative
    • SQL-like
    • Sometimes *looks* like a graph.

  73. MATCH (n) RETURN n

  74. MATCH (n) RETURN n
    A Node

  75. MATCH (n:User) RETURN n

  76. MATCH (n:User) RETURN n
    A Label

  77. MATCH (n:User)
    WHERE n.name="Janet"
    RETURN n

  78. MATCH (p)-[:OWNED_BY]->(u)
    RETURN p, u

  79. MATCH (p)-[:OWNED_BY]->(u)
    RETURN p, u
    A Relationship

  80. MATCH (p)-[:OWNED_BY]->(u)
    RETURN p, u

  82. MATCH
    (u:User)-[:CONTRIBUTES_TO]->(p:Project)
    WHERE u.name="Janet"
    RETURN p.name
    ORDER BY p.name

  83. from py2neo import cypher

    # Create a Transaction.
    session = cypher.Session(
        'http://localhost:7474/db/data/'
    )
    tx = session.create_transaction()

  84. query = """
    MATCH (n:User)
    WHERE n.name={name}
    RETURN n
    """
    tx.append(
        query,
        parameters={'name': 'Janet'}
    )
    results = tx.commit()

  85. query = """
    MATCH (n:User)
    WHERE n.name={name}
    RETURN n
    """
    tx.append(
        query,
        parameters={'name': 'Janet'}
    )
    results = tx.commit()
    Parameter Substitution

  86. # Returns a list of Records for each query.
    [
        [
            Record(
                columns=('n',),
                values=(Node(), )
            )
        ],
    ]

  87. # Returns a list of Records for each query.
    [
        [
            Record(
                columns=('n',),
                values=(Node(), )
            )
        ],
    ]
    May contain Nodes and Relationships

  88. People that contribute to open-unicorn also contribute to…

  89. query = """
    MATCH
      (p:Project)<-[:CONTRIBUTES_TO]-(u:User)
      -[:CONTRIBUTES_TO]->(o:Project)
    WHERE p.name={name}
    RETURN o.name, count(*)
    ORDER BY count(*) DESC, o.name
    LIMIT {limit}
    """
    # tx is a transaction object
    tx.append(
        query,
        parameters={"name": "open-unicorn", "limit": 5}
    )
    results = tx.commit()
    for record in results[0]:
        name, count = record.values
        print("{0} {1}".format(name, count))

  90. # o.name count(*)
    # --------------------------
    # open-jackrabbit 6
    # flailing-jackrabbit 5
    # secret-butterfly 5
    # tiny-armyant 5
    # flaming-butterfly 3

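For comparison, here is the same question answered in plain Python over a list of (user, project) contribution pairs. The data and the function name are hypothetical. The point is that the Cypher pattern amounts to "collect the project's contributors, then count their other projects", with count(*) adding one per matching (contributor, other project) pair, assuming each pair is listed once.

```python
from collections import Counter

# Hypothetical (user, project) CONTRIBUTES_TO pairs, each listed once.
CONTRIBUTIONS = [
    ("Janet", "open-unicorn"), ("Janet", "alpha-lib"),
    ("Rose", "open-unicorn"), ("Rose", "alpha-lib"), ("Rose", "beta-tool"),
    ("Zoe", "open-unicorn"), ("Zoe", "beta-tool"), ("Zoe", "gamma-app"),
    ("Delia", "beta-tool"),  # not an open-unicorn contributor, so ignored
]


def also_contribute_to(project, contributions, limit=5):
    """Rank other projects by how many of `project`'s contributors work on them."""
    contributors = {user for user, proj in contributions if proj == project}
    # One count per (contributor, other project) pair.
    counts = Counter(
        proj for user, proj in contributions
        if user in contributors and proj != project
    )
    # Highest count first, ties broken by name: ORDER BY count(*) DESC, o.name
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


for name, count in also_contribute_to("open-unicorn", CONTRIBUTIONS):
    print(name, count)
# alpha-lib 2
# beta-tool 2
# gamma-app 1
```

The graph database does the contributor lookup by following relationships from the project node rather than scanning every pair, which is what makes the Cypher version attractive on large data.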

  91. Who contributes to similar projects?

  92. # People who contribute to projects similar to Janet's
    query = """
    MATCH
      (a:User)-[:CONTRIBUTES_TO]->(p:Project)
      -[:OWNED_BY]->(u)
      -[:CONTRIBUTES_TO]->(x:Project)
      <-[:CONTRIBUTES_TO]-(people)
    WHERE a.name={name} AND NOT a=people
    RETURN people.name AS name, count(*) AS similar_contribs
    ORDER BY similar_contribs DESC
    LIMIT {limit}
    """
    # tx is a transaction object
    tx.append(
        query,
        parameters={"name": "Janet", "limit": 5}
    )
    results = tx.commit()
    for record in results[0]:
        name, count = record.values
        print("{0} {1}".format(name, count))

  93. # people.name count(*)
    # ---------------------------
    # Bridget Betty 33
    # Donald Catherine 33
    # Donald Bob 30
    # Frank Chuck 28
    # Bob Brad 27

  94. How am I connected to Guido van Rossum?

  95. # Path between two Users
    query = """
    MATCH
      (a:User), (b:User),
      p=shortestPath((a)-[*]->(b))
    WHERE a.name={name_a} AND b.name={name_b}
    RETURN LENGTH(p), p
    """
    params = {'name_a': 'Janet', 'name_b': 'Daisy'}
    tx.append(query, parameters=params)
    results = tx.commit()
    for record in results[0]:
        length, path = record.values
        print("{0} hops".format(length))
        for rel in path.relationships:
            print("({0})-[:{1}]->({2})".format(
                rel.start_node['name'],
                rel.type,
                rel.end_node['name']
            ))

  96. # 6 hops.
    # (Janet)-[:CONTRIBUTES_TO]->(enterprise-grasshopper)
    # (enterprise-grasshopper)-[:OWNED_BY]->(Zoe)
    # (Zoe)-[:CONTRIBUTES_TO]->(open-turtledove)
    # (open-turtledove)-[:OWNED_BY]->(Delia)
    # (Delia)-[:CONTRIBUTES_TO]->(flailing-sealion)
    # (flailing-sealion)-[:OWNED_BY]->(Daisy)

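Conceptually, this is a fewest-hops path in an unweighted, directed graph, which breadth-first search finds directly. Below is a plain-Python sketch over (start, type, end) triples. It assumes only the six relationships in the output above, plus the Janet/open-unicorn/Rose relationships from the modeling slides as a dead-end branch. The function name is hypothetical.

```python
from collections import deque

RELATIONSHIPS = [
    ("Janet", "CONTRIBUTES_TO", "open-unicorn"),  # dead-end branch
    ("open-unicorn", "OWNED_BY", "Rose"),
    ("Janet", "CONTRIBUTES_TO", "enterprise-grasshopper"),
    ("enterprise-grasshopper", "OWNED_BY", "Zoe"),
    ("Zoe", "CONTRIBUTES_TO", "open-turtledove"),
    ("open-turtledove", "OWNED_BY", "Delia"),
    ("Delia", "CONTRIBUTES_TO", "flailing-sealion"),
    ("flailing-sealion", "OWNED_BY", "Daisy"),
]


def shortest_path(rels, source, target):
    """Fewest-hops directed path as a list of relationships, or None."""
    outgoing = {}
    for rel in rels:
        outgoing.setdefault(rel[0], []).append(rel)

    came_by = {source: None}  # node -> relationship used to first reach it
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the recorded relationships back to the source.
            path = []
            while came_by[node] is not None:
                path.append(came_by[node])
                node = came_by[node][0]
            return path[::-1]
        for rel in outgoing.get(node, []):
            if rel[2] not in came_by:
                came_by[rel[2]] = rel
                queue.append(rel[2])
    return None


path = shortest_path(RELATIONSHIPS, "Janet", "Daisy")
print("{0} hops.".format(len(path)))  # 6 hops.
for start, rel_type, end in path:
    print("({0})-[:{1}]->({2})".format(start, rel_type, end))
```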

  97. Object-Graph Mapping (OGM)

  98. class User(object):

        def __init__(self, name=None, email=None):
            self.name = name
            self.email = email


    class Project(object):

        def __init__(self, name=None):
            self.name = name

  99. from py2neo import ogm

    # Create a User & Project
    store = ogm.Store(db)
    u = User("Janet", "[email protected]")
    p = Project("open-unicorn")

    store.save_unique("User", "email", u.email, u)
    store.save_unique("Project", "name", p.name, p)
    store.relate(u, "CONTRIBUTES_TO", p)
    store.save(u)

  100. # Retrieve a User and a Project
    store = ogm.Store(db)
    u = store.load_unique(
        "User", "email", "[email protected]", User
    )
    p = store.load_unique(
        "Project", "name", "massive-aardvark", Project
    )

    # Get some relationships
    contribs = store.load_related(
        u, "CONTRIBUTES_TO", Project
    )

  101. # The database Node this User is saved as
    u.__node__

    # A Dictionary of Outgoing Relationships
    u.__rel__

  102. Performance.

  103. Performance
    RDBMS Index Lookup: O(log n)
    Neo4j: Immediate Relationships: O(1)

  104. Performance
    RDBMS Traversals (m hops): O(m log n)
    Neo4j: Traversals: O(m)

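Below is a toy illustration of the difference these two slides describe; it is not a benchmark. A relational-style join table kept sorted and searched with bisect costs O(log n) to find a node's rows at every hop. An adjacency map, standing in for what Neo4j calls index-free adjacency, hands back a node's neighbors with a direct lookup. The edge data and function names are hypothetical.

```python
import bisect

# Hypothetical edges between integer node ids.
EDGES = [(1, 2), (2, 3), (3, 4), (1, 5), (5, 6)]

# Relational-style layout: one sorted table of (from_id, to_id) rows.
TABLE = sorted(EDGES)


def neighbors_via_index(node):
    # O(log n) binary search for the first row, then one step per matching row.
    i = bisect.bisect_left(TABLE, (node,))
    result = []
    while i < len(TABLE) and TABLE[i][0] == node:
        result.append(TABLE[i][1])
        i += 1
    return result


# Graph-style layout: each node keeps a direct list of its neighbors.
ADJACENCY = {}
for from_id, to_id in EDGES:
    ADJACENCY.setdefault(from_id, []).append(to_id)


def neighbors_via_adjacency(node):
    # A single dict lookup, independent of the total number of edges.
    return ADJACENCY.get(node, [])


def traverse(start, hops, neighbors):
    """Follow the first neighbor up to `hops` times: m hops, m lookups."""
    path = [start]
    for _ in range(hops):
        nxt = neighbors(path[-1])
        if not nxt:
            break
        path.append(nxt[0])
    return path


print(traverse(1, 3, neighbors_via_index))      # [1, 2, 3, 4]
print(traverse(1, 3, neighbors_via_adjacency))  # [1, 2, 3, 4]
```

Both layouts return the same path; only the per-hop cost differs, which is where the O(m log n) versus O(m) comparison comes from.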

  105. Scenarios
    When is a GraphDB the right tool?

  106. Social Graphs

  107. Fraud Detection

  108. Recommendations

  109. Dependencies

  110. Graph-Like Data
    (Trees)

  111. When is it a Bad Idea?

  112. Write-Heavy

  113. Tabular Data?

  114. When Postgres Works

  115. More Python!

  116. More Python
    • Neomodel - Neo4j Models for Django (built on py2neo)
    • neo4django - Neo4j Models for Django
    • bulbflow - Neo4j, OrientDB, Titan
    • neo4j-rest-client - Nice API. Active development.
    • py2neo - Undergoing a major rewrite.

  118. (me)-[:THANKS]->(you)

  119. (you)-[:QUESTIONS]->(me)
