Stefan Armbruster on Graph Modelling Antipatterns

Transcript

  1. Graph Database Prototyping @ eJUG Austria meetup

  2. Agenda for Tonight
    • Building a Graph Database Prototype
    • 3 parts
    – Graph database & modeling concepts
    – Prototyping tools & import
    – Graph querying with Cypher

  3. Data Modeling With Neo4j

  4. Topics
    • Graph model building blocks
    • Quick intro to Cypher
    • Example modeling process
    • Modeling tips
    • Recipes for common modeling scenarios
    • Refactoring
    • Test-driven data modeling

  5. Graph Model Building Blocks

  6. Property Graph Data Model

  7. Four Building Blocks
    • Nodes
    • Relationships
    • Properties
    • Labels

  8. Nodes
    • Used to represent entities and complex value types in your domain
    • Can contain properties
    – Used to represent entity attributes and/or metadata (e.g. timestamps, version)
    – Key-value pairs; values are Java primitives or arrays, and null is not a valid value
    – Every node can have different properties
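
    A minimal Cypher sketch of the points above. The Person label, the property names, and the values are illustrative assumptions, not taken from the slides. String values are accepted as well as primitives and arrays.

    ```cypher
    // Hypothetical node with a few differently typed properties
    CREATE (p:Person {
      name: 'Ian',                 // string value
      age: 42,                     // integer value
      skills: ['Java', 'Neo4j'],   // array; all elements share one type
      updatedAt: 1400000000000     // metadata, e.g. a timestamp in milliseconds
    })
    // null is never stored: assigning null removes the property instead
    SET p.age = null
    RETURN p.name AS name, keys(p) AS propertyKeys
    ```

    Another Person node created later could carry a completely different set of keys; nothing ties the two nodes to a shared schema.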

  9. Entities and Value Types
    • Entities
    – Have unique conceptual identity
    – Change attribute values, but identity remains the same
    • Value types
    – No conceptual identity
    – Can substitute for each other if they have the same value
    – Simple: single value (e.g. colour, category)
    – Complex: multiple attributes (e.g. address)

  10. Relationships

  11. Relationships
    • Every relationship has a name and a direction
    – Add structure to the graph
    – Provide semantic context for nodes
    • Can contain properties
    – Used to represent quality or weight of
    relationship, or metadata
    • Every relationship must have a start node and
    end node
    – No dangling relationships

  12. Relationships (continued)
    • Nodes can have more than one relationship
    • Self relationships are allowed
    • Nodes can be connected by more than one relationship
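
    A short sketch of these relationship rules in Cypher. The relationship types COLLEAGUE and TRUSTS and the since property are hypothetical; FRIEND, Peter, and Lucy are borrowed from the backup slides.

    ```cypher
    // Two nodes to connect
    CREATE (peter:Person {name: 'Peter'}),
           (lucy:Person {name: 'Lucy'})
    // A relationship always has a type, a direction, a start node and an end node;
    // it can also carry properties that qualify it
    CREATE (peter)-[:FRIEND {since: 2010}]->(lucy)
    // The same pair of nodes can be connected by more than one relationship
    CREATE (peter)-[:COLLEAGUE]->(lucy)
    // A self relationship: start node and end node are the same node
    CREATE (peter)-[:TRUSTS]->(peter)
    ```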

  13. Variable Structure
    • Relationships are defined with regard to node
    instances, not classes of nodes
    – Two nodes representing the same kind of “thing”
    can be connected in very different ways
    • Allows for structural variation in the domain
    – Contrast with relational schemas, where foreign
    key relationships apply to all rows in a table
    • No need to use null to represent the absence of a
    connection
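
    To illustrate, here is a small sketch in which two nodes with the same label are connected differently. All names are hypothetical.

    ```cypher
    // One person has an employer, the other does not
    CREATE (:Person {name: 'Anna'})-[:WORKS_FOR]->(:Company {name: 'Acme'})
    CREATE (:Person {name: 'Ben'});

    // OPTIONAL MATCH yields null for employer only at query time;
    // nothing null-like is stored for Ben in the graph
    MATCH (p:Person)
    OPTIONAL MATCH (p)-[:WORKS_FOR]->(c:Company)
    RETURN p.name AS person, c.name AS employer;
    ```

    In a relational schema the equivalent would typically be a nullable foreign key column present on every row.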

  14. Labels
    • Every node can have zero or more labels
    • Used to represent roles (e.g. user, product,
    company)
    – Group nodes
    – Allow us to associate indexes and constraints with
    groups of nodes
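
    A sketch of labels used for grouping and for constraints. It assumes Neo4j 5 syntax (older versions use a different CREATE CONSTRAINT form); the constraint name and the Employee label are hypothetical.

    ```cypher
    // A uniqueness constraint is attached to a label/property pair
    CREATE CONSTRAINT skill_name_unique IF NOT EXISTS
    FOR (s:Skill) REQUIRE s.name IS UNIQUE;

    // A node can carry several labels, one per role it plays
    CREATE (:Person:Employee {name: 'Ian'});

    // A label narrows a match to one group of nodes
    MATCH (e:Employee)
    RETURN e.name AS name;
    ```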

  15. Four Building Blocks
    • Nodes
    – Entities
    • Relationships
    – Connect entities and structure domain
    • Properties
    – Entity attributes, relationship qualities, and
    metadata
    • Labels
    – Group nodes by role

  16. Designing a Graph Model

  17. Models
    Purposeful abstraction of a domain designed to satisfy particular application/end-user goals
    Images: en.wikipedia.org

  18. Design for Queryability
    Model
    Query

  19. Method
    1. Identify application/end-user goals
    2. Figure out what questions to ask of the domain
    3. Identify entities in each question
    4. Identify relationships between entities in each
    question
    5. Convert entities and relationships to paths
    – These become the basis of the data model
    6. Express questions as graph patterns
    – These become the basis for queries

  20. Application/End-User Goals
    As an employee
    I want to know who in the company
    has similar skills to me
    So that we can exchange knowledge

  21. Questions To Ask of the Domain
    Which people, who work for the same company
    as me, have similar skills to me?
    As an employee
    I want to know who in the company
    has similar skills to me
    So that we can exchange knowledge

  22. Identify Entities
    Which people, who work for the same company
    as me, have similar skills to me?
    Person
    Company
    Skill

  23. Identify Relationships Between Entities
    Which people, who work for the same company
    as me, have similar skills to me?
    Person WORKS_FOR Company
    Person HAS_SKILL Skill

  24. Convert to Cypher Paths
    Person WORKS_FOR Company
    Person HAS_SKILL Skill
    (:Person)-[:WORKS_FOR]->(:Company),
    (:Person)-[:HAS_SKILL]->(:Skill)
    Callouts: Label (e.g. :Person), Relationship (e.g. [:WORKS_FOR])

  25. Consolidate Paths
    (:Person)-[:WORKS_FOR]->(:Company),
    (:Person)-[:HAS_SKILL]->(:Skill)
    (:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)

  26. Create Person Subgraph
    MERGE (c:Company{name:'Acme'})
    MERGE (p:Person{name:'Ian'})
    MERGE (s1:Skill{name:'Java'})
    MERGE (s2:Skill{name:'C#'})
    MERGE (s3:Skill{name:'Neo4j'})
    CREATE UNIQUE (c)<-[:WORKS_FOR]-(p),
    (p)-[:HAS_SKILL]->(s1),
    (p)-[:HAS_SKILL]->(s2),
    (p)-[:HAS_SKILL]->(s3)
    RETURN c, p, s1, s2, s3

  27. Candidate Data Model
    (:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)

  28. Express Question as Graph Pattern
    Which people, who work for the same company
    as me, have similar skills to me?

  29. Cypher Query
    Which people, who work for the same company
    as me, have similar skills to me?
    MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
    (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
    WHERE me.name = {name}
    RETURN colleague.name AS name,
    count(skill) AS score,
    collect(skill.name) AS skills
    ORDER BY score DESC

  30. Graph Pattern
    Which people, who work for the same company
    as me, have similar skills to me?
    MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
    (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
    WHERE me.name = {name}
    RETURN colleague.name AS name,
    count(skill) AS score,
    collect(skill.name) AS skills
    ORDER BY score DESC

  31. Anchor Pattern in Graph
    Which people, who work for the same company
    as me, have similar skills to me?
    MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
    (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
    WHERE me.name = {name}
    RETURN colleague.name AS name,
    count(skill) AS score,
    collect(skill.name) AS skills
    ORDER BY score DESC
    If an index for Person.name exists, Cypher will use it
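
    A sketch of how such an index could be created and checked. It assumes Neo4j 5 syntax and the newer $name parameter form (the slides use the older {name} form); the index name is a hypothetical choice.

    ```cypher
    // Index on the property that anchors the pattern
    CREATE INDEX person_name IF NOT EXISTS
    FOR (p:Person) ON (p.name);

    // PROFILE runs the query and shows its plan; with the index in place
    // the plan would be expected to start from an index seek on Person.name
    // rather than a scan of all Person nodes (set $name before running)
    PROFILE
    MATCH (me:Person)
    WHERE me.name = $name
    RETURN me;
    ```

    The planner makes the final choice, so it is worth inspecting the plan on a representative dataset rather than assuming the index is used.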

  32. Create Projection of Results
    Which people, who work for the same company
    as me, have similar skills to me?
    MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
    (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
    WHERE me.name = {name}
    RETURN colleague.name AS name,
    count(skill) AS score,
    collect(skill.name) AS skills
    ORDER BY score DESC

  33. Second Match

  34. Running the Query
    +--------+-------+------------------+
    | name   | score | skills           |
    +--------+-------+------------------+
    | "Lucy" | 2     | ["Java","Neo4j"] |
    | "Bill" | 1     | ["Neo4j"]        |
    +--------+-------+------------------+
    2 rows
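
    The dataset behind this result is not part of the transcript. The sketch below is one hypothetical dataset that should produce rows like these when the query is run for Ian; it extends the Person subgraph from slide 26 with two colleagues. It assumes a Neo4j version where MERGE on relationships replaces CREATE UNIQUE and parameters are written $name instead of {name}.

    ```cypher
    // Hypothetical dataset: Ian's subgraph plus two colleagues at the same company
    MERGE (acme:Company {name: 'Acme'})
    MERGE (ian:Person {name: 'Ian'})
    MERGE (lucy:Person {name: 'Lucy'})
    MERGE (bill:Person {name: 'Bill'})
    MERGE (java:Skill {name: 'Java'})
    MERGE (csharp:Skill {name: 'C#'})
    MERGE (neo4j:Skill {name: 'Neo4j'})
    MERGE (ian)-[:WORKS_FOR]->(acme)
    MERGE (lucy)-[:WORKS_FOR]->(acme)
    MERGE (bill)-[:WORKS_FOR]->(acme)
    MERGE (ian)-[:HAS_SKILL]->(java)
    MERGE (ian)-[:HAS_SKILL]->(csharp)
    MERGE (ian)-[:HAS_SKILL]->(neo4j)
    MERGE (lucy)-[:HAS_SKILL]->(java)
    MERGE (lucy)-[:HAS_SKILL]->(neo4j)
    MERGE (bill)-[:HAS_SKILL]->(neo4j);

    // Same query as on the slides, with $name in place of {name}
    // (set the parameter first, e.g. :param name => 'Ian' in Neo4j Browser)
    MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
          (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
    WHERE me.name = $name
    RETURN colleague.name AS name,
           count(skill) AS score,
           collect(skill.name) AS skills
    ORDER BY score DESC;
    ```

    The order of names inside the skills list is not guaranteed, because collect() follows the order in which rows happen to arrive.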

  35. From User Story to Model and Query
    As an employee
    I want to know who in the company
    has similar skills to me
    So that we can exchange knowledge
    Which people, who work for the same
    company as me, have similar skills to me?
    Person WORKS_FOR Company
    Person HAS_SKILL Skill
    (:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)
    MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
    (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
    WHERE me.name = {name}
    RETURN colleague.name AS name,
    count(skill) AS score,
    collect(skill.name) AS skills
    ORDER BY score DESC

  36. Modeling Tips

  37. Properties Versus Relationships

  38. Use Relationships When…
    • You need to specify the weight, strength, or some
    other quality of the relationship
    • AND/OR the attribute value comprises a complex
    value type (e.g. address)
    • Examples:
    – Find all my colleagues who are expert (relationship
    quality) at a skill (attribute value) we have in common
    – Find all recent orders delivered to the same delivery
    address (complex value type)
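
    The two example questions could be expressed roughly as follows. This is a sketch under assumptions: the level property on HAS_SKILL, the Order and Address labels, the DELIVERED_TO type, and the properties id, placedAt, street, and city are all hypothetical, as are the parameters $name and $since.

    ```cypher
    // Relationship quality: HAS_SKILL carries a level property
    MATCH (me:Person {name: $name})-[:HAS_SKILL]->(skill:Skill)
            <-[r:HAS_SKILL]-(colleague:Person),
          (me)-[:WORKS_FOR]->(:Company)<-[:WORKS_FOR]-(colleague)
    WHERE r.level = 'expert'
    RETURN colleague.name AS name, collect(skill.name) AS expertSkills;

    // Complex value type: the address is a node that several orders point to
    MATCH (o1:Order)-[:DELIVERED_TO]->(a:Address)<-[:DELIVERED_TO]-(o2:Order)
    WHERE o1.placedAt >= $since AND o2.placedAt >= $since
      AND o1.id < o2.id            // report each pair of orders once
    RETURN a.street AS street, a.city AS city,
           o1.id AS firstOrder, o2.id AS secondOrder;
    ```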

  39. Use Properties When…
    • There’s no need to qualify the relationship
    • AND the attribute value comprises a simple
    value type (e.g. colour)
    • Examples:
    – Find those projects written by contributors to my
    projects that use the same language (attribute
    value) as my projects
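
    A sketch of the example question with the language kept as a plain property. The Project label, the WROTE and CONTRIBUTED_TO types, and the name and language properties are hypothetical.

    ```cypher
    // The language is a simple value, so comparing two properties is enough
    MATCH (me:Person {name: $name})-[:WROTE]->(mine:Project)
            <-[:CONTRIBUTED_TO]-(contributor:Person)-[:WROTE]->(theirs:Project)
    WHERE theirs.language = mine.language
    RETURN DISTINCT theirs.name AS project, theirs.language AS language;
    ```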

  40. If Performance is Critical…
    • Looking up a small property on a node is quicker than traversing a relationship
    – But traversing a relationship is still faster than a SQL join…
    • However, many small properties on a node, or a lookup on a large string or large array property, will impact performance
    – Always performance test against a representative dataset

  41. Relationship Granularity

  42. Align With Use Cases
    • Relationships are the “royal road” into the
    graph
    • When querying, well-named relationships
    help discover only what is absolutely
    necessary
    – And eliminate unnecessary portions of the graph
    from consideration

  43. General Relationships
    • Qualified by property

  44. Specific Relationships

  45. Best of Both Worlds
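
    The diagrams for the general and specific variants are not in this transcript. The following is only a sketch of the trade-off the slide titles suggest, using a hypothetical address example; every label, type, and value is an assumption.

    ```cypher
    // General relationship type, qualified by a property
    CREATE (p:Person {name: 'Ian'}),
           (home:Address {city: 'Vienna'}),
           (delivery:Address {city: 'Linz'})
    CREATE (p)-[:ADDRESS {type: 'home'}]->(home),
           (p)-[:ADDRESS {type: 'delivery'}]->(delivery)
    // Specific relationship types added alongside the general one
    CREATE (p)-[:HOME_ADDRESS]->(home),
           (p)-[:DELIVERY_ADDRESS]->(delivery);

    // "All addresses" follows the general type without knowing the subtypes
    MATCH (:Person {name: 'Ian'})-[:ADDRESS]->(a)
    RETURN a.city AS city;

    // "Delivery address only" follows the specific type,
    // so no relationship property has to be inspected
    MATCH (:Person {name: 'Ian'})-[:DELIVERY_ADDRESS]->(a)
    RETURN a.city AS city;
    ```

    Keeping both kinds of relationship costs extra writes and storage, so it is a choice to make per use case.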

  46. Model and Query Recipes

  47. Events and Actions
    • Often involve multiple parties
    • Can include other circumstantial detail, which
    may be common to multiple events
    • Examples
    – Patrick worked for Acme from 2001 to 2005 as a
    Software Developer
    – Sarah sent an email to Lucy, copying in David and
    Claire
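
    One way to capture such events is to give the event its own node, so every party and every piece of circumstantial detail can attach to it. A sketch with hypothetical labels, relationship types, and property names:

    ```cypher
    // Employment as a node linking person, company and role
    CREATE (patrick:Person {name: 'Patrick'}),
           (acme:Company {name: 'Acme'}),
           (dev:Role {name: 'Software Developer'}),
           (job:Employment {startYear: 2001, endYear: 2005})
    CREATE (patrick)-[:EMPLOYMENT]->(job),
           (job)-[:COMPANY]->(acme),
           (job)-[:ROLE]->(dev);

    // Email as a node linking sender, recipient and copied-in people
    CREATE (sarah:Person {name: 'Sarah'}),
           (lucy:Person {name: 'Lucy'}),
           (david:Person {name: 'David'}),
           (claire:Person {name: 'Claire'}),
           (email:Email)
    CREATE (sarah)-[:SENT]->(email),
           (email)-[:ADDRESSED_TO]->(lucy),
           (email)-[:COPIED_TO]->(david),
           (email)-[:COPIED_TO]->(claire);
    ```

    The Role node is an example of detail that could be shared by many Employment events.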

  48. Timeline Trees
    • Discrete events
    – No natural relationships to other events
    • You need to find events at differing levels of
    granularity
    – Between two days
    – Between two months
    – Between two minutes
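
    A minimal sketch of a timeline tree, assuming a Year, Month, Day hierarchy with events attached to days. The labels, relationship types, value property, and the sample date are hypothetical.

    ```cypher
    // MERGE reuses an existing branch of the tree or creates the missing part
    MERGE (y:Year {value: 2014})
    MERGE (y)-[:MONTH]->(m:Month {value: 5})
    MERGE (m)-[:DAY]->(d:Day {value: 12})
    CREATE (d)-[:EVENT]->(:Event {description: 'Example event'});

    // All events in one month: descend the tree instead of scanning every event
    MATCH (:Year {value: 2014})-[:MONTH]->(:Month {value: 5})
          -[:DAY]->(d:Day)-[:EVENT]->(e:Event)
    RETURN d.value AS day, e.description AS event
    ORDER BY day;
    ```

    Finer granularity, such as hours or minutes, would add further levels below Day in the same way.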

  49. Example Timeline Tree

  50. Pitfalls and Anti-Patterns

  51. Modeling Entities as Relationships
    • Limits data model evolution
    – A relationship connects two things
    – Modeling an entity as a relationship prevents it
    from being related to more than two things
    • Smells:
    – Lots of attribute-like properties
    – Heavy use of relationship indexes
    • Entities hidden in verbs:
    – E.g. emailed, reviewed

  52. Example: Movie Reviews
    • Initial requirements:
    – People review films
    – Application aggregates reviews from multiple sites

  53. Initial Model

  54. New Requirements
    • Allow users to comment on each other’s reviews
    – With a review modeled as a relationship, it can’t be connected to a third entity
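
    The model diagrams are not in this transcript, so the following is a sketch of the idea rather than the speaker's model. All labels, relationship types, properties, and names are hypothetical.

    ```cypher
    // Review as a relationship: nothing else can attach to it
    // (:Person)-[:REVIEWED {rating: 4, text: 'Worth watching'}]->(:Film)

    // Review as a node: comments, sources and more can connect to it
    CREATE (alice:Person {name: 'Alice'}),
           (bob:Person {name: 'Bob'}),
           (film:Film {title: 'Example Film'}),
           (review:Review {rating: 4, text: 'Worth watching'}),
           (comment:Comment {text: 'I agree'})
    CREATE (alice)-[:WROTE]->(review),
           (review)-[:REVIEW_OF]->(film),
           (bob)-[:WROTE]->(comment),
           (comment)-[:COMMENT_ON]->(review);
    ```

    The comment is modeled as a node too, so that it does not repeat the same limitation.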

  55. Revised model

  56. Model Actions in Terms of Products

  57. Now for Some Prototyping!

  58. Draw a Model!
    E.g. using Visio, www.apcjones.com/arrows, http://graphjson.io, or OmniGraffle

  59. Creating a prototype DB out of our model?

  60. Now for Some Queries!

  61. Backup slides: Cypher Query Language

  62. Nodes and Relationships
    ()-->()

  63. Labels and Relationship Types
    (:Person)-[:FRIEND]->(:Person)

  64. Properties
    (:Person{name:'Peter'})-[:FRIEND]->(:Person{name:'Lucy'})

  65. Identifiers
    (p1:Person{name:'Peter'})-[r:FRIEND]->(p2:Person{name:'Lucy'})

  66. Cypher
    MATCH graph_pattern
    WHERE binding_and_filter_criteria
    RETURN results

  67. Cypher
    MATCH (p:Person)-[:FRIEND]->(friends)
    WHERE p.name = 'Peter'
    RETURN friends

  68. Lookup Using Identifier + Label
    MATCH (p:Person)-[:FRIEND]->(friends)
    WHERE p.name = 'Peter'
    RETURN friends
