
Stefan Armbruster on Graph Modelling Antipatterns



Transcript

  1. Graph Database Prototyping
    @ eJUG Austria meetup


  2. Agenda for Tonight
    • Building a Graph Database Prototype
    • 3 parts
    – Graph database & modeling concepts
    – Prototyping tools & import
    – Graph querying with Cypher


  3. Data Modeling With Neo4j


  4. Topics
    • Graph model building blocks
    • Quick intro to Cypher
    • Example modeling process
    • Modeling tips
    • Recipes for common modeling scenarios
    • Refactoring
    • Test-driven data modeling


  5. Graph Model Building Blocks


  6. Property Graph Data Model


  7. Four Building Blocks
    • Nodes
    • Relationships
    • Properties
    • Labels


  8. Nodes


  9. Nodes
    • Used to represent entities and complex value
    types in your domain
    • Can contain properties
    – Used to represent entity attributes and/or
    metadata (e.g. timestamps, version)
    – Key-value pairs; values can be:
        • Java primitives and strings
        • Arrays of these
        • null is not a valid value
    – Every node can have different properties
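    A minimal Cypher sketch of the points above. The property names (languages, createdAt) and their values are illustrative assumptions, not taken from the deck.

    ```cypher
    // Two nodes with the same label but different property sets.
    CREATE (ian:Person {name: 'Ian', languages: ['en', 'de']})
    CREATE (lucy:Person {name: 'Lucy', createdAt: 1400000000000})
    // Assigning null does not store a null value; it removes the property.
    SET lucy.createdAt = null
    RETURN properties(ian) AS ianProps, properties(lucy) AS lucyProps
    ```

    After the SET, the Lucy node is left with only its name property.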


  10. Entities and Value Types
    • Entities
    – Have unique conceptual identity
    – Change attribute values, but identity remains the
    same
    • Value types
    – No conceptual identity
    – Can substitute for each other if they have the
    same value
    • Simple: single value (e.g. colour, category)
    • Complex: multiple attributes (e.g. address)


  11. Relationships


  12. Relationships
    • Every relationship has a name and a direction
    – Add structure to the graph
    – Provide semantic context for nodes
    • Can contain properties
    – Used to represent quality or weight of
    relationship, or metadata
    • Every relationship must have a start node and
    end node
    – No dangling relationships
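    As a hedged illustration: what the slide calls the name is the relationship type in Cypher. The level property below is an assumed example of a relationship quality.

    ```cypher
    MERGE (p:Person {name: 'Ian'})
    MERGE (s:Skill {name: 'Neo4j'})
    // The relationship has a type (HAS_SKILL) and a direction (from p to s).
    MERGE (p)-[r:HAS_SKILL]->(s)
    // A property on the relationship describes its quality.
    SET r.level = 'expert'
    RETURN type(r) AS relType, r.level AS level
    ```

    Both end nodes are bound before the relationship is merged, so the relationship can never dangle.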


  13. Relationships (continued)
    • Nodes can have more than one relationship
    • Self relationships are allowed
    • Nodes can be connected by more than one relationship
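    A small sketch of these three cases. The COLLEAGUE and FOLLOWS relationship types are assumptions chosen for illustration.

    ```cypher
    MERGE (peter:Person {name: 'Peter'})
    MERGE (lucy:Person {name: 'Lucy'})
    // Two nodes connected by more than one relationship.
    MERGE (peter)-[:FRIEND]->(lucy)
    MERGE (peter)-[:COLLEAGUE]->(lucy)
    // A self relationship: start node and end node are the same node.
    MERGE (peter)-[:FOLLOWS]->(peter)
    RETURN peter, lucy
    ```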


  14. Variable Structure
    • Relationships are defined with regard to node
    instances, not classes of nodes
    – Two nodes representing the same kind of “thing”
    can be connected in very different ways
    • Allows for structural variation in the domain
    – Contrast with relational schemas, where foreign
    key relationships apply to all rows in a table
    • No need to use null to represent the absence of a
    connection
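    One way to picture this, as a sketch with assumed data: two Person nodes, only one of which has an employer.

    ```cypher
    // Ian works for Acme; Lucy has no employer, so there is simply no
    // WORKS_FOR relationship for her (and no null placeholder).
    CREATE (:Person {name: 'Ian'})-[:WORKS_FOR]->(:Company {name: 'Acme'})
    CREATE (:Person {name: 'Lucy'});

    // People without an employer are found by the absence of the pattern.
    MATCH (p:Person)
    WHERE NOT (p)-[:WORKS_FOR]->()
    RETURN p.name AS name;
    ```

    The two statements are separated by semicolons, as accepted by Neo4j Browser and cypher-shell.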


  15. Labels


  16. Labels
    • Every node can have zero or more labels
    • Used to represent roles (e.g. user, product,
    company)
    – Group nodes
    – Allow us to associate indexes and constraints with
    groups of nodes
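    A sketch of labels with an index and a constraint. It assumes the schema syntax of recent Neo4j versions (FOR ... REQUIRE, available from Neo4j 4.4); older releases use a different form. The Employee label and the constraint and index names are illustrative.

    ```cypher
    // A node can carry several labels, one per role it plays.
    CREATE (:Person:Employee {name: 'Ian'});

    // Uniqueness constraint scoped to the Company label.
    CREATE CONSTRAINT company_name_unique IF NOT EXISTS
    FOR (c:Company) REQUIRE c.name IS UNIQUE;

    // Index scoped to the Skill label.
    CREATE INDEX skill_name IF NOT EXISTS
    FOR (s:Skill) ON (s.name);

    // The label narrows a match to one group of nodes.
    MATCH (e:Employee)
    RETURN e.name AS name;
    ```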


  17. Four Building Blocks
    • Nodes
    – Entities
    • Relationships
    – Connect entities and structure domain
    • Properties
    – Entity attributes, relationship qualities, and
    metadata
    • Labels
    – Group nodes by role


  18. Designing a Graph Model


  19. Models
    Images: en.wikipedia.org
    Purposeful abstraction of a domain designed to
    satisfy particular application/end-user goals


  20. Design for Queryability
    Model
    Query


  21. Method
    1. Identify application/end-user goals
    2. Figure out what questions to ask of the domain
    3. Identify entities in each question
    4. Identify relationships between entities in each
    question
    5. Convert entities and relationships to paths
    – These become the basis of the data model
    6. Express questions as graph patterns
    – These become the basis for queries


  22. Application/End-User Goals
    As an employee
    I want to know who in the company
    has similar skills to me
    So that we can exchange knowledge

  23. Questions To Ask of the Domain
    Which people, who work for the same company
    as me, have similar skills to me?
    As an employee
    I want to know who in the company
    has similar skills to me
    So that we can exchange knowledge

  24. Identify Entities
    Which people, who work for the same company
    as me, have similar skills to me?
    Person
    Company
    Skill


  25. Identify Relationships Between Entities
    Which people, who work for the same company
    as me, have similar skills to me?
    Person WORKS_FOR Company
    Person HAS_SKILL Skill


  26. Convert to Cypher Paths
    Person WORKS_FOR Company
    Person HAS_SKILL Skill
    (:Person)-[:WORKS_FOR]->(:Company),
    (:Person)-[:HAS_SKILL]->(:Skill)
    • Label: e.g. :Person
    • Relationship: e.g. [:WORKS_FOR]


  27. Consolidate Paths
    (:Person)-[:WORKS_FOR]->(:Company),
    (:Person)-[:HAS_SKILL]->(:Skill)
    (:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)


  28. Create Person Subgraph
    MERGE (c:Company{name:'Acme'})
    MERGE (p:Person{name:'Ian'})
    MERGE (s1:Skill{name:'Java'})
    MERGE (s2:Skill{name:'C#'})
    MERGE (s3:Skill{name:'Neo4j'})
    MERGE (c)<-[:WORKS_FOR]-(p)
    MERGE (p)-[:HAS_SKILL]->(s1)
    MERGE (p)-[:HAS_SKILL]->(s2)
    MERGE (p)-[:HAS_SKILL]->(s3)
    RETURN c, p, s1, s2, s3
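    The result table on slide 38 lists colleagues Lucy and Bill, but the transcript does not show how they were created. A possible sketch, assuming they work for the same company and have exactly the skills shown in that table:

    ```cypher
    MERGE (c:Company {name: 'Acme'})
    MERGE (java:Skill {name: 'Java'})
    MERGE (neo:Skill {name: 'Neo4j'})
    MERGE (lucy:Person {name: 'Lucy'})
    MERGE (bill:Person {name: 'Bill'})
    // Both colleagues work for the same company as Ian.
    MERGE (lucy)-[:WORKS_FOR]->(c)
    MERGE (bill)-[:WORKS_FOR]->(c)
    // Lucy shares two skills with Ian, Bill shares one.
    MERGE (lucy)-[:HAS_SKILL]->(java)
    MERGE (lucy)-[:HAS_SKILL]->(neo)
    MERGE (bill)-[:HAS_SKILL]->(neo)
    ```

    MERGE is used throughout so the statement can be re-run without duplicating nodes or relationships.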


  29. Candidate Data Model
    (:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)


  30. Express Question as Graph Pattern
    Which people, who work for the same company
    as me, have similar skills to me?


  31. Cypher Query
    Which people, who work for the same company
    as me, have similar skills to me?
    MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
    (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
    WHERE me.name = $name
    RETURN colleague.name AS name,
    count(skill) AS score,
    collect(skill.name) AS skills
    ORDER BY score DESC


  32. Graph Pattern
    Which people, who work for the same company
    as me, have similar skills to me?
    MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
    (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
    WHERE me.name = $name
    RETURN colleague.name AS name,
    count(skill) AS score,
    collect(skill.name) AS skills
    ORDER BY score DESC


  33. Anchor Pattern in Graph
    Which people, who work for the same company
    as me, have similar skills to me?
    MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
    (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
    WHERE me.name = $name
    RETURN colleague.name AS name,
    count(skill) AS score,
    collect(skill.name) AS skills
    ORDER BY score DESC
    If an index for Person.name exists, Cypher will use it
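    A sketch of how the anchor lookup could be backed by an index and inspected. The index name person_name is an assumption, and the index syntax is that of Neo4j 4.0 and later.

    ```cypher
    // Index so that the lookup of `me` by name can be index-backed.
    CREATE INDEX person_name IF NOT EXISTS
    FOR (p:Person) ON (p.name);

    // EXPLAIN returns the execution plan without running the query.
    EXPLAIN
    MATCH (me:Person)
    WHERE me.name = 'Ian'
    RETURN me;
    ```

    With the index in place, the plan would typically start from an index seek on Person(name) rather than a scan of all Person nodes.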


  34. Create Projection of Results
    Which people, who work for the same company
    as me, have similar skills to me?
    MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
    (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
    WHERE me.name = $name
    RETURN colleague.name AS name,
    count(skill) AS score,
    collect(skill.name) AS skills
    ORDER BY score DESC


  35. First Match


  36. Second Match


  37. Third Match


  38. Running the Query
    +-----------------------------------+
    | name   | score | skills           |
    +-----------------------------------+
    | "Lucy" | 2     | ["Java","Neo4j"] |
    | "Bill" | 1     | ["Neo4j"]        |
    +-----------------------------------+
    2 rows


  39. From User Story to Model and Query
    As an employee
    I want to know who in the company
    has similar skills to me
    So that we can exchange knowledge
    Which people, who work for the same
    company as me, have similar skills to me?
    Person WORKS_FOR Company
    Person HAS_SKILL Skill
    (:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)
    MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
    (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
    WHERE me.name = $name
    RETURN colleague.name AS name,
    count(skill) AS score,
    collect(skill.name) AS skills
    ORDER BY score DESC


  40. Modeling Tips


  41. Properties Versus Relationships


  42. Use Relationships When…
    • You need to specify the weight, strength, or some
    other quality of the relationship
    • AND/OR the attribute value comprises a complex
    value type (e.g. address)
    • Examples:
    – Find all my colleagues who are expert (relationship
    quality) at a skill (attribute value) we have in common
    – Find all recent orders delivered to the same delivery
    address (complex value type)
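    The two example questions could be sketched as follows. The level property, the Order and Address labels, the DELIVERED_TO type, and the id and placedAt properties are all assumptions made for illustration.

    ```cypher
    // Relationship quality: the skill level lives on the HAS_SKILL relationship.
    MATCH (company)<-[:WORKS_FOR]-(me:Person {name: 'Ian'})-[:HAS_SKILL]->(skill),
          (company)<-[:WORKS_FOR]-(colleague)-[r:HAS_SKILL]->(skill)
    WHERE r.level = 'expert'
    RETURN colleague.name AS name, collect(skill.name) AS skills;

    // Complex value type: the address is a node that several orders share.
    MATCH (mine:Order {id: 1234})-[:DELIVERED_TO]->(a:Address)<-[:DELIVERED_TO]-(other:Order)
    RETURN other
    ORDER BY other.placedAt DESC
    LIMIT 10;
    ```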


  43. Use Properties When…
    • There’s no need to qualify the relationship
    • AND the attribute value comprises a simple
    value type (e.g. colour)
    • Examples:
    – Find those projects written by contributors to my
    projects that use the same language (attribute
    value) as my projects
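    A possible query for this example, keeping the language as a plain property on the project node. The Project label, the CONTRIBUTED_TO type, and the name and language properties are assumptions.

    ```cypher
    MATCH (me:Person {name: 'Ian'})-[:CONTRIBUTED_TO]->(mine:Project),
          (mine)<-[:CONTRIBUTED_TO]-(contributor:Person)-[:CONTRIBUTED_TO]->(other:Project)
    // Simple value type: compare the property directly, no extra traversal.
    WHERE other.language = mine.language AND other <> mine
    RETURN DISTINCT other.name AS project, other.language AS language
    ```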


  44. If Performance is Critical…
    • A small property lookup on a node will be
    quicker than traversing a relationship
    – But traversing a relationship is still faster than a
    SQL join…
    • However, many small properties on a node, or
    a lookup on a large string or large array
    property, will hurt performance
    – Always performance test against a representative
    dataset


  45. Relationship Granularity


  46. Align With Use Cases
    • Relationships are the “royal road” into the
    graph
    • When querying, well-named relationships
    help discover only what is absolutely
    necessary
    – And eliminate unnecessary portions of the graph
    from consideration


  47. General Relationships
    • Qualified by property


  48. Specific Relationships


  49. Best of Both Worlds
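    The diagrams for slides 47 to 49 are not part of the transcript. As a hedged sketch of the idea, assume a person with addresses, a general ADDRESS relationship qualified by a type property, and specific HOME_ADDRESS and DELIVERY_ADDRESS relationship types:

    ```cypher
    // General relationship, qualified by a property.
    MATCH (p:Person {name: 'Ian'})-[:ADDRESS {type: 'delivery'}]->(a:Address)
    RETURN a;

    // Specific relationship type: the type itself carries the meaning,
    // so no property has to be inspected during the traversal.
    MATCH (p:Person {name: 'Ian'})-[:DELIVERY_ADDRESS]->(a:Address)
    RETURN a;

    // Specific types can still be combined when any address will do.
    MATCH (p:Person {name: 'Ian'})-[:HOME_ADDRESS|DELIVERY_ADDRESS]->(a:Address)
    RETURN a;
    ```

    Which variant performs better depends on the data, so it is worth measuring against a representative dataset.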


  50. Model and Query Recipes


  51. Events and Actions
    • Often involve multiple parties
    • Can include other circumstantial detail, which
    may be common to multiple events
    • Examples
    – Patrick worked for Acme from 2001 to 2005 as a
    Software Developer
    – Sarah sent an email to Lucy, copying in David and
    Claire
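    The first example could be modeled with the event as a node of its own, so that it can connect more than two parties. This is a sketch; the Employment and Role labels, the relationship types, and the from/to properties are assumptions.

    ```cypher
    MERGE (patrick:Person {name: 'Patrick'})
    MERGE (acme:Company {name: 'Acme'})
    MERGE (role:Role {name: 'Software Developer'})
    // The employment event is a node connecting person, company, and role.
    CREATE (job:Employment {from: 2001, to: 2005})
    CREATE (patrick)-[:WORKED]->(job)
    CREATE (job)-[:EMPLOYER]->(acme)
    CREATE (job)-[:ROLE]->(role)
    ```

    The Role node is circumstantial detail that other employment events can share.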


  52. Timeline Trees
    • Discrete events
    – No natural relationships to other events
    • You need to find events at differing levels of
    granularity
    – Between two days
    – Between two months
    – Between two minutes
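    A minimal timeline tree sketch with year, month, and day levels. The labels, relationship types, the value property, the date, and the event name are illustrative assumptions.

    ```cypher
    // Build the path year -> month -> day. Merging each level from its
    // parent keeps months and days unique within that parent.
    MERGE (y:Year {value: 2014})
    MERGE (y)-[:MONTH]->(m:Month {value: 5})
    MERGE (m)-[:DAY]->(d:Day {value: 20})
    // Attach a discrete event to its day.
    CREATE (e:Event {name: 'Release 1.0'})
    CREATE (d)-[:EVENT]->(e);

    // All events in one month, found by walking down the tree.
    MATCH (:Year {value: 2014})-[:MONTH]->(:Month {value: 5})-[:DAY]->(d:Day)-[:EVENT]->(e)
    RETURN d.value AS day, e.name AS event
    ORDER BY day;
    ```

    Finer granularity (hours, minutes) would add further levels below Day in the same way.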


  53. Example Timeline Tree


  54. Pitfalls and Anti-Patterns


  55. Modeling Entities as Relationships
    • Limits data model evolution
    – A relationship connects two things
    – Modeling an entity as a relationship prevents it
    from being related to more than two things
    • Smells:
    – Lots of attribute-like properties
    – Heavy use of relationship indexes
    • Entities hidden in verbs:
    – E.g. emailed, reviewed


  56. Example: Movie Reviews
    • Initial requirements:
    – People review films
    – Application aggregates reviews from multiple sites


  57. Initial Model


  58. New Requirements
    • Allow users to comment on each other’s
    reviews
    – A review modeled as a relationship can’t be
    connected to a third entity (the comment)


  59. Revised Model
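    The model diagrams are not included in the transcript. A plausible sketch of the revision, with the review promoted from a relationship to a node so that comments can attach to it; all labels, relationship types, and property values here are assumptions.

    ```cypher
    // Initial model (for contrast): the review was a relationship, e.g.
    //   (alice)-[:REVIEWED {text: '...', rating: 5}]->(film)
    MERGE (alice:Person {name: 'Alice'})
    MERGE (bob:Person {name: 'Bob'})
    MERGE (film:Film {title: 'Example Film'})
    // Revised model: the review is a node of its own.
    CREATE (review:Review {text: 'Great film', rating: 5, source: 'example-site'})
    CREATE (alice)-[:WROTE_REVIEW]->(review)-[:REVIEW_OF]->(film)
    // A comment can now be connected to the review.
    CREATE (comment:Comment {text: 'I agree'})
    CREATE (bob)-[:WROTE_COMMENT]->(comment)-[:COMMENT_ON]->(review)
    ```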


  60. Model Actions in Terms of Products


  61. Now for Some Prototyping!


  62. Draw a Model!
    E.g. using Visio, www.apcjones.com/arrows, http://graphjson.io, or OmniGraffle


  63. Creating a prototype DB out of our model?


  64. Now for Some Queries!


  65. BACKUP slides:
    Cypher Query Language


  66. Nodes and Relationships
    ()-->()


  67. Labels and Relationship Types
    (:Person)-[:FRIEND]->(:Person)


  68. Properties
    (:Person{name:'Peter'})-[:FRIEND]->(:Person{name:'Lucy'})


  69. Identifiers
    (p1:Person{name:'Peter'})-[r:FRIEND]->(p2:Person{name:'Lucy'})


  70. Cypher
    MATCH graph_pattern
    WHERE binding_and_filter_criteria
    RETURN results


  71. Cypher
    MATCH (p:Person)-[:FRIEND]->(friends)
    WHERE p.name = 'Peter'
    RETURN friends


  72. Lookup Using Identifier + Label
    MATCH (p:Person)-[:FRIEND]->(friends)
    WHERE p.name = 'Peter'
    RETURN friends
