Stefan Armbruster on Graph Modelling Antipatterns

Transcript

  1. Graph Database Prototyping @ eJUG Austria meetup

  2. Agenda for Tonight
    • Building a Graph Database Prototype
    • 3 parts
      – Graph database & modeling concepts
      – Prototyping tools & import
      – Graph querying with Cypher
  3. Data Modeling With Neo4j

  4. Topics
    • Graph model building blocks
    • Quick intro to Cypher
    • Example modeling process
    • Modeling tips
    • Recipes for common modeling scenarios
    • Refactoring
    • Test-driven data modeling
  5. Graph Model Building Blocks

  6. Property Graph Data Model

  7. Four Building Blocks
    • Nodes
    • Relationships
    • Properties
    • Labels
  8. Nodes

  9. Nodes
    • Used to represent entities and complex value types in your domain
    • Can contain properties
      – Used to represent entity attributes and/or metadata (e.g. timestamps, version)
      – Key-value pairs
        • Java primitives
        • Arrays
        • null is not a valid value
      – Every node can have different properties
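
A minimal Cypher sketch of the points on slide 9. The labels, property names and values are assumptions made up for illustration; only the names 'Ian' and 'Acme' appear later in the deck.

```cypher
// Hypothetical nodes: each one carries its own set of key-value properties.
CREATE (ian:Person {name: 'Ian', languages: ['Java', 'C#'], created: 1400000000000})
CREATE (acme:Company {name: 'Acme', founded: 1999})
RETURN ian, acme;

// null is not stored as a value: setting a property to null removes the property.
MATCH (p:Person {name: 'Ian'})
SET p.languages = null
RETURN p;
```

The two nodes have different property keys, and the array property shows one of the allowed value kinds.
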
  10. Entities and Value Types
    • Entities
      – Have unique conceptual identity
      – Change attribute values, but identity remains the same
    • Value types
      – No conceptual identity
      – Can substitute for each other if they have the same value
        • Simple: single value (e.g. colour, category)
        • Complex: multiple attributes (e.g. address)
  11. Relationships

  12. Relationships
    • Every relationship has a name and a direction
      – Add structure to the graph
      – Provide semantic context for nodes
    • Can contain properties
      – Used to represent quality or weight of relationship, or metadata
    • Every relationship must have a start node and end node
      – No dangling relationships
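
As a rough illustration of slide 12, the sketch below creates one named, directed relationship and attaches a property to it. The `since` property and its value are assumptions, not something the slides specify.

```cypher
// Hypothetical data: WORKS_FOR is the relationship name, the arrow gives its direction.
MERGE (p:Person {name: 'Ian'})
MERGE (c:Company {name: 'Acme'})
MERGE (p)-[r:WORKS_FOR]->(c)
SET r.since = 2010          // assumed metadata property on the relationship
RETURN p, r, c;

// The stored direction is fixed, but a MATCH pattern may leave it unspecified.
MATCH (p:Person {name: 'Ian'})-[r:WORKS_FOR]-(c:Company)
RETURN c.name AS company, r.since AS since;
```
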
  13. Relationships (continued)
    • Nodes can have more than one relationship
    • Self relationships are allowed
    • Nodes can be connected by more than one relationship
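
A small sketch of the three statements on slide 13. The `MANAGES` and `ENDORSED` relationship names are hypothetical; `FRIEND` is borrowed from the backup slides.

```cypher
MERGE (peter:Person {name: 'Peter'})
MERGE (lucy:Person {name: 'Lucy'})
// Two different relationships connecting the same pair of nodes.
CREATE (peter)-[:FRIEND]->(lucy)
CREATE (peter)-[:MANAGES]->(lucy)
// A self relationship: start node and end node are the same node.
CREATE (peter)-[:ENDORSED]->(peter)
RETURN peter, lucy;
```

Because the relationships use CREATE, running the statement twice would add a second copy of each relationship.
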
  14. Variable Structure
    • Relationships are defined with regard to node instances, not classes of nodes
      – Two nodes representing the same kind of “thing” can be connected in very different ways
    • Allows for structural variation in the domain
      – Contrast with relational schemas, where foreign key relationships apply to all rows in a table
    • No need to use null to represent the absence of a connection
  15. Labels

  16. Labels
    • Every node can have zero or more labels
    • Used to represent roles (e.g. user, product, company)
      – Group nodes
      – Allow us to associate indexes and constraints with groups of nodes
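
The sketch below shows how an index and a uniqueness constraint can be tied to a label. It assumes the schema syntax of the Neo4j 2.x era, which matches the other Cypher in these slides; newer Neo4j releases spell both statements differently, so check the manual for the version in use. The `Employee` label is hypothetical.

```cypher
// Assumed Neo4j 2.x-style schema syntax; run each statement on its own.
CREATE INDEX ON :Person(name);
CREATE CONSTRAINT ON (c:Company) ASSERT c.name IS UNIQUE;

// A node can carry several labels, one per role it plays.
CREATE (n:Person:Employee {name: 'Ian'})
RETURN labels(n);
```
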
  17. Four Building Blocks
    • Nodes – Entities
    • Relationships – Connect entities and structure domain
    • Properties – Entity attributes, relationship qualities, and metadata
    • Labels – Group nodes by role
  18. Designing a Graph Model

  19. Models
    • Purposeful abstraction of a domain designed to satisfy particular application/end-user goals
    (Images: en.wikipedia.org)
  20. Design for Queryability Model Query

  21. Method
    1. Identify application/end-user goals
    2. Figure out what questions to ask of the domain
    3. Identify entities in each question
    4. Identify relationships between entities in each question
    5. Convert entities and relationships to paths
      – These become the basis of the data model
    6. Express questions as graph patterns
      – These become the basis for queries
  22. Application/End-User Goals
    As an employee
    I want to know who in the company has similar skills to me
    So that we can exchange knowledge
  23. Questions To Ask of the Domain
    As an employee
    I want to know who in the company has similar skills to me
    So that we can exchange knowledge

    Which people, who work for the same company as me, have similar skills to me?
  24. Identify Entities
    Which people, who work for the same company as me, have similar skills to me?
    • Person
    • Company
    • Skill
  25. Identify Relationships Between Entities
    Which people, who work for the same company as me, have similar skills to me?
    • Person WORKS_FOR Company
    • Person HAS_SKILL Skill
  26. Convert to Cypher Paths
    • Person WORKS_FOR Company
    • Person HAS_SKILL Skill
    (:Person)-[:WORKS_FOR]->(:Company),
    (:Person)-[:HAS_SKILL]->(:Skill)
    (Callouts: Relationship, Label)
  27. Consolidate Paths
    (:Person)-[:WORKS_FOR]->(:Company),
    (:Person)-[:HAS_SKILL]->(:Skill)

    (:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)

  28. Create Person Subgraph
    MERGE (c:Company{name:'Acme'})
    MERGE (p:Person{name:'Ian'})
    MERGE (s1:Skill{name:'Java'})
    MERGE (s2:Skill{name:'C#'})
    MERGE (s3:Skill{name:'Neo4j'})
    CREATE UNIQUE (c)<-[:WORKS_FOR]-(p),
                  (p)-[:HAS_SKILL]->(s1),
                  (p)-[:HAS_SKILL]->(s2),
                  (p)-[:HAS_SKILL]->(s3)
    RETURN c, p, s1, s2, s3
  29. Candidate Data Model (:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)

  30. Express Question as Graph Pattern
    Which people, who work for the same company as me, have similar skills to me?
  31. Cypher Query
    Which people, who work for the same company as me, have similar skills to me?
    MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
          (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
    WHERE me.name = {name}
    RETURN colleague.name AS name, count(skill) AS score, collect(skill.name) AS skills
    ORDER BY score DESC
  32. Graph Pattern
    Which people, who work for the same company as me, have similar skills to me?
    MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
          (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
    WHERE me.name = {name}
    RETURN colleague.name AS name, count(skill) AS score, collect(skill.name) AS skills
    ORDER BY score DESC
  33. Anchor Pattern in Graph
    Which people, who work for the same company as me, have similar skills to me?
    MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
          (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
    WHERE me.name = {name}
    RETURN colleague.name AS name, count(skill) AS score, collect(skill.name) AS skills
    ORDER BY score DESC
    • If an index for Person.name exists, Cypher will use it
  34. Create Projection of Results
    Which people, who work for the same company as me, have similar skills to me?
    MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
          (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
    WHERE me.name = {name}
    RETURN colleague.name AS name, count(skill) AS score, collect(skill.name) AS skills
    ORDER BY score DESC
  35. First Match

  36. Second Match

  37. Third Match

  38. Running the Query
    +-----------------------------------+
    | name   | score | skills           |
    +-----------------------------------+
    | "Lucy" | 2     | ["Java","Neo4j"] |
    | "Bill" | 1     | ["Neo4j"]        |
    +-----------------------------------+
    2 rows
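
To try the query end to end, a dataset along the following lines should reproduce the two rows above. Acme, Ian and his three skills come from slide 28. Which skills Lucy and Bill hold is not shown in the transcript, so those relationships are inferred from the result table and are an assumption. The query uses the literal 'Ian' in place of the `{name}` parameter; newer Neo4j releases write parameters as `$name`.

```cypher
// Assumed sample data (Lucy's and Bill's skills are inferred from the result table).
MERGE (c:Company {name: 'Acme'})
MERGE (java:Skill {name: 'Java'})
MERGE (csharp:Skill {name: 'C#'})
MERGE (neo:Skill {name: 'Neo4j'})
MERGE (ian:Person {name: 'Ian'})
MERGE (lucy:Person {name: 'Lucy'})
MERGE (bill:Person {name: 'Bill'})
MERGE (ian)-[:WORKS_FOR]->(c)
MERGE (lucy)-[:WORKS_FOR]->(c)
MERGE (bill)-[:WORKS_FOR]->(c)
MERGE (ian)-[:HAS_SKILL]->(java)
MERGE (ian)-[:HAS_SKILL]->(csharp)
MERGE (ian)-[:HAS_SKILL]->(neo)
MERGE (lucy)-[:HAS_SKILL]->(java)
MERGE (lucy)-[:HAS_SKILL]->(neo)
MERGE (bill)-[:HAS_SKILL]->(neo);

// The query from the slides, with a literal name instead of a parameter.
MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
      (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
WHERE me.name = 'Ian'
RETURN colleague.name AS name, count(skill) AS score, collect(skill.name) AS skills
ORDER BY score DESC;
```

The order of names inside each `skills` list is not guaranteed, since `collect` follows whatever order the rows arrive in.
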
  39. From User Story to Model and Query
    User story:
      As an employee
      I want to know who in the company has similar skills to me
      So that we can exchange knowledge
    Question:
      Which people, who work for the same company as me, have similar skills to me?
    Entities and relationships:
      Person WORKS_FOR Company
      Person HAS_SKILL Skill
    Model:
      (:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)
    Query:
      MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
            (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
      WHERE me.name = {name}
      RETURN colleague.name AS name, count(skill) AS score, collect(skill.name) AS skills
      ORDER BY score DESC
  40. Modeling Tips

  41. Properties Versus Relationships

  42. Use Relationships When…
    • You need to specify the weight, strength, or some other quality of the relationship
    • AND/OR the attribute value comprises a complex value type (e.g. address)
    • Examples:
      – Find all my colleagues who are expert (relationship quality) at a skill (attribute value) we have in common
      – Find all recent orders delivered to the same delivery address (complex value type)
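
The two examples on slide 42 might be queried roughly as follows. The `level` property, the `Order` and `Address` labels, the `DELIVERED_TO` relationship and the order id are all hypothetical names chosen for this sketch.

```cypher
// Relationship quality: an assumed `level` property on HAS_SKILL.
MATCH (me:Person {name: 'Ian'})-[:HAS_SKILL]->(skill:Skill)<-[r:HAS_SKILL]-(colleague:Person)
WHERE r.level = 'expert'
RETURN colleague.name AS expert, skill.name AS skill;

// Complex value type: the address is a node, so several orders can point at it.
MATCH (o:Order {id: 1001})-[:DELIVERED_TO]->(a:Address)<-[:DELIVERED_TO]-(other:Order)
RETURN a.city AS city, collect(other.id) AS orders_to_same_address;
```

Restricting the second query to recent orders would need a date property on the orders, which is left out here.
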
  43. Use Properties When…
    • There’s no need to qualify the relationship
    • AND the attribute value comprises a simple value type (e.g. colour)
    • Examples:
      – Find those projects written by contributors to my projects that use the same language (attribute value) as my projects
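
One possible shape for the example on slide 43, with the language stored as a plain property. The `Project` label, the `CONTRIBUTED_TO` relationship and the `language` property are assumptions for illustration.

```cypher
// Hypothetical model: (:Person)-[:CONTRIBUTED_TO]->(:Project {name, language})
MATCH (me:Person {name: 'Ian'})-[:CONTRIBUTED_TO]->(mine:Project)
      <-[:CONTRIBUTED_TO]-(contributor:Person)-[:CONTRIBUTED_TO]->(theirs:Project)
WHERE theirs.language = mine.language
RETURN DISTINCT theirs.name AS project, theirs.language AS language;
```

The language is compared as a simple value, so no extra node or relationship is needed for it.
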
  44. If Performance is Critical…
    • Small property lookup on a node will be quicker than traversing a relationship
      – But traversing a relationship is still faster than a SQL join…
    • However, many small properties on a node, or a lookup on a large string or large array property will impact performance
      – Always performance test against a representative dataset
  45. Relationship Granularity

  46. Align With Use Cases
    • Relationships are the “royal road” into the graph
    • When querying, well-named relationships help discover only what is absolutely necessary
      – And eliminate unnecessary portions of the graph from consideration
  47. General Relationships • Qualified by property

  48. Specific Relationships

  49. Best of Both Worlds
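
The diagrams for slides 47 to 49 are not part of the transcript, so the sketch below is only one plausible reading of them. All relationship names, the `type` property and the `Address` label are hypothetical.

```cypher
// General relationship, qualified by a property.
MATCH (p:Person {name: 'Ian'})-[r:ADDRESS]->(a:Address)
WHERE r.type = 'delivery'
RETURN a;

// Specific relationship: the relationship name carries the qualification.
MATCH (p:Person {name: 'Ian'})-[:DELIVERY_ADDRESS]->(a:Address)
RETURN a;

// Querying across several specific names when the distinction does not matter.
MATCH (p:Person {name: 'Ian'})-[:HOME_ADDRESS|WORK_ADDRESS|DELIVERY_ADDRESS]->(a:Address)
RETURN a;
```

With specific names the second query never has to read a relationship property, while the third still covers the general case.
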

  50. Model and Query Recipes

  51. Events and Actions
    • Often involve multiple parties
    • Can include other circumstantial detail, which may be common to multiple events
    • Examples
      – Patrick worked for Acme from 2001 to 2005 as a Software Developer
      – Sarah sent an email to Lucy, copying in David and Claire
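
The email example could be modelled with the event as a node of its own, so that any number of parties can attach to it. The `Email` label, its `subject` property and the `SENT`, `TO` and `CC` relationship names are assumptions for this sketch.

```cypher
// Hypothetical model: the email is a node connecting sender and recipients.
MERGE (sarah:Person {name: 'Sarah'})
MERGE (lucy:Person {name: 'Lucy'})
MERGE (david:Person {name: 'David'})
MERGE (claire:Person {name: 'Claire'})
CREATE (email:Email {subject: 'example'})
CREATE (sarah)-[:SENT]->(email),
       (email)-[:TO]->(lucy),
       (email)-[:CC]->(david),
       (email)-[:CC]->(claire)
RETURN email;
```
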
  52. Timeline Trees
    • Discrete events
      – No natural relationships to other events
    • You need to find events at differing levels of granularity
      – Between two days
      – Between two months
      – Between two minutes
  53. Example Timeline Tree
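
The example tree itself is an image that did not survive in the transcript. A timeline tree is commonly built as year, month and day nodes with events hanging off the finest level; the sketch below assumes that shape, and every label, relationship name and value in it is hypothetical.

```cypher
// Hypothetical timeline tree: Year -> Month -> Day, with an event attached to a day.
MERGE (y:Year {value: 2014})
MERGE (y)-[:MONTH]->(m:Month {value: 5})
MERGE (m)-[:DAY]->(d:Day {value: 20})
CREATE (e:Event {name: 'example event'})-[:OCCURRED_ON]->(d);

// All events in one month, found by walking down the tree.
MATCH (:Year {value: 2014})-[:MONTH]->(:Month {value: 5})-[:DAY]->(d:Day)<-[:OCCURRED_ON]-(e:Event)
RETURN d.value AS day, e.name AS event
ORDER BY day;
```

Stopping the pattern at a different level (year, month or day) changes the granularity of the lookup.
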

  54. Pitfalls and Anti-Patterns

  55. Modeling Entities as Relationships
    • Limits data model evolution
      – A relationship connects two things
      – Modeling an entity as a relationship prevents it from being related to more than two things
    • Smells:
      – Lots of attribute-like properties
      – Heavy use of relationship indexes
    • Entities hidden in verbs:
      – E.g. emailed, reviewed
  56. Example: Movie Reviews
    • Initial requirements:
      – People review films
      – Application aggregates reviews from multiple sites
  57. Initial Model

  58. New Requirements
    • Allow users to comment on each other’s reviews
      – Can’t connect a review to a third entity
  59. Revised model
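
The model diagrams are missing from the transcript, so the following is an assumed reconstruction of the idea: once the review is a node, a comment can connect to it. All labels, relationship names and property values are hypothetical.

```cypher
// Initial model (assumed): the review is a relationship, so nothing else can attach to it.
//   (:Person)-[:REVIEWED {rating: 4}]->(:Film)

// Revised model (assumed): the review and the comment are nodes of their own.
MERGE (alice:Person {name: 'Alice'})
MERGE (bob:Person {name: 'Bob'})
MERGE (film:Film {title: 'Example Film'})
CREATE (review:Review {rating: 4, text: 'Worth watching'})
CREATE (alice)-[:WROTE_REVIEW]->(review)-[:REVIEW_OF]->(film)
CREATE (comment:Comment {text: 'Agreed'})
CREATE (bob)-[:WROTE_COMMENT]->(comment)-[:COMMENT_ON]->(review)
RETURN review, comment;
```
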

  60. Model Actions in Terms of Products

  61. Now for Some Prototyping!

  62. Draw a Model! E.g. using Visio, www.apcjones.com/arrows, http://graphjson.io, OmniGraffle

  63. Creating a prototype DB out of our model?

  64. Now for Some Queries!

  65. BACKUP slides: Cypher Query Language

  66. Nodes and Relationships ()-->()

  67. Labels and Relationship Types (:Person)-[:FRIEND]->(:Person)

  68. Properties (:Person{name:'Peter'})-[:FRIEND]->(:Person{name:'Lucy'})

  69. Identifiers (p1:Person{name:'Peter'})-[r:FRIEND]->(p2:Person{name:'Lucy'})

  70. Cypher MATCH graph_pattern WHERE binding_and_filter_criteria RETURN results

  71. Cypher MATCH (p:Person)-[:FRIEND]->(friends) WHERE p.name = 'Peter' RETURN friends

  72. Lookup Using Identifier + Label MATCH (p:Person)-[:FRIEND]->(friends) WHERE p.name = 'Peter' RETURN friends