It's 2017, and I still want to sell you a graph database

The aha!s and the oh-noes of more than a year spent building our product in Rails with a graph database, Neo4j, alongside big brother PostgreSQL and hipster cousin Redis.

This talk attempts to answer an important question, "When does using a graph database make sense?", through retrospection.

Swanand Pagnis

January 29, 2017

Transcript

  1. IT'S 2017, AND I STILL WANT TO SELL YOU A GRAPH DATABASE

  2. A CONFESSION

  3. I DON'T REALLY WANT TO SELL YOU A GRAPH DATABASE

  4. I WANT TO SELL YOU THE IDEA OF A GRAPH DATABASE

  5. WHY?

  6. A LANGUAGE THAT DOESN'T AFFECT THE WAY YOU THINK ABOUT PROGRAMMING IS NOT WORTH KNOWING. Alan Perlis, emphasis mine.

  7. A DATABASE THAT DOESN'T AFFECT THE WAY YOU THINK ABOUT DATA MODELLING IS NOT WORTH KNOWING. Alan Perlis, paraphrased.

  8. I WANT YOU TO MAKE AN INFORMED DECISION

  9. BUT BEFORE THAT, A QUICK STORY

  10. WE SPENT ONE YEAR BUILDING A PRODUCT USING NEO4J

  11. WE SPENT ONE YEAR BUILDING A PRODUCT USERS PAY FOR

  12. WE SPENT ONE YEAR BUILDING A PRODUCT USERS LOVE

  13. AND NOW WE'RE REWRITING IT IN POSTGRES

  14. WE CONCLUDED THAT BOTH DECISIONS WERE CORRECT

  15. USING NEO4J THEN WAS CORRECT

  16. USING POSTGRES NOW IS CORRECT

  17. NATURALLY, THE FUN IS IN THE DETAILS

  18. 1. SEMANTICS 2. TOOLING 3. OPS

  19. SEMANTICS / MODELLING AND QUERYING DATA AS A GRAPH

  20. DOES IT MAKE YOUR BUSINESS LOGIC ANY SIMPLER OR BETTER?

  21. TURNS OUT, IT DOES

  22. GRAPH SEMANTICS / EXAMPLE: SYMMETRICAL RELATIONSHIPS

  23. REPRESENTING MARRIAGES

  24. SAY, A PERSON IS MARRIED TO ANOTHER PERSON

  25. BY DEFINITION, THAT PERSON IS MARRIED TO THIS PERSON

  26. THE TRADITIONAL WAY TO SOLVE THIS WOULD BE A JOIN TABLE

  27. GRAPH SEMANTICS / SYMMETRICAL RELATIONSHIPS: MARRIAGES

    class Person < ActiveRecord::Base
      has_many :marriages
      has_many :spouses, through: :marriages, source: :spouse
    end

    class Marriage < ActiveRecord::Base
      belongs_to :person
      belongs_to :spouse, foreign_key: :spouse_id, class_name: "Person"
    end

  28. GRAPH SEMANTICS / SYMMETRICAL RELATIONSHIPS: MARRIAGES

    SELECT * FROM marriages m
    WHERE m.person_id = 42 OR m.spouse_id = 42;

  29. GRAPH SEMANTICS / SYMMETRICAL RELATIONSHIPS: MARRIAGES

    # For a marriage row m, the spouse is whichever column is NOT this person
    if m.person_id == person.id
      Person.find(m.spouse_id)
    else
      Person.find(m.person_id)
    end

  30. AWKWARD!

  31. BUT HEY, THIS IS POSTGRES. WE CAN DO BETTER!

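  32. GRAPH SEMANTICS / SYMMETRICAL RELATIONSHIPS: MARRIAGES

    class Marriage < ActiveRecord::Base
      belongs_to :person
      belongs_to :spouse, foreign_key: :spouse_id, class_name: "Person"

      # Keep a mirrored row for the other spouse, so lookups need one column
      after_create do
        inverse.first_or_create
      end

      # Remove the mirrored row along with this one
      after_destroy do
        inverse.first.try(:destroy)
      end

      # The row that points the other way: spouse -> person
      def inverse
        Marriage.where(person: spouse, spouse: person)
      end
    end
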
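
To make the cost of those mirrored rows concrete, here is a minimal plain-Ruby sketch of what the callbacks above maintain. It assumes a hypothetical in-memory `MarriageTable` class (no ActiveRecord, no database) in which every marriage is stored as two rows, one per direction, so a lookup only ever needs the `person_id` column.

```ruby
require "set"

# Hypothetical in-memory stand-in for the marriages join table (no ActiveRecord).
# Each marriage is stored as two rows, one per direction, like the callbacks above.
class MarriageTable
  Row = Struct.new(:person_id, :spouse_id)

  def initialize
    @rows = Set.new
  end

  # Mirrors after_create + first_or_create: add the row and its inverse.
  # Using a Set means creating the same marriage twice adds no duplicate rows.
  def create(person_id, spouse_id)
    @rows << Row.new(person_id, spouse_id)
    @rows << Row.new(spouse_id, person_id)
    self
  end

  # Mirrors after_destroy: remove the row and its inverse.
  def destroy(person_id, spouse_id)
    @rows.delete(Row.new(person_id, spouse_id))
    @rows.delete(Row.new(spouse_id, person_id))
    self
  end

  # Because rows are mirrored, a lookup only needs the person_id column.
  def spouses_of(person_id)
    @rows.select { |r| r.person_id == person_id }.map(&:spouse_id).sort
  end

  def row_count
    @rows.size
  end
end

table = MarriageTable.new
table.create(42, 7)
p table.spouses_of(42) # => [7]
p table.spouses_of(7)  # => [42]
p table.row_count      # => 2
```

The lookup stays simple, but every write now touches two rows, and keeping them paired (including under concurrent writes, which this sketch ignores) is the application's job rather than the database's.
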
  33. THIS IS GOOD, AND IT WORKS

  34. BUT…

  35. EXTRANEOUS COMPLEXITY IN MAINTAINING MIRRORED PAIRS OF ROWS

  36. EXTRANEOUS COMPLEXITY IN ADDING FEATURES

  37. EXTRANEOUS COMPLEXITY LIKE MULTIPLE SPOUSES

  38. EXTRANEOUS COMPLEXITY LIKE PAST SPOUSES

  39. COMPARE THAT TO NEO4J:

  40. GRAPH SEMANTICS / SYMMETRICAL RELATIONSHIPS: MARRIAGES

    class Person
      include Neo4j::ActiveNode

      # (Person)-[:MARRIES]->(Person), traversed in both directions
      has_many :both, :spouses, type: "MARRIES", model_class: "Person"

      # (Person)-[:IS_MARRIED_TO]->(Person), traversed in both directions
      has_one :both, :current_spouse, type: "IS_MARRIED_TO", model_class: "Person"
    end

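
For contrast, a plain-Ruby sketch of the idea behind `has_many :both`: store one edge per relationship and let the traversal decide which end to read. `TinyGraph` and its `neighbours` method are hypothetical illustrations, not the Neo4j.rb API; only the `:out` / `:in` / `:both` direction names are borrowed from the slides.

```ruby
# Hypothetical minimal graph: exactly one stored edge per relationship.
class TinyGraph
  Edge = Struct.new(:from, :type, :to)

  def initialize
    @edges = []
  end

  def relate(from, type, to)
    @edges << Edge.new(from, type, to)
    self
  end

  # direction borrows the :out / :in / :both names from the slides.
  # :both reads the edge from whichever end matches the node.
  def neighbours(node, type, direction = :both)
    @edges.map do |e|
      next unless e.type == type

      if %i[out both].include?(direction) && e.from == node
        e.to
      elsif %i[in both].include?(direction) && e.to == node
        e.from
      end
    end.compact
  end

  def edge_count
    @edges.size
  end
end

g = TinyGraph.new
g.relate("alice", "IS_MARRIED_TO", "bob")
p g.neighbours("alice", "IS_MARRIED_TO")     # => ["bob"]
p g.neighbours("bob", "IS_MARRIED_TO")       # => ["alice"]
p g.neighbours("bob", "IS_MARRIED_TO", :out) # => []
p g.edge_count                               # => 1
```

One stored edge answers the question from either side, so there is no inverse row to create or destroy.
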
  41. SIMPLE, DECLARATIVE, DATABASE RELIANT

  43. GRAPH SEMANTICS / AD-HOC POINTERS

  44. GRAPH SEMANTICS / AD-HOC POINTERS

    # (Author)-[:HAS_WRITTEN]->(Book)
    has_many :out, :books, type: "HAS_WRITTEN", model_class: "Book"

    # (Author)-[:IS_WRITING]->(Book)
    has_one :out, :current_book, type: "IS_WRITING", model_class: "Book"

  45. GRAPH SEMANTICS / AD-HOC POINTERS

    # (Student)-[:HAS_ATTEMPTED]->(Assignment)
    has_many :out, :assignments, type: "HAS_ATTEMPTED", model_class: "Assignment"

    # (Student)-[:HAS_BEST]->(Assignment)
    has_one :out, :best_submission, type: "HAS_BEST", model_class: "Assignment"

  46. GRAPH SEMANTICS / AD-HOC POINTERS

    # (Dentist)-[:TREATS]->(Patient)
    has_many :out, :patients, type: "TREATS", model_class: "Person"

    # (Dentist)<-[:TEACHES_RUBY]-(Patient)
    has_one :in, :ruby_teacher, type: "TEACHES_RUBY", model_class: "Person"

  47. GRAPH SEMANTICS / AD-HOC POINTERS

    # (Person)-[:HAS]->(MoveScore)
    has_many :out, :move_scores, type: "HAS", model_class: "MoveScore"

    # (Person)-[:HAS_CURRENT]->(MoveScore)
    has_one :out, :current_move_score, type: "HAS_CURRENT", model_class: "MoveScore"

    # (Person)-[:HAS_PREFERRED]->(*)
    has_one :out, :preferred_move_score, type: "HAS_PREFERRED", model_class: false

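
The slides above add "pointer" relationships such as `HAS_CURRENT` next to an ordinary `HAS`, without a new column or table. A minimal sketch of that idea in plain Ruby, assuming a hypothetical `PointerGraph` that keys typed, directed edges by `[node, type]`; the node names are made up, and `"listing:9"` stands in for an arbitrary node, much as `model_class: false` allows any target.

```ruby
# Hypothetical sketch: typed, directed edges keyed by [node, relationship type].
class PointerGraph
  def initialize
    @out = Hash.new { |hash, key| hash[key] = [] }
  end

  # has_many-style: one more target for this relationship type.
  def add(from, type, to)
    @out[[from, type]] << to
    self
  end

  # has_one-style: re-point, replacing any previous target of this type.
  def point(from, type, to)
    @out[[from, type]] = [to]
    self
  end

  # fetch skips the default block, so unknown keys don't create empty entries.
  def targets(from, type)
    @out.fetch([from, type], []).dup
  end
end

g = PointerGraph.new
g.add("person:1", "HAS", "move_score:a")
g.add("person:1", "HAS", "move_score:b")
g.point("person:1", "HAS_CURRENT", "move_score:b")
g.point("person:1", "HAS_PREFERRED", "listing:9") # any kind of node
p g.targets("person:1", "HAS")         # => ["move_score:a", "move_score:b"]
p g.targets("person:1", "HAS_CURRENT") # => ["move_score:b"]
```

A new kind of pointer is just a new edge type; nothing about the existing `HAS` edges changes.
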
  48. WHY IS THIS USEFUL?

  49. GRAPH SEMANTICS / AD-HOC POINTERS

    MATCH (realtor:Person)<-[:CONTACT_OF]-(contact:Person)<-[:CONTACT_INFOS]-(email:EmailContactInfo)
    WHERE (contact.rapportive_enqueued IS NULL)
      AND NOT ((contact)<-[:SOCIAL_PROFILES]-(:SocialProfile {type: 'linked_in'}))
    RETURN DISTINCT(contact) AS contact, realtor.user_id AS user_id, collect(email) AS emails

  51. GRAPH SEMANTICS / AD-HOC POINTERS

    MATCH (node)<-[edge]-(node)<-[edge]-(node)
    WHERE (conditional)
      AND NOT ((node)<-[]-(node))
    RETURN stuff

  52. WRITE QUERIES ABOUT GRAPH, USING A GRAPH!

  57. GRAPH SEMANTICS / AD-HOC POINTERS, POLYMORPHIC

    # Post, Article, Status
    has_many :comments, as: :commentable

    # Comment
    belongs_to :commentable, polymorphic: true

  58. GOOD LUCK WRITING A JOIN QUERY AGAINST THAT!
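
Why a plain JOIN struggles here: a polymorphic `belongs_to` stores a `(commentable_type, commentable_id)` pair instead of a single foreign key, and the same id can exist in several tables. A minimal plain-Ruby sketch, with hypothetical in-memory `TABLES` standing in for the posts, articles and statuses tables, shows that resolving the parents of a mixed set of comments takes one lookup per type.

```ruby
# Hypothetical in-memory "tables"; note that id 1 exists in more than one.
TABLES = {
  "Post"    => { 1 => "post #1" },
  "Article" => { 1 => "article #1", 2 => "article #2" },
  "Status"  => { 5 => "status #5" }
}.freeze

# A polymorphic comment points at a (type, id) pair, not at a single table.
Comment = Struct.new(:body, :commentable_type, :commentable_id)

# Resolve each comment's parent: group by type, then look up each group in
# its own table (one query per type, in relational terms). Results come
# back grouped by type.
def parents_of(comments)
  comments.group_by(&:commentable_type).flat_map do |type, group|
    table = TABLES.fetch(type)
    group.map { |c| table.fetch(c.commentable_id) }
  end
end

comments = [
  Comment.new("Nice!", "Post", 1),
  Comment.new("Agreed", "Article", 1),
  Comment.new("+1", "Status", 5)
]
p parents_of(comments) # => ["post #1", "article #1", "status #5"]
```

The id alone is ambiguous (`1` is both a post and an article), which is why the type column has to drive the lookup. In a graph, the comment simply has an edge to whichever node it belongs to.
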

  59. GRAPH SEMANTICS / LABELS: POLYMORPHISM, MIXINS, & COMPOSITION

  60. EACH NODE IN A GRAPH IS IDENTIFIED BY A LABEL

  61. EACH EDGE IN A GRAPH IS IDENTIFIED BY A LABEL

  62. ANY NODE IN A GRAPH CAN HAVE ANY NUMBER OF LABELS

  63. ANY EDGE IN A GRAPH CAN HAVE LABELS TOO (IN NEO4J, EXACTLY ONE RELATIONSHIP TYPE)

  64. LET'S EQUATE A LABEL TO A CLASS

  65. EXAMPLE: EMAIL, PHONE, CONTACT INFO

  66. EMAIL IS A CONTACT INFO

  67. PHONE IS A CONTACT INFO

  68. SO, SHARED BEHAVIOUR

  69. BUT, DIFFERENT CONSTRAINTS

  70. THIS IS CURRENTLY NOT POSSIBLE IN POSTGRES

  71. WELL, NOT WITHOUT A LOT OF ADDITIONAL COMPLEXITY
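
One way to picture "a label is a class" in Ruby terms: a node carrying several labels, like `:ContactInfo:Email`, maps to a class that mixes in several modules. A minimal sketch under that assumption (the module and class names are hypothetical, the validation regexes are deliberately simplistic, and this is not how Neo4j.rb maps labels): `ContactInfo` holds the shared behaviour, while `Email` and `Phone` each bring their own constraint.

```ruby
# Hypothetical sketch: each label becomes a module; a "node class" mixes in
# several labels, the way a node can carry both :ContactInfo and :Email.
module ContactInfo
  attr_reader :value

  # Shared behaviour: every contact info is normalised the same way.
  def initialize(value)
    @value = value.to_s.strip
  end

  # Shared behaviour that delegates the type-specific part to the other label.
  def valid?
    !value.empty? && constraint_met?
  end
end

module Email
  # Constraint specific to emails (deliberately simplistic).
  def constraint_met?
    value.match?(/\A[^@\s]+@[^@\s]+\z/)
  end
end

module Phone
  # Constraint specific to phone numbers (deliberately simplistic).
  def constraint_met?
    value.match?(/\A\+?\d[\d ]{6,14}\z/)
  end
end

class EmailContactInfo
  include ContactInfo
  include Email
end

class PhoneContactInfo
  include ContactInfo
  include Phone
end

p EmailContactInfo.new(" jane@example.com ").valid? # => true
p PhoneContactInfo.new("jane@example.com").valid?   # => false
```

Shared behaviour lives in one place, while each combination of labels picks up its own constraints.
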

  72. GRAPH SEMANTICS / EXAMPLE: SIX DEGREES OF KEVIN BACON

  73. GOOGLE FOR YOUR FAVOURITE ACTOR'S BACON NUMBER

  75. GRAPH SEMANTICS / EXAMPLE: SIX DEGREES OF KEVIN BACON

    (Actor)-[:HAS_WORKED_WITH]->(Actor)-[:HAS_WORKED_WITH]->(Actor)-[:HAS_WORKED_WITH]->(Actor)-[:HAS_WORKED_WITH]->(Actor)-[:HAS_WORKED_WITH]->(Actor)

  76. GRAPH SEMANTICS / EXAMPLE: SIX DEGREES OF KEVIN BACON

    The answer to this question has been left to the reader as an exercise.

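
The exercise is a shortest-path search. Outside the database, one way to compute it is a breadth-first search over an undirected "has worked with" graph: the first depth at which the search reaches the target is the Bacon number. A minimal plain-Ruby sketch; `build_graph` and `bacon_number` are hypothetical helpers, the actor names below are made up, and `max_depth` loosely mirrors a bounded variable-length pattern such as `[:FRIEND*..5]`.

```ruby
# Build an undirected adjacency list from [actor, actor] pairs.
def build_graph(pairs)
  graph = Hash.new { |hash, key| hash[key] = [] }
  pairs.each do |a, b|
    graph[a] << b
    graph[b] << a
  end
  graph
end

# Breadth-first search: the first depth at which we reach the target is
# the length of a shortest path, i.e. the Bacon number.
# Returns nil when the target is unreachable within max_depth hops.
def bacon_number(graph, from, target = "Kevin Bacon", max_depth: 6)
  return 0 if from == target

  visited = { from => true }
  frontier = [from]
  (1..max_depth).each do |depth|
    next_frontier = []
    frontier.each do |actor|
      graph.fetch(actor, []).each do |co_star|
        return depth if co_star == target
        next if visited[co_star]

        visited[co_star] = true
        next_frontier << co_star
      end
    end
    return nil if next_frontier.empty?

    frontier = next_frontier
  end
  nil
end

graph = build_graph([["A", "B"], ["B", "Kevin Bacon"], ["C", "D"]])
p bacon_number(graph, "A")               # => 2
p bacon_number(graph, "A", max_depth: 1) # => nil
p bacon_number(graph, "C")               # => nil
```

Expanding one whole level at a time is what guarantees the first hit is a shortest path, which is the same guarantee a database-side shortest-path query gives you.
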
  77. GRAPH SEMANTICS / CYPHER, THE QUERY LANGUAGE

    MATCH (you {name:"You"})
    MATCH (expert)-[:WORKED_WITH]->(db:Database {name:"Neo4j"})
    MATCH path = shortestPath((you)-[:FRIEND*..5]-(expert))
    RETURN db, expert, path

  78. PLENTY MORE EXAMPLES, AND GREAT DOCS, ON THE OFFICIAL SITE

  79. TOOLING / LIBRARIES, ADAPTERS, DEV TOOLS, ADMIN TOOLS, ETC.

  80. THE FIRST QUESTION THAT COMES TO MIND: ACTIVE RECORD?

  81. YES AND NO. THE RUBY GEM "NEO4J" IS AVAILABLE

  82. NO: NOT QUITE AS FULLY FEATURED AS ACTIVE RECORD

  83. YES: IT OFFERS THE FAMILIAR API WE SAW EARLIER

  84. TOOLING / NEO4J.RB API

    # (Person)-[:HAS]->(MoveScore)
    has_many :out, :move_scores, type: "HAS", model_class: "MoveScore"

    # (Person)-[:HAS_CURRENT]->(MoveScore)
    has_one :out, :current_move_score, type: "HAS_CURRENT", model_class: "MoveScore"

    # (Person)-[:HAS_PREFERRED]->(*)
    has_one :out, :preferred_move_score, type: "HAS_PREFERRED", model_class: false

  85. YES: IT OFFERS ALL THE TOOLS YOU ARE ACQUAINTED WITH

  86. TOOLING / NEO4J.RB API: ALL YOUR FAMILIAR TOOLS

    ▸ Properties
    ▸ Indexes / Constraints
    ▸ Callbacks
    ▸ Validation
    ▸ Associations

  87. OF COURSE, THERE ARE CAVEATS

  88. NEO4J DOESN'T ENFORCE TYPES ON PROPERTIES

  89. AND SO NEO4J.RB HAS TO

  90. ADDITIONAL COGNITIVE OVERLOAD WHEN QUERYING

  91. TOOLING / NEO4J: NO TYPES

    # This is a sorted collection
    [2, 3, 5, 7, 11]

    # And so is this
    ["11", "2", "3", "5", "7"]

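
A plain-Ruby illustration of the slide: the same numbers sort differently depending on whether they are stored as integers or as strings, which is what can happen when nothing enforces a property's type. The variable names are illustrative, and coercing with `Integer()` before comparing is one possible application-side workaround, not necessarily what Neo4j.rb does.

```ruby
# The same numbers, stored once as integers and once as strings, as can
# happen when nothing enforces a property's type.
numbers = [11, 2, 7, 3, 5]
strings = numbers.map(&:to_s)

p numbers.sort # => [2, 3, 5, 7, 11]
p strings.sort # => ["11", "2", "3", "5", "7"]  (lexicographic order)

# Mixing the two in one collection does not sort at all in Ruby.
mixed_sort_failed =
  begin
    [11, "2"].sort
    false
  rescue ArgumentError
    true
  end
p mixed_sort_failed # => true

# One application-side workaround: coerce before comparing.
p strings.sort_by { |s| Integer(s) } # => ["2", "3", "5", "7", "11"]
```

Every query that sorts or compares such a property has to remember which representation it is dealing with, which is the cognitive overload the previous slide refers to.
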
  92. NEO4J DOESN'T ALLOW COMPOSITE INDEXES

  93. NEO4J DID NOT SUPPORT A BINARY PROTOCOL

  94. IT WAS ALL REST BASED

  95. I SAID "WAS", BECAUSE THIS IS BEING ADDRESSED NOW

  96. A BINARY PROTOCOL IS OUT, BUT NOT PRODUCTION READY

  97. FROM JAVA, YOU CAN ALWAYS USE THE JDBC CONNECTOR

  98. IT DOESN'T SUFFER FROM ANY OF THESE ISSUES

  99. WHAT ISSUES, YOU ASK?

  100. TOOLING / NEO4J: ALL REST, NO BINARY

    Neo4j::Session::CypherError
    # 1. Unable to load RELATIONSHIP with id <some-id>
    # 2. LockClient[80] can't wait on resource
    # 3. Expected a numeric value for empty iterator, but got null

  101. TOOLING / NEO4J: ALL REST, NO BINARY

    Faraday::ConnectionFailed
    # 1. too many connection resets
    # 2. connection refused to: <ip-address>
    # 3. Failed to open TCP connection to
    # 4. execution expired

  102. TOOLING / NEO4J: ALL REST, NO BINARY

    Neo4j::ActiveRel::Persistence::RelCreateFailedError
    # 1. Cannot create rel with unpersisted, invalid to_node

  103. FAR TOO MANY TIMES TO COUNT

  104. THESE ERRORS WERE OUR PRIMARY REASON FOR SWITCHING

  105. THEY CRIPPLED OUR ABILITY TO SCALE AND LIMITED OUR CONCURRENCY

  106. DEAL BREAKER!

  107. THERE ARE PLENTY MORE THINGS I WANT TO TALK ABOUT

  108. OPS / MONITORING, BACKUPS, DEPLOYMENTS, PERFORMANCE TUNING, ETC.

  109. BUT I WILL STOP HERE.

  110. CONCLUSION

  111. POWERFUL QUERYING, EXCELLENT PERFORMANCE, BEAUTIFUL API, GREAT SEMANTICS

  112. A NATURAL CHOICE FOR BUILDING APPLICATIONS WITH RELATIONSHIP-FOCUSSED DATA

  113. BUT…

  114. THE RUBY ECOSYSTEM AND TOOLING LEAVE A LOT TO BE DESIRED

  115. ESPECIALLY WHEN YOU ARE DEALING WITH HIGH PERFORMANCE AND HIGH CONCURRENCY

  116. THANK YOU! THIS IS IT.

  117. WHO AM I? SWANAND PAGNIS, PRINCIPAL ENGINEER, FIRST.IO, @_SWANAND ON TWITTER

  118. QUESTIONS? IF TIME PERMITS…