Causal Consistency For Large Neo4j Clusters by Jim Webber at Big Data Spain 2017

An overview of the Raft algorithm and how Neo4j uses it to provide strong consistency at scale.

https://www.bigdataspain.org/2017/talk/causal-consistency-for-large-neo4j-clusters

Big Data Spain 2017
November 16th - 17th Kinépolis Madrid

Big Data Spain

November 29, 2017

Transcript

  1. None
  2. Causal Consistency For Large Neo4j Clusters

    Dr. Jim Webber, Chief Scientist, Neo4j
  3. None
  4. None
  5. ads to a social graph

  6. None
  7. Motivation Why do we need Neo4j clusters?

  8. Massive Throughput

  9. Data Redundancy

  10. Data Redundancy

  11. Data Redundancy

  12. Data Redundancy

  13. High Availability

  14. High Availability

  15. High Availability Error! 503: Service Unavailable

  16. High Availability Error! 503: Service Unavailable

  17. High Availability Error! 503: Service Unavailable

  18. High Availability Error! 503: Service Unavailable

  19. High Availability ✓ Error! 503: Service Unavailable

  20. Data Massive High

  21. Data Massive High 3.0

  22. Data Massive High 3.0 Bigger Clusters Consensus Commit Built-in load

    balancing 3.1 + Causal Clustering
  23. No Free Lunch* Consistency makes clusters tricky

  24. Register Login You need to login in to continue your

    purchase!
  25. Register Login You need to login in to continue your

    purchase! Username: Password: Create Account
  26. Register Login You need to login in to continue your

    purchase! Username: jim_w Password: ******** Create Account
  27. Register Login You need to login in to continue your

    purchase! Username: Password: Login
  28. Username: jim_w Password: ******** Login

  29. Purchase Login Successful Try again No account found! Username: jim_w

    Password: ******** Login
  30. Roles for Safety and Scale Divide and conquer complexity

  31. Design Trade-off Availability Reliability

  32. Read Replicas Core

  33. Core

    • Small group of Neo4j databases
    • Fault-tolerant Consensus Commit
    • Responsible for data safety
  34. Writing to the Core Cluster Neo4j Driver Neo4j Cluster

  35. Writing to the Core Cluster Neo4j Driver CREATE (:User {...}) ✓ Neo4j Cluster

  36. Writing to the Core Cluster Neo4j Driver CREATE (:User {...}) ✓ Neo4j Cluster

  37. Writing to the Core Cluster Neo4j Driver CREATE (:User {...}) ✓ ✓ ✓ Neo4j Cluster

  38. Writing to the Core Cluster Neo4j Driver CREATE (:User {...}) ✓ ✓ ✓ Neo4j Cluster

  39. Writing to the Core Cluster Neo4j Driver CREATE (:User {...}) ✓ ✓ ✓ Neo4j Cluster

  40. Writing to the Core Cluster Neo4j Driver ✓ ✓ ✓

    Success Neo4j Cluster
  41. Writing to the Core Cluster Neo4j Driver ✓ ✓ ✓

    Success Neo4j Cluster ✓ ✓
  42. Raft Protocol Non-Blocking Consensus for Humans

  43. EASY TO UNDERSTAND

  44. Raft Protocol https://github.com/ongardie/raftscope

  45. Raft in a Nutshell

    • Raft keeps logs tied together
    • Logs contain entries for both the data and cluster membership
    • Entries are appended and subsequently committed if a simple majority agree
    • Implication: majority agree with the log as proposed
    • Anyone can call an election: highest term (logical clock) wins, followed by highest committed, followed by highest appended
    • Appended but uncommitted entries can be truncated, but this is safe (transaction aborted)
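
The commit rule and the election ordering summarised on this slide can be sketched in a few lines of Java. This is only an illustration of the slide's wording, not Neo4j's Raft implementation: the class `RaftNutshellSketch`, the `MemberState` fields and both method names are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class RaftNutshellSketch {

    /** Hypothetical snapshot of one member's log state (names are illustrative). */
    static final class MemberState {
        final String name;
        final long term;            // logical clock
        final long committedIndex;  // highest committed entry
        final long appendedIndex;   // highest appended entry

        MemberState(String name, long term, long committedIndex, long appendedIndex) {
            this.name = name;
            this.term = term;
            this.committedIndex = committedIndex;
            this.appendedIndex = appendedIndex;
        }
    }

    /** An appended entry counts as committed once a simple majority holds it. */
    static boolean isCommitted(int membersHoldingEntry, int clusterSize) {
        return membersHoldingEntry > clusterSize / 2;
    }

    /** Ordering from the slide: highest term, then highest committed, then highest appended. */
    static final Comparator<MemberState> ELECTION_ORDER =
            Comparator.comparingLong((MemberState m) -> m.term)
                      .thenComparingLong(m -> m.committedIndex)
                      .thenComparingLong(m -> m.appendedIndex);

    /** Picks the candidate that ranks highest under ELECTION_ORDER. */
    static MemberState electionWinner(List<MemberState> candidates) {
        return candidates.stream()
                         .max(ELECTION_ORDER)
                         .orElseThrow(() -> new IllegalArgumentException("no candidates"));
    }

    public static void main(String[] args) {
        List<MemberState> candidates = Arrays.asList(
                new MemberState("a", 3, 10, 12),
                new MemberState("b", 3, 11, 11),
                new MemberState("c", 2, 14, 14));
        System.out.println("winner: " + electionWinner(candidates).name);
        System.out.println("2 of 3 commits: " + isCommitted(2, 3));
    }
}
```

Member `c` has the longest log but an older term, so it loses to the members in term 3; between those two, the higher committed index decides.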
  46. Consensus Log → Committed Transactions → Updated Graph

    [Diagram: Neo4j Raft implementation, showing each member's consensus log and transaction log as rows of numbered entries]
    • Consensus log: stores both committed and uncommitted transactions
    • Uncommitted entries may differ between members
    • Transactions are only appended to the transaction log when committed according to Raft
    • Transaction log: the same transactions appear in the same order on all members
    • Transactions are applied, updating the graph
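
The relationship between the two logs on this slide can be modelled with a small in-memory sketch. It assumes a single member and string-valued transactions; the class `ConsensusLogSketch` and its method names are hypothetical and do not come from Neo4j's code base.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ConsensusLogSketch {
    // Consensus log: committed and uncommitted entries.
    private final List<String> consensusLog = new ArrayList<>();
    // Transaction log: committed entries only, in commit order.
    private final List<String> transactionLog = new ArrayList<>();
    // Index of the last committed consensus-log entry, -1 if none.
    private int commitIndex = -1;

    /** Appends an uncommitted entry and returns its index. */
    int append(String tx) {
        consensusLog.add(tx);
        return consensusLog.size() - 1;
    }

    /** Marks entries up to index as committed and copies them to the transaction log. */
    void commitUpTo(int index) {
        if (index >= consensusLog.size()) {
            throw new IllegalArgumentException("cannot commit an entry that was never appended");
        }
        while (commitIndex < index) {
            commitIndex++;
            transactionLog.add(consensusLog.get(commitIndex));
        }
    }

    /** Drops appended-but-uncommitted entries; committed entries are untouched. */
    void truncateUncommitted() {
        consensusLog.subList(commitIndex + 1, consensusLog.size()).clear();
    }

    List<String> transactionLog() {
        return Collections.unmodifiableList(transactionLog);
    }

    int appendedCount() {
        return consensusLog.size();
    }

    public static void main(String[] args) {
        ConsensusLogSketch log = new ConsensusLogSketch();
        log.append("tx0");
        log.append("tx1");
        log.append("tx2");
        log.commitUpTo(1);
        log.truncateUncommitted();
        System.out.println(log.transactionLog() + " appended=" + log.appendedCount());
    }
}
```

Truncation only ever removes entries past the commit index, which is why it is safe: nothing that reached the transaction log is lost.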
  47. Core

    • Small group of Neo4j databases
    • Fault-tolerant Consensus Commit
    • Responsible for data safety
  48. Read Replicas

    • For massive query throughput
    • Read-only replicas
    • Not involved in Consensus Commit
    • Disposable, suitable for auto-scaling
  49. Propagating updates to the Read Replicas Neo4j Driver Neo4j Cluster

  50. Propagating updates to the Read Replicas Neo4j Driver Neo4j Cluster

    Write
  51. Propagating updates to the Read Replicas Neo4j Driver Neo4j Cluster

    Write
  52. Reading from the Read Replicas Neo4j Driver Neo4j Cluster Read

  53. Updating the graph Querying the graph

  54. Core: updating the graph. Read Replicas: queries, analysis, reporting

  55. Building an App Where CS meets software eng

  56. App Server Neo4j Driver Bolt protocol

  57. Java

    <dependency>
        <groupId>org.neo4j.driver</groupId>
        <artifactId>neo4j-java-driver</artifactId>
    </dependency>

    Python

    pip install neo4j-driver

    .NET

    PM> Install-Package Neo4j.Driver

    JavaScript

    npm install neo4j-driver
  58. https://neo4j.com/developer/language-guides

  59. bolt://

    GraphDatabase.driver( "bolt://aServer" )
  60. bolt+routing://

    GraphDatabase.driver( "bolt+routing://aCoreServer" )
  61. bolt+routing://

    GraphDatabase.driver( "bolt+routing://aCoreServer" )
    Bootstrap: specify any core server to route load across the cluster
  62. Application Server Neo4j Driver Max Jim Jane Mark

  63. Routed write statements

    driver = GraphDatabase.driver( "bolt+routing://aCoreServer" );
    try ( Session session = driver.session( AccessMode.WRITE ) )
    {
        try ( Transaction tx = session.beginTransaction() )
        {
            tx.run( "MERGE (user:User {userId: {userId}})", parameters( "userId", userId ) );
            tx.success();
        }
    }
  64. Routed read queries

    driver = GraphDatabase.driver( "bolt+routing://aCoreServer" );
    try ( Session session = driver.session( AccessMode.READ ) )
    {
        try ( Transaction tx = session.beginTransaction() )
        {
            tx.run( "MATCH (user:User {userId: {userId}})-[*]-(:Product) RETURN *", parameters( "userId", userId ) );
            tx.success();
        }
    }
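
The routed write and read sessions above rely on the driver choosing a suitable server for each access mode. A minimal sketch of that idea follows, assuming writes go to a single known leader and reads rotate round-robin over the read replicas. `RoutingSketch`, `Mode` and `serverFor` are hypothetical names, and the actual driver's routing logic is not shown in the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class RoutingSketch {
    enum Mode { WRITE, READ }

    private final String leader;          // assumed: the Core member accepting writes
    private final List<String> readers;   // assumed: addresses of Read Replicas
    private final AtomicInteger next = new AtomicInteger();

    RoutingSketch(String leader, List<String> readers) {
        if (readers.isEmpty()) {
            throw new IllegalArgumentException("at least one reader is required");
        }
        this.leader = leader;
        this.readers = new ArrayList<>(readers);
    }

    /** Writes go to the leader; reads are spread round-robin over the readers. */
    String serverFor(Mode mode) {
        if (mode == Mode.WRITE) {
            return leader;
        }
        int i = Math.floorMod(next.getAndIncrement(), readers.size());
        return readers.get(i);
    }

    public static void main(String[] args) {
        RoutingSketch router = new RoutingSketch("core1", Arrays.asList("replica1", "replica2"));
        System.out.println(router.serverFor(Mode.WRITE));
        System.out.println(router.serverFor(Mode.READ));
        System.out.println(router.serverFor(Mode.READ));
    }
}
```

Because consecutive reads may land on different replicas, two queries from the same user can observe different states of the graph, which is the problem the consistency slides go on to address.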
  65. Consistency Models Can you read what you write?

  66. Cluster members slightly “ahead” or “behind” of each other

    [Diagram: transaction logs of five cluster members; one has reached transaction 11, the other four transaction 10]
    • If I query this server I won’t see the updates from transaction 11
    • If I query this server, I’ll see all updates from all committed transactions
    • This is normal behaviour
  67. Register Login You need to login in to continue your

    purchase!
  68. Register Login You need to login in to continue your

    purchase! Username: Password: Create Account
  69. Register Login You need to login in to continue your

    purchase! Username: jim_w Password: ******** Create Account
  70. Register Login You need to login in to continue your

    purchase! Username: Password: Login
  71. Username: jim_w Password: ******** Login

  72. Purchase Login Successful Try again No account found! Username: jim_w

    Password: ******** Login
  73. Username: jim_w Password: ******** A few moments later... ✓ Login

  74. Purchase Login Successful Username: jim_w Password: ******** Login A few

    moments later... ✓
  75. Q Why didn’t this work? A Eventual Consistency

  76. [Diagram: logs of five cluster members, at entries 10, 9, 10, 9, 9] Create Account, App Server A, Driver

  77. [Diagram: member logs at entries 10, 9, 10, 9, 9] CREATE (:User), Create Account, App Server A, Driver

  78. [Diagram: member logs at entries 11, 9, 10, 9, 9] CREATE (:User), Create Account, App Server A, Driver

  79. [Diagram: member logs at entries 11, 9, 10, 9, 9; entry 11 in flight] CREATE (:User), Create Account, App Server A, Driver

  80. [Diagram: member logs at entries 11, 9, 10, 9, 9; entry 11 in flight] CREATE (:User), Create Account, App Server A, Driver

  81. [Diagram: member logs at entries 11, 9, 10, 9, 9; entry 11 in flight] CREATE (:User), Create Account, App Server A, Driver

  82. [Diagram: member logs at entries 11, 10, 10, 10, 9; entry 11 in flight] CREATE (:User), Create Account, App Server A, Driver

  83. [Diagram: member logs at entries 11, 10, 10, 10, 9; entry 11 in flight] CREATE (:User), Create Account, App Server A, Driver; MATCH (:User), Login, App Server B, Driver

  84. [Diagram: member logs at entries 11, 10, 10, 10, 9; entry 11 in flight] CREATE (:User), Create Account, App Server A, Driver; MATCH (:User), Login, App Server B, Driver

  85. Bookmark

    • Session token
    • String (for portability)
    • Opaque to application
    • Represents ultimate user’s most recent view of the graph
    • More capabilities to come
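
To make the bookmark idea concrete, here is a small in-memory sketch of a read that is only served once a member has caught up to the writer's bookmark. Everything in it is an assumption for illustration: the class `BookmarkSketch`, the `Member` type, and especially the `"tx:<n>"` string format, since the slide states that real bookmarks are opaque to the application.

```java
import java.util.ArrayList;
import java.util.List;

final class BookmarkSketch {

    /** A cluster member that has applied some prefix of the committed transactions. */
    static final class Member {
        private final List<String> applied = new ArrayList<>();

        void apply(String tx) {
            applied.add(tx);
        }

        /** Transaction ids start at 1 in this sketch; 0 means nothing applied yet. */
        long lastAppliedTxId() {
            return applied.size();
        }
    }

    /** Hypothetical bookmark format: a string naming the last transaction the writer saw. */
    static String bookmarkFor(Member member) {
        return "tx:" + member.lastAppliedTxId();
    }

    /** A member may serve a bookmarked read only if it has applied that transaction. */
    static boolean canServe(Member member, String bookmark) {
        long required = Long.parseLong(bookmark.substring("tx:".length()));
        return member.lastAppliedTxId() >= required;
    }

    public static void main(String[] args) {
        Member core = new Member();
        Member replica = new Member();

        core.apply("CREATE (:User)");            // Create Account on App Server A
        String bookmark = bookmarkFor(core);     // handed to the user's next request

        System.out.println("before catch-up: " + canServe(replica, bookmark));
        replica.apply("CREATE (:User)");         // replica catches up
        System.out.println("after catch-up: " + canServe(replica, bookmark));
    }
}
```

Passing the bookmark as a plain string is what lets it travel between App Server A and App Server B, for example in the user's session state.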
  86. Let’s try again, with Causal Consistency

  87. [Diagram: logs of five cluster members, at entries 10, 9, 10, 9, 9] Create Account, App Server A, Driver

  88. [Diagram: member logs at entries 10, 9, 10, 9, 9] CREATE (:User), Create Account, App Server A, Driver

  89. [Diagram: member logs at entries 11, 9, 10, 9, 9] CREATE (:User), Create Account, App Server A, Driver

  90. [Diagram: member logs at entries 11, 9, 10, 9, 9; entry 11 in flight] CREATE (:User), Create Account, App Server A, Driver

  91. [Diagram: member logs at entries 11, 9, 10, 9, 9; entry 11 in flight] CREATE (:User), Create Account, App Server A, Driver

  92. [Diagram: member logs at entries 11, 9, 10, 9, 9; entry 11 in flight] CREATE (:User), Create Account, App Server A, Driver

  93. [Diagram: member logs at entries 11, 10, 10, 10, 9; entry 11 in flight] CREATE (:User), Create Account, App Server A, Driver

  94. [Diagram: member logs at entries 11, 10, 10, 10, 9; entry 11 in flight] CREATE (:User), Create Account, App Server A, Driver; MATCH (:User), Login, App Server B, Driver

  95. [Diagram: member logs at entries 11, 10, 11, 10, 10] CREATE (:User), Create Account, App Server A, Driver; MATCH (:User), Login, App Server B, Driver

  96. [Diagram: member logs at entries 11, 10, 11, 10, 10; entry 11 in flight] CREATE (:User), Create Account, App Server A, Driver; MATCH (:User), Login, App Server B, Driver

  97. [Diagram: member logs at entries 11, 10, 11, 10, 10; entry 11 in flight] CREATE (:User), Create Account, App Server A, Driver; MATCH (:User), Login, App Server B, Driver

  98. Obtain bookmark

    try ( Session session = driver.session( AccessMode.WRITE ) )
    {
        try ( Transaction tx = session.beginTransaction() )
        {
            tx.run( "CREATE (user:User {userId: {userId}, passwordHash: {passwordHash}})",
                    parameters( "userId", userId, "passwordHash", passwordHash ) );
            tx.success();
        }
        String bookmark = session.lastBookmark();
    }
  99. [Diagram: member logs at entries 11, 10, 11, 10, 10; entry 11 in flight] Obtain bookmark. CREATE (:User), Create Account, App Server A, Driver; MATCH (:User), Login, App Server B, Driver

  100. Use a bookmark

    try ( Session session = driver.session( AccessMode.READ ) )
    {
        try ( Transaction tx = session.beginTransaction( bookmark ) )
        {
            tx.run( "MATCH (user:User {userId: {userId}}) RETURN *", parameters( "userId", userId ) );
            tx.success();
        }
    }
  101. [Diagram: member logs at entries 11, 10, 11, 10, 10; entry 11 in flight] Use bookmark. CREATE (:User), Create Account, App Server A, Driver; MATCH (:User), Login, App Server B, Driver

  102. Thank you for listening @jimwebber