Scaling Data @ Munich Data Engineering Meetup

Let's look at how databases can be distributed across multiple nodes. We will start by covering consistency models such as strong consistency and eventual consistency. We will then talk about different ways of distributing databases and the reasons for doing so: scaling reads and/or writes, reliability, big data sets and geographical distribution. Distributed databases have disadvantages too: we will talk about the influence of networks and clocks on our database. And yes, we will cover the CAP theorem and what it means.

Lucas Dohmen

March 15, 2018

Transcript

  1. Lucas Dohmen • Senior Consultant at INNOQ • Everything Web & Databases • Previously worked at ArangoDB • http://faucet-pipeline.org
  2. Consistency Models: Which histories are valid? (Consistency) Example histories: [Read() => a, Write(b), Read() => b, Read() => b]; [Read() => a, Write(b), Read() => a, Read() => a, Read() => b]; [Write(b), Read() => b, Read() => b] (sketch below)
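To make the idea concrete, here is a minimal sketch (not from the talk; the history encoding is invented) that checks whether a single-register history is valid under the strictest reading, i.e. every read returns the most recently written value. The second history from the slide fails this check but is fine under eventual consistency.

    # Toy strong-consistency check for a single-register history.
    # A history is a list of ("read", value_seen) / ("write", value_written) pairs.
    def strongly_consistent(history, initial="a"):
        current = initial
        for op, value in history:
            if op == "write":
                current = value
            elif value != current:        # a read that returned a stale value
                return False
        return True

    h1 = [("read", "a"), ("write", "b"), ("read", "b"), ("read", "b")]
    h2 = [("read", "a"), ("write", "b"), ("read", "a"), ("read", "a"), ("read", "b")]
    print(strongly_consistent(h1))  # True
    print(strongly_consistent(h2))  # False: stale reads after the write, only eventually consistent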
  3. (Consistency) Hierarchy of consistency models: strictly serializable, serializable, linearizable, sequential, repeatable read, SI, causal, PRAM, RL, RV (diagram from Highly Available Transactions: Virtues and Limitations – Bailis et al.)
  4. How do we scale web applications? • Share nothing between application servers • Put behind a load balancer • Add servers (Scaling Applications; slide diagram: load balancer in front of several app servers, all sharing one database; sketch below)
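A minimal sketch of the share-nothing pattern from the slide above (server names and ports are made up): because the application servers hold no local state, the load balancer can hand any request to any of them.

    from itertools import cycle

    # Hypothetical pool of stateless app servers behind the load balancer.
    app_servers = cycle(["app1:8000", "app2:8000", "app3:8000", "app4:8000"])

    def route(request):
        # Round-robin: any server can handle any request, so scaling out
        # just means adding another entry to the pool.
        return f"forwarding {request!r} to {next(app_servers)}"

    print(route("GET /"))
    print(route("GET /"))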
  5. Share Nothing for Databases? • Possible & underused • Separate databases for separate data • If we need to join data, we need to join in the application (Scaling / Sharding; slide diagram: MySQL and Redis side by side; sketch below)
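A sketch of such an application-side join, with two in-memory dicts standing in for the two separate stores (say, orders in MySQL and sessions in Redis); the schema is invented for illustration.

    # Two separate stores that share nothing with each other.
    orders_db = {1: {"user_id": 42, "total": 99}}                   # stand-in for MySQL
    sessions_db = {42: {"user_id": 42, "last_seen": "2018-03-14"}}  # stand-in for Redis

    def order_with_session(order_id):
        order = orders_db[order_id]                  # query the first store
        session = sessions_db.get(order["user_id"])  # query the second store
        # The "join" happens here, in application code.
        return {**order, "last_seen": session["last_seen"] if session else None}

    print(order_with_session(1))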
  6. Single Leader • Failover • Read scaling • No write scaling (Scaling / Replication; slide diagram: one leader replicating to followers; sketch below)
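A routing sketch under single-leader replication (node names are hypothetical): every write has to go through the one leader, while reads can be spread across the followers, which is why this setup scales reads but not writes.

    import random

    LEADER = "db-leader"
    FOLLOWERS = ["db-follower-1", "db-follower-2", "db-follower-3"]

    def pick_node(query):
        # Writes must go to the leader; reads can go to any follower.
        if query.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE")):
            return LEADER
        return random.choice(FOLLOWERS)

    print(pick_node("SELECT * FROM users"))
    print(pick_node("INSERT INTO users VALUES (1)"))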
  7. Sync or Async Replication? • Trade-off between consistency & speed • Sync: every follower we add decreases performance • Async: if the leader dies before replication has finished, we lose acknowledged data (Scaling / Replication; sketch below)
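A sketch of the trade-off, with in-memory lists standing in for the nodes' logs: with synchronous replication the acknowledgement means every copy exists, but each extra follower makes the write slower; with asynchronous replication the client is acknowledged before the followers have the data.

    def write_sync(record, leader, followers):
        leader.append(record)
        for f in followers:        # one round-trip per follower before we can ack
            f.append(record)
        return "ack"               # ack means: fully replicated

    def write_async(record, leader, followers):
        leader.append(record)
        return "ack"               # ack immediately; followers catch up in the background.
                                   # If the leader dies before they do, this
                                   # acknowledged write is lost.

    leader, f1, f2 = [], [], []
    print(write_sync("x=1", leader, [f1, f2]), f1, f2)
    print(write_async("y=2", leader, [f1, f2]), f1, f2)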
  8. Multi Leader • Failover • Read & write scaling (Scaling / Replication; slide diagram: two leaders replicating to each other)
  9. Write Conflicts • Two leaders can accept conflicting writes • We usually resolve them when reading • Do we have all the information we need to resolve a conflict at read time? (Scaling / Replication; sketch below)
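A read-time resolution sketch with invented data: two leaders each accepted a write for the same key, and the reader picks a winner by timestamp (last write wins). That is simple but lossy, and it trusts the leaders' clocks, which the clock slide further down warns against.

    # Two conflicting versions of the same key, one accepted by each leader.
    conflicting = [
        {"value": "alice@old.example", "written_at": 1521100000, "leader": "eu"},
        {"value": "alice@new.example", "written_at": 1521100005, "leader": "us"},
    ]

    def resolve_last_write_wins(versions):
        # Keep the version with the newest timestamp and silently drop the rest.
        return max(versions, key=lambda v: v["written_at"])["value"]

    print(resolve_last_write_wins(conflicting))  # alice@new.example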
  10. Quorum • Clients write to multiple nodes at once • When at least n nodes have acknowledged the write, the write is successful (n is the write quorum) • When we read, we read from m nodes (m is the read quorum) (Scaling / Replication; sketch below)
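A quorum sketch over three in-memory replicas (illustrative only, no failure handling): with N replicas, write quorum n and read quorum m, choosing n + m > N means every read overlaps at least one replica that has the latest acknowledged write.

    import random

    N, WRITE_QUORUM, READ_QUORUM = 3, 2, 2       # 2 + 2 > 3, so reads and writes overlap
    replicas = [dict() for _ in range(N)]        # each replica maps key -> (version, value)

    def write(key, value, version):
        acks = 0
        for r in replicas:                       # a real system sends to all replicas in parallel
            r[key] = (version, value)
            acks += 1
        return acks >= WRITE_QUORUM              # successful once at least n replicas acked

    def read(key):
        answers = [r[key] for r in random.sample(replicas, READ_QUORUM) if key in r]
        return max(answers)[1] if answers else None   # the highest version wins

    write("a", "1", version=1)
    write("a", "2", version=2)
    print(read("a"))                             # "2"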
  11. Combining Replication & Sharding (Scaling; slide diagram: a grid of nodes holding shards A, B and C, each shard copied to several replicas; sketch below)
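A sketch of combining the two (node names invented): a key is hashed to one of the shards, and each shard is stored on several replica nodes.

    import hashlib

    SHARDS = ["A", "B", "C"]
    REPLICAS = {                       # every shard lives on three nodes
        "A": ["node1", "node2", "node3"],
        "B": ["node2", "node3", "node4"],
        "C": ["node3", "node4", "node1"],
    }

    def shard_for(key):
        digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return SHARDS[digest % len(SHARDS)]       # sharding: which shard owns the key

    def nodes_for(key):
        return REPLICAS[shard_for(key)]           # replication: where that shard's copies live

    print(shard_for("user:42"), nodes_for("user:42"))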
  12. Clocks are monotonic & synchronized • leap seconds • NTP fails • NTP sync ⇒ going back in time • NTP is an estimation (Trouble; sketch below)
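The distinction in code: Python's wall clock follows NTP and can be stepped backwards, while the monotonic clock only ever moves forward, which is why durations should never be measured with the wall clock.

    import time

    start_wall = time.time()        # wall clock: can jump (NTP step, leap-second handling)
    start_mono = time.monotonic()   # monotonic clock: only ever moves forward

    # ... imagine NTP stepping the system clock backwards right here ...

    elapsed_wall = time.time() - start_wall        # can come out negative
    elapsed_mono = time.monotonic() - start_mono   # never negative
    print(elapsed_wall, elapsed_mono)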
  13. (Trouble) Which message problems IP and TCP handle (ordering sketch below):

      Problem              IP   TCP
      Reordered Messages   ✘    ✔ (Sequence Numbers)
      Lost Messages        ✘    ✔ (ack)
      Duplicated Messages  ✘    ✔ (Sequence Numbers)
      Delayed Messages     ✘    ✘
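An application-level sketch of what sequence numbers buy (the same idea TCP uses): the receiver can drop duplicates and put reordered messages back in order. Note that nothing here helps with an arbitrarily delayed message, which is why that row stays ✘ for both.

    def deliver_in_order(messages):
        # messages: list of (sequence_number, payload) pairs as they arrived off the wire
        expected, buffered, delivered = 0, {}, []
        for seq, payload in messages:
            if seq < expected or seq in buffered:
                continue                          # duplicate: drop it
            buffered[seq] = payload               # hold out-of-order messages back
            while expected in buffered:           # deliver any consecutive run we now have
                delivered.append(buffered.pop(expected))
                expected += 1
        return delivered

    print(deliver_in_order([(1, "b"), (0, "a"), (0, "a"), (2, "c")]))  # ['a', 'b', 'c']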
  14. The network is reliable (Trouble) • packets can take a looooong time • the network can fail partially or entirely
  15. You have two choices (Trouble) • Stop taking requests: not available, but consistent (CP) • Continue taking requests: available, but not consistent (AP) (slide diagram: a partitioned cluster A B C D with a=1 on one side and a=2 on the other, next to a "Sorry we’re CLOSED" sign; sketch below)
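A sketch of that choice from a single node's point of view during a partition (cluster size and API invented): in CP mode the node refuses writes when it cannot reach a majority of the cluster; in AP mode it keeps accepting them and lets the two sides diverge, like the a=1 / a=2 split on the slide.

    CLUSTER_SIZE = 4

    def handle_write(key, value, reachable_peers, mode="CP"):
        have_majority = reachable_peers + 1 > CLUSTER_SIZE / 2   # +1 counts this node itself
        if mode == "CP" and not have_majority:
            return "rejected: minority side of a partition"      # consistent, but not available
        return f"accepted {key}={value}"                         # in AP mode the sides may now diverge

    print(handle_write("a", 2, reachable_peers=1, mode="CP"))  # only 2 of 4 nodes reachable
    print(handle_write("a", 2, reachable_peers=1, mode="AP"))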
  16. (Trouble) The consistency-model hierarchy again, split into models that are not possible with total availability and models that are possible with total or sticky availability: strictly serializable, serializable, linearizable, sequential, repeatable read, SI, causal, PRAM, RL, RV (Highly Available Transactions: Virtues and Limitations – Bailis et al.)
  17. Remember! • Nodes will fail • The network will fail • Clocks aren’t reliable (Wrap Up)
  18. What are your requirements? • Scaling reads • Scaling writes • Geographical distribution • Big data sets • Failure resistance • Inconsistency (Wrap Up)
  19. Thank you! • @moonbeamlabs on Twitter • Photo Credit • Slide 5: Shoot N' Design on Unsplash • Slide 11: Andy Hall on Unsplash • Slide 30: Hermes Rivera on Unsplash