Horrors of Distributed Systems

A talk I gave at DjangoCon AU 2017.

Andrew Godwin

August 04, 2017

Transcript

  1. 2.

    Hi, I’m Andrew Godwin
    • Django core developer
    • Senior Software Engineer at
    • Channels is a thing, I guess?
  2. 5.
  3. 8.

    Disks & RAM
    Enterprise SSDs: bit flips after as little as a week unpowered
    Non-ECC memory: 1 bit flip per gigabyte EVERY TWO HOURS
    32GB of RAM is experiencing a bit flip every FOUR MINUTES
    Source: Schroeder, Bianca; Pinheiro, Eduardo; Weber, Wolf-Dietrich (2009). “DRAM Errors in the Wild: A Large-Scale Field Study”
    Source: Alvin Cox (2015). “JEDEC SSD Specifications Explained”
  4. 10.

    Networks
    How high will you let latency get?
    How bad does packet loss have to be?
    Do you have bad neighbours?
  5. 14.

    Australia is real far away from stuff
    Melbourne → US-east-1: 16,000km
    At the speed of light: 50ms
    Minimum possible round-trip, ever: 100ms
    Number of web requests just to open Slack: 96
  6. 15.

    You can solve a lot of distributed problems by waiting for consensus
    ...but you can be waiting a long time
  7. 17.

    Your phone corrects for the time dilation of the GPS satellites.
    Oh, and all clocks drift quite a decent amount.
  8. 18.
  9. 22.

    Well, alright, more of a spectrum.
    At most once
    At least once
    Exactly once
    Basically never
    Eleventy copies
    Effort
  10. 23.

    Do you want to maybe do it twice? Saving text, liking a tweet
    Or maybe never? Charging money, sending email
  11. 24.
  12. 25.

    Your servers all need to agree...
    ...over an unreliable network
    ...with unreliable storage
    ...and different ideas of what time is
  13. 26.

    It can happen to YOU!
    It doesn’t have to be big and fancy to be distributed.
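
The figures on slides 8 and 14 can be sanity-checked with a few lines of arithmetic. The sketch below is not from the talk; it assumes bit flips scale linearly with RAM size and that light travels a straight 16,000km path in a vacuum, and the function names are made up for illustration.

```python
# Rough check of the slide figures. Inputs are the slides' own numbers;
# the function names and the linear-scaling assumption are illustrative only.
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in a vacuum


def minutes_between_bit_flips(ram_gb, hours_per_flip_per_gb=2.0):
    """Expected minutes between bit flips, assuming flips scale linearly with RAM."""
    flips_per_hour = ram_gb / hours_per_flip_per_gb
    return 60.0 / flips_per_hour


def one_way_latency_ms(distance_km):
    """Time for light to cover distance_km in a vacuum, in milliseconds."""
    return distance_km / SPEED_OF_LIGHT_KM_PER_S * 1000.0


if __name__ == "__main__":
    print(f"32GB: one flip every {minutes_between_bit_flips(32):.2f} minutes")
    one_way = one_way_latency_ms(16_000)
    print(f"Melbourne to US-east-1: {one_way:.1f}ms one way, {2 * one_way:.1f}ms round trip")
```

Under these assumptions the results land close to the rounded numbers on the slides: just under four minutes per flip for 32GB, and a little over 50ms one way. Real links are slower, since light in fibre travels below vacuum speed and cables do not follow a straight line.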
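
Slides 22 and 23 contrast actions that are fine to repeat with ones that must not happen twice. One common way to live with at-least-once delivery is to make the handler idempotent by remembering which message IDs have already been applied. The sketch below is an illustration rather than anything shown in the talk: `IdempotentProcessor`, `charge_card`, and the message IDs are hypothetical, and the in-memory dict stands in for a durable store shared by all workers.

```python
class IdempotentProcessor:
    """Applies a side-effecting handler at most once per message ID.

    Hypothetical sketch: results are kept in an in-memory dict, so this only
    deduplicates within one process. A real deployment would need durable,
    shared storage for the seen IDs.
    """

    def __init__(self, handler):
        self._handler = handler
        self._results = {}  # message_id -> result of the first successful run

    def process(self, message_id, payload):
        # A redelivered message returns the stored result without re-running.
        if message_id in self._results:
            return self._results[message_id]
        result = self._handler(payload)
        self._results[message_id] = result
        return result


charges = []  # stands in for the external side effect (e.g. a payment)


def charge_card(amount_cents):
    charges.append(amount_cents)
    return len(charges)  # pretend receipt number


processor = IdempotentProcessor(charge_card)
processor.process("order-42", 1999)
processor.process("order-42", 1999)  # redelivery: no second charge
print(charges)
```

This does not give exactly-once semantics: if the process crashes after the handler runs but before the ID is recorded, a redelivery will charge again. Closing that gap takes extra effort, such as recording the ID and the side effect atomically.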