Scaling Django with Distributed Systems

A talk I gave at PyCon Ukraine 2017.

Andrew Godwin

April 07, 2017

Transcript

  1. None
  2. Andrew Godwin. Hi, I'm: Django core developer; Senior Software Engineer at [not transcribed]; used to complain about migrations a lot
  3. Distributed Systems

  4. c = 299,792,458 m/s

  5. Early CPUs: 5 MHz clock; c = 60m propagation distance; ~2cm
  6. Modern CPUs: 3 GHz; c = 10cm propagation distance
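
The two propagation distances on slides 5 and 6 follow from dividing the speed of light by the clock frequency. A minimal sketch of that arithmetic; the function name is chosen here for illustration and is not from the talk:

```python
C = 299_792_458  # speed of light in a vacuum, in m/s (slide 4)


def distance_per_cycle(clock_hz: float) -> float:
    """Distance light travels during one clock cycle, in metres."""
    return C / clock_hz


early = distance_per_cycle(5e6)   # 5 MHz clock
modern = distance_per_cycle(3e9)  # 3 GHz clock

print(f"5 MHz: about {early:.0f} m per cycle")
print(f"3 GHz: about {modern * 100:.0f} cm per cycle")
```

Real signals in copper or silicon travel slower than light in a vacuum, so these figures are upper bounds.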

  7. Distributed systems are made of independent components

  8. They are slower and harder to write than synchronous systems

  9. But they can be scaled up much, much further

  10. Trade-offs

  11. There is never a perfect solution.

  12. Fast Good Cheap

  13. None
  14. Load Balancer WSGI Worker WSGI Worker WSGI Worker

  15. Load Balancer WSGI Worker WSGI Worker WSGI Worker Cache

  16. Load Balancer WSGI Worker WSGI Worker WSGI Worker Cache Cache Cache
  17. Load Balancer WSGI Worker WSGI Worker WSGI Worker Database

  18. CAP Theorem

  19. Partition Tolerant Consistent Available

  20. PostgreSQL: CP. Consistent everywhere. Handles network latency/drops. Can't write if main server is down.
  21. Cassandra: AP. Can read/write to any node. Handles network latency/drops. Data can be inconsistent.
  22. It's hard to design a product that might be inconsistent

  23. But if you take the tradeoff, scaling is easy

  24. Otherwise, you must find other solutions

  25. Read Replicas (often called master/slave). Load Balancer WSGI Worker WSGI Worker WSGI Worker Replica Replica Main
  26. Replicas scale reads forever... But writes must go to one place
  27. If a request writes to a table, it must be pinned to the main database, so later reads do not get old data from a replica
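
One way to picture the pinning on slide 27 is a database router that sends reads to replicas until the current request has written, then keeps it on the main database. This is a rough sketch, not the speaker's implementation: `db_for_read` and `db_for_write` are the method names Django's router protocol uses, but the aliases (`default`, `replica1`, `replica2`), the thread-local flag, and the `pin_to_main`/`unpin` helpers are assumptions made for illustration. It deliberately avoids importing Django so it runs on its own.

```python
import threading

# Per-thread flag; assumes one request is handled per thread at a time.
_state = threading.local()


def pin_to_main() -> None:
    """Mark the current thread's request as having written."""
    _state.pinned = True


def unpin() -> None:
    """Reset the flag, e.g. from middleware when a request finishes."""
    _state.pinned = False


class PinningRouter:
    """Reads go to replicas unless this request has already written."""

    replicas = ["replica1", "replica2"]  # hypothetical database aliases

    def __init__(self) -> None:
        self._next = 0

    def db_for_read(self, model, **hints):
        if getattr(_state, "pinned", False):
            return "default"  # stay on the main database after a write
        alias = self.replicas[self._next % len(self.replicas)]
        self._next += 1  # simple round-robin over the replicas
        return alias

    def db_for_write(self, model, **hints):
        pin_to_main()
        return "default"  # all writes go to the main database
```

In a real project the pin would probably also need to outlive the request (for example via a short-lived cookie), because replication lag can exceed the time between a write and the user's next page load.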
  28. When your write load is too high, you must then shard
  29. Vertical Sharding: Users, Tickets, Events, Payments

  30. Horizontal Sharding: Users 0 - 2, Users 3 - 5, Users 6 - 8, Users 9 - A
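
Horizontal sharding needs a deterministic rule that maps each row's key to one shard. The slide splits users by key range; the sketch below swaps in a hash-modulo rule instead because it is shorter to show. The shard aliases and the `shard_for` helper are hypothetical.

```python
import hashlib

# Hypothetical aliases for four user shards.
SHARDS = ["users_0", "users_1", "users_2", "users_3"]


def shard_for(user_id: str) -> str:
    """Pick a shard from a stable hash of the user ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer; the same ID always maps to the same shard.
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


print(shard_for("alice"), shard_for("bob"))
```

A plain modulo rule moves most keys when the shard count changes, which is one reason range-based or directory-based schemes are also used.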
  31. Both: Users 0 - 2, Users 3 - 5, Users 6 - 8, Users 9 - A; Events 0 - 2, Events 3 - 5, Events 6 - 8, Events 9 - A; Tickets 0 - 2, Tickets 3 - 5, Tickets 6 - 8, Tickets 9 - A
  32. Both plus caching: Users 0 - 2, Users 3 - 5, Users 6 - 8, Users 9 - A; Events 0 - 2, Events 3 - 5, Events 6 - 8, Events 9 - A; Tickets 0 - 2, Tickets 3 - 5, Tickets 6 - 8, Tickets 9 - A; User Cache, Event Cache, Ticket Cache
  33. Teams have to scale too; nobody should have to understand everything in a big system.
  34. Services allow complexity to be reduced - for a tradeoff of speed
  35. Users 0 - 2, Users 3 - 5, Users 6 - 8, Users 9 - A; Events 0 - 2, Events 3 - 5, Events 6 - 8, Events 9 - A; Tickets 0 - 2, Tickets 3 - 5, Tickets 6 - 8, Tickets 9 - A; User Cache, Event Cache, Ticket Cache; User Service, Event Service, Ticket Service
  36. User Service Event Service Ticket Service WSGI Server

  37. Each service is its own, smaller project, managed and scaled separately.
  38. But how do you communicate between them?

  39. Service 2 Service 3 Service 1 Direct Communication

  40. Service 2 Service 3 Service 1 Service 4 Service 5

  41. Service 2 Service 3 Service 1 Service 4 Service 5 Service 6 Service 7 Service 8
  42. Service 2 Service 3 Service 1 Message Bus Service 2 Service 3 Service 1
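
Slides 39 to 42 contrast direct communication with a message bus as the number of services grows from 3 to 5 to 8. A small sketch of the arithmetic, assuming every service may need to talk to every other one (the slides' diagrams may not be a full mesh); both function names are made up for this example:

```python
def direct_links(services: int) -> int:
    """Point-to-point links if every service talks to every other one."""
    return services * (services - 1) // 2


def bus_links(services: int) -> int:
    """Links if every service only connects to a shared message bus."""
    return services


for n in (3, 5, 8):
    print(f"{n} services: {direct_links(n)} direct links, {bus_links(n)} bus links")
```

The direct count grows quadratically while the bus count grows linearly, which is the scaling argument for accepting the bus as a single shared component.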
  43. A single point of failure is not always bad - if the alternative is multiple, fragile ones
  44. Channels and ASGI provide a standard message bus built with certain tradeoffs
  45. Backing Store (e.g. Redis, RabbitMQ); ASGI (Channel Layer); Channels Library; Django; Django Channels Project
  46. Backing Store (e.g. Redis, RabbitMQ); ASGI (Channel Layer); Pure Python

  47. Failure Mode. At most once: messages either do not arrive, or arrive once. At least once: messages arrive once, or arrive multiple times.
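
The two failure modes on slide 47 can be seen in a toy simulation. Nothing below comes from Channels or ASGI; `FlakyLink` and both sender functions are invented for illustration, and the failures are scripted so the outcome is deterministic.

```python
class FlakyLink:
    """Toy link that loses scripted attempts or their acknowledgements."""

    def __init__(self, drop=(), drop_ack=()):
        self.drop = set(drop)          # attempt numbers whose message is lost
        self.drop_ack = set(drop_ack)  # attempt numbers whose ack is lost
        self.attempt = 0
        self.received = []             # what the receiver actually got

    def send(self, message) -> bool:
        """Deliver if possible; return True only if the sender sees an ack."""
        self.attempt += 1
        if self.attempt in self.drop:
            return False
        self.received.append(message)
        return self.attempt not in self.drop_ack


def send_at_most_once(link: FlakyLink, message) -> None:
    """Fire and forget: never retry, so a lost message stays lost."""
    link.send(message)


def send_at_least_once(link: FlakyLink, message, max_attempts: int = 5) -> bool:
    """Retry until acknowledged, so a lost ack produces a duplicate."""
    for _ in range(max_attempts):
        if link.send(message):
            return True
    return False


lossy = FlakyLink(drop={1})
send_at_most_once(lossy, "hello")
print(lossy.received)  # nothing arrived

duplicating = FlakyLink(drop_ack={1})
send_at_least_once(duplicating, "hello")
print(duplicating.received)  # the same message arrived twice
```

With at-least-once delivery, receivers generally need to tolerate duplicates, for example by making handlers idempotent.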
  48. Guarantees vs. Latency. Low latency: messages arrive very quickly but go missing more. Low loss rate: messages are almost never lost but arrive slower.
  49. Queuing Type. First In First Out: consistent performance for all users. First In Last Out: hides backlogs but makes them worse.
  50. Queue Sizing. Finite queues: sending can fail. Infinite queues: makes problems even worse.
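
"Sending can fail" is what a bounded queue looks like from the sender's side. A minimal sketch using Python's standard-library `queue.Queue` as a stand-in for a channel with finite capacity; the `try_send` helper and the capacity of 2 are assumptions for illustration:

```python
import queue


def try_send(q: queue.Queue, message) -> bool:
    """Enqueue without blocking; report failure instead of waiting."""
    try:
        q.put_nowait(message)
        return True
    except queue.Full:
        return False  # the caller must decide: drop, retry later, or shed load


channel = queue.Queue(maxsize=2)  # finite capacity
results = [try_send(channel, n) for n in range(3)]
print(results)  # the third send is rejected because the queue is full
```

An unbounded queue would accept all three messages, but under sustained overload it would only postpone the failure while memory use and latency keep growing.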
  51. You must understand what you are making (This is surprisingly uncommon)
  52. Design as much as possible around shared-nothing

  53. Per-machine caches; on-demand thumbnailing; signed cookie sessions
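
Signed cookie sessions are shared-nothing because the session data travels with the request and any worker holding the secret can verify it. Django ships a backend for this (`django.contrib.sessions.backends.signed_cookies`); the sketch below is not that backend, just the underlying idea using the standard-library `hmac` module. The secret value and the `sign`/`unsign` helpers are placeholders for illustration.

```python
import hashlib
import hmac

SECRET_KEY = b"change-me"  # placeholder; every worker would share the real secret


def sign(value: str) -> str:
    """Append an HMAC so any worker can verify the value without shared storage."""
    mac = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{value}:{mac}"


def unsign(cookie: str):
    """Return the original value, or None if the cookie was tampered with."""
    value, _, mac = cookie.rpartition(":")
    expected = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the MAC matched.
    if not hmac.compare_digest(mac.encode("utf-8"), expected.encode("utf-8")):
        return None
    return value


cookie = sign("user_id=42")
print(unsign(cookie))
print(unsign("user_id=43:" + cookie.rpartition(":")[2]))
```

The data is signed, not encrypted, so the client can still read it; only tampering is detected.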

  54. Has to be shared? Try to split it

  55. Has to be shared? Try sharding it.

  56. Django's job is to be slowly replaced by your code

  57. Just make sure you match the API contract of what you're replacing!
  58. Don't try to scale too early; you'll pick the wrong tradeoffs.
  59. Thanks. Andrew Godwin @andrewgodwin channels.readthedocs.io