A talk I gave at PyCon Ukraine 2017.
Hi, I'm Andrew Godwin
Django core developer
Senior Software Engineer at [company logo]
Used to complain about migrations a lot
Distributed Systems
c = 299,792,458 m/s
Early CPUs: 5 MHz clock → c = 60 m propagation distance per cycle (chip ~2 cm across)
Modern CPUs: 3 GHz clock → c = 10 cm propagation distance per cycle
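Those distances are just c divided by the clock frequency: how far light can travel in one clock cycle, and so an upper bound on how far a signal can get per tick. A quick sanity check (plain arithmetic, nothing assumed):

    C = 299_792_458  # speed of light, m/s

    def propagation_distance(clock_hz):
        """How far light travels during one clock cycle, in metres."""
        return C / clock_hz

    print(propagation_distance(5e6))  # ~60 m  (early 5 MHz CPU)
    print(propagation_distance(3e9))  # ~0.1 m (modern 3 GHz CPU)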
Distributed systems are made ofindependent components
They are slower and harder to writethan synchronous systems
But they can be scaled upmuch, much further
Trade-offs
There is never aperfect solution.
Fast / Good / Cheap
[Diagram: a load balancer in front of three WSGI workers]
[Diagram: a load balancer, three WSGI workers, and a shared cache]
[Diagram: a load balancer, three WSGI workers, and a cache per worker]
[Diagram: a load balancer, three WSGI workers, and a database]
CAP Theorem
Consistent / Available / Partition Tolerant
PostgreSQL: CP
- Consistent everywhere
- Handles network latency/drops
- Can't write if the main server is down
Cassandra: AP
- Can read/write to any node
- Handles network latency/drops
- Data can be inconsistent
It's hard to design a productthat might be inconsistent
But if you take the tradeoff,scaling is easy
Otherwise, you must findother solutions
Read Replicas (often called master/slave)
[Diagram: a load balancer, three WSGI workers, one main database, two replicas]
Replicas scale reads forever... but writes must go to one place
If a request writes to a table, it must be pinned to the main database, so later reads do not get old data
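A minimal sketch of this in Django, using a database router. The aliases "main", "replica1" and "replica2" are hypothetical names from settings.DATABASES, and real write-pinning needs per-request state (tracking what the current request has written) on top of this:

    import random

    class ReplicaRouter:
        """Route reads to replicas and writes to the main database."""

        def db_for_read(self, model, **hints):
            # A real router would return "main" here if the current
            # request had already written, to avoid stale reads.
            return random.choice(["replica1", "replica2"])

        def db_for_write(self, model, **hints):
            return "main"

        def allow_relation(self, obj1, obj2, **hints):
            # Main and replicas hold the same data.
            return True

    # settings.py:  DATABASE_ROUTERS = ["myapp.routers.ReplicaRouter"]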
When your write load is toohigh, you must then shard
Vertical Sharding
Users | Tickets | Events | Payments
Horizontal Sharding
Users 0-2 | Users 3-5 | Users 6-8 | Users 9-A
Both
Users 0-2 | Users 3-5 | Users 6-8 | Users 9-A
Events 0-2 | Events 3-5 | Events 6-8 | Events 9-A
Tickets 0-2 | Tickets 3-5 | Tickets 6-8 | Tickets 9-A
Both plus caching
Users 0-2 | Users 3-5 | Users 6-8 | Users 9-A → User Cache
Events 0-2 | Events 3-5 | Events 6-8 | Events 9-A → Event Cache
Tickets 0-2 | Tickets 3-5 | Tickets 6-8 | Tickets 9-A → Ticket Cache
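A sketch of how a request might find its horizontal shard under the 0-2 / 3-5 / 6-8 / 9-A split above, assuming hex-style user IDs; the shard names are made up:

    def shard_for(user_id):
        """Map a user ID to a shard by the ID's first hex digit."""
        digit = int(user_id[0], 16)
        if digit <= 0x2:
            return "users_0_2"
        if digit <= 0x5:
            return "users_3_5"
        if digit <= 0x8:
            return "users_6_8"
        return "users_9_a"

    shard_for("a81f2c")  # -> "users_9_a"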
Teams have to scale too; nobody should have to understand everything in a big system.
Services allow complexity to be reduced - the trade-off is speed
[Diagram: User, Event and Ticket Services, each in front of its own cache and its own shards]
[Diagram: a WSGI server calling the User, Event and Ticket Services]
Each service is its own smaller project, managed and scaled separately.
But how do you communicatebetween them?
Direct Communication
[Diagram: three services, each connected directly to the others]
[Diagram: five services; the direct links multiply]
[Diagram: eight services; a full mesh of connections]
[Diagram: the same services, each connected only to a shared Message Bus]
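The trade is in the connection count: a full mesh needs a link per pair of services, a bus needs one link per service. A quick check:

    def mesh_links(n):
        # Direct communication: every pair of services holds a link.
        return n * (n - 1) // 2

    def bus_links(n):
        # Message bus: every service holds one link, to the bus.
        return n

    print(mesh_links(8), bus_links(8))  # 28 vs 8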
A single point of failure is notalways bad - if the alternativeis multiple, fragile ones
Channels and ASGI providea standard message busbuilt with certain tradeoffs
A Django Channels project:
Django → Channels library → ASGI (channel layer) → backing store (e.g. Redis, RabbitMQ)
Pure Python:
your code → ASGI (channel layer) → backing store (e.g. Redis, RabbitMQ)
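A minimal sketch of sending a message through the channel layer (this is the Channels 2-style API; it assumes a Redis channel layer is configured in settings, and the channel name and message type here are made up):

    from asgiref.sync import async_to_sync
    from channels.layers import get_channel_layer

    channel_layer = get_channel_layer()

    # Put a message dict onto a named channel; a worker listening on
    # that channel receives it via the backing store.
    async_to_sync(channel_layer.send)(
        "my-channel", {"type": "my.message", "text": "hi"}
    )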
Failure Mode
- At most once: messages either do not arrive, or arrive once.
- At least once: messages arrive once, or arrive multiple times.
Guarantees vs. Latency
- Low latency: messages arrive very quickly but go missing more often.
- Low loss rate: messages are almost never lost but arrive more slowly.
Queuing Type
- First In First Out: consistent performance for all users.
- First In Last Out: hides backlogs but makes them worse.
Queue Sizing
- Finite queues: sending can fail.
- Infinite queues: make problems even worse.
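A finite queue makes that failure visible to the sender. A sketch of handling it, assuming the same channel layer setup as above (ChannelFull is Channels' capacity-exceeded exception; the channel name, message, and retry policy are made up):

    from asgiref.sync import async_to_sync
    from channels.exceptions import ChannelFull
    from channels.layers import get_channel_layer

    channel_layer = get_channel_layer()

    try:
        async_to_sync(channel_layer.send)(
            "thumbnails", {"type": "resize", "id": 42}
        )
    except ChannelFull:
        # The queue is at capacity: shed load, retry later, or surface
        # an error - anything but silently growing an infinite backlog.
        pass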
You must understand whatyou are making(This is surprisingly uncommon)
Design as much as possiblearound shared-nothing
Per-machine caches
On-demand thumbnailing
Signed cookie sessions
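The last of those is built into Django: with the signed-cookie session backend, session data travels inside the cookie itself, so any worker can serve any request with no shared session store.

    # settings.py
    SESSION_ENGINE = "django.contrib.sessions.backends.signed_cookies"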
Has to be shared? Try to split it.
Has to be shared? Try sharding it.
Django's job is to beslowly replaced by your code
Just make sure you match theAPI contract of what you'rereplacing!
Don't try to scale too early;you'll pick the wrong tradeoffs.
Thanks.
Andrew Godwin
@andrewgodwin
channels.readthedocs.io