Slide 1
Slide 1 text
No content
Slide 2
Slide 2 text
Andrew Godwin
Hi, I'm:
Django core developer
Senior Software Engineer at
Used to complain about migrations a lot
Slide 3
Slide 3 text
Distributed Systems
Slide 4
Slide 4 text
c = 299,792,458 m/s
Slide 5
Slide 5 text
Early CPUs: 5 MHz clock, so c gives 60m of propagation distance per cycle (chip is only ~2cm)
Slide 6
Slide 6 text
Modern CPUs: 3 GHz clock, so c gives only 10cm of propagation distance per cycle
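The two distances above fall straight out of the clock period: light travels c / frequency metres per cycle. A minimal sketch of that arithmetic:

```python
# Distance light travels in one clock cycle: distance = c / frequency.
C = 299_792_458  # speed of light, m/s

def propagation_distance(clock_hz: float) -> float:
    """Metres a signal can propagate (at best) during one clock period."""
    return C / clock_hz

print(propagation_distance(5e6))   # early 5 MHz CPU: roughly 60 m
print(propagation_distance(3e9))   # modern 3 GHz CPU: roughly 10 cm
```

At 5 MHz a signal could cross a 2cm chip thousands of times per cycle; at 3 GHz it can barely cross the die, which is why physical distance starts to matter.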
Slide 7
Slide 7 text
Distributed systems are made of independent components
Slide 8
Slide 8 text
They are slower and harder to write than synchronous systems
Slide 9
Slide 9 text
But they can be scaled up much, much further
Slide 10
Slide 10 text
Trade-offs
Slide 11
Slide 11 text
There is never a perfect solution.
Slide 12
Slide 12 text
Fast Good Cheap
Slide 13
Slide 13 text
No content
Slide 14
Slide 14 text
Load Balancer WSGI Worker WSGI Worker WSGI Worker
Slide 15
Slide 15 text
Load Balancer WSGI Worker WSGI Worker WSGI Worker Cache
Slide 16
Slide 16 text
Load Balancer WSGI Worker WSGI Worker WSGI Worker Cache Cache Cache
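The worker-plus-cache diagrams above usually imply a cache-aside read path. A minimal sketch, with `cache` and `fetch_from_db` as hypothetical stand-ins for a real cache backend and database query:

```python
# Cache-aside pattern: check the cache first, fall back to the database,
# then populate the cache so later reads skip the database entirely.
cache = {}

def fetch_from_db(key):
    # Placeholder for a real database query.
    return f"row-for-{key}"

def get(key):
    if key in cache:
        return cache[key]          # cache hit: no database round-trip
    value = fetch_from_db(key)     # cache miss: hit the database
    cache[key] = value             # populate for subsequent readers
    return value
```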
Slide 17
Slide 17 text
Load Balancer WSGI Worker WSGI Worker WSGI Worker Database
Slide 18
Slide 18 text
CAP Theorem
Slide 19
Slide 19 text
Partition Tolerant Consistent Available
Slide 20
Slide 20 text
PostgreSQL: CP
Consistent everywhere
Handles network latency/drops
Can't write if the main server is down
Slide 21
Slide 21 text
Cassandra: AP
Can read/write to any node
Handles network latency/drops
Data can be inconsistent
Slide 22
Slide 22 text
It's hard to design a product that might be inconsistent
Slide 23
Slide 23 text
But if you take the tradeoff, scaling is easy
Slide 24
Slide 24 text
Otherwise, you must find other solutions
Slide 25
Slide 25 text
Read Replicas (often called master/slave) Load Balancer WSGI Worker WSGI Worker WSGI Worker Replica Replica Main
Slide 26
Slide 26 text
Replicas scale reads forever... But writes must go to one place
Slide 27
Slide 27 text
If a request writes to a table, it must be pinned to the main database afterwards, so its later reads do not see stale data
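That pinning rule can be sketched as a tiny router: once a request has written, its reads go to the primary; everything else can use replicas. The class and names here are illustrative, not Django's actual router API:

```python
# Sketch of read/write pinning. After a request writes, route its later
# reads to the primary so it never sees a replica that hasn't caught up.
class PinningRouter:
    def __init__(self):
        self.pinned = set()  # ids of requests that have written

    def db_for_write(self, request_id):
        self.pinned.add(request_id)
        return "primary"

    def db_for_read(self, request_id):
        # Pinned requests read from the primary; others spread over replicas.
        return "primary" if request_id in self.pinned else "replica"
```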
Slide 28
Slide 28 text
When your write load is too high, you must then shard
Slide 29
Slide 29 text
Vertical Sharding Users Tickets Events Payments
Slide 30
Slide 30 text
Horizontal Sharding Users 0 - 2 Users 3 - 5 Users 6 - 8 Users 9 - A
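One common way to pick a horizontal shard is to hash the key and bucket the result, echoing the "Users 0-2 / 3-5 / 6-8 / 9-A" split above. The four-shard layout and names here are illustrative:

```python
import hashlib

# Map a user id to one of four shards by bucketing the first hex digit
# of a stable hash. Same id always lands on the same shard.
SHARDS = ["users_0_2", "users_3_5", "users_6_8", "users_9_a"]

def shard_for(user_id: str) -> str:
    digit = int(hashlib.md5(user_id.encode()).hexdigest()[0], 16)
    return SHARDS[digit * len(SHARDS) // 16]  # 16 hex digits -> 4 buckets
```

The hash keeps the distribution even; range-splitting raw ids instead would let hot prefixes overload one shard.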
Slide 31
Slide 31 text
Both Users 0 - 2 Users 3 - 5 Users 6 - 8 Users 9 - A Events 0 - 2 Events 3 - 5 Events 6 - 8 Events 9 - A Tickets 0 - 2 Tickets 3 - 5 Tickets 6 - 8 Tickets 9 - A
Slide 32
Slide 32 text
Both plus caching Users 0 - 2 Users 3 - 5 Users 6 - 8 Users 9 - A Events 0 - 2 Events 3 - 5 Events 6 - 8 Events 9 - A Tickets 0 - 2 Tickets 3 - 5 Tickets 6 - 8 Tickets 9 - A User Cache Event Cache Ticket Cache
Slide 33
Slide 33 text
Teams have to scale too; nobody should have to understand everything in a big system.
Slide 34
Slide 34 text
Services allow complexity to be reduced, at the cost of speed
Slide 35
Slide 35 text
Users 0 - 2 Users 3 - 5 Users 6 - 8 Users 9 - A Events 0 - 2 Events 3 - 5 Events 6 - 8 Events 9 - A Tickets 0 - 2 Tickets 3 - 5 Tickets 6 - 8 Tickets 9 - A User Cache Event Cache Ticket Cache User Service Event Service Ticket Service
Slide 36
Slide 36 text
User Service Event Service Ticket Service WSGI Server
Slide 37
Slide 37 text
Each service is its own, smaller project, managed and scaled separately.
Slide 38
Slide 38 text
But how do you communicate between them?
Slide 39
Slide 39 text
Service 2 Service 3 Service 1 Direct Communication
Slide 40
Slide 40 text
Service 2 Service 3 Service 1 Service 4 Service 5
Slide 41
Slide 41 text
Service 2 Service 3 Service 1 Service 4 Service 5 Service 6 Service 7 Service 8
Slide 42
Slide 42 text
Service 2 Service 3 Service 1 Message Bus Service 2 Service 3 Service 1
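The diagram's point is that with a bus, each service talks to one endpoint instead of to every other service. A toy in-memory version (this is an illustration of the idea, not ASGI or a real broker):

```python
from collections import defaultdict, deque

# Toy message bus: services only know channel names, never each other.
# Each named channel is an independent FIFO queue.
class MessageBus:
    def __init__(self):
        self.channels = defaultdict(deque)

    def send(self, channel, message):
        self.channels[channel].append(message)

    def receive(self, channel):
        # Returns the oldest pending message, or None if the channel is empty.
        queue = self.channels[channel]
        return queue.popleft() if queue else None
```

Adding a ninth service to the mesh diagrams means eight new connections; adding it here means one.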
Slide 43
Slide 43 text
A single point of failure is not always bad, if the alternative is multiple fragile ones
Slide 44
Slide 44 text
Channels and ASGI provide a standard message bus built with certain tradeoffs
Slide 45
Slide 45 text
Layer diagram: Django Channels Project = Django + Channels Library, on top of ASGI (Channel Layer), on top of a Backing Store (e.g. Redis, RabbitMQ)
Slide 46
Slide 46 text
Layer diagram: Pure Python code, on top of ASGI (Channel Layer), on top of a Backing Store (e.g. Redis, RabbitMQ)
Slide 47
Slide 47 text
Failure Mode
At most once: messages either do not arrive, or arrive exactly once
At least once: messages arrive once, or arrive multiple times
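Under at-least-once delivery the same message can arrive twice, so consumers are usually made idempotent, for example by deduplicating on a message id. A minimal sketch (names are illustrative):

```python
# Idempotent consumer: remember which message ids we've processed and
# skip duplicates, so at-least-once delivery causes each side effect once.
seen_ids = set()

def handle(message_id, process):
    if message_id in seen_ids:
        return False          # duplicate delivery: ignore
    seen_ids.add(message_id)
    process()                 # the side effect runs exactly once per id
    return True
```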
Slide 48
Slide 48 text
Guarantees vs. Latency
Low latency: messages arrive very quickly, but go missing more often
Low loss rate: messages are almost never lost, but arrive more slowly
Slide 49
Slide 49 text
Queuing Type
First In First Out: consistent performance for all users
First In Last Out: hides backlogs but makes them worse
Slide 50
Slide 50 text
Queue Sizing
Finite queues: sending can fail
Infinite queues: make problems even worse
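"Sending can fail" is the whole point of a finite queue: the sender gets backpressure and must decide what to do (drop, retry, back off). A minimal sketch using a bounded stdlib queue:

```python
import queue

# With a bounded queue, a slow consumer makes sends fail instead of
# letting the backlog grow without limit. maxsize=2 is illustrative.
outgoing = queue.Queue(maxsize=2)

def try_send(message):
    try:
        outgoing.put_nowait(message)
        return True
    except queue.Full:
        return False  # backpressure: the caller handles the failure
```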
Slide 51
Slide 51 text
You must understand what you are making (This is surprisingly uncommon)
Slide 52
Slide 52 text
Design as much as possible around shared-nothing
Slide 53
Slide 53 text
Per-machine caches On-demand thumbnailing Signed cookie sessions
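Signed cookie sessions are shared-nothing because the state lives on the client and any server can verify it. A sketch of the signing idea using the stdlib (the secret and payload format here are illustrative, not Django's actual session serialization):

```python
import base64
import hashlib
import hmac

# Keep session data in the cookie itself; servers share only the secret.
SECRET = b"not-a-real-secret"  # illustrative; a real key must be secret

def sign(payload: bytes) -> bytes:
    mac = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return base64.b64encode(payload) + b"." + base64.b64encode(mac)

def verify(cookie: bytes):
    # Returns the payload if the signature checks out, else None.
    data, _, mac = cookie.partition(b".")
    payload = base64.b64decode(data)
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return payload if hmac.compare_digest(base64.b64decode(mac), expected) else None
```

No server-side session store means nothing to replicate or shard, at the cost of cookie size limits and no server-side revocation.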
Slide 54
Slide 54 text
Has to be shared? Try to split it
Slide 55
Slide 55 text
Has to be shared? Try sharding it.
Slide 56
Slide 56 text
Django's job is to be slowly replaced by your code
Slide 57
Slide 57 text
Just make sure you match the API contract of what you're replacing!
Slide 58
Slide 58 text
Don't try to scale too early; you'll pick the wrong tradeoffs.
Slide 59
Slide 59 text
Thanks. Andrew Godwin @andrewgodwin channels.readthedocs.io