A talk I gave at PyCon AU 2014.
DATABASES IN THE
What does it mean?
What is 'big'?
Scalable designs are a tradeoff:
Small company? Agency?
Focus on ease of change, not scalability
You don't need to scale
from day one
But always leave yourself scaling points
It's all about schema change overhead
ID int | Name text | Weight uint
It's 11pm. Do you know where your locks are?
Add NULL and backfill
1-to-1 relation and backfill
DBMS-supported type changes
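The "add NULL and backfill" approach can be sketched like this — a minimal illustration using sqlite3 with hypothetical table and column names; in production you'd batch far more rows and watch lock durations:

```python
import sqlite3

# Hypothetical example: add a NULL-able column, then backfill in small
# batches so no single statement holds locks for long.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)",
                 [("a",), ("b",), ("c",)])

# Step 1: adding a NULL-able column is cheap -- no table rewrite, no
# long-held lock (true for PostgreSQL and SQLite; check your DBMS).
conn.execute("ALTER TABLE items ADD COLUMN weight INTEGER")  # defaults to NULL

BATCH = 2  # keep batches small in production (e.g. a few thousand rows)
while True:
    rows = conn.execute(
        "SELECT id FROM items WHERE weight IS NULL LIMIT ?", (BATCH,)
    ).fetchall()
    if not rows:
        break
    # Step 2: backfill each batch in its own short transaction.
    conn.executemany("UPDATE items SET weight = 0 WHERE id = ?", rows)
    conn.commit()

print(conn.execute(
    "SELECT COUNT(*) FROM items WHERE weight IS NULL").fetchone()[0])  # 0
```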
ZOMG RUN IT ON THE CLOUD
VMs are TERRIBLE at IO
Up to 10x slowdown, even with VT-d.
Memory is king
Your database loves it. Don't let other apps steal it.
Adding more power goes far
Especially with PostgreSQL or read-only replicas
Datasets partitioned by primary key
Implement consistent hashing on primary key
Make large number of logical shards (2048?)
Map multiple logical shards to each physical shard
Migrate shards using replication
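The scheme above might be sketched as follows — a hypothetical illustration, with made-up server names and a simple hash-based mapping; the key point is that keys hash to *logical* shards, so migration only repoints a shard, it never rehashes keys:

```python
import hashlib

NUM_LOGICAL_SHARDS = 2048  # many logical shards, as suggested above

# Hypothetical mapping: each logical shard lives on one physical server.
physical_servers = ["db1", "db2"]
shard_map = {s: physical_servers[s % len(physical_servers)]
             for s in range(NUM_LOGICAL_SHARDS)}

def logical_shard(primary_key):
    """Hash the primary key into a stable logical shard number."""
    digest = hashlib.md5(str(primary_key).encode()).hexdigest()
    return int(digest, 16) % NUM_LOGICAL_SHARDS

def server_for(primary_key):
    return shard_map[logical_shard(primary_key)]

# A key always maps to the same logical shard, even when the shard
# itself is migrated (via replication) to a new physical server:
shard = logical_shard(42)
shard_map[shard] = "db3"   # move just this shard; other keys untouched
print(server_for(42))      # "db3"
```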
Entirely unrelated tables
Replicate database to new server
Route split tables there, disable replication
- or -
Slowly backfill new datastore with fallback lookup
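The "backfill with fallback lookup" option can be sketched in a few lines — a toy model with dicts standing in for the old and new datastores; names are illustrative, not from the talk:

```python
# Hypothetical sketch: while tables migrate to a new datastore, reads
# check the new store first and fall back to the old one, copying rows
# forward so the backfill completes gradually as data is touched.
old_store = {1: {"name": "widget"}, 2: {"name": "gadget"}}
new_store = {}

def get(key):
    row = new_store.get(key)
    if row is None:
        row = old_store.get(key)   # fallback lookup in the old store
        if row is not None:
            new_store[key] = row   # lazily backfill the new datastore
    return row

get(1)
print(1 in new_store)  # True -- row migrated on first read
```

A background job can sweep the remaining untouched rows so the old store can eventually be retired.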
It's not free!
Add NULL fields to dependent tables
App code to fetch and fill if not present
Possibly prefill on save of new items
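Those three steps can be sketched together — a hypothetical example with dicts in place of tables, showing the NULL field, the fetch-and-fill read path, and prefilling on save:

```python
# Hypothetical sketch of "fetch and fill": a denormalised column starts
# as NULL (None); app code fills it on first read, and saves of new
# items prefill it so fresh rows never need the backfill.
users = {1: {"name": "Ada"}}
orders = {10: {"user_id": 1, "user_name": None}}  # new NULL field

def order_user_name(order_id):
    order = orders[order_id]
    if order["user_name"] is None:             # not yet backfilled
        order["user_name"] = users[order["user_id"]]["name"]
    return order["user_name"]

def save_order(order_id, user_id):
    # Prefill the denormalised field on save of new items.
    orders[order_id] = {"user_id": user_id,
                        "user_name": users[user_id]["name"]}

print(order_user_name(10))  # "Ada" -- filled on first access
```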
Can you take inconsistent views?
Change your site!
Talk to your designers!
Deliberately introduce inconsistency!
Big Data isn't one thing
It depends on data type, size, and complexity
Focus on the current problems
Future problems don't matter if you never get there
Efficiency and iterating fast matters
The smaller you are, the more your time is worth
Good architecture affects product
You're not writing a system in a vacuum