Collect Metrics
‣ What parts of your system are growing?
‣ How fast is your data growing?
‣ How long will your current infrastructure last?
‣ What would cause your growth rate to increase?
Friday, 17 August, 12
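The "how long will your current infrastructure last?" question can be turned into rough arithmetic once you have a growth rate. The sketch below assumes compound daily growth; the function name, units, and sample numbers are hypothetical and not from the talk.

```python
import math


def days_until_full(current_gb, capacity_gb, daily_growth_rate):
    """Estimate days until usage reaches capacity under compound daily growth.

    daily_growth_rate is a fraction, e.g. 0.01 for 1% growth per day.
    """
    if current_gb <= 0:
        raise ValueError("current_gb must be positive")
    if current_gb >= capacity_gb:
        return 0.0  # already at or over capacity
    if daily_growth_rate <= 0:
        return math.inf  # no growth, so capacity is never reached
    # Solve current * (1 + r) ** days = capacity for days.
    return math.log(capacity_gb / current_gb) / math.log(1 + daily_growth_rate)


# Hypothetical numbers: 200 GB used, 1000 GB available, growing 1% per day.
runway = days_until_full(200, 1000, 0.01)
```

With those made-up inputs the runway comes out at a little over 160 days, which is the kind of number that tells you whether partitioning is urgent.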
Remove ForeignKeys

With a ForeignKey:

    from django.db import models

    class Post(models.Model):
        text = models.TextField()

    class Comment(models.Model):
        post = models.ForeignKey('Post')

With a plain integer ID:

    class Post(models.Model):
        text = models.TextField()

    class Comment(models.Model):
        # Store the raw ID; the Post may live in a different database.
        post_id = models.PositiveIntegerField()
Partitioning Isn’t Free
‣ No more ForeignKeys
‣ No more select_related()
‣ No more prefetch_related()
‣ More database calls*
‣ Lose the Django Admin
‣ No more cascading deletes
‣ TransactionalTestCase doesn’t roll back on secondary databases
* There are strategies to minimize database calls
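One strategy for minimizing database calls is to do by hand what prefetch_related() used to do: collect the IDs first, then fetch all related rows with a single IN query. The sketch below uses two in-memory SQLite databases as stand-ins for separate partitions; the table layout and sample rows are assumptions for illustration. In Django the equivalent lookup would be something like `Comment.objects.using(alias).filter(post_id__in=ids)`.

```python
import sqlite3
from collections import defaultdict

# Two in-memory databases stand in for separate partitions (an assumption
# for this sketch; in Django these would be two database aliases).
posts_db = sqlite3.connect(":memory:")
comments_db = sqlite3.connect(":memory:")

posts_db.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, text TEXT)")
comments_db.execute(
    "CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT)"
)
posts_db.executemany("INSERT INTO post VALUES (?, ?)", [(1, "first"), (2, "second")])
comments_db.executemany(
    "INSERT INTO comment VALUES (?, ?, ?)",
    [(1, 1, "nice"), (2, 1, "agreed"), (3, 2, "hmm")],
)


def comments_by_post(post_ids):
    """Fetch comments for many posts with one IN query, grouped by post_id."""
    post_ids = list(post_ids)
    if not post_ids:
        return {}
    placeholders = ",".join("?" for _ in post_ids)
    rows = comments_db.execute(
        "SELECT post_id, body FROM comment "
        "WHERE post_id IN (%s) ORDER BY id" % placeholders,
        post_ids,
    )
    grouped = defaultdict(list)
    for post_id, body in rows:
        grouped[post_id].append(body)
    return dict(grouped)


# Two queries in total (one per database) instead of one query per post.
posts = posts_db.execute("SELECT id, text FROM post ORDER BY id").fetchall()
grouped = comments_by_post(post_id for post_id, _ in posts)
```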
Partition Right Away...

The Good
• Easy
• No data migrations
• No refactoring

The Bad
• You don’t have metrics
• A lot of overhead
• Codebase gets ‘gross’ really quickly

The Ugly
• Might be a waste of time
• Efficiency decrease
• Shipping support code
• Not shipping features
• Premature optimization
When You Need It...

The Good
• You know what to partition
• You know usage patterns
• Familiar with the system

The Bad
• Scaling a live system
• Scaling a large system
• Multi-part migrations
• Massive amount of planning needed

The Ugly
• A lot of moving parts
• Often migration can only be in 1 direction
• Pressure (scaling because you NEED to)
• No other options
Other Strategies?
‣ In-app Replication
‣ Out-of-app Replication
‣ Backfill
‣ Epic Downtime
All valid. The best strategy depends on your app.
Picking a Sharding Key
‣ Usually the primary key of a major entity in your system
‣ Different for every system you try to scale
‣ Look past the `User` model
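Once a sharding key is chosen, something has to map it to a database. A minimal sketch, assuming a fixed list of shard aliases and a simple hash-modulo mapping; the function name and the choice of CRC32 are illustrative, not the talk's method.

```python
import zlib

# Database aliases, reusing the names that appear elsewhere in these slides.
SHARDS = ["posts_shard_01", "posts_shard_02"]


def shard_for(sharding_key, shards=SHARDS):
    """Map a sharding key to a database alias using a stable hash."""
    # zlib.crc32 is stable across processes, unlike Python's built-in hash().
    digest = zlib.crc32(str(sharding_key).encode("utf-8"))
    return shards[digest % len(shards)]


alias = shard_for(42)
```

Note that a plain modulo mapping reshuffles most keys when the number of shards changes, so it only suits setups where the shard count is fixed or where a lookup table takes over later.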
Denormalized Data
‣ Pre-process your data as it comes in rather than as it’s requested
‣ Perfect place to use NoSQL (while maintaining a canonical source of data)
‣ Perform query lookups in denormalized data rather than querying all the shards
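The idea can be sketched with plain dictionaries: the shards hold the canonical rows, and a separate index, written at save time, records where each row lives so a lookup touches one shard instead of all of them. The dictionaries stand in for real databases and a NoSQL store; the slug field and function names are hypothetical.

```python
# Canonical data, one dict per shard (stand-ins for real databases).
shards = {"posts_shard_01": {}, "posts_shard_02": {}}

# Denormalized lookup: slug -> (shard alias, post id). In practice this
# could live in a key-value store; a dict is used here for illustration.
slug_index = {}


def save_post(alias, post_id, slug, text):
    """Write the canonical row, then update the index as the data comes in."""
    shards[alias][post_id] = {"slug": slug, "text": text}
    slug_index[slug] = (alias, post_id)


def find_by_slug(slug):
    """Resolve a slug through the index, then read from a single shard."""
    location = slug_index.get(slug)
    if location is None:
        return None
    alias, post_id = location
    return shards[alias][post_id]


save_post("posts_shard_02", 7, "hello-world", "Hello!")
post = find_by_slug("hello-world")
```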
Stop Using Auto-increment for Primary Key IDs
‣ Auto-increment IDs collide across shards, so you can’t migrate data between them
‣ Use globally unique primary keys instead
‣ Encode meta information in the primary key
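One way to get globally unique keys that also carry meta information is to pack a timestamp, a shard number, and a per-shard sequence into a single 64-bit integer. The bit widths, the epoch value, and the function names below are assumptions chosen for this sketch, not a scheme given in the talk.

```python
import time

EPOCH_MS = 1345161600000  # arbitrary custom epoch in milliseconds (assumption)
TIMESTAMP_BITS, SHARD_BITS, SEQUENCE_BITS = 41, 13, 10  # 64 bits in total


def make_id(shard_id, sequence, now_ms=None):
    """Pack timestamp, shard ID, and sequence number into one integer."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id out of range")
    timestamp = (now_ms - EPOCH_MS) & ((1 << TIMESTAMP_BITS) - 1)
    seq = sequence % (1 << SEQUENCE_BITS)  # wrap the per-shard counter
    return (
        (timestamp << (SHARD_BITS + SEQUENCE_BITS))
        | (shard_id << SEQUENCE_BITS)
        | seq
    )


def shard_of(object_id):
    """Recover the shard ID encoded in a primary key."""
    return (object_id >> SEQUENCE_BITS) & ((1 << SHARD_BITS) - 1)


post_id = make_id(shard_id=5, sequence=1, now_ms=EPOCH_MS + 1000)
```

Because the shard number is part of the key, a read can call `shard_of(post_id)` to find the right database without a separate lookup.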
Sharding in the Code
QuerySet.using(...)

    posts1 = Post.objects.using('posts_shard_01').all()
    posts2 = Post.objects.using('posts_shard_02').all()
    ...

‣ Manually select database to use
‣ Pass in the ‘shard’ you want to access
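When a query has to cover every shard, the repeated using() calls can be wrapped in a small helper. This is a sketch only: the helper name is hypothetical, and it assumes `model` is a Django model class whose manager supports `using(alias)`.

```python
from itertools import chain


def from_all_shards(model, aliases):
    """Run the same query on each shard and concatenate the results.

    One database call is made per alias, so this is the expensive path;
    prefer a single-shard lookup when the sharding key is known.
    """
    return list(
        chain.from_iterable(model.objects.using(alias).all() for alias in aliases)
    )
```

A call might look like `from_all_shards(Post, ['posts_shard_01', 'posts_shard_02'])`.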
QuerySet + Routers
‣ Use your Django routers
‣ Automatically route data to the proper shard on write
‣ Still need to use QuerySet.using() on read
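A router is a plain class listed in the DATABASE_ROUTERS setting. On save, Django passes the object being written as the `instance` hint, which is what makes automatic routing on write possible; a fresh read query has no instance, which is why using() is still needed. The sketch below assumes a Comment model sharded by `post_id` with a modulo mapping; the class name, the `sharded_models` set, and the mapping are all hypothetical.

```python
SHARDS = ["posts_shard_01", "posts_shard_02"]


class ShardRouter(object):
    """Sketch of a router that picks a shard from the instance being saved."""

    # Hypothetical: lowercase names of the models that are sharded.
    sharded_models = {"comment"}

    def _is_sharded(self, model):
        return model.__name__.lower() in self.sharded_models

    def db_for_write(self, model, **hints):
        instance = hints.get("instance")
        if self._is_sharded(model) and instance is not None:
            # Assumed mapping: shard chosen by post_id modulo shard count.
            return SHARDS[instance.post_id % len(SHARDS)]
        return None  # no opinion: fall through to other routers or default

    def db_for_read(self, model, **hints):
        # A new query carries no instance, so the router cannot know the
        # shard; callers select it explicitly with QuerySet.using().
        return None
```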