Scaling with Django

Scaling the database of a Django application.

Ash Christopher

August 17, 2012

Transcript

  2. The Difference

    Performant:

        from django.http import HttpResponse

        def view(request):
            return HttpResponse('Hello World')

        ./manage.py run_gunicorn --workers=1

    Scalable:

        from time import sleep
        from django.http import HttpResponse

        def view(request):
            sleep(10)
            return HttpResponse('Hello World')

        ./manage.py run_gunicorn --workers=10

    Yes, I realize this is a contrived example.

    Friday, 17 August, 12
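
The contrast on this slide can be put into rough numbers. The sketch below assumes each gunicorn worker serves one request at a time and ignores every other overhead; `max_throughput` is a hypothetical helper, not part of Django or gunicorn.

```python
def max_throughput(workers, seconds_per_request):
    """Upper bound on requests per second if each worker serves one request at a time."""
    return workers / float(seconds_per_request)

# The sleeping view is not performant: every request still takes 10 seconds.
one_worker = max_throughput(workers=1, seconds_per_request=10)

# It is scalable, though: capacity grows in step with the number of workers.
ten_workers = max_throughput(workers=10, seconds_per_request=10)
```

Adding workers raises the ceiling on concurrent requests without making any single request faster, which is the distinction the slide draws.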
  3. Write Performance

    There is a limit to how fast data can be written to disk.
  4. Collect Metrics

    ‣ What parts of your system are growing?
    ‣ How fast is your data growing?
    ‣ How long will your current infrastructure last?
    ‣ What would cause your growth rate to increase?
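
One way to answer "how long will your current infrastructure last?" is a simple projection from collected metrics. This is a minimal sketch that assumes steady compound monthly growth; `months_until_full` and its parameters are hypothetical names, and real growth is rarely this regular.

```python
import math

def months_until_full(current_gb, capacity_gb, monthly_growth_rate):
    """Months until data growing at a compound monthly rate reaches capacity."""
    if current_gb >= capacity_gb:
        return 0
    if monthly_growth_rate <= 0:
        return math.inf  # not growing, so the limit is never reached
    months = math.log(capacity_gb / float(current_gb)) / math.log(1 + monthly_growth_rate)
    return int(math.ceil(months))

# Example with made-up numbers: 100 GB today, 200 GB of room, 10% growth per month.
runway = months_until_full(current_gb=100, capacity_gb=200, monthly_growth_rate=0.10)
```

Re-running the projection with a higher growth rate gives a quick feel for the last question on the slide: what happens if growth accelerates.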
  5. You Will Hit a Limit

    Limits of MySQL:

    ‣ 32 CPU
    ‣ 256GB RAM
  6. More Databases

        import dj_database_url

        # config() reads the DATABASE_URL environment variable unless env=... is
        # passed; the default= URL is used only when that variable is unset.
        DATABASES = {
            'default': dj_database_url.config(default='mysql://localhost/default'),
            'posts': dj_database_url.config(default='mysql://localhost/posts'),
            'comments': dj_database_url.config(default='mysql://localhost/comments'),
            # ...
        }

    Recommend using: dj-database-url
  10. Django Routers

        class SimpleRouter(object):

            def db_for_read(self, model, **hints):
                # return database or None
                pass

            def db_for_write(self, model, **hints):
                # return database or None
                pass

            def allow_relation(self, obj1, obj2, **hints):
                # return True, False or None
                pass

            def allow_syncdb(self, db, model):
                # return True, False or None
                # (renamed allow_migrate in Django 1.7)
                pass

    ‣ Split data to different databases
    ‣ Routing happens automatically
    ‣ Easy to stub in
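
To make the router skeleton concrete, here is a minimal sketch that routes by app label. The `APP_TO_DB` mapping, the class name `PartitionRouter`, and the stand-in model objects are assumptions for illustration; only the four method names come from the slide.

```python
from types import SimpleNamespace

class PartitionRouter(object):
    """Route each model to a database alias based on its app label."""

    # Hypothetical mapping of app label -> alias defined in DATABASES.
    APP_TO_DB = {'posts': 'posts', 'comments': 'comments'}

    def _alias(self, obj):
        return self.APP_TO_DB.get(obj._meta.app_label, 'default')

    def db_for_read(self, model, **hints):
        # None means "no opinion"; Django then falls back to its default choice.
        return self.APP_TO_DB.get(model._meta.app_label)

    def db_for_write(self, model, **hints):
        return self.APP_TO_DB.get(model._meta.app_label)

    def allow_relation(self, obj1, obj2, **hints):
        # Only allow relations between objects stored in the same database.
        return self._alias(obj1) == self._alias(obj2)

    def allow_syncdb(self, db, model):
        # Create each model's tables only in the database it is routed to.
        return db == self._alias(model)

# Stand-ins exposing just the attribute the router inspects (not real Django models).
FakePost = SimpleNamespace(_meta=SimpleNamespace(app_label='posts'))
FakeUser = SimpleNamespace(_meta=SimpleNamespace(app_label='auth'))

router = PartitionRouter()
post_db = router.db_for_read(FakePost)
```

In a real project the router would be listed in the `DATABASE_ROUTERS` setting, after which the ORM consults it without further changes to view code.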
  11. Remove ForeignKeys

    Before:

        from django.db import models

        class Post(models.Model):
            text = models.TextField()

        class Comment(models.Model):
            post = models.ForeignKey('Post')

    After:

        class Post(models.Model):
            text = models.TextField()

        class Comment(models.Model):
            post_id = models.PositiveIntegerField()
  12. Partitioning Isn’t Free

    ‣ No more ForeignKeys
    ‣ No more select_related()
    ‣ No more prefetch_related()
    ‣ More database calls*
    ‣ Lose the Django Admin
    ‣ No more cascading deletes
    ‣ TransactionalTestCase doesn’t rollback on secondary databases

    * there are strategies to minimize database calls
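
One possible strategy for keeping the extra database calls down, once `select_related()` is gone, is to collect the ids first and fetch the related rows in a single bulk lookup. The sketch below uses plain dicts in place of model instances so it runs on its own; `attach_posts` and `fake_fetch` are hypothetical names. With Django, the fetch step could be something like `Post.objects.using('posts').in_bulk(ids)`.

```python
def attach_posts(comments, fetch_posts_by_ids):
    """Attach each comment's post using one bulk lookup instead of one query per comment."""
    post_ids = set(comment['post_id'] for comment in comments)
    posts = fetch_posts_by_ids(post_ids)  # a single call to the posts database
    for comment in comments:
        comment['post'] = posts.get(comment['post_id'])
    return comments

# Stand-in for the posts database, recording how often it is queried.
POSTS = {1: {'text': 'first post'}, 2: {'text': 'second post'}}
calls = []

def fake_fetch(ids):
    calls.append(sorted(ids))
    return dict((i, POSTS[i]) for i in ids if i in POSTS)

comments = attach_posts([{'post_id': 1}, {'post_id': 1}, {'post_id': 2}], fake_fetch)
```

Three comments are resolved with one lookup rather than three, which is roughly what `prefetch_related()` did before the data was partitioned.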
  14. Partition Right Away...

    The Good
    • Easy
    • No data migrations
    • No refactoring

    The Bad
    • You don’t have metrics
    • A lot of overhead
    • Codebase gets ‘gross’ really quickly

    The Ugly
    • Might be a waste of time
    • Efficiency decrease
    • Shipping support code
    • Not shipping features
    • Premature optimization
  17. When You Need It...

    The Good
    • You know what to partition
    • You know usage patterns
    • Familiar with the system

    The Bad
    • Scaling a live system
    • Scaling a large system
    • Multi-part migrations
    • Massive amount of planning needed

    The Ugly
    • A lot of moving parts
    • Often migration can only be in 1 direction
    • Pressure (scaling because you NEED to)
    • No other options
  18. “replacing all components of a car while driving it at 100mph”

    Mike Krieger - Instagram
  19. Other Strategies?

    ‣ In-app Replication
    ‣ Out of app Replication
    ‣ Backfill
    ‣ Epic Downtime

    All valid - the best strategy depends on your app
  20. Data Split Across Many Databases

    users_shard_01     users_shard_02     users_shard_03     users_shard_04     ...  users_shard_n
    posts_shard_01     posts_shard_02     posts_shard_03     posts_shard_04     ...  posts_shard_n
    comments_shard_01  comments_shard_02  comments_shard_03  comments_shard_04  ...  comments_shard_n
  21. Shards are just Databases

        import dj_database_url

        DATABASES = {
            'default': dj_database_url.config(default='mysql://localhost/default'),
            'post_shard_01': dj_database_url.config(default='mysql://localhost/post_shard_01'),
            'post_shard_02': dj_database_url.config(default='mysql://localhost/post_shard_02'),
            'post_shard_03': dj_database_url.config(default='mysql://localhost/post_shard_03'),
            # ...
        }

    Recommend using: dj-database-url
  22. Picking a Sharding Key

    ‣ Usually the primary key of a major entity in your system
    ‣ Different for every system you try to scale
    ‣ Look past the `User` model
  23. Denormalized Data

    ‣ Pre-process your data as it comes in rather than as it’s requested
    ‣ Perfect place to use NoSQL (while maintaining a canonical source of data)
    ‣ Perform query lookups in denormalized data rather than querying all the shards
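
A small sketch of the idea in the bullets above: record where each item lives at write time, so a read becomes one lookup instead of a query against every shard. The in-memory dict stands in for whatever key-value store is used, and `record_post` and `locate_posts` are hypothetical names; the sharded databases remain the canonical source of data.

```python
# In-memory stand-in for a key-value store holding the denormalized index.
posts_by_author = {}

def record_post(author_id, post_id, shard):
    """Write path: remember which shard holds each of an author's posts."""
    posts_by_author.setdefault(author_id, []).append((shard, post_id))

def locate_posts(author_id):
    """Read path: one lookup instead of a query against every shard."""
    return posts_by_author.get(author_id, [])

record_post(author_id=42, post_id=1001, shard='posts_shard_01')
record_post(author_id=42, post_id=1002, shard='posts_shard_03')
locations = locate_posts(42)
```

With the locations in hand, the application only needs to query the shards that actually hold matching rows.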
  24. Stop Using Auto-increment for Primary Key IDs

    ‣ Can’t migrate data between shards
    ‣ Globally unique primary keys
    ‣ Encode meta information in primary key
  25. Generating IDs

    ‣ Single auto-incremented field in `default` database
    ‣ External software (e.g. Twitter Snowflake)
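
As an illustration of globally unique keys that encode meta information, the sketch below packs a timestamp, a shard number, and a per-millisecond sequence into one integer. The bit widths loosely follow the layout Snowflake is known for (10 bits for the machine, 12 for the sequence), but this is not Snowflake's implementation; the epoch, the function names, and the choice to store a shard number are assumptions, and the caller is expected to supply a sequence that is unique within the millisecond.

```python
import time

EPOCH_MS = 1345161600000  # arbitrary custom epoch in milliseconds (hypothetical)
SHARD_BITS = 10           # room for 1024 shards
SEQUENCE_BITS = 12        # room for 4096 ids per shard per millisecond

def make_id(shard_id, sequence, now_ms=None):
    """Pack timestamp, shard and sequence into a single sortable integer."""
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError('shard_id out of range')
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError('sequence out of range')
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    timestamp = now_ms - EPOCH_MS
    return (timestamp << (SHARD_BITS + SEQUENCE_BITS)) | (shard_id << SEQUENCE_BITS) | sequence

def shard_of(id_value):
    """Recover the shard number that was encoded into an id."""
    return (id_value >> SEQUENCE_BITS) & ((1 << SHARD_BITS) - 1)

example_id = make_id(shard_id=3, sequence=7, now_ms=EPOCH_MS + 1000)
```

Because the shard number can be read back from the id itself, a row can be located, or moved to another shard, without its key colliding with ids generated elsewhere.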
  26. Sharding in the Code

        posts1 = Post.objects.using('posts_shard_01').all()
        posts2 = Post.objects.using('posts_shard_02').all()
        ...

    QuerySet.using(...)

    ‣ Manually select database to use
    ‣ Pass in the ‘shard’ you want to access
  27. QuerySet + Routers

    ‣ Use your Django routers
    ‣ Automatically route data to the proper shard on write
    ‣ Still need to use QuerySet.using() on read
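
A minimal sketch of how the two pieces might fit together, assuming the sharding key is a `user_id` attribute on the instance and that shards are chosen by simple modulo. `NUM_SHARDS`, `shard_alias`, and `ShardRouter` are hypothetical names. On save, Django passes the object being written to `db_for_write` as the `instance` hint, which is what makes automatic routing on write possible.

```python
from types import SimpleNamespace

NUM_SHARDS = 4  # hypothetical number of post shards

def shard_alias(user_id):
    """Map a sharding key to a database alias such as 'posts_shard_03'."""
    return 'posts_shard_%02d' % (user_id % NUM_SHARDS + 1)

class ShardRouter(object):
    def db_for_write(self, model, **hints):
        # The instance hint carries the sharding key on save.
        instance = hints.get('instance')
        user_id = getattr(instance, 'user_id', None)
        if user_id is None:
            return None  # no sharding key available: leave the choice to Django
        return shard_alias(user_id)

    def db_for_read(self, model, **hints):
        # A plain query carries no sharding key, so reads pick the shard explicitly.
        return None

# Stand-in for a model instance about to be saved (not a real Django model).
new_post = SimpleNamespace(user_id=6)
write_db = ShardRouter().db_for_write(None, instance=new_post)
```

Reads would then select the shard by hand, for example `Post.objects.using(shard_alias(user_id)).filter(user_id=user_id)`, matching the last bullet on the slide.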