
Scaling with Django

Scaling the database of a Django application.

Ash Christopher

August 17, 2012

Transcript

  1. SCALE (and how to do it with Django) @ashchristopher ash.christopher@gmail.com

    Friday, 17 August, 12
  2. Scalability (What is it?)
  3. Scalability != Performance
  4. The Difference
  5-7. The Difference
      Performant:
          from django.http import HttpResponse

          def view(request):
              return HttpResponse('Hello World')

          ./manage.py run_gunicorn --workers=1

      Scalable:
          from time import sleep

          from django.http import HttpResponse

          def view(request):
              sleep(10)
              return HttpResponse('Hello World')

          ./manage.py run_gunicorn --workers=10

      Yes, I realize this is a contrived example
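Here is a back-of-the-envelope reading of slides 5-7. It assumes each gunicorn sync worker handles exactly one request at a time and ignores all overhead. Latency (performance) caps what a single worker can do, while adding workers (scalability) raises total capacity. The latency figures below are illustrative, not measurements.

```python
import math


def max_throughput(workers: int, latency_s: float) -> float:
    """Upper bound on requests/second if each worker serves one request at a time."""
    return workers / latency_s


def workers_needed(target_rps: float, latency_s: float) -> int:
    """Workers required to sustain target_rps at a given per-request latency."""
    return math.ceil(target_rps * latency_s)


# "Performant" view: assume ~1 ms per request on a single worker.
print(max_throughput(1, 0.001))   # about 1000 requests/second
# "Scalable" view: 10 s per request, but capacity grows with every worker added.
print(max_throughput(10, 10.0))   # 1.0 request/second
print(max_throughput(20, 10.0))   # 2.0 requests/second
```

The slow view never gets faster, but its capacity grows linearly with workers. That growth is the property the deck calls scalable.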
  8. What to scale... Application, Database
  9. Focus on the Database (because it’s non-trivial)
  10. Two main reasons to scale your database
  11. Write Performance: There is a limit to how fast data can be written to disk
  12-13. Massive Volumes of Data: A database can only store so much...
  14. Before you scale, you need to know what to scale
  15. Collect Metrics
      ‣ What parts of your system are growing?
      ‣ How fast is your data growing?
      ‣ How long will your current infrastructure last?
      ‣ What would cause your growth rate to increase?
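Once you have a growth metric, you can turn "How long will your current infrastructure last?" into a rough estimate. This is a minimal sketch. It assumes storage grows at a steady compound monthly rate, which real growth rarely does, and the figures are hypothetical.

```python
import math


def months_until_full(current_gb: float, capacity_gb: float, monthly_growth: float) -> float:
    """Months until storage hits capacity, assuming compound monthly growth."""
    if current_gb >= capacity_gb:
        return 0.0
    if monthly_growth <= 0:
        return math.inf  # not growing: the current infrastructure never fills up
    return math.log(capacity_gb / current_gb) / math.log(1 + monthly_growth)


# Hypothetical numbers: 200 GB today, 1 TB usable, growing 10% per month.
print(round(months_until_full(200, 1000, 0.10), 1))  # 16.9
```

Re-running the estimate as the measured growth rate changes answers the last question on the slide: what would shorten that runway.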
  16. Analyze Data Models (Analyze Relationships between Models)
      [diagram: models arranged from "Easier to Scale" to "Harder to Scale"]
  17. You’re Ready to Start Scaling!
  18. Scale Up Before Out
  19-21. Invest in Faster Hardware
      ‣ More RAM
      ‣ SSD Hard Drives
  22. Outrun The Problem
  23. You Will Hit a Limit
      Limits of MySQL: 32 CPUs, 256GB RAM
  24. Functional Partitioning (aka. Feature Partitioning) (aka. Vertical Partitioning) (...)
  25. Functional Partitioning
      [diagram: the Internet; separate Users, Posts and Comments databases]
  26. More Databases
          import dj_database_url

          DATABASES = {
              'default': dj_database_url.config(default='mysql://localhost/default'),
              # env= gives each alias its own variable; without it every entry would
              # read DATABASE_URL (the variable names here are hypothetical)
              'posts': dj_database_url.config(env='POSTS_DATABASE_URL', default='mysql://localhost/posts'),
              'comments': dj_database_url.config(env='COMMENTS_DATABASE_URL', default='mysql://localhost/comments'),
              # ...
          }
      Recommend using: dj-database-url
  27. Django Routers
  28-31. Django Routers
          class SimpleRouter(object):
              def db_for_read(self, model, **hints):
                  return None  # return a database alias, or None for no opinion

              def db_for_write(self, model, **hints):
                  return None  # return a database alias, or None for no opinion

              def allow_relation(self, obj1, obj2, **hints):
                  return None  # return True, False or None

              def allow_syncdb(self, db, model):
                  # Django 1.7+ replaces this hook with allow_migrate()
                  return None  # return True, False or None
      ‣ Split data across different databases
      ‣ Routing happens automatically
      ‣ Easy to stub in
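For functional partitioning, a concrete version of the router skeleton might map each Django app to its own database alias. This is a sketch, not the author's router. The `posts`/`comments` app labels and aliases are hypothetical, and the migration hook uses the Django 1.8+ `allow_migrate()` signature, which replaced `allow_syncdb()`. Because the router always answers `allow_relation()` and `allow_migrate()`, it assumes it is the only router configured.

```python
class FeaturePartitionRouter:
    """Sketch of a functional-partitioning router: one database alias per app."""

    # Hypothetical app_label -> DATABASES alias mapping; anything else uses 'default'.
    route_app_labels = {"posts": "posts", "comments": "comments"}

    def _alias(self, app_label):
        return self.route_app_labels.get(app_label, "default")

    def db_for_read(self, model, **hints):
        # None means "no opinion"; Django then falls back to 'default'.
        return self.route_app_labels.get(model._meta.app_label)

    def db_for_write(self, model, **hints):
        return self.route_app_labels.get(model._meta.app_label)

    def allow_relation(self, obj1, obj2, **hints):
        # Only allow relations (e.g. ForeignKeys) between objects in the same database.
        return self._alias(obj1._meta.app_label) == self._alias(obj2._meta.app_label)

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Create each app's tables only in the database it is routed to.
        return db == self._alias(app_label)
```

You would enable it with something like `DATABASE_ROUTERS = ['myproject.routers.FeaturePartitionRouter']`; the module path is hypothetical.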
  32. Remove ForeignKeys
      Before:
          from django.db import models

          class Post(models.Model):
              text = models.TextField()

          class Comment(models.Model):
              post = models.ForeignKey('Post', on_delete=models.CASCADE)

      After:
          class Post(models.Model):
              text = models.TextField()

          class Comment(models.Model):
              post_id = models.PositiveIntegerField()
  33. Partitioning Isn’t Free
      ‣ No more ForeignKeys
      ‣ No more select_related()
      ‣ No more prefetch_related()
      ‣ More database calls*
      ‣ Lose the Django Admin
      ‣ No more cascading deletes
      ‣ TransactionTestCase doesn’t roll back on secondary databases
      * there are strategies to minimize database calls
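One common way to limit the extra database calls is to do the join in the application. The slide doesn't say which strategies the author had in mind, so this is an assumption. Collect the foreign IDs, fetch them in one batched query against the other database, and stitch the results together. That is roughly what `prefetch_related()` does for you on a single database. Plain dicts stand in for rows here, and `fetch_posts_by_ids` is a hypothetical callable.

```python
from typing import Callable, Dict, Iterable, List


def attach_posts(
    comments: List[dict],
    fetch_posts_by_ids: Callable[[Iterable[int]], Dict[int, dict]],
) -> List[dict]:
    """Attach each comment's post using ONE batched lookup instead of one per comment."""
    post_ids = {c["post_id"] for c in comments}          # de-duplicate the keys
    posts = fetch_posts_by_ids(post_ids) if post_ids else {}
    # Missing posts (e.g. deleted without a cascade) come back as None.
    return [{**c, "post": posts.get(c["post_id"])} for c in comments]
```

With Django, `fetch_posts_by_ids` could be something like `lambda ids: Post.objects.using('posts').in_bulk(list(ids))`, since `QuerySet.in_bulk()` returns a dict keyed by primary key. The comments would be fetched as dicts via `.values()`, and the `'posts'` alias is hypothetical.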
  34. Treat Databases as Lookup Tables
  35. When Should You Feature Partition?
  36-38. Partition Right Away...
      The Good
        • Easy
        • No data migrations
        • No refactoring
      The Bad
        • You don’t have metrics
        • A lot of overhead
        • Codebase gets ‘gross’ really quickly
      The Ugly
        • Might be a waste of time
        • Efficiency decrease
        • Shipping support code
        • Not shipping features
        • Premature optimization
  39-41. When You Need It...
      The Good
        • You know what to partition
        • You know usage patterns
        • Familiar with the system
      The Bad
        • Scaling a live system
        • Scaling a large system
        • Multi-part migrations
        • Massive amount of planning needed
      The Ugly
        • A lot of moving parts
        • Often migration can only go in 1 direction
        • Pressure (scaling because you NEED to)
        • No other options
  42. “replacing all components of a car while driving it at 100mph” (Mike Krieger, Instagram)
  43. Our Strategy
  44. Other Strategies?
      ‣ In-app Replication
      ‣ Out-of-app Replication
      ‣ Backfill
      ‣ Epic Downtime
      All valid: the best strategy depends on your app
  45. Phew...
  46. It’s Not Over Yet
  47. Horizontal Partitioning (aka. Sharding)
  48. Data Split Across Many Databases
      users_shard_01, users_shard_02, users_shard_03, users_shard_04, ... users_shard_n
      posts_shard_01, posts_shard_02, posts_shard_03, posts_shard_04, ... posts_shard_n
      comments_shard_01, comments_shard_02, comments_shard_03, comments_shard_04, ... comments_shard_n
  49. Shards are just Databases
          import dj_database_url

          DATABASES = {
              'default': dj_database_url.config(default='mysql://localhost/default'),
              # aliases follow the posts_shard_NN naming used on the other slides;
              # the env variable names are hypothetical
              'posts_shard_01': dj_database_url.config(env='POSTS_SHARD_01_URL', default='mysql://localhost/posts_shard_01'),
              'posts_shard_02': dj_database_url.config(env='POSTS_SHARD_02_URL', default='mysql://localhost/posts_shard_02'),
              'posts_shard_03': dj_database_url.config(env='POSTS_SHARD_03_URL', default='mysql://localhost/posts_shard_03'),
              # ...
          }
      Recommend using: dj-database-url
  50. Picking a Sharding Key
      ‣ Usually the primary key of a major entity in your system
      ‣ Different for every system you try to scale
      ‣ Look past the `User` model
      [diagram labelled "Sharding key"]
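Once a sharding key is picked, something has to turn it into one of the shard aliases from the `DATABASES` setting. This is a minimal sketch using plain modulo arithmetic. It assumes integer keys and four shards; both are hypothetical, and the deck does not say which mapping it used.

```python
def shard_alias(sharding_key: int, prefix: str = "posts_shard", num_shards: int = 4) -> str:
    """Map a sharding key to a DATABASES alias such as 'posts_shard_01'."""
    if sharding_key < 0:
        raise ValueError("sharding key must be non-negative")
    index = sharding_key % num_shards + 1   # aliases in the deck are numbered from 01
    return f"{prefix}_{index:02d}"


print(shard_alias(42))                  # posts_shard_03
print(shard_alias(42, "users_shard"))   # users_shard_03
```

Modulo is the simplest choice, but changing `num_shards` remaps most keys, so data would have to move. That is one reason schemes with many small logical shards or a key-to-shard lookup table are often preferred.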
  51. Pick the Wrong Sharding Key?
      [diagram with labels "..." and "Query"]
  52. (no text)
  53. Denormalized Data
      ‣ Pre-process your data as it comes in rather than as it’s requested
      ‣ Perfect place to use NoSQL (while maintaining a canonical source of data)
      ‣ Perform query lookups in denormalized data rather than querying all the shards
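To make "perform lookups in denormalized data" concrete, suppose posts are sharded by author, but you also need "all posts with tag X". Without extra structure, that query would hit every shard. A write-time index can instead record where each post lives as it is written. This sketch uses an in-memory dict as a stand-in for a NoSQL store; the class and method names are hypothetical. In practice the index would live in an external store, and the shards would stay the canonical data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class TagIndex:
    """Write-time denormalized index: tag -> where the matching posts live."""

    def __init__(self) -> None:
        # tag -> [(shard_alias, post_id), ...]; a dict standing in for a NoSQL store
        self._by_tag: Dict[str, List[Tuple[str, int]]] = defaultdict(list)

    def on_post_saved(self, shard_alias: str, post_id: int, tags: Iterable[str]) -> None:
        # Called on the write path, after the canonical row is saved on its shard.
        for tag in tags:
            self._by_tag[tag].append((shard_alias, post_id))

    def lookup(self, tag: str) -> List[Tuple[str, int]]:
        # Read path: one index lookup tells you exactly which shards to query.
        return list(self._by_tag.get(tag, []))
```

Reads then go only to the shards listed for a tag. The cost moves to write time, and if the index is ever lost it can be rebuilt from the canonical shards.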
  54. Stop Using Auto-increment for Primary Key IDs
      ‣ Each shard counts from 1, so IDs collide and you can’t migrate data between shards
      Instead:
      ‣ Globally unique primary keys
      ‣ Encode meta information in primary key
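Encoding meta information in the key means an ID alone can tell you which shard holds the row and roughly when it was created, while keys still sort by time. Here is a sketch of one such layout, modelled on the scheme Instagram described publicly: 41 bits of milliseconds, 13 bits of logical shard ID, and 10 bits of sequence. The epoch is an arbitrary assumption. A real generator also has to keep each shard's sequence counter safe under concurrency, and this sketch leaves that to the caller.

```python
import time
from typing import Optional, Tuple

# Hypothetical 64-bit layout, modelled on the one Instagram described publicly:
# | 41 bits: ms since CUSTOM_EPOCH_MS | 13 bits: logical shard id | 10 bits: sequence |
CUSTOM_EPOCH_MS = 1_293_840_000_000   # 2011-01-01T00:00:00Z (arbitrary choice)
TIME_BITS, SHARD_BITS, SEQUENCE_BITS = 41, 13, 10


def make_id(shard_id: int, sequence: int, now_ms: Optional[int] = None) -> int:
    """Pack creation time, shard id and a per-shard sequence number into one integer."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    elapsed = now_ms - CUSTOM_EPOCH_MS
    if not 0 <= elapsed < (1 << TIME_BITS):
        raise ValueError("timestamp outside the representable range")
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id does not fit in 13 bits")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence does not fit in 10 bits")
    return (elapsed << (SHARD_BITS + SEQUENCE_BITS)) | (shard_id << SEQUENCE_BITS) | sequence


def parse_id(id_: int) -> Tuple[int, int, int]:
    """Recover (timestamp_ms, shard_id, sequence) from an ID built by make_id()."""
    sequence = id_ & ((1 << SEQUENCE_BITS) - 1)
    shard_id = (id_ >> SEQUENCE_BITS) & ((1 << SHARD_BITS) - 1)
    timestamp_ms = (id_ >> (SHARD_BITS + SEQUENCE_BITS)) + CUSTOM_EPOCH_MS
    return timestamp_ms, shard_id, sequence
```

The ID encodes a logical shard rather than a physical database. Logical shards can therefore be moved between database servers later without rewriting any keys.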
  55. Generating IDs
      ‣ Single auto-incremented field in `default` database
      ‣ External software (e.g. Twitter Snowflake)
  56. Sharding in the Code
          posts1 = Post.objects.using('posts_shard_01').all()
          posts2 = Post.objects.using('posts_shard_02').all()
          ...
      ‣ Manually select database to use
      ‣ Pass in the ‘shard’ you want to access: QuerySet.using(...)
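When a query can't be answered from one shard, the fallback is scatter-gather: run the query on every shard and merge the results. This sketch covers the common "latest N rows across all shards" case. It assumes each shard can return its own rows already sorted newest-first. `run_query` is a hypothetical callable, and plain dicts stand in for model instances. Scatter-gather costs one query per shard, which is what denormalized lookups are meant to avoid.

```python
import heapq
from itertools import islice
from typing import Any, Callable, Iterable, List


def top_n_across_shards(
    aliases: Iterable[str],
    run_query: Callable[[str, int], List[Any]],
    n: int,
    key: Callable[[Any], Any],
) -> List[Any]:
    """Fetch the top n rows (largest `key` first) from every shard and merge them."""
    # One query per shard; each must return at most n rows sorted by key, descending.
    per_shard = [run_query(alias, n) for alias in aliases]
    # heapq.merge lazily merges the already-sorted lists; reverse=True for descending.
    merged = heapq.merge(*per_shard, key=key, reverse=True)
    return list(islice(merged, n))
```

In Django, `run_query` might be `lambda alias, n: list(Post.objects.using(alias).order_by('-created')[:n])` with `key=lambda p: p.created`, where `created` is a hypothetical timestamp field.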
  57. QuerySet + Routers
      ‣ Use your Django routers
      ‣ Automatically route data to the proper shard on write
      ‣ Still need to use QuerySet.using() on read
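This sketch covers the write side: a router that picks the shard from the object being saved. It assumes posts are sharded by a hypothetical `user_id` field across four hypothetical shards. Django passes the object being saved as the `instance` hint when you call `save()`. When no instance hint is available (some write paths don't provide one), the sketch returns None and Django falls back to its default routing. Reads deliberately return None, matching the slide: you still choose the shard with `QuerySet.using()`.

```python
NUM_SHARDS = 4  # hypothetical shard count


class PostShardRouter:
    """Sketch: route writes of `posts` models to a shard chosen from the instance."""

    def db_for_write(self, model, **hints):
        instance = hints.get("instance")
        if model._meta.app_label != "posts" or instance is None:
            return None  # no opinion: let other routers / the default decide
        # Hypothetical sharding key: the post author's user_id.
        index = instance.user_id % NUM_SHARDS + 1
        return f"posts_shard_{index:02d}"

    def db_for_read(self, model, **hints):
        return None  # reads still pick a shard explicitly with QuerySet.using()
```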
  58. Links
      ‣ dj-database-url: https://github.com/kennethreitz/dj-database-url
      ‣ django-multidb-patterns: https://github.com/malcolmt/django-multidb-patterns
      ‣ High Scalability: http://highscalability.com
  59. @ashchristopher ash.christopher@gmail.com