django in the real world

Israel Fermín Montilla
November 16, 2018

Django is an extremely popular web framework written in Python. It has been used to build all kinds of web applications, from blogs and news sites to web APIs consumed by mobile apps. There is a huge variety of django-based web apps out there at every scale, from small personal sites to huge, complex systems like Disqus or Instagram.

It's true that moving towards a microservices-oriented architecture makes some of the scalability issues go away, but it comes at a high cost: the added complexity of running a distributed system.

In this talk, I go through a series of recommendations for tweaking and optimizing your django projects when you need to scale. These are practices you should try before breaking everything into microservices, starting with the database, moving on to caching, and finishing with a few techniques to optimize the template layer.

Transcript

  1. django in the real world
     yes! it scales!... YAY!
     Israel Fermín Montilla, Tech Lead @ Careem
     November 16, 2018
  2. from iferminm import more_data
     ▶ Tech Lead @ Careem
     ▶ Venezuelan living in Dubai, UAE
     ▶ T: @iferminm
     ▶ blog: http://iffm.me
  3. What will we see in this talk?
     ▶ Pareto Principle
     ▶ The simple django project
     ▶ Measuring
     ▶ Common bottlenecks
  6. Basic concepts: Pareto principle
     The Pareto principle states that, for many events, roughly 80% of the effects come from 20% of the causes. –Wikipedia
     For example: 20% of the code produces 80% of the bugs.
  7. Initial django project in production
     Figure: Basic django project production setup
  8. Profile first
  9. django-debug-toolbar
     Figure: debug_toolbar in action
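Here is a minimal development-only settings sketch for enabling the toolbar. It assumes django-debug-toolbar is installed and that the rest of the settings match a default project. The root URLconf also needs `path('__debug__/', include('debug_toolbar.urls'))`, which is not shown here.

```python
# settings.py (development only): a sketch, not a complete settings module.
DEBUG = True

INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",  # the toolbar relies on staticfiles
    "debug_toolbar",
]

MIDDLEWARE = [
    # As early as possible, but after any middleware that encodes the
    # response content (e.g. GZipMiddleware, not used here).
    "debug_toolbar.middleware.DebugToolbarMiddleware",
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django.contrib.messages.middleware.MessageMiddleware",
    "django.middleware.clickjacking.XFrameOptionsMiddleware",
]

# The toolbar is only rendered for requests coming from these addresses.
INTERNAL_IPS = ["127.0.0.1"]
```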
  10. cProfile + snakeviz
      Figure: snakeviz list view
  11. cProfile + snakeviz
      Figure: snakeviz sunburst diagram
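The cProfile + snakeviz workflow needs only the standard library to collect data. Profile a call, save the stats to a file, then open that file with `snakeviz view.prof`. In this sketch, `slow_view`, `build_report` and the `view.prof` filename are hypothetical stand-ins for the code you actually want to profile.

```python
import cProfile
import pstats


def build_report(n):
    # Hypothetical hot spot: builds one long string from n numbers.
    out = ""
    for i in range(n):
        out += str(i)
    return out


def slow_view():
    # Hypothetical stand-in for the body of a django view.
    return len(build_report(20_000))


profiler = cProfile.Profile()
profiler.enable()
result = slow_view()
profiler.disable()

# Save the raw stats; `snakeviz view.prof` renders them as interactive charts.
profiler.dump_stats("view.prof")

# Quick look in the terminal: the 5 most expensive calls by cumulative time.
stats = pstats.Stats("view.prof")
stats.sort_stats("cumulative").print_stats(5)
```

The same `.prof` file can be shared with teammates or compared before and after an optimization.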
  12. vprof
      Figure: vprof code heatmap
  13. vprof
      Figure: vprof flame diagram
  14. vprof
      Figure: vprof memory profiler
  15. vprof
      Figure: vprof profiler
  16. newrelic
      Figure: Part of newrelic’s main dashboard
  17. newrelic
      Figure: Part of newrelic’s main dashboard
  18. newrelic
      Figure: Inside a web transaction in newrelic
  19. Database
  23. Reduce query counts
      ```python
      from collections import Counter

      customers = Counter()
      trips = Trip.objects.filter(
          captain_id=captain_id
      )
      for r in trips:
          customers[r.customer.name] += 1
      ```
      N + 1 hits to the database: one for the trips, plus one per trip to load its customer
      ```python
      trips = Trip.objects.filter(
          captain_id=captain_id
      ).select_related('customer')
      ```
      Joins the customer table and returns everything in one hit
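This sketch shows why the first loop costs N + 1 queries while `select_related` costs one. It uses the standard library's sqlite3 instead of the ORM and counts the SQL statements each approach sends. The `customer`/`trip` schema and the sample rows are hypothetical, loosely mirroring the slide's models.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE trip (
        id INTEGER PRIMARY KEY,
        captain_id INTEGER,
        customer_id INTEGER REFERENCES customer(id)
    );
    INSERT INTO customer (id, name) VALUES (1, 'Ana'), (2, 'Omar');
    INSERT INTO trip (captain_id, customer_id)
        VALUES (7, 1), (7, 2), (7, 1), (7, 2), (7, 1);
""")

# Record every statement SQLite executes so the two approaches can be compared.
statements = []
conn.set_trace_callback(statements.append)


def naive(captain_id):
    # One query for the trips, then one more per trip for its customer (N + 1).
    customers = Counter()
    trips = conn.execute(
        "SELECT customer_id FROM trip WHERE captain_id = ?", (captain_id,)
    ).fetchall()
    for (customer_id,) in trips:
        (name,) = conn.execute(
            "SELECT name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        customers[name] += 1
    return customers


def joined(captain_id):
    # The shape of select_related('customer'): a single query with a JOIN.
    customers = Counter()
    rows = conn.execute(
        "SELECT c.name FROM trip t JOIN customer c ON c.id = t.customer_id "
        "WHERE t.captain_id = ?",
        (captain_id,),
    ).fetchall()
    for (name,) in rows:
        customers[name] += 1
    return customers


statements.clear()
naive_result = naive(7)
naive_queries = len(statements)

statements.clear()
joined_result = joined(7)
joined_queries = len(statements)

print(f"naive: {naive_queries} queries, joined: {joined_queries} queries")
```

In a django test suite, you can pin the same property with `self.assertNumQueries(...)` or `django.test.utils.CaptureQueriesContext`, so a regression back to N + 1 fails the build.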
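  24. Reduce query counts
      ▶ select_related (follows foreign keys and one-to-one relations with a SQL JOIN)
      ▶ prefetch_related (runs a separate query per relation and joins the results in Python; works for many-to-many and reverse foreign keys)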
  26. Reduce query counts
      ▶ Use it wisely and measure
      ```python
      # trips is a reverse relation: select_related() only follows forward
      # foreign keys and one-to-one fields, so prefetch_related() is needed
      customer = Customer.objects.prefetch_related(
          'trips'
      ).get(pk=request.data['user_id'])

      # No additional query
      customer.trips.all()
      ```
      ```python
      # Triggers an additional query
      customer.trips.filter(status='cancelled_by_user')

      # Sometimes it's better to use the cached result
      # and filter in memory
      [t for t in customer.trips.all() if t.status == 'cancelled_by_user']
      ```
  27. Reduce query counts
      Use the Prefetch object!
      ```python
      from django.db.models import Prefetch

      # A product has many subscriptions and
      # a subscription can have many products

      queryset = Subscription.objects.filter(
          status='expired'
      ).select_related('trips')

      prefetch = Prefetch('subscriptions',
                          queryset=queryset)
      products = Product.objects.prefetch_related(
          prefetch
      ).filter(location='lahore')
      ```
  31. Reduce query time
      ▶ Indexing
      ```python
      from django.conf import settings
      from django.db import models


      class CaptainProfile(models.Model):
          # ForeignKey columns get an index automatically
          user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
          dob = models.DateField(db_index=True)
          external_id = models.IntegerField(
              db_index=True
          )
      ```
      Note: Your DBMS updates your indices at write time (INSERT, UPDATE and DELETE), so every extra index makes writes slower
  32. Some notes on indexing
      ▶ Measure before you do it: run EXPLAIN on the query and look for a sequential scan (Seq Scan)
      ▶ Index by workload
      ▶ If you filter on multiple columns, use the index_together Meta option (deprecated since Django 4.2 in favour of Meta.indexes)
      ▶ Check that the index is used before you push it: run EXPLAIN again
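You can try the measure, index, measure-again loop with SQLite's `EXPLAIN QUERY PLAN`, used here as a stand-in for Postgres's `EXPLAIN`. Once the index exists, the plan switches from a full-table SCAN to a SEARCH using that index. The table, column and index names are hypothetical; on Postgres, the equivalent signals are `Seq Scan` versus `Index Scan`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE captain_profile "
    "(id INTEGER PRIMARY KEY, external_id INTEGER, dob TEXT)"
)
conn.executemany(
    "INSERT INTO captain_profile (external_id, dob) VALUES (?, ?)",
    [(i, "1990-01-01") for i in range(1000)],
)

QUERY = "SELECT * FROM captain_profile WHERE external_id = ?"


def plan(sql, params):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail column.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


before = plan(QUERY, (42,))  # full table scan
conn.execute(
    "CREATE INDEX captain_profile_external_id ON captain_profile (external_id)"
)
after = plan(QUERY, (42,))   # lookup through the new index

print("before:", before)
print("after: ", after)
```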
  33. Expensive JOINs
      Sometimes you might want to separate them into two different queries.
      ```python
      # You may want to see the credit spending behavior of your users
      Credit.objects.filter(
          subscription__product__location='jeddah'
      ).select_related('trip')

      # Sometimes two queries might perform better.
      # list() forces the first query to run on its own; passing the lazy
      # queryset straight to __in would inline it as a subquery instead.
      subs_ids = list(Subscription.objects.filter(
          product__location='jeddah'
      ).values_list('id', flat=True))

      Credit.objects.filter(
          subscription_id__in=subs_ids
      ).select_related('trip')
      ```
  34. ALWAYS MEASURE
  37. Avoid whole-table COUNT() queries
      After some point, having exact numbers is not important
      ```python
      Trip.objects.count()
      ```
      You can instead do a raw SQL query that reads the database's own row estimate
      ```sql
      -- Postgres
      SELECT reltuples FROM pg_class
      WHERE relname = 'trip';

      -- MySQL
      SELECT table_rows FROM information_schema.tables
      WHERE table_schema = DATABASE()
      AND table_name = 'trip';
      ```
      Both values are estimates: Postgres refreshes reltuples on VACUUM and ANALYZE, and InnoDB's table_rows is approximate.
      This could reduce response time by up to 90%
  38. Use persistent connections
      ```python
      DATABASES = {
          'default': {
              # The usual...
              'CONN_MAX_AGE': None,
          }
      }
      ```
      The default, 0, closes the connection at the end of each request; None keeps connections open indefinitely, and a positive integer sets their lifetime in seconds.
  39. Know your ORM
      ▶ Read the full ORM docs at least once
      ▶ Use F expressions to reference field values within the queryset
      ▶ Use Q objects for advanced filters
      ▶ Explore the aggregation framework
      ▶ Use values(), values_list(), only() and defer() when the results are too big
  41. Denormalize
      ▶ Evaluate huge joins
      ▶ Don’t use Generic Relations
      Figure: Response time reduction after denormalizing a Generic Relation
  42. Query caching
      ▶ johnny-cache
      ▶ django-cache-machine
  43. Templates
  45. Russian Doll Caching
      ```django
      {% load cache %}
      {% cache MIDDLE_TTL posts request.GET.page %}
        {% include "sections/dev/postheader.html" %}
        <div class="post-list">
          {% for post in posts %}
            {% cache LONG_TTL post_teaser post.id post.last_updated %}
              {% include "sections/dev/post_teaser.html" %}
            {% endcache %}
          {% endfor %}
        </div>
      {% endcache %}
      ```
      MIDDLE_TTL and LONG_TTL are timeouts in seconds passed in through the template context.
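The idea behind nesting the `{% cache %}` tags can be sketched in plain Python. Inner fragments are keyed by `post.id` plus `last_updated`, so editing a post changes its key and only that teaser is re-rendered. The outer fragment is keyed only by page and has a shorter TTL, so an edit shows up once the outer fragment expires, and it is then rebuilt mostly from cached teasers. The dict-based cache, the TTL values and the `render_teaser` helper are hypothetical simplifications of django's cache backend and template rendering.

```python
import time

MIDDLE_TTL = 60      # hypothetical: the outer list fragment lives 1 minute
LONG_TTL = 86_400    # hypothetical: a single teaser lives 1 day

_cache = {}          # key -> (expires_at, value); stands in for a cache backend
renders = []         # ids of teasers actually rendered, for illustration


def cached(key, ttl, build):
    # Return the cached value for key, rebuilding it if missing or expired.
    now = time.monotonic()
    hit = _cache.get(key)
    if hit and hit[0] > now:
        return hit[1]
    value = build()
    _cache[key] = (now + ttl, value)
    return value


def render_teaser(post):
    renders.append(post["id"])
    return f"<article>{post['title']}</article>"


def render_page(posts, page):
    def build_list():
        teasers = [
            # Inner doll: the key changes whenever the post is updated.
            cached(("post_teaser", p["id"], p["last_updated"]), LONG_TTL,
                   lambda p=p: render_teaser(p))
            for p in posts
        ]
        return '<div class="post-list">' + "".join(teasers) + "</div>"

    # Outer doll: keyed by page number only, with a shorter TTL.
    return cached(("posts", page), MIDDLE_TTL, build_list)


posts = [
    {"id": 1, "title": "A", "last_updated": 1},
    {"id": 2, "title": "B", "last_updated": 1},
]
render_page(posts, 1)                        # renders both teasers
posts[1] = {"id": 2, "title": "B2", "last_updated": 2}
_cache.pop(("posts", 1))                     # simulate the outer fragment expiring
html = render_page(posts, 1)                 # only post 2 is re-rendered
```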
  47. Further optimization
      ▶ Minify your CSS and JS (django-compressor, webassets or django-pipeline)
      ▶ Optimize your static images
      ▶ Optimize user-uploaded images
      ▶ Serve your media and static content from a CDN
      ▶ Do slow work later... (celery or python-rq)
      ▶ Use read replicas for read operations (and database routers)
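For the last bullet, a django database router is a plain class whose methods return database aliases, so it can be sketched without importing django. The sketch assumes `DATABASES` defines a `'default'` primary and two hypothetical replica aliases, `'replica1'` and `'replica2'`. It also assumes the class is registered through the `DATABASE_ROUTERS` setting; the module path shown in the docstring is hypothetical.

```python
import random


class PrimaryReplicaRouter:
    """Send reads to a random replica and writes to the primary.

    The aliases are hypothetical and must match keys in settings.DATABASES.
    Enable with: DATABASE_ROUTERS = ['myproject.routers.PrimaryReplicaRouter']
    """

    primary = "default"
    replicas = ["replica1", "replica2"]

    def db_for_read(self, model, **hints):
        # Replicas may lag behind the primary, so reads can be slightly stale.
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        return self.primary

    def allow_relation(self, obj1, obj2, **hints):
        # All aliases hold the same data, so relations across them are fine.
        db_set = {self.primary, *self.replicas}
        if obj1._state.db in db_set and obj2._state.db in db_set:
            return True
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Only migrate the primary; replicas receive changes via replication.
        return db == self.primary
```

Reads that must see a write made moments earlier can bypass the router with `Model.objects.using('default')`.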
  49. Questions?
  50. Thank you! We’re hiring! [email protected]