Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Optimize your Python & Django apps with easy-to...

Optimize your Python & Django apps with easy-to-learn Postgres superpowers | PyCon 2020 Remote | Louise Grandjonc

In this talk you’ll learn Postgres superpowers to optimize performance of Python & Django apps. You’ll dive into the Django ORM to identify performance bottlenecks, analyzing both Python & SQL code. You’ll fish into pg_stat_statements, django-debug-toolbar, Postgres logs, keyset pagination, Azure Performance Recommendations, eradicating loops, limiting what you select—and how to scale out Postgres horizontally with Hyperscale (Citus) on Azure Database for PostgreSQL.

More Decks by Azure Database for PostgreSQL

Other Decks in Technology

Transcript

  1. Optimize your Python & Django apps with easy-to-learn Postgres superpowers

    Louise Grandjonc | @louisemeta @AzureDBPostgres @citusdata
  2. A bit about me—and these <fish> Louise Grandjonc @louisemeta @AzureDBPostgres

    @citusdata • Python developer • Postgres team at Microsoft (formerly of Citus Data) • Love both Python & Postgres • Really missing PyCon this year • Enjoy complex and modern SQL • Want applications to use database at full potential
  3. AGENDA: Easy-to-learn Postgres superpowers #1: What can cause performance issues?

    #2: What tools can you use to catch them? Loops Limiting query set Pagination Improving Slow Queries @louisemeta
  4. We have to be careful @louisemeta 1. ORM executes queries

    you might not expect 2. Queries might not be optimised—& you won’t know about it 3. Fast query in your dev environment might behave very differently with your full dataset & under load
  5. pg_stat_statements / for production & dev • Postgres extension •

    Tracks statistics on queries executed by a server (once enabled) • Great to catch slow queries • Add to production environment • And to dev environment • Installed / enabled by default on Azure Database for PostgreSQL @louisemeta
  6. pg_stat_statements / analyzing performance @louisemeta 1 2 3 4 SELECT

    total_time, min_time, max_time, mean_time, calls, query FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 100;
  7. pg_stat_statements / analyzing performance -[ RECORD 1 ]------------------------------------------------------ total_time |

    643.624 min_time | 256.424 max_time | 531.417 mean_time | 383.666666667 calls | 3 query | SELECT COUNT(*) FROM letters; @louisemeta 1 2 3 4 SELECT total_time, min_time, max_time, mean_time, calls, query FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 100;
  8. pg_stat_statements / analyzing performance -[ RECORD 1 ]------------------------------------------------------ total_time |

    643.624 min_time | 256.424 max_time | 531.417 mean_time | 383.666666667 calls | 3 query | SELECT COUNT(*) FROM letters; @louisemeta 1 2 3 4 SELECT total_time, min_time, max_time, mean_time, calls, query FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 100;
  9. • Identify long-running queries • Uses pg_stat_statements • Top 5

    queries by average duration per execution • Aggregated in 15-min intervals • Alter time window • View additional queries Query Performance Insights / on Azure Wait statistics in Query Performance Insight, on Microsoft Azure @louisemeta
  10. • 2 types of custom suggestions • Create Index: suggests

    new indexes for frequently-run or time- consuming queries • Drop Index: suggests drop rarely-used or redundant indexes Performance Recommendations / on Azure Azure portal performance recommendations @louisemeta
  11. django-debug-toolbar / for dev environments • Lists queries in a

    template view • Gives EXPLAIN plan for query • Trace back queries • Only when DEBUG=True in Django settings @louisemeta
  12. Logs on Postgres / are they useful? @louisemeta 1 2

    3 4 5 6 /* WARNING: Only do this in local environment */ /* or you might end up with very large log files */ log_statement = 'all' logging_collector = on log_min_duration_statement = 0 • Best if have different environments (Golang API, django apps, etc.) • If template is making Ajax calls to API, django-debug-toolbar won't display queries, which makes Postgres logs especially useful! • To have Postgres log all your queries, you have to change your config:
  13. Logs on Postgres / are they useful? @louisemeta 1 2

    3 4 5 6 /* WARNING: Only do this in local environment */ /* or you might end up with very large log files */ log_statement = 'all' logging_collector = on log_min_duration_statement = 0 • Best if have different environments (Golang API, django apps, etc.) • If template is making Ajax calls to API, django-debug-toolbar won't display queries, which makes Postgres logs especially useful! • To have Postgres log all your queries, you have to change your config:
  14. Logs on Postgres / are they useful? @louisemeta 1 2

    3 4 5 6 /* WARNING: Only do this in local environment */ /* or you might end up with very large log files */ log_statement = 'all' logging_collector = on log_min_duration_statement = 0 • Best if have different environments (Golang API, django apps, etc.) • If template is making Ajax calls to API, django-debug-toolbar won't display queries, which makes Postgres logs especially useful! • To have Postgres log all your queries, you have to change your config:
  15. Loops, once upon a thousand times Here is the code

    for this view @louisemeta 1 2 3 4 5 6 7 8 class CampaignListView(LoginRequiredMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' def get_queryset(self): queryset = super().get_queryset() return queryset.filter(company=self.request.user.employee.company)
  16. Loops, once upon a thousand times Here is the code

    for this view @louisemeta 1 2 3 4 5 6 7 8 class CampaignListView(LoginRequiredMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' def get_queryset(self): queryset = super().get_queryset() return queryset.filter(company=self.request.user.employee.company)
  17. Loops, once upon a thousand times Here is the code

    for this view @louisemeta 1 2 3 4 5 6 7 8 class CampaignListView(LoginRequiredMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' def get_queryset(self): queryset = super().get_queryset() return queryset.filter(company=self.request.user.employee.company)
  18. Loops, once upon a thousand times Here is the code

    for this view @louisemeta 1 2 3 4 5 6 7 8 class CampaignListView(LoginRequiredMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' def get_queryset(self): queryset = super().get_queryset() return queryset.filter(company=self.request.user.employee.company)
  19. Loops, once upon a thousand times Thanks to django-debug-toolbar, I

    can trace back that it comes from my template. Indeed, I have the following code: @louisemeta 1 2 3 4 5 6 7 8 9 10 {% for campaign in campaigns %} <li class="treeview"> <ul class="treeview-menu"> <li><a href="{% url 'campaign_detail' campaign.id %}">Manage Campaign</a></li> {% for ad in campaign.ads.all %} <li><a href="{% url 'campaign_ad_detail' campaign.id ad.id %}">Ad #{{ ad.name|truncatechars:30 }}</a></li> {% endfor %} </ul> </li> {% endfor %}
  20. Loops, once upon a thousand times Thanks to django-debug-toolbar, I

    can trace back that it comes from my template. Indeed, I have the following code: @louisemeta 1 2 3 4 5 6 7 8 9 10 {% for campaign in campaigns %} <li class="treeview"> <ul class="treeview-menu"> <li><a href="{% url 'campaign_detail' campaign.id %}">Manage Campaign</a></li> {% for ad in campaign.ads.all %} <li><a href="{% url 'campaign_ad_detail' campaign.id ad.id %}">Ad #{{ ad.name|truncatechars:30 }}</a></li> {% endfor %} </ul> </li> {% endfor %}
  21. Loops, once upon a thousand times Thanks to django-debug-toolbar, I

    can trace back that it comes from my template. Indeed, I have the following code: @louisemeta 1 2 3 4 5 6 7 8 9 10 {% for campaign in campaigns %} <li class="treeview"> <ul class="treeview-menu"> <li><a href="{% url 'campaign_detail' campaign.id %}">Manage Campaign</a></li> {% for ad in campaign.ads.all %} <li><a href="{% url 'campaign_ad_detail' campaign.id ad.id %}">Ad #{{ ad.name|truncatechars:30 }}</a></li> {% endfor %} </ul> </li> {% endfor %}
  22. 1 2 3 4 5 6 7 8 9 10

    {% for campaign in campaigns %} <li class="treeview"> <ul class="treeview-menu"> <li><a href="{% url 'campaign_detail' campaign.id %}">Manage Campaign</a></li> {% for ad in campaign.ads.all %} <li><a href="{% url 'campaign_ad_detail' campaign.id ad.id %}">Ad #{{ ad.name|truncatechars:30 }}</a></li> {% endfor %} </ul> </li> {% endfor %} Loops, once upon a thousand times Thanks to django-debug-toolbar, I can trace back that it comes from my template. Indeed, I have the following code: @louisemeta
  23. Loops, once upon a thousand times Loops can easily be

    fixed, Django provides 2 functions: • select_related: usually for ForeignKey • prefetch_related: usually for ManyToMany or related objects @louisemeta 1 2 3 4 5 6 7 8 class CampaignListView(LoginRequiredMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' def get_queryset(self): queryset = super().get_queryset() return queryset.filter(company=self.request.user.employee.company) .prefetch_related('ads')
  24. Loops, once upon a thousand times Loops can easily be

    fixed, Django provides 2 functions: • select_related: usually for ForeignKey • prefetch_related: usually for ManyToMany or related objects @louisemeta 1 2 3 4 5 6 7 8 class CampaignListView(LoginRequiredMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' def get_queryset(self): queryset = super().get_queryset() return queryset.filter(company=self.request.user.employee.company) .prefetch_related('ads')
  25. If working with small sets, you might not notice This

    `pg_stat_statements` query helps you ID most common queries: @louisemeta 1 2 select query,calls from pg_stat_statements ORDER BY calls DESC limit 50; To avoid catastrophes on production, look at queries during dev phase to avoid loops like this What I want you to remember
  26. Limiting what you are selecting SELECT "ad_analytics_ads"."id", "ad_analytics_ads"."name", "ad_analytics_ads"."campaign_id", "ad_analytics_ads"."image",

    "ad_analytics_ads"."target_url", "ad_analytics_ads"."impressions_count", "ad_analytics_ads"."clicks_count", "ad_analytics_ads"."created_at", "ad_analytics_ads"."updated_at" FROM "ad_analytics_ads" WHERE "ad_analytics_ads"."campaign_id" IN (...) By default, Django ORM selects all columns. When we do prefetch_related, here is the query: 1 2
  27. SELECT "ad_analytics_ads"."id", "ad_analytics_ads"."name", "ad_analytics_ads"."campaign_id", "ad_analytics_ads"."image", "ad_analytics_ads"."target_url", "ad_analytics_ads"."impressions_count", "ad_analytics_ads"."clicks_count", "ad_analytics_ads"."created_at", "ad_analytics_ads"."updated_at"

    FROM "ad_analytics_ads" WHERE "ad_analytics_ads"."campaign_id" IN (...) Limiting what you are selecting @louisemeta 1 2 By default, Django ORM selects all columns. When we do prefetch_related, here is the query:
  28. Limiting what you are selecting However, in menu, we only

    display name; we only need id & campaign_id for the link. So, the query should be: @louisemeta 1 2 SELECT "ad_analytics_ads"."id", "ad_analytics_ads"."name", "ad_analytics_ads"."campaign_id", "ad_analytics_ads"."image", "ad_analytics_ads"."target_url", "ad_analytics_ads"."impressions_count", "ad_analytics_ads"."clicks_count", "ad_analytics_ads"."created_at", "ad_analytics_ads"."updated_at" FROM "ad_analytics_ads" WHERE "ad_analytics_ads"."campaign_id" IN (...) 1 2 SELECT "ad_analytics_ads"."id", "ad_analytics_ads"."name", "ad_analytics_ads"."campaign_id" FROM "ad_analytics_ads" WHERE "ad_analytics_ads"."campaign_id" IN (...) By default, Django ORM selects all columns. When we do prefetch_related, here is the query:
  29. 1 2 Limiting what you are selecting However, in menu,

    we only display name; we only need id & campaign_id for the link. So, the query should be: @louisemeta 1 2 SELECT "ad_analytics_ads"."id", "ad_analytics_ads"."name", "ad_analytics_ads"."campaign_id", "ad_analytics_ads"."image", "ad_analytics_ads"."target_url", "ad_analytics_ads"."impressions_count", "ad_analytics_ads"."clicks_count", "ad_analytics_ads"."created_at", "ad_analytics_ads"."updated_at" FROM "ad_analytics_ads" WHERE "ad_analytics_ads"."campaign_id" IN (...) SELECT "ad_analytics_ads"."id", "ad_analytics_ads"."name", "ad_analytics_ads"."campaign_id" FROM "ad_analytics_ads" WHERE "ad_analytics_ads"."campaign_id" IN (...) By default, Django ORM selects all columns. When we do prefetch_related, here is the query:
  30. Limiting what you are selecting You can limit columns retrieved

    by Django ORM using function only() If table has tens of columns, limiting what you pull from the database will improve page performance @louisemeta Don't let your Django ORM dictate what you should SELECT :) What I want you to remember
  31. Limiting what you are selecting You can limit columns retrieved

    by Django ORM using function only() If table has tens of columns, limiting what you pull from the database will improve page performance @louisemeta 1 2 Example of how to use the function only() Campaign.objects.filter(company_id=1).only('name')
  32. Limiting what you are selecting However—ads are in prefetch_related—so to

    limit queryset to the right fields, we use Prefetch class @louisemeta 1 2 3 4 5 6 7 8 9 10 from django.db.models import Prefetch … def get_queryset(self): queryset = super().get_queryset() return (queryset .filter(company=self.request.user.employee.company) .prefetch_related(Prefetch('ads', queryset=Ads.objects.all().only('id', 'campaign_id', 'name'))) .order_by('id'))
  33. Limiting what you are selecting However—ads are in prefetch_related—so to

    limit queryset to the right fields, we use Prefetch class @louisemeta 1 2 3 4 5 6 7 8 9 10 from django.db.models import Prefetch … def get_queryset(self): queryset = super().get_queryset() return (queryset .filter(company=self.request.user.employee.company) .prefetch_related(Prefetch('ads', queryset=Ads.objects.all().only('id', 'campaign_id', 'name'))) .order_by('id'))
  34. Limiting what you are selecting However—ads are in prefetch_related—so to

    limit queryset to the right fields, we use Prefetch class @louisemeta 1 2 3 4 5 6 7 8 9 10 from django.db.models import Prefetch … def get_queryset(self): queryset = super().get_queryset() return (queryset .filter(company=self.request.user.employee.company) .prefetch_related(Prefetch('ads', queryset=Ads.objects.all().only('id', 'campaign_id', 'name'))) .order_by('id'))
  35. Suppose I generated 1M campaigns for my company. And now

    the list page is very, very slow @louisemeta
  36. Pagination / why is paginating so important? 1. Queries retrieving

    data for 1M rows is slower than 25 rows 2. The more rows, the slower! 3. When you do prefetch_related, you are doing a query with campaign_id IN () , you can imagine what it would look like for 1M rows! 4. You never actually need to see all of your rows in one page @louisemeta
  37. Pagination / Default paginator class @louisemeta 1 2 3 4

    5 6 7 8 9 10 11 12 13 14 >>> from django.core.paginator import Paginator >>> queryset = Campaign.objects.filter(company_id=1).order_by('id') >>> p = Paginator(queryset, 25) >>> p.num_pages 40000 >>> page = p.page(1) >>> page.has_next() True >>> page.has_previous() False >>> page.next_page_number() 2 >>> page.object_list <QuerySet [<Campaign: Campaign object (1)>, <Campaign: Campaign object (2)>, The Paginator class in Django takes care of splitting queryset into pages
  38. Pagination / Default paginator class @louisemeta 1 2 3 4

    5 6 7 8 9 10 11 12 13 14 >>> from django.core.paginator import Paginator >>> queryset = Campaign.objects.filter(company_id=1).order_by('id') >>> p = Paginator(queryset, 25) >>> p.num_pages 40000 >>> page = p.page(1) >>> page.has_next() True >>> page.has_previous() False >>> page.next_page_number() 2 >>> page.object_list <QuerySet [<Campaign: Campaign object (1)>, <Campaign: Campaign object (2)>, The Paginator class in Django takes care of splitting queryset into pages
  39. Pagination / Default paginator class @louisemeta 1 2 3 4

    5 6 7 8 9 10 11 12 13 14 >>> from django.core.paginator import Paginator >>> queryset = Campaign.objects.filter(company_id=1).order_by('id') >>> p = Paginator(queryset, 25) >>> p.num_pages 40000 >>> page = p.page(1) >>> page.has_next() True >>> page.has_previous() False >>> page.next_page_number() 2 >>> page.object_list <QuerySet [<Campaign: Campaign object (1)>, <Campaign: Campaign object (2)>, The Paginator class in Django takes care of splitting queryset into pages
  40. Pagination / Default paginator class @louisemeta 1 2 3 4

    5 6 7 8 9 10 11 12 13 14 >>> from django.core.paginator import Paginator >>> queryset = Campaign.objects.filter(company_id=1).order_by('id') >>> p = Paginator(queryset, 25) >>> p.num_pages 40000 >>> page = p.page(1) >>> page.has_next() True >>> page.has_previous() False >>> page.next_page_number() 2 >>> page.object_list <QuerySet [<Campaign: Campaign object (1)>, <Campaign: Campaign object (2)>, The Paginator class in Django takes care of splitting queryset into pages
  41. Pagination / Default paginator class @louisemeta 1 2 3 4

    5 6 7 8 9 10 11 12 13 14 >>> from django.core.paginator import Paginator >>> queryset = Campaign.objects.filter(company_id=1).order_by('id') >>> p = Paginator(queryset, 25) >>> p.num_pages 40000 >>> page = p.page(1) >>> page.has_next() True >>> page.has_previous() False >>> page.next_page_number() 2 >>> page.object_list <QuerySet [<Campaign: Campaign object (1)>, <Campaign: Campaign object (2)>, The Paginator class in Django takes care of splitting queryset into pages
  42. Pagination / Default paginator class @louisemeta 1 2 3 4

    5 6 7 8 9 10 11 12 13 14 >>> from django.core.paginator import Paginator >>> queryset = Campaign.objects.filter(company_id=1).order_by('id') >>> p = Paginator(queryset, 25) >>> p.num_pages 40000 >>> page = p.page(1) >>> page.has_next() True >>> page.has_previous() False >>> page.next_page_number() 2 >>> page.object_list <QuerySet [<Campaign: Campaign object (1)>, <Campaign: Campaign object (2)>, The Paginator class in Django takes care of splitting queryset into pages
  43. Pagination / Default paginator class @louisemeta 1 2 3 4

    5 6 7 8 9 10 11 12 13 14 >>> from django.core.paginator import Paginator >>> queryset = Campaign.objects.filter(company_id=1).order_by('id') >>> p = Paginator(queryset, 25) >>> p.num_pages 40000 >>> page = p.page(1) >>> page.has_next() True >>> page.has_previous() False >>> page.next_page_number() 2 >>> page.object_list <QuerySet [<Campaign: Campaign object (1)>, <Campaign: Campaign object (2)>, The Paginator class in Django takes care of splitting queryset into pages
  44. Pagination / Integration in ListView (python) The generic view ListView,

    that we use for the campaigns page, provides a built-in way to paginate—by simply adding this: @louisemeta 1 2 3 4 5 6 7 8 9 class CampaignListView(LoginRequiredMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' paginate_by = 20 def get_queryset(self): ...
  45. Pagination / Integration in ListView (python) The generic view ListView,

    that we use for the campaigns page, provides a built-in way to paginate—by simply adding this: @louisemeta 1 2 3 4 5 6 7 8 9 class CampaignListView(LoginRequiredMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' paginate_by = 20 def get_queryset(self): ...
  46. Pagination / Integration in ListView (html) 1 2 3 4

    5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 <div class="pagination"> <span class="step-links"> <span class="page-links previous"> <a href="?page=1">&laquo; first</a> {% if page_obj.has_previous %} <a href="?page={{ page_obj.previous_page_number }}">previous</a> {% endif %} </span> <span class="page-links current"> Page {{ page_obj.number }} of {{ page_obj.paginator.num_pages }}. </span> <span class="page-links next"> {% if page_obj.has_next %} <a href="?page={{ page_obj.next_page_number }}">next</a> {% endif %} <a href="?page={{ page_obj.paginator.num_pages }}">last &raquo;</a> </span> </span> </div>
  47. Pagination / Integration in ListView (html) 1 2 3 4

    5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 <div class="pagination"> <span class="step-links"> <span class="page-links previous"> <a href="?page=1">&laquo; first</a> {% if page_obj.has_previous %} <a href="?page={{ page_obj.previous_page_number }}">previous</a> {% endif %} </span> <span class="page-links current"> Page {{ page_obj.number }} of {{ page_obj.paginator.num_pages }}. </span> <span class="page-links next"> {% if page_obj.has_next %} <a href="?page={{ page_obj.next_page_number }}">next</a> {% endif %} <a href="?page={{ page_obj.paginator.num_pages }}">last &raquo;</a> </span> </span> </div>
  48. Pagination / Integration in ListView (html) 1 2 3 4

    5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 <div class="pagination"> <span class="step-links"> <span class="page-links previous"> <a href="?page=1">&laquo; first</a> {% if page_obj.has_previous %} <a href="?page={{ page_obj.previous_page_number }}">previous</a> {% endif %} </span> <span class="page-links current"> Page {{ page_obj.number }} of {{ page_obj.paginator.num_pages }}. </span> <span class="page-links next"> {% if page_obj.has_next %} <a href="?page={{ page_obj.next_page_number }}">next</a> {% endif %} <a href="?page={{ page_obj.paginator.num_pages }}">last &raquo;</a> </span> </span> </div>
  49. Pagination / 2 problems with default pagination 1. Count used

    by Paginator to determine # of pages is slow—the more rows, the slower it will get! 2. OFFSET … LIMIT 20 gets slower in high pages OFFSET is used to skip first N rows of a query—to do that, database must fetch rows & order, to return last ones @louisemeta
  50. Keyset pagination / why you should care Example of what

    queries are usually like with OFFSET .. LIMIT 20: 1 2 SELECT * FROM campaigns ORDER BY id LIMIT 20 OFFSET 20;
  51. Keyset pagination / why you should care Instead of having

    an OFFSET query like this: With keyset pagination, the query will have a filter on the ORDER BY fields: To learn about keyset pagination, I recommend this article by Markus Winand on his use-the-index-luke dot com website: https://aka.ms/luke-no-offset. Markus’s website is great if you want to learn more about SQL performance! 1 2 SELECT * FROM campaigns WHERE id > ? ORDER BY id LIMIT 20; 1 2 SELECT * FROM campaigns ORDER BY id LIMIT 20 OFFSET 20;
  52. Keyset pagination / django-keyset-pagination-plus An application has been written to

    simplify your use of keyset pagination: https://pypi.org/project/django-keyset-pagination-plus/ @louisemeta 1 2 3 4 5 6 7 8 9 10 11 from keyset_pagination.paginator import KeysetPaginator from keyset_pagination.mixin import PaginateMixin class CampaignListView(LoginRequiredMixin, PaginateMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' paginator_class = KeysetPaginator paginate_by = 20
  53. Keyset pagination / django-keyset-pagination-plus An application has been written to

    simplify your use of keyset pagination: https://pypi.org/project/django-keyset-pagination-plus/ @louisemeta 1 2 3 4 5 6 7 8 9 10 11 from keyset_pagination.paginator import KeysetPaginator from keyset_pagination.mixin import PaginateMixin class CampaignListView(LoginRequiredMixin, PaginateMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' paginator_class = KeysetPaginator paginate_by = 20
  54. Keyset pagination / django-keyset-pagination-plus An application has been written to

    simplify your use of keyset pagination: https://pypi.org/project/django-keyset-pagination-plus/ @louisemeta 1 2 3 4 5 6 7 8 9 10 11 from keyset_pagination.paginator import KeysetPaginator from keyset_pagination.mixin import PaginateMixin class CampaignListView(LoginRequiredMixin, PaginateMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' paginator_class = KeysetPaginator paginate_by = 20
  55. Keyset pagination / django-keyset-pagination-plus An application has been written to

    simplify your use of keyset pagination: https://pypi.org/project/django-keyset-pagination-plus/ @louisemeta 1 2 3 4 5 6 7 8 9 10 11 from keyset_pagination.paginator import KeysetPaginator from keyset_pagination.mixin import PaginateMixin class CampaignListView(LoginRequiredMixin, PaginateMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' paginator_class = KeysetPaginator paginate_by = 20
  56. Fixing a slow query / pg_stat_statements SQL code using pg_stat_statements

    to get the 10 slowest queries @louisemeta 1 2 3 SELECT query, mean_time FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 10;
  57. Fixing a slow query / pg_stat_statements SQL code using pg_stat_statements

    to get the 10 slowest queries @louisemeta 1 2 3 SELECT query, mean_time FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 10; query | SELECT * FROM ad_analytics_campaign WHERE archived is not true average_time | 219.482032024528 One of the slow queries I discover is this one:
  58. Fixing a slow query / Looking into EXPLAIN @louisemeta 1

    2 3 4 5 6 7 8 9 10 11 12 postgres=> EXPLAIN SELECT * FROM ad_analytics_campaign WHERE archived is not true; QUERY PLAN ------------------------------------------------------------------------------------ ------ Gather (cost=1000.00..30858.95 rows=133 width=62) Workers Planned: 2 -> Parallel Seq Scan on ad_analytics_campaign (cost=0.00..29845.65 rows=55 width=62) Filter: (archived IS NOT TRUE) (4 rows) Time: 100.985 ms
  59. Fixing a slow query / Looking into EXPLAIN @louisemeta 1

    2 3 4 5 6 7 8 9 10 11 12 postgres=> EXPLAIN SELECT * FROM ad_analytics_campaign WHERE archived is not true; QUERY PLAN ------------------------------------------------------------------------------------ ------ Gather (cost=1000.00..30858.95 rows=133 width=62) Workers Planned: 2 -> Parallel Seq Scan on ad_analytics_campaign (cost=0.00..29845.65 rows=55 width=62) Filter: (archived IS NOT TRUE) (4 rows) Time: 100.985 ms
  60. Fixing a slow query / Looking into EXPLAIN @louisemeta 1

    2 3 4 5 6 7 8 9 10 11 12 postgres=> EXPLAIN SELECT * FROM ad_analytics_campaign WHERE archived is not true; QUERY PLAN ------------------------------------------------------------------------------------ ------ Gather (cost=1000.00..30858.95 rows=133 width=62) Workers Planned: 2 -> Parallel Seq Scan on ad_analytics_campaign (cost=0.00..29845.65 rows=55 width=62) Filter: (archived IS NOT TRUE) (4 rows) Time: 100.985 ms
  61. Fixing a slow query / Looking into EXPLAIN @louisemeta 1

    2 3 4 5 6 7 8 9 10 11 12 postgres=> EXPLAIN SELECT * FROM ad_analytics_campaign WHERE archived is not true; QUERY PLAN ------------------------------------------------------------------------------------ ------ Gather (cost=1000.00..30858.95 rows=133 width=62) Workers Planned: 2 -> Parallel Seq Scan on ad_analytics_campaign (cost=0.00..29845.65 rows=55 width=62) Filter: (archived IS NOT TRUE) (4 rows) Time: 100.985 ms
  62. Fixing a slow query / adding a partial index Adding

    a partial index on the column: @louisemeta 1 2 3 4 5 6 7 class Campaign(models.Model): ... class Meta: indexes = [models.Index(fields=['archived'], name='campaign_archived_idx', condition=Q(archived=False))]
  63. Fixing a slow query / adding a partial index Adding

    a partial index on the column: @louisemeta 1 2 3 4 5 6 7 class Campaign(models.Model): ... class Meta: indexes = [models.Index(fields=['archived'], name='campaign_archived_idx', condition=Q(archived=False))]
  64. Fixing a slow query / adding a partial index Adding

    a partial index on the column: @louisemeta 1 2 3 4 5 6 7 class Campaign(models.Model): ... class Meta: indexes = [models.Index(fields=['archived'], name='campaign_archived_idx', condition=Q(archived=False))]
  65. Fixing a slow query / adding a partial index Adding

    a partial index on the column: @louisemeta 1 2 3 4 5 6 7 class Campaign(models.Model): ... class Meta: indexes = [models.Index(fields=['archived'], name='campaign_archived_idx', condition=Q(archived=False))]
  66. Fixing a slow query / adding a partial index Adding

    a partial index on the column: @louisemeta 1 2 3 4 5 6 7 class Campaign(models.Model): ... class Meta: indexes = [models.Index(fields=['archived'], name='campaign_archived_idx', condition=Q(archived=False))] New time for the query: 100.388 ms
  67. Without partial index, this query took 219 ms SQL code

    using pg_stat_statements to get the 10 slowest queries @louisemeta 1 2 3 SELECT query, mean_time FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 10; query | SELECT * FROM ad_analytics_campaign WHERE archived is not true average_time | 219.482032024528 Remember we discovered this slow query using pg_stat_statements
  68. More on indexes in Postgres Postgres offers lots of index

    types (GIN, GiST, BTree, …) that help with performance You can learn more about indexes in my DjangoCon talk: https://aka.ms/django-PG-index-talk @louisemeta
  69. Fixing a slow query / scaling UP your server What

    if you already optimized your code, have perfect indexes—and things are still slow? Or getting slower and slower, even if you didn’t change anything? There can be many different reasons! More traffic, more usage of CPU, low memory… Your hardware resources might not be sufficient. @louisemeta This might be a good moment to scale UP—by adding disk, cpu, memory. Tip
  70. Improving performance / scaling UP your server Recommend this talk

    on how to debug, & what tools you can use. To understand if it’s time to scale up your server. https://aka.ms/all-goes-wrong-PG-talk @louisemeta
  71. What if your app is a big success? What if

    you need more scale than you can get from a single machine? @louisemeta
  72. Fixing slow queries / scaling OUT with Citus • Open

    source extension to PostgreSQL (not a fork) • Scales out Postgres horizontally • Distributes your data + your SQL queries • Multi-tenant SaaS, real-time analytics, & HTAP • Looks like single-node Postgres to your app • Now available as Hyperscale (Citus), part of Azure Database for PostgreSQL @louisemeta https://aka.ms/citus
  73. APPLICATION SELECT FROM GROUP BY company_id, avg(spend) AS avg_campaign_spend compaigns

    company_id; METADATA COORDINATOR NODE WORKER NODES W1 W2 W3 … Wn SELECT company_id sum(spend), count(spend) … FROM campaigns_2009 … SELECT company_id sum(spend), count(spend) … FROM campaigns_2001 … SELECT company_id sum(spend), count(spend) … FROM campaigns_2017 … How Citus Shards Postgres Across Nodes: Distributed Aggregate
  74. Your 8 takeaways, the Postgres superpowers 1. pg_stat_statements to find

    slow queries 2. django-debug-toolbar is 3. Postgres logs for 100% visibility into queries 4. Loops—be careful 5. Don’t let your ORM dictate what you SELECT 6. Replace your old OFFSET pagination by keyset pagination 7. Indexes—and partial indexes—can help 8. Scale up—or scale out with Hyperscale (Citus) on Azure @louisemeta
  75. © Copyright Microsoft Corporation. All rights reserved. thank you merci

    grazie ευχαριστώ gracias ध"यवाद Louise Grandjonc • Hyperscale (Citus) Quickstart Docs: https://aka.ms/hyperscale-citus-quickstart • Talk on Debugging Postgres: https://aka.ms/all-goes-wrong-PG-talk • DjangoCon talk on Indexes: https://aka.ms/django-PG-index-talk • GitHub repo for Django Ad App: https://aka.ms/django-PG-ad-app • Git repo for Citus open source: https://aka.ms/citus • Join the Citus technical newsletter: https://aka.ms/citus-newsletter @louisemeta @AzureDBPostgres @citusdata