Slide 1

Slide 1 text

Optimize your Python & Django apps with easy-to-learn Postgres superpowers Louise Grandjonc | @louisemeta @AzureDBPostgres @citusdata

Slide 2

Slide 2 text

A bit about me—and these Louise Grandjonc @louisemeta @AzureDBPostgres @citusdata • Python developer • Postgres team at Microsoft (formerly of Citus Data) • Love both Python & Postgres • Really missing PyCon this year • Enjoy complex and modern SQL • Want applications to use database at full potential

Slide 3

Slide 3 text

AGENDA: Easy-to-learn Postgres superpowers #1: What can cause performance issues? #2: What tools can you use to catch them? Loops Limiting query set Pagination Improving Slow Queries @louisemeta

Slide 4

Slide 4 text

What can cause performance issues? @louisemeta

Slide 5

Slide 5 text

ORMs are very handy tools, helping us build powerful applications in short amount of time… @louisemeta

Slide 6

Slide 6 text

We have to be careful @louisemeta 1. ORM executes queries you might not expect 2. Queries might not be optimised—& you won’t know about it 3. Fast query in your dev environment might behave very differently with your full dataset & under load

Slide 7

Slide 7 text

Postgres superpowers to help you catch performance issues @louisemeta

Slide 8

Slide 8 text

pg_stat_statements / for production & dev • Postgres extension • Tracks statistics on queries executed by a server (once enabled) • Great to catch slow queries • Add to production environment • And to dev environment • Installed / enabled by default on Azure Database for PostgreSQL @louisemeta

Slide 9

Slide 9 text

pg_stat_statements / analyzing performance @louisemeta 1 2 3 4 SELECT total_time, min_time, max_time, mean_time, calls, query FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 100;

Slide 10

Slide 10 text

pg_stat_statements / analyzing performance -[ RECORD 1 ]------------------------------------------------------ total_time | 643.624 min_time | 256.424 max_time | 531.417 mean_time | 383.666666667 calls | 3 query | SELECT COUNT(*) FROM letters; @louisemeta 1 2 3 4 SELECT total_time, min_time, max_time, mean_time, calls, query FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 100;

Slide 11

Slide 11 text

pg_stat_statements / analyzing performance -[ RECORD 1 ]------------------------------------------------------ total_time | 643.624 min_time | 256.424 max_time | 531.417 mean_time | 383.666666667 calls | 3 query | SELECT COUNT(*) FROM letters; @louisemeta 1 2 3 4 SELECT total_time, min_time, max_time, mean_time, calls, query FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 100;

Slide 12

Slide 12 text

• Identify long-running queries • Uses pg_stat_statements • Top 5 queries by average duration per execution • Aggregated in 15-min intervals • Alter time window • View additional queries Query Performance Insights / on Azure Wait statistics in Query Performance Insight, on Microsoft Azure @louisemeta

Slide 13

Slide 13 text

• 2 types of custom suggestions • Create Index: suggests new indexes for frequently-run or time- consuming queries • Drop Index: suggests drop rarely-used or redundant indexes Performance Recommendations / on Azure Azure portal performance recommendations @louisemeta

Slide 14

Slide 14 text

@louisemeta django-debug-toolbar

Slide 15

Slide 15 text

django-debug-toolbar / for dev environments • Lists queries in a template view • Gives EXPLAIN plan for query • Trace back queries • Only when DEBUG=True in Django settings @louisemeta

Slide 16

Slide 16 text

Logs on Postgres / are they useful? @louisemeta 1 2 3 4 5 6 /* WARNING: Only do this in local environment */ /* or you might end up with very large log files */ log_statement = 'all' logging_collector = on log_min_duration_statement = 0 • Best if have different environments (Golang API, django apps, etc.) • If template is making Ajax calls to API, django-debug-toolbar won't display queries, which makes Postgres logs especially useful! • To have Postgres log all your queries, you have to change your config:

Slide 17

Slide 17 text

Logs on Postgres / are they useful? @louisemeta 1 2 3 4 5 6 /* WARNING: Only do this in local environment */ /* or you might end up with very large log files */ log_statement = 'all' logging_collector = on log_min_duration_statement = 0 • Best if have different environments (Golang API, django apps, etc.) • If template is making Ajax calls to API, django-debug-toolbar won't display queries, which makes Postgres logs especially useful! • To have Postgres log all your queries, you have to change your config:

Slide 18

Slide 18 text

Logs on Postgres / are they useful? @louisemeta 1 2 3 4 5 6 /* WARNING: Only do this in local environment */ /* or you might end up with very large log files */ log_statement = 'all' logging_collector = on log_min_duration_statement = 0 • Best if have different environments (Golang API, django apps, etc.) • If template is making Ajax calls to API, django-debug-toolbar won't display queries, which makes Postgres logs especially useful! • To have Postgres log all your queries, you have to change your config:

Slide 19

Slide 19 text

Loops, once upon a thousand times @louisemeta

Slide 20

Slide 20 text

About today’s sample application: ”Ad & Campaign App” https://aka.ms/django-PG-ad-app

Slide 21

Slide 21 text

About the schema

Slide 22

Slide 22 text

On my website, have a page to list Campaigns

Slide 23

Slide 23 text

Loops, once upon a thousand times Here is the code for this view @louisemeta 1 2 3 4 5 6 7 8 class CampaignListView(LoginRequiredMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' def get_queryset(self): queryset = super().get_queryset() return queryset.filter(company=self.request.user.employee.company)

Slide 24

Slide 24 text

Loops, once upon a thousand times Here is the code for this view @louisemeta 1 2 3 4 5 6 7 8 class CampaignListView(LoginRequiredMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' def get_queryset(self): queryset = super().get_queryset() return queryset.filter(company=self.request.user.employee.company)

Slide 25

Slide 25 text

Loops, once upon a thousand times Here is the code for this view @louisemeta 1 2 3 4 5 6 7 8 class CampaignListView(LoginRequiredMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' def get_queryset(self): queryset = super().get_queryset() return queryset.filter(company=self.request.user.employee.company)

Slide 26

Slide 26 text

Loops, once upon a thousand times Here is the code for this view @louisemeta 1 2 3 4 5 6 7 8 class CampaignListView(LoginRequiredMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' def get_queryset(self): queryset = super().get_queryset() return queryset.filter(company=self.request.user.employee.company)

Slide 27

Slide 27 text

In django-debug-toolbar, I have over 100 queries

Slide 28

Slide 28 text

Loops, once upon a thousand times Thanks to django-debug-toolbar, I can trace back that it comes from my template. Indeed, I have the following code: @louisemeta 1 2 3 4 5 6 7 8 9 10 {% for campaign in campaigns %}
  • {% endfor %}

    Slide 29

    Slide 29 text

    Loops, once upon a thousand times Thanks to django-debug-toolbar, I can trace back that it comes from my template. Indeed, I have the following code: @louisemeta 1 2 3 4 5 6 7 8 9 10 {% for campaign in campaigns %}
  • {% endfor %}

    Slide 30

    Slide 30 text

    Loops, once upon a thousand times Thanks to django-debug-toolbar, I can trace back that it comes from my template. Indeed, I have the following code: @louisemeta 1 2 3 4 5 6 7 8 9 10 {% for campaign in campaigns %}
  • {% endfor %}

    Slide 31

    Slide 31 text

    1 2 3 4 5 6 7 8 9 10 {% for campaign in campaigns %}
  • {% endfor %} Loops, once upon a thousand times Thanks to django-debug-toolbar, I can trace back that it comes from my template. Indeed, I have the following code: @louisemeta

    Slide 32

    Slide 32 text

    Loops, once upon a thousand times Loops can easily be fixed, Django provides 2 functions: • select_related: usually for ForeignKey • prefetch_related: usually for ManyToMany or related objects @louisemeta 1 2 3 4 5 6 7 8 class CampaignListView(LoginRequiredMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' def get_queryset(self): queryset = super().get_queryset() return queryset.filter(company=self.request.user.employee.company) .prefetch_related('ads')

    Slide 33

    Slide 33 text

    Loops, once upon a thousand times Loops can easily be fixed, Django provides 2 functions: • select_related: usually for ForeignKey • prefetch_related: usually for ManyToMany or related objects @louisemeta 1 2 3 4 5 6 7 8 class CampaignListView(LoginRequiredMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' def get_queryset(self): queryset = super().get_queryset() return queryset.filter(company=self.request.user.employee.company) .prefetch_related('ads')

    Slide 34

    Slide 34 text

    Going from 100 queries down to 6 queries

    Slide 35

    Slide 35 text

    @louisemeta

    Slide 36

    Slide 36 text

    If working with small sets, you might not notice This `pg_stat_statements` query helps you ID most common queries: @louisemeta 1 2 select query,calls from pg_stat_statements ORDER BY calls DESC limit 50; To avoid catastrophes on production, look at queries during dev phase to avoid loops like this What I want you to remember

    Slide 37

    Slide 37 text

    Limiting what you are selecting @louisemeta

    Slide 38

    Slide 38 text

    Limiting what you are selecting SELECT "ad_analytics_ads"."id", "ad_analytics_ads"."name", "ad_analytics_ads"."campaign_id", "ad_analytics_ads"."image", "ad_analytics_ads"."target_url", "ad_analytics_ads"."impressions_count", "ad_analytics_ads"."clicks_count", "ad_analytics_ads"."created_at", "ad_analytics_ads"."updated_at" FROM "ad_analytics_ads" WHERE "ad_analytics_ads"."campaign_id" IN (...) By default, Django ORM selects all columns. When we do prefetch_related, here is the query: 1 2

    Slide 39

    Slide 39 text

    SELECT "ad_analytics_ads"."id", "ad_analytics_ads"."name", "ad_analytics_ads"."campaign_id", "ad_analytics_ads"."image", "ad_analytics_ads"."target_url", "ad_analytics_ads"."impressions_count", "ad_analytics_ads"."clicks_count", "ad_analytics_ads"."created_at", "ad_analytics_ads"."updated_at" FROM "ad_analytics_ads" WHERE "ad_analytics_ads"."campaign_id" IN (...) Limiting what you are selecting @louisemeta 1 2 By default, Django ORM selects all columns. When we do prefetch_related, here is the query:

    Slide 40

    Slide 40 text

    Limiting what you are selecting However, in menu, we only display name; we only need id & campaign_id for the link. So, the query should be: @louisemeta 1 2 SELECT "ad_analytics_ads"."id", "ad_analytics_ads"."name", "ad_analytics_ads"."campaign_id", "ad_analytics_ads"."image", "ad_analytics_ads"."target_url", "ad_analytics_ads"."impressions_count", "ad_analytics_ads"."clicks_count", "ad_analytics_ads"."created_at", "ad_analytics_ads"."updated_at" FROM "ad_analytics_ads" WHERE "ad_analytics_ads"."campaign_id" IN (...) 1 2 SELECT "ad_analytics_ads"."id", "ad_analytics_ads"."name", "ad_analytics_ads"."campaign_id" FROM "ad_analytics_ads" WHERE "ad_analytics_ads"."campaign_id" IN (...) By default, Django ORM selects all columns. When we do prefetch_related, here is the query:

    Slide 41

    Slide 41 text

    1 2 Limiting what you are selecting However, in menu, we only display name; we only need id & campaign_id for the link. So, the query should be: @louisemeta 1 2 SELECT "ad_analytics_ads"."id", "ad_analytics_ads"."name", "ad_analytics_ads"."campaign_id", "ad_analytics_ads"."image", "ad_analytics_ads"."target_url", "ad_analytics_ads"."impressions_count", "ad_analytics_ads"."clicks_count", "ad_analytics_ads"."created_at", "ad_analytics_ads"."updated_at" FROM "ad_analytics_ads" WHERE "ad_analytics_ads"."campaign_id" IN (...) SELECT "ad_analytics_ads"."id", "ad_analytics_ads"."name", "ad_analytics_ads"."campaign_id" FROM "ad_analytics_ads" WHERE "ad_analytics_ads"."campaign_id" IN (...) By default, Django ORM selects all columns. When we do prefetch_related, here is the query:

    Slide 42

    Slide 42 text

    Limiting what you are selecting You can limit columns retrieved by Django ORM using function only() If table has tens of columns, limiting what you pull from the database will improve page performance @louisemeta Don't let your Django ORM dictate what you should SELECT :) What I want you to remember

    Slide 43

    Slide 43 text

    Limiting what you are selecting You can limit columns retrieved by Django ORM using function only() If table has tens of columns, limiting what you pull from the database will improve page performance @louisemeta 1 2 Example of how to use the function only() Campaign.objects.filter(company_id=1).only('name')

    Slide 44

    Slide 44 text

    Limiting what you are selecting However—ads are in prefetch_related—so to limit queryset to the right fields, we use Prefetch class @louisemeta 1 2 3 4 5 6 7 8 9 10 from django.db.models import Prefetch … def get_queryset(self): queryset = super().get_queryset() return (queryset .filter(company=self.request.user.employee.company) .prefetch_related(Prefetch('ads', queryset=Ads.objects.all().only('id', 'campaign_id', 'name'))) .order_by('id'))

    Slide 45

    Slide 45 text

    Limiting what you are selecting However—ads are in prefetch_related—so to limit queryset to the right fields, we use Prefetch class @louisemeta 1 2 3 4 5 6 7 8 9 10 from django.db.models import Prefetch … def get_queryset(self): queryset = super().get_queryset() return (queryset .filter(company=self.request.user.employee.company) .prefetch_related(Prefetch('ads', queryset=Ads.objects.all().only('id', 'campaign_id', 'name'))) .order_by('id'))

    Slide 46

    Slide 46 text

    Limiting what you are selecting However—ads are in prefetch_related—so to limit queryset to the right fields, we use Prefetch class @louisemeta 1 2 3 4 5 6 7 8 9 10 from django.db.models import Prefetch … def get_queryset(self): queryset = super().get_queryset() return (queryset .filter(company=self.request.user.employee.company) .prefetch_related(Prefetch('ads', queryset=Ads.objects.all().only('id', 'campaign_id', 'name'))) .order_by('id'))

    Slide 47

    Slide 47 text

    Pagination @louisemeta

    Slide 48

    Slide 48 text

    Suppose I generated 1M campaigns for my company. And now the list page is very, very slow @louisemeta

    Slide 49

    Slide 49 text

    Pagination / why is paginating so important? 1. Queries retrieving data for 1M rows is slower than 25 rows 2. The more rows, the slower! 3. When you do prefetch_related, you are doing a query with campaign_id IN () , you can imagine what it would look like for 1M rows! 4. You never actually need to see all of your rows in one page @louisemeta

    Slide 50

    Slide 50 text

    Pagination / Default paginator class @louisemeta 1 2 3 4 5 6 7 8 9 10 11 12 13 14 >>> from django.core.paginator import Paginator >>> queryset = Campaign.objects.filter(company_id=1).order_by('id') >>> p = Paginator(queryset, 25) >>> p.num_pages 40000 >>> page = p.page(1) >>> page.has_next() True >>> page.has_previous() False >>> page.next_page_number() 2 >>> page.object_list , , The Paginator class in Django takes care of splitting queryset into pages

    Slide 51

    Slide 51 text

    Pagination / Default paginator class @louisemeta 1 2 3 4 5 6 7 8 9 10 11 12 13 14 >>> from django.core.paginator import Paginator >>> queryset = Campaign.objects.filter(company_id=1).order_by('id') >>> p = Paginator(queryset, 25) >>> p.num_pages 40000 >>> page = p.page(1) >>> page.has_next() True >>> page.has_previous() False >>> page.next_page_number() 2 >>> page.object_list , , The Paginator class in Django takes care of splitting queryset into pages

    Slide 52

    Slide 52 text

    Pagination / Default paginator class @louisemeta 1 2 3 4 5 6 7 8 9 10 11 12 13 14 >>> from django.core.paginator import Paginator >>> queryset = Campaign.objects.filter(company_id=1).order_by('id') >>> p = Paginator(queryset, 25) >>> p.num_pages 40000 >>> page = p.page(1) >>> page.has_next() True >>> page.has_previous() False >>> page.next_page_number() 2 >>> page.object_list , , The Paginator class in Django takes care of splitting queryset into pages

    Slide 53

    Slide 53 text

    Pagination / Default paginator class @louisemeta 1 2 3 4 5 6 7 8 9 10 11 12 13 14 >>> from django.core.paginator import Paginator >>> queryset = Campaign.objects.filter(company_id=1).order_by('id') >>> p = Paginator(queryset, 25) >>> p.num_pages 40000 >>> page = p.page(1) >>> page.has_next() True >>> page.has_previous() False >>> page.next_page_number() 2 >>> page.object_list , , The Paginator class in Django takes care of splitting queryset into pages

    Slide 54

    Slide 54 text

    Pagination / Default paginator class @louisemeta 1 2 3 4 5 6 7 8 9 10 11 12 13 14 >>> from django.core.paginator import Paginator >>> queryset = Campaign.objects.filter(company_id=1).order_by('id') >>> p = Paginator(queryset, 25) >>> p.num_pages 40000 >>> page = p.page(1) >>> page.has_next() True >>> page.has_previous() False >>> page.next_page_number() 2 >>> page.object_list , , The Paginator class in Django takes care of splitting queryset into pages

    Slide 55

    Slide 55 text

    Pagination / Default paginator class @louisemeta 1 2 3 4 5 6 7 8 9 10 11 12 13 14 >>> from django.core.paginator import Paginator >>> queryset = Campaign.objects.filter(company_id=1).order_by('id') >>> p = Paginator(queryset, 25) >>> p.num_pages 40000 >>> page = p.page(1) >>> page.has_next() True >>> page.has_previous() False >>> page.next_page_number() 2 >>> page.object_list , , The Paginator class in Django takes care of splitting queryset into pages

    Slide 56

    Slide 56 text

    Pagination / Default paginator class @louisemeta 1 2 3 4 5 6 7 8 9 10 11 12 13 14 >>> from django.core.paginator import Paginator >>> queryset = Campaign.objects.filter(company_id=1).order_by('id') >>> p = Paginator(queryset, 25) >>> p.num_pages 40000 >>> page = p.page(1) >>> page.has_next() True >>> page.has_previous() False >>> page.next_page_number() 2 >>> page.object_list , , The Paginator class in Django takes care of splitting queryset into pages

    Slide 57

    Slide 57 text

    Pagination / Integration in ListView (python) The generic view ListView, that we use for the campaigns page, provides a built-in way to paginate—by simply adding this: @louisemeta 1 2 3 4 5 6 7 8 9 class CampaignListView(LoginRequiredMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' paginate_by = 20 def get_queryset(self): ...

    Slide 58

    Slide 58 text

    Pagination / Integration in ListView (python) The generic view ListView, that we use for the campaigns page, provides a built-in way to paginate—by simply adding this: @louisemeta 1 2 3 4 5 6 7 8 9 class CampaignListView(LoginRequiredMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' paginate_by = 20 def get_queryset(self): ...

    Slide 59

    Slide 59 text

    Pagination / Integration in ListView (html) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

    Slide 60

    Slide 60 text

    Pagination / Integration in ListView (html) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

    Slide 61

    Slide 61 text

    Pagination / Integration in ListView (html) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

    Slide 62

    Slide 62 text

    Pagination / Integration in ListView

    Slide 63

    Slide 63 text

    Pagination / 2 problems with default pagination 1. Count used by Paginator to determine # of pages is slow—the more rows, the slower it will get! 2. OFFSET … LIMIT 20 gets slower in high pages OFFSET is used to skip first N rows of a query—to do that, database must fetch rows & order, to return last ones @louisemeta

    Slide 64

    Slide 64 text

    Keyset pagination / why you should care Example of what queries are usually like with OFFSET .. LIMIT 20: 1 2 SELECT * FROM campaigns ORDER BY id LIMIT 20 OFFSET 20;

    Slide 65

    Slide 65 text

    Keyset pagination / why you should care Instead of having an OFFSET query like this: With keyset pagination, the query will have a filter on the ORDER BY fields: To learn about keyset pagination, I recommend this article by Markus Winand on his use-the-index-luke dot com website: https://aka.ms/luke-no-offset. Markus’s website is great if you want to learn more about SQL performance! 1 2 SELECT * FROM campaigns WHERE id > ? ORDER BY id LIMIT 20; 1 2 SELECT * FROM campaigns ORDER BY id LIMIT 20 OFFSET 20;

    Slide 66

    Slide 66 text

    Keyset pagination / django-keyset-pagination-plus An application has been written to simplify your use of keyset pagination: https://pypi.org/project/django-keyset-pagination-plus/ @louisemeta 1 2 3 4 5 6 7 8 9 10 11 from keyset_pagination.paginator import KeysetPaginator from keyset_pagination.mixin import PaginateMixin class CampaignListView(LoginRequiredMixin, PaginateMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' paginator_class = KeysetPaginator paginate_by = 20

    Slide 67

    Slide 67 text

    Keyset pagination / django-keyset-pagination-plus An application has been written to simplify your use of keyset pagination: https://pypi.org/project/django-keyset-pagination-plus/ @louisemeta 1 2 3 4 5 6 7 8 9 10 11 from keyset_pagination.paginator import KeysetPaginator from keyset_pagination.mixin import PaginateMixin class CampaignListView(LoginRequiredMixin, PaginateMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' paginator_class = KeysetPaginator paginate_by = 20

    Slide 68

    Slide 68 text

    Keyset pagination / django-keyset-pagination-plus An application has been written to simplify your use of keyset pagination: https://pypi.org/project/django-keyset-pagination-plus/ @louisemeta 1 2 3 4 5 6 7 8 9 10 11 from keyset_pagination.paginator import KeysetPaginator from keyset_pagination.mixin import PaginateMixin class CampaignListView(LoginRequiredMixin, PaginateMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' paginator_class = KeysetPaginator paginate_by = 20

    Slide 69

    Slide 69 text

    Keyset pagination / django-keyset-pagination-plus An application has been written to simplify your use of keyset pagination: https://pypi.org/project/django-keyset-pagination-plus/ @louisemeta 1 2 3 4 5 6 7 8 9 10 11 from keyset_pagination.paginator import KeysetPaginator from keyset_pagination.mixin import PaginateMixin class CampaignListView(LoginRequiredMixin, PaginateMixin, ListView): model = Campaign template_name = 'ad_analytics/campaigns/list.html' context_object_name = 'campaigns' paginator_class = KeysetPaginator paginate_by = 20

    Slide 70

    Slide 70 text

    Keyset pagination

    Slide 71

    Slide 71 text

    Fixing slow queries @louisemeta

    Slide 72

    Slide 72 text

    Fixing a slow query / pg_stat_statements SQL code using pg_stat_statements to get the 10 slowest queries @louisemeta 1 2 3 SELECT query, mean_time FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 10;

    Slide 73

    Slide 73 text

    Fixing a slow query / pg_stat_statements SQL code using pg_stat_statements to get the 10 slowest queries @louisemeta 1 2 3 SELECT query, mean_time FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 10; query | SELECT * FROM ad_analytics_campaign WHERE archived is not true average_time | 219.482032024528 One of the slow queries I discover is this one:

    Slide 74

    Slide 74 text

    Fixing a slow query / Looking into EXPLAIN @louisemeta 1 2 3 4 5 6 7 8 9 10 11 12 postgres=> EXPLAIN SELECT * FROM ad_analytics_campaign WHERE archived is not true; QUERY PLAN ------------------------------------------------------------------------------------ ------ Gather (cost=1000.00..30858.95 rows=133 width=62) Workers Planned: 2 -> Parallel Seq Scan on ad_analytics_campaign (cost=0.00..29845.65 rows=55 width=62) Filter: (archived IS NOT TRUE) (4 rows) Time: 100.985 ms

    Slide 75

    Slide 75 text

    Fixing a slow query / Looking into EXPLAIN @louisemeta 1 2 3 4 5 6 7 8 9 10 11 12 postgres=> EXPLAIN SELECT * FROM ad_analytics_campaign WHERE archived is not true; QUERY PLAN ------------------------------------------------------------------------------------ ------ Gather (cost=1000.00..30858.95 rows=133 width=62) Workers Planned: 2 -> Parallel Seq Scan on ad_analytics_campaign (cost=0.00..29845.65 rows=55 width=62) Filter: (archived IS NOT TRUE) (4 rows) Time: 100.985 ms

    Slide 76

    Slide 76 text

    Fixing a slow query / Looking into EXPLAIN @louisemeta 1 2 3 4 5 6 7 8 9 10 11 12 postgres=> EXPLAIN SELECT * FROM ad_analytics_campaign WHERE archived is not true; QUERY PLAN ------------------------------------------------------------------------------------ ------ Gather (cost=1000.00..30858.95 rows=133 width=62) Workers Planned: 2 -> Parallel Seq Scan on ad_analytics_campaign (cost=0.00..29845.65 rows=55 width=62) Filter: (archived IS NOT TRUE) (4 rows) Time: 100.985 ms

    Slide 77

    Slide 77 text

    Fixing a slow query / Looking into EXPLAIN @louisemeta 1 2 3 4 5 6 7 8 9 10 11 12 postgres=> EXPLAIN SELECT * FROM ad_analytics_campaign WHERE archived is not true; QUERY PLAN ------------------------------------------------------------------------------------ ------ Gather (cost=1000.00..30858.95 rows=133 width=62) Workers Planned: 2 -> Parallel Seq Scan on ad_analytics_campaign (cost=0.00..29845.65 rows=55 width=62) Filter: (archived IS NOT TRUE) (4 rows) Time: 100.985 ms

    Slide 78

    Slide 78 text

    Fixing a slow query / adding a partial index Adding a partial index on the column: @louisemeta 1 2 3 4 5 6 7 class Campaign(models.Model): ... class Meta: indexes = [models.Index(fields=['archived'], name='campaign_archived_idx', condition=Q(archived=False))]

    Slide 79

    Slide 79 text

    Fixing a slow query / adding a partial index Adding a partial index on the column: @louisemeta 1 2 3 4 5 6 7 class Campaign(models.Model): ... class Meta: indexes = [models.Index(fields=['archived'], name='campaign_archived_idx', condition=Q(archived=False))]

    Slide 80

    Slide 80 text

    Fixing a slow query / adding a partial index Adding a partial index on the column: @louisemeta 1 2 3 4 5 6 7 class Campaign(models.Model): ... class Meta: indexes = [models.Index(fields=['archived'], name='campaign_archived_idx', condition=Q(archived=False))]

    Slide 81

    Slide 81 text

    Fixing a slow query / adding a partial index Adding a partial index on the column: @louisemeta 1 2 3 4 5 6 7 class Campaign(models.Model): ... class Meta: indexes = [models.Index(fields=['archived'], name='campaign_archived_idx', condition=Q(archived=False))]

    Slide 82

    Slide 82 text

    Fixing a slow query / adding a partial index Adding a partial index on the column: @louisemeta 1 2 3 4 5 6 7 class Campaign(models.Model): ... class Meta: indexes = [models.Index(fields=['archived'], name='campaign_archived_idx', condition=Q(archived=False))] New time for the query: 100.388 ms

    Slide 83

    Slide 83 text

    Without partial index, this query took 219 ms SQL code using pg_stat_statements to get the 10 slowest queries @louisemeta 1 2 3 SELECT query, mean_time FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 10; query | SELECT * FROM ad_analytics_campaign WHERE archived is not true average_time | 219.482032024528 Remember we discovered this slow query using pg_stat_statements

    Slide 84

    Slide 84 text

    More on indexes in Postgres Postgres offers lots of index types (GIN, GiST, BTree, …) that help with performance You can learn more about indexes in my DjangoCon talk: https://aka.ms/django-PG-index-talk @louisemeta

    Slide 85

    Slide 85 text

    Fixing a slow query / scaling UP your server What if you already optimized your code, have perfect indexes—and things are still slow? Or getting slower and slower, even if you didn’t change anything? There can be many different reasons! More traffic, more usage of CPU, low memory… Your hardware resources might not be sufficient. @louisemeta This might be a good moment to scale UP—by adding disk, cpu, memory. Tip

    Slide 86

    Slide 86 text

    Improving performance / scaling UP your server Recommend this talk on how to debug, & what tools you can use. To understand if it’s time to scale up your server. https://aka.ms/all-goes-wrong-PG-talk @louisemeta

    Slide 87

    Slide 87 text

    What if your app is a big success? What if you need more scale than you can get from a single machine? @louisemeta

    Slide 88

    Slide 88 text

    Fixing slow queries / scaling OUT with Citus • Open source extension to PostgreSQL (not a fork) • Scales out Postgres horizontally • Distributes your data + your SQL queries • Multi-tenant SaaS, real-time analytics, & HTAP • Looks like single-node Postgres to your app • Now available as Hyperscale (Citus), part of Azure Database for PostgreSQL @louisemeta https://aka.ms/citus

    Slide 89

    Slide 89 text

    APPLICATION SELECT FROM GROUP BY company_id, avg(spend) AS avg_campaign_spend compaigns company_id; METADATA COORDINATOR NODE WORKER NODES W1 W2 W3 … Wn SELECT company_id sum(spend), count(spend) … FROM campaigns_2009 … SELECT company_id sum(spend), count(spend) … FROM campaigns_2001 … SELECT company_id sum(spend), count(spend) … FROM campaigns_2017 … How Citus Shards Postgres Across Nodes: Distributed Aggregate

    Slide 90

    Slide 90 text

    https://aka.ms/hyperscale-citus-quickstart Fixing slow queries / Quickstart docs for Hyperscale (Citus) on Azure

    Slide 91

    Slide 91 text

    Your 8 takeaways Your Postgres superpowers @louisemeta

    Slide 92

    Slide 92 text

    Your 8 takeaways, the Postgres superpowers 1. pg_stat_statements to find slow queries 2. django-debug-toolbar is 3. Postgres logs for 100% visibility into queries 4. Loops—be careful 5. Don’t let your ORM dictate what you SELECT 6. Replace your old OFFSET pagination by keyset pagination 7. Indexes—and partial indexes—can help 8. Scale up—or scale out with Hyperscale (Citus) on Azure @louisemeta

    Slide 93

    Slide 93 text

    © Copyright Microsoft Corporation. All rights reserved. thank you merci grazie ευχαριστώ gracias ध"यवाद Louise Grandjonc • Hyperscale (Citus) Quickstart Docs: https://aka.ms/hyperscale-citus-quickstart • Talk on Debugging Postgres: https://aka.ms/all-goes-wrong-PG-talk • DjangoCon talk on Indexes: https://aka.ms/django-PG-index-talk • GitHub repo for Django Ad App: https://aka.ms/django-PG-ad-app • Git repo for Citus open source: https://aka.ms/citus • Join the Citus technical newsletter: https://aka.ms/citus-newsletter @louisemeta @AzureDBPostgres @citusdata