Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Optimize your Python & Django apps with easy-to-learn Postgres superpowers | PyCon 2020 Remote | Louise Grandjonc

Optimize your Python & Django apps with easy-to-learn Postgres superpowers | PyCon 2020 Remote | Louise Grandjonc

In this talk you’ll learn Postgres superpowers to optimize performance of Python & Django apps. You’ll dive into the Django ORM to identify performance bottlenecks, analyzing both Python & SQL code. You’ll fish into pg_stat_statements, django-debug-toolbar, Postgres logs, keyset pagination, Azure Performance Recommendations, eradicating loops, limiting what you select—and how to scale out Postgres horizontally with Hyperscale (Citus) on Azure Database for PostgreSQL.

More Decks by Azure Database for PostgreSQL

Other Decks in Technology

Transcript

  1. Optimize your Python &
    Django apps with easy-to-learn
    Postgres superpowers
    Louise Grandjonc | @louisemeta @AzureDBPostgres @citusdata

    View full-size slide

  2. A bit about me—and these
    Louise Grandjonc
    @louisemeta
    @AzureDBPostgres
    @citusdata
    • Python developer
    • Postgres team at Microsoft (formerly of Citus Data)
    • Love both Python & Postgres
    • Really missing PyCon this year
    • Enjoy complex and modern SQL
    • Want applications to use database at full potential

    View full-size slide

  3. AGENDA: Easy-to-learn Postgres superpowers
    #1: What can cause performance issues?
    #2: What tools can you use to catch them?
    Loops
    Limiting
    query set
    Pagination
    Improving
    Slow Queries
    @louisemeta

    View full-size slide

  4. What can cause performance issues?
    @louisemeta

    View full-size slide

  5. ORMs are very handy tools, helping us build
    powerful applications in short amount of time…
    @louisemeta

    View full-size slide

  6. We have to be careful
    @louisemeta
    1. ORM executes queries you might not
    expect
    2. Queries might not be optimised—&
    you won’t know about it
    3. Fast query in your dev environment
    might behave very differently with
    your full dataset & under load

    View full-size slide

  7. Postgres superpowers to help
    you catch
    performance issues
    @louisemeta

    View full-size slide

  8. pg_stat_statements / for production & dev
    • Postgres extension
    • Tracks statistics on queries executed by a server (once enabled)
    • Great to catch slow queries
    • Add to production environment
    • And to dev environment
    • Installed / enabled by default on Azure Database for PostgreSQL
    @louisemeta

    View full-size slide

  9. pg_stat_statements / analyzing performance
    @louisemeta
    1
    2
    3
    4
    SELECT total_time, min_time, max_time, mean_time, calls, query
    FROM pg_stat_statements
    ORDER BY mean_time DESC
    LIMIT 100;

    View full-size slide

  10. pg_stat_statements / analyzing performance
    -[ RECORD 1 ]------------------------------------------------------
    total_time | 643.624
    min_time | 256.424
    max_time | 531.417
    mean_time | 383.666666667
    calls | 3
    query | SELECT COUNT(*) FROM letters;
    @louisemeta
    1
    2
    3
    4
    SELECT total_time, min_time, max_time, mean_time, calls, query
    FROM pg_stat_statements
    ORDER BY mean_time DESC
    LIMIT 100;

    View full-size slide

  11. pg_stat_statements / analyzing performance
    -[ RECORD 1 ]------------------------------------------------------
    total_time | 643.624
    min_time | 256.424
    max_time | 531.417
    mean_time | 383.666666667
    calls | 3
    query | SELECT COUNT(*) FROM letters;
    @louisemeta
    1
    2
    3
    4
    SELECT total_time, min_time, max_time, mean_time, calls, query
    FROM pg_stat_statements
    ORDER BY mean_time DESC
    LIMIT 100;

    View full-size slide

  12. • Identify long-running
    queries
    • Uses pg_stat_statements
    • Top 5 queries by average
    duration per execution
    • Aggregated in 15-min
    intervals
    • Alter time window
    • View additional queries
    Query Performance Insights / on Azure
    Wait statistics in Query Performance Insight, on Microsoft Azure
    @louisemeta

    View full-size slide

  13. • 2 types of custom
    suggestions
    • Create Index: suggests
    new indexes for
    frequently-run or time-
    consuming queries
    • Drop Index: suggests
    drop rarely-used or
    redundant indexes
    Performance Recommendations / on Azure
    Azure portal performance recommendations
    @louisemeta

    View full-size slide

  14. @louisemeta
    django-debug-toolbar

    View full-size slide

  15. django-debug-toolbar / for dev environments
    • Lists queries in
    a template
    view
    • Gives EXPLAIN
    plan for query
    • Trace back
    queries
    • Only when
    DEBUG=True
    in Django
    settings
    @louisemeta

    View full-size slide

  16. Logs on Postgres / are they useful?
    @louisemeta
    1
    2
    3
    4
    5
    6
    /* WARNING: Only do this in local environment */
    /* or you might end up with very large log files */
    log_statement = 'all'
    logging_collector = on
    log_min_duration_statement = 0
    • Best if have different environments (Golang API, django apps, etc.)
    • If template is making Ajax calls to API, django-debug-toolbar won't display
    queries, which makes Postgres logs especially useful!
    • To have Postgres log all your queries, you have to change your config:

    View full-size slide

  17. Logs on Postgres / are they useful?
    @louisemeta
    1
    2
    3
    4
    5
    6
    /* WARNING: Only do this in local environment */
    /* or you might end up with very large log files */
    log_statement = 'all'
    logging_collector = on
    log_min_duration_statement = 0
    • Best if have different environments (Golang API, django apps, etc.)
    • If template is making Ajax calls to API, django-debug-toolbar won't display
    queries, which makes Postgres logs especially useful!
    • To have Postgres log all your queries, you have to change your config:

    View full-size slide

  18. Logs on Postgres / are they useful?
    @louisemeta
    1
    2
    3
    4
    5
    6
    /* WARNING: Only do this in local environment */
    /* or you might end up with very large log files */
    log_statement = 'all'
    logging_collector = on
    log_min_duration_statement = 0
    • Best if have different environments (Golang API, django apps, etc.)
    • If template is making Ajax calls to API, django-debug-toolbar won't display
    queries, which makes Postgres logs especially useful!
    • To have Postgres log all your queries, you have to change your config:

    View full-size slide

  19. Loops, once upon a thousand times
    @louisemeta

    View full-size slide

  20. About today’s sample
    application:
    ”Ad & Campaign App”
    https://aka.ms/django-PG-ad-app

    View full-size slide

  21. About the
    schema

    View full-size slide

  22. On my website, have a page to list Campaigns

    View full-size slide

  23. Loops, once upon a thousand times
    Here is the code for this view
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    class CampaignListView(LoginRequiredMixin, ListView):
    model = Campaign
    template_name = 'ad_analytics/campaigns/list.html'
    context_object_name = 'campaigns'
    def get_queryset(self):
    queryset = super().get_queryset()
    return queryset.filter(company=self.request.user.employee.company)

    View full-size slide

  24. Loops, once upon a thousand times
    Here is the code for this view
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    class CampaignListView(LoginRequiredMixin, ListView):
    model = Campaign
    template_name = 'ad_analytics/campaigns/list.html'
    context_object_name = 'campaigns'
    def get_queryset(self):
    queryset = super().get_queryset()
    return queryset.filter(company=self.request.user.employee.company)

    View full-size slide

  25. Loops, once upon a thousand times
    Here is the code for this view
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    class CampaignListView(LoginRequiredMixin, ListView):
    model = Campaign
    template_name = 'ad_analytics/campaigns/list.html'
    context_object_name = 'campaigns'
    def get_queryset(self):
    queryset = super().get_queryset()
    return queryset.filter(company=self.request.user.employee.company)

    View full-size slide

  26. Loops, once upon a thousand times
    Here is the code for this view
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    class CampaignListView(LoginRequiredMixin, ListView):
    model = Campaign
    template_name = 'ad_analytics/campaigns/list.html'
    context_object_name = 'campaigns'
    def get_queryset(self):
    queryset = super().get_queryset()
    return queryset.filter(company=self.request.user.employee.company)

    View full-size slide

  27. In django-debug-toolbar, I have over 100 queries

    View full-size slide

  28. Loops, once upon a thousand times
    Thanks to django-debug-toolbar, I can trace back that it comes
    from my template. Indeed, I have the following code:
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    {% for campaign in campaigns %}


    Manage Campaign
    {% for ad in campaign.ads.all %}
    Ad #{{
    ad.name|truncatechars:30 }}
    {% endfor %}


    {% endfor %}

    View full-size slide

  29. Loops, once upon a thousand times
    Thanks to django-debug-toolbar, I can trace back that it comes
    from my template. Indeed, I have the following code:
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    {% for campaign in campaigns %}


    Manage Campaign
    {% for ad in campaign.ads.all %}
    Ad #{{
    ad.name|truncatechars:30 }}
    {% endfor %}


    {% endfor %}

    View full-size slide

  30. Loops, once upon a thousand times
    Thanks to django-debug-toolbar, I can trace back that it comes
    from my template. Indeed, I have the following code:
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    {% for campaign in campaigns %}


    Manage Campaign
    {% for ad in campaign.ads.all %}
    Ad #{{
    ad.name|truncatechars:30 }}
    {% endfor %}


    {% endfor %}

    View full-size slide

  31. 1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    {% for campaign in campaigns %}


    Manage Campaign
    {% for ad in campaign.ads.all %}
    Ad #{{
    ad.name|truncatechars:30 }}
    {% endfor %}


    {% endfor %}
    Loops, once upon a thousand times
    Thanks to django-debug-toolbar, I can trace back that it comes
    from my template. Indeed, I have the following code:
    @louisemeta

    View full-size slide

  32. Loops, once upon a thousand times
    Loops can easily be fixed, Django provides 2 functions:
    • select_related: usually for ForeignKey
    • prefetch_related: usually for ManyToMany or related objects
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    class CampaignListView(LoginRequiredMixin, ListView):
    model = Campaign
    template_name = 'ad_analytics/campaigns/list.html'
    context_object_name = 'campaigns'
    def get_queryset(self):
    queryset = super().get_queryset()
    return queryset.filter(company=self.request.user.employee.company)
    .prefetch_related('ads')

    View full-size slide

  33. Loops, once upon a thousand times
    Loops can easily be fixed, Django provides 2 functions:
    • select_related: usually for ForeignKey
    • prefetch_related: usually for ManyToMany or related objects
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    class CampaignListView(LoginRequiredMixin, ListView):
    model = Campaign
    template_name = 'ad_analytics/campaigns/list.html'
    context_object_name = 'campaigns'
    def get_queryset(self):
    queryset = super().get_queryset()
    return queryset.filter(company=self.request.user.employee.company)
    .prefetch_related('ads')

    View full-size slide

  34. Going from 100 queries down to 6 queries

    View full-size slide

  35. If working with small sets, you might not notice
    This `pg_stat_statements` query helps you ID most common queries:
    @louisemeta
    1
    2
    select query,calls from pg_stat_statements ORDER BY calls DESC limit 50;
    To avoid catastrophes on production, look at
    queries during dev phase to avoid loops like this
    What I want you to remember

    View full-size slide

  36. Limiting what you are selecting
    @louisemeta

    View full-size slide

  37. Limiting what you are selecting
    SELECT "ad_analytics_ads"."id", "ad_analytics_ads"."name",
    "ad_analytics_ads"."campaign_id", "ad_analytics_ads"."image",
    "ad_analytics_ads"."target_url", "ad_analytics_ads"."impressions_count",
    "ad_analytics_ads"."clicks_count", "ad_analytics_ads"."created_at",
    "ad_analytics_ads"."updated_at"
    FROM "ad_analytics_ads" WHERE "ad_analytics_ads"."campaign_id" IN (...)
    By default, Django ORM selects all columns.
    When we do prefetch_related, here is the query:
    1
    2

    View full-size slide

  38. SELECT "ad_analytics_ads"."id", "ad_analytics_ads"."name",
    "ad_analytics_ads"."campaign_id", "ad_analytics_ads"."image",
    "ad_analytics_ads"."target_url", "ad_analytics_ads"."impressions_count",
    "ad_analytics_ads"."clicks_count", "ad_analytics_ads"."created_at",
    "ad_analytics_ads"."updated_at"
    FROM "ad_analytics_ads" WHERE "ad_analytics_ads"."campaign_id" IN (...)
    Limiting what you are selecting
    @louisemeta
    1
    2
    By default, Django ORM selects all columns.
    When we do prefetch_related, here is the query:

    View full-size slide

  39. Limiting what you are selecting
    However, in menu, we only display name; we only need id & campaign_id
    for the link. So, the query should be:
    @louisemeta
    1
    2
    SELECT "ad_analytics_ads"."id", "ad_analytics_ads"."name",
    "ad_analytics_ads"."campaign_id", "ad_analytics_ads"."image",
    "ad_analytics_ads"."target_url", "ad_analytics_ads"."impressions_count",
    "ad_analytics_ads"."clicks_count", "ad_analytics_ads"."created_at",
    "ad_analytics_ads"."updated_at"
    FROM "ad_analytics_ads" WHERE "ad_analytics_ads"."campaign_id" IN (...)
    1
    2
    SELECT "ad_analytics_ads"."id", "ad_analytics_ads"."name", "ad_analytics_ads"."campaign_id"
    FROM "ad_analytics_ads" WHERE "ad_analytics_ads"."campaign_id" IN (...)
    By default, Django ORM selects all columns.
    When we do prefetch_related, here is the query:

    View full-size slide

  40. 1
    2
    Limiting what you are selecting
    However, in menu, we only display name; we only need id & campaign_id
    for the link. So, the query should be:
    @louisemeta
    1
    2
    SELECT "ad_analytics_ads"."id", "ad_analytics_ads"."name",
    "ad_analytics_ads"."campaign_id", "ad_analytics_ads"."image",
    "ad_analytics_ads"."target_url", "ad_analytics_ads"."impressions_count",
    "ad_analytics_ads"."clicks_count", "ad_analytics_ads"."created_at",
    "ad_analytics_ads"."updated_at"
    FROM "ad_analytics_ads" WHERE "ad_analytics_ads"."campaign_id" IN (...)
    SELECT "ad_analytics_ads"."id", "ad_analytics_ads"."name", "ad_analytics_ads"."campaign_id"
    FROM "ad_analytics_ads" WHERE "ad_analytics_ads"."campaign_id" IN (...)
    By default, Django ORM selects all columns.
    When we do prefetch_related, here is the query:

    View full-size slide

  41. Limiting what you are selecting
    You can limit columns retrieved by Django ORM using function only()
    If table has tens of columns, limiting what you pull from the database will
    improve page performance
    @louisemeta
    Don't let your Django ORM dictate what you
    should SELECT :)
    What I want you to remember

    View full-size slide

  42. Limiting what you are selecting
    You can limit columns retrieved by Django ORM using function only()
    If table has tens of columns, limiting what you pull from the database will
    improve page performance
    @louisemeta
    1
    2
    Example of how to use the function only()
    Campaign.objects.filter(company_id=1).only('name')

    View full-size slide

  43. Limiting what you are selecting
    However—ads are in prefetch_related—so to limit queryset to the
    right fields, we use Prefetch class
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    from django.db.models import Prefetch

    def get_queryset(self):
    queryset = super().get_queryset()
    return (queryset
    .filter(company=self.request.user.employee.company)
    .prefetch_related(Prefetch('ads', queryset=Ads.objects.all().only('id',
    'campaign_id', 'name')))
    .order_by('id'))

    View full-size slide

  44. Limiting what you are selecting
    However—ads are in prefetch_related—so to limit queryset to the
    right fields, we use Prefetch class
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    from django.db.models import Prefetch

    def get_queryset(self):
    queryset = super().get_queryset()
    return (queryset
    .filter(company=self.request.user.employee.company)
    .prefetch_related(Prefetch('ads', queryset=Ads.objects.all().only('id',
    'campaign_id', 'name')))
    .order_by('id'))

    View full-size slide

  45. Limiting what you are selecting
    However—ads are in prefetch_related—so to limit queryset to the
    right fields, we use Prefetch class
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    from django.db.models import Prefetch

    def get_queryset(self):
    queryset = super().get_queryset()
    return (queryset
    .filter(company=self.request.user.employee.company)
    .prefetch_related(Prefetch('ads', queryset=Ads.objects.all().only('id',
    'campaign_id', 'name')))
    .order_by('id'))

    View full-size slide

  46. Pagination
    @louisemeta

    View full-size slide

  47. Suppose I generated 1M
    campaigns for my
    company.
    And now the list page is
    very, very slow
    @louisemeta

    View full-size slide

  48. Pagination / why is paginating so important?
    1. Queries retrieving data for 1M rows is slower than 25 rows
    2. The more rows, the slower!
    3. When you do prefetch_related, you are doing a query
    with campaign_id IN () , you can imagine what it would look like for 1M
    rows!
    4. You never actually need to see all of your rows in one page
    @louisemeta

    View full-size slide

  49. Pagination / Default paginator class
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    >>> from django.core.paginator import Paginator
    >>> queryset = Campaign.objects.filter(company_id=1).order_by('id')
    >>> p = Paginator(queryset, 25)
    >>> p.num_pages
    40000
    >>> page = p.page(1)
    >>> page.has_next()
    True
    >>> page.has_previous()
    False
    >>> page.next_page_number()
    2
    >>> page.object_list
    , ,
    The Paginator class in Django takes care of splitting queryset into pages

    View full-size slide

  50. Pagination / Default paginator class
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    >>> from django.core.paginator import Paginator
    >>> queryset = Campaign.objects.filter(company_id=1).order_by('id')
    >>> p = Paginator(queryset, 25)
    >>> p.num_pages
    40000
    >>> page = p.page(1)
    >>> page.has_next()
    True
    >>> page.has_previous()
    False
    >>> page.next_page_number()
    2
    >>> page.object_list
    , ,
    The Paginator class in Django takes care of splitting queryset into pages

    View full-size slide

  51. Pagination / Default paginator class
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    >>> from django.core.paginator import Paginator
    >>> queryset = Campaign.objects.filter(company_id=1).order_by('id')
    >>> p = Paginator(queryset, 25)
    >>> p.num_pages
    40000
    >>> page = p.page(1)
    >>> page.has_next()
    True
    >>> page.has_previous()
    False
    >>> page.next_page_number()
    2
    >>> page.object_list
    , ,
    The Paginator class in Django takes care of splitting queryset into pages

    View full-size slide

  52. Pagination / Default paginator class
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    >>> from django.core.paginator import Paginator
    >>> queryset = Campaign.objects.filter(company_id=1).order_by('id')
    >>> p = Paginator(queryset, 25)
    >>> p.num_pages
    40000
    >>> page = p.page(1)
    >>> page.has_next()
    True
    >>> page.has_previous()
    False
    >>> page.next_page_number()
    2
    >>> page.object_list
    , ,
    The Paginator class in Django takes care of splitting queryset into pages

    View full-size slide

  53. Pagination / Default paginator class
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    >>> from django.core.paginator import Paginator
    >>> queryset = Campaign.objects.filter(company_id=1).order_by('id')
    >>> p = Paginator(queryset, 25)
    >>> p.num_pages
    40000
    >>> page = p.page(1)
    >>> page.has_next()
    True
    >>> page.has_previous()
    False
    >>> page.next_page_number()
    2
    >>> page.object_list
    , ,
    The Paginator class in Django takes care of splitting queryset into pages

    View full-size slide

  54. Pagination / Default paginator class
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    >>> from django.core.paginator import Paginator
    >>> queryset = Campaign.objects.filter(company_id=1).order_by('id')
    >>> p = Paginator(queryset, 25)
    >>> p.num_pages
    40000
    >>> page = p.page(1)
    >>> page.has_next()
    True
    >>> page.has_previous()
    False
    >>> page.next_page_number()
    2
    >>> page.object_list
    , ,
    The Paginator class in Django takes care of splitting queryset into pages

    View full-size slide

  55. Pagination / Default paginator class
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    >>> from django.core.paginator import Paginator
    >>> queryset = Campaign.objects.filter(company_id=1).order_by('id')
    >>> p = Paginator(queryset, 25)
    >>> p.num_pages
    40000
    >>> page = p.page(1)
    >>> page.has_next()
    True
    >>> page.has_previous()
    False
    >>> page.next_page_number()
    2
    >>> page.object_list
    , ,
    The Paginator class in Django takes care of splitting queryset into pages

    View full-size slide

  56. Pagination / Integration in ListView (python)
    The generic view ListView, that we use for the campaigns page, provides a
    built-in way to paginate—by simply adding this:
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    9
    class CampaignListView(LoginRequiredMixin, ListView):
    model = Campaign
    template_name = 'ad_analytics/campaigns/list.html'
    context_object_name = 'campaigns'
    paginate_by = 20
    def get_queryset(self):
    ...

    View full-size slide

  57. Pagination / Integration in ListView (python)
    The generic view ListView, that we use for the campaigns page, provides a
    built-in way to paginate—by simply adding this:
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    9
    class CampaignListView(LoginRequiredMixin, ListView):
    model = Campaign
    template_name = 'ad_analytics/campaigns/list.html'
    context_object_name = 'campaigns'
    paginate_by = 20
    def get_queryset(self):
    ...

    View full-size slide

  58. Pagination / Integration in ListView (html)
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19



    « first
    {% if page_obj.has_previous %}
    previous
    {% endif %}


    Page {{ page_obj.number }} of {{ page_obj.paginator.num_pages }}.


    {% if page_obj.has_next %}
    next
    {% endif %}
    last »



    View full-size slide

  59. Pagination / Integration in ListView (html)
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19



    « first
    {% if page_obj.has_previous %}
    previous
    {% endif %}


    Page {{ page_obj.number }} of {{ page_obj.paginator.num_pages }}.


    {% if page_obj.has_next %}
    next
    {% endif %}
    last »



    View full-size slide

  60. Pagination / Integration in ListView (html)
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19



    « first
    {% if page_obj.has_previous %}
    previous
    {% endif %}


    Page {{ page_obj.number }} of {{ page_obj.paginator.num_pages }}.


    {% if page_obj.has_next %}
    next
    {% endif %}
    last »



    View full-size slide

  61. Pagination / Integration in ListView

    View full-size slide

  62. Pagination / 2 problems
    with default pagination
    1. Count used by Paginator to
    determine # of pages is slow—the
    more rows, the slower it will get!
    2. OFFSET … LIMIT 20 gets slower in
    high pages
    OFFSET is used to skip first N rows
    of a query—to do that, database
    must fetch rows & order, to return
    last ones
    @louisemeta

    View full-size slide

  63. Keyset pagination / why you should care
    Example of what queries are usually like with OFFSET .. LIMIT 20:
    1
    2
    SELECT * FROM campaigns ORDER BY id LIMIT 20 OFFSET 20;

    View full-size slide

  64. Keyset pagination / why you should care
    Instead of having an OFFSET query like this:
    With keyset pagination, the query will have a filter on the ORDER BY fields:
    To learn about keyset pagination, I recommend this article by Markus Winand on his use-the-index-luke dot com
    website: https://aka.ms/luke-no-offset. Markus’s website is great if you want to learn more about SQL performance!
    1
    2
    SELECT * FROM campaigns WHERE id > ? ORDER BY id LIMIT 20;
    1
    2
    SELECT * FROM campaigns ORDER BY id LIMIT 20 OFFSET 20;

    View full-size slide

  65. Keyset pagination / django-keyset-pagination-plus
    An application has been written to simplify your use of keyset pagination:
    https://pypi.org/project/django-keyset-pagination-plus/
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    from keyset_pagination.paginator import KeysetPaginator
    from keyset_pagination.mixin import PaginateMixin
    class CampaignListView(LoginRequiredMixin, PaginateMixin, ListView):
    model = Campaign
    template_name = 'ad_analytics/campaigns/list.html'
    context_object_name = 'campaigns'
    paginator_class = KeysetPaginator
    paginate_by = 20

    View full-size slide

  66. Keyset pagination / django-keyset-pagination-plus
    An application has been written to simplify your use of keyset pagination:
    https://pypi.org/project/django-keyset-pagination-plus/
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    from keyset_pagination.paginator import KeysetPaginator
    from keyset_pagination.mixin import PaginateMixin
    class CampaignListView(LoginRequiredMixin, PaginateMixin, ListView):
    model = Campaign
    template_name = 'ad_analytics/campaigns/list.html'
    context_object_name = 'campaigns'
    paginator_class = KeysetPaginator
    paginate_by = 20

    View full-size slide

  67. Keyset pagination / django-keyset-pagination-plus
    An application has been written to simplify your use of keyset pagination:
    https://pypi.org/project/django-keyset-pagination-plus/
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    from keyset_pagination.paginator import KeysetPaginator
    from keyset_pagination.mixin import PaginateMixin
    class CampaignListView(LoginRequiredMixin, PaginateMixin, ListView):
    model = Campaign
    template_name = 'ad_analytics/campaigns/list.html'
    context_object_name = 'campaigns'
    paginator_class = KeysetPaginator
    paginate_by = 20

    View full-size slide

  68. Keyset pagination / django-keyset-pagination-plus
    An application has been written to simplify your use of keyset pagination:
    https://pypi.org/project/django-keyset-pagination-plus/
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    from keyset_pagination.paginator import KeysetPaginator
    from keyset_pagination.mixin import PaginateMixin
    class CampaignListView(LoginRequiredMixin, PaginateMixin, ListView):
    model = Campaign
    template_name = 'ad_analytics/campaigns/list.html'
    context_object_name = 'campaigns'
    paginator_class = KeysetPaginator
    paginate_by = 20

    View full-size slide

  69. Keyset pagination

    View full-size slide

  70. Fixing slow queries
    @louisemeta

    View full-size slide

  71. Fixing a slow query / pg_stat_statements
    SQL code using pg_stat_statements to get the 10 slowest queries
    @louisemeta
    1
    2
    3
    SELECT query, mean_time
    FROM pg_stat_statements
    ORDER BY mean_time DESC LIMIT 10;

    View full-size slide

  72. Fixing a slow query / pg_stat_statements
    SQL code using pg_stat_statements to get the 10 slowest queries
    @louisemeta
    1
    2
    3
    SELECT query, mean_time
    FROM pg_stat_statements
    ORDER BY mean_time DESC LIMIT 10;
    query | SELECT * FROM ad_analytics_campaign WHERE archived is not true
    average_time | 219.482032024528
    One of the slow queries I discover is this one:

    View full-size slide

  73. Fixing a slow query / Looking into EXPLAIN
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    postgres=> EXPLAIN SELECT * FROM ad_analytics_campaign WHERE archived is not true;
    QUERY PLAN
    ------------------------------------------------------------------------------------
    ------
    Gather (cost=1000.00..30858.95 rows=133 width=62)
    Workers Planned: 2
    -> Parallel Seq Scan on ad_analytics_campaign (cost=0.00..29845.65 rows=55
    width=62)
    Filter: (archived IS NOT TRUE)
    (4 rows)
    Time: 100.985 ms

    View full-size slide

  74. Fixing a slow query / Looking into EXPLAIN
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    postgres=> EXPLAIN SELECT * FROM ad_analytics_campaign WHERE archived is not true;
    QUERY PLAN
    ------------------------------------------------------------------------------------
    ------
    Gather (cost=1000.00..30858.95 rows=133 width=62)
    Workers Planned: 2
    -> Parallel Seq Scan on ad_analytics_campaign (cost=0.00..29845.65 rows=55
    width=62)
    Filter: (archived IS NOT TRUE)
    (4 rows)
    Time: 100.985 ms

    View full-size slide

  75. Fixing a slow query / Looking into EXPLAIN
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    postgres=> EXPLAIN SELECT * FROM ad_analytics_campaign WHERE archived is not true;
    QUERY PLAN
    ------------------------------------------------------------------------------------
    ------
    Gather (cost=1000.00..30858.95 rows=133 width=62)
    Workers Planned: 2
    -> Parallel Seq Scan on ad_analytics_campaign (cost=0.00..29845.65 rows=55
    width=62)
    Filter: (archived IS NOT TRUE)
    (4 rows)
    Time: 100.985 ms

    View full-size slide

  76. Fixing a slow query / Looking into EXPLAIN
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    postgres=> EXPLAIN SELECT * FROM ad_analytics_campaign WHERE archived is not true;
    QUERY PLAN
    ------------------------------------------------------------------------------------
    ------
    Gather (cost=1000.00..30858.95 rows=133 width=62)
    Workers Planned: 2
    -> Parallel Seq Scan on ad_analytics_campaign (cost=0.00..29845.65 rows=55
    width=62)
    Filter: (archived IS NOT TRUE)
    (4 rows)
    Time: 100.985 ms

    View full-size slide

  77. Fixing a slow query / adding a partial index
    Adding a partial index on the column:
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    class Campaign(models.Model):
    ...
    class Meta:
    indexes = [models.Index(fields=['archived'],
    name='campaign_archived_idx',
    condition=Q(archived=False))]

    View full-size slide

  78. Fixing a slow query / adding a partial index
    Adding a partial index on the column:
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    class Campaign(models.Model):
    ...
    class Meta:
    indexes = [models.Index(fields=['archived'],
    name='campaign_archived_idx',
    condition=Q(archived=False))]

    View full-size slide

  79. Fixing a slow query / adding a partial index
    Adding a partial index on the column:
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    class Campaign(models.Model):
    ...
    class Meta:
    indexes = [models.Index(fields=['archived'],
    name='campaign_archived_idx',
    condition=Q(archived=False))]

    View full-size slide

  80. Fixing a slow query / adding a partial index
    Adding a partial index on the column:
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    class Campaign(models.Model):
    ...
    class Meta:
    indexes = [models.Index(fields=['archived'],
    name='campaign_archived_idx',
    condition=Q(archived=False))]

    View full-size slide

  81. Fixing a slow query / adding a partial index
    Adding a partial index on the column:
    @louisemeta
    1
    2
    3
    4
    5
    6
    7
    class Campaign(models.Model):
    ...
    class Meta:
    indexes = [models.Index(fields=['archived'],
    name='campaign_archived_idx',
    condition=Q(archived=False))]
    New time for the query: 100.388 ms

    View full-size slide

  82. Without partial index, this query took 219 ms
    SQL code using pg_stat_statements to get the 10 slowest queries
    @louisemeta
    1
    2
    3
    SELECT query, mean_time
    FROM pg_stat_statements
    ORDER BY mean_time DESC LIMIT 10;
    query | SELECT * FROM ad_analytics_campaign WHERE archived is not true
    average_time | 219.482032024528
    Remember we discovered this slow query using pg_stat_statements

    View full-size slide

  83. More on indexes in
    Postgres
    Postgres offers lots of index types
    (GIN, GiST, BTree, …) that help with
    performance
    You can learn more about indexes in
    my DjangoCon talk:
    https://aka.ms/django-PG-index-talk
    @louisemeta

    View full-size slide

  84. Fixing a slow query / scaling UP your server
    What if you already optimized your code, have perfect indexes—and things
    are still slow? Or getting slower and slower, even if you didn’t change
    anything?
    There can be many different reasons! More traffic, more usage of CPU, low
    memory…
    Your hardware resources might not be sufficient.
    @louisemeta
    This might be a good moment to scale UP—by
    adding disk, cpu, memory.
    Tip

    View full-size slide

  85. Improving performance /
    scaling UP your server
    Recommend this talk on how to debug,
    & what tools you can use.
    To understand if it’s time to scale up
    your server.
    https://aka.ms/all-goes-wrong-PG-talk
    @louisemeta

    View full-size slide

  86. What if your app is a big
    success?
    What if you
    need more scale
    than you can get from a
    single machine?
    @louisemeta

    View full-size slide

  87. Fixing slow queries / scaling OUT with Citus
    • Open source extension to PostgreSQL (not a fork)
    • Scales out Postgres horizontally
    • Distributes your data + your SQL queries
    • Multi-tenant SaaS, real-time analytics, & HTAP
    • Looks like single-node Postgres to your app
    • Now available as Hyperscale (Citus), part of
    Azure Database for PostgreSQL
    @louisemeta
    https://aka.ms/citus

    View full-size slide

  88. APPLICATION
    SELECT
    FROM
    GROUP BY
    company_id,
    avg(spend) AS avg_campaign_spend
    compaigns
    company_id;
    METADATA
    COORDINATOR NODE
    WORKER NODES
    W1
    W2
    W3 …
    Wn
    SELECT company_id
    sum(spend),
    count(spend) …
    FROM
    campaigns_2009 …
    SELECT company_id
    sum(spend),
    count(spend) …
    FROM
    campaigns_2001 …
    SELECT company_id
    sum(spend),
    count(spend) …
    FROM
    campaigns_2017 …
    How Citus Shards Postgres Across Nodes: Distributed Aggregate

    View full-size slide

  89. https://aka.ms/hyperscale-citus-quickstart
    Fixing slow queries / Quickstart docs for
    Hyperscale (Citus) on Azure

    View full-size slide

  90. Your 8 takeaways
    Your Postgres superpowers
    @louisemeta

    View full-size slide

  91. Your 8 takeaways, the Postgres superpowers
    1. pg_stat_statements to find slow queries
    2. django-debug-toolbar is
    3. Postgres logs for 100% visibility into queries
    4. Loops—be careful
    5. Don’t let your ORM dictate what you SELECT
    6. Replace your old OFFSET pagination by keyset pagination
    7. Indexes—and partial indexes—can help
    8. Scale up—or scale out with Hyperscale (Citus) on Azure
    @louisemeta

    View full-size slide

  92. © Copyright Microsoft Corporation. All rights reserved.
    thank you
    merci
    grazie
    ευχαριστώ
    gracias
    ध"यवाद
    Louise Grandjonc
    • Hyperscale (Citus) Quickstart Docs:
    https://aka.ms/hyperscale-citus-quickstart
    • Talk on Debugging Postgres:
    https://aka.ms/all-goes-wrong-PG-talk
    • DjangoCon talk on Indexes:
    https://aka.ms/django-PG-index-talk
    • GitHub repo for Django Ad App:
    https://aka.ms/django-PG-ad-app
    • Git repo for Citus open source:
    https://aka.ms/citus
    • Join the Citus technical newsletter:
    https://aka.ms/citus-newsletter
    @louisemeta
    @AzureDBPostgres
    @citusdata

    View full-size slide