Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Building PostgreSQL apps at scale with Hyperscale (Citus) | Microsoft Build 2019 | Craig Kerstiens, Sunil Kamath

Building PostgreSQL apps at scale with Hyperscale (Citus) | Microsoft Build 2019 | Craig Kerstiens, Sunil Kamath

Postgres is a powerful open source database with different features and tooling. If you are new to Postgres, how do you ensure it’s performant and scale? We’ll start with the basics you need to know as an app developer, then dig into how to optimize apps and database to achieve optimal performance. We’ll look at things like cache hit ratio, index hit ratio and sharding with Citus as you scale your apps. This session will also include learnings from an architect of a leading banking institute who were able to successfully build a highly performing and scalable app. If you're coming from a SQL server we'll cover how things differ and the equivalent options of Postgres, including the rich set of Postgres extensions for deployment experience.

Citus Data

May 06, 2019
Tweet

More Decks by Citus Data

Other Decks in Technology

Transcript

  1. View Slide

  2. Craig Kerstiens
    Sunil Kamath
    Mathew Stokes

    View Slide

  3. PostgreSQL is more popular than ever
    loved
    wanted
    https://insights.stackoverflow.com/survey/2019?utm_source=so-owned&utm_medium=blog&utm_campaign=dev-survey-2019&utm_content=launch-blog
    https://db-engines.com/en/blog_post/76
    DBMS
    of the Year

    View Slide

  4. Why Postgres?
    JSONB
    hstore
    Arrays
    GIN index
    PostGIS
    Concurrent indexing
    GiST index
    B-tree index
    MVCC
    Time series
    Safety
    Proven track record
    SP-GiST index
    KNN
    BRIN index
    Listen/notify
    CTEs
    Window functions
    Transactional DDL
    Foreign data wrappers
    Extensions
    Fast column addition

    View Slide

  5. Focus on your application,
    not your database
    Enterprise-grade managed services for
    PostgreSQL
    On-premises
    PostgreSQL
    IaaS
    Azure VMs with
    PostgreSQL
    PaaS
    Azure Database for
    PostgreSQL
    Datacenter
    management
    Hardware
    O/S provision /patching
    Database provision/
    Patch/Scaling
    Virtualization
    Data
    Applications
    High availability
    /DR/Backups
    Datacenter
    management
    Hardware
    Virtualization
    O/S
    Database provision/
    Patch/Scaling
    Data
    Applications
    High availability
    /DR/Backups
    Data
    Applications
    Datacenter
    management
    Hardware
    Virtualization
    O/S
    Database provision/
    Patch/Scaling
    High availability/
    DR/Backups
    Intelligent
    performance/security
    Managed by Microsoft
    Managed by customer
    Machine learning capability

    View Slide

  6. High performance scale-out
    with Hyperscale (Citus)
    Intelligent performance
    optimization
    Flexible and open
    Fully managed and secure
    Single Server
    Hyperscale (Citus) NEW
    Build or migrate your workloads with confidence

    View Slide

  7. Single Server Hyperscale (Citus) NEW
    Worry-free PostgreSQL in the cloud with an architecture
    that is built to scale out
    Example use cases
    • Scaling PostgreSQL multi-tenant, SaaS applications
    • Real-time operational analytics
    • Building high throughput transactional apps
    Community-based single node PostgreSQL with built-in High
    Availability
    Example use cases
    • Transactional and operational analytics workloads
    • Apps requiring JSON, geospatial support or full-text search
    • Greenfield apps built with modern frameworks

    View Slide

  8. View Slide

  9. View Slide

  10. View Slide

  11. View Slide

  12. View Slide

  13. View Slide

  14. View Slide

  15. View Slide

  16. View Slide

  17. Cache rules everything
    around me

    View Slide

  18. Cache hit ratio
    SELECT
    sum(heap_blks_read) as heap_read,
    sum(heap_blks_hit) as heap_hit,
    sum(heap_blks_hit) / (sum(heap_blks_hit) +
    sum(heap_blks_read)) as ratio
    FROM
    pg_statio_user_tables;

    View Slide

  19. Cache hit ratio
    name | ratio
    ----------------+------------------------
    cache hit rate | 0.99

    View Slide

  20. Index hit ratio
    SELECT
    relname,
    100 * idx_scan / (seq_scan + idx_scan) percent_of_times_index_used,
    n_live_tup rows_in_table
    FROM
    pg_stat_user_tables
    WHERE
    seq_scan + idx_scan > 0
    ORDER BY
    n_live_tup DESC;a

    View Slide

  21. Index hit ratio
    relname | percent_of_times_index_used | rows_in_table
    ---------------------+-----------------------------+---------------
    events | 0 | 669917
    user_info | 3 | 46718
    rollouts | 0 | 34078
    favorites | 0 | 3059
    authorizations | 0 | 0
    delayed_jobs | 23 | 0

    View Slide

  22. Table cache hit ratio target
    > 99%

    View Slide

  23. https://github.com/savjani/postgres-
    assets/blob/master/SQL%20Notebooks/Postgres_database_health_check_notebook.ipynb

    View Slide

  24. Customized
    recommendations
    Performance
    troubleshooting
    Data visualization
    Intelligent performance

    View Slide

  25. azure_sys
    azure_mai
    ntenance
    postgres
    UserDB

    View Slide

  26. azure_sys
    azure_mai
    ntenance
    postgres
    UserDB

    View Slide

  27. View Slide

  28. I HAVE NO IDEA

    View Slide

  29. View Slide

  30. View Slide

  31. View Slide

  32. View Slide

  33. View Slide

  34. View Slide

  35. View Slide

  36. View Slide

  37. View Slide

  38. 9
    TB

    View Slide

  39. Stay current with
    PostgreSQL innovations
    Blazing performance Simplified infrastructure
    Scale out horizontally

    View Slide

  40. Under the covers data is sharded
    ž APPLICATION
    SELECT
    FROM
    WHERE
    AND
    count(*)
    ads JOIN campaigns ON
    ads.company_id = campaigns.company_id
    ads.designer_name = ‘Isaac’
    campaigns.company_id = ‘Elly Co’ ;
    METADATA
    COORDINATOR NODE
    WORKER NODES
    W1
    W2
    W3 …
    Wn
    SELECT …
    FROM
    ads_1001,
    campaigns_2001

    It’s logical to place shards containing related rows of related tables together on the same nodes
    Join queries between related rows can reduce the amount of data sent over the network

    View Slide

  41. Create a table
    CREATE TABLE github_events
    (
    event_id bigint,
    event_type text,
    event_public boolean,
    repo_id bigint,
    payload jsonb,
    repo jsonb,
    user_id bigint,
    org jsonb,
    created_at timestamp
    );
    CREATE TABLE github_users
    (
    user_id bigint,
    url text,
    login text,
    avatar_url text,
    gravatar_id text,
    display_login text
    );

    View Slide

  42. # \dt
    List of relations
    Schema | Name | Type | Owner
    --------+----------------------+-------+-------
    public | github_events_102011 | table | citus
    public | github_events_102015 | table | citus
    public | github_events_102019 | table | citus
    public | github_events_102023 | table | citus
    public | github_events_102027 | table | citus
    public | github_events_102031 | table | citus
    public | github_events_102035 | table | citus
    public | github_events_102039 | table | citus
    public | github_users_102043 | table | citus
    public | github_users_102047 | table | citus
    public | github_users_102051 | table | citus
    public | github_users_102055 | table | citus
    public | github_users_102059 | table | citus
    public | github_users_102063 | table | citus
    public | github_users_102067 | table | citus
    public | github_users_102071 | table | citus
    (16 rows)
    # \dt
    List of relations
    Schema | Name | Type | Owner
    --------+----------------------+-------+-------
    public | github_events_102009 | table | citus
    public | github_events_102013 | table | citus
    public | github_events_102017 | table | citus
    public | github_events_102021 | table | citus
    public | github_events_102025 | table | citus
    public | github_events_102029 | table | citus
    public | github_events_102033 | table | citus
    public | github_events_102037 | table | citus
    public | github_users_102041 | table | citus
    public | github_users_102045 | table | citus
    public | github_users_102049 | table | citus
    public | github_users_102053 | table | citus
    public | github_users_102057 | table | citus
    public | github_users_102061 | table | citus
    public | github_users_102065 | table | citus
    public | github_users_102069 | table | citus
    (16 rows)
    Worker Node 1 Worker Node 2

    View Slide

  43. View Slide

  44. Hyperscale (Citus) helps ASB
    onboard customers 20x faster

    View Slide

  45. View Slide

  46. View Slide

  47. View Slide

  48. Flexible and open
    High performance scale-out
    with Hyperscale (Citus)
    Fully managed and secure
    Intelligent performance
    optimization






    View Slide

  49. More OSS DBs at Data Showcase Demo

    View Slide