Building PostgreSQL apps at scale with Hyperscale (Citus) | Microsoft Build 2019 | Craig Kerstiens, Sunil Kamath

Building PostgreSQL apps at scale with Hyperscale (Citus) | Microsoft Build 2019 | Craig Kerstiens, Sunil Kamath

Postgres is a powerful open source database with different features and tooling. If you are new to Postgres, how do you ensure it’s performant and scale? We’ll start with the basics you need to know as an app developer, then dig into how to optimize apps and database to achieve optimal performance. We’ll look at things like cache hit ratio, index hit ratio and sharding with Citus as you scale your apps. This session will also include learnings from an architect of a leading banking institute who were able to successfully build a highly performing and scalable app. If you're coming from a SQL server we'll cover how things differ and the equivalent options of Postgres, including the rich set of Postgres extensions for deployment experience.

024d6a0dd14fb31c804969a57a06dfbe?s=128

Citus Data

May 06, 2019
Tweet

Transcript

  1. 1.
  2. 4.

    Why Postgres? JSONB hstore Arrays GIN index PostGIS Concurrent indexing

    GiST index B-tree index MVCC Time series Safety Proven track record SP-GiST index KNN BRIN index Listen/notify CTEs Window functions Transactional DDL Foreign data wrappers Extensions Fast column addition
  3. 5.

    Focus on your application, not your database Enterprise-grade managed services

    for PostgreSQL On-premises PostgreSQL IaaS Azure VMs with PostgreSQL PaaS Azure Database for PostgreSQL Datacenter management Hardware O/S provision /patching Database provision/ Patch/Scaling Virtualization Data Applications High availability /DR/Backups Datacenter management Hardware Virtualization O/S Database provision/ Patch/Scaling Data Applications High availability /DR/Backups Data Applications Datacenter management Hardware Virtualization O/S Database provision/ Patch/Scaling High availability/ DR/Backups Intelligent performance/security Managed by Microsoft Managed by customer Machine learning capability
  4. 6.

    High performance scale-out with Hyperscale (Citus) Intelligent performance optimization Flexible

    and open Fully managed and secure Single Server Hyperscale (Citus) NEW Build or migrate your workloads with confidence
  5. 7.

    Single Server Hyperscale (Citus) NEW Worry-free PostgreSQL in the cloud

    with an architecture that is built to scale out Example use cases • Scaling PostgreSQL multi-tenant, SaaS applications • Real-time operational analytics • Building high throughput transactional apps Community-based single node PostgreSQL with built-in High Availability Example use cases • Transactional and operational analytics workloads • Apps requiring JSON, geospatial support or full-text search • Greenfield apps built with modern frameworks
  6. 8.
  7. 9.
  8. 10.
  9. 11.
  10. 12.
  11. 13.
  12. 14.
  13. 15.
  14. 16.
  15. 18.

    Cache hit ratio SELECT sum(heap_blks_read) as heap_read, sum(heap_blks_hit) as heap_hit,

    sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio FROM pg_statio_user_tables;
  16. 20.

    Index hit ratio SELECT relname, 100 * idx_scan / (seq_scan

    + idx_scan) percent_of_times_index_used, n_live_tup rows_in_table FROM pg_stat_user_tables WHERE seq_scan + idx_scan > 0 ORDER BY n_live_tup DESC;a
  17. 21.

    Index hit ratio relname | percent_of_times_index_used | rows_in_table ---------------------+-----------------------------+--------------- events

    | 0 | 669917 user_info | 3 | 46718 rollouts | 0 | 34078 favorites | 0 | 3059 authorizations | 0 | 0 delayed_jobs | 23 | 0
  18. 27.
  19. 29.
  20. 30.
  21. 31.
  22. 32.
  23. 33.
  24. 34.
  25. 35.
  26. 36.
  27. 37.
  28. 38.
  29. 40.

    Under the covers data is sharded ž APPLICATION SELECT FROM

    WHERE AND count(*) ads JOIN campaigns ON ads.company_id = campaigns.company_id ads.designer_name = ‘Isaac’ campaigns.company_id = ‘Elly Co’ ; METADATA COORDINATOR NODE WORKER NODES W1 W2 W3 … Wn SELECT … FROM ads_1001, campaigns_2001 … It’s logical to place shards containing related rows of related tables together on the same nodes Join queries between related rows can reduce the amount of data sent over the network
  30. 41.

    Create a table CREATE TABLE github_events ( event_id bigint, event_type

    text, event_public boolean, repo_id bigint, payload jsonb, repo jsonb, user_id bigint, org jsonb, created_at timestamp ); CREATE TABLE github_users ( user_id bigint, url text, login text, avatar_url text, gravatar_id text, display_login text );
  31. 42.

    # \dt List of relations Schema | Name | Type

    | Owner --------+----------------------+-------+------- public | github_events_102011 | table | citus public | github_events_102015 | table | citus public | github_events_102019 | table | citus public | github_events_102023 | table | citus public | github_events_102027 | table | citus public | github_events_102031 | table | citus public | github_events_102035 | table | citus public | github_events_102039 | table | citus public | github_users_102043 | table | citus public | github_users_102047 | table | citus public | github_users_102051 | table | citus public | github_users_102055 | table | citus public | github_users_102059 | table | citus public | github_users_102063 | table | citus public | github_users_102067 | table | citus public | github_users_102071 | table | citus (16 rows) # \dt List of relations Schema | Name | Type | Owner --------+----------------------+-------+------- public | github_events_102009 | table | citus public | github_events_102013 | table | citus public | github_events_102017 | table | citus public | github_events_102021 | table | citus public | github_events_102025 | table | citus public | github_events_102029 | table | citus public | github_events_102033 | table | citus public | github_events_102037 | table | citus public | github_users_102041 | table | citus public | github_users_102045 | table | citus public | github_users_102049 | table | citus public | github_users_102053 | table | citus public | github_users_102057 | table | citus public | github_users_102061 | table | citus public | github_users_102065 | table | citus public | github_users_102069 | table | citus (16 rows) Worker Node 1 Worker Node 2
  32. 43.
  33. 45.
  34. 46.
  35. 47.
  36. 48.

    Flexible and open High performance scale-out with Hyperscale (Citus) Fully

    managed and secure Intelligent performance optimization • • • • • •