Scaling Databases for SaaS Applications with Postgres

024d6a0dd14fb31c804969a57a06dfbe?s=47 Citus Data
November 07, 2016

Scaling Databases for SaaS Applications with Postgres

Slides from our webcast on scaling databases for
SaaS applications with Postgres. You can view the video from the webcast at


Citus Data

November 07, 2016


  1. Scaling Databases for SaaS Applications Ozgun Erdogan Lukas Fittl

  2. Webinar Outline 1. Introduction – Why scale your SaaS (multi-tenant)

    database? – When is the right time to scale? 2. Scaling your SaaS database – Design patterns in scaling SaaS databases – Example architecture 3. PostgreSQL and Citus overview – Semi-structured data types – Key database features for multi-tenancy
  3. What is a multi-tenant database • If you’re building a

    B2B application, you already have the notion of tenancy built into your data model • B2B applications that serve other tenants / accounts / organizations use multi-tenant dbs – Physical service providers. For example, food services to other businesses – Digital service providers: Advertising, marketing, and sales automation
  4. Why scale multi-tenant apps • Multi-tenant databases were commonplace in

    on-premises • SaaS applications introduced the motivation to scale further – Cloud enables serving many smaller tenants – Instead of dozens of tenants, new SaaS apps reach to and handle 1K-100K tenants – Storage is cheap: You can store events or track a field’s history
  5. When is the right time to scale out • Scaling

    up is easier than scaling out. If you can throw more hardware at the problem, that’s the easiest way to scale. • Also tune your database: • When is the right time to start thinking about scaling out?
  6. Heuristic #1 on when to scale • Your SaaS business

    is growing, you’re on the second largest instance type available on your cloud / infrastructure provider – Example tipping points: We signed a big customer, and now all our customers are hurting. Or one-off operational queries are bringing the database to a halt
  7. Heuristic #2 • PostgreSQL’s autovacuum daemon can’t catch up with

    our write traffic • Make sure you changed your default vacuum settings to be more aggressive (e.g. autovacuum_vacuum_scale_factor) • You’ve tuned these common settings, improved performance, and you’re still experiencing spikes
  8. Heuristic #3 • Databases will cache recent and frequently accessed

    data in memory for you • The database will track how often you use the cache and hit disk • For OLTP applications, most of your working set should be fulfilled from the cache – Look to serve 99% from the cache
  9. Heuristic #3 – cache hit ratio query SELECT 'index hit

    rate' as name, (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read) as ratio FROM pg_statio_user_indexes UNION ALL SELECT 'cache hit rate' as name, case sum(idx_blks_hit) when 0 then 'NaN'::numeric else to_char((sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read), '99.99')::numeric end as ratio FROM pg_statio_user_indexes) name | ratio ----------------+------------------------ cache hit rate | 0.99
  10. Plan ahead • Plan ahead, optimize queries, and don’t wait

    until there isn’t another option • When it’s the right time to scale, what are your options for your multi-tenant database?
  11. Scaling Multi-tenant Databases • Three high level options: 1. Create

    one database per tenant 2. Create one schema per tenant 3. Have all tenants share the same tables (and partition / shard tables)
  12. Create one database per tenant • Create a separate database

    for each tenant 1. Isolation of tenants: Are you giving your tenants direct database access? 2. More predictable compliance story: Certain industries such as healthcare and finance have regulatory requirements
  13. Create one schema per tenant • Create a separate namespace

    (schema) for each tenant 1. Isolate data / queries for one tenant in a particular schema 2. Makes better use of resources than the “one database per tenant” model
  14. Have all tenants share the same tables • Have all

    tenants share the same tables by adding a tenant_id column (and shard) 1. Scales to 1K-100K tenants through better resource sharing 2. Simplifies operations and maintenance
  15. Rule of thumb (simplified) • Each design option can address

    questions around scale and isolation with enough effort. What’s the primary criteria for you app? • If you’re building for scale: Have all tenants share the same table(s) • If you’re building for isolation: Create one database per tenant
  16. “Shared tenants” architecture

  17. Benefit: Resource Sharing • If you create a separate database

    / schema for each tenant, you need to allocate resources to that database. • Hardware: disk, memory, cpu, and network management • Database software: shared_buffers, connection counts, background processes, WAL logs • ORM software: Cached information about databases / schemas
  18. Google F1 – An Example • Google F1 is an

    example that demonstrates a multi-tenant database. • AdWords serves more than 1M tenants. • Benefits of building on an RDBMS: 1. SQL 2. Transactions 3. Joins – avoids data duplication 4. Primary and foreign key constraints
  19. Data modeling for multi-tenant

  20. Key Insight • If you shard your tables on their

    primary key (in the relational model), then distributed transactions, joins, and foreign key constraints become expensive. • Model your tables using the hierarchical database model by adding tenant_id. This colocates data for the same tenant together and dramatically reduces cost.
  21. Does everything fit into hierarchical? • What happens if I

    have a table that doesn’t fit into the hierarchical database model? 1. Large table(s) outside the hierarchy: Orgs and users that are shared across orgs – Shard on different column and don’t join 2. Small table(s) that are common to hierarchy – Denormalize into related large tables – Create reference table replicated across all
  22. How does largest tenant impact scale? • What percentage of

    the total data size belongs to the largest tenant? • Multi-tenant databases usually follow a Zipf / Power law distribution – 10 tenants: Largest tenant holds 60% of data (*) – 10K tenants: Largest tenant holds 2% of data (*) • Look at your data’s distribution to make informed scaling decisions
  23. Benefit: Ease of maintenance • Your database grows with your

    SaaS application. • Schema changes (Alter Table … Add Column) and index creations (Create Index) are common operations. • What happens when you have 10K tenants and you changed the schema for 5,000 of those tenants and observed a failure?
  24. Transition Slide

  25. Scaling with Citus

  26. PostgreSQL Database PostgreSQL Client

  27. PostgreSQL Client

  28. Citus Node PostgreSQL Client Citus Node Citus Node Citus Node

    Citus Node Citus Node
  29. Citus Node Citus Node Citus Node Citus Node Citus Node

    Citus Node Citus Node Shard Shards are PostgreSQL tables on a worker node
  30. Terminology Distributed Table Table that is distributed across nodes by

    splitting the data set into shards Reference Table JOINs with distributed tables (e.g. “timezones”) Coordinator Table Small tables that are independent of a tenant (e.g. users)
  31. Co-Located Tables

  32. Co-Located Tables All data for a single tenant lives on

    the same node Shard every table by tenant_id with the same shard count JOINs between co-located tables just work JOINs between unrelated tables or different tenants will error out
  33. How much of my application do I need to change?

  34. < 500 lines of code to change

  35. Migrating to Citus ORMs tenant_id in SELECT/UPDATE/DELETE statements master_create_distributed_table() calls

    in migrations Cross-Tenant Queries Possible, but have restricted SQL coverage (e.g. no CTEs) Run your test suite / CI environment against Citus
  36. Transactions For SELECT/UPDATE/INSERT/DELETE: Work the same as on a single-node

    Postgres, as long as you access a single shard For DDL (e.g. migrations) Work across shards, with some restrictions (e.g. DROP TABLE can’t run in a transaction)
  37. Semi-structured Datatypes PostgreSQL JSONB Old multi-tenant architecture (e.g. Salesforce)

  38. Deploying Citus Community Edition Citus Cloud Enterprise Edition

  39. High Availability Streaming Replication Citus Built-In Replication

  40. Scaling Up r3.xlarge r3.4xlarge

  41. Scaling Out

  42. Shard Rebalancer

  43. One-Click Scaling with Citus Cloud

  44. Thanks! Q&A