Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

Presentation Overview Define terms Rails, SaaS, multi-tenancy, data isolation, sharding Work with a multi-tenant SaaS B2B Rails and PostgreSQL app Options for scaling up with traditional PostgreSQL Recently added capabilities in Active Record and Citus Scale out options beyond PostgreSQL SaaS on Rails on Postgres — andyatkinson.com Slide 2 of 34

Slide 3

Slide 3 text

What is Rails? Tech Created Summary Open Source Language Ruby 1995 General purpose programming language C Ruby on Rails, aka "Rails" 2004 Web App Development Framework Ruby Active Record Framework within Ruby on Rails. ORM. Schema Management. Multi-DBs. Horizontal Sharding. SaaS on Rails on Postgres — andyatkinson.com Slide 3 of 34

Slide 4

Slide 4 text

Scale Out Overview Single primary PostgreSQL capabilities Active Record Horizontal Sharding Citus distributed Postgres sharding Database Sharding Explained - Mahdi Yusuf SaaS on Rails on Postgres — andyatkinson.com Slide 4 of 34

Slide 5

Slide 5 text

github.com/andyatkinson/rideshare Ruby on Rails app with PostgreSQL Code samples for this presentation: Pull Request #198 SaaS on Rails on Postgres — andyatkinson.com Slide 5 of 34

Slide 6

Slide 6 text

SaaS Definitions SaaS is a software delivery model SaaS apps require architectural planning to process large amounts of data while offering good performance and availability Customers are businesses (B2B) or consumers (B2C) B2B SaaS might have dozens, hundreds, or thousands of customers B2C might have thousands or millions SaaS on Rails on Postgres — andyatkinson.com Slide 6 of 34

Slide 7

Slide 7 text

SaaS Apps Common Needs Authentication Profiles Plans & Subscriptions API Integrations Accounts & Teams SaaS on Rails on Postgres — andyatkinson.com Slide 7 of 34

Slide 8

Slide 8 text

SaaS on Rails Bullet Train: https://bullettrain.co (Open Source) JumpStart Rails: https://jumpstartrails.com Business Class: https://businessclasskit.com SaaS on Rails on Postgres — andyatkinson.com Slide 8 of 34

Slide 9

Slide 9 text

Adding Multi-Tenancy For our multi-tenancy, our tenants will be "Companies" Table data related to multiple companies will be stored in the same table We'll identify company data with the company_id foreign key column SaaS on Rails on Postgres — andyatkinson.com Slide 9 of 34

Slide 10

Slide 10 text

Without multi-tenancy Provision a new database, or new instance per tenant Greater isolation More infra costs More maintenance burden SaaS on Rails on Postgres — andyatkinson.com Slide 10 of 34

Slide 11

Slide 11 text

Designing multi-tenancy for Rideshare CREATE TABLE companies ( id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, name text UNIQUE NOT NULL ); CREATE TABLE trips ( id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, company_id INT NOT NULL REFERENCES companies(id) ); SaaS on Rails on Postgres — andyatkinson.com Slide 11 of 34

Slide 12

Slide 12 text

Gain efficiencies with multi-tenancy Multi-tenant data in fewer tables, schemas, databases, and instances Gain flexibility for moving or copying tenant data Accommodate external legal requirements for tenant data Besides easier identification, we can add some security SaaS on Rails on Postgres — andyatkinson.com Slide 12 of 34

Slide 13

Slide 13 text

PostgreSQL Row Level Security Create ROLE objects per company Within a multi-tenant table, use the company_id column to target rows Limit access to company_id rows to matching ROLE objects using a POLICY CREATE USER company_one; CREATE POLICY select_policy ON trips FOR SELECT USING (company_id = current_company_id()); SaaS on Rails on Postgres — andyatkinson.com Slide 13 of 34

Slide 14

Slide 14 text

What are some database challenges we'll run into? We can't allocate our instance resources like CPU cores, memory, IO operations to tenants We have a shared set of database connections We have a single buffer cache ( shared_buffers ) Observability for tenant operations may be more challenging We may need to significantly over-provision SaaS on Rails on Postgres — andyatkinson.com Slide 14 of 34

Slide 15

Slide 15 text

Optional: Composite Primary Keys We currently have traditional (surrogate) primary keys defined A composite primary key for trips would include company_id , and allow us to enforce unique data for the tenant CPK support in Active Record 7.1 (Released 2023) SaaS on Rails on Postgres — andyatkinson.com Slide 15 of 34

Slide 16

Slide 16 text

Migrating to CPKs -- Single column primary key CREATE TABLE trips ( id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, company_id INT NOT NULL ); -- CPK CREATE TABLE trips ( id BIGINT GENERATED ALWAYS AS IDENTITY NOT NULL, company_id INT NOT NULL, CONSTRAINT trips_pkey_cpk PRIMARY KEY (id, company_id) ); SaaS on Rails on Postgres — andyatkinson.com Slide 16 of 34

Slide 17

Slide 17 text

PostgreSQL Table Partitioning in Multi-Tenant SaaS With the LIST type, create a partition per tenant ( company_id ) When adding a new tenant, create a corresponding new partition All queries must include the company_id partition key Caveats: UNIQUE constraints, increased planning time Big benefit: If companies leave platform, efficient detachment and archival of their data is possible* *Using DETACH CONCURRENTLY available since 14 (2021) SaaS on Rails on Postgres — andyatkinson.com Slide 17 of 34

Slide 18

Slide 18 text

PostgreSQL SCHEMA Per Tenant For new companies, create new SCHEMA objects as a grouping method for their tables Limit access to schemas to their corresponding tenant SaaS on Rails on Postgres — andyatkinson.com Slide 18 of 34

Slide 19

Slide 19 text

Multi-tenant data isolation Ruby gems acts_as_tenant gem helps with row-level multi-tenant data Apartment gem offers schema-based data isolation SaaS on Rails on Postgres — andyatkinson.com Slide 19 of 34

Slide 20

Slide 20 text

Data Isolation Database Schema Row Table Partitioning Per-tenant Schema Apartment gem acts_as_tenant gem Row Level Security SaaS on Rails on Postgres — andyatkinson.com Slide 20 of 34

Slide 21

Slide 21 text

Scaling beyond a single PostgreSQL instance Using secondary physical or logical replication instances, we can distribute reads, or even perform Active Active (BDR, avoids PK conflicts). However, there's no native "multi-primary" or "sharding" capability in PostgreSQL. WIP PostgreSQL Sharding, FDW This means row-level or schema-level sharding isn't an option SaaS on Rails on Postgres — andyatkinson.com Slide 21 of 34

Slide 22

Slide 22 text

Multi-Tenancy and Sharding With Active Record Active Record gained the Horizontal Sharding in 6.1 (2020) Using Active Record Horizontal Sharding, we can assign tenants to "shards" ( company_one ) Rails app identifies tenant, routes requests to shards The shard is a same-schema per-tenant database, exclusively with tenant data Shards (databases) can run on same instance, or on separate server instances for greater distribution and scalability SaaS on Rails on Postgres — andyatkinson.com Slide 22 of 34

Slide 23

Slide 23 text

class ShardRecord < ApplicationRecord self.abstract_class = true connects_to shards: { company_one: { writing: :primary_company_one, reading: :primary_company_one_replica } } end Rails.application.configure do config.active_record.shard_selector = { lock: true } config.active_record.shard_resolver = ->(request) { Company.find_by!(subdomain: request.subdomain).shard } end SaaS on Rails on Postgres — andyatkinson.com Slide 23 of 34

Slide 24

Slide 24 text

Citus and Sharding Citus is a distributed Postgres deployment for horizontal scaling One coordinator node, and 1 or more worker nodes Shards can be placed on workers, and "re-sharding" supported Open source, available as a (super) extension, hosted offerings (Microsoft Azure Cosmos DB for PostgreSQL), Crunchy Bridge (AWS, Azure, GCP) SaaS on Rails on Postgres — andyatkinson.com Slide 24 of 34

Slide 25

Slide 25 text

Citus Distributed Tables Citus 12.1 supports IDENTITY columns Row-level sharding, company_id , similar shards can be on same nodes with colocation CREATE TABLE companies ( id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, name text UNIQUE NOT NULL ); CREATE TABLE trips ( id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, company_id INT NOT NULL ); SaaS on Rails on Postgres — andyatkinson.com Slide 25 of 34

Slide 26

Slide 26 text

Citus Reference and Distributed Tables Preparing Reference Tables (replicated to all workers) Changing PKs and FKs The activerecord-multi-tenant Ruby gem adds helpers -- Prereq: Drop FKs referring to reference tables SELECT create_reference_table('users'); SELECT create_reference_table('locations'); -- Prereq: Drop single-column PKs SELECT create_distributed_table('trip_requests', 'company_id'); SELECT create_distributed_table('trips', 'company_id'); SaaS on Rails on Postgres — andyatkinson.com Slide 26 of 34

Slide 27

Slide 27 text

Supply distribution column so query goes to correct shard Trip.where(id: 1).select(:id) # slow SELECT id FROM rideshare.trips ... Trip.where(id: 1, company_id: 1).select(:id) # fast SELECT id FROM rideshare.trips_102114 ... SaaS on Rails on Postgres — andyatkinson.com Slide 27 of 34

Slide 28

Slide 28 text

Citus Schema-based Sharding From Citus 12 (2023) onward, schema-based sharding is available Distribution column is not needed Create schema per tenant, schemas become shards Tables in same schema co-located on same node, so JOIN operations and FKs work SET citus.enable_schema_based_sharding TO ON; CREATE SCHEMA company_one; CREATE SCHEMA company_two; SET search_path TO company_one; SaaS on Rails on Postgres — andyatkinson.com Slide 28 of 34

Slide 29

Slide 29 text

Image source: Citus Blog SaaS on Rails on Postgres — andyatkinson.com Slide 29 of 34

Slide 30

Slide 30 text

Sharding App-level Outside app Schema Row Active Record Horizontal Sharding Citus row-based Citus schema-based SaaS on Rails on Postgres — andyatkinson.com Slide 30 of 34

Slide 31

Slide 31 text

Wrapping Up and Advice Single primary PostgreSQL w/ replication can take you far! Retain PK, FK, constraints, JOIN operations Collect usage-metrics for shared resources like CPU, Memory, connections, and IO Identify, measure performance, and plan for future states for high growth tables "Costs" are infrastructure costs and engineering time SaaS on Rails on Postgres — andyatkinson.com Slide 31 of 34

Slide 32

Slide 32 text

Additional Considerations Consider designing for multi-tenancy even just to make tenant data easier to identify Check your PostgreSQL extension support Consider how app-level vs. Postgres-level solutions match your team skill sets and expected maintenance plans Your app-defined schema will evolve. With distributed databases, incremental changes are more complex. SaaS on Rails on Postgres — andyatkinson.com Slide 32 of 34

Slide 33

Slide 33 text

Resources Database Sharding Explained - Mahdi Yusuf Active Record Horizontal Sharding Freshworks: Horizontal sharding in a multi-tenant app with Rails 6.1 - Ritikesh G Scale Out Multi-Tenant Apps based on Ruby on Rails - Lukas Fittl Ruby on Rails Scale Out Citus Documentation Designing your SaaS Database for Scale with Postgres - Ozgun Erdogan Schema-based sharding comes to PostgreSQL with Citus - Onur Tirtir Adding Postgres 16 support to Citus 12.1, plus schema-based sharding improvements - Naisila Puka Active Active in Postgres 16 - Brian Pace SaaS on Rails on Postgres — andyatkinson.com Slide 33 of 34

Slide 34

Slide 34 text

Thanks! x.com/andatki linkedin.com/in/andyatkinson pgrailsbook.com andyatkinson.com SaaS on Rails on Postgres — andyatkinson.com