Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Schema Evolution - The Hard Parts

Gwen (Chen) Shapira
April 22, 2024
21

Schema Evolution - The Hard Parts

Schema evolution, also known as “schema migrations,” is a core part of application development and is considered one of the riskiest - with the potential to bring down entire production systems with one innocent mistake.

Schema evolution becomes especially tricky for SaaS, where the same application serves many tenants. Whether the tenants share the same Postgres databases and schema or are more isolated, schema changes must be coordinated across tenants, application servers, and Postgres schemas.

In this talk, we’ll discuss the challenges of schema evolution in each of the common patterns developers use to implement multi-tenant applications. We’ll explore existing solutions and share how we are solving this problem at Nile, a Serverless Postgres built for SaaS.

Gwen (Chen) Shapira

April 22, 2024
Tweet

Transcript

  1. Nice to meet you! I’m Gwen Shapira - Co-founder of

    Nile: Serverless Postgres for Modern SaaS - Previously: Cloud Native Kafka lead @ Confluent - Previously: Data architect (Hadoop, MySQL, Oracle) - Apache Kafka PMC - Wrote books, tweet a lot @gwenshap 02
  2. • Schema migrations today • Process problems • Lock problems

    • Multi-tenancy problems • and how we solved them • Logical replication problems
  3. Problem 1: Failed migration leftovers: ai_in_eu=> create table cats (

    id serial primary key, name text not null, age int not null, weight float not null ); CREATE TABLE ai_in_eu=> create index on cats (nickname); ERROR: column "nickname" does not exist You now have a table with no index…
  4. Problems 2 + 3: Schema Mismatches & Incompatible changes mydb=>

    CREATE INDEX ON cats (nickname); ERROR: column "nickname" does not exist mydb=> ALTER TABLE cats RENAME name to nickname; But it existed in pre-prod? Will the app handle this?
  5. Solved by locking This works only in databases where DDL

    doesn’t cause transactions to commit. Like Postgres.
  6. Conflicts with SELECT INSERT / UPDATE / DELETE CREATE INDEX

    CONCUR CREATE INDEX ALTER / DROP / TRUNCATE TABLE SELECT ❌ INSERT / UPDATE / DELETE ❌ ❌ CREATE INDEX CONCUR ❌ ❌ ❌ CREATE INDEX ❌ ❌ ❌ ALTER / DROP / TRUNCATE TABLE ❌ ❌ ❌ ❌ ❌
  7. It can get worse BEGIN ALTER TABLE super_busy ADD COLUMN

    ok boolean; ALTER TABLE … ADD CONSTRAINT FOREIGN KEY REFERENCES … CREATE UNIQUE INDEX ON very_large_table … CREATE TABLE … ALTER TABLE huge_table … ADD CONSTRAINT CHECK … UPDATE TABLE also_busy set X = f(x,y,z); -- wait, getting coffee COMMIT
  8. Don’t do: Do this instead: BEGIN ALTER TABLE … ADD

    CONSTRAINT FOREIGN KEY REFERENCES ALTER TABLE huge_table … ADD CONSTRAINT CHECK … -- more stuff COMMIT BEGIN ALTER TABLE … ADD CONSTRAINT FOREIGN KEY REFERENCES … NOT VALID ALTER TABLE huge_table … ADD CONSTRAINT CHECK … NOT VALID -- more stuff COMMIT BEGIN ALTER TABLE … VALIDATE CONSTRAINT COMMIT
  9. Don’t do: Do this instead: BEGIN ALTER TABLE … UPDATE

    huge_table SET … COMMIT BEGIN ALTER TABLE … COMMIT BEGIN UPDATE huge_table SET … COMMIT
  10. Don’t do: Do this instead: BEGIN ALTER TABLE t1 …

    RENAME ALTER TABLE t2 … RENAME COMMIT -- Retry on failure: BEGIN SET LOCAL lock_timeout=100s; LOCK TABLE … IN ACCESS EXCLUSIVE MODE LOCK TABLE … IN ACCESS EXCLUSIVE MODE ALTER TABLE … ALTER TABLE … COMMIT
  11. Multi-tenancy SaaS products usual have the same application stack across

    all customers. The data for the customers can be in a schema / DB for each Tenants. Or the data can be in shared schema, sometimes sharded for scale. 02
  12. Migrating lots of Tenants schemas Starts easy… Migration container Tenant

    1 Tenant 2 schemas=(Tenants1 Tenants2) for schema in "${schemas[@]}" do migrate_schema ${schema} done
  13. Tenants Tenants get_db_list() | xargs -P 5 -n 1 migrate_schema

    Tenants Tenants Tenants Tenants Tenants Tenants Tenants Tenants Tenants Tenants Tenants Tenants Tenants Tenants Tenants Tenants Tenants Tenants
  14. Tenants Tenants Tenants Tenants Tenants Tenants Tenants Tenants Tenants Tenants

    Tenants Tenants Tenants Tenants Tenants Tenants Tenants Tenants ✅ You apps, reports, tools… all have to support all scenarios. ❌ ✅ ✅ ✅ ✅ ✅ ❌ 🕠 🕠 ✅ ✅ ❌ 🕠
  15. Coordinating DDLs. The Nile way. It may look like an

    overkill. But Nile exists to solve hard database problems for multi-tenant applications. The goal is to give an experience identical to a single database with same guarantees. While providing isolation when needed. These familiar guarantees make the system simple to reason about and creates a great developer experience.
  16. Tenants Tenants Tenants Tenants Tenants Tenants Migration script Migration transaction

    coordinator ACQUIRE LOCKS ([locks]) CREATE TABLE ACQUIRE LOCKS ([locks]) ALTER TABLE
  17. Tenants Tenants Tenants Tenants Tenants Tenants Migration script Migration transaction

    coordinator SELECT acquire_locks([lock info]) CREATE TABLE SELECT acquire_locks([lock info]) ALTER TABLE
  18. There is also the opposite problem Where multiple tenants share

    the schema… tenant_id employee_id email salary 1 2 [email protected] <private> 1 1 [email protected] <private> 2 5456312 [email protected] <private> 3 4354622 [email protected] <private>
  19. - What if one tenant has constraint conflict? - What

    if one tenant is not ok with maintenance window? 02
  20. SaaS Migrating multi-tenant databases require both coordination and isolation Logical

    Replication It doesn’t do DDLs. There are solutions, but one day someone will solve it right. Do it right DDL is full of problems that you don’t see when they are small and then blow up. To sum it up Schema Migrations are deceptive They look simple and get tricky fast Today Most companies integrate migrations in CI/CD and use migration tools. Some problems are solved with simple tools - transactions, and CI/CD Locks Process Locks are surprisingly tricky and nuanced. Break migrations into steps and test on “realistic” env.