Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Working with Huge Databases and Tables

Working with Huge Databases and Tables

Alternate Title: Zero-downtime Migration on Big Database

Presented at [TOKYO Rails #39](https://www.meetup.com/tokyo-rails/events/259500572/) on April 16, 2019.

Prem Sichanugrist

April 16, 2019
Tweet

More Decks by Prem Sichanugrist

Other Decks in Programming

Transcript

  1. Why Zero-Downtime Migrations? • Global service with users around the

    world ◦ Users using the service 27/4 from different time zones ◦ We want our service to always be available
  2. Why Zero-Downtime Migrations? • Global service with users around the

    world ◦ Users using the service 27/4 from different time zones ◦ We want our service to always be available • Normal migration process locks table ◦ We have a lot of user data (~300 GB on disk, biggest table has 500M rows) ◦ Normal migration locks the whole table and could take service down
  3. Why Zero-Downtime Migrations? • Global service with users around the

    world ◦ Users using the service 27/4 from different time zones ◦ We want our service to always be available • Normal migration process locks table ◦ We have a lot of user data (~300 GB on disk, biggest table has 500M rows) ◦ Normal migration locks the whole table and could take service down • We want to be able to do continuous deployment ◦ No “code freeze”, every merges to master goes to production right away ◦ Normal migration will cause deployment disruption throughout the day
  4. Percona Online Schema Change • Also known as: pt-osc •

    Part of Percona Toolkit (http://www.percona.com/software/) ◦ Collection of advanced command-line tools used by Percona support staff — README.md
  5. Percona Online Schema Change • Also known as: pt-osc •

    Part of Percona Toolkit (http://www.percona.com/software/) ◦ Collection of advanced command-line tools used by Percona support staff — README.md • Provide command to perform online schema change ◦ Create new temporary table with updated schema ◦ Copy data over to the new table ◦ Drop old table, and rename temporary table to old table’s name (atomic)
  6. Percona Online Schema Change pt-online-schema-change \ --alter "ADD COLUMN column_name

    VARCHAR(255)" \ D=database_name,t=table_name https://www.percona.com/doc/percona-toolkit/LATEST/pt-online-schema-change.html
  7. Departure • Hooks into Active Record's connection adapter syntax ◦

    No need to write custom migration! • Construct arguments for pt-online-schema-change ◦ Then runs pt-online-schema-change to perform the migration
  8. One Morning in Slack • A deploy to staging was

    going on for 3 hours • Deployment was stuck at migration step ◦ SRE team started investigate
  9. One Morning in Slack • A deploy to staging was

    going on for 3 hours • Deployment was stuck at migration step ◦ We tried to add a new column to a table that has about 100 million rows
  10. Percona Online Schema Change • Also known as: pt-osc •

    Part of Percona Toolkit (http://www.percona.com/software/) ◦ Collection of advanced command-line tools used by Percona support staff — README.md • Provide command to perform online schema change ◦ Create new temporary table with updated schema ◦ Copy data over to the new table ◦ Drop old table, and rename temporary table to old table’s name (atomic)
  11. One Morning in Slack • A deploy to staging was

    going on for 3 hours • Deployment was stuck at migration step ◦ We tried to add a new column to a table that has about 100M rows ◦ It takes a long time to copy over data of 100M rows to the new table
  12. One Morning in Slack • A deploy to staging was

    going on for 3 hours • Deployment was stuck at migration step ◦ We tried to add a new column to a table that has about 100M rows ◦ It takes a long time to copy over data of 100M rows to the new table • Deployment canceled, SRE team took over the database migration ◦ SRE team runs pt-online-schema-change directly on the server
  13. InnoDB Online DDL Operations • Available from MySQL 5.6+ ◦

    We are using Amazon RDS for MySQL 5.6+ • Various operations can be performed online ◦ CREATE INDEX, DROP INDEX ◦ ADD COLUMN, DROP COLUMN* ◦ Rename column, reorder column* ◦ SET DEFAULT, DROP DEFAULT, change column nullability ◦ Set AUTO_INCREMENT* ◦ Add/remove foreign key ◦ RENAME TABLE*
  14. InnoDB Online DDL Operations • Available from MySQL 5.6+ ◦

    We are using Amazon RDS for MySQL 5.6+ • Various operations can be performed online ◦ CREATE INDEX, DROP INDEX, ADD COLUMN, DROP COLUMN, … • Some DDL queries need to be modified to run online
  15. InnoDB Online DDL Operations ALTER TABLE tbl_name ADD COLUMN column_name

    column_definition; ALTER TABLE tbl_name ADD COLUMN column_name column_definition, ALGORITHM=INPLACE, LOCK=NONE;
  16. InnoDB Online DDL Operations ALTER TABLE tbl_name ALTER COLUMN col

    SET DEFAULT literal; ALTER TABLE tbl_name ALTER COLUMN col SET DEFAULT literal, ALGORITHM=INPLACE, LOCK=NONE;
  17. InnoDB Online DDL Operations CREATE INDEX name ON table (col_list);

    DROP INDEX name ON table; -- Already online operations
  18. Index Operations • CREATE INDEX, DROP INDEX • Disable Departure,

    migration can be run during deploy if not too long
  19. Column Operations • ADD COLUMN, DROP COLUMN, rename column, change

    default value • Can’t run directly as Rails’ generated SQL locks table
  20. Column Operations • ADD COLUMN, DROP COLUMN, rename column, change

    default value • Can’t run directly as Rails’ generated SQL locks table • Open new issue with SRE team, asking to run migration DDL with ALGORITHM=INPLACE, LOCK=NONE; ◦ Then, perform INSERT into schema_migrations with migration version directly
  21. Online DDL Operations are faster, but ... • We now

    require SRE’s help to perform migration tasks
  22. Online DDL Operations are faster, but ... • We now

    require SRE’s help to perform migration tasks • Too many steps to perform by human ◦ Open PR, get it approved ◦ Open ticket with SRE with online SQL command to run ◦ SRE has to run command in MySQL console, has to manually insert migration version ◦ Merges PR
  23. Online DDL Operations are faster, but ... • We now

    require SRE’s help to perform migration tasks • Too many steps required to perform by human ◦ Open PR, get it approved ◦ Open ticket with SRE with online SQL command to run ◦ SRE has to run command in MySQL console, has to manually insert migration version ◦ Merges PR ◦ Mistake can happen if not done properly
  24. Improvements • Create Departure-like gem to hook into Active Record’s

    Migration ◦ Generate SQL that tells MySQL to perform non-locking DDL operations ◦ Migration runs automatically during deployment just like normal migrations ◦ No need to ask for SRE help
  25. Improvements • Create Departure-like gem to hook into Active Record’s

    Migration ◦ Generate SQL that tells MySQL to perform non-locking DDL operations ◦ Migration runs automatically during deployment just like normal migrations ◦ No need to ask for SRE help • Separate migration deploy from code deployment ◦ Prevent blocking code deployment to production ◦ Deploy database migration first on a separate deploy process ▪ Mark that migration as finished, so migration in code PR will be no-op ◦ Needs to write code such that it still works before and after database migration ▪ Multiple steps process for renaming column/table.
  26. Alternatives • Use external tool to manage database schema instead

    ◦ Ridgepole (https://github.com/winebarrel/ridgepole) ◦ Store database schema in a separate database ◦ Database migrations can be apply independently from code changes