Slide 1

Slide 1 text

Working with Huge Databases and Tables Prem Sichanugrist (@sikachu) Tokyo Rails #39 2019-04-16

Slide 2

Slide 2 text

Prem Sichanugrist @sikachu /sikachu

Slide 3

Slide 3 text

Working with Huge Databases and Tables

Slide 4

Slide 4 text

Migrations

Slide 5

Slide 5 text

Zero-Downtime Migrations

Slide 6

Slide 6 text

Why Zero-Downtime Migrations?

Slide 7

Slide 7 text

Why Zero-Downtime Migrations? ● Global service with users around the world ○ Users using the service 27/4 from different time zones ○ We want our service to always be available

Slide 8

Slide 8 text

Why Zero-Downtime Migrations? ● Global service with users around the world ○ Users using the service 27/4 from different time zones ○ We want our service to always be available ● Normal migration process locks table ○ We have a lot of user data (~300 GB on disk, biggest table has 500M rows) ○ Normal migration locks the whole table and could take service down

Slide 9

Slide 9 text

Why Zero-Downtime Migrations? ● Global service with users around the world ○ Users using the service 27/4 from different time zones ○ We want our service to always be available ● Normal migration process locks table ○ We have a lot of user data (~300 GB on disk, biggest table has 500M rows) ○ Normal migration locks the whole table and could take service down ● We want to be able to do continuous deployment ○ No “code freeze”, every merges to master goes to production right away ○ Normal migration will cause deployment disruption throughout the day

Slide 10

Slide 10 text

Zero-Downtime Migrations

Slide 11

Slide 11 text

Percona Online Schema Change

Slide 12

Slide 12 text

Percona Online Schema Change ● Also known as: pt-osc

Slide 13

Slide 13 text

Percona Online Schema Change ● Also known as: pt-osc ● Part of Percona Toolkit (http://www.percona.com/software/) ○ Collection of advanced command-line tools used by Percona support staff — README.md

Slide 14

Slide 14 text

Percona Online Schema Change ● Also known as: pt-osc ● Part of Percona Toolkit (http://www.percona.com/software/) ○ Collection of advanced command-line tools used by Percona support staff — README.md ● Provide command to perform online schema change ○ Create new temporary table with updated schema ○ Copy data over to the new table ○ Drop old table, and rename temporary table to old table’s name (atomic)

Slide 15

Slide 15 text

Percona Online Schema Change pt-online-schema-change \ --alter "ADD COLUMN column_name VARCHAR(255)" \ D=database_name,t=table_name https://www.percona.com/doc/percona-toolkit/LATEST/pt-online-schema-change.html

Slide 16

Slide 16 text

Percona Online Schema Change

Slide 17

Slide 17 text

Percona Online Schema Change + Rails

Slide 18

Slide 18 text

Departure https://github.com/departurerb/departure

Slide 19

Slide 19 text

Departure ● Hooks into Active Record's connection adapter syntax ○ No need to write custom migration!

Slide 20

Slide 20 text

Departure ● Hooks into Active Record's connection adapter syntax ○ No need to write custom migration! ● Construct arguments for pt-online-schema-change ○ Then runs pt-online-schema-change to perform the migration

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

One Morning in Slack ● A deploy to staging was going on for 3 hours.

Slide 25

Slide 25 text

One Morning in Slack ● A deploy to staging was going on for 3 hours ● Deployment was stuck at migration step ○ SRE team started investigate

Slide 26

Slide 26 text

One Morning in Slack ● A deploy to staging was going on for 3 hours ● Deployment was stuck at migration step ○ We tried to add a new column to a table that has about 100 million rows

Slide 27

Slide 27 text

Percona Online Schema Change ● Also known as: pt-osc ● Part of Percona Toolkit (http://www.percona.com/software/) ○ Collection of advanced command-line tools used by Percona support staff — README.md ● Provide command to perform online schema change ○ Create new temporary table with updated schema ○ Copy data over to the new table ○ Drop old table, and rename temporary table to old table’s name (atomic)

Slide 28

Slide 28 text

One Morning in Slack ● A deploy to staging was going on for 3 hours ● Deployment was stuck at migration step ○ We tried to add a new column to a table that has about 100M rows ○ It takes a long time to copy over data of 100M rows to the new table

Slide 29

Slide 29 text

One Morning in Slack ● A deploy to staging was going on for 3 hours ● Deployment was stuck at migration step ○ We tried to add a new column to a table that has about 100M rows ○ It takes a long time to copy over data of 100M rows to the new table ● Deployment canceled, SRE team took over the database migration ○ SRE team runs pt-online-schema-change directly on the server

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

InnoDB Online DDL Operations https://dev.mysql.com/doc/refman/5.6/en/innodb-online-ddl-operations.html

Slide 32

Slide 32 text

InnoDB Online DDL Operations ● Available from MySQL 5.6+ ○ We are using Amazon RDS for MySQL 5.6+

Slide 33

Slide 33 text

InnoDB Online DDL Operations ● Available from MySQL 5.6+ ○ We are using Amazon RDS for MySQL 5.6+ ● Various operations can be performed online ○ CREATE INDEX, DROP INDEX ○ ADD COLUMN, DROP COLUMN* ○ Rename column, reorder column* ○ SET DEFAULT, DROP DEFAULT, change column nullability ○ Set AUTO_INCREMENT* ○ Add/remove foreign key ○ RENAME TABLE*

Slide 34

Slide 34 text

InnoDB Online DDL Operations ● Available from MySQL 5.6+ ○ We are using Amazon RDS for MySQL 5.6+ ● Various operations can be performed online ○ CREATE INDEX, DROP INDEX, ADD COLUMN, DROP COLUMN, … ● Some DDL queries need to be modified to run online

Slide 35

Slide 35 text

InnoDB Online DDL Operations ALTER TABLE tbl_name ADD COLUMN column_name column_definition; ALTER TABLE tbl_name ADD COLUMN column_name column_definition, ALGORITHM=INPLACE, LOCK=NONE;

Slide 36

Slide 36 text

InnoDB Online DDL Operations ALTER TABLE tbl_name ALTER COLUMN col SET DEFAULT literal; ALTER TABLE tbl_name ALTER COLUMN col SET DEFAULT literal, ALGORITHM=INPLACE, LOCK=NONE;

Slide 37

Slide 37 text

InnoDB Online DDL Operations CREATE INDEX name ON table (col_list); DROP INDEX name ON table; -- Already online operations

Slide 38

Slide 38 text

InnoDB Online DDL Operations + Rails

Slide 39

Slide 39 text

Index Operations ● CREATE INDEX, DROP INDEX

Slide 40

Slide 40 text

Index Operations ● CREATE INDEX, DROP INDEX ● Disable Departure, migration can be run during deploy if not too long

Slide 41

Slide 41 text

Column Operations ● ADD COLUMN, DROP COLUMN, rename column, change default value

Slide 42

Slide 42 text

Column Operations ● ADD COLUMN, DROP COLUMN, rename column, change default value ● Can’t run directly as Rails’ generated SQL locks table

Slide 43

Slide 43 text

Column Operations ● ADD COLUMN, DROP COLUMN, rename column, change default value ● Can’t run directly as Rails’ generated SQL locks table ● Open new issue with SRE team, asking to run migration DDL with ALGORITHM=INPLACE, LOCK=NONE; ○ Then, perform INSERT into schema_migrations with migration version directly

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

No content

Slide 46

Slide 46 text

Online DDL Operations are faster, but ... ● We now require SRE’s help to perform migration tasks

Slide 47

Slide 47 text

Online DDL Operations are faster, but ... ● We now require SRE’s help to perform migration tasks ● Too many steps to perform by human ○ Open PR, get it approved ○ Open ticket with SRE with online SQL command to run ○ SRE has to run command in MySQL console, has to manually insert migration version ○ Merges PR

Slide 48

Slide 48 text

Online DDL Operations are faster, but ... ● We now require SRE’s help to perform migration tasks ● Too many steps required to perform by human ○ Open PR, get it approved ○ Open ticket with SRE with online SQL command to run ○ SRE has to run command in MySQL console, has to manually insert migration version ○ Merges PR ○ Mistake can happen if not done properly

Slide 49

Slide 49 text

No content

Slide 50

Slide 50 text

No content

Slide 51

Slide 51 text

Improvements ● Create Departure-like gem to hook into Active Record’s Migration ○ Generate SQL that tells MySQL to perform non-locking DDL operations ○ Migration runs automatically during deployment just like normal migrations ○ No need to ask for SRE help

Slide 52

Slide 52 text

Improvements ● Create Departure-like gem to hook into Active Record’s Migration ○ Generate SQL that tells MySQL to perform non-locking DDL operations ○ Migration runs automatically during deployment just like normal migrations ○ No need to ask for SRE help ● Separate migration deploy from code deployment ○ Prevent blocking code deployment to production ○ Deploy database migration first on a separate deploy process ■ Mark that migration as finished, so migration in code PR will be no-op ○ Needs to write code such that it still works before and after database migration ■ Multiple steps process for renaming column/table.

Slide 53

Slide 53 text

Alternatives ● Use external tool to manage database schema instead ○ Ridgepole (https://github.com/winebarrel/ridgepole) ○ Store database schema in a separate database ○ Database migrations can be apply independently from code changes

Slide 54

Slide 54 text

Prem Sichanugrist @sikachu /sikachu Thank you! All Illustrations are from いらすとや (https://www.irasutoya.com)