[CHISEL] Introduction to zero-downtime schema evolution in SQL databases.

Slide 1

Slide 1 text

Zero-downtime schema evolution in SQL databases

Slide 2

Slide 2 text

Michael de Jong @java_devver

Slide 3

Slide 3 text

Zero-downtime schema evolution in SQL databases You remember these, right?

Slide 4

Slide 4 text

Zero-downtime schema evolution in SQL databases Modifying tables, columns, indices, constraints, yada, yada, yada…

Slide 5

Slide 5 text

Zero-downtime schema evolution in SQL databases Without disruption, or performance degradation!

Slide 6

Slide 6 text

Relevance • Web applications • Healthcare systems • Banking / financial systems • Air travel / booking systems • Anything which is too expensive to go offline • Continuous Deployment

Slide 7

Slide 7 text

Current state It’s not that pretty…

Slide 8

Slide 8 text

• Blue-Green   deployments • Expand-Contract   deployments Approaches

Slide 9

Slide 9 text

• OpenArk kit • Percona toolkit • Large Hadron Migrator • Table Migrator • Online Schema Change Solutions Tools from industry All work more or less   the same way

Slide 10

Slide 10 text

1. Current situation

Slide 11

Slide 11 text

1. Current situation

Slide 12

Slide 12 text

2. Create clone of table

Slide 13

Slide 13 text

3. Copy data to new table

Slide 14

Slide 14 text

3. Copy data to new table

Slide 15

Slide 15 text

4. Atomically rename table

Slide 16

Slide 16 text

• PRISM++ • IMAGO • Column-oriented databases (noSQL) • Google’s Spanner & F1 database (newSQL) Solutions Academic research

Slide 17

Slide 17 text

Challenges • Referential integrity support. • Versioning schema changes. • Rolling back failures. • Complex and mostly manual task. • Mostly tools only support MySQL. • Switching to alternative database can be costly. What could go wrong?

Slide 18

Slide 18 text

What do we do now?

Slide 19

Slide 19 text

Approach 1. Create tool to test databases for blocking behaviour. 2. Profile commonly used SQL databases. 3. Prototype solution to deal with schema changes. 4. Verify prototype shows no blocking behaviour with said profiling tool. 5. Test in production environments / case studies. Run away if that fails

Slide 20

Slide 20 text

Nemesis 1. Create a table and insert 100m records. 2. “Simulate load” by barraging the database with SELECT, UPDATE, INSERT, and DELETE queries. 3. Execute a DDL query to modify the table’s structure. 4. Record all start and end times for every query. 5. Plot the results.

Slide 21

Slide 21 text

Time (100ms periods) Duration of queries (1px = 1ms) Period where DDL statement is executing Colors represent different   types of DML queries

Slide 22

Slide 22 text

The results I have no idea what I’m doing…

Slide 23

Slide 23 text

Add nullable column Add non-nullable column Rename nullable column Rename non-nullable column Drop nullable column Drop non-nullable column Create index Rename index Drop index ! Make column nullable Make column non-nullable Set default on nullable column Set default on non-nullable column Modify type on nullable column Modify type on non-nullable column Modify type from int to text Add non-nullable foreign key Add nullable foreign key ! Non-blocking Semi-blocking Blocking Too fast to proﬁle

Slide 24

Slide 24 text

Slide 25

Slide 25 text

Take away • Don’t ever trust DDL statements   that operate on a “live” table. • Unless you have a very intimate   understanding of your own SQL database. • And then still don’t trust DDL statements.

Slide 26

Slide 26 text

What’s next? Been there, done that…

Slide 27

Slide 27 text

Next up • Discuss results on mailing lists   of profiled SQL databases. • I really should start blogging about this… • Lock myself up in a room,   and start working on a prototype.