My assumptions: ● Schema migrations are frequent. ● Automated schema migration is a goal. ● Stage environment is enough like production for testing. ● Writing a small amount of code is ok.
What sucked about this: ● Wasn't the first time (see 2012 bugs) ● Change snuck into partitioning UDF Jan-April 2013 ● No useful audit trail ● Some partitions affected, not others ● Error dated back to 2010 ● Wake up call to examine process!
What was awesome: ● Used Alembic to manage the change ● Tested in stage ● Experimentation revealed which partitions could be modified without deadlocking ● Rolled out change with a regular release during normal business hours
Process with Alembic: 1. Make changes to model.py or raw_sql files 2. Run: alembic revision –-auto-generate 3. Edit revision file 4.Commit changes 5. Run migration on stage after auto-deploy of a release
Process with Alembic: 1. Make changes to model.py or raw_sql files 2. Run: alembic revision -–auto-generate 3. Edit revision file 4.Commit changes 5. Run migration on stage after auto-deploy of a release 5.Have jenkins run downgrade/upgrade as part of test suite.
Problems Alembic solved: ● Easy-to-deploy migrations including UDFs for dev and stage ● Can embed raw SQL, issue multi- commit changes ● Includes downgrades
Questions to ask: ● How often does your schema change? ● Can the migrations be run without you? ● Can you test a migration before you run it in production?
Questions to ask: ● Can developers create a new schema without your help? ● How hard is it to get from an old schema to a new one using the tool? ● Are change rollbacks a standard use of the tool?
What does our system need to do? ● Communicate change ● Apply changes in the correct order ● Apply a change only once ● Use raw SQL where needed ● Provide a single interface for change ● Rollback gracefully
And good ORM stewardship: ● Fits with existing tooling and developer workflows ● Enables partnership with developers ● Integrates with a testing framework
And: ● Gives you a new way to think about schemas ● Develops compassion for how horrible ORMs can be ● Gives you developer-friendly vocabulary for discussing why ORM- generated code is often terrible
https://alembic.readthedocs.org Vocabulary: revision: a single migration down_revision: previous migration upgrade: apply 'upgrade' change downgrade: apply 'downgrade' change offline mode: emit raw SQL for a change
Ignore certain schemas or partitions? In env.py: def include_symbol(tablename, schema): return schema in (None, "bixie") and re.search(r'_\d{8}$', tablename) is None
Manage User Defined Functions? Chose to use raw SQL files 3 directories, 128 files: procs/ types/ views/ codepath = '/socorro/external/pg/raw_sql/procs' def load_stored_proc(op, filelist): app_path = os.getcwd() + codepath for filename in filelist: sqlfile = app_path + filename with open(myfile, 'r') as stored_proc: op.execute(stored_proc.read())
Always roll forward. 1. Put migrations in a separate commit from schema changes. 2. Revert commits for schema change, leave migration commit in-place for downgrade support.
Store schema objects in the smallest, reasonable, composable unit. 1. Use an ORM for core schema. 2. Put types, UDFs and views in separate files. 3. Consider storing the schema in a separate repo from the application.
Write tests. Run them every time. 1. Write a simple tool to create a new schema from scratch. 2. Write a simple tool to generate fake data. 3. Write tests for these tools. 4.When anything fails, add a test.
Write tests. Run them every time. 1. Write a simple tool to create a new schema from scratch. 2. Write a simple tool to generate fake data. 3. Write tests for these tools. 4.When anything fails, add a test.
1. Understand partitions 2. Never apply a DEFAULT to a new column 3. Help us manage UDFs better 4.INDEX CONCURRENTLY 5. Prettier syntax for multi-commit sequences 6.Support multiple branches in history
Other tools: Sqitch http://sqitch.org/ Written by PostgreSQL contributor Erwin http://erwin.com/ Commercial, popular with Oracle South http://south.aeracode.org/ Django-specific, well-supported
Sane Schema Management with Alembic and SQLAlchemy Selena Deckelmann Mozilla @selenamarie chesnok.com https://speakerdeck.com/selenamarie/sane- schema-management-with-alembic