Slide 1

Slide 1 text

Sane Schema Management with Alembic and SQLAlchemy Selena Deckelmann Mozilla @selenamarie chesnok.com

Slide 2

Slide 2 text

I work on Socorro. http://github.com/mozilla/socorro http://crash-stats.mozilla.com

Slide 3

Slide 3 text

Thanks and apologies to Mike Bayer

Slide 4

Slide 4 text

What's sane schema management? Executing schema change in a controled, repeatable way while working with developers and operations.

Slide 5

Slide 5 text

What's alembic? Alembic is a schema migration tool that integrates with SQLAlchemy.

Slide 6

Slide 6 text

My assumptions: ● Schema migrations are frequent. ● Automated schema migration is a goal. ● Stage environment is enough like production for testing. ● Writing a small amount of code is ok.

Slide 7

Slide 7 text

No tool is perfect. DBAs should drive migration tool choice. Chose a tool that your developers like. Or, don't hate.

Slide 8

Slide 8 text

Part 0: #dbaproblems Part 1: Why we should work with developers on migrations Part 2: Picking the right migration tool Part 3: Using Alembic Part 4: Lessons Learned Part 5: Things Alembic could learn

Slide 9

Slide 9 text

Part 0: #dbaproblems

Slide 10

Slide 10 text

Migrations are hard. And messy. And necessary.

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

Changing a CHECK constraint on 1000+ partitions. http://tinyurl.com/q5cjh45

Slide 13

Slide 13 text

What sucked about this: ● Wasn't the first time (see 2012 bugs) ● Change snuck into partitioning UDF Jan-April 2013 ● No useful audit trail ● Some partitions affected, not others ● Error dated back to 2010 ● Wake up call to examine process!

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

Process before Alembic:

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

What was awesome: ● Used Alembic to manage the change ● Tested in stage ● Experimentation revealed which partitions could be modified without deadlocking ● Rolled out change with a regular release during normal business hours

Slide 18

Slide 18 text

Process with Alembic: 1. Make changes to model.py or raw_sql files 2. Run: alembic revision –-auto-generate 3. Edit revision file 4.Commit changes 5. Run migration on stage after auto-deploy of a release

Slide 19

Slide 19 text

Process with Alembic: 1. Make changes to model.py or raw_sql files 2. Run: alembic revision -–auto-generate 3. Edit revision file 4.Commit changes 5. Run migration on stage after auto-deploy of a release

Slide 20

Slide 20 text

Problems Alembic solved: ● Easy-to-deploy migrations including UDFs for dev and stage ● Can embed raw SQL, issue multi- commit changes ● Includes downgrades

Slide 21

Slide 21 text

Problems Alembic solved: ● Enables database change discipline ● Enables code review discipline ● Revisions are decoupled from release versions and branch commit order

Slide 22

Slide 22 text

Problems Alembic solved (continued): ● 100k+ lines of code removed ● No more post-deploy schema checkins ● Enabling a tested, automated stage deployment ● Separated schema definition from version-specific configuration

Slide 23

Slide 23 text

Photo courtesy of secure.flickr.com/photos/lambj HAPPY AS A CAT IN A BOX

Slide 24

Slide 24 text

Part I: Why we should work with developers on migrations

Slide 25

Slide 25 text

Credit: flickr.com/photos/chrisyarzab/

Slide 26

Slide 26 text

Schemas change.

Slide 27

Slide 27 text

Developers find this process really frustrating.

Slide 28

Slide 28 text

Schemas, what are they good for? Signal intent Communicate ideal state of data Highly customizable in Postgres

Slide 29

Slide 29 text

Schemas, what are they not so good for? Rapid iteration Documenting evolution Major changes on big data Data experimentation

Slide 30

Slide 30 text

Database systems resist change.

Slide 31

Slide 31 text

Database systems resist change because: Exist at the center of multiple systems Stability is a core competency Schema often is the only API between components

Slide 32

Slide 32 text

How do we make changes to schemas?

Slide 33

Slide 33 text

Because of resistance, we treat schema change as a one-off.

Slide 34

Slide 34 text

Evolution of schema change process

Slide 35

Slide 35 text

We're in charge of picking up the pieces when a poorly-executed schema change plan fails.

Slide 36

Slide 36 text

Trick question: When is the right time to work with developers on a schema change?

Slide 37

Slide 37 text

How do we safely make changes to schemas?

Slide 38

Slide 38 text

How do we safely make changes to schemas? Process and tooling. Preferably, that we choose and implement.

Slide 39

Slide 39 text

Migration tools are really configuration management tools.

Slide 40

Slide 40 text

Migrations are for: ● Communicating change ● Communicating process ● Executing change in a controled, repeatable way with developers and operations

Slide 41

Slide 41 text

Part 2: Picking the right migration tool

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

Questions to ask: ● How often does your schema change? ● Can the migrations be run without you? ● Can you test a migration before you run it in production?

Slide 44

Slide 44 text

Questions to ask: ● Can developers create a new schema without your help? ● How hard is it to get from an old schema to a new one using the tool? ● Are change rollbacks a standard use of the tool?

Slide 45

Slide 45 text

What does our system need to do? ● Communicate change ● Apply changes in the correct order ● Apply a change only once ● Use raw SQL where needed ● Provide a single interface for change ● Rollback gracefully

Slide 46

Slide 46 text

How you are going to feel about the next slide:

Slide 47

Slide 47 text

Use an ORM with the migration tool.

Slide 48

Slide 48 text

Shameful admission: We had three different ways of defining schema in our code and tests.

Slide 49

Slide 49 text

A good ORM provides: ● One source of truth about the schema ● Reusable components ● Database version independence ● Ability to use raw SQL

Slide 50

Slide 50 text

And good ORM stewardship: ● Fits with existing tooling and developer workflows ● Enables partnership with developers ● Integrates with a testing framework

Slide 51

Slide 51 text

And: ● Gives you a new way to think about schemas ● Develops compassion for how horrible ORMs can be ● Gives you developer-friendly vocabulary for discussing why ORM- generated code is often terrible

Slide 52

Slide 52 text

Part 3: Using Alembic

Slide 53

Slide 53 text

Practical Guide to using Alembic http://tinyurl.com/po4mal6

Slide 54

Slide 54 text

https://alembic.readthedocs.org revision: a single migration down_revision: previous migration upgrade: apply 'upgrade' change downgrade: apply 'downgrade' change offline mode: emit raw SQL for a change

Slide 55

Slide 55 text

Installing and using: virtualenv venv-alembic . venv-alembic/bin/activate pip install alembic alembic init vi alembic.ini alembic revision -m “new” alembic upgrade head alembic downgrade -1

Slide 56

Slide 56 text

Defining a schema? vi env.py Add: import myproj.model

Slide 57

Slide 57 text

Helper functions? Put your helper functions in a custom library and add this to env.py: import myproj.migrations

Slide 58

Slide 58 text

Ignore certain schemas or partitions? In env.py: def include_symbol(tablename, schema): return schema in (None, "bixie") and re.search(r'_\d{8}$', tablename) is None

Slide 59

Slide 59 text

Manage User Defined Functions? Chose to use raw SQL files 3 directories, 128 files: procs/ types/ views/ codepath = '/socorro/external/pg/raw_sql/procs' def load_stored_proc(op, filelist): app_path = os.getcwd() + codepath for filename in filelist: sqlfile = app_path + filename with open(myfile, 'r') as stored_proc: op.execute(stored_proc.read())

Slide 60

Slide 60 text

Stamping database revision? from alembic.config import Config from alembic import command alembic_cfg = Config("/path/to/yourapp/alembic.ini") command.stamp(alembic_cfg, "head")

Slide 61

Slide 61 text

Part 4: Lessons Learned

Slide 62

Slide 62 text

Always roll forward. 1. Put migrations in a separate commit from schema changes. 2. Revert commits for schema change, leave migration commit in-place for downgrade support.

Slide 63

Slide 63 text

Store schema objects in the smallest, reasonable, composable unit. 1. Use an ORM for core schema. 2. Put types, UDFs and views in separate files. 3. Consider storing the schema in a separate repo from the application.

Slide 64

Slide 64 text

Write tests. Run them every time. 1. Write a simple tool to create a new schema from scratch. 2. Write a simple tool to generate fake data. 3. Write tests for these tools. 4.When anything fails, add a test.

Slide 65

Slide 65 text

Part 5: What Alembic could learn

Slide 66

Slide 66 text

1. Understand partitions 2. Never apply a DEFAULT to a new column 3. Help us manage UDFs better 4.INDEX CONCURRENTLY 5. Prettier syntax for multi-commit sequences

Slide 67

Slide 67 text

1. Understand partitions 2. Never apply a DEFAULT to a new column 3. Help us manage UDFs better 4.INDEX CONCURRENTLY 5. Prettier syntax for multi-commit sequences