Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Sane Schema Management with Alembic

Sane Schema Management with Alembic

Presented at Postgres Open 2013, Sept 16-18 in Chicago

Working with developers on schema migrations is a perennial challenge for DBAs and developers. Devs tend to like a "set it and forget it" tool. Very few of those tools work well with distributed teams, continuous integration or situations which require raw SQL.

This talk discusses the strategies used to move from a pure SQL and shell migration system to using an ORM and alembic for maximum DBA and developer happiness.

I'll share tips for DBAs and developers in bringing schema elements into the mainstream revision control system, and making it reasonable and easy to get code review on stored procedures from the entire team. Whether you're a Python shop looking for a better migration tool, or a Java-based team looking for something more native to Postgres, you'll get useful insight into the key features your next schema migration system should support.

Our environment is a 2+TB PostgreSQL cluster running 9.2. We have a rapidly evolving schema and reporting system, and over 150 user defined functions in plpgsql. We also support custom types, DOMAINs and data types not supported yet by SQLAlchemy.

Migrating from pure SQL to Python models was the first challenge. Creating a reasonable system for managing the UDFs was the second. The final challenge has been training developers to create and manage their own migrations (called "revisions" in alembic).

Selena Deckelmann

September 17, 2013
Tweet

More Decks by Selena Deckelmann

Other Decks in Programming

Transcript

  1. What's sane schema management? Executing schema change in a controled,

    repeatable way while working with developers and operations.
  2. My assumptions: • Schema migrations are frequent. • Automated schema

    migration is a goal. • Stage environment is enough like production for testing. • Writing a small amount of code is ok.
  3. No tool is perfect. DBAs should drive migration tool choice.

    Chose a tool that your developers like. Or, don't hate.
  4. Part 0: #dbaproblems Part 1: Why we should work with

    developers on migrations Part 2: Picking the right migration tool Part 3: Using Alembic Part 4: Lessons Learned Part 5: Things Alembic could learn
  5. What sucked about this: • Wasn't the first time (see

    2012 bugs) • Change snuck into partitioning UDF Jan-April 2013 • No useful audit trail • Some partitions affected, not others • Error dated back to 2010 • Wake up call to examine process!
  6. What was awesome: • Used Alembic to manage the change

    • Tested in stage • Experimentation revealed which partitions could be modified without deadlocking • Rolled out change with a regular release during normal business hours
  7. Process with Alembic: 1. Make changes to model.py or raw_sql

    files 2. Run: alembic revision –-auto-generate 3. Edit revision file 4.Commit changes 5. Run migration on stage after auto-deploy of a release
  8. Process with Alembic: 1. Make changes to model.py or raw_sql

    files 2. Run: alembic revision -–auto-generate 3. Edit revision file 4.Commit changes 5. Run migration on stage after auto-deploy of a release
  9. Problems Alembic solved: • Easy-to-deploy migrations including UDFs for dev

    and stage • Can embed raw SQL, issue multi- commit changes • Includes downgrades
  10. Problems Alembic solved: • Enables database change discipline • Enables

    code review discipline • Revisions are decoupled from release versions and branch commit order
  11. Problems Alembic solved (continued): • 100k+ lines of code removed

    • No more post-deploy schema checkins • Enabling a tested, automated stage deployment • Separated schema definition from version-specific configuration
  12. Schemas, what are they good for? Signal intent Communicate ideal

    state of data Highly customizable in Postgres
  13. Schemas, what are they not so good for? Rapid iteration

    Documenting evolution Major changes on big data Data experimentation
  14. Database systems resist change because: Exist at the center of

    multiple systems Stability is a core competency Schema often is the only API between components
  15. We're in charge of picking up the pieces when a

    poorly-executed schema change plan fails.
  16. How do we safely make changes to schemas? Process and

    tooling. Preferably, that we choose and implement.
  17. Migrations are for: • Communicating change • Communicating process •

    Executing change in a controled, repeatable way with developers and operations
  18. Questions to ask: • How often does your schema change?

    • Can the migrations be run without you? • Can you test a migration before you run it in production?
  19. Questions to ask: • Can developers create a new schema

    without your help? • How hard is it to get from an old schema to a new one using the tool? • Are change rollbacks a standard use of the tool?
  20. What does our system need to do? • Communicate change

    • Apply changes in the correct order • Apply a change only once • Use raw SQL where needed • Provide a single interface for change • Rollback gracefully
  21. A good ORM provides: • One source of truth about

    the schema • Reusable components • Database version independence • Ability to use raw SQL
  22. And good ORM stewardship: • Fits with existing tooling and

    developer workflows • Enables partnership with developers • Integrates with a testing framework
  23. And: • Gives you a new way to think about

    schemas • Develops compassion for how horrible ORMs can be • Gives you developer-friendly vocabulary for discussing why ORM- generated code is often terrible
  24. https://alembic.readthedocs.org revision: a single migration down_revision: previous migration upgrade: apply

    'upgrade' change downgrade: apply 'downgrade' change offline mode: emit raw SQL for a change
  25. Installing and using: virtualenv venv-alembic . venv-alembic/bin/activate pip install alembic

    alembic init vi alembic.ini alembic revision -m “new” alembic upgrade head alembic downgrade -1
  26. Helper functions? Put your helper functions in a custom library

    and add this to env.py: import myproj.migrations
  27. Ignore certain schemas or partitions? In env.py: def include_symbol(tablename, schema):

    return schema in (None, "bixie") and re.search(r'_\d{8}$', tablename) is None
  28. Manage User Defined Functions? Chose to use raw SQL files

    3 directories, 128 files: procs/ types/ views/ codepath = '/socorro/external/pg/raw_sql/procs' def load_stored_proc(op, filelist): app_path = os.getcwd() + codepath for filename in filelist: sqlfile = app_path + filename with open(myfile, 'r') as stored_proc: op.execute(stored_proc.read())
  29. Stamping database revision? from alembic.config import Config from alembic import

    command alembic_cfg = Config("/path/to/yourapp/alembic.ini") command.stamp(alembic_cfg, "head")
  30. Always roll forward. 1. Put migrations in a separate commit

    from schema changes. 2. Revert commits for schema change, leave migration commit in-place for downgrade support.
  31. Store schema objects in the smallest, reasonable, composable unit. 1.

    Use an ORM for core schema. 2. Put types, UDFs and views in separate files. 3. Consider storing the schema in a separate repo from the application.
  32. Write tests. Run them every time. 1. Write a simple

    tool to create a new schema from scratch. 2. Write a simple tool to generate fake data. 3. Write tests for these tools. 4.When anything fails, add a test.
  33. 1. Understand partitions 2. Never apply a DEFAULT to a

    new column 3. Help us manage UDFs better 4.INDEX CONCURRENTLY 5. Prettier syntax for multi-commit sequences
  34. 1. Understand partitions 2. Never apply a DEFAULT to a

    new column 3. Help us manage UDFs better 4.INDEX CONCURRENTLY 5. Prettier syntax for multi-commit sequences