Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Sane Schema Management with Alembic

Sane Schema Management with Alembic

Presented at Postgres Open 2013, Sept 16-18 in Chicago

Working with developers on schema migrations is a perennial challenge for DBAs and developers. Devs tend to like a "set it and forget it" tool. Very few of those tools work well with distributed teams, continuous integration or situations which require raw SQL.

This talk discusses the strategies used to move from a pure SQL and shell migration system to using an ORM and alembic for maximum DBA and developer happiness.

I'll share tips for DBAs and developers in bringing schema elements into the mainstream revision control system, and making it reasonable and easy to get code review on stored procedures from the entire team. Whether you're a Python shop looking for a better migration tool, or a Java-based team looking for something more native to Postgres, you'll get useful insight into the key features your next schema migration system should support.

Our environment is a 2+TB PostgreSQL cluster running 9.2. We have a rapidly evolving schema and reporting system, and over 150 user defined functions in plpgsql. We also support custom types, DOMAINs and data types not supported yet by SQLAlchemy.

Migrating from pure SQL to Python models was the first challenge. Creating a reasonable system for managing the UDFs was the second. The final challenge has been training developers to create and manage their own migrations (called "revisions" in alembic).

Selena Deckelmann

September 17, 2013
Tweet

More Decks by Selena Deckelmann

Other Decks in Programming

Transcript

  1. Sane Schema Management with
    Alembic and SQLAlchemy
    Selena Deckelmann
    Mozilla
    @selenamarie
    chesnok.com

    View Slide

  2. I work on Socorro.
    http://github.com/mozilla/socorro
    http://crash-stats.mozilla.com

    View Slide

  3. Thanks and apologies to Mike Bayer

    View Slide

  4. What's sane schema management?
    Executing schema change in a controled,
    repeatable way while working with
    developers and operations.

    View Slide

  5. What's alembic?
    Alembic is a schema migration tool that
    integrates with SQLAlchemy.

    View Slide

  6. My assumptions:
    ● Schema migrations are frequent.
    ● Automated schema migration is a goal.
    ● Stage environment is enough like
    production for testing.
    ● Writing a small amount of code is ok.

    View Slide

  7. No tool is perfect.
    DBAs should drive migration tool
    choice.
    Chose a tool that your developers like.
    Or, don't hate.

    View Slide

  8. Part 0: #dbaproblems
    Part 1: Why we should work with developers
    on migrations
    Part 2: Picking the right migration tool
    Part 3: Using Alembic
    Part 4: Lessons Learned
    Part 5: Things Alembic could learn

    View Slide

  9. Part 0: #dbaproblems

    View Slide

  10. Migrations are hard.
    And messy.
    And necessary.

    View Slide

  11. View Slide

  12. Changing a CHECK constraint
    on 1000+ partitions.
    http://tinyurl.com/q5cjh45

    View Slide

  13. What sucked about this:
    ● Wasn't the first time (see 2012 bugs)
    ● Change snuck into partitioning UDF
    Jan-April 2013
    ● No useful audit trail
    ● Some partitions affected, not others
    ● Error dated back to 2010
    ● Wake up call to examine process!

    View Slide

  14. View Slide

  15. Process before Alembic:

    View Slide

  16. View Slide

  17. What was awesome:
    ● Used Alembic to manage the change
    ● Tested in stage
    ● Experimentation revealed which
    partitions could be modified without
    deadlocking
    ● Rolled out change with a regular release
    during normal business hours

    View Slide

  18. Process with Alembic:
    1. Make changes to model.py or
    raw_sql files
    2. Run: alembic revision –-auto-generate
    3. Edit revision file
    4.Commit changes
    5. Run migration on stage after
    auto-deploy of a release

    View Slide

  19. Process with Alembic:
    1. Make changes to model.py or
    raw_sql files
    2. Run: alembic revision -–auto-generate
    3. Edit revision file
    4.Commit changes
    5. Run migration on stage after
    auto-deploy of a release

    View Slide

  20. Problems Alembic solved:
    ● Easy-to-deploy migrations including
    UDFs for dev and stage
    ● Can embed raw SQL, issue multi-
    commit changes
    ● Includes downgrades

    View Slide

  21. Problems Alembic solved:
    ● Enables database change discipline
    ● Enables code review discipline
    ● Revisions are decoupled from release
    versions and branch commit order

    View Slide

  22. Problems Alembic solved (continued):
    ● 100k+ lines of code removed
    ● No more post-deploy schema
    checkins
    ● Enabling a tested, automated stage
    deployment
    ● Separated schema definition from
    version-specific configuration

    View Slide

  23. Photo courtesy of secure.flickr.com/photos/lambj
    HAPPY
    AS A CAT IN A BOX

    View Slide

  24. Part I: Why we should work
    with developers on migrations

    View Slide

  25. Credit: flickr.com/photos/chrisyarzab/

    View Slide

  26. Schemas change.

    View Slide

  27. Developers find this process really
    frustrating.

    View Slide

  28. Schemas, what are they good for?
    Signal intent
    Communicate ideal state of data
    Highly customizable in Postgres

    View Slide

  29. Schemas, what are they not so good for?
    Rapid iteration
    Documenting evolution
    Major changes on big data
    Data experimentation

    View Slide

  30. Database systems resist change.

    View Slide

  31. Database systems resist change because:
    Exist at the center of multiple systems
    Stability is a core competency
    Schema often is the only API between
    components

    View Slide

  32. How do we make changes to schemas?

    View Slide

  33. Because of resistance, we treat
    schema change as a one-off.

    View Slide

  34. Evolution of schema change process

    View Slide

  35. We're in charge of picking up the pieces when
    a poorly-executed schema change plan fails.

    View Slide

  36. Trick question:
    When is the right time to work with
    developers on a schema change?

    View Slide

  37. How do we safely make changes to schemas?

    View Slide

  38. How do we safely make changes to schemas?
    Process and tooling.
    Preferably, that we choose and implement.

    View Slide

  39. Migration tools are really
    configuration management tools.

    View Slide

  40. Migrations are for:
    ● Communicating change
    ● Communicating process
    ● Executing change in a controled,
    repeatable way with developers and
    operations

    View Slide

  41. Part 2: Picking the right migration tool

    View Slide

  42. View Slide

  43. Questions to ask:
    ● How often does your schema change?
    ● Can the migrations be run without you?
    ● Can you test a migration before you run
    it in production?

    View Slide

  44. Questions to ask:
    ● Can developers create a new schema
    without your help?
    ● How hard is it to get from an old
    schema to a new one using the tool?
    ● Are change rollbacks a standard use of
    the tool?

    View Slide

  45. What does our system need to do?
    ● Communicate change
    ● Apply changes in the correct order
    ● Apply a change only once
    ● Use raw SQL where needed
    ● Provide a single interface for change
    ● Rollback gracefully

    View Slide

  46. How you are going to feel
    about the next slide:

    View Slide

  47. Use an ORM with the migration tool.

    View Slide

  48. Shameful admission:
    We had three different ways of defining
    schema in our code and tests.

    View Slide

  49. A good ORM provides:
    ● One source of truth about the schema
    ● Reusable components
    ● Database version independence
    ● Ability to use raw SQL

    View Slide

  50. And good ORM stewardship:
    ● Fits with existing tooling and
    developer workflows
    ● Enables partnership with developers
    ● Integrates with a testing framework

    View Slide

  51. And:
    ● Gives you a new way to think about
    schemas
    ● Develops compassion for how
    horrible ORMs can be
    ● Gives you developer-friendly
    vocabulary for discussing why ORM-
    generated code is often terrible

    View Slide

  52. Part 3: Using Alembic

    View Slide

  53. Practical Guide to using Alembic
    http://tinyurl.com/po4mal6

    View Slide

  54. https://alembic.readthedocs.org
    revision: a single migration
    down_revision: previous migration
    upgrade: apply 'upgrade' change
    downgrade: apply 'downgrade' change
    offline mode: emit raw SQL for a change

    View Slide

  55. Installing and using:
    virtualenv venv-alembic
    . venv-alembic/bin/activate
    pip install alembic
    alembic init
    vi alembic.ini
    alembic revision -m “new”
    alembic upgrade head
    alembic downgrade -1

    View Slide

  56. Defining a schema?
    vi env.py
    Add: import myproj.model

    View Slide

  57. Helper functions?
    Put your helper functions in a custom
    library and add this to env.py:
    import myproj.migrations

    View Slide

  58. Ignore certain schemas or partitions?
    In env.py:
    def include_symbol(tablename, schema):
    return schema in (None, "bixie") and
    re.search(r'_\d{8}$', tablename)
    is None

    View Slide

  59. Manage User Defined Functions?
    Chose to use raw SQL files
    3 directories, 128 files:
    procs/ types/ views/
    codepath = '/socorro/external/pg/raw_sql/procs'
    def load_stored_proc(op, filelist):
    app_path = os.getcwd() + codepath
    for filename in filelist:
    sqlfile = app_path + filename
    with open(myfile, 'r') as stored_proc:
    op.execute(stored_proc.read())

    View Slide

  60. Stamping database revision?
    from alembic.config import Config
    from alembic import command
    alembic_cfg =
    Config("/path/to/yourapp/alembic.ini")
    command.stamp(alembic_cfg, "head")

    View Slide

  61. Part 4: Lessons Learned

    View Slide

  62. Always roll forward.
    1. Put migrations in a separate commit
    from schema changes.
    2. Revert commits for schema change,
    leave migration commit in-place for
    downgrade support.

    View Slide

  63. Store schema objects in the smallest,
    reasonable, composable unit.
    1. Use an ORM for core schema.
    2. Put types, UDFs and views in separate
    files.
    3. Consider storing the schema in a
    separate repo from the application.

    View Slide

  64. Write tests. Run them every time.
    1. Write a simple tool to create a new
    schema from scratch.
    2. Write a simple tool to generate fake
    data.
    3. Write tests for these tools.
    4.When anything fails, add a test.

    View Slide

  65. Part 5: What Alembic could learn

    View Slide

  66. 1. Understand partitions
    2. Never apply a DEFAULT to a new
    column
    3. Help us manage UDFs better
    4.INDEX CONCURRENTLY
    5. Prettier syntax for multi-commit
    sequences

    View Slide

  67. 1. Understand partitions
    2. Never apply a DEFAULT to a new
    column
    3. Help us manage UDFs better
    4.INDEX CONCURRENTLY
    5. Prettier syntax for multi-commit
    sequences

    View Slide