$30 off During Our Annual Pro Sale. View Details »

Sane Schema Management with Alembic and SQLAlchemy by Selena Deckelmann

PyCon 2014
April 12, 2014

Sane Schema Management with Alembic and SQLAlchemy by Selena Deckelmann

PyCon 2014

April 12, 2014
Tweet

More Decks by PyCon 2014

Other Decks in Technology

Transcript

  1. Sane Schema Management with
    Alembic and SQLAlchemy
    Selena Deckelmann
    Data Architect, Mozilla
    @selenamarie
    http://chesnok.com

    View Slide

  2. Slides available at:
    https://speakerdeck.com/selenamarie/sane-
    schema-management-with-alembic

    View Slide

  3. Thanks and apologies to Mike Bayer

    View Slide

  4. I work on Socorro.
    http://github.com/mozilla/socorro
    http://crash-stats.mozilla.com

    View Slide

  5. View Slide

  6. Database systems resist change.

    View Slide

  7. Evolution of schema change process

    View Slide

  8. What's sane schema management?
    Executing schema change in a controled,
    repeatable way while working with
    developers and operations.

    View Slide

  9. Migration tools are really
    configuration management tools.

    View Slide

  10. Migrations are for:
    ● Communicating change
    ● Communicating process
    ● Executing change in a controled,
    repeatable way with developers and
    operations

    View Slide

  11. What's alembic?
    Alembic is a schema migration tool that
    integrates with SQLAlchemy.

    View Slide

  12. My assumptions:
    ● Schema migrations are frequent.
    ● Automated schema migration is a goal.
    ● Stage environment is enough like
    production for testing.
    ● Writing a small amount of code is ok.

    View Slide

  13. No tool is perfect.
    DBAs should drive migration tool
    choice.
    Chose a tool that DBAs and developers
    like.
    Or, don't hate.

    View Slide

  14. Part 0: #dbaproblems
    Part 1: Picking the right migration tool
    Part 2: Using Alembic
    Part 3: Lessons Learned
    Part 4: Things Alembic could learn

    View Slide

  15. Part 0: #dbaproblems

    View Slide

  16. View Slide

  17. Drop and replace
    partition constraints
    that lack
    TIMESTAMP WITH TIMEZONE

    View Slide

  18. Migrations are hard.
    And messy.
    And necessary.

    View Slide

  19. Changing a CHECK constraint
    on 1000+ partitions.
    http://tinyurl.com/q5cjh45

    View Slide

  20. What sucked about this:
    ● Wasn't the first time (see 2012 bugs)
    ● Change snuck into partitioning UDF
    Jan-April 2013
    ● No useful audit trail
    ● Some partitions affected, not others
    ● Error dated back to 2010
    ● Wake up call to examine process!

    View Slide

  21. Process before Alembic:

    View Slide

  22. View Slide

  23. View Slide

  24. What was awesome:
    ● Used Alembic to manage the change
    ● Tested in stage
    ● Experimentation revealed which
    partitions could be modified without
    deadlocking
    ● Rolled out change with a regular release
    during normal business hours

    View Slide

  25. Process with Alembic:
    1. Make changes to model.py or
    raw_sql files
    2. Run: alembic revision –-auto-generate
    3. Edit revision file
    4.Commit changes
    5. Run migration on stage after
    auto-deploy of a release

    View Slide

  26. Process with Alembic:
    1. Make changes to model.py or
    raw_sql files
    2. Run: alembic revision -–auto-generate
    3. Edit revision file
    4.Commit changes
    5. Run migration on stage after
    auto-deploy of a release
    5.Have jenkins run
    downgrade/upgrade as part of test
    suite.

    View Slide

  27. Problems Alembic solved:
    ● Easy-to-deploy migrations including
    UDFs for dev and stage
    ● Can embed raw SQL, issue multi-
    commit changes
    ● Includes downgrades

    View Slide

  28. Problems Alembic solved (continued):
    ● Enables database change discipline
    ● Enables code review discipline
    ● Revisions are decoupled from release
    versions and branch commit order

    View Slide

  29. Problems Alembic solved (continued):
    ● 100k+ lines of code removed
    ● No more post-deploy schema
    checkins
    ● Enabling a tested, automated stage
    deployment
    ● Separated schema definition from
    version-specific configuration

    View Slide

  30. Photo courtesy of secure.flickr.com/photos/lambj
    HAPPY
    AS A CAT IN A BOX

    View Slide

  31. Part 1: Picking the right migration tool

    View Slide

  32. View Slide

  33. Questions to ask:
    ● How often does your schema change?
    ● Can the migrations be run without you?
    ● Can you test a migration before you run
    it in production?

    View Slide

  34. Questions to ask:
    ● Can developers create a new schema
    without your help?
    ● How hard is it to get from an old
    schema to a new one using the tool?
    ● Are change rollbacks a standard use of
    the tool?

    View Slide

  35. What does our system need to do?
    ● Communicate change
    ● Apply changes in the correct order
    ● Apply a change only once
    ● Use raw SQL where needed
    ● Provide a single interface for change
    ● Rollback gracefully

    View Slide

  36. How DBAs are going to feel
    about the next slide:

    View Slide

  37. Use an ORM with the migration tool.

    View Slide

  38. Shameful admission:
    We had three different ways of defining
    schema in our code and tests.

    View Slide

  39. Having the ORM talk with your DBAs.

    View Slide

  40. A good ORM provides:
    ● One source of truth about the schema
    ● Reusable components
    ● Database version independence
    ● Ability to use raw SQL

    View Slide

  41. And good ORM stewardship:
    ● Fits with existing tooling and
    developer workflows
    ● Enables partnership with developers
    ● Integrates with a testing framework

    View Slide

  42. And:
    ● Gives you a new way to think about
    schemas
    ● Develops compassion for how
    horrible ORMs can be
    ● Gives you developer-friendly
    vocabulary for discussing why ORM-
    generated code is often terrible

    View Slide

  43. Part 2: Using Alembic

    View Slide

  44. Practical Guide to using Alembic
    http://tinyurl.com/po4mal6

    View Slide

  45. https://alembic.readthedocs.org
    Vocabulary:
    revision: a single migration
    down_revision: previous migration
    upgrade: apply 'upgrade' change
    downgrade: apply 'downgrade' change
    offline mode: emit raw SQL for a change

    View Slide

  46. Installing and using:
    $ virtualenv venv-alembic
    $ . venv-alembic/bin/activate
    $ pip install alembic
    $ alembic init
    $ vi alembic.ini
    $ alembic revision -m “new”
    $ alembic upgrade head
    $ alembic downgrade -1

    View Slide

  47. Defining a schema?
    vi env.py
    Add: import myproj.model

    View Slide

  48. Helper functions?
    Put your helper functions in a custom
    library and add this to env.py:
    import myproj.migrations

    View Slide

  49. Ignore certain schemas or partitions?
    In env.py:
    def include_symbol(tablename, schema):
    return schema in (None, "bixie") and
    re.search(r'_\d{8}$', tablename)
    is None

    View Slide

  50. Manage User Defined Functions?
    Chose to use raw SQL files
    3 directories, 128 files:
    procs/ types/ views/
    codepath = '/socorro/external/pg/raw_sql/procs'
    def load_stored_proc(op, filelist):
    app_path = os.getcwd() + codepath
    for filename in filelist:
    sqlfile = app_path + filename
    with open(myfile, 'r') as stored_proc:
    op.execute(stored_proc.read())

    View Slide

  51. Stamping database revision?
    from alembic.config import Config
    from alembic import command
    alembic_cfg =
    Config("/path/to/yourapp/alembic.ini")
    command.stamp(alembic_cfg, "head")

    View Slide

  52. Offline mode!
    alembic upgrade c1ac31c8fea -–sql
    alembic downgrade c1ac31c8fea:491cdcf9f97c --sql

    View Slide

  53. Offline mode!
    alembic upgrade c1ac31c8fea -–sql
    alembic downgrade c1ac31c8fea:491cdcf9f97c --sql

    View Slide

  54. Part 3: Lessons Learned

    View Slide

  55. Always roll forward.
    1. Put migrations in a separate commit
    from schema changes.
    2. Revert commits for schema change,
    leave migration commit in-place for
    downgrade support.

    View Slide

  56. Store schema objects in the smallest,
    reasonable, composable unit.
    1. Use an ORM for core schema.
    2. Put types, UDFs and views in separate
    files.
    3. Consider storing the schema in a
    separate repo from the application.

    View Slide

  57. Write tests. Run them every time.
    1. Write a simple tool to create a new
    schema from scratch.
    2. Write a simple tool to generate fake
    data.
    3. Write tests for these tools.
    4.When anything fails, add a test.

    View Slide

  58. Write tests. Run them every time.
    1. Write a simple tool to create a new
    schema from scratch.
    2. Write a simple tool to generate fake
    data.
    3. Write tests for these tools.
    4.When anything fails, add a test.

    View Slide

  59. Part 4: What Alembic could learn

    View Slide

  60. 1. Understand partitions
    2. Never apply a DEFAULT to a new
    column
    3. Help us manage UDFs better
    4.INDEX CONCURRENTLY
    5. Prettier syntax for multi-commit
    sequences
    6.Support multiple branches in history

    View Slide

  61. Epilogue

    View Slide

  62. No tool is perfect.
    DBAs should drive migration tool
    choice.
    Chose a tool that your developers like.
    Or, don't hate.

    View Slide

  63. Other tools:
    Sqitch
    http://sqitch.org/
    Written by PostgreSQL contributor
    Erwin
    http://erwin.com/
    Commercial, popular with Oracle
    South
    http://south.aeracode.org/
    Django-specific, well-supported

    View Slide

  64. Alembic resources:
    bitbucket.org/zzzeek/alembic
    alembic.readthedocs.org
    groups.google.com/group/sqlalchemy-alembic

    View Slide

  65. Sane Schema Management with
    Alembic and SQLAlchemy
    Selena Deckelmann
    Mozilla
    @selenamarie
    chesnok.com
    https://speakerdeck.com/selenamarie/sane-
    schema-management-with-alembic

    View Slide