Sane Schema Management with Alembic and SQLAlchemy by Selena Deckelmann

D21717ea76044d31115c573d368e6ff4?s=47 PyCon 2014
April 12, 2014

Sane Schema Management with Alembic and SQLAlchemy by Selena Deckelmann

D21717ea76044d31115c573d368e6ff4?s=128

PyCon 2014

April 12, 2014
Tweet

Transcript

  1. Sane Schema Management with Alembic and SQLAlchemy Selena Deckelmann Data

    Architect, Mozilla @selenamarie http://chesnok.com
  2. Slides available at: https://speakerdeck.com/selenamarie/sane- schema-management-with-alembic

  3. Thanks and apologies to Mike Bayer

  4. I work on Socorro. http://github.com/mozilla/socorro http://crash-stats.mozilla.com

  5. None
  6. Database systems resist change.

  7. Evolution of schema change process

  8. What's sane schema management? Executing schema change in a controled,

    repeatable way while working with developers and operations.
  9. Migration tools are really configuration management tools.

  10. Migrations are for: • Communicating change • Communicating process •

    Executing change in a controled, repeatable way with developers and operations
  11. What's alembic? Alembic is a schema migration tool that integrates

    with SQLAlchemy.
  12. My assumptions: • Schema migrations are frequent. • Automated schema

    migration is a goal. • Stage environment is enough like production for testing. • Writing a small amount of code is ok.
  13. No tool is perfect. DBAs should drive migration tool choice.

    Chose a tool that DBAs and developers like. Or, don't hate.
  14. Part 0: #dbaproblems Part 1: Picking the right migration tool

    Part 2: Using Alembic Part 3: Lessons Learned Part 4: Things Alembic could learn
  15. Part 0: #dbaproblems

  16. None
  17. Drop and replace partition constraints that lack TIMESTAMP WITH TIMEZONE

  18. Migrations are hard. And messy. And necessary.

  19. Changing a CHECK constraint on 1000+ partitions. http://tinyurl.com/q5cjh45

  20. What sucked about this: • Wasn't the first time (see

    2012 bugs) • Change snuck into partitioning UDF Jan-April 2013 • No useful audit trail • Some partitions affected, not others • Error dated back to 2010 • Wake up call to examine process!
  21. Process before Alembic:

  22. None
  23. None
  24. What was awesome: • Used Alembic to manage the change

    • Tested in stage • Experimentation revealed which partitions could be modified without deadlocking • Rolled out change with a regular release during normal business hours
  25. Process with Alembic: 1. Make changes to model.py or raw_sql

    files 2. Run: alembic revision –-auto-generate 3. Edit revision file 4.Commit changes 5. Run migration on stage after auto-deploy of a release
  26. Process with Alembic: 1. Make changes to model.py or raw_sql

    files 2. Run: alembic revision -–auto-generate 3. Edit revision file 4.Commit changes 5. Run migration on stage after auto-deploy of a release 5.Have jenkins run downgrade/upgrade as part of test suite.
  27. Problems Alembic solved: • Easy-to-deploy migrations including UDFs for dev

    and stage • Can embed raw SQL, issue multi- commit changes • Includes downgrades
  28. Problems Alembic solved (continued): • Enables database change discipline •

    Enables code review discipline • Revisions are decoupled from release versions and branch commit order
  29. Problems Alembic solved (continued): • 100k+ lines of code removed

    • No more post-deploy schema checkins • Enabling a tested, automated stage deployment • Separated schema definition from version-specific configuration
  30. Photo courtesy of secure.flickr.com/photos/lambj HAPPY AS A CAT IN A

    BOX
  31. Part 1: Picking the right migration tool

  32. None
  33. Questions to ask: • How often does your schema change?

    • Can the migrations be run without you? • Can you test a migration before you run it in production?
  34. Questions to ask: • Can developers create a new schema

    without your help? • How hard is it to get from an old schema to a new one using the tool? • Are change rollbacks a standard use of the tool?
  35. What does our system need to do? • Communicate change

    • Apply changes in the correct order • Apply a change only once • Use raw SQL where needed • Provide a single interface for change • Rollback gracefully
  36. How DBAs are going to feel about the next slide:

  37. Use an ORM with the migration tool.

  38. Shameful admission: We had three different ways of defining schema

    in our code and tests.
  39. Having the ORM talk with your DBAs.

  40. A good ORM provides: • One source of truth about

    the schema • Reusable components • Database version independence • Ability to use raw SQL
  41. And good ORM stewardship: • Fits with existing tooling and

    developer workflows • Enables partnership with developers • Integrates with a testing framework
  42. And: • Gives you a new way to think about

    schemas • Develops compassion for how horrible ORMs can be • Gives you developer-friendly vocabulary for discussing why ORM- generated code is often terrible
  43. Part 2: Using Alembic

  44. Practical Guide to using Alembic http://tinyurl.com/po4mal6

  45. https://alembic.readthedocs.org Vocabulary: revision: a single migration down_revision: previous migration upgrade:

    apply 'upgrade' change downgrade: apply 'downgrade' change offline mode: emit raw SQL for a change
  46. Installing and using: $ virtualenv venv-alembic $ . venv-alembic/bin/activate $

    pip install alembic $ alembic init $ vi alembic.ini $ alembic revision -m “new” $ alembic upgrade head $ alembic downgrade -1
  47. Defining a schema? vi env.py Add: import myproj.model

  48. Helper functions? Put your helper functions in a custom library

    and add this to env.py: import myproj.migrations
  49. Ignore certain schemas or partitions? In env.py: def include_symbol(tablename, schema):

    return schema in (None, "bixie") and re.search(r'_\d{8}$', tablename) is None
  50. Manage User Defined Functions? Chose to use raw SQL files

    3 directories, 128 files: procs/ types/ views/ codepath = '/socorro/external/pg/raw_sql/procs' def load_stored_proc(op, filelist): app_path = os.getcwd() + codepath for filename in filelist: sqlfile = app_path + filename with open(myfile, 'r') as stored_proc: op.execute(stored_proc.read())
  51. Stamping database revision? from alembic.config import Config from alembic import

    command alembic_cfg = Config("/path/to/yourapp/alembic.ini") command.stamp(alembic_cfg, "head")
  52. Offline mode! alembic upgrade c1ac31c8fea -–sql alembic downgrade c1ac31c8fea:491cdcf9f97c --sql

  53. Offline mode! alembic upgrade c1ac31c8fea -–sql alembic downgrade c1ac31c8fea:491cdcf9f97c --sql

  54. Part 3: Lessons Learned

  55. Always roll forward. 1. Put migrations in a separate commit

    from schema changes. 2. Revert commits for schema change, leave migration commit in-place for downgrade support.
  56. Store schema objects in the smallest, reasonable, composable unit. 1.

    Use an ORM for core schema. 2. Put types, UDFs and views in separate files. 3. Consider storing the schema in a separate repo from the application.
  57. Write tests. Run them every time. 1. Write a simple

    tool to create a new schema from scratch. 2. Write a simple tool to generate fake data. 3. Write tests for these tools. 4.When anything fails, add a test.
  58. Write tests. Run them every time. 1. Write a simple

    tool to create a new schema from scratch. 2. Write a simple tool to generate fake data. 3. Write tests for these tools. 4.When anything fails, add a test.
  59. Part 4: What Alembic could learn

  60. 1. Understand partitions 2. Never apply a DEFAULT to a

    new column 3. Help us manage UDFs better 4.INDEX CONCURRENTLY 5. Prettier syntax for multi-commit sequences 6.Support multiple branches in history
  61. Epilogue

  62. No tool is perfect. DBAs should drive migration tool choice.

    Chose a tool that your developers like. Or, don't hate.
  63. Other tools: Sqitch http://sqitch.org/ Written by PostgreSQL contributor Erwin http://erwin.com/

    Commercial, popular with Oracle South http://south.aeracode.org/ Django-specific, well-supported
  64. Alembic resources: bitbucket.org/zzzeek/alembic alembic.readthedocs.org groups.google.com/group/sqlalchemy-alembic

  65. Sane Schema Management with Alembic and SQLAlchemy Selena Deckelmann Mozilla

    @selenamarie chesnok.com https://speakerdeck.com/selenamarie/sane- schema-management-with-alembic