Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Alembic and SQLAlchemy: sane schema management

Alembic and SQLAlchemy: sane schema management

A shorter intro to schema management using alembic for Code Fellows

Selena Deckelmann

February 27, 2015

More Decks by Selena Deckelmann

Other Decks in Technology


  1. Sane Schema Management with Alembic and SQLAlchemy Selena Deckelmann Mozilla

    @selenamarie http://www.whitecells.org
  2. AMA • Please, ask questions. • Suggestions: Postgres, DB operations,

    Python app development, Mozilla webdev, GSoC, Outreachy (formerly OPW), Open Source/Free Software, Vim vs Emacs, memes involving setting things on fire, release engineering, messy diagrams, power lifting • Can I get a scribe volunteer?
  3. I work on Socorro. http://github.com/mozilla/socorro http://crash-stats.mozilla.com PyLadies PDX, Founder Python

    Software Foundation, Director Ada Initiative, Advisor PostgreSQL, Major Contributor
  4. What's alembic? Alembic is a schema migration tool that integrates

    with SQLAlchemy.
  5. Database systems resist change.

  6. Evolution of schema change process

  7. What's sane schema management? Executing schema change in a controlled,

    repeatable way while working with developers and operations.
  8. Migration tools are really configuration management tools.

  9. Migrations are for: • Communicating change • Communicating process •

    Executing change in a controlled, repeatable way with developers and operations
  10. My environment: • Schema migrations are frequent. • Automated schema

    migration is a goal. • Stage environment is enough like production for testing. • Writing a small amount of code is ok.
  11. Part 0: #dbaproblems Part 1: Picking the right migration tool

    Part 2: Using Alembic Part 3: Lessons Learned Part 4: Things Alembic could learn
  12. Process before Alembic:

  13. None
  14. None
  15. Process with Alembic: 1.Make changes to model.py or raw_sql files

    2.Run: alembic revision –-auto-generate 3.Edit revision file 4.Commit changes 5.Run migration after auto-deploy of a release
  16. Process with Alembic: 1.Make changes to model.py or raw_sql files

    2.Run: alembic revision -–auto- generate 3.Edit revision file 4.Commit changes 5.Run migration after auto-deploy of a release 5.Have jenkins and travis-ci run downgrade/upgrade as part of test suite. 6.Run migration automatically.
  17. Problems Alembic solved: • Easy-to-deploy changes • Can embed raw

    SQL, and issue multi-commit changes • Includes downgrades
  18. Problems Alembic solved (continued): • Enables database change discipline •

    Enables code review discipline • Revisions are decoupled from release versions and branch commit order
  19. Problems Alembic solved (continued): • 100k+ lines of code removed

    • No more post-deploy schema checkins • Enabling a tested, automated stage deployment • Separated schema definition from version-specific configuration
  20. Photo courtesy of secure.flickr.com/photos/lambj HAPPY AS A CAT IN A

  21. Shameful admission: We had three different ways of defining schema

    in our code and tests.
  22. Why use an ORM?

  23. A good ORM provides: • Schema defined in one place

    • Reusable components • Integration with useful tools • Database version independence • Ability to use raw SQL
  24. And good ORM stewardship: • Fits with developer workflows •

    Enables partnership with developers • Integrates with a testing framework
  25. And: • Gives you a new way to think about

    schemas • Develops compassion for how horrible ORMs can be • Gives you developer-friendly vocabulary for discussing why ORM-generated code is often terrible
  26. Part 2: Using Alembic

  27. Practical Guide to using Alembic http://tinyurl.com/po4mal6

  28. https://alembic.readthedocs.org Vocabulary: revision: a single migration down_revision: previous migration upgrade:

    apply 'upgrade' change downgrade: apply 'downgrade' change offline mode: emit raw SQL for a change
  29. Installing and using: $ pip install alembic $ alembic init

    $ vi alembic.ini $ alembic revision -m “new” $ alembic upgrade heads $ alembic downgrade -1
  30. Defining a schema? vi env.py Add: import myproj.model

  31. Helper functions? Put your helper functions in a custom library

    and add this to env.py: import myproj.migrations
  32. Ignore certain schemas or partitions? In env.py: def include_symbol(tablename, schema):

    return schema in (None, "bixie") and re.search(r'_\d{8}$', tablename) is None
  33. Manage User Defined Functions? Chose to use raw SQL files

    3 directories, 128 files: procs/ types/ views/ def load_stored_proc(op, filelist): procs_dir = os.path.normpath(os.path.join( __file__, '../../', 'external/postgresql/raw_sql/procs' )) for filename in filelist: sqlfile = os.path.join(sqlfile,filename) with open(myfile, 'r') as stored_proc: op.execute(stored_proc.read())
  34. Stamping database revision? from alembic.config import Config from alembic import

    command alembic_cfg = Config("/path/to/yourapp/alembic.ini") command.stamp(alembic_cfg, "heads")
  35. Offline mode! alembic upgrade c1ac31c8fea -–sql alembic downgrade c1ac31c8fea:491cdcf9f97c --sql

  36. Demo with Socorro? Flask? https://flask- migrate.readthedocs.org/en/latest/

  37. Part 3: Lessons Learned

  38. Always roll forward. 1.Put migrations in a separate commit from

    schema changes. 2.Revert commits for schema change, leave migration commit in-place for downgrade support.
  39. Store schema objects in the smallest, reasonable, composable unit. 1.Use

    an ORM for core schema. 2.Put types, UDFs and views in separate files. 3.Consider storing the schema in a separate repo from the application.
  40. Write tests. Run them every time. 1.Write a simple tool

    to create a new schema from scratch. 2.Write a simple tool to generate fake data. 3.Write tests for these tools. 4.When anything fails, add a test.
  41. Part 4: What Alembic could learn

  42. 1.Understand partitions 2.Never apply a DEFAULT to a new column

    3.Help us manage UDFs better 4.INDEX CONCURRENTLY 5.Prettier syntax for multi-commit sequences
  43. Other tools: Sqitch http://sqitch.org/ Written by PostgreSQL contributor Erwin http://erwin.com/

    Commercial, popular with Oracle South http://south.aeracode.org/ Django-specific, well-supported
  44. Alembic resources: bitbucket.org/zzzeek/alembic alembic.readthedocs.org groups.google.com/group/sqlalchemy- alembic

  45. Sane Schema Management with Alembic and SQLAlchemy Selena Deckelmann Mozilla

    @selenamarie http://www.whitecells.org https://speakerdeck.com/selenamarie/sane -schema-management-with-alembic