Slide 1

Slide 1 text

Sane Schema Management with Alembic and SQLAlchemy Selena Deckelmann Mozilla @selenamarie http://www.whitecells.org

Slide 2

Slide 2 text

AMA ● Please, ask questions. ● Suggestions: Postgres, DB operations, Python app development, Mozilla webdev, GSoC, Outreachy (formerly OPW), Open Source/Free Software, Vim vs Emacs, memes involving setting things on fire, release engineering, messy diagrams, power lifting ● Can I get a scribe volunteer?

Slide 3

Slide 3 text

I work on Socorro. http://github.com/mozilla/socorro http://crash-stats.mozilla.com PyLadies PDX, Founder Python Software Foundation, Director Ada Initiative, Advisor PostgreSQL, Major Contributor

Slide 4

Slide 4 text

What's alembic? Alembic is a schema migration tool that integrates with SQLAlchemy.

Slide 5

Slide 5 text

Database systems resist change.

Slide 6

Slide 6 text

Evolution of schema change process

Slide 7

Slide 7 text

What's sane schema management? Executing schema change in a controlled, repeatable way while working with developers and operations.

Slide 8

Slide 8 text

Migration tools are really configuration management tools.

Slide 9

Slide 9 text

Migrations are for: ● Communicating change ● Communicating process ● Executing change in a controlled, repeatable way with developers and operations

Slide 10

Slide 10 text

My environment: ● Schema migrations are frequent. ● Automated schema migration is a goal. ● Stage environment is enough like production for testing. ● Writing a small amount of code is ok.

Slide 11

Slide 11 text

Part 0: #dbaproblems Part 1: Picking the right migration tool Part 2: Using Alembic Part 3: Lessons Learned Part 4: Things Alembic could learn

Slide 12

Slide 12 text

Process before Alembic:

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

Process with Alembic: 1.Make changes to model.py or raw_sql files 2.Run: alembic revision –-auto-generate 3.Edit revision file 4.Commit changes 5.Run migration after auto-deploy of a release

Slide 16

Slide 16 text

Process with Alembic: 1.Make changes to model.py or raw_sql files 2.Run: alembic revision -–auto- generate 3.Edit revision file 4.Commit changes 5.Run migration after auto-deploy of a release 5.Have jenkins and travis-ci run downgrade/upgrade as part of test suite. 6.Run migration automatically.

Slide 17

Slide 17 text

Problems Alembic solved: ● Easy-to-deploy changes ● Can embed raw SQL, and issue multi-commit changes ● Includes downgrades

Slide 18

Slide 18 text

Problems Alembic solved (continued): ● Enables database change discipline ● Enables code review discipline ● Revisions are decoupled from release versions and branch commit order

Slide 19

Slide 19 text

Problems Alembic solved (continued): ● 100k+ lines of code removed ● No more post-deploy schema checkins ● Enabling a tested, automated stage deployment ● Separated schema definition from version-specific configuration

Slide 20

Slide 20 text

Photo courtesy of secure.flickr.com/photos/lambj HAPPY AS A CAT IN A BOX

Slide 21

Slide 21 text

Shameful admission: We had three different ways of defining schema in our code and tests.

Slide 22

Slide 22 text

Why use an ORM?

Slide 23

Slide 23 text

A good ORM provides: ● Schema defined in one place ● Reusable components ● Integration with useful tools ● Database version independence ● Ability to use raw SQL

Slide 24

Slide 24 text

And good ORM stewardship: ● Fits with developer workflows ● Enables partnership with developers ● Integrates with a testing framework

Slide 25

Slide 25 text

And: ● Gives you a new way to think about schemas ● Develops compassion for how horrible ORMs can be ● Gives you developer-friendly vocabulary for discussing why ORM-generated code is often terrible

Slide 26

Slide 26 text

Part 2: Using Alembic

Slide 27

Slide 27 text

Practical Guide to using Alembic http://tinyurl.com/po4mal6

Slide 28

Slide 28 text

https://alembic.readthedocs.org Vocabulary: revision: a single migration down_revision: previous migration upgrade: apply 'upgrade' change downgrade: apply 'downgrade' change offline mode: emit raw SQL for a change

Slide 29

Slide 29 text

Installing and using: $ pip install alembic $ alembic init $ vi alembic.ini $ alembic revision -m “new” $ alembic upgrade heads $ alembic downgrade -1

Slide 30

Slide 30 text

Defining a schema? vi env.py Add: import myproj.model

Slide 31

Slide 31 text

Helper functions? Put your helper functions in a custom library and add this to env.py: import myproj.migrations

Slide 32

Slide 32 text

Ignore certain schemas or partitions? In env.py: def include_symbol(tablename, schema): return schema in (None, "bixie") and re.search(r'_\d{8}$', tablename) is None

Slide 33

Slide 33 text

Manage User Defined Functions? Chose to use raw SQL files 3 directories, 128 files: procs/ types/ views/ def load_stored_proc(op, filelist): procs_dir = os.path.normpath(os.path.join( __file__, '../../', 'external/postgresql/raw_sql/procs' )) for filename in filelist: sqlfile = os.path.join(sqlfile,filename) with open(myfile, 'r') as stored_proc: op.execute(stored_proc.read())

Slide 34

Slide 34 text

Stamping database revision? from alembic.config import Config from alembic import command alembic_cfg = Config("/path/to/yourapp/alembic.ini") command.stamp(alembic_cfg, "heads")

Slide 35

Slide 35 text

Offline mode! alembic upgrade c1ac31c8fea -–sql alembic downgrade c1ac31c8fea:491cdcf9f97c --sql

Slide 36

Slide 36 text

Demo with Socorro? Flask? https://flask- migrate.readthedocs.org/en/latest/

Slide 37

Slide 37 text

Part 3: Lessons Learned

Slide 38

Slide 38 text

Always roll forward. 1.Put migrations in a separate commit from schema changes. 2.Revert commits for schema change, leave migration commit in-place for downgrade support.

Slide 39

Slide 39 text

Store schema objects in the smallest, reasonable, composable unit. 1.Use an ORM for core schema. 2.Put types, UDFs and views in separate files. 3.Consider storing the schema in a separate repo from the application.

Slide 40

Slide 40 text

Write tests. Run them every time. 1.Write a simple tool to create a new schema from scratch. 2.Write a simple tool to generate fake data. 3.Write tests for these tools. 4.When anything fails, add a test.

Slide 41

Slide 41 text

Part 4: What Alembic could learn

Slide 42

Slide 42 text

1.Understand partitions 2.Never apply a DEFAULT to a new column 3.Help us manage UDFs better 4.INDEX CONCURRENTLY 5.Prettier syntax for multi-commit sequences

Slide 43

Slide 43 text

Other tools: Sqitch http://sqitch.org/ Written by PostgreSQL contributor Erwin http://erwin.com/ Commercial, popular with Oracle South http://south.aeracode.org/ Django-specific, well-supported

Slide 44

Slide 44 text

Alembic resources: bitbucket.org/zzzeek/alembic alembic.readthedocs.org groups.google.com/group/sqlalchemy- alembic

Slide 45

Slide 45 text

Sane Schema Management with Alembic and SQLAlchemy Selena Deckelmann Mozilla @selenamarie http://www.whitecells.org https://speakerdeck.com/selenamarie/sane -schema-management-with-alembic