Slide 1

Slide 1 text

Andrew Godwin @andrewgodwin DATABASES SCHEMAS in an agile world &

Slide 2

Slide 2 text

Andrew Godwin Core Developer Senior Engineer

Slide 3

Slide 3 text

Schemas Explicit & Implicit

Slide 4

Slide 4 text

Explicit Schema ID int Name text Weight uint 1 2 3 Alice Bob Charles 76 84 65 Implicit Schema { "id": 342, "name": "David", "weight": 44, }

Slide 5

Slide 5 text

Silent Failure { "id": 342, "name": "David", "weight": 74, } { "id": 342, "name": "Ellie", "weight": "85kg", } { "id": 342, "nom": "Frankie", "weight": 77, } { "id": 342, "name": "Frankie", "weight": -67, }

Slide 6

Slide 6 text

Schemas inform Storage

Slide 7

Slide 7 text

PostgreSQL

Slide 8

Slide 8 text

Adding NULLable columns: instant But must be at end of table

Slide 9

Slide 9 text

CREATE INDEX CONCURRENTLY Slower, and only one at a time

Slide 10

Slide 10 text

Constraints after column addition This is more general advice

Slide 11

Slide 11 text

MySQL Locks whole table Rewrites entire storage No DDL transactions

Slide 12

Slide 12 text

Oracle / MSSQL / etc. Look into their strengths

Slide 13

Slide 13 text

Workflows Databases aren't code...

Slide 14

Slide 14 text

You can't put your database in a VCS You can put your schema in a VCS But your data won't always survive.

Slide 15

Slide 15 text

Django Migrations Codified schema change format

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

Migrations aren't enough You can't automate away a social problem!

Slide 18

Slide 18 text

What if we got rid of the schema? That pesky, pesky schema.

Slide 19

Slide 19 text

The Nesting Problem { "id": 123, "name": "Andrew", "friends": [ {"id": 456, "name": "David"}, {"id": 789, "name": "Mazz"}, ], "likes": [ {"id": 22, "liker": {"id": 789, "name", "Mazz"}}, ], }

Slide 20

Slide 20 text

You don't have to use a document DB (like CouchDB, MongoDB, etc.)

Slide 21

Slide 21 text

Schemaless Columns ID int Name text Weight uint Data json 1 Alice 76 { "nickname": "Al", "bgcolor": "#ff0033" }

Slide 22

Slide 22 text

But that must be slower... Right?

Slide 23

Slide 23 text

Comparison (never trust benchmarks) Loading 1.2 million records PostgreSQL MongoDB 76 sec 8 min Sequential scan PostgreSQL MongoDB 980 ms 980 ms Index scan (Postgres GINhash) PostgreSQL MongoDB 0.7 ms 1 ms

Slide 24

Slide 24 text

Reasonable queries SELECT id, title FROM articles WHERE attributes->'author'->>'first_name' = 'cory'

Slide 25

Slide 25 text

A hybrid solution Normal columns for more static data (e.g. id, title) Schemaless blobs for variable data (e.g. styling)

Slide 26

Slide 26 text

Lessons

Slide 27

Slide 27 text

Schemas are your friend Explicit definitions or checks will save you

Slide 28

Slide 28 text

Read only mode It makes DB downtime more palatable

Slide 29

Slide 29 text

Work to your DBs strengths It's not just a dumb data store

Slide 30

Slide 30 text

Coordinate your team A little coorindation pays big dividends

Slide 31

Slide 31 text

Try hybrid schemas Particularly good for CMSs or enterprise software

Slide 32

Slide 32 text

Thanks! Andrew Godwin @andrewgodwin eventbrite.com/jobs are hiring: