A talk I gave at CRAFT 2014 in Budapest
Andrew Godwin@andrewgodwinDATABASESSCHEMASin an agile world&
View Slide
Andrew GodwinCore DeveloperSenior Engineer
SchemasExplicit & Implicit
Explicit SchemaID int Name text Weight uint123AliceBobCharles768465Implicit Schema{"id": 342,"name": "David","weight": 44,}
Silent Failure{"id": 342,"name": "David","weight": 74,}{"id": 342,"name": "Ellie","weight": "85kg",}{"id": 342,"nom": "Frankie","weight": 77,}{"id": 342,"name": "Frankie","weight": -67,}
SchemasinformStorage
PostgreSQL
Adding NULLable columns: instantBut must be at end of table
CREATE INDEX CONCURRENTLYSlower, and only one at a time
Constraints after column additionThis is more general advice
MySQLLocks whole tableRewrites entire storageNo DDL transactions
Oracle / MSSQL / etc.Look into their strengths
WorkflowsDatabases aren't code...
You can't put your database in a VCSYou can put your schema in a VCSBut your data won't always survive.
Django MigrationsCodified schema change format
Migrations aren't enoughYou can't automate away a social problem!
What if we got rid of the schema?That pesky, pesky schema.
The Nesting Problem{"id": 123,"name": "Andrew","friends": [{"id": 456, "name": "David"},{"id": 789, "name": "Mazz"},],"likes": [{"id": 22, "liker": {"id": 789, "name", "Mazz"}},],}
You don't have to use a document DB(like CouchDB, MongoDB, etc.)
Schemaless ColumnsID int Name text Weight uint Data json1 Alice 76 { "nickname": "Al","bgcolor": "#ff0033" }
But that must be slower...Right?
Comparison(never trust benchmarks)Loading 1.2 million recordsPostgreSQLMongoDB76 sec8 minSequential scanPostgreSQLMongoDB980 ms980 msIndex scan (Postgres GINhash)PostgreSQLMongoDB0.7 ms1 ms
Reasonable queriesSELECT id, title FROM articlesWHERE attributes->'author'->>'first_name' = 'cory'
A hybrid solutionNormal columns for more static data (e.g. id, title)Schemaless blobs for variable data (e.g. styling)
Lessons
Schemas are your friendExplicit definitions or checks will save you
Read only modeIt makes DB downtime more palatable
Work to your DBs strengthsIt's not just a dumb data store
Coordinate your teamA little coorindation pays big dividends
Try hybrid schemasParticularly good for CMSs or enterprise software
Thanks!Andrew Godwin@andrewgodwineventbrite.com/jobsare hiring: