Schemas and Databases in an Agile World

Schemas and Databases in an Agile World

A talk I gave at CRAFT 2014 in Budapest

077e9a0cb34fa3eba2699240c9509717?s=128

Andrew Godwin

April 24, 2014
Tweet

Transcript

  1. Andrew Godwin @andrewgodwin DATABASES SCHEMAS in an agile world &

  2. Andrew Godwin Core Developer Senior Engineer

  3. Schemas Explicit & Implicit

  4. Explicit Schema ID int Name text Weight uint 1 2

    3 Alice Bob Charles 76 84 65 Implicit Schema { "id": 342, "name": "David", "weight": 44, }
  5. Silent Failure { "id": 342, "name": "David", "weight": 74, }

    { "id": 342, "name": "Ellie", "weight": "85kg", } { "id": 342, "nom": "Frankie", "weight": 77, } { "id": 342, "name": "Frankie", "weight": -67, }
  6. Schemas inform Storage

  7. PostgreSQL

  8. Adding NULLable columns: instant But must be at end of

    table
  9. CREATE INDEX CONCURRENTLY Slower, and only one at a time

  10. Constraints after column addition This is more general advice

  11. MySQL Locks whole table Rewrites entire storage No DDL transactions

  12. Oracle / MSSQL / etc. Look into their strengths

  13. Workflows Databases aren't code...

  14. You can't put your database in a VCS You can

    put your schema in a VCS But your data won't always survive.
  15. Django Migrations Codified schema change format

  16. None
  17. Migrations aren't enough You can't automate away a social problem!

  18. What if we got rid of the schema? That pesky,

    pesky schema.
  19. The Nesting Problem { "id": 123, "name": "Andrew", "friends": [

    {"id": 456, "name": "David"}, {"id": 789, "name": "Mazz"}, ], "likes": [ {"id": 22, "liker": {"id": 789, "name", "Mazz"}}, ], }
  20. You don't have to use a document DB (like CouchDB,

    MongoDB, etc.)
  21. Schemaless Columns ID int Name text Weight uint Data json

    1 Alice 76 { "nickname": "Al", "bgcolor": "#ff0033" }
  22. But that must be slower... Right?

  23. Comparison (never trust benchmarks) Loading 1.2 million records PostgreSQL MongoDB

    76 sec 8 min Sequential scan PostgreSQL MongoDB 980 ms 980 ms Index scan (Postgres GINhash) PostgreSQL MongoDB 0.7 ms 1 ms
  24. Reasonable queries SELECT id, title FROM articles WHERE attributes->'author'->>'first_name' =

    'cory'
  25. A hybrid solution Normal columns for more static data (e.g.

    id, title) Schemaless blobs for variable data (e.g. styling)
  26. Lessons

  27. Schemas are your friend Explicit definitions or checks will save

    you
  28. Read only mode It makes DB downtime more palatable

  29. Work to your DBs strengths It's not just a dumb

    data store
  30. Coordinate your team A little coorindation pays big dividends

  31. Try hybrid schemas Particularly good for CMSs or enterprise software

  32. Thanks! Andrew Godwin @andrewgodwin eventbrite.com/jobs are hiring: