$30 off During Our Annual Pro Sale. View Details »

Schemas and Databases in an Agile World

Schemas and Databases in an Agile World

A talk I gave at CRAFT 2014 in Budapest

Andrew Godwin

April 24, 2014
Tweet

More Decks by Andrew Godwin

Other Decks in Programming

Transcript

  1. Andrew Godwin
    @andrewgodwin
    DATABASES
    SCHEMAS
    in an agile world
    &

    View Slide

  2. Andrew Godwin
    Core Developer
    Senior Engineer

    View Slide

  3. Schemas
    Explicit & Implicit

    View Slide

  4. Explicit Schema
    ID int Name text Weight uint
    1
    2
    3
    Alice
    Bob
    Charles
    76
    84
    65
    Implicit Schema
    {
    "id": 342,
    "name": "David",
    "weight": 44,
    }

    View Slide

  5. Silent Failure
    {
    "id": 342,
    "name": "David",
    "weight": 74,
    }
    {
    "id": 342,
    "name": "Ellie",
    "weight": "85kg",
    }
    {
    "id": 342,
    "nom": "Frankie",
    "weight": 77,
    }
    {
    "id": 342,
    "name": "Frankie",
    "weight": -67,
    }

    View Slide

  6. Schemas
    inform
    Storage

    View Slide

  7. PostgreSQL

    View Slide

  8. Adding NULLable columns: instant
    But must be at end of table

    View Slide

  9. CREATE INDEX CONCURRENTLY
    Slower, and only one at a time

    View Slide

  10. Constraints after column addition
    This is more general advice

    View Slide

  11. MySQL
    Locks whole table
    Rewrites entire storage
    No DDL transactions

    View Slide

  12. Oracle / MSSQL / etc.
    Look into their strengths

    View Slide

  13. Workflows
    Databases aren't code...

    View Slide

  14. You can't put your database in a VCS
    You can put your schema in a VCS
    But your data won't always survive.

    View Slide

  15. Django Migrations
    Codified schema change format

    View Slide

  16. View Slide

  17. Migrations aren't enough
    You can't automate away a social problem!

    View Slide

  18. What if we got rid of the schema?
    That pesky, pesky schema.

    View Slide

  19. The Nesting Problem
    {
    "id": 123,
    "name": "Andrew",
    "friends": [
    {"id": 456, "name": "David"},
    {"id": 789, "name": "Mazz"},
    ],
    "likes": [
    {"id": 22, "liker": {"id": 789, "name", "Mazz"}},
    ],
    }

    View Slide

  20. You don't have to use a document DB
    (like CouchDB, MongoDB, etc.)

    View Slide

  21. Schemaless Columns
    ID int Name text Weight uint Data json
    1 Alice 76 { "nickname": "Al",
    "bgcolor": "#ff0033" }

    View Slide

  22. But that must be slower...
    Right?

    View Slide

  23. Comparison
    (never trust benchmarks)
    Loading 1.2 million records
    PostgreSQL
    MongoDB
    76 sec
    8 min
    Sequential scan
    PostgreSQL
    MongoDB
    980 ms
    980 ms
    Index scan (Postgres GINhash)
    PostgreSQL
    MongoDB
    0.7 ms
    1 ms

    View Slide

  24. Reasonable queries
    SELECT id, title FROM articles
    WHERE attributes->'author'->>'first_name' = 'cory'

    View Slide

  25. A hybrid solution
    Normal columns for more static data (e.g. id, title)
    Schemaless blobs for variable data (e.g. styling)

    View Slide

  26. Lessons

    View Slide

  27. Schemas are your friend
    Explicit definitions or checks will save you

    View Slide

  28. Read only mode
    It makes DB downtime more palatable

    View Slide

  29. Work to your DBs strengths
    It's not just a dumb data store

    View Slide

  30. Coordinate your team
    A little coorindation pays big dividends

    View Slide

  31. Try hybrid schemas
    Particularly good for CMSs or enterprise software

    View Slide

  32. Thanks!
    Andrew Godwin
    @andrewgodwin
    eventbrite.com/jobs
    are hiring:

    View Slide