$30 off During Our Annual Pro Sale. View Details »

Rapid schema development with PostgreSQL

Rapid schema development with PostgreSQL

A talk I gave at All Your Base Conf 2013.

Andrew Godwin

October 18, 2013
Tweet

More Decks by Andrew Godwin

Other Decks in Programming

Transcript

  1. Andrew Godwin
    @andrewgodwin
    Rapid schema
    development
    with PostgreSQL

    View Slide

  2. Django core developer
    Andrew Godwin
    Long-time PostgreSQL user
    Specialise in schemas and storage
    Hi! I'm

    View Slide

  3. View Slide

  4. View Slide

  5. as you think
    Schemas aren't quite as fixed
    they are

    View Slide

  6. Easy to change
    Don't need to be strict
    Lock-free ALTER TABLE
    CREATE INDEX CONCURRENTLY
    hstore
    json

    View Slide

  7. hstore

    View Slide

  8. hstore
    Key/Value store as a column type
    id title data
    1 Object One model → "4563-a"
    2 Smaller One model → "4563-a"
    scale → 0.43
    3 Another One model → "874-b"
    scale → 2
    engraving → "TEST"

    View Slide

  9. hstore
    example
    CREATE EXTENSION hstore;
    CREATE TABLE widgets (id int, title text, data hstore);
    INSERT INTO widgets (id, text, data) VALUES
    (1, 'Object One', 'model => "8973-b", scale => 2');
    SELECT * FROM widgets WHERE data->'scale' = '2';
    id | title | data
    ----+------------+---------------------------------
    1 | Object One | "model"=>"8973-b", "scale"=>"2"

    View Slide

  10. hstore
    example
    SELECT title, data->'model' AS model
    FROM widgets
    WHERE data ?| ARRAY['model', 'diagram'];
    title | model
    ------------+--------
    Object One | 8973-b
    CREATE INDEX ON widgets USING gin (data);
    CREATE INDEX ON widgets ((data->'scale'));

    View Slide

  11. json

    View Slide

  12. json
    Structured JSON as a column type
    id title data
    1 My Sofa {"colour": "brown"}
    2 Your Sofa {"color"; "red",
    "weight": 2.6}
    3 Their Sofa {"color": "blue",
    "weight": 0.8,
    "size": [4.5, 2.1, 1]}

    View Slide

  13. json
    example
    CREATE TABLE sofas (id int, title text, data json);
    INSERT INTO sofas (id, title, data) VALUES
    (1, 'My Sofa', '{"colour": "brown", "size": [3, 4, 5]}');
    SELECT title, data->>'colour' AS colour FROM sofas
    WHERE data->>'colour' = 'brown';
    title | colour
    ---------+--------
    My Sofa | brown

    View Slide

  14. json
    example
    SELECT title, data->'size'->>0 AS length FROM sofas;
    title | length
    ---------+--------
    My Sofa | 3
    CREATE INDEX ON sofas ((data->>'colour'));

    View Slide

  15. Code on the Inside

    View Slide

  16. Embedded Languages
    pgSQL, Tcl, Perl, Python
    JavaScript, Ruby, Scheme, Java, PHP, more...
    INCLUDED
    ADDONS

    View Slide

  17. Example
    CREATE FUNCTION add_rating(id int, rating int)
    RETURNS float AS
    $$
    var data = plv8.execute("SELECT count, total,
    average FROM ratings WHERE id = $1", [id])[0];
    var newTotal = data.total + rating;
    var newCount = data.count + 1;
    var newAverage = (newTotal / newCount);
    plv8.execute("UPDATE ratings SET count = $1,
    total = $2, average = $3 WHERE id = $4",
    [newCount, newTotal, newAverage, id]);
    return newAverage;
    $$
    LANGUAGE plv8;
    CREATE TABLE ratings (id int, count int,
    total int, average float);

    View Slide

  18. Example
    INSERT INTO ratings VALUES (1, 0, 0, 0);
    SELECT add_rating(1, 5);
    add_rating
    ------------
    5
    SELECT add_rating(1, 4);
    add_rating
    ------------
    4.5

    View Slide

  19. Why?
    No round trips between SELECT/UPDATE
    Separates logic from storage
    Don't need to install language on client
    Language choice
    Encapsulation
    Speed

    View Slide

  20. Why not?
    Need tool to handle updating code
    No interactivity apart from SQL console
    Debugging
    Versioning

    View Slide

  21. Some things are too far...
    CREATE FUNCTION render_template(template text,
    context json) RETURNS text AS
    $$
    import json
    from jinja2 import Template
    tinstance = Template(template)
    return tinstance.render(json.loads(context))
    $$
    LANGUAGE plpythonu;
    SELECT render_template('hello {{name}}', '{"name":
    "world"}');
    render_template
    -----------------
    hello world

    View Slide

  22. Schema changes
    without downtime

    View Slide

  23. Lock-free ALTER TABLE
    Only certain actions, including:
    ∙ Adding a column with NULL
    ∙ Dropping a column
    ∙ Converting very similar types (varchar → text)

    View Slide

  24. CREATE INDEX CONCURRENTLY
    ∙ Slower, but doesn't lock
    ∙ Not allowed inside transactions
    ∙ Can leave unfinished indexes in error cases

    View Slide

  25. Bringing it all together

    View Slide

  26. Hybrid Schemas
    id title data
    1 An Article text → "Article text"
    layout → "3-col"
    slug
    an-article
    common, indexed freeform, changed in bulk

    View Slide

  27. Zero-downtime changes
    add NULL column
    add code
    populate
    remove code remove column
    use schemaless

    View Slide

  28. Schema Schemaless
    Faster querying
    Data integrity
    Slightly too easy to JOIN
    Faster prototyping
    Cleaner for sparse data
    Encourages messy code

    View Slide

  29. Version your schemas!
    (especially for schemaless stuff)

    View Slide

  30. Fill in data in the background
    Either at the database level or your code

    View Slide

  31. Put code in the database!
    But not too much. Only small single actions.

    View Slide

  32. Experiment.
    You can install all this in < 1 minute.

    View Slide

  33. Thanks!
    Andrew Godwin
    @andrewgodwin
    [email protected]

    View Slide