Rapid schema development with PostgreSQL

Rapid schema development with PostgreSQL

A talk I gave at All Your Base Conf 2013.

077e9a0cb34fa3eba2699240c9509717?s=128

Andrew Godwin

October 18, 2013
Tweet

Transcript

  1. Andrew Godwin @andrewgodwin Rapid schema development with PostgreSQL

  2. Django core developer Andrew Godwin Long-time PostgreSQL user Specialise in

    schemas and storage Hi! I'm
  3. None
  4. None
  5. as you think Schemas aren't quite as fixed they are

  6. Easy to change Don't need to be strict Lock-free ALTER

    TABLE CREATE INDEX CONCURRENTLY hstore json
  7. hstore

  8. hstore Key/Value store as a column type id title data

    1 Object One model → "4563-a" 2 Smaller One model → "4563-a" scale → 0.43 3 Another One model → "874-b" scale → 2 engraving → "TEST"
  9. hstore example CREATE EXTENSION hstore; CREATE TABLE widgets (id int,

    title text, data hstore); INSERT INTO widgets (id, text, data) VALUES (1, 'Object One', 'model => "8973-b", scale => 2'); SELECT * FROM widgets WHERE data->'scale' = '2'; id | title | data ----+------------+--------------------------------- 1 | Object One | "model"=>"8973-b", "scale"=>"2"
  10. hstore example SELECT title, data->'model' AS model FROM widgets WHERE

    data ?| ARRAY['model', 'diagram']; title | model ------------+-------- Object One | 8973-b CREATE INDEX ON widgets USING gin (data); CREATE INDEX ON widgets ((data->'scale'));
  11. json

  12. json Structured JSON as a column type id title data

    1 My Sofa {"colour": "brown"} 2 Your Sofa {"color"; "red", "weight": 2.6} 3 Their Sofa {"color": "blue", "weight": 0.8, "size": [4.5, 2.1, 1]}
  13. json example CREATE TABLE sofas (id int, title text, data

    json); INSERT INTO sofas (id, title, data) VALUES (1, 'My Sofa', '{"colour": "brown", "size": [3, 4, 5]}'); SELECT title, data->>'colour' AS colour FROM sofas WHERE data->>'colour' = 'brown'; title | colour ---------+-------- My Sofa | brown
  14. json example SELECT title, data->'size'->>0 AS length FROM sofas; title

    | length ---------+-------- My Sofa | 3 CREATE INDEX ON sofas ((data->>'colour'));
  15. Code on the Inside

  16. Embedded Languages pgSQL, Tcl, Perl, Python JavaScript, Ruby, Scheme, Java,

    PHP, more... INCLUDED ADDONS
  17. Example CREATE FUNCTION add_rating(id int, rating int) RETURNS float AS

    $$ var data = plv8.execute("SELECT count, total, average FROM ratings WHERE id = $1", [id])[0]; var newTotal = data.total + rating; var newCount = data.count + 1; var newAverage = (newTotal / newCount); plv8.execute("UPDATE ratings SET count = $1, total = $2, average = $3 WHERE id = $4", [newCount, newTotal, newAverage, id]); return newAverage; $$ LANGUAGE plv8; CREATE TABLE ratings (id int, count int, total int, average float);
  18. Example INSERT INTO ratings VALUES (1, 0, 0, 0); SELECT

    add_rating(1, 5); add_rating ------------ 5 SELECT add_rating(1, 4); add_rating ------------ 4.5
  19. Why? No round trips between SELECT/UPDATE Separates logic from storage

    Don't need to install language on client Language choice Encapsulation Speed
  20. Why not? Need tool to handle updating code No interactivity

    apart from SQL console Debugging Versioning
  21. Some things are too far... CREATE FUNCTION render_template(template text, context

    json) RETURNS text AS $$ import json from jinja2 import Template tinstance = Template(template) return tinstance.render(json.loads(context)) $$ LANGUAGE plpythonu; SELECT render_template('hello {{name}}', '{"name": "world"}'); render_template ----------------- hello world
  22. Schema changes without downtime

  23. Lock-free ALTER TABLE Only certain actions, including: ∙ Adding a

    column with NULL ∙ Dropping a column ∙ Converting very similar types (varchar → text)
  24. CREATE INDEX CONCURRENTLY ∙ Slower, but doesn't lock ∙ Not

    allowed inside transactions ∙ Can leave unfinished indexes in error cases
  25. Bringing it all together

  26. Hybrid Schemas id title data 1 An Article text →

    "Article text" layout → "3-col" slug an-article common, indexed freeform, changed in bulk
  27. Zero-downtime changes add NULL column add code populate remove code

    remove column use schemaless
  28. Schema Schemaless Faster querying Data integrity Slightly too easy to

    JOIN Faster prototyping Cleaner for sparse data Encourages messy code
  29. Version your schemas! (especially for schemaless stuff)

  30. Fill in data in the background Either at the database

    level or your code
  31. Put code in the database! But not too much. Only

    small single actions.
  32. Experiment. You can install all this in < 1 minute.

  33. Thanks! Andrew Godwin @andrewgodwin andrew@aeracode.org