Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Schema liberation with JSON and plv8 (and Postgres)

Schema liberation with JSON and plv8 (and Postgres)

Dropbox link: https://www.dropbox.com/s/vc2oheabr5s1x11/schema%20liberation%20with%20json%20and%20plv8.pdf

A few more details: http://www.chesnok.com/daily/2013/05/31/more-about-javascript-and-postgresql/

JSON is the lingua franca of the web. Postgres supports JSON natively and the PLV8 Javascript engine as an extension. Full Javascript is supported inside the database, including direct access to tables and important database features.

And, developer-friendly features make it incredibly easy to transform your existing, normalized schemas into liberated JSON ones!

Tour what putting a document store inside of Postgres looks like, including a look at a production use-case from Mozilla. crash-stats.mozilla.com is backed by a 2 TB Postgres database cluster that's adding about about 5 GB of JSON per week.

All photos of birds were taken in the Amelia Island, FL area during the JSConf Family Reunion in May 2013.

Birds represented:
Baby Snowy Egrets
Great Egret
Yellow throated warbler
Tri-colored Heron
Laughing gull
Roseate spoonbill (endangered)
Black skimmer
Roseate spoonbill flock
Great Egret
Baby great egrets
Crocodile (not a bird)
Wood stork (endangered)
Eggs (we think from a snowy egret)
Green Heron

Selena Deckelmann

May 31, 2013
Tweet

More Decks by Selena Deckelmann

Other Decks in Programming

Transcript

  1. schema liberation
    with json and plv8
    selena deckelmann
    @selenamarie
    [email protected]

    View full-size slide

  2. Schema design and
    JSON
    3
    before plv8 & JSON datatype:

    View full-size slide

  3. What do I mean by
    schema liberation?

    View full-size slide

  4. Data in MongoDB has a flexible schema.
    ...
    Schema flexibility means that you can
    model your documents in MongoDB so
    that they can closely resemble and reflect
    application-level objects.
    -The MongoDB Documentation

    View full-size slide

  5. Data in MongoDB has a flexible schema.
    ...
    Schema flexibility means that you can
    model your documents in MongoDB so
    that they can closely resemble and
    reflect application-level objects.
    -The MongoDB Documentation

    View full-size slide

  6. Example: Bird sightings
    Example: Bird sightings

    View full-size slide

  7. create table birds (id serial, name text);
    create table locations (id serial, state text, city text);
    create table sightings (id serial, seen_on date DEFAULT now()::date,
    bird_id int, location_id int);
    insert into table birds (name) values(‘yellow throated warbler’);
    insert into table locations (state, city)
    values(‘FL’, ‘Amelia Island’);
    insert into sightings (bird_id, location_id) ( with b_id as (select id
    from birds where name = ‘painted bunting’), with l_id as (select id
    from location where state = ‘FL’ and city = ‘Amelia Island’)
    select b_id.id, l_id.id from b_id, l_id );

    View full-size slide

  8. vs.
    create table birds (id serial, name text);
    create table locations (id serial, state text, city text);
    create table sightings (id serial, seen_on date DEFAULT now()::date,
    bird_id int, location_id int);
    insert into table birds (name) values(‘yellow throated warbler’);
    insert into table locations (state, city)
    values(‘FL’, ‘Amelia Island’);
    insert into sightings (bird_id, location_id) ( with b_id as (select id
    from birds where name = ‘painted bunting’), with l_id as (select id
    from location where state = ‘FL’ and city = ‘Amelia Island’)
    select b_id.id, l_id.id from b_id, l_id );

    View full-size slide

  9. {"sightings":
    [{"name": "yellow throated warbler",
    "where": "Amelia Island, FL",
    "when": "2013-05-28"}]
    }
    vs.
    create table birds (id serial, name text);
    create table locations (id serial, state text, city text);
    create table sightings (id serial, seen_on date DEFAULT now()::date,
    bird_id int, location_id int);
    insert into table birds (name) values(‘yellow throated warbler’);
    insert into table locations (state, city)
    values(‘FL’, ‘Amelia Island’);
    insert into sightings (bird_id, location_id) ( with b_id as (select id
    from birds where name = ‘painted bunting’), with l_id as (select id
    from location where state = ‘FL’ and city = ‘Amelia Island’)
    select b_id.id, l_id.id from b_id, l_id );

    View full-size slide

  10. Can Postgres be flexible?

    View full-size slide

  11. ingredients
    9.3 Postgres*
    plv8
    * 9.2 Postgres +
    json_enhancements
    works too

    View full-size slide

  12. v8: JS engine
    plv8:JS engine
    embedded in Postgres
    https://code.google.com/p/plv8js/wiki/PLV8

    View full-size slide

  13. postgres# \d reports
    Table "public.reports"
    Column | Type | Modifiers
    ---------------------+--------------------------+------------------------------------------------------
    id | integer | not null default nextval('reports_id_seq'::regclass)
    client_crash_date | timestamp with time zone |
    date_processed | timestamp with time zone |
    uuid | character varying(50) | not null
    product | character varying(30) |
    version | character varying(16) |
    build | character varying(30) |
    signature | character varying(255) |
    install_age | integer |
    last_crash | integer |
    uptime | integer |
    cpu_name | character varying(100) |
    cpu_info | character varying(100) |
    reason | character varying(255) |
    address | character varying(20) |
    os_name | character varying(100) |
    os_version | character varying(100) |
    user_id | character varying(50) |
    started_datetime | timestamp with time zone |
    completed_datetime | timestamp with time zone |
    success | boolean |
    truncated | boolean |
    processor_notes | text |
    user_comments | character varying(1024) |
    app_notes | character varying(1024) |
    distributor | character varying(20) |
    distributor_version | character varying(20) |
    topmost_filenames | text |
    addons_checked | boolean |
    flash_version | text |
    hangid | text |
    process_type | text |
    release_channel | text |
    productid | text |
    exploitability | text |
    flash_process_dump | text |
    url | text |
    email | text |
    (38 columns)

    View full-size slide

  14. postgres# \d reports
    Table "public.reports"
    Column | Type | Modifiers
    ---------------------+--------------------------+------------------------------------------------------
    id | integer | not null default nextval('reports_id_seq'::regclass)
    client_crash_date | timestamp with time zone |
    date_processed | timestamp with time zone |
    uuid | character varying(50) | not null
    product | character varying(30) |
    version | character varying(16) |
    build | character varying(30) |
    signature | character varying(255) |
    install_age | integer |
    last_crash | integer |
    uptime | integer |
    cpu_name | character varying(100) |
    cpu_info | character varying(100) |
    reason | character varying(255) |
    address | character varying(20) |
    os_name | character varying(100) |
    os_version | character varying(100) |
    user_id | character varying(50) |
    started_datetime | timestamp with time zone |
    completed_datetime | timestamp with time zone |
    success | boolean |
    truncated | boolean |
    processor_notes | text |
    user_comments | character varying(1024) |
    app_notes | character varying(1024) |
    distributor | character varying(20) |
    distributor_version | character varying(20) |
    topmost_filenames | text |
    addons_checked | boolean |
    flash_version | text |
    hangid | text |
    process_type | text |
    release_channel | text |
    productid | text |
    exploitability | text |
    flash_process_dump | text |
    url | text |
    email | text |
    (38 columns) Roseate spoonbill says: “Yuck!”

    View full-size slide

  15. Let’s convert to JSON!

    View full-size slide

  16. select * from reports;

    View full-size slide

  17. SELECT row_to_json(reports)
    FROM reports;

    View full-size slide

  18. CREATE TABLE liberated AS (
    );

    View full-size slide

  19. CREATE TABLE liberated AS (
    SELECT row_to_json(reports)
    FROM reports
    );

    View full-size slide

  20. postgres# \d liberated
    Table "public.liberated"
    Column | Type | Modifiers
    -------------+---------+-----------
    row_to_json | json |

    View full-size slide

  21. postgres# create table liberated
    as (select row_to_json(row) from
    (select * from reports_20130506)
    as row);
    SELECT 2695824
    Time: 110470.716 ms

    View full-size slide

  22. postgres# create table liberated
    as (select row_to_json(row) from
    (select * from reports_20130506)
    as row);
    SELECT 2695824
    Time: 110470.716 ms
    Took about 2 minutes on an older dev system.

    View full-size slide

  23. postgres# select
    pg_size_pretty(pg_table_size('repo
    rts_20130506')) as “original
    size”;
    original size
    ---------------
    2120 MB
    postgres# select
    pg_size_pretty(pg_table_size('libe
    rated')) as “liberated size”;
    liberated size
    ----------------
    4076 MB

    View full-size slide

  24. Liberate the database!

    View full-size slide

  25. CREATE SCHEMA liberated;
    CREATE OR REPLACE FUNCTION public.liberate()
    RETURNS boolean
    LANGUAGE plv8
    IMMUTABLE
    AS $function$
    var tables = plv8.execute(
    "SELECT relname FROM pg_catalog.pg_class c"
    + " LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace"
    + " WHERE relkind = 'r' AND n.nspname = 'public'");
    for (var i = 0; i < tables.length; i++) {
    plv8.elog(NOTICE, tables[i]['relname']);
    var code = "CREATE TABLE liberated."
    + tables[i]['relname']
    + " AS ( SELECT row_to_json("
    + tables[i]['relname']
    + ") AS liberated_json FROM "
    + tables[i]['relname']
    + ")";
    plv8.execute(code);
    }
    return 1;
    $function$;
    https://gist.github.com/selenamarie/5646494

    View full-size slide

  26. CREATE SCHEMA liberated;
    CREATE OR REPLACE FUNCTION public.liberate()
    RETURNS boolean
    LANGUAGE plv8
    IMMUTABLE
    AS $function$
    var tables = plv8.execute(
    "SELECT relname FROM pg_catalog.pg_class c"
    + " LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace"
    + " WHERE relkind = 'r' AND n.nspname = 'public'");
    for (var i = 0; i < tables.length; i++) {
    plv8.elog(NOTICE, tables[i]['relname']);
    var code = "CREATE TABLE liberated."
    + tables[i]['relname']
    + " AS ( SELECT row_to_json("
    + tables[i]['relname']
    + ") AS liberated_json FROM "
    + tables[i]['relname']
    + ")";
    plv8.execute(code);
    }
    return 1;
    $function$;
    https://gist.github.com/selenamarie/5646494

    View full-size slide

  27. CREATE SCHEMA liberated;
    -- A namespace for our new tables!
    -- a cluster contains databases
    -- a database contains schemas
    -- a schema contains tables

    View full-size slide

  28. CREATE OR REPLACE FUNCTION liberate()
    RETURNS boolean -- Say what we’re returning
    LANGUAGE plv8 -- Say we’ll be using plv8
    AS $function$
    -- Function goes here!
    $function$;

    View full-size slide

  29. -- Find all the names of
    -- all postgres tables!
    SELECT relname AS table_name
    FROM pg_catalog.pg_class c
    LEFT JOIN pg_catalog.pg_namespace n
    ON n.oid = c.relnamespace
    WHERE
    relkind = 'r' -- meaning a table!
    and n.nspname = 'public';

    View full-size slide

  30. CREATE TABLE liberated AS (
    SELECT row_to_json(reports)
    FROM reports
    );

    View full-size slide

  31. CREATE SCHEMA liberated;
    CREATE OR REPLACE FUNCTION public.liberate()
    RETURNS boolean
    LANGUAGE plv8
    IMMUTABLE
    AS $function$
    var tables = plv8.execute(
    "SELECT relname FROM pg_catalog.pg_class c"
    + " LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace"
    + " WHERE relkind = 'r' AND n.nspname = 'public'");
    for (var i = 0; i < tables.length; i++) {
    plv8.elog(NOTICE, tables[i]['relname']);
    var code = "CREATE TABLE liberated."
    + tables[i]['relname']
    + " AS ( SELECT row_to_json("
    + tables[i]['relname']
    + ") AS liberated_json FROM "
    + tables[i]['relname']
    + ")";
    plv8.execute(code);
    }
    return 1;
    $function$;
    https://gist.github.com/selenamarie/5646494

    View full-size slide

  32. https://gist.github.com/selenamarie/5646494

    View full-size slide

  33. First class datatype
    create table birds (sighting json);

    View full-size slide

  34. postgres# create table birds (sighting json);
    CREATE TABLE
    postgres# insert into birds VALUES(
    '{"name": "Great Egret",
    "where": "Amelia Island, FL",
    "when":"2013-05-28"}'
    );
    INSERT 0 1
    postgres# select * from birds;
    sighting
    --------------------------------------------------------------------
    {"name": "Great Egret", "where": "Amelia Island, FL", "when": "2013-05-28"}
    (1 row)

    View full-size slide

  35. postgres# select json_object_field_text(sighting, 'name') from birds;
    json_object_field_text
    ------------------------
    Great Egret
    postgres# select json_object_field(sighting, 'name') from birds;
    json_object_field
    -------------------
    "Great Egret"
    postgres# select json_object_keys(sighting) from birds;
    json_object_keys
    ------------------
    name
    where
    when
    And we can create indexes, compare, join against and
    use JSON columns like any other Postgres column.

    View full-size slide

  36. plv8 is trusted.
    can be run by users.

    View full-size slide

  37. plv8 automatically
    parses JSON input.

    View full-size slide

  38. create or replace function json_value(
    j json, key text)
    returns text
    AS $$
    if (typeof j != 'object')
    return NULL;
    return JSON.stringify(j[key]);
    $$ language plv8;

    View full-size slide

  39. plv8 supports running raw SQL,
    prepared statements and cursors:
    rows = plv8.execute(...)
    plan = plv8.prepare(...)
    rows = plan.execute(...)
    plan = plv8.prepare(...)
    cursor = plan.cursor(...)
    while (row = cursor.fetch()) {}

    View full-size slide

  40. create or replace function
    inject_js(src text)
    returns text as $$
    return eval(
    "(function() {" + src + "})")();
    $$ LANGUAGE plv8;

    View full-size slide

  41. Ok that’s cute and all.

    View full-size slide

  42. What about in
    production?

    View full-size slide

  43. postgres# \d reports
    Table "public.reports"
    Column | Type | Modifiers
    ---------------------+--------------------------+------------------------------------------------------
    id | integer | not null default nextval('reports_id_seq'::regclass)
    client_crash_date | timestamp with time zone |
    date_processed | timestamp with time zone |
    uuid | character varying(50) | not null
    product | character varying(30) |
    version | character varying(16) |
    build | character varying(30) |
    signature | character varying(255) |
    install_age | integer |
    last_crash | integer |
    uptime | integer |
    cpu_name | character varying(100) |
    cpu_info | character varying(100) |
    reason | character varying(255) |
    address | character varying(20) |
    os_name | character varying(100) |
    os_version | character varying(100) |
    user_id | character varying(50) |
    started_datetime | timestamp with time zone |
    completed_datetime | timestamp with time zone |
    success | boolean |
    truncated | boolean |
    processor_notes | text |
    user_comments | character varying(1024) |
    app_notes | character varying(1024) |
    distributor | character varying(20) |
    distributor_version | character varying(20) |
    topmost_filenames | text |
    addons_checked | boolean |
    flash_version | text |
    hangid | text |
    process_type | text |
    release_channel | text |
    productid | text |
    exploitability | text |
    flash_process_dump | text |
    url | text |
    email | text |
    (38 columns)

    View full-size slide

  44. create table raw_crashes (
    crash_id uuid,
    date_processed timestamptz,
    raw_crash json
    );

    View full-size slide

  45. postgres# select
    pg_size_pretty(pg_table_size('reports
    _20130513')) as
    reports_20130513_size;
    reports_20130513_size
    -----------------------
    2136 MB
    postgres# select
    pg_size_pretty(pg_table_size('raw_cra
    shes_20130513')) as
    raw_crashes_20130513_size;
    raw_crashes_20130513_size
    ---------------------------
    5246 MB

    View full-size slide

  46. Per crash storage
    JSON: ~2.0k
    normalized table: ~0.8k

    View full-size slide

  47. What’s the
    killer advantage?

    View full-size slide

  48. SQL: 24 seconds
    vs
    Map Reduce: 30 minutes

    View full-size slide

  49. Why use Postgres?

    View full-size slide

  50. 1. Use “bulkbag” schema design +
    schema evolution
    JSON to start, normalize to optimize
    Why use Postgres?

    View full-size slide

  51. 1. Use “bulkbag” schema design +
    schema evolution
    JSON to start, normalize to optimize
    2. Easily scale to multi-terabyte DBs
    For write- or read-heavy loads, non-cloud
    storage
    Why use Postgres?

    View full-size slide

  52. 1. Use “bulkbag” schema design +
    schema evolution
    JSON to start, normalize to optimize
    2. Easily scale to multi-terabyte DBs
    For write- or read-heavy loads, non-cloud
    storage
    3. Manage your data with a language
    you love
    Put the .js for big data jobs with the big data
    Why use Postgres?

    View full-size slide

  53. Try out 9.3beta today!
    http://www.postgresql.org/developer/beta/

    View full-size slide

  54. TLDR
    Datatypes
    Conditional Indexes
    Transactional DDL
    Foreign Data Wrappers
    Concurrent Index
    Creation
    Extensions
    Common Table
    Expressions
    Fast Column Addition
    Listen/Notify
    Table Inheritance
    Per Transaction sync
    replication
    Window Functions
    Materialized Views
    Updatable Views
    Momentum

    View full-size slide

  55. liberate your schema!
    with json and plv8
    selena deckelmann
    @selenamarie
    [email protected]
    https://code.google.com/p/plv8js/wiki/PLV8
    http://www.postgresql.org/developer/beta/
    https://gist.github.com/selenamarie/5646494
    http://www.pgxn.org/dist/json_enhancements/doc/
    json_enhancements.html

    View full-size slide

  56. Special thanks to:
    Scott Deckelmann
    Andrew Dunstan (JSON
    datatype developer)
    Hitoshi Harada (plv8 author)
    Will Leinweber (source)
    Craig Kerstiens (source)
    reviewers:
    Brenda Wallace
    Aurynn Shaw
    Doug Gorley
    Andrew Grangaard
    George London
    Jacinta Richardson

    View full-size slide

  57. This work is licensed under the Creative Commons Attribution-
    ShareAlike 3.0 Unported License. To view a copy of this license,
    visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter
    to Creative Commons, 444 Castro Street, Suite 900, Mountain
    View, California, 94041, USA.
    Photos from slides 1, 3, 6, 8, 9, 10, 12, 13, 22, 31, 39, 40, 48, 49, 51,
    53 © Scott Deckelmann, used with permission.

    View full-size slide