Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Schema liberation with JSON and plv8 (and Postgres)

Schema liberation with JSON and plv8 (and Postgres)

Dropbox link: https://www.dropbox.com/s/vc2oheabr5s1x11/schema%20liberation%20with%20json%20and%20plv8.pdf

A few more details: http://www.chesnok.com/daily/2013/05/31/more-about-javascript-and-postgresql/

JSON is the lingua franca of the web. Postgres supports JSON natively and the PLV8 Javascript engine as an extension. Full Javascript is supported inside the database, including direct access to tables and important database features.

And, developer-friendly features make it incredibly easy to transform your existing, normalized schemas into liberated JSON ones!

Tour what putting a document store inside of Postgres looks like, including a look at a production use-case from Mozilla. crash-stats.mozilla.com is backed by a 2 TB Postgres database cluster that's adding about about 5 GB of JSON per week.

All photos of birds were taken in the Amelia Island, FL area during the JSConf Family Reunion in May 2013.

Birds represented:
Baby Snowy Egrets
Great Egret
Yellow throated warbler
Tri-colored Heron
Laughing gull
Roseate spoonbill (endangered)
Black skimmer
Roseate spoonbill flock
Great Egret
Baby great egrets
Crocodile (not a bird)
Wood stork (endangered)
Eggs (we think from a snowy egret)
Green Heron

Selena Deckelmann

May 31, 2013
Tweet

More Decks by Selena Deckelmann

Other Decks in Programming

Transcript

  1. schema liberation
    with json and plv8
    selena deckelmann
    @selenamarie
    [email protected]

    View Slide

  2. Schema design and
    JSON
    3
    before plv8 & JSON datatype:

    View Slide

  3. What do I mean by
    schema liberation?

    View Slide

  4. Data in MongoDB has a flexible schema.
    ...
    Schema flexibility means that you can
    model your documents in MongoDB so
    that they can closely resemble and reflect
    application-level objects.
    -The MongoDB Documentation

    View Slide

  5. Data in MongoDB has a flexible schema.
    ...
    Schema flexibility means that you can
    model your documents in MongoDB so
    that they can closely resemble and
    reflect application-level objects.
    -The MongoDB Documentation

    View Slide

  6. Example: Bird sightings
    Example: Bird sightings

    View Slide

  7. create table birds (id serial, name text);
    create table locations (id serial, state text, city text);
    create table sightings (id serial, seen_on date DEFAULT now()::date,
    bird_id int, location_id int);
    insert into table birds (name) values(‘yellow throated warbler’);
    insert into table locations (state, city)
    values(‘FL’, ‘Amelia Island’);
    insert into sightings (bird_id, location_id) ( with b_id as (select id
    from birds where name = ‘painted bunting’), with l_id as (select id
    from location where state = ‘FL’ and city = ‘Amelia Island’)
    select b_id.id, l_id.id from b_id, l_id );

    View Slide

  8. vs.
    create table birds (id serial, name text);
    create table locations (id serial, state text, city text);
    create table sightings (id serial, seen_on date DEFAULT now()::date,
    bird_id int, location_id int);
    insert into table birds (name) values(‘yellow throated warbler’);
    insert into table locations (state, city)
    values(‘FL’, ‘Amelia Island’);
    insert into sightings (bird_id, location_id) ( with b_id as (select id
    from birds where name = ‘painted bunting’), with l_id as (select id
    from location where state = ‘FL’ and city = ‘Amelia Island’)
    select b_id.id, l_id.id from b_id, l_id );

    View Slide

  9. {"sightings":
    [{"name": "yellow throated warbler",
    "where": "Amelia Island, FL",
    "when": "2013-05-28"}]
    }
    vs.
    create table birds (id serial, name text);
    create table locations (id serial, state text, city text);
    create table sightings (id serial, seen_on date DEFAULT now()::date,
    bird_id int, location_id int);
    insert into table birds (name) values(‘yellow throated warbler’);
    insert into table locations (state, city)
    values(‘FL’, ‘Amelia Island’);
    insert into sightings (bird_id, location_id) ( with b_id as (select id
    from birds where name = ‘painted bunting’), with l_id as (select id
    from location where state = ‘FL’ and city = ‘Amelia Island’)
    select b_id.id, l_id.id from b_id, l_id );

    View Slide

  10. Can Postgres be flexible?

    View Slide

  11. ingredients
    9.3 Postgres*
    plv8
    * 9.2 Postgres +
    json_enhancements
    works too

    View Slide

  12. v8: JS engine
    plv8:JS engine
    embedded in Postgres
    https://code.google.com/p/plv8js/wiki/PLV8

    View Slide

  13. reports

    View Slide

  14. View Slide

  15. postgres# \d reports
    Table "public.reports"
    Column | Type | Modifiers
    ---------------------+--------------------------+------------------------------------------------------
    id | integer | not null default nextval('reports_id_seq'::regclass)
    client_crash_date | timestamp with time zone |
    date_processed | timestamp with time zone |
    uuid | character varying(50) | not null
    product | character varying(30) |
    version | character varying(16) |
    build | character varying(30) |
    signature | character varying(255) |
    install_age | integer |
    last_crash | integer |
    uptime | integer |
    cpu_name | character varying(100) |
    cpu_info | character varying(100) |
    reason | character varying(255) |
    address | character varying(20) |
    os_name | character varying(100) |
    os_version | character varying(100) |
    user_id | character varying(50) |
    started_datetime | timestamp with time zone |
    completed_datetime | timestamp with time zone |
    success | boolean |
    truncated | boolean |
    processor_notes | text |
    user_comments | character varying(1024) |
    app_notes | character varying(1024) |
    distributor | character varying(20) |
    distributor_version | character varying(20) |
    topmost_filenames | text |
    addons_checked | boolean |
    flash_version | text |
    hangid | text |
    process_type | text |
    release_channel | text |
    productid | text |
    exploitability | text |
    flash_process_dump | text |
    url | text |
    email | text |
    (38 columns)

    View Slide

  16. postgres# \d reports
    Table "public.reports"
    Column | Type | Modifiers
    ---------------------+--------------------------+------------------------------------------------------
    id | integer | not null default nextval('reports_id_seq'::regclass)
    client_crash_date | timestamp with time zone |
    date_processed | timestamp with time zone |
    uuid | character varying(50) | not null
    product | character varying(30) |
    version | character varying(16) |
    build | character varying(30) |
    signature | character varying(255) |
    install_age | integer |
    last_crash | integer |
    uptime | integer |
    cpu_name | character varying(100) |
    cpu_info | character varying(100) |
    reason | character varying(255) |
    address | character varying(20) |
    os_name | character varying(100) |
    os_version | character varying(100) |
    user_id | character varying(50) |
    started_datetime | timestamp with time zone |
    completed_datetime | timestamp with time zone |
    success | boolean |
    truncated | boolean |
    processor_notes | text |
    user_comments | character varying(1024) |
    app_notes | character varying(1024) |
    distributor | character varying(20) |
    distributor_version | character varying(20) |
    topmost_filenames | text |
    addons_checked | boolean |
    flash_version | text |
    hangid | text |
    process_type | text |
    release_channel | text |
    productid | text |
    exploitability | text |
    flash_process_dump | text |
    url | text |
    email | text |
    (38 columns) Roseate spoonbill says: “Yuck!”

    View Slide

  17. Let’s convert to JSON!

    View Slide

  18. select * from reports;

    View Slide

  19. SELECT row_to_json(reports)
    FROM reports;

    View Slide

  20. CREATE TABLE liberated AS (
    );

    View Slide

  21. CREATE TABLE liberated AS (
    SELECT row_to_json(reports)
    FROM reports
    );

    View Slide

  22. postgres# \d liberated
    Table "public.liberated"
    Column | Type | Modifiers
    -------------+---------+-----------
    row_to_json | json |

    View Slide

  23. postgres# create table liberated
    as (select row_to_json(row) from
    (select * from reports_20130506)
    as row);
    SELECT 2695824
    Time: 110470.716 ms

    View Slide

  24. postgres# create table liberated
    as (select row_to_json(row) from
    (select * from reports_20130506)
    as row);
    SELECT 2695824
    Time: 110470.716 ms
    Took about 2 minutes on an older dev system.

    View Slide

  25. postgres# select
    pg_size_pretty(pg_table_size('repo
    rts_20130506')) as “original
    size”;
    original size
    ---------------
    2120 MB
    postgres# select
    pg_size_pretty(pg_table_size('libe
    rated')) as “liberated size”;
    liberated size
    ----------------
    4076 MB

    View Slide

  26. Liberate the database!

    View Slide

  27. CREATE SCHEMA liberated;
    CREATE OR REPLACE FUNCTION public.liberate()
    RETURNS boolean
    LANGUAGE plv8
    IMMUTABLE
    AS $function$
    var tables = plv8.execute(
    "SELECT relname FROM pg_catalog.pg_class c"
    + " LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace"
    + " WHERE relkind = 'r' AND n.nspname = 'public'");
    for (var i = 0; i < tables.length; i++) {
    plv8.elog(NOTICE, tables[i]['relname']);
    var code = "CREATE TABLE liberated."
    + tables[i]['relname']
    + " AS ( SELECT row_to_json("
    + tables[i]['relname']
    + ") AS liberated_json FROM "
    + tables[i]['relname']
    + ")";
    plv8.execute(code);
    }
    return 1;
    $function$;
    https://gist.github.com/selenamarie/5646494

    View Slide

  28. CREATE SCHEMA liberated;
    CREATE OR REPLACE FUNCTION public.liberate()
    RETURNS boolean
    LANGUAGE plv8
    IMMUTABLE
    AS $function$
    var tables = plv8.execute(
    "SELECT relname FROM pg_catalog.pg_class c"
    + " LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace"
    + " WHERE relkind = 'r' AND n.nspname = 'public'");
    for (var i = 0; i < tables.length; i++) {
    plv8.elog(NOTICE, tables[i]['relname']);
    var code = "CREATE TABLE liberated."
    + tables[i]['relname']
    + " AS ( SELECT row_to_json("
    + tables[i]['relname']
    + ") AS liberated_json FROM "
    + tables[i]['relname']
    + ")";
    plv8.execute(code);
    }
    return 1;
    $function$;
    https://gist.github.com/selenamarie/5646494

    View Slide

  29. CREATE SCHEMA liberated;
    -- A namespace for our new tables!
    -- a cluster contains databases
    -- a database contains schemas
    -- a schema contains tables

    View Slide

  30. CREATE OR REPLACE FUNCTION liberate()
    RETURNS boolean -- Say what we’re returning
    LANGUAGE plv8 -- Say we’ll be using plv8
    AS $function$
    -- Function goes here!
    $function$;

    View Slide

  31. -- Find all the names of
    -- all postgres tables!
    SELECT relname AS table_name
    FROM pg_catalog.pg_class c
    LEFT JOIN pg_catalog.pg_namespace n
    ON n.oid = c.relnamespace
    WHERE
    relkind = 'r' -- meaning a table!
    and n.nspname = 'public';

    View Slide

  32. CREATE TABLE liberated AS (
    SELECT row_to_json(reports)
    FROM reports
    );

    View Slide

  33. CREATE SCHEMA liberated;
    CREATE OR REPLACE FUNCTION public.liberate()
    RETURNS boolean
    LANGUAGE plv8
    IMMUTABLE
    AS $function$
    var tables = plv8.execute(
    "SELECT relname FROM pg_catalog.pg_class c"
    + " LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace"
    + " WHERE relkind = 'r' AND n.nspname = 'public'");
    for (var i = 0; i < tables.length; i++) {
    plv8.elog(NOTICE, tables[i]['relname']);
    var code = "CREATE TABLE liberated."
    + tables[i]['relname']
    + " AS ( SELECT row_to_json("
    + tables[i]['relname']
    + ") AS liberated_json FROM "
    + tables[i]['relname']
    + ")";
    plv8.execute(code);
    }
    return 1;
    $function$;
    https://gist.github.com/selenamarie/5646494

    View Slide

  34. https://gist.github.com/selenamarie/5646494

    View Slide

  35. First class datatype
    create table birds (sighting json);

    View Slide

  36. postgres# create table birds (sighting json);
    CREATE TABLE
    postgres# insert into birds VALUES(
    '{"name": "Great Egret",
    "where": "Amelia Island, FL",
    "when":"2013-05-28"}'
    );
    INSERT 0 1
    postgres# select * from birds;
    sighting
    --------------------------------------------------------------------
    {"name": "Great Egret", "where": "Amelia Island, FL", "when": "2013-05-28"}
    (1 row)

    View Slide

  37. postgres# select json_object_field_text(sighting, 'name') from birds;
    json_object_field_text
    ------------------------
    Great Egret
    postgres# select json_object_field(sighting, 'name') from birds;
    json_object_field
    -------------------
    "Great Egret"
    postgres# select json_object_keys(sighting) from birds;
    json_object_keys
    ------------------
    name
    where
    when
    And we can create indexes, compare, join against and
    use JSON columns like any other Postgres column.

    View Slide

  38. plv8 is trusted.
    can be run by users.

    View Slide

  39. plv8 automatically
    parses JSON input.

    View Slide

  40. create or replace function json_value(
    j json, key text)
    returns text
    AS $$
    if (typeof j != 'object')
    return NULL;
    return JSON.stringify(j[key]);
    $$ language plv8;

    View Slide

  41. plv8 supports running raw SQL,
    prepared statements and cursors:
    rows = plv8.execute(...)
    plan = plv8.prepare(...)
    rows = plan.execute(...)
    plan = plv8.prepare(...)
    cursor = plan.cursor(...)
    while (row = cursor.fetch()) {}

    View Slide

  42. create or replace function
    inject_js(src text)
    returns text as $$
    return eval(
    "(function() {" + src + "})")();
    $$ LANGUAGE plv8;

    View Slide

  43. Ok that’s cute and all.

    View Slide

  44. What about in
    production?

    View Slide

  45. View Slide

  46. View Slide

  47. postgres# \d reports
    Table "public.reports"
    Column | Type | Modifiers
    ---------------------+--------------------------+------------------------------------------------------
    id | integer | not null default nextval('reports_id_seq'::regclass)
    client_crash_date | timestamp with time zone |
    date_processed | timestamp with time zone |
    uuid | character varying(50) | not null
    product | character varying(30) |
    version | character varying(16) |
    build | character varying(30) |
    signature | character varying(255) |
    install_age | integer |
    last_crash | integer |
    uptime | integer |
    cpu_name | character varying(100) |
    cpu_info | character varying(100) |
    reason | character varying(255) |
    address | character varying(20) |
    os_name | character varying(100) |
    os_version | character varying(100) |
    user_id | character varying(50) |
    started_datetime | timestamp with time zone |
    completed_datetime | timestamp with time zone |
    success | boolean |
    truncated | boolean |
    processor_notes | text |
    user_comments | character varying(1024) |
    app_notes | character varying(1024) |
    distributor | character varying(20) |
    distributor_version | character varying(20) |
    topmost_filenames | text |
    addons_checked | boolean |
    flash_version | text |
    hangid | text |
    process_type | text |
    release_channel | text |
    productid | text |
    exploitability | text |
    flash_process_dump | text |
    url | text |
    email | text |
    (38 columns)

    View Slide

  48. create table raw_crashes (
    crash_id uuid,
    date_processed timestamptz,
    raw_crash json
    );

    View Slide

  49. postgres# select
    pg_size_pretty(pg_table_size('reports
    _20130513')) as
    reports_20130513_size;
    reports_20130513_size
    -----------------------
    2136 MB
    postgres# select
    pg_size_pretty(pg_table_size('raw_cra
    shes_20130513')) as
    raw_crashes_20130513_size;
    raw_crashes_20130513_size
    ---------------------------
    5246 MB

    View Slide

  50. Per crash storage
    JSON: ~2.0k
    normalized table: ~0.8k

    View Slide

  51. What’s the
    killer advantage?

    View Slide

  52. SQL: 24 seconds
    vs
    Map Reduce: 30 minutes

    View Slide

  53. Why use Postgres?

    View Slide

  54. 1. Use “bulkbag” schema design +
    schema evolution
    JSON to start, normalize to optimize
    Why use Postgres?

    View Slide

  55. 1. Use “bulkbag” schema design +
    schema evolution
    JSON to start, normalize to optimize
    2. Easily scale to multi-terabyte DBs
    For write- or read-heavy loads, non-cloud
    storage
    Why use Postgres?

    View Slide

  56. 1. Use “bulkbag” schema design +
    schema evolution
    JSON to start, normalize to optimize
    2. Easily scale to multi-terabyte DBs
    For write- or read-heavy loads, non-cloud
    storage
    3. Manage your data with a language
    you love
    Put the .js for big data jobs with the big data
    Why use Postgres?

    View Slide

  57. Try out 9.3beta today!
    http://www.postgresql.org/developer/beta/

    View Slide

  58. TLDR
    Datatypes
    Conditional Indexes
    Transactional DDL
    Foreign Data Wrappers
    Concurrent Index
    Creation
    Extensions
    Common Table
    Expressions
    Fast Column Addition
    Listen/Notify
    Table Inheritance
    Per Transaction sync
    replication
    Window Functions
    Materialized Views
    Updatable Views
    Momentum

    View Slide

  59. liberate your schema!
    with json and plv8
    selena deckelmann
    @selenamarie
    [email protected]
    https://code.google.com/p/plv8js/wiki/PLV8
    http://www.postgresql.org/developer/beta/
    https://gist.github.com/selenamarie/5646494
    http://www.pgxn.org/dist/json_enhancements/doc/
    json_enhancements.html

    View Slide

  60. Special thanks to:
    Scott Deckelmann
    Andrew Dunstan (JSON
    datatype developer)
    Hitoshi Harada (plv8 author)
    Will Leinweber (source)
    Craig Kerstiens (source)
    reviewers:
    Brenda Wallace
    Aurynn Shaw
    Doug Gorley
    Andrew Grangaard
    George London
    Jacinta Richardson

    View Slide

  61. This work is licensed under the Creative Commons Attribution-
    ShareAlike 3.0 Unported License. To view a copy of this license,
    visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter
    to Creative Commons, 444 Castro Street, Suite 900, Mountain
    View, California, 94041, USA.
    Photos from slides 1, 3, 6, 8, 9, 10, 12, 13, 22, 31, 39, 40, 48, 49, 51,
    53 © Scott Deckelmann, used with permission.

    View Slide