Schema liberation with JSON and plv8 (and Postgres)

Schema liberation with JSON and plv8 (and Postgres)

Dropbox link: https://www.dropbox.com/s/vc2oheabr5s1x11/schema%20liberation%20with%20json%20and%20plv8.pdf

A few more details: http://www.chesnok.com/daily/2013/05/31/more-about-javascript-and-postgresql/

JSON is the lingua franca of the web. Postgres supports JSON natively and the PLV8 Javascript engine as an extension. Full Javascript is supported inside the database, including direct access to tables and important database features.

And, developer-friendly features make it incredibly easy to transform your existing, normalized schemas into liberated JSON ones!

Tour what putting a document store inside of Postgres looks like, including a look at a production use-case from Mozilla. crash-stats.mozilla.com is backed by a 2 TB Postgres database cluster that's adding about about 5 GB of JSON per week.

All photos of birds were taken in the Amelia Island, FL area during the JSConf Family Reunion in May 2013.

Birds represented:
Baby Snowy Egrets
Great Egret
Yellow throated warbler
Tri-colored Heron
Laughing gull
Roseate spoonbill (endangered)
Black skimmer
Roseate spoonbill flock
Great Egret
Baby great egrets
Crocodile (not a bird)
Wood stork (endangered)
Eggs (we think from a snowy egret)
Green Heron

4535e53ad45275fa955c5b05684342c5?s=128

Selena Deckelmann

May 31, 2013
Tweet

Transcript

  1. schema liberation with json and plv8 selena deckelmann @selenamarie selena@mozilla.com

  2. Schema design and JSON </3 before plv8 & JSON datatype:

  3. What do I mean by schema liberation?

  4. Data in MongoDB has a flexible schema. ... Schema flexibility

    means that you can model your documents in MongoDB so that they can closely resemble and reflect application-level objects. -The MongoDB Documentation
  5. Data in MongoDB has a flexible schema. ... Schema flexibility

    means that you can model your documents in MongoDB so that they can closely resemble and reflect application-level objects. -The MongoDB Documentation
  6. Example: Bird sightings Example: Bird sightings

  7. create table birds (id serial, name text); create table locations

    (id serial, state text, city text); create table sightings (id serial, seen_on date DEFAULT now()::date, bird_id int, location_id int); insert into table birds (name) values(‘yellow throated warbler’); insert into table locations (state, city) values(‘FL’, ‘Amelia Island’); insert into sightings (bird_id, location_id) ( with b_id as (select id from birds where name = ‘painted bunting’), with l_id as (select id from location where state = ‘FL’ and city = ‘Amelia Island’) select b_id.id, l_id.id from b_id, l_id );
  8. vs. create table birds (id serial, name text); create table

    locations (id serial, state text, city text); create table sightings (id serial, seen_on date DEFAULT now()::date, bird_id int, location_id int); insert into table birds (name) values(‘yellow throated warbler’); insert into table locations (state, city) values(‘FL’, ‘Amelia Island’); insert into sightings (bird_id, location_id) ( with b_id as (select id from birds where name = ‘painted bunting’), with l_id as (select id from location where state = ‘FL’ and city = ‘Amelia Island’) select b_id.id, l_id.id from b_id, l_id );
  9. {"sightings": [{"name": "yellow throated warbler", "where": "Amelia Island, FL", "when":

    "2013-05-28"}] } vs. create table birds (id serial, name text); create table locations (id serial, state text, city text); create table sightings (id serial, seen_on date DEFAULT now()::date, bird_id int, location_id int); insert into table birds (name) values(‘yellow throated warbler’); insert into table locations (state, city) values(‘FL’, ‘Amelia Island’); insert into sightings (bird_id, location_id) ( with b_id as (select id from birds where name = ‘painted bunting’), with l_id as (select id from location where state = ‘FL’ and city = ‘Amelia Island’) select b_id.id, l_id.id from b_id, l_id );
  10. Can Postgres be flexible?

  11. ingredients 9.3 Postgres* plv8 * 9.2 Postgres + json_enhancements works

    too
  12. v8: JS engine plv8:JS engine embedded in Postgres https://code.google.com/p/plv8js/wiki/PLV8

  13. reports

  14. None
  15. postgres# \d reports Table "public.reports" Column | Type | Modifiers

    ---------------------+--------------------------+------------------------------------------------------ id | integer | not null default nextval('reports_id_seq'::regclass) client_crash_date | timestamp with time zone | date_processed | timestamp with time zone | uuid | character varying(50) | not null product | character varying(30) | version | character varying(16) | build | character varying(30) | signature | character varying(255) | install_age | integer | last_crash | integer | uptime | integer | cpu_name | character varying(100) | cpu_info | character varying(100) | reason | character varying(255) | address | character varying(20) | os_name | character varying(100) | os_version | character varying(100) | user_id | character varying(50) | started_datetime | timestamp with time zone | completed_datetime | timestamp with time zone | success | boolean | truncated | boolean | processor_notes | text | user_comments | character varying(1024) | app_notes | character varying(1024) | distributor | character varying(20) | distributor_version | character varying(20) | topmost_filenames | text | addons_checked | boolean | flash_version | text | hangid | text | process_type | text | release_channel | text | productid | text | exploitability | text | flash_process_dump | text | url | text | email | text | (38 columns)
  16. postgres# \d reports Table "public.reports" Column | Type | Modifiers

    ---------------------+--------------------------+------------------------------------------------------ id | integer | not null default nextval('reports_id_seq'::regclass) client_crash_date | timestamp with time zone | date_processed | timestamp with time zone | uuid | character varying(50) | not null product | character varying(30) | version | character varying(16) | build | character varying(30) | signature | character varying(255) | install_age | integer | last_crash | integer | uptime | integer | cpu_name | character varying(100) | cpu_info | character varying(100) | reason | character varying(255) | address | character varying(20) | os_name | character varying(100) | os_version | character varying(100) | user_id | character varying(50) | started_datetime | timestamp with time zone | completed_datetime | timestamp with time zone | success | boolean | truncated | boolean | processor_notes | text | user_comments | character varying(1024) | app_notes | character varying(1024) | distributor | character varying(20) | distributor_version | character varying(20) | topmost_filenames | text | addons_checked | boolean | flash_version | text | hangid | text | process_type | text | release_channel | text | productid | text | exploitability | text | flash_process_dump | text | url | text | email | text | (38 columns) Roseate spoonbill says: “Yuck!”
  17. Let’s convert to JSON!

  18. select * from reports;

  19. SELECT row_to_json(reports) FROM reports;

  20. CREATE TABLE liberated AS ( );

  21. CREATE TABLE liberated AS ( SELECT row_to_json(reports) FROM reports );

  22. postgres# \d liberated Table "public.liberated" Column | Type | Modifiers

    -------------+---------+----------- row_to_json | json |
  23. postgres# create table liberated as (select row_to_json(row) from (select *

    from reports_20130506) as row); SELECT 2695824 Time: 110470.716 ms
  24. postgres# create table liberated as (select row_to_json(row) from (select *

    from reports_20130506) as row); SELECT 2695824 Time: 110470.716 ms Took about 2 minutes on an older dev system.
  25. postgres# select pg_size_pretty(pg_table_size('repo rts_20130506')) as “original size”; original size ---------------

    2120 MB postgres# select pg_size_pretty(pg_table_size('libe rated')) as “liberated size”; liberated size ---------------- 4076 MB
  26. Liberate the database!

  27. CREATE SCHEMA liberated; CREATE OR REPLACE FUNCTION public.liberate() RETURNS boolean

    LANGUAGE plv8 IMMUTABLE AS $function$ var tables = plv8.execute( "SELECT relname FROM pg_catalog.pg_class c" + " LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace" + " WHERE relkind = 'r' AND n.nspname = 'public'"); for (var i = 0; i < tables.length; i++) { plv8.elog(NOTICE, tables[i]['relname']); var code = "CREATE TABLE liberated." + tables[i]['relname'] + " AS ( SELECT row_to_json(" + tables[i]['relname'] + ") AS liberated_json FROM " + tables[i]['relname'] + ")"; plv8.execute(code); } return 1; $function$; https://gist.github.com/selenamarie/5646494
  28. CREATE SCHEMA liberated; CREATE OR REPLACE FUNCTION public.liberate() RETURNS boolean

    LANGUAGE plv8 IMMUTABLE AS $function$ var tables = plv8.execute( "SELECT relname FROM pg_catalog.pg_class c" + " LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace" + " WHERE relkind = 'r' AND n.nspname = 'public'"); for (var i = 0; i < tables.length; i++) { plv8.elog(NOTICE, tables[i]['relname']); var code = "CREATE TABLE liberated." + tables[i]['relname'] + " AS ( SELECT row_to_json(" + tables[i]['relname'] + ") AS liberated_json FROM " + tables[i]['relname'] + ")"; plv8.execute(code); } return 1; $function$; https://gist.github.com/selenamarie/5646494
  29. CREATE SCHEMA liberated; -- A namespace for our new tables!

    -- a cluster contains databases -- a database contains schemas -- a schema contains tables
  30. CREATE OR REPLACE FUNCTION liberate() RETURNS boolean -- Say what

    we’re returning LANGUAGE plv8 -- Say we’ll be using plv8 AS $function$ -- Function goes here! $function$;
  31. -- Find all the names of -- all postgres tables!

    SELECT relname AS table_name FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE relkind = 'r' -- meaning a table! and n.nspname = 'public';
  32. CREATE TABLE liberated AS ( SELECT row_to_json(reports) FROM reports );

  33. CREATE SCHEMA liberated; CREATE OR REPLACE FUNCTION public.liberate() RETURNS boolean

    LANGUAGE plv8 IMMUTABLE AS $function$ var tables = plv8.execute( "SELECT relname FROM pg_catalog.pg_class c" + " LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace" + " WHERE relkind = 'r' AND n.nspname = 'public'"); for (var i = 0; i < tables.length; i++) { plv8.elog(NOTICE, tables[i]['relname']); var code = "CREATE TABLE liberated." + tables[i]['relname'] + " AS ( SELECT row_to_json(" + tables[i]['relname'] + ") AS liberated_json FROM " + tables[i]['relname'] + ")"; plv8.execute(code); } return 1; $function$; https://gist.github.com/selenamarie/5646494
  34. https://gist.github.com/selenamarie/5646494

  35. First class datatype create table birds (sighting json);

  36. postgres# create table birds (sighting json); CREATE TABLE postgres# insert

    into birds VALUES( '{"name": "Great Egret", "where": "Amelia Island, FL", "when":"2013-05-28"}' ); INSERT 0 1 postgres# select * from birds; sighting -------------------------------------------------------------------- {"name": "Great Egret", "where": "Amelia Island, FL", "when": "2013-05-28"} (1 row)
  37. postgres# select json_object_field_text(sighting, 'name') from birds; json_object_field_text ------------------------ Great Egret

    postgres# select json_object_field(sighting, 'name') from birds; json_object_field ------------------- "Great Egret" postgres# select json_object_keys(sighting) from birds; json_object_keys ------------------ name where when And we can create indexes, compare, join against and use JSON columns like any other Postgres column.
  38. plv8 is trusted. can be run by users.

  39. plv8 automatically parses JSON input.

  40. create or replace function json_value( j json, key text) returns

    text AS $$ if (typeof j != 'object') return NULL; return JSON.stringify(j[key]); $$ language plv8;
  41. plv8 supports running raw SQL, prepared statements and cursors: rows

    = plv8.execute(...) plan = plv8.prepare(...) rows = plan.execute(...) plan = plv8.prepare(...) cursor = plan.cursor(...) while (row = cursor.fetch()) {}
  42. create or replace function inject_js(src text) returns text as $$

    return eval( "(function() {" + src + "})")(); $$ LANGUAGE plv8;
  43. Ok that’s cute and all.

  44. What about in production?

  45. None
  46. None
  47. postgres# \d reports Table "public.reports" Column | Type | Modifiers

    ---------------------+--------------------------+------------------------------------------------------ id | integer | not null default nextval('reports_id_seq'::regclass) client_crash_date | timestamp with time zone | date_processed | timestamp with time zone | uuid | character varying(50) | not null product | character varying(30) | version | character varying(16) | build | character varying(30) | signature | character varying(255) | install_age | integer | last_crash | integer | uptime | integer | cpu_name | character varying(100) | cpu_info | character varying(100) | reason | character varying(255) | address | character varying(20) | os_name | character varying(100) | os_version | character varying(100) | user_id | character varying(50) | started_datetime | timestamp with time zone | completed_datetime | timestamp with time zone | success | boolean | truncated | boolean | processor_notes | text | user_comments | character varying(1024) | app_notes | character varying(1024) | distributor | character varying(20) | distributor_version | character varying(20) | topmost_filenames | text | addons_checked | boolean | flash_version | text | hangid | text | process_type | text | release_channel | text | productid | text | exploitability | text | flash_process_dump | text | url | text | email | text | (38 columns)
  48. create table raw_crashes ( crash_id uuid, date_processed timestamptz, raw_crash json

    );
  49. postgres# select pg_size_pretty(pg_table_size('reports _20130513')) as reports_20130513_size; reports_20130513_size ----------------------- 2136 MB

    postgres# select pg_size_pretty(pg_table_size('raw_cra shes_20130513')) as raw_crashes_20130513_size; raw_crashes_20130513_size --------------------------- 5246 MB
  50. Per crash storage JSON: ~2.0k normalized table: ~0.8k

  51. What’s the killer advantage?

  52. SQL: 24 seconds vs Map Reduce: 30 minutes

  53. Why use Postgres?

  54. 1. Use “bulkbag” schema design + schema evolution JSON to

    start, normalize to optimize Why use Postgres?
  55. 1. Use “bulkbag” schema design + schema evolution JSON to

    start, normalize to optimize 2. Easily scale to multi-terabyte DBs For write- or read-heavy loads, non-cloud storage Why use Postgres?
  56. 1. Use “bulkbag” schema design + schema evolution JSON to

    start, normalize to optimize 2. Easily scale to multi-terabyte DBs For write- or read-heavy loads, non-cloud storage 3. Manage your data with a language you love Put the .js for big data jobs with the big data Why use Postgres?
  57. Try out 9.3beta today! http://www.postgresql.org/developer/beta/

  58. TLDR Datatypes Conditional Indexes Transactional DDL Foreign Data Wrappers Concurrent

    Index Creation Extensions Common Table Expressions Fast Column Addition Listen/Notify Table Inheritance Per Transaction sync replication Window Functions Materialized Views Updatable Views Momentum
  59. liberate your schema! with json and plv8 selena deckelmann @selenamarie

    selena@mozilla.com https://code.google.com/p/plv8js/wiki/PLV8 http://www.postgresql.org/developer/beta/ https://gist.github.com/selenamarie/5646494 http://www.pgxn.org/dist/json_enhancements/doc/ json_enhancements.html
  60. Special thanks to: Scott Deckelmann Andrew Dunstan (JSON datatype developer)

    Hitoshi Harada (plv8 author) Will Leinweber (source) Craig Kerstiens (source) reviewers: Brenda Wallace Aurynn Shaw Doug Gorley Andrew Grangaard George London Jacinta Richardson
  61. This work is licensed under the Creative Commons Attribution- ShareAlike

    3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA. Photos from slides 1, 3, 6, 8, 9, 10, 12, 13, 22, 31, 39, 40, 48, 49, 51, 53 © Scott Deckelmann, used with permission.