Ruby and PostgresQL, a love story

Ruby and PostgresQL, a love story

#RubyKaigi 2015

0256f4863b3a08d20a382766f842e63c?s=128

Franck Verrot

December 14, 2015
Tweet

Transcript

  1. 6.
  2. 7.
  3. 8.

    Agenda I <3 SQL I <3 Ruby Just enough of

    PostgreSQL’s APIs Procedural Languages Index Access Methods Foreign Data Wrappers
  4. 9.

    I <3 PostgreSQL •Happy user since 2009 •Fully-featured Relational DBMS

    •Clean and large codebase •Super active and friendly community
  5. 10.
  6. 11.

    I <3 Ruby • Happy user since 2006 • Designing

    & Using DSLs in Ruby is efficient • ETL process (“Extract, Transform & Load”) (牙 Kiba) • Workflows, StateMachines (state_machines) • Fake Data (factory_girl) • Ruby on Rails (rails) • Lots of great gems (rubygems)
  7. 14.

    Generating fake data: in SQL 1. Generate the data using

    a custom script / external tools 2. On-the-fly using functions
  8. 16.

    De-identifying / Anonymizing: in Ruby 1. Ahead Of Time: duplicates

    the “safe” content 2. Just In Time: protected data is already in memory
  9. 17.

    De-identifying / Anonymizing: in SQL • Custom views and functions:

    CREATE VIEW safe_table AS SELECT protect(name) as name, … FROM my_table (views can even be materialized for faster data retrieval)
  10. 19.
  11. 20.

    “ ” I just wish I could inject some existing

    gems in PostgreSQL Me, January 2015
  12. 21.

    “ ” I KNOW, I COULD “JUST” USE RUBY AS

    A PROCEDURAL LANGUAGE!!!1!1! Still me, a few minutes after
  13. 22.
  14. 23.
  15. 24.
  16. 25.
  17. 30.

    Procedural Language Handlers 101 CREATE FUNCTION plmruby_call_handler() RETURNS language_handler AS

    ‘…’ LANGUAGE C IMMUTABLE STRICT; CREATE TRUSTED LANGUAGE plmruby HANDLER plmruby_call_handler;
  18. 31.

    Introducing “plmruby” (a version buggy^Wearly of it) λ CREATE EXTENSION

    plmruby; λ CREATE FUNCTION hello() RETURNS text AS $$ %w[friends! Ruby Hello].sort.join(' ') $$ LANGUAGE plmruby; λ SELECT hello(); hello ------------- Hello Ruby friends! (1 row)
  19. 34.

    Index Access Methods Table Row  1  (TID  X) Row  2

     (TID  Y) Row  3  (TID  Z) … Index CREATE  INDEX  … USING  [btree |  gist  |  …  ]
  20. 35.

    IAM’s API IndexBuildResult * ambuild (Relation heapRelation, Relation indexRelation, IndexInfo

    *indexInf bool aminsert (Relation indexRelation, Datum *values, bool *isnull, ItemPointer heap_tid, Rela void amcostestimate (PlannerInfo *root, IndexPath *path, double loop_count, Cost *indexStartup IndexBulkDeleteResult * ambulkdelete (IndexVacuumInfo *info, IndexBulkDeleteResult *stats, Ind IndexBulkDeleteResult * amvacuumcleanup (IndexVacuumInfo *info, IndexBulkDeleteResult *stats); void ambuildempty (Relation indexRelation); bool amcanreturn (Relation indexRelation); bytea * amoptions (ArrayType *reloptions, bool validate); IndexScanDesc ambeginscan (Relation indexRelation, int nkeys, int norderbys); void amrescan (IndexScanDesc scan, ScanKey keys, int nkeys, ScanKey orderbys, int norderbys); boolean amgettuple (IndexScanDesc scan, ScanDirection direction); int64 amgetbitmap (IndexScanDesc scan, TIDBitmap *tbm); void amendscan (IndexScanDesc scan);
  21. 37.

    Watch it in action # EXPLAIN ANALYZE VERBOSE SELECT *

    FROM my_table WHERE random(my_table) ==> 'foo'; Index Scan using random_index_on_all_cols on public.my_table (cost information) Output: str Index Cond: (random(my_table.*) ==> 'foo'::text)
  22. 38.

    Interesting example: zombodb Table Row  1  (TID  X) Row  2

     (TID  Y) Row  3  (TID  Z) … Index CREATE  INDEX  … USING  zombodb
  23. 39.

    Interesting example: zombodb Table Row  1  (TID  X) Row  2

     (TID  Y) Row  3  (TID  Z) … CREATE  INDEX  … USING  zombodb
  24. 40.

    Interesting example: zombodb 1/2 CREATE INDEX idx_zdb_products ON products USING

    zombodb (zdb('products', products.ctid), zdb(products)) WITH (url='http://some_elastic_search_node:9200/’, shards=5, replicas=1);
  25. 41.

    Interesting example: zombodb 2/2 SELECT * FROM products WHERE zdb('products',

    ctid) ==> 'keywords:(sports,box) or long_description:(wooden w/5 away) and price < 100000';
  26. 44.

    What is a FDW? •A C program •An implementation of

    the FDW API • Defines “foreign servers” • Defines “foreign tables”
  27. 46.

    What are the features supported by a “foreign table”? •Supports

    reads through SELECTs •Supports writes through INSERTs •Supports predicate pushdown (WHERE) •Supports EXPLAIN [ANALYZE]
  28. 47.

    Lots of existing FDWs •Relational DBMS (PostgreSQL, Oracle, SQL Server,

    …) •NoSQL (Hadoop, Cassandra, Redis, …) •Files (Flat file, CSV, XML, JSON) •And many more! (MongoDB seems to like it, a lot!)
  29. 49.

    Datum my_fdw_handler(PG_FUNCTION_ARGS) { FdwRoutine *routine = makeNode(FdwRoutine); # Before reading

    the table routine->BeginForeignScan = …; # Read next “row” routine->IterateForeignScan = …; # After reading all “rows” routine->EndForeignScan = …;
  30. 50.

    CREATE FUNCTION my_fdw_handler() RETURNS fdw_handler AS ‘MODULE_PATHNAME’ LANGUAGE C STRICT;

    CREATE FUNCTION my_fdw_validator(text[], oid) RETURNS void AS ‘MODULE_PATHNAME’ LANGUAGE C STRICT; CREATE FOREIGN DATA WRAPPER my_fdw HANDLER my_fdw_handler VALIDATOR my_fdw_validator;
  31. 52.

    “ ” We need a general purpose Foreign Data Wrapper

    that supports Ruby Me, not thinking this would ruin my life
  32. 54.

    IterateForeignScan IterateForeignScan “Spin up a new VM” “Shutdown the VM”

    “Iterate over… something” GetForeignRelSize GetForeignPaths GetForeignPlan Begin  Foreign  Scan IterateForeignScan EndForeignScan Foreign Data Wrapper Callback routines life-cycle
  33. 55.

    class Producer def initialize(env = {}) end def each @enum

    ||= Enumerator.new do |y| 10.times do |t| y.yield [ Time.now ] end end @enum.next end self end Basic data “producer” ¯\_(ツ)_/¯
  34. 56.
  35. 57.
  36. 58.
  37. 59.

    “ ” [You should use mruby instead of CRuby.] The

    idea would be that it's more “safe” for pg. Terence Lee
  38. 60.

    Embedding CRuby • Challenging • Packaging gems • Parallel executions

    of the CRuby-powered FDW has serious impacts on PG • Required some tricks (forks) • Only one VM per process (MVM in Ruby 3.0? 4.0?) • Only one chance to boot a VM per process [bug #11423]
  39. 61.

    Embedding mruby • スパーかわいい • Super fast VM startup •

    Most of the CRuby language is implemented (at least the useful parts) • No need for multi-process/multi-thread manual tuning • Easy setup (very few dependencies) • Downsides • Small amount of gems: lots to rewrite/port
  40. 62.

    Embedding mruby 101 mrb_state *mrb = mrb_open(); struct mrb_parser_state *ps

    = mrb_parse_string(mrb, “…”, NULL); struct RProc *code = mrb_generate_code(mrb, ps); mrb_value result = mrb_toplevel_run(mrb, code); mrb_close(mrb);
  41. 65.

    Holycorn Redis FDW 1/2 λ redis-cli 127.0.0.1:6379> select 0 OK

    127.0.0.1:6379> keys * (empty list or set) 127.0.0.1:6379> set foo 1 OK … 127.0.0.1:6379> keys * 1) "bar" 2) "foo" 3) "baz"
  42. 66.

    Holycorn Redis FDW 2/2 SELECT * from redis_table; key |

    value -----+------- bar | 2 foo | 1 baz | 3 (3 rows)