Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Ruby and PostgresQL, a love story

Ruby and PostgresQL, a love story

#RubyKaigi 2015

Franck Verrot

December 14, 2015
Tweet

More Decks by Franck Verrot

Other Decks in Programming

Transcript

  1. Agenda I <3 SQL I <3 Ruby Just enough of

    PostgreSQL’s APIs Procedural Languages Index Access Methods Foreign Data Wrappers
  2. I <3 PostgreSQL •Happy user since 2009 •Fully-featured Relational DBMS

    •Clean and large codebase •Super active and friendly community
  3. I <3 Ruby • Happy user since 2006 • Designing

    & Using DSLs in Ruby is efficient • ETL process (“Extract, Transform & Load”) (牙 Kiba) • Workflows, StateMachines (state_machines) • Fake Data (factory_girl) • Ruby on Rails (rails) • Lots of great gems (rubygems)
  4. Generating fake data: in SQL 1. Generate the data using

    a custom script / external tools 2. On-the-fly using functions
  5. De-identifying / Anonymizing: in Ruby 1. Ahead Of Time: duplicates

    the “safe” content 2. Just In Time: protected data is already in memory
  6. De-identifying / Anonymizing: in SQL • Custom views and functions:

    CREATE VIEW safe_table AS SELECT protect(name) as name, … FROM my_table (views can even be materialized for faster data retrieval)
  7. “ ” I just wish I could inject some existing

    gems in PostgreSQL Me, January 2015
  8. “ ” I KNOW, I COULD “JUST” USE RUBY AS

    A PROCEDURAL LANGUAGE!!!1!1! Still me, a few minutes after
  9. Procedural Language Handlers 101 CREATE FUNCTION plmruby_call_handler() RETURNS language_handler AS

    ‘…’ LANGUAGE C IMMUTABLE STRICT; CREATE TRUSTED LANGUAGE plmruby HANDLER plmruby_call_handler;
  10. Introducing “plmruby” (a version buggy^Wearly of it) λ CREATE EXTENSION

    plmruby; λ CREATE FUNCTION hello() RETURNS text AS $$ %w[friends! Ruby Hello].sort.join(' ') $$ LANGUAGE plmruby; λ SELECT hello(); hello ------------- Hello Ruby friends! (1 row)
  11. Index Access Methods Table Row  1  (TID  X) Row  2

     (TID  Y) Row  3  (TID  Z) … Index CREATE  INDEX  … USING  [btree |  gist  |  …  ]
  12. IAM’s API IndexBuildResult * ambuild (Relation heapRelation, Relation indexRelation, IndexInfo

    *indexInf bool aminsert (Relation indexRelation, Datum *values, bool *isnull, ItemPointer heap_tid, Rela void amcostestimate (PlannerInfo *root, IndexPath *path, double loop_count, Cost *indexStartup IndexBulkDeleteResult * ambulkdelete (IndexVacuumInfo *info, IndexBulkDeleteResult *stats, Ind IndexBulkDeleteResult * amvacuumcleanup (IndexVacuumInfo *info, IndexBulkDeleteResult *stats); void ambuildempty (Relation indexRelation); bool amcanreturn (Relation indexRelation); bytea * amoptions (ArrayType *reloptions, bool validate); IndexScanDesc ambeginscan (Relation indexRelation, int nkeys, int norderbys); void amrescan (IndexScanDesc scan, ScanKey keys, int nkeys, ScanKey orderbys, int norderbys); boolean amgettuple (IndexScanDesc scan, ScanDirection direction); int64 amgetbitmap (IndexScanDesc scan, TIDBitmap *tbm); void amendscan (IndexScanDesc scan);
  13. Watch it in action # EXPLAIN ANALYZE VERBOSE SELECT *

    FROM my_table WHERE random(my_table) ==> 'foo'; Index Scan using random_index_on_all_cols on public.my_table (cost information) Output: str Index Cond: (random(my_table.*) ==> 'foo'::text)
  14. Interesting example: zombodb Table Row  1  (TID  X) Row  2

     (TID  Y) Row  3  (TID  Z) … Index CREATE  INDEX  … USING  zombodb
  15. Interesting example: zombodb Table Row  1  (TID  X) Row  2

     (TID  Y) Row  3  (TID  Z) … CREATE  INDEX  … USING  zombodb
  16. Interesting example: zombodb 1/2 CREATE INDEX idx_zdb_products ON products USING

    zombodb (zdb('products', products.ctid), zdb(products)) WITH (url='http://some_elastic_search_node:9200/’, shards=5, replicas=1);
  17. Interesting example: zombodb 2/2 SELECT * FROM products WHERE zdb('products',

    ctid) ==> 'keywords:(sports,box) or long_description:(wooden w/5 away) and price < 100000';
  18. What is a FDW? •A C program •An implementation of

    the FDW API • Defines “foreign servers” • Defines “foreign tables”
  19. What are the features supported by a “foreign table”? •Supports

    reads through SELECTs •Supports writes through INSERTs •Supports predicate pushdown (WHERE) •Supports EXPLAIN [ANALYZE]
  20. Lots of existing FDWs •Relational DBMS (PostgreSQL, Oracle, SQL Server,

    …) •NoSQL (Hadoop, Cassandra, Redis, …) •Files (Flat file, CSV, XML, JSON) •And many more! (MongoDB seems to like it, a lot!)
  21. Datum my_fdw_handler(PG_FUNCTION_ARGS) { FdwRoutine *routine = makeNode(FdwRoutine); # Before reading

    the table routine->BeginForeignScan = …; # Read next “row” routine->IterateForeignScan = …; # After reading all “rows” routine->EndForeignScan = …;
  22. CREATE FUNCTION my_fdw_handler() RETURNS fdw_handler AS ‘MODULE_PATHNAME’ LANGUAGE C STRICT;

    CREATE FUNCTION my_fdw_validator(text[], oid) RETURNS void AS ‘MODULE_PATHNAME’ LANGUAGE C STRICT; CREATE FOREIGN DATA WRAPPER my_fdw HANDLER my_fdw_handler VALIDATOR my_fdw_validator;
  23. “ ” We need a general purpose Foreign Data Wrapper

    that supports Ruby Me, not thinking this would ruin my life
  24. IterateForeignScan IterateForeignScan “Spin up a new VM” “Shutdown the VM”

    “Iterate over… something” GetForeignRelSize GetForeignPaths GetForeignPlan Begin  Foreign  Scan IterateForeignScan EndForeignScan Foreign Data Wrapper Callback routines life-cycle
  25. class Producer def initialize(env = {}) end def each @enum

    ||= Enumerator.new do |y| 10.times do |t| y.yield [ Time.now ] end end @enum.next end self end Basic data “producer” ¯\_(ツ)_/¯
  26. “ ” [You should use mruby instead of CRuby.] The

    idea would be that it's more “safe” for pg. Terence Lee
  27. Embedding CRuby • Challenging • Packaging gems • Parallel executions

    of the CRuby-powered FDW has serious impacts on PG • Required some tricks (forks) • Only one VM per process (MVM in Ruby 3.0? 4.0?) • Only one chance to boot a VM per process [bug #11423]
  28. Embedding mruby • スパーかわいい • Super fast VM startup •

    Most of the CRuby language is implemented (at least the useful parts) • No need for multi-process/multi-thread manual tuning • Easy setup (very few dependencies) • Downsides • Small amount of gems: lots to rewrite/port
  29. Embedding mruby 101 mrb_state *mrb = mrb_open(); struct mrb_parser_state *ps

    = mrb_parse_string(mrb, “…”, NULL); struct RProc *code = mrb_generate_code(mrb, ps); mrb_value result = mrb_toplevel_run(mrb, code); mrb_close(mrb);
  30. Holycorn Redis FDW 1/2 λ redis-cli 127.0.0.1:6379> select 0 OK

    127.0.0.1:6379> keys * (empty list or set) 127.0.0.1:6379> set foo 1 OK … 127.0.0.1:6379> keys * 1) "bar" 2) "foo" 3) "baz"
  31. Holycorn Redis FDW 2/2 SELECT * from redis_table; key |

    value -----+------- bar | 2 foo | 1 baz | 3 (3 rows)