Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Herding Data with Postgres dblink() and Rails

Herding Data with Postgres dblink() and Rails

A lightning talk about writing a Rails extension that piggybacks on Postgres' dblink() to replicate subsets of tables to other Rails applications' databases.

The use case: you have data in your production database, and you need readonly versions of those tables in another read/write database in use by another application.

Vikram Oberoi

December 11, 2013
Tweet

Other Decks in Programming

Transcript

  1. Herding  Data  with  Postgres’   dblink()  and  Rails   Vikram

     Oberoi   Harry’s  –  h=p://www.harrys.com  
  2. What’s  this  talk  about?   A  technique  using  Postgres’  dblink()

     extension   and  Rails  to  replicate  data  for  use  in  other   applicaJons  
  3. The  General  Use  Case   There’s  data  in  your  producJon

      database!   …  and  you  need  to  replicate  tables   regularly  to  another  read/write  DB  for  a   separate  applicaJon  to  process       (…  and  you  use  Postgres)      
  4. A  Use  Case  @  Harry’s   There’s  data  in  Harry’s

     prod!   …  and  we  need  to  replicate  a  subset  of   tables  to  our  email  personalizaJon   plaTorm   …  because  we  need  to  process  it  to  know   where  in  his/her  lifecycle  a  customer  is  to  send   personalized,  non-­‐intrusive,  and  relevant  email      
  5. Another  Use  Case  @  Harry’s   There’s  data  in  Harry’s

     prod!   …  and  we  need  to  replicate  a  subset  of   tables  to  our  data  warehouse   …  so  we  can  consolidate  data  from  a  plethora   of  sources  and  process  it  to  gain  insight   around:  shipping  and  distribuJon  efficacy,  our   auto-­‐refill  program,  revenue,  accounJng,  etc.      
  6. Usage   order.rb, another app! class Order! # Production website-specific

    stuff! end! order.rb, production website! class Order! backed_by_dblink_view :orders, ! :database => :www! ! readonly!! ! # Other-app specific stuff! end!
  7. Usage   order.rb, another app! class Order! # Production website-specific

    stuff! end! order.rb, production website! class Order! backed_by_dblink_view :orders, ! :database => :www! ! readonly!! ! # Other-app specific stuff! end! These  records  will  be   local,  I  promise!  
  8. What  is  dblink()?   It’s  a  plane!   It’s  a

     bird!   It’s  a  Postgres  extension!  
  9. What  is  dblink()?   SELECT * FROM dblink(! ’host=productionhost dbname=productiondb',!

    'SELECT * FROM foo') ! AS myawesometable(a int, b text, c text[]);! ! a | b | c ! ----+---+---------------! 0 | a | {a0,b0,c0}! 1 | b | {a1,b1,c1}! 2 | c | {a2,b2,c2}! 3 | d | {a3,b3,c3}! 4 | e | {a4,b4,c4}! 5 | f | {a5,b5,c5}! 6 | g | {a6,b6,c6}! 7 | h | {a7,b7,c7}! 8 | i | {a8,b8,c8}! 9 | j | {a9,b9,c9}! 10 | k | {a10,b10,c10}! (11 rows)!
  10. What  is  dblink()?   SELECT * FROM dblink(! ’host=productionhost dbname=productiondb',!

    'SELECT * FROM foo') ! AS myawesometable(a int, b text, c text[]);! ! a | b | c ! ----+---+---------------! 0 | a | {a0,b0,c0}! 1 | b | {a1,b1,c1}! 2 | c | {a2,b2,c2}! 3 | d | {a3,b3,c3}! 4 | e | {a4,b4,c4}! 5 | f | {a5,b5,c5}! 6 | g | {a6,b6,c6}! 7 | h | {a7,b7,c7}! 8 | i | {a8,b8,c8}! 9 | j | {a9,b9,c9}! 10 | k | {a10,b10,c10}! (11 rows)! Yo  dawg!  I  heard  you   like  databases.                   So  I  put  a  database  in   your  database.  
  11. What  is  dblink()?   SELECT * FROM dblink(! ’host=productionhost dbname=productiondb',!

    'SELECT * FROM foo') ! AS myawesometable(a int, b text, c text[]);! ! a | b | c ! ----+---+---------------! 0 | a | {a0,b0,c0}! 1 | b | {a1,b1,c1}! 2 | c | {a2,b2,c2}! 3 | d | {a3,b3,c3}! 4 | e | {a4,b4,c4}! 5 | f | {a5,b5,c5}! 6 | g | {a6,b6,c6}! 7 | h | {a7,b7,c7}! 8 | i | {a8,b8,c8}! 9 | j | {a9,b9,c9}! 10 | k | {a10,b10,c10}! (11 rows)! Yo  dawg!  I  heard  you   like  databases.                   So  I  put  a  database  in   your  database.     (not  really,  but  hey)  
  12. Create  views  to  other  DB’s  tables   SELECT * FROM

    dblink(! ’host=productionhost dbname=productiondb',! 'SELECT * FROM foo') ! AS myawesometable(a int, b text, c text[]);! ! a | b | c ! ----+---+---------------! 0 | a | {a0,b0,c0}! 1 | b | {a1,b1,c1}! 2 | c | {a2,b2,c2}! 3 | d | {a3,b3,c3}! 4 | e | {a4,b4,c4}! 5 | f | {a5,b5,c5}! 6 | g | {a6,b6,c6}! 7 | h | {a7,b7,c7}! 8 | i | {a8,b8,c8}! 9 | j | {a9,b9,c9}! 10 | k | {a10,b10,c10}! (11 rows)!
  13. Create  views  to  other  DB’s  tables   CREATE VIEW myawesomeview

    AS (! ! SELECT * FROM dblink(! ‘host=productionhost dbname=productiondb',! 'SELECT * FROM foo') ! AS myawesomeview(a int, b text, c text[])! ! );! !
  14. Ta  da!   SELECT * FROM myawesomeview;! ! a |

    b | c ! ----+---+---------------! 0 | a | {a0,b0,c0}! 1 | b | {a1,b1,c1}! 2 | c | {a2,b2,c2}! 3 | d | {a3,b3,c3}! 4 | e | {a4,b4,c4}! 5 | f | {a5,b5,c5}! 6 | g | {a6,b6,c6}! 7 | h | {a7,b7,c7}! 8 | i | {a8,b8,c8}! 9 | j | {a9,b9,c9}! 10 | k | {a10,b10,c10}! (11 rows)!
  15. More  performant  ta  da!   SELECT * FROM myawesometable;! !

    a | b | c ! ----+---+---------------! 0 | a | {a0,b0,c0}! 1 | b | {a1,b1,c1}! 2 | c | {a2,b2,c2}! 3 | d | {a3,b3,c3}! 4 | e | {a4,b4,c4}! 5 | f | {a5,b5,c5}! 6 | g | {a6,b6,c6}! 7 | h | {a7,b7,c7}! 8 | i | {a8,b8,c8}! 9 | j | {a9,b9,c9}! 10 | k | {a10,b10,c10}! (11 rows)!
  16. Refresh  those  views   DROP VIEW IF EXISTS myawesomeview;! DROP

    TABLE IF EXISTS myawesometable;! ! CREATE VIEW myawesomeview AS (! SELECT * FROM dblink(! ‘host=productionhost dbname=productiondb',! 'SELECT * FROM foo') ! AS myawesomeview(a int, b text, c text[])! );! ! CREATE TABLE myawesometable AS (! SELECT * FROM myawesomeview! );! ! ! ! ! ! !
  17. Less  stale  ta  da!   SELECT * FROM myawesometable;! !

    a | b | c ! ----+---+---------------! 0 | a | {a0,b0,c0}! 1 | b | {a1,b1,c1}! 2 | c | {a2,b2,c2}! 3 | d | {a3,b3,c3}! 4 | e | {a4,b4,c4}! 5 | f | {a5,b5,c5}! 6 | g | {a6,b6,c6}! 7 | h | {a7,b7,c7}! 8 | i | {a8,b8,c8}! 9 | j | {a9,b9,c9}! 10 | k | {a10,b10,c10}! (11 rows)!
  18. Script  and  automate  this  process   …  wrap  it  in

     a  Rails  extension   …  and  you  got  a  stew  goin’!  
  19. Script  and  automate  this  process   Take  a  look  at

     Mark  Gibson’s  post  on  the   Postgres  mailing  list:     h=p://www.postgresql.org/message-­‐id/ [email protected]     OR     Google  ‘postgres  create  remote  view’    
  20. Do  not…   •  Use  this  to  create  read  replicas

      – Heroku:  create  a  Postgres  follower   – Otherwise:  streaming  replicaJon,  PGPool,  Slony,   whatever   •  Write  to  dblink-­‐backed  tables   •  Use  this  when  you  have  obscene  amounts  of   data