Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Herding Data with Postgres dblink() and Rails

Herding Data with Postgres dblink() and Rails

A lightning talk about writing a Rails extension that piggybacks on Postgres' dblink() to replicate subsets of tables to other Rails applications' databases.

The use case: you have data in your production database, and you need readonly versions of those tables in another read/write database in use by another application.

Avatar for Vikram Oberoi

Vikram Oberoi

December 11, 2013
Tweet

Other Decks in Programming

Transcript

  1. Herding  Data  with  Postgres’   dblink()  and  Rails   Vikram

     Oberoi   Harry’s  –  h=p://www.harrys.com  
  2. What’s  this  talk  about?   A  technique  using  Postgres’  dblink()

     extension   and  Rails  to  replicate  data  for  use  in  other   applicaJons  
  3. The  General  Use  Case   There’s  data  in  your  producJon

      database!   …  and  you  need  to  replicate  tables   regularly  to  another  read/write  DB  for  a   separate  applicaJon  to  process       (…  and  you  use  Postgres)      
  4. A  Use  Case  @  Harry’s   There’s  data  in  Harry’s

     prod!   …  and  we  need  to  replicate  a  subset  of   tables  to  our  email  personalizaJon   plaTorm   …  because  we  need  to  process  it  to  know   where  in  his/her  lifecycle  a  customer  is  to  send   personalized,  non-­‐intrusive,  and  relevant  email      
  5. Another  Use  Case  @  Harry’s   There’s  data  in  Harry’s

     prod!   …  and  we  need  to  replicate  a  subset  of   tables  to  our  data  warehouse   …  so  we  can  consolidate  data  from  a  plethora   of  sources  and  process  it  to  gain  insight   around:  shipping  and  distribuJon  efficacy,  our   auto-­‐refill  program,  revenue,  accounJng,  etc.      
  6. Usage   order.rb, another app! class Order! # Production website-specific

    stuff! end! order.rb, production website! class Order! backed_by_dblink_view :orders, ! :database => :www! ! readonly!! ! # Other-app specific stuff! end!
  7. Usage   order.rb, another app! class Order! # Production website-specific

    stuff! end! order.rb, production website! class Order! backed_by_dblink_view :orders, ! :database => :www! ! readonly!! ! # Other-app specific stuff! end! These  records  will  be   local,  I  promise!  
  8. What  is  dblink()?   It’s  a  plane!   It’s  a

     bird!   It’s  a  Postgres  extension!  
  9. What  is  dblink()?   SELECT * FROM dblink(! ’host=productionhost dbname=productiondb',!

    'SELECT * FROM foo') ! AS myawesometable(a int, b text, c text[]);! ! a | b | c ! ----+---+---------------! 0 | a | {a0,b0,c0}! 1 | b | {a1,b1,c1}! 2 | c | {a2,b2,c2}! 3 | d | {a3,b3,c3}! 4 | e | {a4,b4,c4}! 5 | f | {a5,b5,c5}! 6 | g | {a6,b6,c6}! 7 | h | {a7,b7,c7}! 8 | i | {a8,b8,c8}! 9 | j | {a9,b9,c9}! 10 | k | {a10,b10,c10}! (11 rows)!
  10. What  is  dblink()?   SELECT * FROM dblink(! ’host=productionhost dbname=productiondb',!

    'SELECT * FROM foo') ! AS myawesometable(a int, b text, c text[]);! ! a | b | c ! ----+---+---------------! 0 | a | {a0,b0,c0}! 1 | b | {a1,b1,c1}! 2 | c | {a2,b2,c2}! 3 | d | {a3,b3,c3}! 4 | e | {a4,b4,c4}! 5 | f | {a5,b5,c5}! 6 | g | {a6,b6,c6}! 7 | h | {a7,b7,c7}! 8 | i | {a8,b8,c8}! 9 | j | {a9,b9,c9}! 10 | k | {a10,b10,c10}! (11 rows)! Yo  dawg!  I  heard  you   like  databases.                   So  I  put  a  database  in   your  database.  
  11. What  is  dblink()?   SELECT * FROM dblink(! ’host=productionhost dbname=productiondb',!

    'SELECT * FROM foo') ! AS myawesometable(a int, b text, c text[]);! ! a | b | c ! ----+---+---------------! 0 | a | {a0,b0,c0}! 1 | b | {a1,b1,c1}! 2 | c | {a2,b2,c2}! 3 | d | {a3,b3,c3}! 4 | e | {a4,b4,c4}! 5 | f | {a5,b5,c5}! 6 | g | {a6,b6,c6}! 7 | h | {a7,b7,c7}! 8 | i | {a8,b8,c8}! 9 | j | {a9,b9,c9}! 10 | k | {a10,b10,c10}! (11 rows)! Yo  dawg!  I  heard  you   like  databases.                   So  I  put  a  database  in   your  database.     (not  really,  but  hey)  
  12. Create  views  to  other  DB’s  tables   SELECT * FROM

    dblink(! ’host=productionhost dbname=productiondb',! 'SELECT * FROM foo') ! AS myawesometable(a int, b text, c text[]);! ! a | b | c ! ----+---+---------------! 0 | a | {a0,b0,c0}! 1 | b | {a1,b1,c1}! 2 | c | {a2,b2,c2}! 3 | d | {a3,b3,c3}! 4 | e | {a4,b4,c4}! 5 | f | {a5,b5,c5}! 6 | g | {a6,b6,c6}! 7 | h | {a7,b7,c7}! 8 | i | {a8,b8,c8}! 9 | j | {a9,b9,c9}! 10 | k | {a10,b10,c10}! (11 rows)!
  13. Create  views  to  other  DB’s  tables   CREATE VIEW myawesomeview

    AS (! ! SELECT * FROM dblink(! ‘host=productionhost dbname=productiondb',! 'SELECT * FROM foo') ! AS myawesomeview(a int, b text, c text[])! ! );! !
  14. Ta  da!   SELECT * FROM myawesomeview;! ! a |

    b | c ! ----+---+---------------! 0 | a | {a0,b0,c0}! 1 | b | {a1,b1,c1}! 2 | c | {a2,b2,c2}! 3 | d | {a3,b3,c3}! 4 | e | {a4,b4,c4}! 5 | f | {a5,b5,c5}! 6 | g | {a6,b6,c6}! 7 | h | {a7,b7,c7}! 8 | i | {a8,b8,c8}! 9 | j | {a9,b9,c9}! 10 | k | {a10,b10,c10}! (11 rows)!
  15. More  performant  ta  da!   SELECT * FROM myawesometable;! !

    a | b | c ! ----+---+---------------! 0 | a | {a0,b0,c0}! 1 | b | {a1,b1,c1}! 2 | c | {a2,b2,c2}! 3 | d | {a3,b3,c3}! 4 | e | {a4,b4,c4}! 5 | f | {a5,b5,c5}! 6 | g | {a6,b6,c6}! 7 | h | {a7,b7,c7}! 8 | i | {a8,b8,c8}! 9 | j | {a9,b9,c9}! 10 | k | {a10,b10,c10}! (11 rows)!
  16. Refresh  those  views   DROP VIEW IF EXISTS myawesomeview;! DROP

    TABLE IF EXISTS myawesometable;! ! CREATE VIEW myawesomeview AS (! SELECT * FROM dblink(! ‘host=productionhost dbname=productiondb',! 'SELECT * FROM foo') ! AS myawesomeview(a int, b text, c text[])! );! ! CREATE TABLE myawesometable AS (! SELECT * FROM myawesomeview! );! ! ! ! ! ! !
  17. Less  stale  ta  da!   SELECT * FROM myawesometable;! !

    a | b | c ! ----+---+---------------! 0 | a | {a0,b0,c0}! 1 | b | {a1,b1,c1}! 2 | c | {a2,b2,c2}! 3 | d | {a3,b3,c3}! 4 | e | {a4,b4,c4}! 5 | f | {a5,b5,c5}! 6 | g | {a6,b6,c6}! 7 | h | {a7,b7,c7}! 8 | i | {a8,b8,c8}! 9 | j | {a9,b9,c9}! 10 | k | {a10,b10,c10}! (11 rows)!
  18. Script  and  automate  this  process   …  wrap  it  in

     a  Rails  extension   …  and  you  got  a  stew  goin’!  
  19. Script  and  automate  this  process   Take  a  look  at

     Mark  Gibson’s  post  on  the   Postgres  mailing  list:     h=p://www.postgresql.org/message-­‐id/ [email protected]     OR     Google  ‘postgres  create  remote  view’    
  20. Do  not…   •  Use  this  to  create  read  replicas

      – Heroku:  create  a  Postgres  follower   – Otherwise:  streaming  replicaJon,  PGPool,  Slony,   whatever   •  Write  to  dblink-­‐backed  tables   •  Use  this  when  you  have  obscene  amounts  of   data