Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Making Data Dance (PGConfSV)

Making Data Dance (PGConfSV)

Your web framework and ORM are really handy tools to get work done, and do it quickly. They aren't a silver bullet, though. Sometimes the database is the best place to run a complicated query. Especially when you have big transactional or time-series data. Postgres has some powerful functionality that can help transform data from a giant pile of stuff into something actionable and useful. I will show how to take advantage of the JSON datatype, Common Table Expression, and materialized views to build impressive data visualizations.

3a1f18e5445bd821f290ed68c1f2d925?s=128

Barrett Clark

November 16, 2016
Tweet

Transcript

  1. MAKING DATA DANCE PGConf Silicon Valley @barrettclark

  2. HARNESS HERETICAL ANTIPATTERNS FOR FUN AND PROFIT PGConf Silicon Valley

    @barrettclark
  3. BARRETT CLARK @barrettclark PGConf Silicon Valley @barrettclark

  4. PGConf Silicon Valley @barrettclark

  5. Class (Model) Object Relational Mapper Database M agic PGConf Silicon

    Valley @barrettclark
  6. DATABASE AGNOSTIC PGConf Silicon Valley @barrettclark

  7. ActiveRecord Object Relational Mapper PGConf Silicon Valley @barrettclark

  8. ActiveRecord PGConf Silicon Valley @barrettclark

  9. RAW SQL PGConf Silicon Valley @barrettclark

  10. COMPLICATED JOINS PGConf Silicon Valley @barrettclark

  11. Category.joins(articles: [{ comments: :guest }, :tags]) http://guides.rubyonrails.org/active_record_querying.html#using-array-hash-of-named-associations ??? PGConf Silicon

    Valley @barrettclark
  12. SELECT categories.* FROM categories INNER JOIN articles ON articles.category_id =

    categories.id INNER JOIN comments ON comments.article_id = articles.id INNER JOIN guests ON guests.comment_id = comments.id INNER JOIN tags ON tags.article_id = articles.id http://guides.rubyonrails.org/active_record_querying.html#using-array-hash-of-named-associations Category.joins(articles: [{ comments: :guest }, :tags]) PGConf Silicon Valley @barrettclark
  13. DEEPER DATABASE FUNCTIONALITY PGConf Silicon Valley @barrettclark

  14. BUT DATABASE PORTABILITY?! PGConf Silicon Valley @barrettclark

  15. https://flic.kr/p/eg4hih PGConf Silicon Valley @barrettclark

  16. It just doesn't work like that. Database portability pipe dream

    PGConf Silicon Valley @barrettclark
  17. Different databases are different. PGConf Silicon Valley @barrettclark

  18. You chose your database for a reason. Use it! PGConf

    Silicon Valley @barrettclark
  19. CAVEATS • Long-running queries • Connection pool PGConf Silicon Valley

    @barrettclark
  20. LET'S GET OUR HANDS DIRTY PGConf Silicon Valley @barrettclark

  21. PGConf Silicon Valley @barrettclark

  22. WINDOW FUNCTIONS PGConf Silicon Valley @barrettclark

  23. –Postgres Documentation A window function performs a calculation across a

    set of table rows that are somehow related to the current row. PGConf Silicon Valley @barrettclark
  24. –Barrett Clark Fold data from other rows into this row

    PGConf Silicon Valley @barrettclark
  25. WINDOW FUNCTION GREATEST HITS • lead() • lag() • first_value()

    • last_value() • row_number() PGConf Silicon Valley @barrettclark
  26. id fruit 1 apple 2 apple 3 apple 4 pear

    5 apple 6 pear 7 pear 8 pear 9 pear 10 banana PGConf Silicon Valley @barrettclark
  27. SELECT *,LEAD(id) OVER(), LAG(id) OVER() FROM fruits; id | fruit

    | lead | lag ----+--------+------+----- 1 | apple | 2 | 2 | apple | 3 | 1 3 | apple | 4 | 2 4 | pear | 5 | 3 5 | apple | 6 | 4 6 | pear | 7 | 5 7 | pear | 8 | 6 8 | pear | 9 | 7 9 | pear | 10 | 8 10 | banana | | 9 (10 rows) http://sqlfiddle.com/#!15/230c3b/2 PGConf Silicon Valley @barrettclark
  28. –Postgres Documentation A window function performs a calculation across a

    set of table rows that are somehow related to the current row. PGConf Silicon Valley @barrettclark
  29. https://flic.kr/p/7KGqEY PGConf Silicon Valley @barrettclark

  30. SELECT id, fruit, FIRST_VALUE(id) OVER(PARTITION BY fruit ORDER BY id)

    FROM fruits; id | fruit | first_value ----+--------+------------- 1 | apple | 1 2 | apple | 1 3 | apple | 1 5 | apple | 1 10 | banana | 10 4 | pear | 4 6 | pear | 4 7 | pear | 4 8 | pear | 4 9 | pear | 4 (10 rows) http://sqlfiddle.com/#!15/230c3b PGConf Silicon Valley @barrettclark
  31. SELECT *, FIRST_VALUE(id) OVER(PARTITION BY fruit ORDER BY id), ROW_NUMBER()

    OVER(PARTITION BY fruit ORDER BY id) FROM fruits; id | fruit | first_value | row_number ----+--------+-------------+------------ 1 | apple | 1 | 1 2 | apple | 1 | 2 3 | apple | 1 | 3 5 | apple | 1 | 4 10 | banana | 10 | 1 4 | pear | 4 | 1 6 | pear | 4 | 2 7 | pear | 4 | 3 8 | pear | 4 | 4 9 | pear | 4 | 5 (10 rows) http://sqlfiddle.com/#!15/230c3b PGConf Silicon Valley @barrettclark
  32. PRACTICAL EXAMPLE Chat App PGConf Silicon Valley @barrettclark

  33. SELECT id, message, room_id, user_id FROM messages ORDER BY id;

    id | message | room_id | user_id ----+-----------------------------+---------+--------- 1 | Leaving for the airport now | 1 | 1337 2 | Flight just landed! | 1 | 1337 3 | waiting on an uber | 2 | 1337 4 | omw y'all! | 1 | 1337 9 | hello, world | 1 | 27 (5 rows) PGConf Silicon Valley @barrettclark
  34. SELECT id, message, room_id, user_id, LAG(user_id) OVER( PARTITION BY room_id

    ORDER BY room_id, id ) AS prev_user_id FROM messages ORDER BY 3, 1; Look back Defines the groups of rows to evaluate against the current row PGConf Silicon Valley @barrettclark
  35. id | message | room_id | user_id | prev_user_id ----+-----------------------------+---------+---------+--------------

    1 | Leaving for the airport now | 1 | 1337 | 2 | Flight just landed! | 1 | 1337 | 1337 4 | omw y'all! | 1 | 1337 | 1337 9 | hello, world | 1 | 27 | 1337 3 | waiting on an uber | 2 | 1337 | (5 rows) SELECT id, message, room_id, user_id, LAG(user_id) OVER( PARTITION BY room_id ORDER BY room_id, id ) AS prev_user_id FROM messages ORDER BY 3, 1; PGConf Silicon Valley @barrettclark
  36. SELECT id, message, room_id, user_id, LAG(user_id) OVER( PARTITION BY room_id

    ORDER BY room_id, id ) AS prev_user_id FROM messages ORDER BY 3, 1; ProTip™ 3 1 2 4 5 PGConf Silicon Valley @barrettclark
  37. class Message < ActiveRecord::Base def self.window_example sql = <<-SQL SELECT

    id, message, room_id, user_id, LAG(user_id) OVER(PARTITION BY room_id ORDER BY room_id, id) AS prev_user_id FROM messages ORDER BY 3, 1; SQL connection.execute(sql) end end PGConf Silicon Valley @barrettclark
  38. TOO ANTI-PATTERN FOR YOU? PGConf Silicon Valley @barrettclark

  39. module PGConnection def conn config = YAML.load_file(File.open('config/database.yml'))['development'] @conn ||= PG.connect(

    :dbname => config['database'], :user => config['username'], :password => config['password'], :host => config['host'] || 'localhost' ) end end class Antipattern extend PGConnection def self.window_example sql = <<-SQL SELECT id, message, room_id, user_id, LAG(user_id) OVER(PARTITION BY room_id ORDER BY room_id, id) AS prev_user_id FROM messages ORDER BY 3, 1; SQL conn.exec(sql).values end end PGConf Silicon Valley @barrettclark
  40. MODELS DO NOT HAVE TO BE BACKED BY ACTIVERECORD Or

    even persisted in the database. PGConf Silicon Valley @barrettclark
  41. QUESTIONS WE CAN NOW ANSWER • When did something change?

    • When did someone leave a place? • How long did each thing last? • How long did people stay in which places? • Enter / Exit / Change events PGConf Silicon Valley @barrettclark
  42. https://flic.kr/p/5b8cq3 PGConf Silicon Valley @barrettclark

  43. SUBQUERIES PGConf Silicon Valley @barrettclark

  44. SUBQUERY • To filter and group • For a field

    value • In a join PGConf Silicon Valley @barrettclark
  45. BASE QUERY PGConf Silicon Valley @barrettclark

  46. SELECT id, phone, major, minor, reading_timestamp, LEAD(minor) OVER ( PARTITION

    BY phone ORDER BY phone, reading_timestamp ) AS next_minor FROM beacon_readings ORDER BY 2, 5; “For a given phone, pull in the next minor field” PGConf Silicon Valley @barrettclark
  47. WHEN DID SOMETHING CHANGE? PGConf Silicon Valley @barrettclark

  48. SELECT major, minor, next_minor, COUNT(*) FROM ( SELECT id, phone,

    major, minor, reading_timestamp, LEAD(minor) OVER ( PARTITION BY phone ORDER BY phone, reading_timestamp ) AS next_minor FROM beacon_readings ORDER BY 2, 5 ) AS beacon_readings_lead WHERE major = 1 AND minor != next_minor GROUP BY major, minor, next_minor ORDER BY major, minor, next_minor; PGConf Silicon Valley @barrettclark
  49. YO DAWG, I HEARD YOU LIKE QUERIES PGConf Silicon Valley

    @barrettclark
  50. SELECT major, minor, next_minor, COUNT(*) FROM ( SELECT id, phone,

    major, minor, reading_timestamp, LEAD(minor) OVER ( PARTITION BY phone ORDER BY phone, reading_timestamp ) AS next_minor FROM beacon_readings ORDER BY 2, 5 ) AS beacon_readings_lead WHERE major = 1 AND minor != next_minor GROUP BY major, minor, next_minor ORDER BY major, minor, next_minor; PGConf Silicon Valley @barrettclark
  51. major | minor | next_minor | count -------+-------+------------+------- 1 |

    0 | 1 | 18 1 | 0 | 2 | 3 1 | 0 | 3 | 23 1 | 0 | 4 | 7 1 | 1 | 0 | 19 1 | 1 | 2 | 59 1 | 1 | 3 | 158 1 | 1 | 4 | 44 1 | 2 | 0 | 5 1 | 2 | 1 | 59 1 | 2 | 3 | 85 1 | 2 | 4 | 40 1 | 3 | 0 | 21 1 | 3 | 1 | 154 1 | 3 | 2 | 85 1 | 3 | 4 | 52 1 | 4 | 0 | 6 1 | 4 | 1 | 49 1 | 4 | 2 | 42 1 | 4 | 3 | 45 (20 rows) PGConf Silicon Valley @barrettclark
  52. PGConf Silicon Valley @barrettclark

  53. https://flic.kr/p/9tT9Yh PGConf Silicon Valley @barrettclark

  54. REFACTOR PGConf Silicon Valley @barrettclark

  55. CTE Common Table Expression PGConf Silicon Valley @barrettclark

  56. SELECT major, minor, next_minor, COUNT(*) FROM ( SELECT id, phone,

    major, minor, reading_timestamp, LEAD(minor) OVER ( PARTITION BY phone ORDER BY phone, reading_timestamp ) AS next_minor FROM beacon_readings ORDER BY 2, 5 ) AS beacon_readings_lead WHERE major = 1 AND minor != next_minor GROUP BY major, minor, next_minor ORDER BY major, minor, next_minor; PGConf Silicon Valley @barrettclark
  57. WITH beacon_readings_lead AS ( SELECT id, phone, major, minor, reading_timestamp,

    LEAD(minor) OVER ( PARTITION BY phone ORDER BY phone, reading_timestamp ) AS next_minor FROM beacon_readings ORDER BY 2, 5 ) SELECT major, minor, next_minor, COUNT(*) FROM beacon_readings_lead WHERE major = 1 AND minor != next_minor GROUP BY major, minor, next_minor ORDER BY major, minor, next_minor; PGConf Silicon Valley @barrettclark
  58. WITH interval_query AS ( SELECT (ts ||' hour')::INTERVAL AS hour_interval

    FROM generate_series(0,23) AS ts ), time_series AS ( SELECT DATE_TRUNC('hour', NOW()) + INTERVAL '60 min' * ROUND(DATE_PART('minute', NOW()) / 60.0) - interval_query.hour_interval AS start_time FROM interval_query ), time_intervals AS ( SELECT start_time, start_time + '1 hour'::INTERVAL AS end_time FROM time_series ORDER BY start_time ), reading_counts AS ( SELECT f.start_time, f.end_time, br.minor, COUNT(DISTINCT br.phone) readings FROM beacon_readings br RIGHT JOIN time_intervals f ON br.reading_timestamp >= f.start_time AND br.reading_timestamp < f.end_time AND br.major = 1 GROUP BY f.start_time, f.end_time, br.minor ORDER BY f.start_time, br.minor ) SELECT * FROM reading_counts; PGConf Silicon Valley @barrettclark
  59. PGConf Silicon Valley @barrettclark

  60. Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition

    Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition PGConf Silicon Valley @barrettclark
  61. VIEW Stored Query PGConf Silicon Valley @barrettclark

  62. CREATE OR REPLACE VIEW beacon_readings_lead_view AS SELECT id, phone, major,

    minor, reading_timestamp, LEAD(minor) OVER ( PARTITION BY phone ORDER BY phone, reading_timestamp ) AS next_minor FROM beacon_readings ORDER BY 2, 5 ; PGConf Silicon Valley @barrettclark
  63. class CreateLeadView < ActiveRecord::Migration def up execute <<-SQL.strip_heredoc CREATE OR

    REPLACE VIEW beacon_readings_lead_view AS SELECT ID, phone, major, minor, reading_timestamp, LEAD(minor) OVER ( PARTITION BY phone ORDER BY phone, reading_timestamp ) AS next_minor FROM beacon_readings ORDER BY phone, reading_timestamp; SQL end def down sql = "DROP VIEW IF EXISTS beacon_readings_lead_view;" execute(sql) end end PGConf Silicon Valley @barrettclark
  64. SELECT major, minor, next_minor, COUNT(*) FROM beacon_readings_lead_view WHERE major =

    1 AND minor != next_minor GROUP BY major, minor, next_minor ORDER BY major, minor, next_minor; PGConf Silicon Valley @barrettclark
  65. BUSINESS LOGIC IN THE DATABASE?!? PGConf Silicon Valley @barrettclark

  66. YOU Get to decide where your code runs PGConf Silicon

    Valley @barrettclark
  67. https://flic.kr/p/2MW7SY PGConf Silicon Valley @barrettclark

  68. MY QUERY IS TOO SLOW PGConf Silicon Valley @barrettclark

  69. DO THE HEAVY LIFTING IN A SEPARATE PROCESS PGConf Silicon

    Valley @barrettclark
  70. MATERIALIZED VIEW Stored query AND data PGConf Silicon Valley @barrettclark

  71. CREATE MATERIALIZED VIEW beacon_readings_lead_lag_mv AS SELECT * FROM beacon_readings_lead_lag_view WHERE

    reading_timestamp >= NOW() - '1 day'::INTERVAL; CREATE INDEX index_beacon_readings_mv_on_major ON beacon_readings_lead_lag_mv USING btree( major ASC NULLS LAST ); CREATE INDEX index_beacon_readings_mv_on_phone ON beacon_readings_lead_lag_mv USING btree( phone COLLATE "default" ASC NULLS LAST ); CREATE INDEX index_beacon_readings_mv_on_reading_timestamp ON beacon_readings_lead_lag_mv USING btree( reading_timestamp ASC NULLS LAST ); PGConf Silicon Valley @barrettclark
  72. DATA https://flic.kr/p/khS6jn PGConf Silicon Valley @barrettclark

  73. REFRESH MATERIALIZED VIEW CONCURRENTLY beacon_readings_lead_lag_mv; Available starting in 9.4 PGConf

    Silicon Valley @barrettclark
  74. namespace :db do namespace :heroku do # https://devcenter.heroku.com/articles/scheduler desc 'Update

    the materialized view(s)' task :update_materialized_view => :environment do sql = 'REFRESH MATERIALIZED VIEW beacon_readings_lead_lag_mv;' ActiveRecord::Base.connection.execute(sql) end end end PGConf Silicon Valley @barrettclark
  75. https://flic.kr/p/bVUL7 PGConf Silicon Valley @barrettclark

  76. SUBQUERY FOR A FIELD VALUE PGConf Silicon Valley @barrettclark

  77. SELECT id, phone, ( SELECT MIN(id) FROM beacon_readings_lead_lag_mv WHERE major

    = br1.major AND minor = 0 AND phone = br1.phone AND id >= br1.id ) AS session_close_id, minor, next_beacon_minor, reading_timestamp FROM beacon_readings_lead_lag_mv br1 WHERE major = 2 AND minor != next_beacon_minor ORDER BY phone, reading_timestamp; PGConf Silicon Valley @barrettclark
  78. SELECT id, phone, ( SELECT MIN(id) FROM beacon_readings_lead_lag_mv WHERE major

    = br1.major AND minor = 0 AND phone = br1.phone AND id >= br1.id ) AS session_close_id, minor, next_beacon_minor, reading_timestamp FROM beacon_readings_lead_lag_mv br1 WHERE major = 2 AND minor != next_beacon_minor ORDER BY phone, reading_timestamp; PGConf Silicon Valley @barrettclark
  79. PGConf Silicon Valley @barrettclark

  80. OTHER USEFUL THINGS PGConf Silicon Valley @barrettclark

  81. POSTGRES • Install from source • Install via package manager

    • Heroku's Postgres.app • Postgres 9.6.1 • PostGIS 2.3.0 PGConf Silicon Valley @barrettclark
  82. POSTGRES • Database GUI tools • pgAdmin3/pgAdmin4 (free) • Navicat

    (free trials) PGConf Silicon Valley @barrettclark
  83. POSTGRES • Visual query plan • http://explain.depesz.com • http://tatiyants.com/pev PGConf

    Silicon Valley @barrettclark
  84. http://explain.depesz.com PGConf Silicon Valley @barrettclark

  85. http://tatiyants.com/pev PGConf Silicon Valley @barrettclark

  86. CAUTION!!! Different environments are different PGConf Silicon Valley @barrettclark

  87. DATATYPES • Array • DateRange and TSRange • JSON, JSONB

    (9.4) • UUID http://edgeguides.rubyonrails.org/active_record_postgresql.html PGConf Silicon Valley @barrettclark
  88. UPSERT!!! New in Postgres 9.5 PGConf Silicon Valley @barrettclark

  89. PARALLEL SEQUENTIAL SCAN, JOINS, AND AGGREGATES New in Postgres 9.6

    PGConf Silicon Valley @barrettclark
  90. FUTURE YOU WILL THANK YOU PGConf Silicon Valley @barrettclark

  91. RECAP PGConf Silicon Valley @barrettclark

  92. IT'S OK TO WRITE SQL PGConf Silicon Valley @barrettclark

  93. REFACTOR • Common Table Expression (CTE) • View • Materialized

    View PGConf Silicon Valley @barrettclark
  94. I'M NOT SAYING DO EVERYTHING IN THE DATABASE PGConf Silicon

    Valley @barrettclark
  95. POSTGRESQL IS AWESOME PGConf Silicon Valley @barrettclark

  96. PGConf Silicon Valley @barrettclark

  97. THANK YOU! PGConf Silicon Valley @barrettclark