Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Making Data Dance (Abstractions)

Making Data Dance (Abstractions)

Rails and the ActiveRecord gem are really handy tools to get work done, and do it quickly. They aren't a silver bullet, though. Sometimes the database is the best place to run a complicated query. Especially when you have big transactional or time-series data. Postgres has some powerful functionality that can help transform data from a giant pile of stuff into something actionable and useful. I will show how to take advantage of the JSON datatype, Common Table Expression, and materialized views to build impressive data visualizations.

Barrett Clark

August 18, 2016
Tweet

More Decks by Barrett Clark

Other Decks in Programming

Transcript

  1. SELECT categories.* FROM categories INNER JOIN articles ON articles.category_id =

    categories.id INNER JOIN comments ON comments.article_id = articles.id INNER JOIN guests ON guests.comment_id = comments.id INNER JOIN tags ON tags.article_id = articles.id http://guides.rubyonrails.org/active_record_querying.html#using-array-hash-of-named-associations Abstractions @barrettclark Category.joins(articles: [{ comments: :guest }, :tags])
  2. –Postgres Documentation A window function performs a calculation across a

    set of table rows that are somehow related to the current row. Abstractions @barrettclark
  3. WINDOW FUNCTION GREATEST HITS • lead() • lag() • first_value()

    • last_value() • row_number() Abstractions @barrettclark
  4. id fruit 1 apple 2 apple 3 apple 4 pear

    5 apple 6 pear 7 pear 8 pear 9 pear 10 banana Abstractions @barrettclark
  5. SELECT *,LEAD(id) OVER(), LAG(id) OVER() FROM fruits; id | fruit

    | lead | lag ----+--------+------+----- 1 | apple | 2 | 2 | apple | 3 | 1 3 | apple | 4 | 2 4 | pear | 5 | 3 5 | apple | 6 | 4 6 | pear | 7 | 5 7 | pear | 8 | 6 8 | pear | 9 | 7 9 | pear | 10 | 8 10 | banana | | 9 (10 rows) http://sqlfiddle.com/#!15/230c3b/2 Abstractions @barrettclark
  6. –Postgres Documentation A window function performs a calculation across a

    set of table rows that are somehow related to the current row. Abstractions @barrettclark
  7. SELECT id, fruit, FIRST_VALUE(id) OVER(PARTITION BY fruit ORDER BY id)

    FROM fruits; id | fruit | first_value ----+--------+------------- 1 | apple | 1 2 | apple | 1 3 | apple | 1 5 | apple | 1 10 | banana | 10 4 | pear | 4 6 | pear | 4 7 | pear | 4 8 | pear | 4 9 | pear | 4 (10 rows) http://sqlfiddle.com/#!15/230c3b Abstractions @barrettclark
  8. SELECT *, FIRST_VALUE(id) OVER(PARTITION BY fruit ORDER BY id), ROW_NUMBER()

    OVER(PARTITION BY fruit ORDER BY id) FROM fruits; id | fruit | first_value | row_number ----+--------+-------------+------------ 1 | apple | 1 | 1 2 | apple | 1 | 2 3 | apple | 1 | 3 5 | apple | 1 | 4 10 | banana | 10 | 1 4 | pear | 4 | 1 6 | pear | 4 | 2 7 | pear | 4 | 3 8 | pear | 4 | 4 9 | pear | 4 | 5 (10 rows) http://sqlfiddle.com/#!15/230c3b Abstractions @barrettclark
  9. SELECT id, message, room_id, user_id FROM messages ORDER BY id;

    id | message | room_id | user_id ----+-----------------------------+---------+--------- 1 | Leaving for the airport now | 1 | 1337 2 | Flight just landed! | 1 | 1337 3 | waiting on an uber | 2 | 1337 4 | omw y'all! | 1 | 1337 9 | hello, world | 1 | 27 (5 rows) Abstractions @barrettclark
  10. SELECT id, message, room_id, user_id, LAG(user_id) OVER( PARTITION BY room_id

    ORDER BY room_id, id ) AS prev_user_id FROM messages ORDER BY 3, 1; Look back Defines the groups of rows to evaluate against the current row Abstractions @barrettclark
  11. id | message | room_id | user_id | prev_user_id ----+-----------------------------+---------+---------+--------------

    1 | Leaving for the airport now | 1 | 1337 | 2 | Flight just landed! | 1 | 1337 | 1337 4 | omw y'all! | 1 | 1337 | 1337 9 | hello, world | 1 | 27 | 1337 3 | waiting on an uber | 2 | 1337 | (5 rows) SELECT id, message, room_id, user_id, LAG(user_id) OVER( PARTITION BY room_id ORDER BY room_id, id ) AS prev_user_id FROM messages ORDER BY 3, 1; Abstractions @barrettclark
  12. SELECT id, message, room_id, user_id, LAG(user_id) OVER( PARTITION BY room_id

    ORDER BY room_id, id ) AS prev_user_id FROM messages ORDER BY 3, 1; ProTip™ 3 1 2 4 5 Abstractions @barrettclark
  13. class Message < ActiveRecord::Base def self.window_example sql = <<-SQL SELECT

    id, message, room_id, user_id, LAG(user_id) OVER(PARTITION BY room_id ORDER BY room_id, id) AS prev_user_id FROM messages ORDER BY 3, 1; SQL connection.execute(sql) end end Abstractions @barrettclark
  14. module PGConnection def conn config = YAML.load_file(File.open('config/database.yml'))['development'] @conn ||= PG.connect(

    :dbname => config['database'], :user => config['username'], :password => config['password'], :host => config['host'] || 'localhost' ) end end class Antipattern extend PGConnection def self.window_example sql = <<-SQL SELECT id, message, room_id, user_id, LAG(user_id) OVER(PARTITION BY room_id ORDER BY room_id, id) AS prev_user_id FROM messages ORDER BY 3, 1; SQL conn.exec(sql).values end end Abstractions @barrettclark
  15. MODELS DO NOT HAVE TO BE BACKED BY ACTIVERECORD Or

    even persisted in the database. Abstractions @barrettclark
  16. QUESTIONS WE CAN NOW ANSWER • When did something change?

    • When did someone leave a place? • How long did each thing last? • How long did people stay in which places? • Enter / Exit / Change events Abstractions @barrettclark
  17. SUBQUERY • To filter and group • For a field

    value • In a join Abstractions @barrettclark
  18. SELECT id, phone, major, minor, reading_timestamp, LEAD(minor) OVER ( PARTITION

    BY phone ORDER BY phone, reading_timestamp ) AS next_minor FROM beacon_readings ORDER BY 2, 5; “For a given phone, pull in the next minor field” Abstractions @barrettclark
  19. SELECT major, minor, next_minor, COUNT(*) FROM ( SELECT id, phone,

    major, minor, reading_timestamp, LEAD(minor) OVER ( PARTITION BY phone ORDER BY phone, reading_timestamp ) AS next_minor FROM beacon_readings ORDER BY 2, 5 ) AS beacon_readings_lead WHERE major = 1 AND minor != next_minor GROUP BY major, minor, next_minor ORDER BY major, minor, next_minor; Abstractions @barrettclark
  20. SELECT major, minor, next_minor, COUNT(*) FROM ( SELECT id, phone,

    major, minor, reading_timestamp, LEAD(minor) OVER ( PARTITION BY phone ORDER BY phone, reading_timestamp ) AS next_minor FROM beacon_readings ORDER BY 2, 5 ) AS beacon_readings_lead WHERE major = 1 AND minor != next_minor GROUP BY major, minor, next_minor ORDER BY major, minor, next_minor; Abstractions @barrettclark
  21. major | minor | next_minor | count -------+-------+------------+------- 1 |

    0 | 1 | 18 1 | 0 | 2 | 3 1 | 0 | 3 | 23 1 | 0 | 4 | 7 1 | 1 | 0 | 19 1 | 1 | 2 | 59 1 | 1 | 3 | 158 1 | 1 | 4 | 44 1 | 2 | 0 | 5 1 | 2 | 1 | 59 1 | 2 | 3 | 85 1 | 2 | 4 | 40 1 | 3 | 0 | 21 1 | 3 | 1 | 154 1 | 3 | 2 | 85 1 | 3 | 4 | 52 1 | 4 | 0 | 6 1 | 4 | 1 | 49 1 | 4 | 2 | 42 1 | 4 | 3 | 45 (20 rows) Abstractions @barrettclark
  22. SELECT major, minor, next_minor, COUNT(*) FROM ( SELECT id, phone,

    major, minor, reading_timestamp, LEAD(minor) OVER ( PARTITION BY phone ORDER BY phone, reading_timestamp ) AS next_minor FROM beacon_readings ORDER BY 2, 5 ) AS beacon_readings_lead WHERE major = 1 AND minor != next_minor GROUP BY major, minor, next_minor ORDER BY major, minor, next_minor; Abstractions @barrettclark
  23. WITH beacon_readings_lead AS ( SELECT id, phone, major, minor, reading_timestamp,

    LEAD(minor) OVER ( PARTITION BY phone ORDER BY phone, reading_timestamp ) AS next_minor FROM beacon_readings ORDER BY 2, 5 ) SELECT major, minor, next_minor, COUNT(*) FROM beacon_readings_lead WHERE major = 1 AND minor != next_minor GROUP BY major, minor, next_minor ORDER BY major, minor, next_minor; Abstractions @barrettclark
  24. WITH interval_query AS ( SELECT (ts ||' hour')::INTERVAL AS hour_interval

    FROM generate_series(0,23) AS ts ), time_series AS ( SELECT DATE_TRUNC('hour', NOW()) + INTERVAL '60 min' * ROUND(DATE_PART('minute', NOW()) / 60.0) - interval_query.hour_interval AS start_time FROM interval_query ), time_intervals AS ( SELECT start_time, start_time + '1 hour'::INTERVAL AS end_time FROM time_series ORDER BY start_time ), reading_counts AS ( SELECT f.start_time, f.end_time, br.minor, COUNT(DISTINCT br.phone) readings FROM beacon_readings br RIGHT JOIN time_intervals f ON br.reading_timestamp >= f.start_time AND br.reading_timestamp < f.end_time AND br.major = 1 GROUP BY f.start_time, f.end_time, br.minor ORDER BY f.start_time, br.minor ) SELECT * FROM reading_counts; Abstractions @barrettclark
  25. Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition

    Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Abstractions @barrettclark
  26. CREATE OR REPLACE VIEW beacon_readings_lead_view AS SELECT id, phone, major,

    minor, reading_timestamp, LEAD(minor) OVER ( PARTITION BY phone ORDER BY phone, reading_timestamp ) AS next_minor FROM beacon_readings ORDER BY 2, 5 ; Abstractions @barrettclark
  27. class CreateLeadView < ActiveRecord::Migration def up execute <<-SQL.strip_heredoc CREATE OR

    REPLACE VIEW beacon_readings_lead_view AS SELECT ID, phone, major, minor, reading_timestamp, LEAD(minor) OVER ( PARTITION BY phone ORDER BY phone, reading_timestamp ) AS next_minor FROM beacon_readings ORDER BY phone, reading_timestamp; SQL end def down sql = "DROP VIEW IF EXISTS beacon_readings_lead_view;" execute(sql) end end Abstractions @barrettclark
  28. SELECT major, minor, next_minor, COUNT(*) FROM beacon_readings_lead_view WHERE major =

    1 AND minor != next_minor GROUP BY major, minor, next_minor ORDER BY major, minor, next_minor; Abstractions @barrettclark
  29. CREATE MATERIALIZED VIEW beacon_readings_lead_lag_mv AS SELECT * FROM beacon_readings_lead_lag_view WHERE

    reading_timestamp >= NOW() - '1 day'::INTERVAL; CREATE INDEX index_beacon_readings_mv_on_major ON beacon_readings_lead_lag_mv USING btree( major ASC NULLS LAST ); CREATE INDEX index_beacon_readings_mv_on_phone ON beacon_readings_lead_lag_mv USING btree( phone COLLATE "default" ASC NULLS LAST ); CREATE INDEX index_beacon_readings_mv_on_reading_timestamp ON beacon_readings_lead_lag_mv USING btree( reading_timestamp ASC NULLS LAST ); Abstractions @barrettclark
  30. namespace :db do namespace :heroku do # https://devcenter.heroku.com/articles/scheduler desc 'Update

    the materialized view(s)' task :update_materialized_view => :environment do sql = 'REFRESH MATERIALIZED VIEW beacon_readings_lead_lag_mv;' ActiveRecord::Base.connection.execute(sql) end end end Abstractions @barrettclark
  31. SELECT id, phone, ( SELECT MIN(id) FROM beacon_readings_lead_lag_mv WHERE major

    = br1.major AND minor = 0 AND phone = br1.phone AND id >= br1.id ) AS session_close_id, minor, next_beacon_minor, reading_timestamp FROM beacon_readings_lead_lag_mv br1 WHERE major = 2 AND minor != next_beacon_minor ORDER BY phone, reading_timestamp; Abstractions @barrettclark
  32. SELECT id, phone, ( SELECT MIN(id) FROM beacon_readings_lead_lag_mv WHERE major

    = br1.major AND minor = 0 AND phone = br1.phone AND id >= br1.id ) AS session_close_id, minor, next_beacon_minor, reading_timestamp FROM beacon_readings_lead_lag_mv br1 WHERE major = 2 AND minor != next_beacon_minor ORDER BY phone, reading_timestamp; Abstractions @barrettclark
  33. POSTGRES • Install from source • Install via package manager

    • Heroku's Postgres.app • Postgres 9.5.4 • PostGIS 2.2.2 Abstractions @barrettclark
  34. POSTGRES • Database GUI tools • pgAdmin3 (free) • Navicat

    (free trials) Abstractions @barrettclark
  35. DATATYPES • Array • DateRange and TSRange • JSON, JSONB

    (9.4) • UUID http://edgeguides.rubyonrails.org/active_record_postgresql.html Abstractions @barrettclark