$30 off During Our Annual Pro Sale. View Details »

Making Data Dance (Abstractions)

Making Data Dance (Abstractions)

Rails and the ActiveRecord gem are really handy tools to get work done, and do it quickly. They aren't a silver bullet, though. Sometimes the database is the best place to run a complicated query. Especially when you have big transactional or time-series data. Postgres has some powerful functionality that can help transform data from a giant pile of stuff into something actionable and useful. I will show how to take advantage of the JSON datatype, Common Table Expression, and materialized views to build impressive data visualizations.

Barrett Clark

August 18, 2016
Tweet

More Decks by Barrett Clark

Other Decks in Programming

Transcript

  1. MAKING DATA DANCE
    Abstractions @barrettclark

    View Slide

  2. HARNESS HERETICAL
    ANTIPATTERNS
    FOR FUN AND PROFIT
    Abstractions @barrettclark

    View Slide

  3. BARRETT CLARK
    @barrettclark
    Abstractions @barrettclark

    View Slide

  4. Abstractions @barrettclark

    View Slide

  5. Abstractions @barrettclark
    Class (Model)
    Object Relational
    Mapper
    Database
    M
    agic

    View Slide

  6. DATABASE AGNOSTIC
    Abstractions @barrettclark

    View Slide

  7. ActiveRecord
    Object Relational Mapper
    Abstractions @barrettclark

    View Slide

  8. ActiveRecord
    Abstractions @barrettclark

    View Slide

  9. RAW SQL
    Abstractions @barrettclark

    View Slide

  10. COMPLICATED JOINS
    Abstractions @barrettclark

    View Slide

  11. Category.joins(articles: [{ comments: :guest }, :tags])
    http://guides.rubyonrails.org/active_record_querying.html#using-array-hash-of-named-associations
    Abstractions @barrettclark
    ???

    View Slide

  12. SELECT categories.* FROM categories
    INNER JOIN articles ON articles.category_id = categories.id
    INNER JOIN comments ON comments.article_id = articles.id
    INNER JOIN guests ON guests.comment_id = comments.id
    INNER JOIN tags ON tags.article_id = articles.id
    http://guides.rubyonrails.org/active_record_querying.html#using-array-hash-of-named-associations
    Abstractions @barrettclark
    Category.joins(articles: [{ comments: :guest }, :tags])

    View Slide

  13. DEEPER DATABASE
    FUNCTIONALITY
    Abstractions @barrettclark

    View Slide

  14. BUT DATABASE
    PORTABILITY?!
    Abstractions @barrettclark

    View Slide

  15. https://flic.kr/p/eg4hih
    Abstractions @barrettclark

    View Slide

  16. It just doesn't work like that.
    Database portability pipe dream
    Abstractions @barrettclark

    View Slide

  17. Different databases are different.
    Abstractions @barrettclark

    View Slide

  18. You chose your database for a reason.
    Use it!
    Abstractions @barrettclark

    View Slide

  19. CAVEATS
    • Long-running queries
    • Connection pool
    Abstractions @barrettclark

    View Slide

  20. LET'S GET OUR HANDS DIRTY
    Abstractions @barrettclark

    View Slide

  21. Abstractions @barrettclark

    View Slide

  22. WINDOW FUNCTIONS
    Abstractions @barrettclark

    View Slide

  23. –Postgres Documentation
    A window function performs a calculation across a
    set of table rows that are somehow related to the
    current row.
    Abstractions @barrettclark

    View Slide

  24. –Barrett Clark
    Fold data from other rows into this row
    Abstractions @barrettclark

    View Slide

  25. WINDOW FUNCTION GREATEST HITS
    • lead()
    • lag()
    • first_value()
    • last_value()
    • row_number()
    Abstractions @barrettclark

    View Slide

  26. id fruit
    1 apple
    2 apple
    3 apple
    4 pear
    5 apple
    6 pear
    7 pear
    8 pear
    9 pear
    10 banana
    Abstractions @barrettclark

    View Slide

  27. SELECT *,LEAD(id) OVER(), LAG(id) OVER()
    FROM fruits;
    id | fruit | lead | lag
    ----+--------+------+-----
    1 | apple | 2 |
    2 | apple | 3 | 1
    3 | apple | 4 | 2
    4 | pear | 5 | 3
    5 | apple | 6 | 4
    6 | pear | 7 | 5
    7 | pear | 8 | 6
    8 | pear | 9 | 7
    9 | pear | 10 | 8
    10 | banana | | 9
    (10 rows)
    http://sqlfiddle.com/#!15/230c3b/2
    Abstractions @barrettclark

    View Slide

  28. –Postgres Documentation
    A window function performs a calculation across a
    set of table rows that are somehow related to the
    current row.
    Abstractions @barrettclark

    View Slide

  29. Abstractions @barrettclark
    https://flic.kr/p/7KGqEY

    View Slide

  30. SELECT id, fruit,
    FIRST_VALUE(id) OVER(PARTITION BY fruit ORDER BY id)
    FROM fruits;
    id | fruit | first_value
    ----+--------+-------------
    1 | apple | 1
    2 | apple | 1
    3 | apple | 1
    5 | apple | 1
    10 | banana | 10
    4 | pear | 4
    6 | pear | 4
    7 | pear | 4
    8 | pear | 4
    9 | pear | 4
    (10 rows)
    http://sqlfiddle.com/#!15/230c3b
    Abstractions @barrettclark

    View Slide

  31. SELECT *,
    FIRST_VALUE(id) OVER(PARTITION BY fruit ORDER BY id),
    ROW_NUMBER() OVER(PARTITION BY fruit ORDER BY id)
    FROM fruits;
    id | fruit | first_value | row_number
    ----+--------+-------------+------------
    1 | apple | 1 | 1
    2 | apple | 1 | 2
    3 | apple | 1 | 3
    5 | apple | 1 | 4
    10 | banana | 10 | 1
    4 | pear | 4 | 1
    6 | pear | 4 | 2
    7 | pear | 4 | 3
    8 | pear | 4 | 4
    9 | pear | 4 | 5
    (10 rows)
    http://sqlfiddle.com/#!15/230c3b
    Abstractions @barrettclark

    View Slide

  32. PRACTICAL EXAMPLE
    Chat App
    Abstractions @barrettclark

    View Slide

  33. SELECT id, message, room_id, user_id
    FROM messages
    ORDER BY id;
    id | message | room_id | user_id
    ----+-----------------------------+---------+---------
    1 | Leaving for the airport now | 1 | 1337
    2 | Flight just landed! | 1 | 1337
    3 | waiting on an uber | 2 | 1337
    4 | omw y'all! | 1 | 1337
    9 | hello, world | 1 | 27
    (5 rows)
    Abstractions @barrettclark

    View Slide

  34. SELECT id, message, room_id, user_id,
    LAG(user_id)
    OVER(
    PARTITION BY room_id
    ORDER BY room_id, id
    ) AS prev_user_id
    FROM messages
    ORDER BY 3, 1;
    Look back
    Defines the groups of rows to evaluate
    against the current row
    Abstractions @barrettclark

    View Slide

  35. id | message | room_id | user_id | prev_user_id
    ----+-----------------------------+---------+---------+--------------
    1 | Leaving for the airport now | 1 | 1337 |
    2 | Flight just landed! | 1 | 1337 | 1337
    4 | omw y'all! | 1 | 1337 | 1337
    9 | hello, world | 1 | 27 | 1337
    3 | waiting on an uber | 2 | 1337 |
    (5 rows)
    SELECT id, message, room_id, user_id,
    LAG(user_id)
    OVER(
    PARTITION BY room_id
    ORDER BY room_id, id
    ) AS prev_user_id
    FROM messages
    ORDER BY 3, 1;
    Abstractions @barrettclark

    View Slide

  36. SELECT id, message, room_id, user_id,
    LAG(user_id)
    OVER(
    PARTITION BY room_id
    ORDER BY room_id, id
    ) AS prev_user_id
    FROM messages
    ORDER BY 3, 1;
    ProTip™
    3
    1 2 4 5
    Abstractions @barrettclark

    View Slide

  37. class Message < ActiveRecord::Base
    def self.window_example
    sql = <<-SQL
    SELECT id, message, room_id, user_id,
    LAG(user_id)
    OVER(PARTITION BY room_id ORDER BY room_id, id)
    AS prev_user_id
    FROM messages
    ORDER BY 3, 1;
    SQL
    connection.execute(sql)
    end
    end
    Abstractions @barrettclark

    View Slide

  38. TOO ANTI-PATTERN FOR YOU?
    Abstractions @barrettclark

    View Slide

  39. module PGConnection
    def conn
    config = YAML.load_file(File.open('config/database.yml'))['development']
    @conn ||= PG.connect(
    :dbname => config['database'],
    :user => config['username'],
    :password => config['password'],
    :host => config['host'] || 'localhost'
    )
    end
    end
    class Antipattern
    extend PGConnection
    def self.window_example
    sql = <<-SQL
    SELECT id, message, room_id, user_id,
    LAG(user_id) OVER(PARTITION BY room_id ORDER BY room_id, id) AS prev_user_id
    FROM messages
    ORDER BY 3, 1;
    SQL
    conn.exec(sql).values
    end
    end
    Abstractions @barrettclark

    View Slide

  40. MODELS DO NOT HAVE TO BE
    BACKED BY ACTIVERECORD
    Or even persisted in the database.
    Abstractions @barrettclark

    View Slide

  41. QUESTIONS WE CAN NOW ANSWER
    • When did something change?
    • When did someone leave a place?
    • How long did each thing last?
    • How long did people stay in which places?
    • Enter / Exit / Change events
    Abstractions @barrettclark

    View Slide

  42. Abstractions @barrettclark
    https://flic.kr/p/5b8cq3

    View Slide

  43. SUBQUERIES
    Abstractions @barrettclark

    View Slide

  44. SUBQUERY
    • To filter and group
    • For a field value
    • In a join
    Abstractions @barrettclark

    View Slide

  45. BASE QUERY
    Abstractions @barrettclark

    View Slide

  46. SELECT id, phone, major, minor, reading_timestamp,
    LEAD(minor) OVER (
    PARTITION BY phone
    ORDER BY phone, reading_timestamp
    ) AS next_minor
    FROM beacon_readings
    ORDER BY 2, 5;
    “For a given phone, pull in the next minor field”
    Abstractions @barrettclark

    View Slide

  47. WHEN DID SOMETHING CHANGE?
    Abstractions @barrettclark

    View Slide

  48. SELECT major, minor, next_minor, COUNT(*)
    FROM (
    SELECT id, phone, major, minor, reading_timestamp,
    LEAD(minor) OVER (
    PARTITION BY phone
    ORDER BY phone, reading_timestamp
    ) AS next_minor
    FROM beacon_readings
    ORDER BY 2, 5
    ) AS beacon_readings_lead
    WHERE major = 1
    AND minor != next_minor
    GROUP BY major, minor, next_minor
    ORDER BY major, minor, next_minor;
    Abstractions @barrettclark

    View Slide

  49. YO DAWG, I HEARD YOU LIKE QUERIES
    Abstractions @barrettclark

    View Slide

  50. SELECT major, minor, next_minor, COUNT(*)
    FROM (
    SELECT id, phone, major, minor, reading_timestamp,
    LEAD(minor) OVER (
    PARTITION BY phone
    ORDER BY phone, reading_timestamp
    ) AS next_minor
    FROM beacon_readings
    ORDER BY 2, 5
    ) AS beacon_readings_lead
    WHERE major = 1
    AND minor != next_minor
    GROUP BY major, minor, next_minor
    ORDER BY major, minor, next_minor;
    Abstractions @barrettclark

    View Slide

  51. major | minor | next_minor | count
    -------+-------+------------+-------
    1 | 0 | 1 | 18
    1 | 0 | 2 | 3
    1 | 0 | 3 | 23
    1 | 0 | 4 | 7
    1 | 1 | 0 | 19
    1 | 1 | 2 | 59
    1 | 1 | 3 | 158
    1 | 1 | 4 | 44
    1 | 2 | 0 | 5
    1 | 2 | 1 | 59
    1 | 2 | 3 | 85
    1 | 2 | 4 | 40
    1 | 3 | 0 | 21
    1 | 3 | 1 | 154
    1 | 3 | 2 | 85
    1 | 3 | 4 | 52
    1 | 4 | 0 | 6
    1 | 4 | 1 | 49
    1 | 4 | 2 | 42
    1 | 4 | 3 | 45
    (20 rows)
    Abstractions @barrettclark

    View Slide

  52. Abstractions @barrettclark

    View Slide

  53. https://flic.kr/p/9tT9Yh
    Abstractions @barrettclark

    View Slide

  54. REFACTOR
    Abstractions @barrettclark

    View Slide

  55. CTE
    Common Table Expression
    Abstractions @barrettclark

    View Slide

  56. SELECT major, minor, next_minor, COUNT(*)
    FROM (
    SELECT id, phone, major, minor, reading_timestamp,
    LEAD(minor) OVER (
    PARTITION BY phone
    ORDER BY phone, reading_timestamp
    ) AS next_minor
    FROM beacon_readings
    ORDER BY 2, 5
    ) AS beacon_readings_lead
    WHERE major = 1
    AND minor != next_minor
    GROUP BY major, minor, next_minor
    ORDER BY major, minor, next_minor;
    Abstractions @barrettclark

    View Slide

  57. WITH beacon_readings_lead AS (
    SELECT id, phone, major, minor, reading_timestamp,
    LEAD(minor) OVER (
    PARTITION BY phone
    ORDER BY phone, reading_timestamp
    ) AS next_minor
    FROM beacon_readings
    ORDER BY 2, 5
    )
    SELECT major, minor, next_minor, COUNT(*)
    FROM beacon_readings_lead
    WHERE major = 1
    AND minor != next_minor
    GROUP BY major, minor, next_minor
    ORDER BY major, minor, next_minor;
    Abstractions @barrettclark

    View Slide

  58. WITH interval_query AS (
    SELECT (ts ||' hour')::INTERVAL AS hour_interval
    FROM generate_series(0,23) AS ts
    ), time_series AS (
    SELECT DATE_TRUNC('hour', NOW()) +
    INTERVAL '60 min' * ROUND(DATE_PART('minute', NOW()) / 60.0) -
    interval_query.hour_interval AS start_time
    FROM interval_query
    ), time_intervals AS (
    SELECT start_time, start_time + '1 hour'::INTERVAL AS end_time
    FROM time_series ORDER BY start_time
    ), reading_counts AS (
    SELECT f.start_time, f.end_time, br.minor,
    COUNT(DISTINCT br.phone) readings
    FROM beacon_readings br
    RIGHT JOIN time_intervals f
    ON br.reading_timestamp >= f.start_time
    AND br.reading_timestamp < f.end_time
    AND br.major = 1
    GROUP BY f.start_time, f.end_time, br.minor
    ORDER BY f.start_time, br.minor
    )
    SELECT * FROM reading_counts;
    Abstractions @barrettclark

    View Slide

  59. Abstractions @barrettclark

    View Slide

  60. Repetition Repetition Repetition Repetition Repetition Repetition Repetition
    Repetition Repetition Repetition Repetition
    Repetition Repetition Repetition
    Repetition Repetition Repetition Repetition Repetition Repetition Repetition
    Repetition Repetition Repetition Repetition
    Repetition Repetition Repetition
    Repetition Repetition Repetition Repetition Repetition Repetition Repetition
    Repetition Repetition Repetition Repetition
    Repetition Repetition Repetition
    Repetition Repetition Repetition Repetition Repetition Repetition Repetition
    Repetition Repetition Repetition Repetition
    Repetition Repetition Repetition
    Repetition Repetition Repetition Repetition Repetition Repetition Repetition
    Repetition Repetition Repetition Repetition
    Repetition Repetition Repetition
    Repetition Repetition Repetition Repetition Repetition Repetition Repetition
    Repetition Repetition Repetition Repetition
    Repetition Repetition Repetition
    Repetition Repetition Repetition Repetition
    Repetition Repetition Repetition
    Abstractions @barrettclark

    View Slide

  61. VIEW
    Stored Query
    Abstractions @barrettclark

    View Slide

  62. CREATE OR REPLACE VIEW beacon_readings_lead_view
    AS
    SELECT id, phone, major, minor, reading_timestamp,
    LEAD(minor) OVER (
    PARTITION BY phone
    ORDER BY phone, reading_timestamp
    ) AS next_minor
    FROM beacon_readings
    ORDER BY 2, 5
    ;
    Abstractions @barrettclark

    View Slide

  63. class CreateLeadView < ActiveRecord::Migration
    def up
    execute <<-SQL.strip_heredoc
    CREATE OR REPLACE VIEW beacon_readings_lead_view
    AS
    SELECT
    ID, phone, major, minor, reading_timestamp,
    LEAD(minor) OVER (
    PARTITION BY phone
    ORDER BY phone, reading_timestamp
    ) AS next_minor
    FROM beacon_readings
    ORDER BY phone, reading_timestamp;
    SQL
    end
    def down
    sql = "DROP VIEW IF EXISTS beacon_readings_lead_view;"
    execute(sql)
    end
    end
    Abstractions @barrettclark

    View Slide

  64. SELECT major, minor, next_minor, COUNT(*)
    FROM beacon_readings_lead_view
    WHERE major = 1
    AND minor != next_minor
    GROUP BY major, minor, next_minor
    ORDER BY major, minor, next_minor;
    Abstractions @barrettclark

    View Slide

  65. BUSINESS LOGIC
    IN THE DATABASE?!?

    Abstractions @barrettclark

    View Slide

  66. YOU
    Get to decide where your code runs
    Abstractions @barrettclark

    View Slide

  67. https://flic.kr/p/2MW7SY
    Abstractions @barrettclark

    View Slide

  68. MY QUERY IS TOO SLOW
    Abstractions @barrettclark

    View Slide

  69. DO THE HEAVY LIFTING
    IN A SEPARATE PROCESS
    Abstractions @barrettclark

    View Slide

  70. MATERIALIZED VIEW
    Stored query AND data
    Abstractions @barrettclark

    View Slide

  71. CREATE MATERIALIZED VIEW beacon_readings_lead_lag_mv AS
    SELECT *
    FROM beacon_readings_lead_lag_view
    WHERE reading_timestamp >= NOW() - '1 day'::INTERVAL;
    CREATE INDEX index_beacon_readings_mv_on_major
    ON beacon_readings_lead_lag_mv USING btree(
    major ASC NULLS LAST
    );
    CREATE INDEX index_beacon_readings_mv_on_phone
    ON beacon_readings_lead_lag_mv USING btree(
    phone COLLATE "default" ASC NULLS LAST
    );
    CREATE INDEX index_beacon_readings_mv_on_reading_timestamp
    ON beacon_readings_lead_lag_mv USING btree(
    reading_timestamp ASC NULLS LAST
    );
    Abstractions @barrettclark

    View Slide

  72. DATA
    Abstractions @barrettclark
    https://flic.kr/p/khS6jn

    View Slide

  73. REFRESH MATERIALIZED VIEW CONCURRENTLY
    beacon_readings_lead_lag_mv;
    Available starting in 9.4
    Abstractions @barrettclark

    View Slide

  74. namespace :db do
    namespace :heroku do
    # https://devcenter.heroku.com/articles/scheduler
    desc 'Update the materialized view(s)'
    task :update_materialized_view => :environment do
    sql = 'REFRESH MATERIALIZED VIEW beacon_readings_lead_lag_mv;'
    ActiveRecord::Base.connection.execute(sql)
    end
    end
    end
    Abstractions @barrettclark

    View Slide

  75. Abstractions @barrettclark
    https://flic.kr/p/bVUL7

    View Slide

  76. SUBQUERY FOR A
    FIELD VALUE
    Abstractions @barrettclark

    View Slide

  77. SELECT
    id,
    phone,
    ( SELECT MIN(id)
    FROM beacon_readings_lead_lag_mv
    WHERE
    major = br1.major
    AND minor = 0
    AND phone = br1.phone
    AND id >= br1.id
    ) AS session_close_id,
    minor,
    next_beacon_minor,
    reading_timestamp
    FROM beacon_readings_lead_lag_mv br1
    WHERE major = 2
    AND minor != next_beacon_minor
    ORDER BY phone, reading_timestamp;
    Abstractions @barrettclark

    View Slide

  78. SELECT
    id,
    phone,
    ( SELECT MIN(id)
    FROM beacon_readings_lead_lag_mv
    WHERE
    major = br1.major
    AND minor = 0
    AND phone = br1.phone
    AND id >= br1.id
    ) AS session_close_id,
    minor,
    next_beacon_minor,
    reading_timestamp
    FROM beacon_readings_lead_lag_mv br1
    WHERE major = 2
    AND minor != next_beacon_minor
    ORDER BY phone, reading_timestamp;
    Abstractions @barrettclark

    View Slide

  79. Abstractions @barrettclark

    View Slide

  80. OTHER USEFUL THINGS
    Abstractions @barrettclark

    View Slide

  81. POSTGRES
    • Install from source
    • Install via package manager
    • Heroku's Postgres.app
    • Postgres 9.5.4
    • PostGIS 2.2.2
    Abstractions @barrettclark

    View Slide

  82. POSTGRES
    • Database GUI tools
    • pgAdmin3 (free)
    • Navicat (free trials)
    Abstractions @barrettclark

    View Slide

  83. POSTGRES
    • Visual query plan
    • http://explain.depesz.com/
    • http://tatiyants.com/pev
    Abstractions @barrettclark

    View Slide

  84. Abstractions @barrettclark
    http://explain.depesz.com

    View Slide

  85. Abstractions @barrettclark
    http://tatiyants.com/pev

    View Slide

  86. CAUTION!!!
    Different environments are different
    Abstractions @barrettclark

    View Slide

  87. DATATYPES
    • Array
    • DateRange and TSRange
    • JSON, JSONB (9.4)
    • UUID
    http://edgeguides.rubyonrails.org/active_record_postgresql.html
    Abstractions @barrettclark

    View Slide

  88. UPSERT!!!
    New in Postgres 9.5

    View Slide

  89. PARALLEL SEQUENTIAL SCAN,
    JOINS, AND AGGREGATES
    New in Postgres 9.6

    View Slide

  90. FUTURE YOU
    WILL THANK
    YOU
    Abstractions @barrettclark

    View Slide

  91. RECAP
    Abstractions @barrettclark

    View Slide

  92. IT'S OK TO WRITE SQL
    Abstractions @barrettclark

    View Slide

  93. REFACTOR
    • Common Table Expression (CTE)
    • View
    • Materialized View
    Abstractions @barrettclark

    View Slide

  94. I'M NOT SAYING
    DO EVERYTHING IN THE DATABASE
    Abstractions @barrettclark

    View Slide

  95. POSTGRESQL IS AWESOME
    Abstractions @barrettclark

    View Slide

  96. Abstractions @barrettclark

    View Slide

  97. THANK YOU!
    Abstractions @barrettclark

    View Slide