Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Tips and tricks for efficient SQL querying

Gousto Tech
February 16, 2017

Tips and tricks for efficient SQL querying

Internal session for all our data scientists/analysts and others using Redshift.

Gousto Tech

February 16, 2017
Tweet

More Decks by Gousto Tech

Other Decks in Technology

Transcript

  1. ERROR: Out of memory Detail: ---------------------------------- error: Out of memory

    code: 1004 context: alloc(134217728,35) query: 29949080 location: alloc.cpp:817 process: query1_28 [pid=19292] ----------------------------------
  2. 1. Use CTEs 2. Reduce your dataset (CTE) 3. Avoid

    joins 4. Avoid table scans 5. Use window funcBons SQL commandments
  3. Why is that important? 1. Imagine matching words in two

    (massive) books 2. RedshiH is a distributed columnar database Node 1 Slice 1 Slice 2 Node 2 Slice 3 Slice 4 Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur erat enim, consequat at feugiat non, Cncidunt interdum enim. Phasellus tellus tortor, maximus et condimentum eu, cursus sit amet diam. Mauris semper faucibus ipsum, a maGs diam scelerisque eget. In et dolor sit amet diam Cncidunt eleifend. Sed lacus nisl, imperdiet sed loborCs eu, aliquet tempor quam. Aenean trisCque arcu sed nulla vesCbulum iaculis. Integer pharetra lorem ultricies sem semper, id bibendum tellus consequat. In vulputate feugiat maGs. Duis viverra vehicula maGs. Donec pharetra nibh at aliquam congue. Quisque sit amet sapien nec nulla vehicula ornare id sit amet nibh. Proin a turpis risus. VesCbulum nec purus tellus. Aliquam euismod fringilla congue. ECam ultricies mauris in blandit feugiat. Sed ut consequat ligula. Donec sagiGs lorem eget mi loborCs, ut posuere ex dapibus. Vivamus volutpat nec tortor ultricies placerat. Aliquam placerat urna ipsum, sit amet consectetur mi trisCque ut. Mauris sodales nisl ut est ornare, in sodales arcu placerat. Nulla facilisi. Vivamus lacinia eleifend enim at placerat. Morbi a nisl a justo scelerisque Cncidunt at vitae nunc. Donec maGs preCum lorem, at aliquam metus ultrices non. Sed arcu justo, dapibus consequat fringilla sagiGs, hendrerit sed odio. Nulla est nunc, interdum nec Cncidunt a, ullamcorper at dui. VesCbulum tellus orci, gravida sodales ligula quis, ullamcorper fermentum nunc. Phasellus loborCs vel orci aliquet malesuada. Sed semper bibendum erat, in suscipit nulla malesuada in. Sed Cncidunt posuere malesuada. Curabitur ut pharetra justo. ECam porGtor finibus nunc, et ullamcorper metus molesCe sit amet. Cras malesuada fermentum odio, in varius libero porta ac. Vivamus laoreet odio eget finibus ultricies. Curabitur cursus convallis lacus, blandit luctus lacus egestas vel. Cras vitae feugiat ante, nec Cncidunt quam. Nunc sollicitudin blandit nisi vitae vesCbulum. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus gravida tellus vitae fermentum posuere. In efficitur in velit sed consectetur. Praesent ultrices leo a ligula finibus fringilla. Curabitur ultricies congue massa, sed iaculis tellus tempor sed. Nullam sed sollicitudin est. Fusce consectetur dapibus tortor, blandit aliquet ex luctus ac. Pellentesque vitae ligula id enim faucibus volutpat et at est. In elementum id enim sed Cncidunt. Curabitur nisi leo, egestas vel magna ac, ornare consequat dolor. Aliquam sed placerat dui, sed ullamcorper purus. Quisque cursus dapibus mi eget fermentum. Morbi ac sem at dolor pulvinar sollicitudin nec in nulla. Nam porGtor laoreet tempus. VesCbulum sed iaculis neque. Nunc porGtor nibh ligula, et gravida velit viverra trisCque. Aenean urna felis, consectetur et loborCs id, euismod sit amet augue. Donec luctus tortor non dui congue varius ut id quam. Sed auctor congue sem a venenaCs. Vivamus eleifend neque nibh, eu sollicitudin nunc eleifend facilisis. Phasellus ex nulla, scelerisque loborCs ligula quis, placerat venenaCs felis. Donec elementum iaculis lectus, ac pharetra sem faucibus id. Nulla facilisi. Fusce eu eros dui. Curabitur malesuada urna a leo loborCs, nec dictum nibh loborCs. Fusce varius risus diam, et aliquam ante ullamcorper non. Duis sed metus interdum, porGtor urna vel, luctus mi. Suspendisse scelerisque leo sagiGs Cncidunt vehicula. Praesent convallis commodo imperdiet. Maecenas finibus, leo ut tempor consectetur, mi felis vesCbulum urna, nec vulputate augue dui non ligula. Suspendisse laoreet ut nisi ut Cncidunt. Praesent eleifend auctor scelerisque. Nam non nisi ac lectus dapibus commodo. Suspendisse non odio convallis, dapibus sem vel, feugiat ante. Nam mauris urna, dapibus ac malesuada eu, facilisis semper neque. Proin urna mi, tempus in congue ac, laoreet a tellus. Sed aliquet vitae lacus
  4. 1. Use CTEs WITH order_events AS ( SELECT DISTINCT id,

    user_id, updated_at, MIN(root_tstamp) OVER (…) root_tstamp FROM snowplow_atomic.uk_co_gousto_order_1 ), events_pre AS ( SELECT event_id, page_url, useragent FROM snowplow_atomic.events WHERE useragent NOT ILIKE '%pingdom%' AND useragent NOT ILIKE 'App/%' AND useragent NOT ILIKE 'Symfony/%' ) CTE stand for Common Table Expressions. The main reasons to use are: • reusability and • readability. CTEs are actually not performing beOer than sub-queries, but the conceptual benefits outweigh the performance drawbacks. If CTEs are big and/or used numerous Bmes, it might be beOer to materialise them, i.e. write ETL that stores the data physically.
  5. 2. Reduce your dataset WITH order_items AS ( SELECT *

    FROM trans.orders o JOIN trans.order_items oi ON oi.order_id = o.id ) SELECT u.id, oi.order_id, COUNT(1) total_items FROM order_items oi JOIN trans.users u ON u.id = oi.user_id WHERE oi.order_state_id = 2 AND oi.period_id = 154 GROUP BY 1, 2 What’s wrong with that query?
  6. 2. Reduce your dataset WITH order_items AS ( SELECT *

    FROM trans.orders o JOIN trans.order_items oi ON oi.order_id = o.id ) SELECT u.id, oi.order_id, COUNT(1) total_items FROM order_items oi JOIN trans.users u ON u.id = oi.user_id WHERE oi.order_state_id = 2 AND oi.period_id = 154 GROUP BY 1, 2 What’s wrong with that query? Nothing is wrong, but it’s inefficient! A few Bps: 1. select only columns you really need 2. filter (reduce) rows as early as possible 3. aggregate before join
  7. 2. Reduce your dataset WITH order_items AS ( SELECT *

    FROM trans.orders o JOIN trans.order_items oi ON oi.order_id = o.id ) SELECT u.id, oi.order_id, COUNT(1) total_items FROM order_items oi JOIN trans.users u ON u.id = oi.user_id WHERE oi.order_state_id = 2 AND oi.period_id = 154 GROUP BY 1, 2 What’s wrong with that query? Nothing is wrong, but it’s inefficient! A few Bps: 1. select only columns you really need 2. filter (reduce) rows as early as possible 3. aggregate before join Let’s give it a try.
  8. 3. Avoid joins Query too long to put it on

    a slide :) Full outer join with exclusion Let’s see it in ac0on. Think about other ways to query the data, e.g. using UNION etc.
  9. 4. Avoid table scans SELECT COUNT(1) total FROM trans.users WHERE

    created_at - interval ‘8 hour’ > date(now() - interval ‘8 hour’) SELECT COUNT(1) total FROM trans.users WHERE created_at > date(now() - interval ‘8 hour’) + interval ‘8 hour’
  10. 5. Use window funcCons Window funcCon is a funcBon applied

    to a set of rows defined by a window descripBon and returns a single value for each row from the underlying query. It’s similar to GROUP BY, but much more powerful (and efficient). Example: avoid GROUP BY and self-joins. group 1 …. …… …………. ……… ………. ……. ……… group 1 …. …… …………. ……… ………. ……. ……… group 1 …. …… …………. ……… ………. ……. ……… group 2 …. …… …………. ……… ………. ……. ……… group 2 …. …… …………. ……… ………. ……. ……… group 2 …. …… …………. ……… ………. ……. ……… group 3 …. …… …………. ……… ………. ……. ……… group 3 …. …… …………. ……… ………. ……. ……… group 3 …. …… …………. ……… ………. ……. ……… group 4 …. …… …………. ……… ………. ……. ……… group 4 …. …… …………. ……… ………. ……. ……… group 4 …. …… …………. ……… ………. ……. ………
  11. 5. Use window funcCons - example SELECT user_id, FIRST_VALUE(box_type_id) OVER(PARTITION

    BY user_id ORDER BY period_id) first_box_type_id FROM trans.orders WITH user_first_order AS ( SELECT user_id, MIN(period_id) first_period_id FROM trans.orders GROUP BY 1 ) SELECT user_id, box_type_id FROM user_first_orders ufo JOIN trans.orders o ON o.user_id = ufo.usr_id AND o.period_id = ufo.first_period_id VS
  12. 5. Use window funcCons AVG COUNT ROW_NUMBER FIRST_VALUE LAST_VALUE LAG

    LEAD LISTAGG NTH_VALUE RATIO_TO_REPORT NTILE PERCENT_RANK PERCENTILE_CONT PERCENTILE_DISC FIRST_VALUE RANK STDDEV_SAMP VAR_SAMP