Slide 1

Slide 1 text

TIPS & TRICKS FOR EFFICIENT SQL QUERYING Dejan Petelin Head of Data Science

Slide 2

Slide 2 text

ERROR: Out of memory Detail: ---------------------------------- error: Out of memory code: 1004 context: alloc(134217728,35) query: 29949080 location: alloc.cpp:817 process: query1_28 [pid=19292] ----------------------------------

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

1. Use CTEs 2. Reduce your dataset (CTE) 3. Avoid joins 4. Avoid table scans 5. Use window funcBons SQL commandments

Slide 5

Slide 5 text

Why is that important? 1. Imagine matching words in two (massive) books 2. RedshiH is a distributed columnar database Node 1 Slice 1 Slice 2 Node 2 Slice 3 Slice 4 Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur erat enim, consequat at feugiat non, Cncidunt interdum enim. Phasellus tellus tortor, maximus et condimentum eu, cursus sit amet diam. Mauris semper faucibus ipsum, a maGs diam scelerisque eget. In et dolor sit amet diam Cncidunt eleifend. Sed lacus nisl, imperdiet sed loborCs eu, aliquet tempor quam. Aenean trisCque arcu sed nulla vesCbulum iaculis. Integer pharetra lorem ultricies sem semper, id bibendum tellus consequat. In vulputate feugiat maGs. Duis viverra vehicula maGs. Donec pharetra nibh at aliquam congue. Quisque sit amet sapien nec nulla vehicula ornare id sit amet nibh. Proin a turpis risus. VesCbulum nec purus tellus. Aliquam euismod fringilla congue. ECam ultricies mauris in blandit feugiat. Sed ut consequat ligula. Donec sagiGs lorem eget mi loborCs, ut posuere ex dapibus. Vivamus volutpat nec tortor ultricies placerat. Aliquam placerat urna ipsum, sit amet consectetur mi trisCque ut. Mauris sodales nisl ut est ornare, in sodales arcu placerat. Nulla facilisi. Vivamus lacinia eleifend enim at placerat. Morbi a nisl a justo scelerisque Cncidunt at vitae nunc. Donec maGs preCum lorem, at aliquam metus ultrices non. Sed arcu justo, dapibus consequat fringilla sagiGs, hendrerit sed odio. Nulla est nunc, interdum nec Cncidunt a, ullamcorper at dui. VesCbulum tellus orci, gravida sodales ligula quis, ullamcorper fermentum nunc. Phasellus loborCs vel orci aliquet malesuada. Sed semper bibendum erat, in suscipit nulla malesuada in. Sed Cncidunt posuere malesuada. Curabitur ut pharetra justo. ECam porGtor finibus nunc, et ullamcorper metus molesCe sit amet. Cras malesuada fermentum odio, in varius libero porta ac. Vivamus laoreet odio eget finibus ultricies. Curabitur cursus convallis lacus, blandit luctus lacus egestas vel. Cras vitae feugiat ante, nec Cncidunt quam. Nunc sollicitudin blandit nisi vitae vesCbulum. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus gravida tellus vitae fermentum posuere. In efficitur in velit sed consectetur. Praesent ultrices leo a ligula finibus fringilla. Curabitur ultricies congue massa, sed iaculis tellus tempor sed. Nullam sed sollicitudin est. Fusce consectetur dapibus tortor, blandit aliquet ex luctus ac. Pellentesque vitae ligula id enim faucibus volutpat et at est. In elementum id enim sed Cncidunt. Curabitur nisi leo, egestas vel magna ac, ornare consequat dolor. Aliquam sed placerat dui, sed ullamcorper purus. Quisque cursus dapibus mi eget fermentum. Morbi ac sem at dolor pulvinar sollicitudin nec in nulla. Nam porGtor laoreet tempus. VesCbulum sed iaculis neque. Nunc porGtor nibh ligula, et gravida velit viverra trisCque. Aenean urna felis, consectetur et loborCs id, euismod sit amet augue. Donec luctus tortor non dui congue varius ut id quam. Sed auctor congue sem a venenaCs. Vivamus eleifend neque nibh, eu sollicitudin nunc eleifend facilisis. Phasellus ex nulla, scelerisque loborCs ligula quis, placerat venenaCs felis. Donec elementum iaculis lectus, ac pharetra sem faucibus id. Nulla facilisi. Fusce eu eros dui. Curabitur malesuada urna a leo loborCs, nec dictum nibh loborCs. Fusce varius risus diam, et aliquam ante ullamcorper non. Duis sed metus interdum, porGtor urna vel, luctus mi. Suspendisse scelerisque leo sagiGs Cncidunt vehicula. Praesent convallis commodo imperdiet. Maecenas finibus, leo ut tempor consectetur, mi felis vesCbulum urna, nec vulputate augue dui non ligula. Suspendisse laoreet ut nisi ut Cncidunt. Praesent eleifend auctor scelerisque. Nam non nisi ac lectus dapibus commodo. Suspendisse non odio convallis, dapibus sem vel, feugiat ante. Nam mauris urna, dapibus ac malesuada eu, facilisis semper neque. Proin urna mi, tempus in congue ac, laoreet a tellus. Sed aliquet vitae lacus

Slide 6

Slide 6 text

Let’s revise what joins are Full outer join LeH outer join with exclusion Inner join

Slide 7

Slide 7 text

1. Use CTEs WITH order_events AS ( SELECT DISTINCT id, user_id, updated_at, MIN(root_tstamp) OVER (…) root_tstamp FROM snowplow_atomic.uk_co_gousto_order_1 ), events_pre AS ( SELECT event_id, page_url, useragent FROM snowplow_atomic.events WHERE useragent NOT ILIKE '%pingdom%' AND useragent NOT ILIKE 'App/%' AND useragent NOT ILIKE 'Symfony/%' ) CTE stand for Common Table Expressions. The main reasons to use are: • reusability and • readability. CTEs are actually not performing beOer than sub-queries, but the conceptual benefits outweigh the performance drawbacks. If CTEs are big and/or used numerous Bmes, it might be beOer to materialise them, i.e. write ETL that stores the data physically.

Slide 8

Slide 8 text

2. Reduce your dataset WITH order_items AS ( SELECT * FROM trans.orders o JOIN trans.order_items oi ON oi.order_id = o.id ) SELECT u.id, oi.order_id, COUNT(1) total_items FROM order_items oi JOIN trans.users u ON u.id = oi.user_id WHERE oi.order_state_id = 2 AND oi.period_id = 154 GROUP BY 1, 2 What’s wrong with that query?

Slide 9

Slide 9 text

2. Reduce your dataset WITH order_items AS ( SELECT * FROM trans.orders o JOIN trans.order_items oi ON oi.order_id = o.id ) SELECT u.id, oi.order_id, COUNT(1) total_items FROM order_items oi JOIN trans.users u ON u.id = oi.user_id WHERE oi.order_state_id = 2 AND oi.period_id = 154 GROUP BY 1, 2 What’s wrong with that query? Nothing is wrong, but it’s inefficient! A few Bps: 1. select only columns you really need 2. filter (reduce) rows as early as possible 3. aggregate before join

Slide 10

Slide 10 text

2. Reduce your dataset WITH order_items AS ( SELECT * FROM trans.orders o JOIN trans.order_items oi ON oi.order_id = o.id ) SELECT u.id, oi.order_id, COUNT(1) total_items FROM order_items oi JOIN trans.users u ON u.id = oi.user_id WHERE oi.order_state_id = 2 AND oi.period_id = 154 GROUP BY 1, 2 What’s wrong with that query? Nothing is wrong, but it’s inefficient! A few Bps: 1. select only columns you really need 2. filter (reduce) rows as early as possible 3. aggregate before join Let’s give it a try.

Slide 11

Slide 11 text

3. Avoid joins Query too long to put it on a slide :) Full outer join with exclusion Let’s see it in ac0on. Think about other ways to query the data, e.g. using UNION etc.

Slide 12

Slide 12 text

4. Avoid table scans SELECT COUNT(1) total FROM trans.users WHERE created_at - interval ‘8 hour’ > date(now() - interval ‘8 hour’) SELECT COUNT(1) total FROM trans.users WHERE created_at > date(now() - interval ‘8 hour’) + interval ‘8 hour’

Slide 13

Slide 13 text

5. Use window funcCons Window funcCon is a funcBon applied to a set of rows defined by a window descripBon and returns a single value for each row from the underlying query. It’s similar to GROUP BY, but much more powerful (and efficient). Example: avoid GROUP BY and self-joins. group 1 …. …… …………. ……… ………. ……. ……… group 1 …. …… …………. ……… ………. ……. ……… group 1 …. …… …………. ……… ………. ……. ……… group 2 …. …… …………. ……… ………. ……. ……… group 2 …. …… …………. ……… ………. ……. ……… group 2 …. …… …………. ……… ………. ……. ……… group 3 …. …… …………. ……… ………. ……. ……… group 3 …. …… …………. ……… ………. ……. ……… group 3 …. …… …………. ……… ………. ……. ……… group 4 …. …… …………. ……… ………. ……. ……… group 4 …. …… …………. ……… ………. ……. ……… group 4 …. …… …………. ……… ………. ……. ………

Slide 14

Slide 14 text

5. Use window funcCons - example SELECT user_id, FIRST_VALUE(box_type_id) OVER(PARTITION BY user_id ORDER BY period_id) first_box_type_id FROM trans.orders WITH user_first_order AS ( SELECT user_id, MIN(period_id) first_period_id FROM trans.orders GROUP BY 1 ) SELECT user_id, box_type_id FROM user_first_orders ufo JOIN trans.orders o ON o.user_id = ufo.usr_id AND o.period_id = ufo.first_period_id VS

Slide 15

Slide 15 text

5. Use window funcCons AVG COUNT ROW_NUMBER FIRST_VALUE LAST_VALUE LAG LEAD LISTAGG NTH_VALUE RATIO_TO_REPORT NTILE PERCENT_RANK PERCENTILE_CONT PERCENTILE_DISC FIRST_VALUE RANK STDDEV_SAMP VAR_SAMP

Slide 16

Slide 16 text

No content