Presto Performance Tuning

Presto Performance Tuning Fine-Tune Your SQL Queries

How to optimize our SQL Query? In Presto

Only specifying the columns you need! [Bad]: SELECT * FROM
table [Good]: SELECT id, name, role FROM table

Leveraging Time-based partitioning WHERE time <=> Integer TD_TIME_RANGE

Example WHERE time <=> Integer SELECT ... WHERE time >
1349393020 TD_TIME_RANGE SELECT … WHERE TD_TIME_RANGE(time, '2020-03-01','2020-03-15', 'UTC')

Considering the cardinality within GROUP BY [BAD]: SELECT ... GROUP
BY gender, id [GOOD]: SELECT … GROUP BY id, gender *Another tip: Use numbers instead of strings for the GROUP BY column

Use LIMIT with ORDER BY [BAD]: SELECT * FROM tbl
ORDER BY time [GOOD]: SELECT * FROM tbl ORDER BY time LIMIT 10

Aggregating a series of LIKE clauses in one single regexp_like
clause

SELECT id, name FROM access WHERE method LIKE '%GET%' OR
method LIKE '%POST%' OR method LIKE '%PUT%' OR method LIKE '%DELETE%' SELECT Id, name FROM access WHERE regexp_like(method, 'GET|POST|PUT|DELETE') *Or in other DB: PostgreSQL: SIMILAR TO Redshift: SIMILAR TO

Specifying large tables first in join clause “The default join
algorithm of Presto is broadcast join”

Improving Performance of Very Large Queries CTAS ♥ DROP DROP
TABLE IF EXISTS my_result; CREATE TABLE my_result AS SELECT * FROM my_table;

Did you ever received an exceeded max (local) memory xxGB
error?

Here is a list of memory-intensive • Distinct: ◦ SELECT
distinct c1, c2, c3 FROM my_table ◦ Count(Distinct x) -> Approx_distinct() • UNION: Prefer UNION ALL over UNION • ORDER BY: Avoid ORDER BY • GROUP BY: Reduce the number of GROUP BY columns • Joins: From the largest table to smaller tables ◦ Use distributed joins ◦ -- set session join_distribution_type = 'PARTITIONED' SELECT * FROM table_a, table_b WHERE table_a.id = table_b.id

Let’s Practice!

Simplified this Query! SELECT property_name FROM property_tb WHERE pic_role like
'%Owner%' or pic_role like '%owner%' or pic_email like '%[email protected]%'

Optimize this Query! SELECT hotel_name FROM hotel_tb WHERE pic_role =
‘Hotel Owner' OR pic_role = 'Assistant' OR pic_role = 'Hotel Ops'

Optimize this Query! SELECT pic_name, pic_contact FROM hotel_tb WHERE pic_role
= ‘Hotel Ops’ UNION SELECT pic_name, pic_contact FROM hotel_tb WHERE pic_role like ‘h%’ OR hotel_address = ‘Letjen S Parman no 23’

• 7.10. Regular Expression Functions — Presto 0.233 Documentation •
Pattern (Java Platform SE 8 ) • 1.2. Presto Concepts — Presto 0.233.1 Documentation • Treasure Data Function Reference • 5. Query Optimizer — Presto 331 Documentation Additional Link

• Presto Performance Tuning • Presto Query FAQs • Presto
Best Practices References

Presto Performance Tuning

Presto Performance Tuning

Nellie Maysela

Other Decks in Technology

Featured

Transcript