Window Functions

Databases run better with Percona

SQL Window Function Analytics and more David.Stokes@Percona.com @Stoker

©2023 Percona | Confidential Who Am I I am Dave
Stokes Technology Evangelist at Percona Author of MySQL & JSON - A Practical Programming Guide Over a decade on the Oracle MySQL Community Team Started with MySQL 3.29 David.Stokes@Percona.com @Stoker https://speakerdeck.com/stoker

Percona is an open source database software, support, and services
company that helps make databases and applications run better. Percona.com

SQL Window Functions

©2023 Percona | Confidential This is not an easy subject
6 Your brain may be hurt during this presentation.

©2023 Percona | Confidential In database management, an aggregate function
or aggregation function is a function where multiple values are processed together to form a single summary statistic. Common aggregate functions include: • Average (i.e., arithmetic mean) • Count • Maximum • Median • Minimum • Mode • Range • Sum 7 https://en.wikipedia.org/wiki/Aggregate_function#:~:text=In%20database%20management%2C%20an%20aggregate,Sum

©2023 Percona | Confidential In SQL, a window function or
analytic function[1] is a function which uses values from one or multiple rows to return a value for each row. (This contrasts with an aggregate function, which returns a single value for multiple rows.) Window functions have an OVER clause; any function without an OVER clause is not a window function, but rather an aggregate or single-row (scalar) function.[2] 8 https://en.wikipedia.org/wiki/Window_function_(SQL)

©2023 Percona | Confidential 9 Analytics, Statistics, Numbers, and Your
Sanity • Presenting numbers is often tough ◦ Learning the ‘stuff’ to present numbers from a Structured Query Language Database is tougher ◦ Practice helps ◦ Build in ‘sanity checks’ -> Do the numbers make sense? • Aggregate Functions are fairly straightforward • Window Function are not straightforward ◦ Syntax is odd ◦ Partitioning can be hard ◦ Functions inside Window Functions are not ‘intuitive’ ◦ You need lots of practice! Lots!

©2023 Percona | Confidential Warning! This is not a spreadsheet
10

©2023 Percona | Confidential Partitions 11 CURRENT ROW N Preceding
N Following Unbounded Preceding Unbounded Following

©2023 Percona | Confidential SQL Aggregate Functions are fine but
limited. test=# select count(id) as "# records", sum(price), round(avg(price),2), max(price) from data; # records | sum | round | max -----------+------------+-----------+----------- 10 | 1545897.00 | 154589.70 | 999999.00 (1 row) 13 It is easy to use aggregate functions to generate information on a column

©2023 Percona | Confidential GROUPing helps bring a little more
clarity test=# select brand, count(brand), sum(price), round(avg(price),2), max(price) from data group by brand; brand | count | sum | round | max ------------+-------+-----------+-----------+----------- Ford | 3 | 138400.00 | 46133.33 | 53400.00 Dodge | 2 | 147999.00 | 73999.50 | 74999.00 Audi | 1 | 54000.00 | 54000.00 | 54000.00 Chevrolet | 2 | 187499.00 | 93749.50 | 122500.00 Pontiac | 1 | 18000.00 | 18000.00 | 18000.00 Ferrari | 1 | 999999.00 | 999999.00 | 999999.00 (6 rows) 14 GROUP BY can be used to put related information together.

©2023 Percona | Confidential Order can improve readability test=# select
brand, count(brand), sum(price), round(avg(price),2), max(price) from data group by brand order by brand; brand | count | sum | round | max ------------+-------+-----------+-----------+----------- Audi | 1 | 54000.00 | 54000.00 | 54000.00 Chevrolet | 2 | 187499.00 | 93749.50 | 122500.00 Dodge | 2 | 147999.00 | 73999.50 | 74999.00 Ferrari | 1 | 999999.00 | 999999.00 | 999999.00 Ford | 3 | 138400.00 | 46133.33 | 53400.00 Pontiac | 1 | 18000.00 | 18000.00 | 18000.00 (6 rows) 15

©2023 Percona | Confidential But what about differences within those
groups? Window functions allow the partitioning of the data for the purpose of analysis 16

Better Analsys

©2023 Percona | Confidential OVER() is the Window Function call
test=# select brand, price, sum(price) OVER () from data order by brand; brand | price | sum ------------+-----------+------------ Audi | 54000.00 | 1545897.00 Chevrolet | 64999.00 | 1545897.00 Chevrolet | 122500.00 | 1545897.00 Dodge | 74999.00 | 1545897.00 Dodge | 73000.00 | 1545897.00 Ferrari | 999999.00 | 1545897.00 Ford | 53400.00 | 1545897.00 Ford | 37000.00 | 1545897.00 Ford | 48000.00 | 1545897.00 Pontiac | 18000.00 | 1545897.00 (10 rows) 18 OVER() without anything in the brackets tells the database instance to apply the calculation over the ENTIRE column. Here, sum is for all the values is the price column

©2023 Percona | Confidential You can have multiple OVER()s test=#
select brand, count(id) over(), sum(price) over () from data order by brand; brand | count | sum ------------+-------+------------ Audi | 10 | 1545897.00 Chevrolet | 10 | 1545897.00 Chevrolet | 10 | 1545897.00 Dodge | 10 | 1545897.00 Dodge | 10 | 1545897.00 Ferrari | 10 | 1545897.00 Ford | 10 | 1545897.00 Ford | 10 | 1545897.00 Ford | 10 | 1545897.00 Pontiac | 10 | 1545897.00 (10 rows) 19 Here are two WFs where we are counting all the columns with an ‘id’ value and another that is summing all the ‘price’ values.

©2023 Percona | Confidential Move window definition around test=# select
brand, price, sum(price) OVER w from data window w as (); brand | price | sum ------------+-----------+------------ Ford | 48000.00 | 1545897.00 Audi | 54000.00 | 1545897.00 Dodge | 74999.00 | 1545897.00 Ford | 37000.00 | 1545897.00 Pontiac | 18000.00 | 1545897.00 Chevrolet | 64999.00 | 1545897.00 Chevrolet | 122500.00 | 1545897.00 Ford | 53400.00 | 1545897.00 Dodge | 73000.00 | 1545897.00 Ferrari | 999999.00 | 1545897.00 (10 rows) 20 We can also move the definition of the window away from the SELECT portion of the query, often provides better readability.

©2023 Percona | Confidential The definition clause can get weird
test=# select brand, price, sum(price) over w from data window w as (rows current row); brand | price | sum ------------+-----------+----------- Ford | 48000.00 | 48000.00 Audi | 54000.00 | 54000.00 Dodge | 74999.00 | 74999.00 Ford | 37000.00 | 37000.00 Pontiac | 18000.00 | 18000.00 Chevrolet | 64999.00 | 64999.00 Chevrolet | 122500.00 | 122500.00 Ford | 53400.00 | 53400.00 Dodge | 73000.00 | 73000.00 Ferrari | 999999.00 | 999999.00 (10 rows) 21 Clarity is of prime importance so when in doubt, spell it out!

©2023 Percona | Confidential Unbounded? test=# select brand, price, sum(price)
over w from data window w as (rows unbounded preceding); brand | price | sum ------------+-----------+------------ Ford | 48000.00 | 48000.00 Audi | 54000.00 | 102000.00 Dodge | 74999.00 | 176999.00 Ford | 37000.00 | 213999.00 Pontiac | 18000.00 | 231999.00 Chevrolet | 64999.00 | 296998.00 Chevrolet | 122500.00 | 419498.00 Ford | 53400.00 | 472898.00 Dodge | 73000.00 | 545898.00 Ferrari | 999999.00 | 1545897.00 (10 rows) 22

©2023 Percona | Confidential Partitions 23 CURRENT ROW N Preceding
N Following Unbounded Preceding Unbounded Following This might be cleared after the examples

©2023 Percona | Confidential Unbounded return? test=# select brand, price,
sum(price) over w from data window w as (rows unbounded preceding); brand | price | sum ------------+-----------+------------ Ford | 48000.00 | 48000.00sum (48,000 + previous row) Audi | 54000.00 | 102000.00sum (48,000 + 54,000) = 102,000 Dodge | 74999.00 | 176999.00sum (102,000 + 74,999) = 176,999 Ford | 37000.00 | 213999.00 Pontiac | 18000.00 | 231999.00 Chevrolet | 64999.00 | 296998.00 Chevrolet | 122500.00 | 419498.00 Ford | 53400.00 | 472898.00 Dodge | 73000.00 | 545898.00 Ferrari | 999999.00 | 1545897.00 (10 rows) 24

©2023 Percona | Confidential Rows between unbounded preceding and current
row test=# select brand, price, sum(price) over w from data window w as (rows between unbounded preceding and current row); brand | price | sum ------------+-----------+------------ Ford | 48000.00 | 48000.00 Audi | 54000.00 | 102000.00 Dodge | 74999.00 | 176999.00 Ford | 37000.00 | 213999.00 Pontiac | 18000.00 | 231999.00 Chevrolet | 64999.00 | 296998.00 Chevrolet | 122500.00 | 419498.00 Ford | 53400.00 | 472898.00 Dodge | 73000.00 | 545898.00 Ferrari | 999999.00 | 1545897.00 (10 rows) 25 rows unbounded preceding = rows between unbounded preceding and current row Obvious, eh? This is why practice is needed

©2023 Percona | Confidential Rows between current row and unbounded
following test=# select brand, price, sum(price) over w from data window w as (rows between current row and unbounded following); brand | price | sum ------------+-----------+------------ Ford | 48000.00 | 1545897.00(1,497,897 + 48,000) = 1,545,897 Audi | 54000.00 | 1497897.00 Dodge | 74999.00 | 1443897.00 Ford | 37000.00 | 1368898.00 Pontiac | 18000.00 | 1331898.00 Chevrolet | 64999.00 | 1313898.00 Chevrolet | 122500.00 | 1248899.00 Ford | 53400.00 | 1126399.00 Dodge | 73000.00 | 1072999.00 Ferrari | 999999.00 | 999999.00 (10 rows) 26 rows between unbounded preceding and current row is the opposite of rows between current row and unbounded following

©2023 Percona | Confidential Rows between 1 preceding and current
row test=# select brand, price, sum(price) over w from data window w as (rows between 1 preceding and current row); brand | price | sum ------------+-----------+------------ Ford | 48000.00 | 48000.00 Audi | 54000.00 | 102000.00(48,000 + 54,000) = 102,000 Dodge | 74999.00 | 128999.00(74,999 + 54,000) = 128,999 Ford | 37000.00 | 111999.00(74,999 + 37,000) = 111,999 Pontiac | 18000.00 | 55000.00 Chevrolet | 64999.00 | 82999.00 Chevrolet | 122500.00 | 187499.00 Ford | 53400.00 | 175900.00 Dodge | 73000.00 | 126400.00 Ferrari | 999999.00 | 1072999.00 (10 rows) 27

©2023 Percona | Confidential Rows between 1 preceding and 1
following test=# select brand, price, sum(price) over w from data window w as (rows between 1 preceding and 1 following); brand | price | sum ------------+-----------+------------ Ford | 48000.00 | 102000.0048,000 + 54,000 = 102,000 Audi | 54000.00 | 176999.0074,999 + 54,000 + 48,000 = 176,699 Dodge | 74999.00 | 165999.0037,000 + 74,999 + 54,000 = 165,999 Ford | 37000.00 | 129999.0018,000 + 37,000 + 74,999 = 129,999 Pontiac | 18000.00 | 119999.00 Chevrolet | 64999.00 | 205499.00 Chevrolet | 122500.00 | 240899.00 Ford | 53400.00 | 248900.00 Dodge | 73000.00 | 1126399.00 Ferrari | 999999.00 | 1072999.00 (10 rows) 28

©2023 Percona | Confidential Rows between 2 preceding and current
row test=# select brand, price, sum(price) over w from data window w as (rows between 2 preceding and current row); brand | price | sum ------------+-----------+------------ Ford | 48000.00 | 48000.0048,000 = 48,000 Audi | 54000.00 | 102000.0048,000 + 54,000 = 102,000 Dodge | 74999.00 | 176999.0074,999 + 54,000 + 48,000 = 176,999 Ford | 37000.00 | 165999.0037,000 + 74,999 + 54,000 = 165,999 Pontiac | 18000.00 | 129999.00 Chevrolet | 64999.00 | 119999.00 Chevrolet | 122500.00 | 205499.00 Ford | 53400.00 | 240899.00 Dodge | 73000.00 | 248900.00 Ferrari | 999999.00 | 1126399.00 (10 rows) 29

©2023 Percona | Confidential You can also order within a
window test=# select brand, price, sum(price) over w from data window w as (order by brand); brand | price | sum ------------+-----------+------------ Audi | 54000.00 | 54000.00 Chevrolet | 64999.00 | 241499.00 54,000 + 64,999 + 122,500 = 241,499 Chevrolet | 122500.00 | 241499.00 Dodge | 74999.00 | 389498.00 241,499 + 74,999 + 73,000 = 389,498 Dodge | 73000.00 | 389498.00 Ferrari | 999999.00 | 1389497.00 Ford | 53400.00 | 1527897.00 Ford | 37000.00 | 1527897.00 Ford | 48000.00 | 1527897.00 Pontiac | 18000.00 | 1545897.00 (10 rows) 30

©2023 Percona | Confidential Order an range in a window
test=# select brand, price, sum(price) over w from data window w as (order by brand range current row); brand | price | sum ------------+-----------+----------- Audi | 54000.00 | 54000.00 Chevrolet | 64999.00 | 187499.00 64,999 + 122,500 = 187,499 Chevrolet | 122500.00 | 187499.00 Dodge | 74999.00 | 147999.00 74,999 + 73,000 = 147,999 Dodge | 73000.00 | 147999.00 Ferrari | 999999.00 | 999999.00 Ford | 53400.00 | 138400.00 53,400 + 37,000 + 48,000 = 138,400 Ford | 37000.00 | 138400.00 Ford | 48000.00 | 138400.00 Pontiac | 18000.00 | 18000.00 (10 rows) 31

©2023 Percona | Confidential Simple brand average test=# select brand,
count(brand), sum(price), round(avg(price),2) as avg from data group by brand order by brand; brand | count | sum | avg ------------+-------+-----------+----------- Audi | 1 | 54000.00 | 54000.00 54,000/1 = 54,000 Chevrolet | 2 | 187499.00 | 93749.50 187,499/2 = 93,749 Dodge | 2 | 147999.00 | 73999.50 Ferrari | 1 | 999999.00 | 999999.00 Ford | 3 | 138400.00 | 46133.33 Pontiac | 1 | 18000.00 | 18000.00 (6 rows) 32

©2023 Percona | Confidential Rollup test=# select brand, count(brand), sum(price),
round(avg(price),2) as avg from data group by rollup(brand) order by brand; brand | count | sum | avg ------------+-------+------------+----------- Audi | 1 | 54000.00 | 54000.00 Chevrolet | 2 | 187499.00 | 93749.50 Dodge | 2 | 147999.00 | 73999.50 Ferrari | 1 | 999999.00 | 999999.00 Ford | 3 | 138400.00 | 46133.33 Pontiac | 1 | 18000.00 | 18000.00 | 10 | 1545897.00 | 154589.70 This is the rollup line (7 rows) 33 NULL hiding here

©2023 Percona | Confidential CUBE - Multiple Grouping Sets test=#
select brand, count(brand), model, sum(price), round(avg(price),2) as avg from data group by cube(model,brand) order by model,brand; brand | count | model | sum | avg ------------+-------+-----------------+------------+----------- Ferrari | 1 | 250 GTO | 999999.00 | 999999.00 | 1 | 250 GTO | 999999.00 | 999999.00 Ford | 1 | Bronco | 37000.00 | 37000.00 | 1 | Bronco | 37000.00 | 37000.00 Chevrolet | 1 | Camaro | 64999.00 | 64999.00 | 1 | Camaro | 64999.00 | 64999.00 Dodge | 1 | Challenger | 73000.00 | 73000.00 | 1 | Challenger | 73000.00 | 73000.00 Dodge | 1 | Charger | 73000.00 | 73000.00 | 1 | Charger | 73000.00 | 73000.00 Chevrolet | 1 | Corvette | 122500.00 | 122500.00 | 1 | Corvette | 122500.00 | 122500.00 Pontiac | 1 | GTO | 18000.00 | 18000.00 | 1 | GTO | 18000.00 | 18000.00 Ford | 2 | Mustang | 101400.00 | 50700.00 | 2 | Mustang | 101400.00 | 50700.00 Audi | 1 | TT | 73000.00 | 73000.00 | 1 | TT | 73000.00 | 73000.00 Audi | 1 | | 73000.00 | 73000.00 Chevrolet | 2 | | 187499.00 | 93749.50 Dodge | 2 | | 146000.00 | 73000.00 Ferrari | 1 | | 999999.00 | 999999.00 Ford | 3 | | 138400.00 | 46133.33 Pontiac | 1 | | 18000.00 | 18000.00 | 10 | | 1562898.00 | 156289.80 (25 rows) 34

©2023 Percona | Confidential The Difference from average test=# select
brand, model, price, round(AVG(price) over(), 2) as Avg, round(price - AVG(price) over (),2) as Difference from data order by price; brand | model | price | avg | difference ------------+-----------------+-----------+-----------+------------ Pontiac | GTO | 18000.00 | 154589.70 | -136589.70 Ford | Bronco | 37000.00 | 154589.70 | -117589.70 Ford | Mustang | 48000.00 | 154589.70 | -106589.70 Ford | Mustang | 53400.00 | 154589.70 | -101189.70 Audi | TT | 54000.00 | 154589.70 | -100589.70 Chevrolet | Camaro | 64999.00 | 154589.70 | -89590.70 Dodge | Challenger | 73000.00 | 154589.70 | -81589.70 Dodge | Charger | 74999.00 | 154589.70 | -79590.70 Chevrolet | Corvette | 122500.00 | 154589.70 | -32089.70 Ferrari | 250 GTO | 999999.00 | 154589.70 | 845409.30 (10 rows) 37

©2023 Percona | Confidential It is possible to get very
complicated 39 test=# select model, price, price - LAST_VALUE(price) over w as delta, round ((price - LAST_VALUE(price) over w) / LAST_VALUE(price) over w * 100) as percent_more from data window w as (order by price desc rows between unbounded preceding and unbounded following) order by price desc; model | price | delta | percent_more -----------------+-----------+-----------+-------------- 250 GTO | 999999.00 | 981999.00 | 5456 Corvette | 122500.00 | 104500.00 | 581 Charger | 74999.00 | 56999.00 | 317 Challenger | 73000.00 | 55000.00 | 306 Camaro | 64999.00 | 46999.00 | 261 TT | 54000.00 | 36000.00 | 200 Mustang | 53400.00 | 35400.00 | 197 Mustang | 48000.00 | 30000.00 | 167 48K is 167% of 18K Bronco | 37000.00 | 19000.00 | 106 37K is 106% of 18K GTO | 18000.00 | 0.00 | 0 (10 rows)

©2023 Percona | Confidential Need to update the data test=#
update data set price = 73000.0 where id=3 or id = 2; UPDATE 2 40

©2023 Percona | Confidential RANK() & DENSE_RANK() test=# select id,
model, price, RANK() over w, DENSE_RANK() over w from data WINDOW w as (order by price desc) order by price desc; id | model | price | rank | dense_rank ----+-----------------+-----------+------+------------ 5 | 250 GTO | 999999.00 | 1 | 1 8 | Corvette | 122500.00 | 2 | 2 10 | Challenger | 73000.00 | 3 | 3 3 | Charger | 73000.00 | 3 | 3 2 | TT | 73000.00 | 3 | 3 7 | Camaro | 64999.00 | 6 | 4 9 | Mustang | 53400.00 | 7 | 5 1 | Mustang | 48000.00 | 8 | 6 4 | Bronco | 37000.00 | 9 | 7 6 | GTO | 18000.00 | 10 | 8 (10 rows) 41

Just hit the basics More out there

©2023 Percona | Confidential Window Function Functions CUME_DIST() Cumulative distribution
value DENSE_RANK() Rank of current row within its partition, without gaps FIRST_VALUE() Value of argument from first row of window frame LAG() Value of argument from row lagging current row within partition LAST_VALUE()Value of argument from last row of window frame LEAD() Value of argument from row leading current row within partition NTH_VALUE() Value of argument from N-th row of window frame NTILE()Bucket number of current row within its partition. PERCENT_RANK() Percentage rank value RANK() Rank of current row within its partition, with gaps ROW_NUMBER() Number of current row within its partition 43

Wrap Up

©2023 Percona | Confidential Recommendations Window Functions are useful •
Practice needed • Double check everything • Don’t forget to use aggregate functions 46

percona.com THANK YOU!

Window Functions

Window Functions

More Decks by David Stokes

Other Decks in Technology

Featured

Transcript