Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Scale x17 - Query analytics for the day-to-day ...

Scale x17 - Query analytics for the day-to-day developer with MySQL 8.0

MySQL 8 is full of new features and amongst them, we have CTE and Window Functions. Really common when using data warehouses, these features can help you analyze what is going on in your business with just a simple query, no ETL needed! I am not saying you will never again have to export data to OLAP databases, but sometimes, being able to do these analyses on the main database could save the day!

Avatar for Gabriela D'Ávila Ferrara

Gabriela D'Ávila Ferrara

March 08, 2019
Tweet

More Decks by Gabriela D'Ávila Ferrara

Other Decks in Programming

Transcript

  1. Query analytics for the day-to-day developer with MySQL 8.0 Gabriela

    D'Ávila Ferrara @gabidavila Developer Advocate @ Google Cloud ^(gabi|gabby)\.dev$
  2. FizzBuzz WITH RECURSIVE fizz_buzz (sequence, modulo_3, modulo_5) AS ( SELECT

    1, CAST('' AS CHAR(4)), CAST('' AS CHAR(5)) UNION ALL SELECT sequence + 1, IF(MOD(sequence + 1, 3) = 0, 'Fizz', ''), IF(MOD(sequence + 1, 5) = 0, 'Buzz', '') FROM fizz_buzz WHERE sequence < 100 ) SELECT IF( CONCAT(modulo_3, modulo_5) = '', sequence, CONCAT(modulo_3, modulo_5)) AS fizzbuzz FROM fizz_buzz;
  3. Brief History • Created to handle from up to 10

    to 100M rows or around 100MB/table • Now supports terabyte- sized databases • Supports SQL standards as new as SQL 2016 • But... some stuff from SQL 2003 just became available (Window Functions, CTEs)
  4. Don’t do this in production! • Just because you can,

    doesn’t mean you should! • The cost of analytical queries are too high • Run on a replica if you must to • Remember this is just to help you in an emergency
  5. How often you write in raw SQL… • A CRUD

    operation? • A DML operation? • Create/edit a Function? • Create/edit a Procedure? • A View?
  6. Have you ever… • Generated reports using a scripting language?

    (Python, PHP) • Did an ETL? • Synced data across different types of databases? (i.e. full-text search)
  7. Let’s say you own a store 🏬 💵 💻 Tables:

    • products • users • orders • order_items
  8. And you want several reports • Most expensive order per

    user • The highest price each product was ever sold and what was the date of that • The monthly amount sold in a year together with the growth
  9. SHOW CREATE TABLE `users`; CREATE TABLE `users` ( `id` bigint(20)

    UNSIGNED NOT NULL AUTO_INCREMENT, `username` varchar(255) NOT NULL, `created_at` timestamp NOT NULL, `updated_at` timestamp NOT NULL )
  10. Who never did this? SELECT users.id, users.username, (SELECT id FROM

    orders WHERE users.id = user_id ORDER BY total LIMIT 1) AS order_id, (SELECT total FROM orders WHERE users.id = user_id ORDER BY total LIMIT 1) AS order_total FROM users ORDER BY users.id LIMIT 10; The most expensive order for each user
  11. LATERAL SELECT users.id, users.username, total_orders.id AS order_id, total_orders.total AS order_total

    FROM users, LATERAL( SELECT id, total FROM orders WHERE users.id = user_id ORDER BY total LIMIT 1 ) AS total_orders ORDER BY users.id LIMIT 10;
  12. OLTP: Online Transaction Processing • Transactional • Single-source of truth

    • Normalized (usually) • Examples: MySQL, PostgreSQL, SQL Server, etc.
  13. Analytical clause, up to 4 parts • Analytical function •

    Partitioning Clause • Order by Clause • Windowing Clause
  14. What they do? • Allows to analyze the rows of

    a given result set • Can behave like a GROUP BY without changing the result set • Allows you to use a frame to "peek" OVER a PARTITION of a window
  15. Window Functions • Examples: • Enumerate rows - ROW_NUMBER() •

    Show Aggregated sums - SUM() • Rank results - RANK(), DENSE_RANK() • Look at neighboring rows - LEAD(), LAG()
  16. SHOW CREATE TABLE `orders`; CREATE TABLE `orders` ( `id` bigint(20)

    UNSIGNED NOT NULL AUTO_INCREMENT, `user_id` bigint(20) UNSIGNED NOT NULL, `status` varchar(20) NOT NULL DEFAULT 'new', `total` decimal(10,2) NOT NULL, `created_at` datetime NOT NULL, `updated_at` datetime NOT NULL )
  17. SELECT … FROM `orders` WHERE … ORDER BY created_at +---------+---------+-----------+---------------------+

    | id | user_id | status | created_at | +---------+---------+-----------+---------------------+ | 6534954 | 654321 | canceled | 2011-08-21 19:06:59 | | 6534949 | 654321 | canceled | 2013-11-16 09:02:19 | | 6534953 | 654321 | completed | 2015-10-28 02:21:30 | | 6534951 | 654321 | new | 2018-01-05 17:12:23 | | 6534952 | 654321 | new | 2018-07-19 04:23:41 | | 6534955 | 654321 | new | 2018-11-12 05:37:48 | | 6534950 | 654321 | pending | 2018-12-20 06:11:23 | +---------+---------+-----------+---------------------+
  18. SELECT … FROM `orders` WHERE … ORDER BY created_at +---------+---------+-----------+---------------------+---------------------+---------------------+

    | id | user_id | status | previous_order | created_at | next_order | +---------+---------+-----------+---------------------+---------------------+---------------------+ | 6534954 | 654321 | canceled | NULL | 2011-08-21 19:06:59 | 2013-11-16 09:02:19 | | 6534949 | 654321 | canceled | 2011-08-21 19:06:59 | 2013-11-16 09:02:19 | 2015-10-28 02:21:30 | | 6534953 | 654321 | completed | 2013-11-16 09:02:19 | 2015-10-28 02:21:30 | 2018-01-05 17:12:23 | | 6534951 | 654321 | new | 2015-10-28 02:21:30 | 2018-01-05 17:12:23 | 2018-07-19 04:23:41 | | 6534952 | 654321 | new | 2018-01-05 17:12:23 | 2018-07-19 04:23:41 | 2018-11-12 05:37:48 | | 6534955 | 654321 | new | 2018-07-19 04:23:41 | 2018-11-12 05:37:48 | 2018-12-20 06:11:23 | | 6534950 | 654321 | pending | 2018-11-12 05:37:48 | 2018-12-20 06:11:23 | NULL | +---------+---------+-----------+---------------------+---------------------+---------------------+
  19. Previous and Next orders | LAG and LEAD SELECT id,

    user_id, status, LAG(created_at) OVER(ORDER BY created_at) AS previous_order, created_at, LEAD(created_at) OVER(ORDER BY created_at) AS next_order FROM orders WHERE user_id = 654321 ORDER BY created_at LIMIT 10;
  20. Repetition? SELECT id, user_id, status, LAG(created_at) OVER(ORDER BY created_at) AS

    previous_order, created_at, LEAD(created_at) OVER(ORDER BY created_at) AS next_order FROM orders WHERE user_id = 654321 ORDER BY created_at LIMIT 10;
  21. Named Windows! SELECT id, user_id, status, LAG(created_at) OVER(dates) AS previous_order,

    created_at, LEAD(created_at) OVER(dates) AS next_order FROM orders WHERE user_id = 654321 WINDOW dates AS (ORDER BY created_at) ORDER BY created_at LIMIT 10;
  22. Common Table Expressions • Similar to CREATE [TEMPORARY] TABLE •

    Doesn’t need CREATE privilege • Can reference other CTEs (if those are already defined) • Can be recursive • Easier to read
  23. Recursive CTE • Useful with hierarchical data • The Recipe

    is: • Base query comes first • Second query comes after an UNION statement • And the stop condition should be on the recursive call
  24. FizzBuzz WITH RECURSIVE fizz_buzz (sequence, modulo_3, modulo_5) AS ( SELECT

    1, CAST('' AS CHAR(4)), CAST('' AS CHAR(5)) UNION ALL SELECT sequence + 1, IF(MOD(sequence + 1, 3) = 0, 'Fizz', ''), IF(MOD(sequence + 1, 5) = 0, 'Buzz', '') FROM fizz_buzz WHERE sequence < 100 ) SELECT IF( CONCAT(modulo_3, modulo_5) = '', sequence, CONCAT(modulo_3, modulo_5)) AS fizzbuzz FROM fizz_buzz;
  25. SELECT * FROM `store`.`categories`; +------+-------------------------------+--------------------+ | id | name |

    parent_category_id | +------+-------------------------------+--------------------+ | 1 | Animal | 0 | | 2 | Plant | 0 | | 3 | Dog | 1 | | 4 | Cat | 1 | | 5 | Tulip | 10 | | 6 | West Highlander White Terrier | 12 | | 7 | Lettuce | 11 | | 8 | Sunflower | 10 | | 10 | Flowers | 2 | | 11 | Veggies | 2 | | 12 | Terrier | 3 | +------+-------------------------------+--------------------+ 11 rows in set (0.00 sec)
  26. Recursive CTE WITH RECURSIVE tree (depth_level, node, path, node_id) AS

    ( SELECT 1, CAST('root' AS CHAR(255)), CAST('root' AS CHAR(65535)), 0 ) SELECT * FROM tree;
  27. Recursive CTE WITH RECURSIVE tree (depth_level, node, path, node_id) AS

    ( SELECT 1, CAST('root' AS CHAR(255)), CAST('root' AS CHAR(65535)), 0 UNION ALL SELECT tree.depth_level + 1, categories.name, CONCAT_WS('/', tree.path, categories.name), categories.id FROM tree INNER JOIN categories ON tree.node_id = categories.parent_category_id WHERE tree.depth_level < 5 ) SELECT * FROM tree ORDER BY path;