Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Window_Functions_in_SQL.pdf

Avatar for Aakash N S Aakash N S
May 14, 2022
320

 Window_Functions_in_SQL.pdf

Avatar for Aakash N S

Aakash N S

May 14, 2022
Tweet

Transcript

  1. Syntax for Window Functions <function_type> OVER ( PARTITION BY clause

    ORDER BY clause ROWS or RANGE clause ) AS column_name
  2. • Offices & employees worldwide • Customers are toy/gift stores

    • Dedicated sales representative • Pay for multiple orders at once
  3. Local Database Setup • Download the SQL script • Install

    and start MySQL Server • Execute on MySQL Workbench
  4. Problem Solving Approach • Understand inputs & outputs • Articulate

    solution verbally • Construct query step-by-step
  5. Syntax for Window Functions <function_type> OVER ( PARTITION BY clause

    ORDER BY clause ROWS or RANGE clause ) AS column_name
  6. OVER Clause • Required in window functions • Preceded by

    function type • Followed by selection clauses
  7. OVER Clause Q: Show the list of all the payments

    made by the customers of Classic Models, and assign a unique payment number to each payment.
  8. PARTITION BY Clause • Optional but commonly used • Used

    to group rows in windows • Followed by column name
  9. PARTITION BY Clause Q: Express each payment made by a

    customer as a fraction of the total payment made by the customer. Show the checkNumber, customerNumber, paymentDate, amount & fractionOfTotal.
  10. PARTITION BY Clause SELECT *, amount / ( SUM(amount) OVER

    (PARTITION BY customerNumber) ) AS fractionOfTotal FROM payments;
  11. ORDER BY Clause • Determine order within window • Followed

    by column name • Limits frame to current row
  12. ORDER BY Clause Q: Show the running total of payments

    received by Classic Models, starting from the oldest to the newest. Show the checkNumber, customerNumber, paymentDate, amount & runningTotal.
  13. SUM & AVG Q: Show the running total and running

    average of payments by each customer, starting from the oldest to the newest. Also output the customerNumber, checkNumber, and paymentDate.
  14. SUM & AVG SELECT *, SUM(amount) OVER ( PARTITION BY

    customerNumber ORDER BY paymentDate) AS customerRunningTotal, AVG(amount) OVER ( PARTITION BY customerNumber ORDER BY paymentDate) AS customerRunningAverage FROM payments;
  15. COUNT Q: Determine the team size for each employee i.e.

    how many people report to the same manager as the employee. Output employeeNumber, lastName, firstName, reportsTo, and teamSize.
  16. MAX & MIN Q: For each order, show the order

    number, total order amount, and the difference between the order amount and the maximum/minimum order amount for the customer.
  17. MAX & MIN SELECT orderNumber, customerNumber, orderValue, ROUND(MAX(orderValue) OVER (PARTITION

    BY customerNumber) - orderValue, 2) AS belowMax, ROUND(orderValue - MIN(orderValue) OVER (PARTITION BY customerNumber), 2) AS aboveMin FROM (SELECT o.orderNumber, o.customerNumber, SUM(od.quantityOrdered * od.priceEach) AS orderValue FROM orders o JOIN orderdetails od ON o.orderNumber = od.orderNumber GROUP BY o.orderNumber, o.customerNumber) AS ordersWithValues;
  18. ROW_NUMBER Q: Provide serial numbers for the customers of each

    sales representative, sorted by dates of each customer’s earliest orders.
  19. ROW_NUMBER SELECT customerNumber, customerName, salesRepEmployeeNumber, earliestOrder, ROW_NUMBER() OVER (PARTITION BY

    salesRepEmployeeNumber ORDER BY earliestOrder) AS serialNumber FROM (SELECT c.customerNumber, c.customerName, c.salesRepEmployeeNumber, MIN(o.orderDate) AS earliestOrder FROM customers c JOIN orders o ON c.customerNumber = o.customerNumber WHERE c.salesRepEmployeeNumber IS NOT NULL GROUP BY c.customerNumber, c.salesRepEmployeeNumber ) AS co;
  20. RANK & DENSE_RANK Q: Rank the offices of Classic Models

    based on the number of employees (lowest first). Compare the outputs of ROW_NUMBER, RANK, DENSE_RANK and CUME_DIST.
  21. RANK & DENSE_RANK SELECT *, ROW_NUMBER() OVER (ORDER BY employeeCount)

    AS rowNumber, RANK() OVER (ORDER BY employeeCount) AS rankNumber, DENSE_RANK() OVER (ORDER BY employeeCount) AS denseRankNumber, CUME_DIST() OVER (ORDER BY employeeCount) AS cumeDist FROM (SELECT o.officeCode, city, COUNT(*) AS employeeCount FROM offices o JOIN employees e ON o.officeCode = e.officeCode GROUP BY o.officeCode, city) AS oe;
  22. NTILE & PERCENT_RANK • NTILE: put rows into buckets •

    PERCENT_RANK: fraction of values <= than current value • Typically used with ORDER BY
  23. NTILE & PERCENT_RANK Q: Compute the quartile and percentile for

    each payment made by a customer. Output checkNumber, paymentDate, amount, customerNumber, quartile, and percentRank.
  24. NTILE & PERCENT_RANK SELECT *, NTILE(4) OVER (PARTITION BY customerNumber

    ORDER BY amount) AS quartile, PERCENT_RANK() OVER (PARTITION BY customerNumber ORDER BY amount) AS percentRank FROM payments;
  25. LAG Q: For each order, show the number of days

    elapsed since the last order placed by the same customer. Output orderNumber, orderDate, customerNumber, and daysSinceLastOrder.
  26. LAG SELECT orderNumber, orderDate, customerNumber, DATEDIFF(orderDate, LAG(orderDate, 1) OVER (PARTITION

    BY customerNumber ORDER BY orderDate) ) AS daysSinceLastOrder FROM orders;
  27. LEAD Q: List orders for each customer sorted by order

    amount (highest to lowest) and display the difference in amount between each order and the next highest order by the same customer.
  28. LEAD SELECT *, orderValue - LEAD(orderValue, 1) OVER (PARTITION BY

    customerNumber ORDER BY orderValue DESC) AS higherBy FROM (SELECT o.orderNumber, o.orderDate, o.customerNumber, SUM(od.quantityOrdered * od.priceEach) AS orderValue FROM orders o JOIN orderdetails od ON o.orderNumber = od.orderNumber GROUP BY o.orderNumber, o.customerNumber) AS ordersWithValues;
  29. First, Last & Nth Value Q: For each payment made

    by a customer, compute the difference between the payment amount and the first/second/last payment made by the customer.
  30. SELECT *, ROUND(amount - FIRST_VALUE(amount) OVER customersWindow, 2) AS diffFirst,

    ROUND(amount - LAST_VALUE(amount) OVER customersWindow, 2) AS diffLast, ROUND(amount - NTH_VALUE(amount, 2) OVER customersWindow, 2) AS diffSecond FROM payments WINDOW customersWindow AS (PARTITION BY customerNumber ORDER BY amount DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING); First, Last & Nth Value
  31. Syntax for Window Functions <function_type> OVER ( PARTITION BY clause

    ORDER BY clause ROWS or RANGE clause ) AS column_name
  32. Q: Find the percentage of the total spend a customer

    spent on each order on Amazon. Output the customer’s name, order details, and the percentage of their total spend for each order transaction rounded to the nearest whole number. Amazon Interview Question
  33. Amazon Interview Question CUSTOMERS id name city address phone_number ORDERS

    id customer_id order_date order_quantity order_details order_cost
  34. SELECT c.name, o.order_details, ROUND(o.order_cost * 100 / CAST(SUM(o.order_cost) OVER (PARTITION

    BY c.name) AS FLOAT) ) AS percentage_of_total FROM orders o JOIN customers c ON c.id = o.customer_id ORDER BY c.name Amazon Interview Question
  35. Q: You’re given a dataset of Uber rides with the

    traveling distance and cost for each ride. For each date, find the difference between the distance per dollar for that date and the average distance per dollar for that month. Uber Interview Question
  36. Q: Estimate the growth of Airbnb each year using the

    number of hosts registered as the growth metric. rate_of_growth = 100 * (hosts_registered_this_year - hosts_registered_last_year) / hosts_registered_last_year Airbnb Interview Question
  37. Q: Find the top three unique salaries for each department

    in the company. Order the results alphabetically by department, and then by highest salary to lowest. Twitter Interview Question
  38. Q: ABC Insurance Corp has developed a ML based predictive

    model to identify propensity of fraud claims. List the top 5 percentile claims identified by the model from each state in the US. Display the policy number, state, claim cost, and fraud score for each result. Netflix Interview Question