Upgrade to Pro — share decks privately, control downloads, hide ads and more …

calendar-tables.pdf

 calendar-tables.pdf

Generating a table of calendar information in SQL can be done with little effort using two trivial tables -- no for-loops or iterators required. This talk describes the PostgreSQL interval and daterange types used for a calendar along with SQL to generate the full set of daterange rows to fill the calendar.

It's a good exercise in what Joe Celko calls "Thinking in Sets" and shows how just how little SQL it takes to manage the data.

This includes some useful general tricks for managing tables of constants and using sequence-generating functions.

Steven Lembark

June 06, 2023
Tweet

More Decks by Steven Lembark

Other Decks in Technology

Transcript

  1. Calendar Table Periods by type: Annual, Quarterly... Effective window by

    start/end date. Hierarchy of periods: Quarters within Years. Pseudo-keys for each period.
  2. Calendar Table Uses: Partition keys for temporal tables: Surrogate keys

    for calendar dates. Calculate period intersections: Week-ending dates for this year’s months? Capacity planning: What weeks will cover end-of-quarter reporting? What weekdys cover month-ending data?
  3. Defining a calendar. Start with two relations: Start Date +

    Calendar Size Period Names + Sizes Fiscal calendar has arbitrary start. Time constraint require a 01-Jan start here.
  4. Defining a calendar. Start with two relations: ( start, window

    ) calendar size ( type, size ) period elements
  5. Handling enums One way: table foo ( date_type enum (

    ‘year’,’month’ ) … ); table bar ( date_type enum ( ‘year’,’month’ ) … ); Embed enum def’s in each table.
  6. Handling enums One way: table foo ( date_type enum (

    ‘year’,’month’ ) … ); table bar ( date_type enum ( ‘year’,’month’ ) … ); Repeated declaration: table foo ( date_type enum ( ‘year’,’month’ ) … ); table bar ( date_type enum ( ‘year’,’mnoth’ ) … );
  7. Handling enums One way: embedded. table foo ( date_type enum

    ( ‘year’,’month’ ) … ); table bar ( date_type enum ( ‘year’,’month’ ) … ); Oops... table foo ( date_type enum ( ‘year’,’month’ ) … ); table bar ( date_type enum ( ‘year’,’mnoth’ ) … );
  8. Handling enums Better way: create type date_t enum ( ‘year’,

    ‘month’ ); table foo ( date_type date_t, … ); table bar ( date_type date_t, … );
  9. Handling enums Updates made once: create type date_t enum (

    ‘year’, ‘month’, ‘quarter’ ); table foo ( date_type date_t, … ); table bar ( date_type date_t, … );
  10. Intervals in time YEAR MONTH DAY HOUR MINUTE SECOND YEAR

    TO MONTH DAY TO HOUR DAY TO MINUTE DAY TO SECOND HOUR TO MINUTE HOUR TO SECOND MINUTE TO SECOND
  11. Intervals in time with rounding YEAR MONTH DAY HOUR MINUTE

    SECOND YEAR TO MONTH DAY TO HOUR DAY TO MINUTE DAY TO SECOND HOUR TO MINUTE HOUR TO SECOND MINUTE TO SECOND
  12. Interval input interval ‘1 year’ interval ‘2 years’ interval ‘2

    years 1 month 3 days 5 hours 6 seconds’ PG handles leap year for you: ‘31-Jan-2000’::date + interval ‘29 days’
  13. Definine the period window The definition of start date +

    window. Should have only one row. Minimal definition of start time. Minimal definition of end time.
  14. Definine the period window create table period_window ( one_row integer

    not null primary key default 0 check( one_row = 0 ) , base_year integer not null check ( base_year > 2000 and base_year <= date_part('year', now() ) ) , final_window interval year not null check ( final_window > interval '0 year' and final_window < interval '20 year' ) ); One row: Single definition.
  15. Definine the period window create table period_window ( one_row integer

    not null primary key default 0 check( one_row = 0 ) , base_year integer not null check ( base_year > 2000 and base_year <= date_part('year', now() ) ) , final_window interval year not null check ( final_window > interval '0 year' and final_window < interval '20 year' ) ); One row: Single definition.
  16. Definine the period window create table period_window ( one_row integer

    not null primary key default 0 check( one_row = 0 ) , base_year integer not null check ( base_year > 2000 and base_year <= date_part('year', now() ) ) , final_window interval year not null check ( final_window > interval '0 year' and final_window < interval '20 year' ) ); One row: First year: 2000 to current. Fiscal calendar uses date.
  17. Definine the period window create table period_window ( one_row integer

    not null primary key default 0 check( one_row = 0 ) , base_year integer not null check ( base_year > 2000 and base_year <= date_part('year', now() ) ) , final_window interval year not null check ( final_window > interval '0 year' and final_window < interval '20 year' ) ); One row: First year: Interval: Defined in years.
  18. Simpler constant tables one_row: period_window can’t have two rows. It

    can still be empty! Check constraint doesn’t prevent deletion.
  19. Simpler constant tables Fix: Use a view. No select or

    base table. create view period_window ( base_year , final_window ) as values ( cast 2022 as int ) , ( cast ‘10 years’ as interval ) ;
  20. Simpler constant tables Fix: Use a view. No select or

    base table. No metadata column. Just values. create view period_window ( base_year , final_window ) as values ( cast 2022 as int ) , ( cast ‘10 years’ as interval ) ;
  21. Simpler constant tables Fix: Use a view. No select or

    base table. No metadata column. Just values. Cast ensures typing. create view period_window ( base_year , final_window ) as values ( cast 2022 as int ) , ( cast ‘10 years’ as interval ) ;
  22. Define time units Start with units of time. Convienent names:

    create type period_t as enum ( ‘annual’ , ‘quarterly’ , ‘monthly’ , ‘weekly’ , ‘daily’ ) ;
  23. Define time units Start with units of time. Map name

    to intervals: create type period_t as enum ( ‘annual’ , ‘quarterly’ , ‘monthly’ , ‘weekly’ , ‘daily’ ) ; create table period_interval ( period_type period_t primary key , period_size interval not null , unique ( period ) );
  24. Loading initial data Not much data: Integer + window. Few

    basic types. create view period_window ( base_year , final_window ) as values ( cast 2018 as int ) , ( cast ‘10 years’ as interval ) ; insert into period_interval ( period_size , period_type ) values ( '1 year' , ‘annual’ ) , ( '3 months' , ‘quarterly’ ) , ( '1 month' , ‘monthly’ ) , ( ‘7 days’ , ‘weekly’ ) , ( '1 day' , ‘daily’ ) ;
  25. Generate a calendar “generated...” Replace sequence type. Allows cloning: no

    external sequence. create table period ( period_sk integer generated by default as identity primary key , period_type period_t references period_interval , period daterange not null , unique ( period ) );
  26. Generate a calendar Re-use enum via type. create table period

    ( period_sk integer generated by default as identity primary key , period_type period_t references period_interval , period daterange not null , unique ( period ) );
  27. Generate a calendar Periods are ranges. Periods are unique. create

    table period ( period_sk integer generated by default as identity primary key , period_type period_t references period_interval , period daterange not null , unique ( period ) );
  28. Generate a calendar Next step: Populate it. With SQL. NO

    imperiatives. create table period ( period_sk integer generated by default as identity primary key , period_type period_t references period_interval , period daterange not null , unique ( period ) );
  29. First step: Start date. We have an integer: base_year. Result:

    Starting timestamp. Window to final date. select make_timestamp ( base_year, 1, 1, 0, 0, 0 ) "base_date" , final_window from period_window Generate a calendar
  30. “Common Table Expression” “base_date” is a pseudo-table. Defined for one

    query. with base_date as ( select make_timestamp ( base_year, 1, 1, 0, 0, 0 ) "base_date" , final_window from period_window ) Generate a calendar
  31. “cross join”: Combine: start+window period size + type. with base_date

    as ... ( select period_type , period , generate_series ( base_date , base_date + final_window , period_size ) "start" from base_date cross join period_interval ) z Generate a calendar
  32. “cross join”: This produces six rows. Cross joins are not

    evil. with base_date as ... ( select period_type , period , generate_series ( base_date , base_date + final_window , period_size ) "start" from base_date cross join period_interval ) z Generate a calendar
  33. generate_series: Multiple rows. From base to final. Units of period

    interval. Generate a calendar with base_date as ... from ( select period_type , period , generate_series ( base_date , base_date + final_window , period_size ) "start" from base_date cross join period_interval ) z
  34. ‘annual’ , interval ‘1 year’ , ‘01-Jan-2020’ ... ‘annual’ ,

    interval ‘1 year’ , ‘01-Jan-2021’ with base_date as ... from ( select period_type , period , generate_series ( base_date , base_date + final_window , period_size ) "start" from base_date cross join period_interval ) z Generate a calendar
  35. ‘quarterly’ , interval ‘3 months’ , ‘01-Jan-2020’ ... ‘annual’ ,

    interval ‘3 months’ , ‘01-Apr-2020’ with base_date as ... from ( select period_type , period , generate_series ( base_date , base_date + final_window , period_size ) "start" from base_date cross join period_interval ) z Generate a calendar
  36. Syntax: Sub-queries get names. with base_date as ... from (

    select period_type , period , generate_series ( base_date , base_date + final_window , period_size ) "start" from base_date cross join period_interval ) z Generate a calendar
  37. Sequence of values: “daterange” composite: lower upper with base_date as

    ... select period_type , daterange ( start::date , ( start + period )::date , '[)' ) from ( select period_type , period_size , generate_series ... “start” ... )z Generate a calendar
  38. Result: Periods by width. period_sk | period_type | period ----------+-------------+-------------------------

    1 | annual | [2018-01-01,2019-01-01) 2 | annual | [2019-01-01,2020-01-01) 3 | annual | [2020-01-01,2021-01-01) 4 | annual | [2021-01-01,2022-01-01) 5 | annual | [2022-01-01,2023-01-01) 6 | annual | [2023-01-01,2024-01-01) 7 | annual | [2024-01-01,2025-01-01) 8 | annual | [2025-01-01,2026-01-01) 9 | annual | [2026-01-01,2027-01-01) 10 | annual | [2027-01-01,2028-01-01) 11 | annual | [2028-01-01,2029-01-01) 12 | quarterly | [2018-01-01,2018-04-01) 13 | quarterly | [2018-04-01,2018-07-01) 14 | quarterly | [2018-07-01,2018-10-01) 15 | quarterly | [2018-10-01,2019-01-01) 16 | quarterly | [2019-01-01,2019-04-01) 17 | quarterly | [2019-04-01,2019-07-01) 18 | quarterly | [2019-07-01,2019-10-01) Generate a calendar
  39. Result: ‘[ … , … )’ Non-inclusive upper. Ranges don’t

    intersect. period_sk | period_type | period ----------+-------------+------------------------- 1 | annual | [2018-01-01,2019-01-01) 2 | annual | [2019-01-01,2020-01-01) 3 | annual | [2020-01-01,2021-01-01) 4 | annual | [2021-01-01,2022-01-01) 5 | annual | [2022-01-01,2023-01-01) 6 | annual | [2023-01-01,2024-01-01) 7 | annual | [2024-01-01,2025-01-01) 8 | annual | [2025-01-01,2026-01-01) 9 | annual | [2026-01-01,2027-01-01) 10 | annual | [2027-01-01,2028-01-01) 11 | annual | [2028-01-01,2029-01-01) 12 | quarterly | [2018-01-01,2018-04-01) 13 | quarterly | [2018-04-01,2018-07-01) 14 | quarterly | [2018-07-01,2018-10-01) 15 | quarterly | [2018-10-01,2019-01-01) 16 | quarterly | [2019-01-01,2019-04-01) 17 | quarterly | [2019-04-01,2019-07-01) 18 | quarterly | [2019-07-01,2019-10-01) Generate a calendar
  40. Result: ‘[ … , … )’ “2019-01-01” matches #2 &

    #16 on [2019-01-01, … ) period_sk | period_type | period ----------+-------------+------------------------- 1 | annual | [2018-01-01,2019-01-01) 2 | annual | [2019-01-01,2020-01-01) 3 | annual | [2020-01-01,2021-01-01) 4 | annual | [2021-01-01,2022-01-01) 5 | annual | [2022-01-01,2023-01-01) 6 | annual | [2023-01-01,2024-01-01) 7 | annual | [2024-01-01,2025-01-01) 8 | annual | [2025-01-01,2026-01-01) 9 | annual | [2026-01-01,2027-01-01) 10 | annual | [2027-01-01,2028-01-01) 11 | annual | [2028-01-01,2029-01-01) 12 | quarterly | [2018-01-01,2018-04-01) 13 | quarterly | [2018-04-01,2018-07-01) 14 | quarterly | [2018-07-01,2018-10-01) 15 | quarterly | [2018-10-01,2019-01-01) 16 | quarterly | [2019-01-01,2019-04-01) 17 | quarterly | [2019-04-01,2019-07-01) 18 | quarterly | [2019-07-01,2019-10-01) Generate a calendar
  41. Selecting from the calendar “Range contains” ‘@>’ select period_sk ,

    period_type , period from period where period @> '15-Mar-2021'::date order by period_type , period ;
  42. Selecting from the calendar All period types. No match on:

    ‘15-Mar-2021’ ) select period_sk , period_type , period from period where period @> '15-Mar-2021'::date order by period_type , period ; period_sk | period_type | period -----------+-------------+------------------------- 4 | annual | [2021-01-01,2022-01-01) 24 | quarterly | [2021-01-01,2021-04-01) 91 | monthly | [2021-03-01,2021-04-01) 341 | weekly | [2021-03-15,2021-03-22) 1865 | single | [2021-03-15,2021-03-16)
  43. Selecting from the calendar Intersecting: ‘&&’ select period_sk, period_type, period

    from curr_report_period where period && daterange( '15-Mar-2021', '20-Aug-2022' ) and period_type = 'quarterly' order by period_type , period ; period_sk | period_type | period -----------+-------------+------------------------- 24 | quarterly | [2021-01-01,2021-04-01) 25 | quarterly | [2021-04-01,2021-07-01) 26 | quarterly | [2021-07-01,2021-10-01) 27 | quarterly | [2021-10-01,2022-01-01) 28 | quarterly | [2022-01-01,2022-04-01) 29 | quarterly | [2022-04-01,2022-07-01) 30 | quarterly | [2022-07-01,2022-10-01)
  44. Indexing the calendar GiST indexes describe arbitrary trees. Originally designed

    for full-text search. Search where ‘=’ doesn’t apply.
  45. Indexing the calendar GiST indexes Good for ranges and areas.

    create index on period using gist ( period ) include ( period_sk ) where active ; create index on period using gist ( period , period_type ) include ( period_sk ) where active
  46. Indexing the calendar GiST indexes Adds un-indexed data to the

    index leaf nodes. Query on period gets SK. create index on period using gist ( period ) include ( period_sk ) where active ; create index on period using gist ( period , period_type ) include ( period_sk ) where active
  47. Using the index Find partitions with quarterly reports including these

    dates. explain select period_sk from period where period && daterange( '15-Mar-2021', '20-Aug-2022' ) and period_type = 'quarterly' ;
  48. Using the index Find partitions with quarterly reports including these

    dates. Partition of data can use period_sk. explain select period_sk from period where period && daterange( '15-Mar-2021', '20-Aug-2022' ) and period_type = 'quarterly' ;
  49. Plan without index QUERY PLAN ------------------------------------------------------------------------------------------- Bitmap Heap Scan on

    period (cost=15.87..80.54 rows=1 width=4) Recheck Cond: active Filter: ((period && '[2021-03-15,2022-08-20)'::daterange) AND (period_type = 'quarterly'::period_t)) -> Bitmap Index Scan on period_period_sk_idx (cost=0.00..15.87 rows=1578 width=0) (4 rows)
  50. Plan with index QUERY PLAN ------------------------------------------------------------------------------------------- Bitmap Heap Scan on

    period (cost=15.87..80.54 rows=1 width=4) Recheck Cond: active Filter: ((period && '[2021-03-15,2022-08-20)'::daterange) AND (period_type = 'quarterly'::period_t)) -> Bitmap Index Scan on period_period_sk_idx (cost=0.00..15.87 rows=1578 width=0) (4 rows) QUERY PLAN ------------------------------------------------------------------------------------------- Index Only Scan using period_period_period_type_period_sk_idx on period (cost=0.15..2.37 rows=1 width=4) Index Cond: ((period && '[2021-03-15,2022-08-20)'::daterange) AND (period_type = 'quarterly'::period_t)) (2 rows)
  51. Plan with index QUERY PLAN ------------------------------------------------------------------------------------------- Bitmap Heap Scan on

    period (cost=15.87..80.54 rows=1 width=4) Recheck Cond: active Filter: ((period && '[2021-03-15,2022-08-20)'::daterange) AND (period_type = 'quarterly'::period_t)) -> Bitmap Index Scan on period_period_sk_idx (cost=0.00..15.87 rows=1578 width=0) (4 rows) QUERY PLAN ------------------------------------------------------------------------------------------- Index Only Scan using period_period_period_type_period_sk_idx on period (cost=0.15..2.37 rows=1 width=4) Index Cond: ((period && '[2021-03-15,2022-08-20)'::daterange) AND (period_type = 'quarterly'::period_t)) (2 rows)
  52. Plan with index QUERY PLAN ------------------------------------------------------------------------------------------- Bitmap Heap Scan on

    period (cost=15.87..80.54 rows=1 width=4) Recheck Cond: active Filter: ((period && '[2021-03-15,2022-08-20)'::daterange) AND (period_type = 'quarterly'::period_t)) -> Bitmap Index Scan on period_period_sk_idx (cost=0.00..15.87 rows=1578 width=0) (4 rows) QUERY PLAN ------------------------------------------------------------------------------------------- Index Only Scan using period_period_period_type_period_sk_idx on period (cost=0.15..2.37 rows=1 width=4) Index Cond: ((period && '[2021-03-15,2022-08-20)'::daterange) AND (period_type = 'quarterly'::period_t)) (2 rows)
  53. Plan with index QUERY PLAN ------------------------------------------------------------------------------------------- Index Only Scan using

    period_period_period_type_period_sk_idx on period (cost=0.15..2.37 rows=1 width=4) Index Cond: ((period && '[2021-03-15,2022-08-20)'::daterange) AND (period_type = 'quarterly'::period_t)) (2 rows) Works in PG: Data types & extensibility. Range type & GiST index.
  54. Summary Types for calendar: interval daterange & tsrange & tstzrange

    Series of values: generate_series( start, end, step );
  55. Summary SQL: Common Table Expression (“CTE”). Inlined in Postgres –

    fast. Name subqueries. Cross-join == Cartesian Product.
  56. Bedside reading https://www.postgresql.org/docs/13/... textsearch-indexes.html GIN & GiST rangetypes.html Data types

    datatype-datetime.html Date & Time spec’s functions-datetime.html Date op’s & func’s. functions-range.html Range op’s & func’s.
  57. More bedside reading Excellent book on how to think in

    SQL: Thinking In Sets: Auxilary, Temporal and Vitual Tables in SQL Joe Celko, 2008 Anything he has written is excellent reading. Celko approaches SQL from a practical point of view.
  58. Even More Bedside Reading Excellent look at writing correct SQL:

    SQL and Relational Theory, 3rd Edition C.J. Date, 2015 Date, the original author of SQL and the original advocate for relational databases in general, is a brilliant mathematician. He writes like one, but this is an excellent book for anyone who authors SQL.