Slide 1

Slide 1 text

Using PostgreSQL Ranges & Intervals: Data structures & SQL for calendars. Steven Lembark Workhorse Computing [email protected]

Slide 2

Slide 2 text

Calendar Table Periods by type: Annual, Quarterly... Effective window by start/end date. Hierarchy of periods: Quarters within Years. Pseudo-keys for each period.

Slide 3

Slide 3 text

Calendar Table Uses: Partition keys for temporal tables: Surrogate keys for calendar dates. Calculate period intersections: Week-ending dates for this year’s months? Capacity planning: What weeks will cover end-of-quarter reporting? What weekdys cover month-ending data?

Slide 4

Slide 4 text

Defining a calendar. Start with two relations: Start Date + Calendar Size Period Names + Sizes Fiscal calendar has arbitrary start. Time constraint require a 01-Jan start here.

Slide 5

Slide 5 text

Defining a calendar. Start with two relations: ( start, window ) calendar size ( type, size ) period elements

Slide 6

Slide 6 text

Defining a calendar. Start with two relations: ( date, interval ) PG data types ( enum, interval )

Slide 7

Slide 7 text

Defining a date “date” type. Stringy definitions: ’01-Feb-2021’ ’2021.02.01’ Has no time. Conversion to timestamp with ‘00:00:00’

Slide 8

Slide 8 text

Handling enums One way: table foo ( date_type enum ( ‘year’,’month’ ) … ); table bar ( date_type enum ( ‘year’,’month’ ) … ); Embed enum def’s in each table.

Slide 9

Slide 9 text

Handling enums One way: table foo ( date_type enum ( ‘year’,’month’ ) … ); table bar ( date_type enum ( ‘year’,’month’ ) … ); Repeated declaration: table foo ( date_type enum ( ‘year’,’month’ ) … ); table bar ( date_type enum ( ‘year’,’mnoth’ ) … );

Slide 10

Slide 10 text

Handling enums One way: embedded. table foo ( date_type enum ( ‘year’,’month’ ) … ); table bar ( date_type enum ( ‘year’,’month’ ) … ); Oops... table foo ( date_type enum ( ‘year’,’month’ ) … ); table bar ( date_type enum ( ‘year’,’mnoth’ ) … );

Slide 11

Slide 11 text

Handling enums Better way: create type date_t enum ( ‘year’, ‘month’ ); table foo ( date_type date_t, … ); table bar ( date_type date_t, … );

Slide 12

Slide 12 text

Handling enums Updates made once: create type date_t enum ( ‘year’, ‘month’, ‘quarter’ ); table foo ( date_type date_t, … ); table bar ( date_type date_t, … );

Slide 13

Slide 13 text

Intervals in time Intervals define periods between dates and times. Offset timestamp, date, time values.

Slide 14

Slide 14 text

Intervals in time YEAR MONTH DAY HOUR MINUTE SECOND YEAR TO MONTH DAY TO HOUR DAY TO MINUTE DAY TO SECOND HOUR TO MINUTE HOUR TO SECOND MINUTE TO SECOND

Slide 15

Slide 15 text

Intervals in time with rounding YEAR MONTH DAY HOUR MINUTE SECOND YEAR TO MONTH DAY TO HOUR DAY TO MINUTE DAY TO SECOND HOUR TO MINUTE HOUR TO SECOND MINUTE TO SECOND

Slide 16

Slide 16 text

Interval input interval ‘1 year’ interval ‘2 years’ interval ‘2 years 1 month 3 days 5 hours 6 seconds’ PG handles leap year for you: ‘31-Jan-2000’::date + interval ‘29 days’

Slide 17

Slide 17 text

Definine the period window The definition of start date + window. Should have only one row. Minimal definition of start time. Minimal definition of end time.

Slide 18

Slide 18 text

Definine the period window create table period_window ( one_row integer not null primary key default 0 check( one_row = 0 ) , base_year integer not null check ( base_year > 2000 and base_year <= date_part('year', now() ) ) , final_window interval year not null check ( final_window > interval '0 year' and final_window < interval '20 year' ) ); One row: Single definition.

Slide 19

Slide 19 text

Definine the period window create table period_window ( one_row integer not null primary key default 0 check( one_row = 0 ) , base_year integer not null check ( base_year > 2000 and base_year <= date_part('year', now() ) ) , final_window interval year not null check ( final_window > interval '0 year' and final_window < interval '20 year' ) ); One row: Single definition.

Slide 20

Slide 20 text

Definine the period window create table period_window ( one_row integer not null primary key default 0 check( one_row = 0 ) , base_year integer not null check ( base_year > 2000 and base_year <= date_part('year', now() ) ) , final_window interval year not null check ( final_window > interval '0 year' and final_window < interval '20 year' ) ); One row: First year: 2000 to current. Fiscal calendar uses date.

Slide 21

Slide 21 text

Definine the period window create table period_window ( one_row integer not null primary key default 0 check( one_row = 0 ) , base_year integer not null check ( base_year > 2000 and base_year <= date_part('year', now() ) ) , final_window interval year not null check ( final_window > interval '0 year' and final_window < interval '20 year' ) ); One row: First year: Interval: Defined in years.

Slide 22

Slide 22 text

Simpler constant tables one_row: period_window can’t have two rows.

Slide 23

Slide 23 text

Simpler constant tables one_row: period_window can’t have two rows. It can still be empty! Check constraint doesn’t prevent deletion.

Slide 24

Slide 24 text

Simpler constant tables Fix: Use a view. No select or base table. create view period_window ( base_year , final_window ) as values ( cast 2022 as int ) , ( cast ‘10 years’ as interval ) ;

Slide 25

Slide 25 text

Simpler constant tables Fix: Use a view. No select or base table. No metadata column. Just values. create view period_window ( base_year , final_window ) as values ( cast 2022 as int ) , ( cast ‘10 years’ as interval ) ;

Slide 26

Slide 26 text

Simpler constant tables Fix: Use a view. No select or base table. No metadata column. Just values. Cast ensures typing. create view period_window ( base_year , final_window ) as values ( cast 2022 as int ) , ( cast ‘10 years’ as interval ) ;

Slide 27

Slide 27 text

Define time units Start with units of time. Convienent names: create type period_t as enum ( ‘annual’ , ‘quarterly’ , ‘monthly’ , ‘weekly’ , ‘daily’ ) ;

Slide 28

Slide 28 text

Define time units Start with units of time. Map name to intervals: create type period_t as enum ( ‘annual’ , ‘quarterly’ , ‘monthly’ , ‘weekly’ , ‘daily’ ) ; create table period_interval ( period_type period_t primary key , period_size interval not null , unique ( period ) );

Slide 29

Slide 29 text

Loading initial data Not much data: Integer + window. Few basic types. create view period_window ( base_year , final_window ) as values ( cast 2018 as int ) , ( cast ‘10 years’ as interval ) ; insert into period_interval ( period_size , period_type ) values ( '1 year' , ‘annual’ ) , ( '3 months' , ‘quarterly’ ) , ( '1 month' , ‘monthly’ ) , ( ‘7 days’ , ‘weekly’ ) , ( '1 day' , ‘daily’ ) ;

Slide 30

Slide 30 text

Generate a calendar “generated...” Replace sequence type. Allows cloning: no external sequence. create table period ( period_sk integer generated by default as identity primary key , period_type period_t references period_interval , period daterange not null , unique ( period ) );

Slide 31

Slide 31 text

Generate a calendar Re-use enum via type. create table period ( period_sk integer generated by default as identity primary key , period_type period_t references period_interval , period daterange not null , unique ( period ) );

Slide 32

Slide 32 text

Generate a calendar Periods are ranges. Periods are unique. create table period ( period_sk integer generated by default as identity primary key , period_type period_t references period_interval , period daterange not null , unique ( period ) );

Slide 33

Slide 33 text

Generate a calendar Next step: Populate it. With SQL. NO imperiatives. create table period ( period_sk integer generated by default as identity primary key , period_type period_t references period_interval , period daterange not null , unique ( period ) );

Slide 34

Slide 34 text

First step: Start date. We have an integer: base_year. Result: Starting timestamp. Window to final date. select make_timestamp ( base_year, 1, 1, 0, 0, 0 ) "base_date" , final_window from period_window Generate a calendar

Slide 35

Slide 35 text

“Common Table Expression” “base_date” is a pseudo-table. Defined for one query. with base_date as ( select make_timestamp ( base_year, 1, 1, 0, 0, 0 ) "base_date" , final_window from period_window ) Generate a calendar

Slide 36

Slide 36 text

“cross join”: Combine: start+window period size + type. with base_date as ... ( select period_type , period , generate_series ( base_date , base_date + final_window , period_size ) "start" from base_date cross join period_interval ) z Generate a calendar

Slide 37

Slide 37 text

“cross join”: This produces six rows. Cross joins are not evil. with base_date as ... ( select period_type , period , generate_series ( base_date , base_date + final_window , period_size ) "start" from base_date cross join period_interval ) z Generate a calendar

Slide 38

Slide 38 text

generate_series: Multiple rows. From base to final. Units of period interval. Generate a calendar with base_date as ... from ( select period_type , period , generate_series ( base_date , base_date + final_window , period_size ) "start" from base_date cross join period_interval ) z

Slide 39

Slide 39 text

‘annual’ , interval ‘1 year’ , ‘01-Jan-2020’ ... ‘annual’ , interval ‘1 year’ , ‘01-Jan-2021’ with base_date as ... from ( select period_type , period , generate_series ( base_date , base_date + final_window , period_size ) "start" from base_date cross join period_interval ) z Generate a calendar

Slide 40

Slide 40 text

‘quarterly’ , interval ‘3 months’ , ‘01-Jan-2020’ ... ‘annual’ , interval ‘3 months’ , ‘01-Apr-2020’ with base_date as ... from ( select period_type , period , generate_series ( base_date , base_date + final_window , period_size ) "start" from base_date cross join period_interval ) z Generate a calendar

Slide 41

Slide 41 text

Syntax: Sub-queries get names. with base_date as ... from ( select period_type , period , generate_series ( base_date , base_date + final_window , period_size ) "start" from base_date cross join period_interval ) z Generate a calendar

Slide 42

Slide 42 text

Sequence of values: “daterange” composite: lower upper with base_date as ... select period_type , daterange ( start::date , ( start + period )::date , '[)' ) from ( select period_type , period_size , generate_series ... “start” ... )z Generate a calendar

Slide 43

Slide 43 text

Result: Periods by width. period_sk | period_type | period ----------+-------------+------------------------- 1 | annual | [2018-01-01,2019-01-01) 2 | annual | [2019-01-01,2020-01-01) 3 | annual | [2020-01-01,2021-01-01) 4 | annual | [2021-01-01,2022-01-01) 5 | annual | [2022-01-01,2023-01-01) 6 | annual | [2023-01-01,2024-01-01) 7 | annual | [2024-01-01,2025-01-01) 8 | annual | [2025-01-01,2026-01-01) 9 | annual | [2026-01-01,2027-01-01) 10 | annual | [2027-01-01,2028-01-01) 11 | annual | [2028-01-01,2029-01-01) 12 | quarterly | [2018-01-01,2018-04-01) 13 | quarterly | [2018-04-01,2018-07-01) 14 | quarterly | [2018-07-01,2018-10-01) 15 | quarterly | [2018-10-01,2019-01-01) 16 | quarterly | [2019-01-01,2019-04-01) 17 | quarterly | [2019-04-01,2019-07-01) 18 | quarterly | [2019-07-01,2019-10-01) Generate a calendar

Slide 44

Slide 44 text

Result: ‘[ … , … )’ Non-inclusive upper. Ranges don’t intersect. period_sk | period_type | period ----------+-------------+------------------------- 1 | annual | [2018-01-01,2019-01-01) 2 | annual | [2019-01-01,2020-01-01) 3 | annual | [2020-01-01,2021-01-01) 4 | annual | [2021-01-01,2022-01-01) 5 | annual | [2022-01-01,2023-01-01) 6 | annual | [2023-01-01,2024-01-01) 7 | annual | [2024-01-01,2025-01-01) 8 | annual | [2025-01-01,2026-01-01) 9 | annual | [2026-01-01,2027-01-01) 10 | annual | [2027-01-01,2028-01-01) 11 | annual | [2028-01-01,2029-01-01) 12 | quarterly | [2018-01-01,2018-04-01) 13 | quarterly | [2018-04-01,2018-07-01) 14 | quarterly | [2018-07-01,2018-10-01) 15 | quarterly | [2018-10-01,2019-01-01) 16 | quarterly | [2019-01-01,2019-04-01) 17 | quarterly | [2019-04-01,2019-07-01) 18 | quarterly | [2019-07-01,2019-10-01) Generate a calendar

Slide 45

Slide 45 text

Result: ‘[ … , … )’ “2019-01-01” matches #2 & #16 on [2019-01-01, … ) period_sk | period_type | period ----------+-------------+------------------------- 1 | annual | [2018-01-01,2019-01-01) 2 | annual | [2019-01-01,2020-01-01) 3 | annual | [2020-01-01,2021-01-01) 4 | annual | [2021-01-01,2022-01-01) 5 | annual | [2022-01-01,2023-01-01) 6 | annual | [2023-01-01,2024-01-01) 7 | annual | [2024-01-01,2025-01-01) 8 | annual | [2025-01-01,2026-01-01) 9 | annual | [2026-01-01,2027-01-01) 10 | annual | [2027-01-01,2028-01-01) 11 | annual | [2028-01-01,2029-01-01) 12 | quarterly | [2018-01-01,2018-04-01) 13 | quarterly | [2018-04-01,2018-07-01) 14 | quarterly | [2018-07-01,2018-10-01) 15 | quarterly | [2018-10-01,2019-01-01) 16 | quarterly | [2019-01-01,2019-04-01) 17 | quarterly | [2019-04-01,2019-07-01) 18 | quarterly | [2019-07-01,2019-10-01) Generate a calendar

Slide 46

Slide 46 text

Selecting from the calendar “Range contains” ‘@>’ select period_sk , period_type , period from period where period @> '15-Mar-2021'::date order by period_type , period ;

Slide 47

Slide 47 text

Selecting from the calendar All period types. No match on: ‘15-Mar-2021’ ) select period_sk , period_type , period from period where period @> '15-Mar-2021'::date order by period_type , period ; period_sk | period_type | period -----------+-------------+------------------------- 4 | annual | [2021-01-01,2022-01-01) 24 | quarterly | [2021-01-01,2021-04-01) 91 | monthly | [2021-03-01,2021-04-01) 341 | weekly | [2021-03-15,2021-03-22) 1865 | single | [2021-03-15,2021-03-16)

Slide 48

Slide 48 text

Selecting from the calendar Intersecting: ‘&&’ select period_sk, period_type, period from curr_report_period where period && daterange( '15-Mar-2021', '20-Aug-2022' ) and period_type = 'quarterly' order by period_type , period ; period_sk | period_type | period -----------+-------------+------------------------- 24 | quarterly | [2021-01-01,2021-04-01) 25 | quarterly | [2021-04-01,2021-07-01) 26 | quarterly | [2021-07-01,2021-10-01) 27 | quarterly | [2021-10-01,2022-01-01) 28 | quarterly | [2022-01-01,2022-04-01) 29 | quarterly | [2022-04-01,2022-07-01) 30 | quarterly | [2022-07-01,2022-10-01)

Slide 49

Slide 49 text

Indexing the calendar GiST indexes describe arbitrary trees. Originally designed for full-text search. Search where ‘=’ doesn’t apply.

Slide 50

Slide 50 text

Indexing the calendar GiST indexes Good for ranges and areas. create index on period using gist ( period ) include ( period_sk ) where active ; create index on period using gist ( period , period_type ) include ( period_sk ) where active

Slide 51

Slide 51 text

Indexing the calendar GiST indexes Adds un-indexed data to the index leaf nodes. Query on period gets SK. create index on period using gist ( period ) include ( period_sk ) where active ; create index on period using gist ( period , period_type ) include ( period_sk ) where active

Slide 52

Slide 52 text

Using the index Find partitions with quarterly reports including these dates. explain select period_sk from period where period && daterange( '15-Mar-2021', '20-Aug-2022' ) and period_type = 'quarterly' ;

Slide 53

Slide 53 text

Using the index Find partitions with quarterly reports including these dates. Partition of data can use period_sk. explain select period_sk from period where period && daterange( '15-Mar-2021', '20-Aug-2022' ) and period_type = 'quarterly' ;

Slide 54

Slide 54 text

Plan without index QUERY PLAN ------------------------------------------------------------------------------------------- Bitmap Heap Scan on period (cost=15.87..80.54 rows=1 width=4) Recheck Cond: active Filter: ((period && '[2021-03-15,2022-08-20)'::daterange) AND (period_type = 'quarterly'::period_t)) -> Bitmap Index Scan on period_period_sk_idx (cost=0.00..15.87 rows=1578 width=0) (4 rows)

Slide 55

Slide 55 text

Plan with index QUERY PLAN ------------------------------------------------------------------------------------------- Bitmap Heap Scan on period (cost=15.87..80.54 rows=1 width=4) Recheck Cond: active Filter: ((period && '[2021-03-15,2022-08-20)'::daterange) AND (period_type = 'quarterly'::period_t)) -> Bitmap Index Scan on period_period_sk_idx (cost=0.00..15.87 rows=1578 width=0) (4 rows) QUERY PLAN ------------------------------------------------------------------------------------------- Index Only Scan using period_period_period_type_period_sk_idx on period (cost=0.15..2.37 rows=1 width=4) Index Cond: ((period && '[2021-03-15,2022-08-20)'::daterange) AND (period_type = 'quarterly'::period_t)) (2 rows)

Slide 56

Slide 56 text

Plan with index QUERY PLAN ------------------------------------------------------------------------------------------- Bitmap Heap Scan on period (cost=15.87..80.54 rows=1 width=4) Recheck Cond: active Filter: ((period && '[2021-03-15,2022-08-20)'::daterange) AND (period_type = 'quarterly'::period_t)) -> Bitmap Index Scan on period_period_sk_idx (cost=0.00..15.87 rows=1578 width=0) (4 rows) QUERY PLAN ------------------------------------------------------------------------------------------- Index Only Scan using period_period_period_type_period_sk_idx on period (cost=0.15..2.37 rows=1 width=4) Index Cond: ((period && '[2021-03-15,2022-08-20)'::daterange) AND (period_type = 'quarterly'::period_t)) (2 rows)

Slide 57

Slide 57 text

Plan with index QUERY PLAN ------------------------------------------------------------------------------------------- Bitmap Heap Scan on period (cost=15.87..80.54 rows=1 width=4) Recheck Cond: active Filter: ((period && '[2021-03-15,2022-08-20)'::daterange) AND (period_type = 'quarterly'::period_t)) -> Bitmap Index Scan on period_period_sk_idx (cost=0.00..15.87 rows=1578 width=0) (4 rows) QUERY PLAN ------------------------------------------------------------------------------------------- Index Only Scan using period_period_period_type_period_sk_idx on period (cost=0.15..2.37 rows=1 width=4) Index Cond: ((period && '[2021-03-15,2022-08-20)'::daterange) AND (period_type = 'quarterly'::period_t)) (2 rows)

Slide 58

Slide 58 text

Plan with index QUERY PLAN ------------------------------------------------------------------------------------------- Index Only Scan using period_period_period_type_period_sk_idx on period (cost=0.15..2.37 rows=1 width=4) Index Cond: ((period && '[2021-03-15,2022-08-20)'::daterange) AND (period_type = 'quarterly'::period_t)) (2 rows) Works in PG: Data types & extensibility. Range type & GiST index.

Slide 59

Slide 59 text

Summary Tables for window, intervals: Declarative definition of data. No iteration, no loops. Just sets & SQL.

Slide 60

Slide 60 text

Summary Types for calendar: interval daterange & tsrange & tstzrange Series of values: generate_series( start, end, step );

Slide 61

Slide 61 text

Summary SQL: Common Table Expression (“CTE”). Inlined in Postgres – fast. Name subqueries. Cross-join == Cartesian Product.

Slide 62

Slide 62 text

Bedside reading https://www.postgresql.org/docs/13/... textsearch-indexes.html GIN & GiST rangetypes.html Data types datatype-datetime.html Date & Time spec’s functions-datetime.html Date op’s & func’s. functions-range.html Range op’s & func’s.

Slide 63

Slide 63 text

More bedside reading Excellent book on how to think in SQL: Thinking In Sets: Auxilary, Temporal and Vitual Tables in SQL Joe Celko, 2008 Anything he has written is excellent reading. Celko approaches SQL from a practical point of view.

Slide 64

Slide 64 text

Even More Bedside Reading Excellent look at writing correct SQL: SQL and Relational Theory, 3rd Edition C.J. Date, 2015 Date, the original author of SQL and the original advocate for relational databases in general, is a brilliant mathematician. He writes like one, but this is an excellent book for anyone who authors SQL.