10 in 10: Ten new things in Postgres 10

Slide 1

Slide 1 text

10 in 10 Basil Bourque [email protected] LinkedIn: basilbourque Ten new things in Postgres 10 1 2017-10-04 10

Slide 2

Slide 2 text

baker’s dozen • replacement for SERIAL • traceable COMMIT • parallel query • cross-column statistics • new collation (sorting) • more XML 2 • logical replication • simplifying timestamps • renaming & renumbering • security tweaks

Slide 3

Slide 3 text

caveat • I am no expert on these topics • Struck my fancy • I dove in, studied, played a bit • Take what I say with grain of salt: See bibliographies for straight scoop • Final shipping version of Postgres may vary 3

Slide 4

Slide 4 text

primary key 4 natural key (from your data) surrogate key (extra column, artiﬁcial value) sequential numbers (1, 2, 3, …) UUID (128-bit value) SERIAL

Slide 5

Slide 5 text

SMALLSERIAL | SERIAL | BIGSERIAL 5 CREATE TABLE tbl ( col SERIAL PRIMARY KEY, col2 VARCHAR( 80 ) , ) ; INSERT INTO tbl ( col2 ) VALUES ('Basil') RETURNING col ; • See doc • Deﬁnes a SEQUENCE • Marks col NOT NULL • Default col to sequence • 16-bit, 32-bit, 64-bit Get back your number

Slide 6

Slide 6 text

non-standard SERIAL • Not standard • Postgres-speciﬁc keyword 6

Slide 7

Slide 7 text

SERIAL pseudo-type • type erasure 7 CREATE TABLE tbl ( col SERIAL ) ; CREATE SEQUENCE tbl_col_seq ; CREATE TABLE tbl ( col INT NOT NULL DEFAULT nextval( 'tbl_col_seq' ) , col2 VARCHAR(20) ) ; ALTER SEQUENCE tbl_col_seq OWNED BY tbl.col ;

Slide 8

Slide 8 text

sequence permission separated • CREATE USER & GRANT INSERT • INSERT INTO tbl ( col2 ) VALUES ( 'whatever' ) ;  ERROR: permission denied for sequence tbl_col_seq • solution:  GRANT USAGE ON SEQUENCE tbl_col_seq ; 8 name ?

Slide 9

Slide 9 text

sequence name needed for mgmt • That mysterious name needed for managing value • ALTER SEQUENCE tbl_col_seq RESTART WITH 1000 ; 9 name ?

Slide 10

Slide 10 text

dropping SERIAL is tricky • Determine name of sequence • Remember to add CASCADE to drop command to clear DEFAULT • DROP SEQUENCE tbl_col_seq CASCADE ; • Beware:  ALTER TABLE tbl ALTER COLUMN col DROP DEFAULT ;  … leaves sequence in place  10 ☹ ☻

Slide 11

Slide 11 text

CREATE TABLE / LIKE incorrect • CREATE TABLE tbl2 ( LIKE tbl INCLUDING ALL ) ; • Good: Copies the DEFAULT with SEQUENCE • Bad: Uses the same sequence for both tables (!) • Drop ﬁrst table fails – “other objects depend on it” • DROP TABLE tbl ; -- ERROR • Inherited table is similar  DEFAULT is inherited but the sequence ownership is not 11

Slide 12

Slide 12 text

replacement for SERIAL • Identity column – GENERATED … AS IDENTITY ( … ) • creates implicit sequence, hidden from us, with optional override • GENERATED ALWAYS AS IDENTITY   GENERATED BY DEFAULT AS IDENTITY • ALWAYS ignores user-provided value  unless INSERT … OVERRIDING SYSTEM VALUE 12 CREATE TABLE tbl ( col BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY , col2 text ) ;

Slide 13

Slide 13 text

benefits of GENERATED…AS IDENTITY • No need to manage DEFAULT  ALTER TABLE tbl ALTER COLUMN col DROP DEFAULT;  ERROR: column "col" of relation "tbl" is an identity column  HINT: Use ALTER TABLE ... ALTER COLUMN ... DROP IDENTITY instead. • CREATE TABLE … LIKE gets its own identity sequence ( if identity included ) • INCLUDING IDENTITY | EXCLUDING IDENTITY 13 • Standard SQL. Feature # T174 & F386 & T178 ( first SQL: 2003, clarified in SQL:2008 ) • 'generated' quality is remembered ( no ‘type erasure’ ) • No permissions issue, as sequence implicit/internal • No name involved – simply alter the column  ALTER TABLE tbl   ALTER COLUMN col   RESTART WITH 1000 ; • Simple DROP TABLE drops sequence

Slide 14

Slide 14 text

GENERATED … AS IDENTITY behavior • INSERT INTO command remains the same – no change to existing code • One per table, maximum • Type of sequence matches column implicitly ( 16-, 32-, 64-bit integer ) • Manual adjustments • Adding identity column does not generate values for existing rows • Sequence continues ( not reset ) after TRUNCATE TABLE 14

Slide 15

Slide 15 text

takes options of CREATE SEQUENCE • START WITH start • MINVALUE minvalue | NO MINVALUE • MAXVALUE maxvalue | NO MAXVALUE • INCREMENT [ BY ] increment • CYCLE | NO CYCLE • CACHE cache • OWNED BY tbl.col | OWNED BY NONE 15

Slide 16

Slide 16 text

silly example 16 id_ INTEGER GENERATED ALWAYS AS IDENTITY ( START WITH 200 MINVALUE 100 MAXVALUE 205 CYCLE INCREMENT BY 3 ) PRIMARY KEY Adding 4 rows: 200 203 100 103

Slide 17

Slide 17 text

accessing the backing sequence • Call existing function: pg_get_serial_sequence • Example… Return the value most recently obtained by nextval for this sequence in the current session: 17 SELECT currval ( pg_get_serial_sequence( 'tbl' , 'col' ) ) ;

Slide 18

Slide 18 text

moving to GENERATED … AS IDENTITY • See documentation for: • CREATE TABLE • CREATE SEQUENCE ( for options ) • Begin using on new tables • Convert old tables using PL/pgSQL function by Peter Eisentraut • Please test ! 18

Slide 19

Slide 19 text

still need PRIMARY KEY • GENERATED … AS IDENTITY • NOT NULL • CREATE SEQUENCE • options   (start, min/max, etc.) • DEFAULT nextval • tied to column  (no separate management) 19 • PRIMARY KEY • UNIQUE • indexed • relationship   ( foreign key & JOIN ) CREATE TABLE tbl ( col BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY , col2 text ) ;

Slide 20

Slide 20 text

bibliography • PostgreSQL 10 identity columns explained by Peter Eisentraut • Waiting for PostgreSQL 10 – Identity Columns by depesz • SQL:2011 standard (draft)  4.15.11 Identity columns  4.22 Sequence generators  11.20 20

Slide 21

Slide 21 text

Aussie time 21

Slide 22

Slide 22 text

traceable COMMIT • BEGIN … some work … COMMIT … lost connection/crash • Did COMMIT command reach server? Did it succeed? • txid_status( bigint ) function, new • look-up a recent transaction-id • committed or aborted, due to crash or rollback • don’t wait too long • lighter alternative to two-phase commit • may be useful in other situations 22

Slide 23

Slide 23 text

capture txn id • Need to remember the transaction id during the transaction, to ask after • txid_current()   txid_current_if_assigned() -- new command • queries for txid that may not exist (until real work) • 64-bit xid-with-epoch = 32-bit epoch counter + 32-bit xid • epoch defaults to zero, in my trial • INSERT INTO ...; SELECT txid_current();  INSERT INTO ... RETURNING txid_current(); 23

Slide 24

Slide 24 text

result • txid_status( bigint ) → txid_status • data type: txid_status  effectively: text • domain: committed | aborted | in progress | NULL • NULL = no references to that transaction survive in the system, and the commit status information has been discarded • “This happens once all tuples associated with that transaction and all older transactions have been deleted or updated to newer versions and vacuum has cleaned up the old transaction history information.” – Craig Ringer 24

Slide 25

Slide 25 text

if ‘in progress’ • Either: • wait • terminate • look up the associated backend in pg_stat_activity • call pg_terminate_backend( … ) 25

Slide 26

Slide 26 text

2PC • Does not fully substitute for two-phase commit • knowledge of transaction status is eventually discarded • you cannot actually ROLLBACK a transaction once COMMITted • does not plug into txn managers ( JTA, XA, MS-DTC ) 26

Slide 27

Slide 27 text

bibliography • Traceable commit for PostgreSQL 10 by Craig Ringer, on 2ndQuadrant.com, 2017-04-05 • Email thread: [PATCH] Transaction traceability - txid_status(bigint) • Doc page 9.25 System Information Functions ( Table 9.69 ) 27

Slide 28

Slide 28 text

Maine Coon moment 28

Slide 29

Slide 29 text

parallel query • Split queries into multiple tasks, perform in parallel on multiple cores • Parallel Query new in 9.6, enhanced in 10 • Faster especially for lots of data returning few rows, and for aggregates • 2x-4x faster (maybe) • Great boost for some queries, but irrelevant for many ( no magic ☹ ) • Some limitations may ease-up in future development (see daggers) 29 2x 4x

Slide 30

Slide 30 text

Terminology • Gather node • Task being divided, the query. Dispatcher, gathering workers’ results. • Background worker process • Extend Postgres to run user-supplied code in separate processes • Access shared memory • Make libpq connections to server • Another Postgres process (native in OS), like a user-session process 30

Slide 31

Slide 31 text

parallel query • 9.6 • OFF by default 31 • 10 • ON by default max_worker_processes = 8 max_parallel_workers_per_gather = 0 max_worker_processes = 8 max_parallel_workers = 8 max_parallel_workers_per_gather = 2 Use EXPLAIN to see number of workers chosen by planner

Slide 32

Slide 32 text

conditions required by planning • Multi-user mode (not Single-user) • max_parallel_workers_per_gather > 0 • dynamic_shared_memory_type ≠ none • Read-only = No writing, no locking, in top-level or CTE † • no suspension during execution ( ex: DECLARE CURSOR, PL/pgSQL loop ) • no function marked PARALLEL UNSAFE (user-deﬁned functions by default) • not nested in a parallel plan †? • transaction isolation level = SERIALIZABLE † 32

Slide 33

Slide 33 text

planned but not executed ☹ • no background workers can be obtained • client sends an Execute message with a non-zero fetch count  (not possible in libpq connections) (otherwise, disable p-q to avoid plan) • prepared statement with CREATE TABLE … AS EXECUTE … (read-write) • transaction isolation level = SERIALIZABLE † (if changed after plan) 33

Slide 34

Slide 34 text

parallel scans • 9.6 • Sequential scan  Each table block handed out one-by-one to workers. • Background worker processes • Dynamic background workers • Dynamic shared memory 34 • 10 • Sequential scan • Bitmap heap scan  Leader process scans index(es), builds bitmap of blocks to visit, then hands out each table block to workers. • Btree index scan †   Workers take turns reading index, for each index block scanning-sorting-returning its referenced tuples. • Gather Merge (respect sorted tuples) • query text of parallel worker now in pg_stat_activity (debug crashes, etc.)

Slide 35

Slide 35 text

parallel joins • 9.6 • hash joins • nested loops 35 • 10 • hash joins • nested loops • merge joins Opportunity for improvement:  work on the inner side of the join is duplicated by every participant

Slide 36

Slide 36 text

bibliography • Parallelism Progress by Robert Haas, 2013-10 • Parallel Query (in 9.6) wiki • Parallel Query v2 by Robert Haas, 2017-03 • New in postgres 10 wiki, section “Additional Parallelism” 36 • Postgres manual • Chapter 15 Parallel Query, version 9.6 and version 10 • Chapter 45/47 Background Worker Processes • Chapter 19.4 Resource Consumption

Slide 37

Slide 37 text

Morocco 37

Slide 38

Slide 38 text

statistics • Statistics = count & content of values in rows for a column • most/least frequent values, value cardinality, rudimentary histograms • Postgres planner ( cost-estimation engine ) • Smarter plan = faster query = less resource intense • Automatically collected: VACUUM, ANALYZE, CREATE INDEX, etc. • per-individual-column • New in 10: multi-column 38

Slide 39

Slide 39 text

functional dependency • functional dependency ( from database normalization )    Ex: EmpInDept table | EmpId & EmpName & DeptId & DeptName  Ex: Person table | BirthDate & Age & IsMinor  Non-ex: Person table | HairColor & ShoeSize & FavoriteSong • de-normalized structure ( intentional or unwitting ) • partial functional dependency  ex: Zip Codes almost always determine City  ➠ Postgres planner won’t know that until you tell it to look 39

Slide 40

Slide 40 text

cross-column statistics • Tell Postgres when columns are kinda-sorta related:   CREATE STATISTICS zip_stats_ ( dependencies )   ON zip_ , city_ FROM zipcodes_ ; • Syntax • CREATE STATISTICS [ IF NOT EXISTS ] name  [ type = dependencies or ndistinct or omitted=all ]  ON colx , coly , …  FROM tbl • ALTER STATISTICS name   OWNER TO … | RENAME TO … | SET SCHEMA … • DROP STATISTICS [ IF EXISTS ] name 40 = functional dependency calculation, 1.0 = absolute = distinct combos,   for GROUP BY

Slide 41

Slide 41 text

a.k.a • cross-column statistics • extended statistics • correlated statistics • multivariate statistics 41

Slide 42

Slide 42 text

bibliography • PG Phriday: Crazy Correlated Column Crusade by Shaun Thomas, 2ndQuadrant.com, 2017-07 • Manual, Chapter 14.2 Statistics Used by the Planner • Manual, Chapter 68 How the Planner Uses Statistics • Wikipedia: Functional dependency • Postgres wiki, Cross Columns Stats 42

Slide 43

Slide 43 text

Slide 44

Slide 44 text

Slide 45

Slide 45 text

Slide 46

Slide 46 text

Slide 47

Slide 47 text

Slide 48

Slide 48 text

Slide 49

Slide 49 text

49 Piet Mondrian

Slide 50

Slide 50 text

50 Yves Saint Laurent

Slide 51

Slide 51 text

Slide 52

Slide 52 text

collation • deciding the order of things, rules for sorting • diacriticals: e è é ê ë • Swedish: z < ö German: ö < z • whitespace, punctuation, case • Ex: resume, Resume, RESUME, résumé, rèsumè, Résumé, RÉSUMÉ, … 52

Slide 53

Slide 53 text

collation in Postgres • Postgres < 10 = host OS rules ( C library ) • passing strings to strcmp(), strcoll(), and the like • crude, not world-savvy ☹ • behavior varies ☹ • index lookup breaks with changes ( data corruption & apparent data loss ) ☹ • Postgres 10 • Unicode Collation Algorithm ( UCA ) ( 79-page spec ) • International Components for Unicode ( ICU ) ( implementation of UCA ) 53

Slide 54

Slide 54 text

ICU • adopting ICU implementation • ICU = International Components for Unicode • very complete with deep coverage, constantly updated • well-worn: from Taligent/IBM, to Java i18n, to C/C++ (ICU4C), to ICU4J • provides many Unicode-savvy text services to OSes & environments • stable (versioned, no compatibility break in minor updates) • great move for Postgres team | binary linked to major version of libicu 54

Slide 55

Slide 55 text

using ICU • Build for ICU • from source, use ./configure --with-icu with options • Get old libc collations too • Append -x-icu for ICU collation, omit for old libc collation • CREATE TABLE ... (... x text COLLATE "en_US" ...)  CREATE TABLE ... (... x text COLLATE "en_US-x-icu" ...) • Preﬁx (locale name) is the usual standard Language, Script, Country, Variant. • But not a POSIX locale ID 55

Slide 56

Slide 56 text

another example of ICU • More variants available ☻ • …even emoji ! • … VALUES   ('Göbel'), ('Goethe'), ('Goldmann'), ('Göthe'), ('Götz') • SELECT … ORDER BY name COLLATE "de-u-co-standard-x-icu"  SELECT … ORDER BY name COLLATE "de-u-co-phonebk-x-icu"  SELECT … ORDER BY name COLLATE "de-AT-u-co-phonebk-x-icu" 56

Slide 57

Slide 57 text

side-beneﬁt: ﬁx for Abbreviated Keys • Postgres 9.5 new feature: Abbreviated Keys • Faster sorts & indexing for text/varchar columns by using an algorithm called "abbreviated keys". Up to 20x. • Ruined • buggy implementations of strcoll() in glibc ( the C standard library common on Linux systems ) not match strxfrm()   ( see Postgres wiki ) • Disabled in 9.5.2 • May be good again with ICU collation ?? 57

Slide 58

Slide 58 text

more beneﬁts • Possibly case-insensitive & accent-insensitive in Pg 11/12 • Now can detect and warn of major change to version of collation. Might reﬁne and automate in future. 58

Slide 59

Slide 59 text

bibliography • More robust collations with ICU support in PostgreSQL 10 by Peter Eisentraut, 2ndQuadrant.com, 2017-05 • Wikipedia: Unicode collation algorithm ( UCA ) ( spec, 79 pages ) • Wikipedia: International Components for Unicode ( implementation of UCA ) • Manual, Chapter 23.2 Collation Support ( 23.2.2.2.2. ICU collations ) • Locale naming: ICU User Guide – Locale 59

Slide 60

Slide 60 text

60 damn corvids

Slide 61

Slide 61 text

logical replication • Previously, physical replication • entire cluster copied from primary server to replica server • Impossible to copy over just one database, or just one table • Now possible • Replicate across versions of Postgres • Replicate by table • lite coverage here, as it gets so much press elsewhere 61

Slide 62

Slide 62 text

physical vs logical • physical = uses exact block addresses and byte-by-byte replication • standby server = bit for bit copy of primary server • a.k.a. streaming replication • logical = replicate data objects & changes, based on identity ( primary key ) • Pub-Sub – publish and subscribe model – CREATE PUBLICATION • subscribers pull data from pubs they subscribe • cascade onwards 62

Slide 63

Slide 63 text

bibliography • Manual, Chapter 31, Logical Replication • Logical Replication in PostgreSQL 10 by Petr Jelinek, 2ndQuadrant.com, 2017-04 63

Slide 64

Slide 64 text

Slide 65

Slide 65 text

TIMESTAMP simpler • no more ﬂoating-point timestamp • archaic implementation of TIMESTAMP data type • good riddance • manual page easier to read now • By the way, also dropping: • original client/server protocol (superceded in 1998) • pg_dump support for Postgres 7.4 and earlier 65

Slide 66

Slide 66 text

renaming/renumbering • Postgres numbering: Major.Minor • Simpler. Save wasted time (8 vs 9 vs 10 debates) • Renaming for Write-Ahead Log • ‘xlog’ changed to ‘wal’ • 2 directories, 12 admin functions, 8 system views/functions, 6 executables • Implications for scripts and automation tools 66

Slide 67

Slide 67 text

passwords • passwords always encrypted  no more cleartext • SCRAM authentication  Salted Challenge Response Authentication Mechanism • encryption on client • only in libc for now • Postgres 10 highlight - SCRAM authentication by Michael Paquier 67 authentication external internal Kerberos SSPI TLS/SSL cleartext salt-hash-MD5 SCRAM-SHA-256 client server CA certiﬁcate + knowledge of salted hash of pw salted hash of pw + challenge pw salted with PBKDF2 MiM

Slide 68

Slide 68 text

row-level security • Old • permissive policies ( combined as OR ) ( default )  ➡ any matching policy allowed access • New • restrictive policies ( combined as AND )  ➡ must match for access to be granted • may combine permissive + restrictive • CREATE POLICY name ON table_name  [ AS { PERMISSIVE | RESTRICTIVE } ] … 68

Slide 69

Slide 69 text

general bibliography • New Features Coming in PostgreSQL 10 by Robert Haas, EnterpriseDB.com, 2017-04 • Postgres wiki, New in postgres 10 • Release Notes 69

Slide 70

Slide 70 text

« ﬁn » 70 Basil Bourque [email protected] LinkedIn: basilbourque