10 in 10: Ten new things in Postgres 10

10 in 10 Basil Bourque [email protected] LinkedIn: basilbourque Ten new
things in Postgres 10 1 2017-10-04 10

baker’s dozen • replacement for SERIAL • traceable COMMIT •
parallel query • cross-column statistics • new collation (sorting) • more XML 2 • logical replication • simplifying timestamps • renaming & renumbering • security tweaks

caveat • I am no expert on these topics •
Struck my fancy • I dove in, studied, played a bit • Take what I say with grain of salt: See bibliographies for straight scoop • Final shipping version of Postgres may vary 3

primary key 4 natural key (from your data) surrogate key
(extra column, artiﬁcial value) sequential numbers (1, 2, 3, …) UUID (128-bit value) SERIAL

SMALLSERIAL | SERIAL | BIGSERIAL 5 CREATE TABLE tbl (
col SERIAL PRIMARY KEY, col2 VARCHAR( 80 ) , ) ; INSERT INTO tbl ( col2 ) VALUES ('Basil') RETURNING col ; • See doc • Deﬁnes a SEQUENCE • Marks col NOT NULL • Default col to sequence • 16-bit, 32-bit, 64-bit Get back your number

non-standard SERIAL • Not standard • Postgres-speciﬁc keyword 6

SERIAL pseudo-type • type erasure 7 CREATE TABLE tbl (
col SERIAL ) ; CREATE SEQUENCE tbl_col_seq ; CREATE TABLE tbl ( col INT NOT NULL DEFAULT nextval( 'tbl_col_seq' ) , col2 VARCHAR(20) ) ; ALTER SEQUENCE tbl_col_seq OWNED BY tbl.col ;

sequence permission separated • CREATE USER & GRANT INSERT •
INSERT INTO tbl ( col2 ) VALUES ( 'whatever' ) ;  ERROR: permission denied for sequence tbl_col_seq • solution:  GRANT USAGE ON SEQUENCE tbl_col_seq ; 8 name ?

sequence name needed for mgmt • That mysterious name needed
for managing value • ALTER SEQUENCE tbl_col_seq RESTART WITH 1000 ; 9 name ?

dropping SERIAL is tricky • Determine name of sequence •
Remember to add CASCADE to drop command to clear DEFAULT • DROP SEQUENCE tbl_col_seq CASCADE ; • Beware:  ALTER TABLE tbl ALTER COLUMN col DROP DEFAULT ;  … leaves sequence in place  10 ☹ ☻

CREATE TABLE / LIKE incorrect • CREATE TABLE tbl2 (
LIKE tbl INCLUDING ALL ) ; • Good: Copies the DEFAULT with SEQUENCE • Bad: Uses the same sequence for both tables (!) • Drop ﬁrst table fails – “other objects depend on it” • DROP TABLE tbl ; -- ERROR • Inherited table is similar  DEFAULT is inherited but the sequence ownership is not 11

replacement for SERIAL • Identity column – GENERATED … AS
IDENTITY ( … ) • creates implicit sequence, hidden from us, with optional override • GENERATED ALWAYS AS IDENTITY   GENERATED BY DEFAULT AS IDENTITY • ALWAYS ignores user-provided value  unless INSERT … OVERRIDING SYSTEM VALUE 12 CREATE TABLE tbl ( col BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY , col2 text ) ;

benefits of GENERATED…AS IDENTITY • No need to manage DEFAULT 
ALTER TABLE tbl ALTER COLUMN col DROP DEFAULT;  ERROR: column "col" of relation "tbl" is an identity column  HINT: Use ALTER TABLE ... ALTER COLUMN ... DROP IDENTITY instead. • CREATE TABLE … LIKE gets its own identity sequence ( if identity included ) • INCLUDING IDENTITY | EXCLUDING IDENTITY 13 • Standard SQL. Feature # T174 & F386 & T178 ( first SQL: 2003, clarified in SQL:2008 ) • 'generated' quality is remembered ( no ‘type erasure’ ) • No permissions issue, as sequence implicit/internal • No name involved – simply alter the column  ALTER TABLE tbl   ALTER COLUMN col   RESTART WITH 1000 ; • Simple DROP TABLE drops sequence

GENERATED … AS IDENTITY behavior • INSERT INTO command remains
the same – no change to existing code • One per table, maximum • Type of sequence matches column implicitly ( 16-, 32-, 64-bit integer ) • Manual adjustments • Adding identity column does not generate values for existing rows • Sequence continues ( not reset ) after TRUNCATE TABLE 14

takes options of CREATE SEQUENCE • START WITH start •
MINVALUE minvalue | NO MINVALUE • MAXVALUE maxvalue | NO MAXVALUE • INCREMENT [ BY ] increment • CYCLE | NO CYCLE • CACHE cache • OWNED BY tbl.col | OWNED BY NONE 15

silly example 16 id_ INTEGER GENERATED ALWAYS AS IDENTITY (
START WITH 200 MINVALUE 100 MAXVALUE 205 CYCLE INCREMENT BY 3 ) PRIMARY KEY Adding 4 rows: 200 203 100 103

accessing the backing sequence • Call existing function: pg_get_serial_sequence •
Example… Return the value most recently obtained by nextval for this sequence in the current session: 17 SELECT currval ( pg_get_serial_sequence( 'tbl' , 'col' ) ) ;

moving to GENERATED … AS IDENTITY • See documentation for:
• CREATE TABLE • CREATE SEQUENCE ( for options ) • Begin using on new tables • Convert old tables using PL/pgSQL function by Peter Eisentraut • Please test ! 18

still need PRIMARY KEY • GENERATED … AS IDENTITY •
NOT NULL • CREATE SEQUENCE • options   (start, min/max, etc.) • DEFAULT nextval • tied to column  (no separate management) 19 • PRIMARY KEY • UNIQUE • indexed • relationship   ( foreign key & JOIN ) CREATE TABLE tbl ( col BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY , col2 text ) ;

bibliography • PostgreSQL 10 identity columns explained by Peter Eisentraut
• Waiting for PostgreSQL 10 – Identity Columns by depesz • SQL:2011 standard (draft)  4.15.11 Identity columns  4.22 Sequence generators  11.20 <alter identity column specification> 20

Aussie time 21

traceable COMMIT • BEGIN … some work … COMMIT …
lost connection/crash • Did COMMIT command reach server? Did it succeed? • txid_status( bigint ) function, new • look-up a recent transaction-id • committed or aborted, due to crash or rollback • don’t wait too long • lighter alternative to two-phase commit • may be useful in other situations 22

capture txn id • Need to remember the transaction id
during the transaction, to ask after • txid_current()   txid_current_if_assigned() -- new command • queries for txid that may not exist (until real work) • 64-bit xid-with-epoch = 32-bit epoch counter + 32-bit xid • epoch defaults to zero, in my trial • INSERT INTO ...; SELECT txid_current();  INSERT INTO ... RETURNING txid_current(); 23

result • txid_status( bigint ) → txid_status • data type:
txid_status  effectively: text • domain: committed | aborted | in progress | NULL • NULL = no references to that transaction survive in the system, and the commit status information has been discarded • “This happens once all tuples associated with that transaction and all older transactions have been deleted or updated to newer versions and vacuum has cleaned up the old transaction history information.” – Craig Ringer 24

if ‘in progress’ • Either: • wait • terminate •
look up the associated backend in pg_stat_activity • call pg_terminate_backend( … ) 25

2PC • Does not fully substitute for two-phase commit •
knowledge of transaction status is eventually discarded • you cannot actually ROLLBACK a transaction once COMMITted • does not plug into txn managers ( JTA, XA, MS-DTC ) 26

bibliography • Traceable commit for PostgreSQL 10 by Craig Ringer,
on 2ndQuadrant.com, 2017-04-05 • Email thread: [PATCH] Transaction traceability - txid_status(bigint) • Doc page 9.25 System Information Functions ( Table 9.69 ) 27

Maine Coon moment 28

parallel query • Split queries into multiple tasks, perform in
parallel on multiple cores • Parallel Query new in 9.6, enhanced in 10 • Faster especially for lots of data returning few rows, and for aggregates • 2x-4x faster (maybe) • Great boost for some queries, but irrelevant for many ( no magic ☹ ) • Some limitations may ease-up in future development (see daggers) 29 2x 4x

Terminology • Gather node • Task being divided, the query.
Dispatcher, gathering workers’ results. • Background worker process • Extend Postgres to run user-supplied code in separate processes • Access shared memory • Make libpq connections to server • Another Postgres process (native in OS), like a user-session process 30

parallel query • 9.6 • OFF by default 31 •
10 • ON by default max_worker_processes = 8 max_parallel_workers_per_gather = 0 max_worker_processes = 8 max_parallel_workers = 8 max_parallel_workers_per_gather = 2 Use EXPLAIN to see number of workers chosen by planner

conditions required by planning • Multi-user mode (not Single-user) •
max_parallel_workers_per_gather > 0 • dynamic_shared_memory_type ≠ none • Read-only = No writing, no locking, in top-level or CTE † • no suspension during execution ( ex: DECLARE CURSOR, PL/pgSQL loop ) • no function marked PARALLEL UNSAFE (user-deﬁned functions by default) • not nested in a parallel plan †? • transaction isolation level = SERIALIZABLE † 32

planned but not executed ☹ • no background workers can
be obtained • client sends an Execute message with a non-zero fetch count  (not possible in libpq connections) (otherwise, disable p-q to avoid plan) • prepared statement with CREATE TABLE … AS EXECUTE … (read-write) • transaction isolation level = SERIALIZABLE † (if changed after plan) 33

parallel scans • 9.6 • Sequential scan  Each table block
handed out one-by-one to workers. • Background worker processes • Dynamic background workers • Dynamic shared memory 34 • 10 • Sequential scan • Bitmap heap scan  Leader process scans index(es), builds bitmap of blocks to visit, then hands out each table block to workers. • Btree index scan †   Workers take turns reading index, for each index block scanning-sorting-returning its referenced tuples. • Gather Merge (respect sorted tuples) • query text of parallel worker now in pg_stat_activity (debug crashes, etc.)

parallel joins • 9.6 • hash joins • nested loops
35 • 10 • hash joins • nested loops • merge joins Opportunity for improvement:  work on the inner side of the join is duplicated by every participant

bibliography • Parallelism Progress by Robert Haas, 2013-10 • Parallel
Query (in 9.6) wiki • Parallel Query v2 by Robert Haas, 2017-03 • New in postgres 10 wiki, section “Additional Parallelism” 36 • Postgres manual • Chapter 15 Parallel Query, version 9.6 and version 10 • Chapter 45/47 Background Worker Processes • Chapter 19.4 Resource Consumption

Morocco 37

statistics • Statistics = count & content of values in
rows for a column • most/least frequent values, value cardinality, rudimentary histograms • Postgres planner ( cost-estimation engine ) • Smarter plan = faster query = less resource intense • Automatically collected: VACUUM, ANALYZE, CREATE INDEX, etc. • per-individual-column • New in 10: multi-column 38

functional dependency • functional dependency ( from database normalization ) 
  Ex: EmpInDept table | EmpId & EmpName & DeptId & DeptName  Ex: Person table | BirthDate & Age & IsMinor  Non-ex: Person table | HairColor & ShoeSize & FavoriteSong • de-normalized structure ( intentional or unwitting ) • partial functional dependency  ex: Zip Codes almost always determine City  ➠ Postgres planner won’t know that until you tell it to look 39

cross-column statistics • Tell Postgres when columns are kinda-sorta related:
  CREATE STATISTICS zip_stats_ ( dependencies )   ON zip_ , city_ FROM zipcodes_ ; • Syntax • CREATE STATISTICS [ IF NOT EXISTS ] name  [ type = dependencies or ndistinct or omitted=all ]  ON colx , coly , …  FROM tbl • ALTER STATISTICS name   OWNER TO … | RENAME TO … | SET SCHEMA … • DROP STATISTICS [ IF EXISTS ] name 40 = functional dependency calculation, 1.0 = absolute = distinct combos,   for GROUP BY

a.k.a • cross-column statistics • extended statistics • correlated statistics
• multivariate statistics 41

bibliography • PG Phriday: Crazy Correlated Column Crusade by Shaun
Thomas, 2ndQuadrant.com, 2017-07 • Manual, Chapter 14.2 Statistics Used by the Planner • Manual, Chapter 68 How the Planner Uses Statistics • Wikipedia: Functional dependency • Postgres wiki, Cross Columns Stats 42

49 Piet Mondrian

50 Yves Saint Laurent

collation • deciding the order of things, rules for sorting
• diacriticals: e è é ê ë • Swedish: z < ö German: ö < z • whitespace, punctuation, case • Ex: resume, Resume, RESUME, résumé, rèsumè, Résumé, RÉSUMÉ, … 52

collation in Postgres • Postgres < 10 = host OS
rules ( C library ) • passing strings to strcmp(), strcoll(), and the like • crude, not world-savvy ☹ • behavior varies ☹ • index lookup breaks with changes ( data corruption & apparent data loss ) ☹ • Postgres 10 • Unicode Collation Algorithm ( UCA ) ( 79-page spec ) • International Components for Unicode ( ICU ) ( implementation of UCA ) 53

ICU • adopting ICU implementation • ICU = International Components
for Unicode • very complete with deep coverage, constantly updated • well-worn: from Taligent/IBM, to Java i18n, to C/C++ (ICU4C), to ICU4J • provides many Unicode-savvy text services to OSes & environments • stable (versioned, no compatibility break in minor updates) • great move for Postgres team | binary linked to major version of libicu 54

using ICU • Build for ICU • from source, use
./configure --with-icu with options • Get old libc collations too • Append -x-icu for ICU collation, omit for old libc collation • CREATE TABLE ... (... x text COLLATE "en_US" ...)  CREATE TABLE ... (... x text COLLATE "en_US-x-icu" ...) • Preﬁx (locale name) is the usual standard Language, Script, Country, Variant. • But not a POSIX locale ID 55

another example of ICU • More variants available ☻ •
…even emoji ! • … VALUES   ('Göbel'), ('Goethe'), ('Goldmann'), ('Göthe'), ('Götz') • SELECT … ORDER BY name COLLATE "de-u-co-standard-x-icu"  SELECT … ORDER BY name COLLATE "de-u-co-phonebk-x-icu"  SELECT … ORDER BY name COLLATE "de-AT-u-co-phonebk-x-icu" 56

side-beneﬁt: ﬁx for Abbreviated Keys • Postgres 9.5 new feature:
Abbreviated Keys • Faster sorts & indexing for text/varchar columns by using an algorithm called "abbreviated keys". Up to 20x. • Ruined • buggy implementations of strcoll() in glibc ( the C standard library common on Linux systems ) not match strxfrm()   ( see Postgres wiki ) • Disabled in 9.5.2 • May be good again with ICU collation ?? 57

more beneﬁts • Possibly case-insensitive & accent-insensitive in Pg 11/12
• Now can detect and warn of major change to version of collation. Might reﬁne and automate in future. 58

bibliography • More robust collations with ICU support in PostgreSQL
10 by Peter Eisentraut, 2ndQuadrant.com, 2017-05 • Wikipedia: Unicode collation algorithm ( UCA ) ( spec, 79 pages ) • Wikipedia: International Components for Unicode ( implementation of UCA ) • Manual, Chapter 23.2 Collation Support ( 23.2.2.2.2. ICU collations ) • Locale naming: ICU User Guide – Locale 59

60 damn corvids

logical replication • Previously, physical replication • entire cluster copied
from primary server to replica server • Impossible to copy over just one database, or just one table • Now possible • Replicate across versions of Postgres • Replicate by table • lite coverage here, as it gets so much press elsewhere 61

physical vs logical • physical = uses exact block addresses
and byte-by-byte replication • standby server = bit for bit copy of primary server • a.k.a. streaming replication • logical = replicate data objects & changes, based on identity ( primary key ) • Pub-Sub – publish and subscribe model – CREATE PUBLICATION • subscribers pull data from pubs they subscribe • cascade onwards 62

bibliography • Manual, Chapter 31, Logical Replication • Logical Replication
in PostgreSQL 10 by Petr Jelinek, 2ndQuadrant.com, 2017-04 63

TIMESTAMP simpler • no more ﬂoating-point timestamp • archaic implementation
of TIMESTAMP data type • good riddance • manual page easier to read now • By the way, also dropping: • original client/server protocol (superceded in 1998) • pg_dump support for Postgres 7.4 and earlier 65

renaming/renumbering • Postgres numbering: Major.Minor • Simpler. Save wasted time
(8 vs 9 vs 10 debates) • Renaming for Write-Ahead Log • ‘xlog’ changed to ‘wal’ • 2 directories, 12 admin functions, 8 system views/functions, 6 executables • Implications for scripts and automation tools 66

passwords • passwords always encrypted  no more cleartext • SCRAM
authentication  Salted Challenge Response Authentication Mechanism • encryption on client • only in libc for now • Postgres 10 highlight - SCRAM authentication by Michael Paquier 67 authentication external internal Kerberos SSPI TLS/SSL cleartext salt-hash-MD5 SCRAM-SHA-256 client server CA certiﬁcate + knowledge of salted hash of pw salted hash of pw + challenge pw salted with PBKDF2 MiM

row-level security • Old • permissive policies ( combined as
OR ) ( default )  ➡ any matching policy allowed access • New • restrictive policies ( combined as AND )  ➡ must match for access to be granted • may combine permissive + restrictive • CREATE POLICY name ON table_name  [ AS { PERMISSIVE | RESTRICTIVE } ] … 68

general bibliography • New Features Coming in PostgreSQL 10 by
Robert Haas, EnterpriseDB.com, 2017-04 • Postgres wiki, New in postgres 10 • Release Notes 69

« ﬁn » 70 Basil Bourque [email protected] LinkedIn: basilbourque

10 in 10: Ten new things in Postgres 10

10 in 10: Ten new things in Postgres 10

More Decks by Basil Bourque

Other Decks in Programming

Featured

Transcript