10 in 10
Basil Bourque
[email protected]
LinkedIn: basilbourque
Ten new things in Postgres 10
1
2017-10-04
10
Slide 2
Slide 2 text
baker’s dozen • replacement for SERIAL
• traceable COMMIT
• parallel query
• cross-column statistics
• new collation (sorting)
• more XML
2
• logical replication
• simplifying timestamps
• renaming & renumbering
• security tweaks
Slide 3
Slide 3 text
caveat
• I am no expert on these topics
• Struck my fancy
• I dove in, studied, played a bit
• Take what I say with grain of salt: See bibliographies for straight scoop
• Final shipping version of Postgres may vary
3
SMALLSERIAL | SERIAL | BIGSERIAL
5
CREATE TABLE tbl (
col SERIAL PRIMARY KEY,
col2 VARCHAR( 80 ) ,
) ;
INSERT INTO tbl ( col2 )
VALUES ('Basil')
RETURNING col ;
• See doc
• Defines a SEQUENCE
• Marks col NOT NULL
• Default col to sequence
• 16-bit, 32-bit, 64-bit
Get back
your number
Slide 6
Slide 6 text
non-standard SERIAL
• Not standard
• Postgres-specific keyword
6
Slide 7
Slide 7 text
SERIAL pseudo-type
• type erasure
7
CREATE TABLE tbl (
col SERIAL
) ;
CREATE SEQUENCE tbl_col_seq ;
CREATE TABLE tbl (
col INT
NOT NULL
DEFAULT nextval( 'tbl_col_seq' ) ,
col2 VARCHAR(20)
) ;
ALTER SEQUENCE tbl_col_seq
OWNED BY tbl.col ;
Slide 8
Slide 8 text
sequence permission separated
• CREATE USER & GRANT INSERT
• INSERT INTO tbl ( col2 ) VALUES ( 'whatever' ) ;
ERROR: permission denied for sequence tbl_col_seq
• solution:
GRANT USAGE ON SEQUENCE tbl_col_seq ;
8
name ?
Slide 9
Slide 9 text
sequence name needed for mgmt
• That mysterious name needed for managing value
• ALTER SEQUENCE tbl_col_seq RESTART WITH 1000 ;
9
name ?
Slide 10
Slide 10 text
dropping SERIAL is tricky
• Determine name of sequence
• Remember to add CASCADE to drop command to clear DEFAULT
• DROP SEQUENCE tbl_col_seq CASCADE ;
• Beware:
ALTER TABLE tbl ALTER COLUMN col DROP DEFAULT ;
… leaves sequence in place
10
☹
☻
Slide 11
Slide 11 text
CREATE TABLE / LIKE incorrect
• CREATE TABLE tbl2 ( LIKE tbl INCLUDING ALL ) ;
• Good: Copies the DEFAULT with SEQUENCE
• Bad: Uses the same sequence for both tables (!)
• Drop first table fails – “other objects depend on it”
• DROP TABLE tbl ; -- ERROR
• Inherited table is similar
DEFAULT is inherited but the sequence ownership is not
11
Slide 12
Slide 12 text
replacement for SERIAL
• Identity column – GENERATED … AS IDENTITY ( … )
• creates implicit sequence, hidden from us, with optional override
• GENERATED ALWAYS AS IDENTITY
GENERATED BY DEFAULT AS IDENTITY
• ALWAYS ignores user-provided value
unless INSERT … OVERRIDING SYSTEM VALUE
12
CREATE TABLE tbl (
col BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY ,
col2 text
) ;
Slide 13
Slide 13 text
benefits of GENERATED…AS IDENTITY
• No need to manage DEFAULT
ALTER TABLE tbl ALTER COLUMN col DROP DEFAULT;
ERROR: column "col" of relation "tbl" is an identity column
HINT: Use ALTER TABLE ... ALTER COLUMN ... DROP IDENTITY instead.
• CREATE TABLE … LIKE gets its own identity sequence ( if identity included )
• INCLUDING IDENTITY | EXCLUDING IDENTITY
13
• Standard SQL. Feature # T174
& F386 & T178 ( first SQL:
2003, clarified in SQL:2008 )
• 'generated' quality is
remembered ( no ‘type
erasure’ )
• No permissions issue, as sequence implicit/internal
• No name involved – simply alter the column
ALTER TABLE tbl
ALTER COLUMN col
RESTART WITH 1000 ;
• Simple DROP TABLE drops sequence
Slide 14
Slide 14 text
GENERATED … AS IDENTITY behavior
• INSERT INTO command remains the same – no change to existing code
• One per table, maximum
• Type of sequence matches column implicitly ( 16-, 32-, 64-bit integer )
• Manual adjustments
• Adding identity column does not generate values for existing rows
• Sequence continues ( not reset ) after TRUNCATE TABLE
14
Slide 15
Slide 15 text
takes options of CREATE SEQUENCE
• START WITH start
• MINVALUE minvalue | NO MINVALUE
• MAXVALUE maxvalue | NO MAXVALUE
• INCREMENT [ BY ] increment
• CYCLE | NO CYCLE
• CACHE cache
• OWNED BY tbl.col | OWNED BY NONE
15
Slide 16
Slide 16 text
silly example
16
id_ INTEGER
GENERATED ALWAYS AS IDENTITY (
START WITH 200
MINVALUE 100
MAXVALUE 205
CYCLE
INCREMENT BY 3
) PRIMARY KEY
Adding 4 rows:
200
203
100
103
Slide 17
Slide 17 text
accessing the backing sequence
• Call existing function: pg_get_serial_sequence
• Example… Return the value most recently obtained by nextval for this
sequence in the current session:
17
SELECT currval (
pg_get_serial_sequence( 'tbl' , 'col' )
) ;
Slide 18
Slide 18 text
moving to GENERATED … AS IDENTITY
• See documentation for:
• CREATE TABLE
• CREATE SEQUENCE ( for options )
• Begin using on new tables
• Convert old tables using PL/pgSQL function by Peter Eisentraut
• Please test !
18
Slide 19
Slide 19 text
still need PRIMARY KEY
• GENERATED … AS IDENTITY
• NOT NULL
• CREATE SEQUENCE
• options
(start, min/max, etc.)
• DEFAULT nextval
• tied to column
(no separate management)
19
• PRIMARY KEY
• UNIQUE
• indexed
• relationship
( foreign key & JOIN )
CREATE TABLE tbl (
col BIGINT
GENERATED ALWAYS AS IDENTITY
PRIMARY KEY ,
col2 text
) ;
Slide 20
Slide 20 text
bibliography
• PostgreSQL 10 identity columns explained by Peter Eisentraut
• Waiting for PostgreSQL 10 – Identity Columns by depesz
• SQL:2011 standard (draft)
4.15.11 Identity columns
4.22 Sequence generators
11.20
20
Slide 21
Slide 21 text
Aussie time
21
Slide 22
Slide 22 text
traceable COMMIT
• BEGIN … some work … COMMIT … lost connection/crash
• Did COMMIT command reach server? Did it succeed?
• txid_status( bigint ) function, new
• look-up a recent transaction-id
• committed or aborted, due to crash or rollback
• don’t wait too long
• lighter alternative to two-phase commit
• may be useful in other situations
22
Slide 23
Slide 23 text
capture txn id
• Need to remember the transaction id during the transaction, to ask after
• txid_current()
txid_current_if_assigned() -- new command
• queries for txid that may not exist (until real work)
• 64-bit xid-with-epoch = 32-bit epoch counter + 32-bit xid
• epoch defaults to zero, in my trial
• INSERT INTO ...; SELECT txid_current();
INSERT INTO ... RETURNING txid_current();
23
Slide 24
Slide 24 text
result
• txid_status( bigint ) → txid_status
• data type: txid_status
effectively: text
• domain: committed | aborted | in progress | NULL
• NULL = no references to that transaction survive in the system, and the
commit status information has been discarded
• “This happens once all tuples associated with that transaction and all older
transactions have been deleted or updated to newer versions and vacuum
has cleaned up the old transaction history information.” – Craig Ringer
24
Slide 25
Slide 25 text
if ‘in progress’
• Either:
• wait
• terminate
• look up the associated backend in pg_stat_activity
• call pg_terminate_backend( … )
25
Slide 26
Slide 26 text
2PC
• Does not fully substitute for two-phase commit
• knowledge of transaction status is eventually discarded
• you cannot actually ROLLBACK a transaction once COMMITted
• does not plug into txn managers ( JTA, XA, MS-DTC )
26
Slide 27
Slide 27 text
bibliography
• Traceable commit for PostgreSQL 10 by Craig Ringer, on 2ndQuadrant.com,
2017-04-05
• Email thread: [PATCH] Transaction traceability - txid_status(bigint)
• Doc page 9.25 System Information Functions ( Table 9.69 )
27
Slide 28
Slide 28 text
Maine Coon moment
28
Slide 29
Slide 29 text
parallel query
• Split queries into multiple tasks, perform in parallel on multiple cores
• Parallel Query new in 9.6, enhanced in 10
• Faster especially for lots of data returning few rows, and for aggregates
• 2x-4x faster (maybe)
• Great boost for some queries, but irrelevant for many ( no magic ☹ )
• Some limitations may ease-up in future development (see daggers)
29
2x
4x
Slide 30
Slide 30 text
Terminology
• Gather node
• Task being divided, the query. Dispatcher, gathering workers’ results.
• Background worker process
• Extend Postgres to run user-supplied code in separate processes
• Access shared memory
• Make libpq connections to server
• Another Postgres process (native in OS), like a user-session process
30
Slide 31
Slide 31 text
parallel query
• 9.6
• OFF by default
31
• 10
• ON by default
max_worker_processes = 8
max_parallel_workers_per_gather = 0
max_worker_processes = 8
max_parallel_workers = 8
max_parallel_workers_per_gather = 2
Use EXPLAIN to see number of workers chosen by planner
Slide 32
Slide 32 text
conditions required by planning
• Multi-user mode (not Single-user)
• max_parallel_workers_per_gather > 0
• dynamic_shared_memory_type ≠ none
• Read-only = No writing, no locking, in top-level or CTE †
• no suspension during execution ( ex: DECLARE CURSOR, PL/pgSQL loop )
• no function marked PARALLEL UNSAFE (user-defined functions by default)
• not nested in a parallel plan †?
• transaction isolation level = SERIALIZABLE †
32
Slide 33
Slide 33 text
planned but not executed ☹
• no background workers can be obtained
• client sends an Execute message with a non-zero fetch count
(not possible in libpq connections) (otherwise, disable p-q to avoid plan)
• prepared statement with CREATE TABLE … AS EXECUTE … (read-write)
• transaction isolation level = SERIALIZABLE † (if changed after plan)
33
Slide 34
Slide 34 text
parallel scans
• 9.6
• Sequential scan
Each table block handed
out one-by-one to
workers.
• Background worker
processes
• Dynamic background
workers
• Dynamic shared
memory
34
• 10
• Sequential scan
• Bitmap heap scan
Leader process scans index(es), builds bitmap of
blocks to visit, then hands out each table block to
workers.
• Btree index scan †
Workers take turns reading index, for each index block
scanning-sorting-returning its referenced tuples.
• Gather Merge (respect sorted tuples)
• query text of parallel worker now in
pg_stat_activity (debug crashes, etc.)
Slide 35
Slide 35 text
parallel joins
• 9.6
• hash joins
• nested loops
35
• 10
• hash joins
• nested loops
• merge joins
Opportunity for improvement:
work on the inner side of the join is duplicated by every participant
Slide 36
Slide 36 text
bibliography
• Parallelism Progress by Robert
Haas, 2013-10
• Parallel Query (in 9.6) wiki
• Parallel Query v2 by Robert Haas,
2017-03
• New in postgres 10 wiki, section
“Additional Parallelism”
36
• Postgres manual
• Chapter 15 Parallel Query,
version 9.6 and version 10
• Chapter 45/47 Background
Worker Processes
• Chapter 19.4 Resource
Consumption
Slide 37
Slide 37 text
Morocco
37
Slide 38
Slide 38 text
statistics
• Statistics = count & content of values in rows for a column
• most/least frequent values, value cardinality, rudimentary histograms
• Postgres planner ( cost-estimation engine )
• Smarter plan = faster query = less resource intense
• Automatically collected: VACUUM, ANALYZE, CREATE INDEX, etc.
• per-individual-column
• New in 10: multi-column
38
Slide 39
Slide 39 text
functional dependency
• functional dependency ( from database normalization )
Ex: EmpInDept table | EmpId & EmpName & DeptId & DeptName
Ex: Person table | BirthDate & Age & IsMinor
Non-ex: Person table | HairColor & ShoeSize & FavoriteSong
• de-normalized structure ( intentional or unwitting )
• partial functional dependency
ex: Zip Codes almost always determine City
➠ Postgres planner won’t know that until you tell it to look
39
Slide 40
Slide 40 text
cross-column statistics
• Tell Postgres when columns are kinda-sorta related:
CREATE STATISTICS zip_stats_ ( dependencies )
ON zip_ , city_ FROM zipcodes_ ;
• Syntax
• CREATE STATISTICS [ IF NOT EXISTS ] name
[ type = dependencies or ndistinct or omitted=all ]
ON colx , coly , …
FROM tbl
• ALTER STATISTICS name
OWNER TO … | RENAME TO … | SET SCHEMA …
• DROP STATISTICS [ IF EXISTS ] name
40
= functional dependency
calculation, 1.0 = absolute
= distinct combos,
for GROUP BY
bibliography
• PG Phriday: Crazy Correlated Column Crusade by Shaun Thomas,
2ndQuadrant.com, 2017-07
• Manual, Chapter 14.2 Statistics Used by the Planner
• Manual, Chapter 68 How the Planner Uses Statistics
• Wikipedia: Functional dependency
• Postgres wiki, Cross Columns Stats
42
Slide 43
Slide 43 text
43
Slide 44
Slide 44 text
44
Slide 45
Slide 45 text
45
Slide 46
Slide 46 text
46
Slide 47
Slide 47 text
47
Slide 48
Slide 48 text
48
Slide 49
Slide 49 text
49
Piet Mondrian
Slide 50
Slide 50 text
50
Yves Saint Laurent
Slide 51
Slide 51 text
51
Slide 52
Slide 52 text
collation
• deciding the order of things, rules for sorting
• diacriticals: e è é ê ë
• Swedish: z < ö German: ö < z
• whitespace, punctuation, case
• Ex: resume, Resume, RESUME, résumé, rèsumè, Résumé, RÉSUMÉ, …
52
Slide 53
Slide 53 text
collation in Postgres
• Postgres < 10 = host OS rules ( C library )
• passing strings to strcmp(), strcoll(), and the like
• crude, not world-savvy ☹
• behavior varies ☹
• index lookup breaks with changes ( data corruption & apparent data loss ) ☹
• Postgres 10
• Unicode Collation Algorithm ( UCA ) ( 79-page spec )
• International Components for Unicode ( ICU ) ( implementation of UCA )
53
Slide 54
Slide 54 text
ICU
• adopting ICU implementation
• ICU = International Components for Unicode
• very complete with deep coverage, constantly updated
• well-worn: from Taligent/IBM, to Java i18n, to C/C++ (ICU4C), to ICU4J
• provides many Unicode-savvy text services to OSes & environments
• stable (versioned, no compatibility break in minor updates)
• great move for Postgres team | binary linked to major version of libicu
54
Slide 55
Slide 55 text
using ICU
• Build for ICU
• from source, use ./configure --with-icu with options
• Get old libc collations too
• Append -x-icu for ICU collation, omit for old libc collation
• CREATE TABLE ... (... x text COLLATE "en_US" ...)
CREATE TABLE ... (... x text COLLATE "en_US-x-icu" ...)
• Prefix (locale name) is the usual standard Language, Script, Country, Variant.
• But not a POSIX locale ID
55
Slide 56
Slide 56 text
another example of ICU
• More variants available ☻
• …even emoji !
• … VALUES
('Göbel'), ('Goethe'), ('Goldmann'), ('Göthe'), ('Götz')
• SELECT … ORDER BY name COLLATE "de-u-co-standard-x-icu"
SELECT … ORDER BY name COLLATE "de-u-co-phonebk-x-icu"
SELECT … ORDER BY name COLLATE "de-AT-u-co-phonebk-x-icu"
56
Slide 57
Slide 57 text
side-benefit: fix for Abbreviated Keys
• Postgres 9.5 new feature: Abbreviated Keys
• Faster sorts & indexing for text/varchar columns by using an algorithm
called "abbreviated keys". Up to 20x.
• Ruined
• buggy implementations of strcoll() in glibc ( the C standard library
common on Linux systems ) not match strxfrm()
( see Postgres wiki )
• Disabled in 9.5.2
• May be good again with ICU collation ??
57
Slide 58
Slide 58 text
more benefits
• Possibly case-insensitive & accent-insensitive in Pg 11/12
• Now can detect and warn of major change to version of collation. Might
refine and automate in future.
58
Slide 59
Slide 59 text
bibliography
• More robust collations with ICU support in PostgreSQL 10 by Peter
Eisentraut, 2ndQuadrant.com, 2017-05
• Wikipedia: Unicode collation algorithm ( UCA ) ( spec, 79 pages )
• Wikipedia: International Components for Unicode ( implementation of UCA )
• Manual, Chapter 23.2 Collation Support ( 23.2.2.2.2. ICU collations )
• Locale naming: ICU User Guide – Locale
59
Slide 60
Slide 60 text
60
damn
corvids
Slide 61
Slide 61 text
logical replication
• Previously, physical replication
• entire cluster copied from primary server to replica server
• Impossible to copy over just one database, or just one table
• Now possible
• Replicate across versions of Postgres
• Replicate by table
• lite coverage here, as it gets so much press elsewhere
61
Slide 62
Slide 62 text
physical vs logical
• physical = uses exact block addresses and byte-by-byte replication
• standby server = bit for bit copy of primary server
• a.k.a. streaming replication
• logical = replicate data objects & changes, based on identity ( primary key )
• Pub-Sub – publish and subscribe model – CREATE PUBLICATION
• subscribers pull data from pubs they subscribe
• cascade onwards
62
Slide 63
Slide 63 text
bibliography
• Manual, Chapter 31, Logical Replication
• Logical Replication in PostgreSQL 10 by Petr Jelinek, 2ndQuadrant.com,
2017-04
63
Slide 64
Slide 64 text
64
Slide 65
Slide 65 text
TIMESTAMP simpler
• no more floating-point timestamp
• archaic implementation of TIMESTAMP data type
• good riddance
• manual page easier to read now
• By the way, also dropping:
• original client/server protocol (superceded in 1998)
• pg_dump support for Postgres 7.4 and earlier
65
Slide 66
Slide 66 text
renaming/renumbering
• Postgres numbering: Major.Minor
• Simpler. Save wasted time (8 vs 9 vs 10 debates)
• Renaming for Write-Ahead Log
• ‘xlog’ changed to ‘wal’
• 2 directories, 12 admin functions, 8 system views/functions, 6 executables
• Implications for scripts and automation tools
66
Slide 67
Slide 67 text
passwords
• passwords always encrypted
no more cleartext
• SCRAM authentication
Salted Challenge Response
Authentication Mechanism
• encryption on client
• only in libc for now
• Postgres 10 highlight - SCRAM
authentication by Michael Paquier
67
authentication
external internal
Kerberos
SSPI
TLS/SSL
cleartext
salt-hash-MD5
SCRAM-SHA-256
client server
CA certificate
+ knowledge of
salted hash of pw
salted hash of pw
+ challenge
pw salted
with PBKDF2
MiM
Slide 68
Slide 68 text
row-level security
• Old
• permissive policies ( combined as OR ) ( default )
➡ any matching policy allowed access
• New
• restrictive policies ( combined as AND )
➡ must match for access to be granted
• may combine permissive + restrictive
• CREATE POLICY name ON table_name
[ AS { PERMISSIVE | RESTRICTIVE } ] …
68
Slide 69
Slide 69 text
general bibliography
• New Features Coming in PostgreSQL 10 by Robert Haas, EnterpriseDB.com,
2017-04
• Postgres wiki, New in postgres 10
• Release Notes
69