Upgrade to Pro — share decks privately, control downloads, hide ads and more …

10 in 10: Ten new things in Postgres 10

10 in 10: Ten new things in Postgres 10

Ten of my favorite new things in Postgres 10.

- Replacement for SERIAL (identity column)
- Traceable COMMIT
- Parallel query
- Cross-column statistics
- New collation (sorting)
- More XML
- logical replication
- Simplifying timestamps
- Renaming & renumbering
- Security tweaks

We spend a chunk of time explaining the surprisingly troublesome nature of the Postgres-specific `SERIAL` and how it is now gladly replaced by SQL-standard identity columns. Another chunk of time is spent on the practicalities of determining if a transaction completed successfully or not, now made easier with traceable COMMIT. Then we accelerate moving faster and faster through the remaining items, looking at the nooks and crannies of new features that I found useful for my own work and that you may not have yet discovered.

Updated 2018-01.

By Basil Bourque.
LinkedIn: basilbourque

Basil Bourque

January 31, 2018
Tweet

More Decks by Basil Bourque

Other Decks in Programming

Transcript

  1. baker’s dozen • replacement for SERIAL • traceable COMMIT •

    parallel query • cross-column statistics • new collation (sorting) • more XML 2 • logical replication • simplifying timestamps • renaming & renumbering • security tweaks
  2. caveat • I am no expert on these topics •

    Struck my fancy • I dove in, studied, played a bit • Take what I say with grain of salt: See bibliographies for straight scoop • Final shipping version of Postgres may vary 3
  3. primary key 4 natural key (from your data) surrogate key

    (extra column, artificial value) sequential numbers (1, 2, 3, …) UUID (128-bit value) SERIAL
  4. SMALLSERIAL | SERIAL | BIGSERIAL 5 CREATE TABLE tbl (

    col SERIAL PRIMARY KEY, col2 VARCHAR( 80 ) , ) ; INSERT INTO tbl ( col2 ) VALUES ('Basil') RETURNING col ; • See doc • Defines a SEQUENCE • Marks col NOT NULL • Default col to sequence • 16-bit, 32-bit, 64-bit Get back your number
  5. SERIAL pseudo-type • type erasure 7 CREATE TABLE tbl (

    col SERIAL ) ; CREATE SEQUENCE tbl_col_seq ; CREATE TABLE tbl ( col INT NOT NULL DEFAULT nextval( 'tbl_col_seq' ) , col2 VARCHAR(20) ) ; ALTER SEQUENCE tbl_col_seq OWNED BY tbl.col ;
  6. sequence permission separated • CREATE USER & GRANT INSERT •

    INSERT INTO tbl ( col2 ) VALUES ( 'whatever' ) ;
 ERROR: permission denied for sequence tbl_col_seq • solution:
 GRANT USAGE ON SEQUENCE tbl_col_seq ; 8 name ?
  7. sequence name needed for mgmt • That mysterious name needed

    for managing value • ALTER SEQUENCE tbl_col_seq RESTART WITH 1000 ; 9 name ?
  8. dropping SERIAL is tricky • Determine name of sequence •

    Remember to add CASCADE to drop command to clear DEFAULT • DROP SEQUENCE tbl_col_seq CASCADE ; • Beware:
 ALTER TABLE tbl ALTER COLUMN col DROP DEFAULT ;
 … leaves sequence in place
 10 ☹ ☻
  9. CREATE TABLE / LIKE incorrect • CREATE TABLE tbl2 (

    LIKE tbl INCLUDING ALL ) ; • Good: Copies the DEFAULT with SEQUENCE • Bad: Uses the same sequence for both tables (!) • Drop first table fails – “other objects depend on it” • DROP TABLE tbl ; -- ERROR • Inherited table is similar
 DEFAULT is inherited but the sequence ownership is not 11
  10. replacement for SERIAL • Identity column – GENERATED … AS

    IDENTITY ( … ) • creates implicit sequence, hidden from us, with optional override • GENERATED ALWAYS AS IDENTITY 
 GENERATED BY DEFAULT AS IDENTITY • ALWAYS ignores user-provided value
 unless INSERT … OVERRIDING SYSTEM VALUE 12 CREATE TABLE tbl ( col BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY , col2 text ) ;
  11. benefits of GENERATED…AS IDENTITY • No need to manage DEFAULT


    ALTER TABLE tbl ALTER COLUMN col DROP DEFAULT;
 ERROR: column "col" of relation "tbl" is an identity column
 HINT: Use ALTER TABLE ... ALTER COLUMN ... DROP IDENTITY instead. • CREATE TABLE … LIKE gets its own identity sequence ( if identity included ) • INCLUDING IDENTITY | EXCLUDING IDENTITY 13 • Standard SQL. Feature # T174 & F386 & T178 ( first SQL: 2003, clarified in SQL:2008 ) • 'generated' quality is remembered ( no ‘type erasure’ ) • No permissions issue, as sequence implicit/internal • No name involved – simply alter the column
 ALTER TABLE tbl 
 ALTER COLUMN col 
 RESTART WITH 1000 ; • Simple DROP TABLE drops sequence
  12. GENERATED … AS IDENTITY behavior • INSERT INTO command remains

    the same – no change to existing code • One per table, maximum • Type of sequence matches column implicitly ( 16-, 32-, 64-bit integer ) • Manual adjustments • Adding identity column does not generate values for existing rows • Sequence continues ( not reset ) after TRUNCATE TABLE 14
  13. takes options of CREATE SEQUENCE • START WITH start •

    MINVALUE minvalue | NO MINVALUE • MAXVALUE maxvalue | NO MAXVALUE • INCREMENT [ BY ] increment • CYCLE | NO CYCLE • CACHE cache • OWNED BY tbl.col | OWNED BY NONE 15
  14. silly example 16 id_ INTEGER GENERATED ALWAYS AS IDENTITY (

    START WITH 200 MINVALUE 100 MAXVALUE 205 CYCLE INCREMENT BY 3 ) PRIMARY KEY Adding 4 rows: 200 203 100 103
  15. accessing the backing sequence • Call existing function: pg_get_serial_sequence •

    Example… Return the value most recently obtained by nextval for this sequence in the current session: 17 SELECT currval ( pg_get_serial_sequence( 'tbl' , 'col' ) ) ;
  16. moving to GENERATED … AS IDENTITY • See documentation for:

    • CREATE TABLE • CREATE SEQUENCE ( for options ) • Begin using on new tables • Convert old tables using PL/pgSQL function by Peter Eisentraut • Please test ! 18
  17. still need PRIMARY KEY • GENERATED … AS IDENTITY •

    NOT NULL • CREATE SEQUENCE • options 
 (start, min/max, etc.) • DEFAULT nextval • tied to column
 (no separate management) 19 • PRIMARY KEY • UNIQUE • indexed • relationship 
 ( foreign key & JOIN ) CREATE TABLE tbl ( col BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY , col2 text ) ;
  18. bibliography • PostgreSQL 10 identity columns explained by Peter Eisentraut

    • Waiting for PostgreSQL 10 – Identity Columns by depesz • SQL:2011 standard (draft)
 4.15.11 Identity columns
 4.22 Sequence generators
 11.20 <alter identity column specification> 20
  19. traceable COMMIT • BEGIN … some work … COMMIT …

    lost connection/crash • Did COMMIT command reach server? Did it succeed? • txid_status( bigint ) function, new • look-up a recent transaction-id • committed or aborted, due to crash or rollback • don’t wait too long • lighter alternative to two-phase commit • may be useful in other situations 22
  20. capture txn id • Need to remember the transaction id

    during the transaction, to ask after • txid_current() 
 txid_current_if_assigned() -- new command • queries for txid that may not exist (until real work) • 64-bit xid-with-epoch = 32-bit epoch counter + 32-bit xid • epoch defaults to zero, in my trial • INSERT INTO ...; SELECT txid_current();
 INSERT INTO ... RETURNING txid_current(); 23
  21. result • txid_status( bigint ) → txid_status • data type:

    txid_status
 effectively: text • domain: committed | aborted | in progress | NULL • NULL = no references to that transaction survive in the system, and the commit status information has been discarded • “This happens once all tuples associated with that transaction and all older transactions have been deleted or updated to newer versions and vacuum has cleaned up the old transaction history information.” – Craig Ringer 24
  22. if ‘in progress’ • Either: • wait • terminate •

    look up the associated backend in pg_stat_activity • call pg_terminate_backend( … ) 25
  23. 2PC • Does not fully substitute for two-phase commit •

    knowledge of transaction status is eventually discarded • you cannot actually ROLLBACK a transaction once COMMITted • does not plug into txn managers ( JTA, XA, MS-DTC ) 26
  24. bibliography • Traceable commit for PostgreSQL 10 by Craig Ringer,

    on 2ndQuadrant.com, 2017-04-05 • Email thread: [PATCH] Transaction traceability - txid_status(bigint) • Doc page 9.25 System Information Functions ( Table 9.69 ) 27
  25. parallel query • Split queries into multiple tasks, perform in

    parallel on multiple cores • Parallel Query new in 9.6, enhanced in 10 • Faster especially for lots of data returning few rows, and for aggregates • 2x-4x faster (maybe) • Great boost for some queries, but irrelevant for many ( no magic ☹ ) • Some limitations may ease-up in future development (see daggers) 29 2x 4x
  26. Terminology • Gather node • Task being divided, the query.

    Dispatcher, gathering workers’ results. • Background worker process • Extend Postgres to run user-supplied code in separate processes • Access shared memory • Make libpq connections to server • Another Postgres process (native in OS), like a user-session process 30
  27. parallel query • 9.6 • OFF by default 31 •

    10 • ON by default max_worker_processes = 8 max_parallel_workers_per_gather = 0 max_worker_processes = 8 max_parallel_workers = 8 max_parallel_workers_per_gather = 2 Use EXPLAIN to see number of workers chosen by planner
  28. conditions required by planning • Multi-user mode (not Single-user) •

    max_parallel_workers_per_gather > 0 • dynamic_shared_memory_type ≠ none • Read-only = No writing, no locking, in top-level or CTE † • no suspension during execution ( ex: DECLARE CURSOR, PL/pgSQL loop ) • no function marked PARALLEL UNSAFE (user-defined functions by default) • not nested in a parallel plan †? • transaction isolation level = SERIALIZABLE † 32
  29. planned but not executed ☹ • no background workers can

    be obtained • client sends an Execute message with a non-zero fetch count
 (not possible in libpq connections) (otherwise, disable p-q to avoid plan) • prepared statement with CREATE TABLE … AS EXECUTE … (read-write) • transaction isolation level = SERIALIZABLE † (if changed after plan) 33
  30. parallel scans • 9.6 • Sequential scan
 Each table block

    handed out one-by-one to workers. • Background worker processes • Dynamic background workers • Dynamic shared memory 34 • 10 • Sequential scan • Bitmap heap scan
 Leader process scans index(es), builds bitmap of blocks to visit, then hands out each table block to workers. • Btree index scan † 
 Workers take turns reading index, for each index block scanning-sorting-returning its referenced tuples. • Gather Merge (respect sorted tuples) • query text of parallel worker now in pg_stat_activity (debug crashes, etc.)
  31. parallel joins • 9.6 • hash joins • nested loops

    35 • 10 • hash joins • nested loops • merge joins Opportunity for improvement:
 work on the inner side of the join is duplicated by every participant
  32. bibliography • Parallelism Progress by Robert Haas, 2013-10 • Parallel

    Query (in 9.6) wiki • Parallel Query v2 by Robert Haas, 2017-03 • New in postgres 10 wiki, section “Additional Parallelism” 36 • Postgres manual • Chapter 15 Parallel Query, version 9.6 and version 10 • Chapter 45/47 Background Worker Processes • Chapter 19.4 Resource Consumption
  33. statistics • Statistics = count & content of values in

    rows for a column • most/least frequent values, value cardinality, rudimentary histograms • Postgres planner ( cost-estimation engine ) • Smarter plan = faster query = less resource intense • Automatically collected: VACUUM, ANALYZE, CREATE INDEX, etc. • per-individual-column • New in 10: multi-column 38
  34. functional dependency • functional dependency ( from database normalization )


    
 Ex: EmpInDept table | EmpId & EmpName & DeptId & DeptName
 Ex: Person table | BirthDate & Age & IsMinor
 Non-ex: Person table | HairColor & ShoeSize & FavoriteSong • de-normalized structure ( intentional or unwitting ) • partial functional dependency
 ex: Zip Codes almost always determine City
 ➠ Postgres planner won’t know that until you tell it to look 39
  35. cross-column statistics • Tell Postgres when columns are kinda-sorta related:

    
 CREATE STATISTICS zip_stats_ ( dependencies ) 
 ON zip_ , city_ FROM zipcodes_ ; • Syntax • CREATE STATISTICS [ IF NOT EXISTS ] name
 [ type = dependencies or ndistinct or omitted=all ]
 ON colx , coly , …
 FROM tbl • ALTER STATISTICS name 
 OWNER TO … | RENAME TO … | SET SCHEMA … • DROP STATISTICS [ IF EXISTS ] name 40 = functional dependency calculation, 1.0 = absolute = distinct combos, 
 for GROUP BY
  36. bibliography • PG Phriday: Crazy Correlated Column Crusade by Shaun

    Thomas, 2ndQuadrant.com, 2017-07 • Manual, Chapter 14.2 Statistics Used by the Planner • Manual, Chapter 68 How the Planner Uses Statistics • Wikipedia: Functional dependency • Postgres wiki, Cross Columns Stats 42
  37. 43

  38. 44

  39. 45

  40. 46

  41. 47

  42. 48

  43. 51

  44. collation • deciding the order of things, rules for sorting

    • diacriticals: e è é ê ë • Swedish: z < ö German: ö < z • whitespace, punctuation, case • Ex: resume, Resume, RESUME, résumé, rèsumè, Résumé, RÉSUMÉ, … 52
  45. collation in Postgres • Postgres < 10 = host OS

    rules ( C library ) • passing strings to strcmp(), strcoll(), and the like • crude, not world-savvy ☹ • behavior varies ☹ • index lookup breaks with changes ( data corruption & apparent data loss ) ☹ • Postgres 10 • Unicode Collation Algorithm ( UCA ) ( 79-page spec ) • International Components for Unicode ( ICU ) ( implementation of UCA ) 53
  46. ICU • adopting ICU implementation • ICU = International Components

    for Unicode • very complete with deep coverage, constantly updated • well-worn: from Taligent/IBM, to Java i18n, to C/C++ (ICU4C), to ICU4J • provides many Unicode-savvy text services to OSes & environments • stable (versioned, no compatibility break in minor updates) • great move for Postgres team | binary linked to major version of libicu 54
  47. using ICU • Build for ICU • from source, use

    ./configure --with-icu with options • Get old libc collations too • Append -x-icu for ICU collation, omit for old libc collation • CREATE TABLE ... (... x text COLLATE "en_US" ...)
 CREATE TABLE ... (... x text COLLATE "en_US-x-icu" ...) • Prefix (locale name) is the usual standard Language, Script, Country, Variant. • But not a POSIX locale ID 55
  48. another example of ICU • More variants available ☻ •

    …even emoji ! • … VALUES 
 ('Göbel'), ('Goethe'), ('Goldmann'), ('Göthe'), ('Götz') • SELECT … ORDER BY name COLLATE "de-u-co-standard-x-icu"
 SELECT … ORDER BY name COLLATE "de-u-co-phonebk-x-icu"
 SELECT … ORDER BY name COLLATE "de-AT-u-co-phonebk-x-icu" 56
  49. side-benefit: fix for Abbreviated Keys • Postgres 9.5 new feature:

    Abbreviated Keys • Faster sorts & indexing for text/varchar columns by using an algorithm called "abbreviated keys". Up to 20x. • Ruined • buggy implementations of strcoll() in glibc ( the C standard library common on Linux systems ) not match strxfrm() 
 ( see Postgres wiki ) • Disabled in 9.5.2 • May be good again with ICU collation ?? 57
  50. more benefits • Possibly case-insensitive & accent-insensitive in Pg 11/12

    • Now can detect and warn of major change to version of collation. Might refine and automate in future. 58
  51. bibliography • More robust collations with ICU support in PostgreSQL

    10 by Peter Eisentraut, 2ndQuadrant.com, 2017-05 • Wikipedia: Unicode collation algorithm ( UCA ) ( spec, 79 pages ) • Wikipedia: International Components for Unicode ( implementation of UCA ) • Manual, Chapter 23.2 Collation Support ( 23.2.2.2.2. ICU collations ) • Locale naming: ICU User Guide – Locale 59
  52. logical replication • Previously, physical replication • entire cluster copied

    from primary server to replica server • Impossible to copy over just one database, or just one table • Now possible • Replicate across versions of Postgres • Replicate by table • lite coverage here, as it gets so much press elsewhere 61
  53. physical vs logical • physical = uses exact block addresses

    and byte-by-byte replication • standby server = bit for bit copy of primary server • a.k.a. streaming replication • logical = replicate data objects & changes, based on identity ( primary key ) • Pub-Sub – publish and subscribe model – CREATE PUBLICATION • subscribers pull data from pubs they subscribe • cascade onwards 62
  54. bibliography • Manual, Chapter 31, Logical Replication • Logical Replication

    in PostgreSQL 10 by Petr Jelinek, 2ndQuadrant.com, 2017-04 63
  55. 64

  56. TIMESTAMP simpler • no more floating-point timestamp • archaic implementation

    of TIMESTAMP data type • good riddance • manual page easier to read now • By the way, also dropping: • original client/server protocol (superceded in 1998) • pg_dump support for Postgres 7.4 and earlier 65
  57. renaming/renumbering • Postgres numbering: Major.Minor • Simpler. Save wasted time

    (8 vs 9 vs 10 debates) • Renaming for Write-Ahead Log • ‘xlog’ changed to ‘wal’ • 2 directories, 12 admin functions, 8 system views/functions, 6 executables • Implications for scripts and automation tools 66
  58. passwords • passwords always encrypted
 no more cleartext • SCRAM

    authentication
 Salted Challenge Response Authentication Mechanism • encryption on client • only in libc for now • Postgres 10 highlight - SCRAM authentication by Michael Paquier 67 authentication external internal Kerberos SSPI TLS/SSL cleartext salt-hash-MD5 SCRAM-SHA-256 client server CA certificate + knowledge of salted hash of pw salted hash of pw + challenge pw salted with PBKDF2 MiM
  59. row-level security • Old • permissive policies ( combined as

    OR ) ( default )
 ➡ any matching policy allowed access • New • restrictive policies ( combined as AND )
 ➡ must match for access to be granted • may combine permissive + restrictive • CREATE POLICY name ON table_name
 [ AS { PERMISSIVE | RESTRICTIVE } ] … 68
  60. general bibliography • New Features Coming in PostgreSQL 10 by

    Robert Haas, EnterpriseDB.com, 2017-04 • Postgres wiki, New in postgres 10 • Release Notes 69