Slide 1

Slide 1 text

“UPSERT” in PostgreSQL Peter Geoghegan Pronounced / e n/ ˈɡ ɪɡə @petervgeoghegan SFPUG Tuesday, May 26, 2015

Slide 2

Slide 2 text

What is “UPSERT”? ● INSERT row, or UPDATE existing one to reconcile it with what you wanted to INSERT. Always one or the other of those two. No “ifs”, no “buts”. ● Coming in PostgreSQL 9.5. ● 9.5 also adds a DO NOTHING variant that never locks row. MySQL calls this INSERT IGNORE. ● You more or less don't have to worry about concurrency/race conditions with the feature.

Slide 3

Slide 3 text

What is “UPSERT”? (cont.) ● “Fundamental UPSERT property”. At READ COMMITTED isolation level, you should always get an insert or update. – No unprincipled deadlocking. – No spurious unique constraint violations. ● Only way to really ensure this is to make it driven by insert. Take alternative path (i.e., go update existing row) when would-be duplicate violation detected. ● Particularly useful for OLTP + web apps. Great for ETL, too.

Slide 4

Slide 4 text

Syntax ­­ “UPSERT” some distributors: INSERT INTO distributors (did, dname) VALUES (5, 'Gizmo transglobal'), (6, 'Associated Computing, inc') ON CONFLICT (did) DO UPDATE SET dname = EXCLUDED.dname; ­­ “INSERT IGNORE” variant: INSERT INTO distributors (did, dname) VALUES (7, 'Redline GmbH') ON CONFLICT (did) DO NOTHING; ­­ Consider not updating existing distributor: INSERT INTO distributors (did, dname)AS d VALUES (8, 'Anvil Distribution') ON CONFLICT (did) DO UPDATE SET dname = EXCLUDED.dname; WHERE d.zipcode != '21201';

Slide 5

Slide 5 text

EXCLUDED.* pseudo table ­­ “UPSERT” some distributors: INSERT INTO distributors (did, dname) VALUES (5, 'Gizmo transglobal'), (6, 'Associated Computing, inc') ON CONFLICT (did) DO UPDATE SET dname = EXCLUDED.dname; ­­ “INSERT IGNORE” variant: INSERT INTO distributors (did, dname) VALUES (7, 'Redline GmbH') ON CONFLICT (did) DO NOTHING; ­­ Consider not updating existing distributor: INSERT INTO distributors (did, dname)AS d VALUES (8, 'Anvil Distribution') ON CONFLICT (did) DO UPDATE SET dname = EXCLUDED.dname; WHERE d.zipcode != '21201';

Slide 6

Slide 6 text

Inference Specification ­­ “UPSERT” some distributors: INSERT INTO distributors (did, dname) VALUES (5, 'Gizmo transglobal'), (6, 'Associated Computing, inc') ON CONFLICT (did) DO UPDATE SET dname = EXCLUDED.dname; ­­ “INSERT IGNORE” variant: INSERT INTO distributors (did, dname) VALUES (7, 'Redline GmbH') ON CONFLICT (did) DO NOTHING; ­­ Consider not updating existing distributor: INSERT INTO distributors (did, dname)AS d VALUES (8, 'Anvil Distribution') ON CONFLICT (did) DO UPDATE SET dname = EXCLUDED.dname; WHERE d.zipcode != '21201';

Slide 7

Slide 7 text

More on inference ­­ “UPSERT” some distributors: INSERT INTO distributors (did, dname) VALUES (5, 'Gizmo transglobal'), (6, 'Associated Computing, inc') ON CONFLICT (did) DO UPDATE SET dname = EXCLUDED.dname; ● Must specify which unique index to take update action on. ● Prefer to do this with inference clause, naming relevant column (or columns, or expressions, ...). ● Alternatively, ON CONFLICT can name constraint directly, but avoid it (Okay for exclusion constraints, but not unique constraints). ● Very flexible and forgiving. Borrows stuff from CREATE INDEX. ● Optimizer throws error if it cannot find satisfactory unique index.

Slide 8

Slide 8 text

EXPLAIN 'ing an UPSERT postgres=# EXPLAIN INSERT INTO upsert (key, val) VALUES(1, 'Baz') ON CONFLICT (key) DO UPDATE SET val = EXCLUDED.val WHERE EXCLUDED.val <> 'Contrived'; QUERY PLAN ­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­ Insert on upsert (cost=0.00..0.01 rows=1 width=0) Conflict Resolution: UPDATE Conflict Arbiter Indexes: upsert_pkey Conflict Filter: (excluded.val <> 'Contrived'::text) ­> Result (cost=0.00..0.01 rows=1 width=0) (5 rows)

Slide 9

Slide 9 text

Why not SQL MERGE? ● SQL standard MERGE statement implemented by a few other systems. ● Synchronizes two tables, inserting, updating, and deleting as it goes. Individual MERGE statement may lack WHEN NOT MATCHED THEN INSERT “handler”. In contrast, Postgres UPSERT is “driven by INSERT”. ● Source table could be VALUES(), so appears UPSERT-like. ● Lacks “fundamental UPSERT property”, because can (for example) raise duplicate violations for simple UPSERT cases. ● ON CONFLICT DO UPDATE has some amount of MERGE-like flexibility. ● Idea that MERGE is equivalent to UPSERT is unfortunate myth.

Slide 10

Slide 10 text

What feature needed to work with (incomplete list) ● Data-modifying common table expressions (WITH clauses). ● And relatedly, RETURNING (projects updated rows, too). ● Exclusion constraints (for DO NOTHING only). ● Updatable views. ● postgres_fdw (and foreign data wrappers generally). ● And so on... – New row-level security stuff. – Older column privileges stuff. – User tables named “excluded”! – Inheritance. – Subqueries in nested SET list, and nested WHERE clause. ● Simple. ● Correlated. – Triggers. – Rules. – Partial indexes (for inference clause). – Alternative collations, non-default opclasses in inference clause (purely theoretical requirement).

Slide 11

Slide 11 text

Summary ● UPSERT has very flexible syntax – far more flexible (and far less error prone) than comparable MySQL feature, for example. ● Complicated to implement, but not complicated to use. Involved small revision to chapter in documentation on MVCC rules, for example. ● Very useful for ETL, too. Can resolve to not update based on what is found to already exist, so some MERGE-like capabilities here. ● Unlike MERGE, has no race conditions. No duplicate violations from arbiter unique index. No “unprincipled deadlocks”. ● However, unlike MERGE, inserting is one possible outcome that you must always be happy with.