Upgrade to Pro — share decks privately, control downloads, hide ads and more …

"UPSERT" in PostgreSQL

"UPSERT" in PostgreSQL

Small talk covering the "UPSERT" feature coming in PostgreSQL 9.5.

https://www.youtube.com/watch?v=pbg97bkxbbY

Peter Geoghegan

May 26, 2015
Tweet

More Decks by Peter Geoghegan

Other Decks in Programming

Transcript

  1. “UPSERT” in PostgreSQL Peter Geoghegan Pronounced / e n/ ˈɡ

    ɪɡə @petervgeoghegan SFPUG Tuesday, May 26, 2015
  2. What is “UPSERT”? • INSERT row, or UPDATE existing one

    to reconcile it with what you wanted to INSERT. Always one or the other of those two. No “ifs”, no “buts”. • Coming in PostgreSQL 9.5. • 9.5 also adds a DO NOTHING variant that never locks row. MySQL calls this INSERT IGNORE. • You more or less don't have to worry about concurrency/race conditions with the feature.
  3. What is “UPSERT”? (cont.) • “Fundamental UPSERT property”. At READ

    COMMITTED isolation level, you should always get an insert or update. – No unprincipled deadlocking. – No spurious unique constraint violations. • Only way to really ensure this is to make it driven by insert. Take alternative path (i.e., go update existing row) when would-be duplicate violation detected. • Particularly useful for OLTP + web apps. Great for ETL, too.
  4. Syntax ­­ “UPSERT” some distributors: INSERT INTO distributors (did, dname)

    VALUES (5, 'Gizmo transglobal'), (6, 'Associated Computing, inc') ON CONFLICT (did) DO UPDATE SET dname = EXCLUDED.dname; ­­ “INSERT IGNORE” variant: INSERT INTO distributors (did, dname) VALUES (7, 'Redline GmbH') ON CONFLICT (did) DO NOTHING; ­­ Consider not updating existing distributor: INSERT INTO distributors (did, dname)AS d VALUES (8, 'Anvil Distribution') ON CONFLICT (did) DO UPDATE SET dname = EXCLUDED.dname; WHERE d.zipcode != '21201';
  5. EXCLUDED.* pseudo table ­­ “UPSERT” some distributors: INSERT INTO distributors

    (did, dname) VALUES (5, 'Gizmo transglobal'), (6, 'Associated Computing, inc') ON CONFLICT (did) DO UPDATE SET dname = EXCLUDED.dname; ­­ “INSERT IGNORE” variant: INSERT INTO distributors (did, dname) VALUES (7, 'Redline GmbH') ON CONFLICT (did) DO NOTHING; ­­ Consider not updating existing distributor: INSERT INTO distributors (did, dname)AS d VALUES (8, 'Anvil Distribution') ON CONFLICT (did) DO UPDATE SET dname = EXCLUDED.dname; WHERE d.zipcode != '21201';
  6. Inference Specification ­­ “UPSERT” some distributors: INSERT INTO distributors (did,

    dname) VALUES (5, 'Gizmo transglobal'), (6, 'Associated Computing, inc') ON CONFLICT (did) DO UPDATE SET dname = EXCLUDED.dname; ­­ “INSERT IGNORE” variant: INSERT INTO distributors (did, dname) VALUES (7, 'Redline GmbH') ON CONFLICT (did) DO NOTHING; ­­ Consider not updating existing distributor: INSERT INTO distributors (did, dname)AS d VALUES (8, 'Anvil Distribution') ON CONFLICT (did) DO UPDATE SET dname = EXCLUDED.dname; WHERE d.zipcode != '21201';
  7. More on inference ­­ “UPSERT” some distributors: INSERT INTO distributors

    (did, dname) VALUES (5, 'Gizmo transglobal'), (6, 'Associated Computing, inc') ON CONFLICT (did) DO UPDATE SET dname = EXCLUDED.dname; • Must specify which unique index to take update action on. • Prefer to do this with inference clause, naming relevant column (or columns, or expressions, ...). • Alternatively, ON CONFLICT can name constraint directly, but avoid it (Okay for exclusion constraints, but not unique constraints). • Very flexible and forgiving. Borrows stuff from CREATE INDEX. • Optimizer throws error if it cannot find satisfactory unique index.
  8. EXPLAIN 'ing an UPSERT postgres=# EXPLAIN INSERT INTO upsert (key,

    val) VALUES(1, 'Baz') ON CONFLICT (key) DO UPDATE SET val = EXCLUDED.val WHERE EXCLUDED.val <> 'Contrived'; QUERY PLAN ­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­ Insert on upsert (cost=0.00..0.01 rows=1 width=0) Conflict Resolution: UPDATE Conflict Arbiter Indexes: upsert_pkey Conflict Filter: (excluded.val <> 'Contrived'::text) ­> Result (cost=0.00..0.01 rows=1 width=0) (5 rows)
  9. Why not SQL MERGE? • SQL standard MERGE statement implemented

    by a few other systems. • Synchronizes two tables, inserting, updating, and deleting as it goes. Individual MERGE statement may lack WHEN NOT MATCHED THEN INSERT “handler”. In contrast, Postgres UPSERT is “driven by INSERT”. • Source table could be VALUES(), so appears UPSERT-like. • Lacks “fundamental UPSERT property”, because can (for example) raise duplicate violations for simple UPSERT cases. • ON CONFLICT DO UPDATE has some amount of MERGE-like flexibility. • Idea that MERGE is equivalent to UPSERT is unfortunate myth.
  10. What feature needed to work with (incomplete list) • Data-modifying

    common table expressions (WITH clauses). • And relatedly, RETURNING (projects updated rows, too). • Exclusion constraints (for DO NOTHING only). • Updatable views. • postgres_fdw (and foreign data wrappers generally). • And so on... – New row-level security stuff. – Older column privileges stuff. – User tables named “excluded”! – Inheritance. – Subqueries in nested SET list, and nested WHERE clause. • Simple. • Correlated. – Triggers. – Rules. – Partial indexes (for inference clause). – Alternative collations, non-default opclasses in inference clause (purely theoretical requirement).
  11. Summary • UPSERT has very flexible syntax – far more

    flexible (and far less error prone) than comparable MySQL feature, for example. • Complicated to implement, but not complicated to use. Involved small revision to chapter in documentation on MVCC rules, for example. • Very useful for ETL, too. Can resolve to not update based on what is found to already exist, so some MERGE-like capabilities here. • Unlike MERGE, has no race conditions. No duplicate violations from arbiter unique index. No “unprincipled deadlocks”. • However, unlike MERGE, inserting is one possible outcome that you must always be happy with.