Slide 1

Slide 1 text

Will PostgreSQL be THE Cloud Native Database? Dave Stokes @Stoker [email protected]

Slide 2

Slide 2 text

©2023 Percona | Confidential | Internal use only Who Am I I am Dave Stokes Technology Evangelist at Percona Author of MySQL & JSON - A Practical Programming Guide Over a decade on the Oracle MySQL Community Team Started with MySQL 3.29 2

Slide 3

Slide 3 text

©2023 Percona Percona is an open source database software, support, and services company that helps make databases and applications run better.

Slide 4

Slide 4 text

QUIZ TIME!!!!!!

Slide 5

Slide 5 text

©2023 Percona | Confidential | Internal use only Name that database! ● It is a Popular open-source database ● Originated by a guy with the first name Michael ● Michael has a history of saying some outrageous things ● Michael has formed several companies

Slide 6

Slide 6 text

©2023 Percona | Confidential | Internal use only The answer is PostgreSQL, MySQL, Or MariaDB Or Vertica, etc.

Slide 7

Slide 7 text

©2023 Percona | Confidential | Internal use only One relational database is pretty much the same as another, right? While both PostgreSQL and MySQL are both RDMSes, there are some rather dramatic differences in features, performance, and operation. There are advantages, and disadvantaged, to both which could severely impact how you do business.

Slide 8

Slide 8 text

©2023 Percona | Confidential | Internal use only One document database is pretty much the same as another, right? MongoDB’s change of license caught a lot of folks flat footed.. Not easy to move to another document database. Document databases do not have a lot of relational features on which projects depend.

Slide 9

Slide 9 text

©2023 Percona | Confidential | Internal use only Differences - MySQL versus PostgreSQL While PostgreSQL is generally regarded as having the better SQL support and more features, there are some operators such as JSON_TABLE() which are invaluable but missing. Replication is another area where things are not as easy to establish. And PostgreSQL has operational issues such as index bloat and vacuuming that will catch you if you are unprepared. MySQL can do ‘wrong’ things with data, is missing functions like MERGE(), and runs fairly well even if poorly implemented. Make sure you have a unique primary key on ALL tables or things can go badly.

Slide 10

Slide 10 text

©2023 Percona | Confidential | Internal use only Where is MySQL going? MySQL was the foundation of the LAMP stack but recently Oracle's focus has been on the non-open-source Heatwave analytics engine while seeming to ignore the core database product. While much easier to set up, MySQL may lack functionality like MERGE() or have a complete implementation in areas such as Window Functions for your needs. Usually seen as the ‘good enough’ database

Slide 11

Slide 11 text

©2023 Percona | Confidential | Internal use only Where is PostgreSQL going? Mailing lists!?! While maybe not as effective as a top-down management in a corporation, PG’s ability to govern itself over email is amazing. Some developments take longer but the code is usually outstanding. Several long term issues (mentioned later) still are being (slowly) worked on in the once yearly major release. Replication is not as straightforward as MYSQL but is improving rapidly. Lots of innovation, advanced features, and can be complex to configure. No one stop shopping! You will probably need extensions and add-ons from others, that may or may not work with others.

Slide 12

Slide 12 text

©2023 Percona | Confidential | Internal use only PostgreSQL versus MySQL differences PostgreSQL: Better SQL Standard Support Governed by mailing list, consensus Active community MySQL: ‘Easier’ Governed (?) by Oracle Active community Both: Relational Database Management Systems Open Source Popular Old enough to allowed to drink (therefore seen as ‘not cool’ by some) 'The devil is in the details' Ludwig Mies Van Der Rohe.

Slide 13

Slide 13 text

©2023 Percona | Confidential | Internal use only https://db-engines.com/en/ranking_osvsc The above chart shows the historical trend of the popularity of open source and commercial database management systems.

Slide 14

Slide 14 text

©2023 Percona | Confidential | Internal use only

Slide 15

Slide 15 text

©2023 Percona | Confidential | Internal use only Projected Trends Yikes!

Slide 16

Slide 16 text

©2023 Percona | Confidential | Internal use only PostgreSQL has some interesting stuff not in MySQL ● Materialized Views ● MERGE() – process transactions logs, like from cash registers, as a batch rather than multiple round trips between application and database ● TWO JSON data types ● Many different types of indexes ○ Ability to index only parts of an index ○ Can ‘roll your own’ ● Better SQL standard compliance ○ More complete Window Functions ○ Sequences ■ Similar to MySQL AUTO_INCREMENT ■ Great for building test data ● Basis for many projects ○ FerretDB - MongoDB protocol ● Harder to setup and run ○ Upgrades can be tricky

Slide 17

Slide 17 text

OMGHDWSHTPI2023* Oh My Goodness How Do We Still Have This Problem In 2023?

Slide 18

Slide 18 text

©2023 Percona | Confidential | Internal use only Vacuum

Slide 19

Slide 19 text

©2023 Percona | Confidential | Internal use only 19 VACUUM reclaims storage occupied by dead tuples*. In normal PostgreSQL operation, tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present until a VACUUM is done. Therefore it's necessary to do VACUUM periodically, especially on frequently-updated tables. -PG Documentation A tuple is PostgreSQL's internal representation of a row in a table. MySQL uses as difference MVCC approach that automatically takes care of dead tuples and vacuuming will seem very odd to a MySQL DBA

Slide 20

Slide 20 text

©2023 Percona | Confidential | Internal use only Wrap Around XIDs PostgreSQL's MVCC transaction semantics depend on being able to compare transaction ID (XID) numbers: a row version with an insertion XID greater than the current transaction XID is “in the future” and should not be visible to the current transaction. XIDs have limited size of 32 bits so a cluster that runs for a long time (more than 4 billion transactions) would suffer transaction ID wraparound XID counter wraps around to zero transactions that were in the past appear to be in the future — which means their output become invisible. In short, catastrophic data loss. To avoid this, it is necessary to vacuum every table in every database at least once every two billion transactions. 20

Slide 21

Slide 21 text

©2023 Percona | Confidential | Internal use only Caveats Plain VACUUM (without FULL) simply reclaims space and makes it available for re-use. This form of the command can operate in parallel with normal reading and writing of the table, as an exclusive lock is not obtained. However, extra space is not returned to the operating system (in most cases); it's just kept available for re-use within the same table. It also allows us to leverage multiple CPUs in order to process indexes. This feature is known as parallel vacuum. VACUUM FULL rewrites the entire contents of the table into a new disk file with no extra space, allowing unused space to be returned to the operating system. This form is much slower and requires an ACCESS EXCLUSIVE lock on each table while it is being processed. 21

Slide 22

Slide 22 text

©2023 Percona | Confidential | Internal use only Transaction ID Wraparound 32-bit transaction ID - Much Too Small

Slide 23

Slide 23 text

©2023 Percona | Confidential | Internal use only XIDs can be viewed as lying on a circle or circular buffer. As long as the end of that buffer does not jump past the front, the system will function correctly. To prevent running out of XIDs and avoid wraparound, the vacuum process is also responsible for “freezing” row versions that are over a certain age (tens of millions of transactions old by default). However, there are failure modes which prevent it from freezing extremely old tuples and the oldest unfrozen tuple limits the number of past IDs that are visible to a transaction (only two billion past IDs are visible). If the remaining XID count reaches one million, the database will stop accepting commands and must be restarted in single-user mode to recover. Therefore, it is extremely important to monitor the remaining XIDs so that your database never gets into this state.

Slide 24

Slide 24 text

©2023 Percona | Confidential | Internal use only Wow Factor The Things a MySQL DBA will be impressed by

Slide 25

Slide 25 text

©2023 Percona | Confidential | Internal use only Materialized Views, Watch, Many Types of Indexes, & FILTER SELECT fa.actor_id, SUM(length) FILTER (WHERE rating = 'R'), SUM(length) FILTER (WHERE rating = 'PG') FROM film_actor AS fa LEFT JOIN film AS f ON f.film_id = fa.film_id GROUP BY fa.actor_id

Slide 26

Slide 26 text

©2023 Percona | Confidential | Internal use only Replication No open source equivalent to InnoDB Cluster or even Galera

Slide 27

Slide 27 text

©2023 Percona | Confidential | Internal use only Need for connection pooling - multi-process versus multi-threading

Slide 28

Slide 28 text

©2023 Percona | Confidential | Internal use only Some reading https://www.youtube.com/watch?v=S7jEJ9o9o2o https://www.highgo.ca/2021/03/20/how-to-check-and-resolve-bloat-in-postgresql/ https://onesignal.com/blog/lessons-learned-from-5-years-of-scaling-postgresql/ https://www.postgresql.org/docs/ https://www.scalingpostgres.com/ https://psql-tips.org/psql_tips_all.html

Slide 29

Slide 29 text

©2023 Percona | Confidential | Internal use only http://smalldatum.blogspot.com/2023/09/checki ng-postgres-for-perf-regressions.html Checking Postgres for perf regressions from 11.21 to 16.0 with sysbench and a small server

Slide 30

Slide 30 text

©2023 Percona | Confidential | Internal use only “It is different” Different != Better

Slide 31

Slide 31 text

©2023 Percona | Confidential | Internal use only https://ottertune.com/blog/the-part-of-postgresql-we- hate-the-most/ https://philbooth.me/blog/nine-ways-to-shoot-yourself -in-the-foot-with-postgresql https://neon.tech/blog/quicker-serverless-postgres Links

Slide 32

Slide 32 text

Thank You! [email protected] @Stoker Speakerdeck.com/Stoker