Slide 1

Slide 1 text

Cloud Success AWS Aurora Postgres to the rescue! Vienna DB Meetup, 2024-02-22

Slide 2

Slide 2 text

Johannes Nagl #proudDadOfAGirl, CEO, (Head of Nightmare Projects), Developer at heart, Product Enthusiast with Swat.io for 14 years Follow on LinkedIn ! , Follow on Bluesky: @jol.li

Slide 3

Slide 3 text

Swat.io ✨ Bootstrapped Software as a Service company ✨ Coordinated Social Media Communication ✨ 45 Employees, some of them from Vienna ✨ 10 Developers, 1 Dev Ops Engineer ✨ Product almost 13 years in the making

Slide 4

Slide 4 text

Swat.io ✨ Nature of the application: Always on ✨ Write-intensive work-loads ✨ 100s of incoming realtime updates per second ✨ We basically only delete data when we lose customers

Slide 5

Slide 5 text

Swat.io ✨ Nature of the application: Always on ✨ Write-intensive work-loads ✨ 100s of incoming realtime updates per second ✨ We basically only delete data when we lose customers ✨ We rarely lose them

Slide 6

Slide 6 text

The current, primary Database as of now() ✨ 3,6 TB ✨ Biggest table: 1,350,000,000++ rows ✨ 79 tables

Slide 7

Slide 7 text

The history, part 1 ! We started with MySQL / Percona (on a shared webspace) ! We added UTF-8 / timezone-support on the fly ™ ! 9 years ago we found out: adding new indices or columns on production was a massive pain ! We also found out: Adding multiple indices in parallel might blow up your database engine

Slide 8

Slide 8 text

The history, part 2 ! Therefore: Postgres for the win! ! We migrated to Postgres 9.6 and had a decent Master-/Slave Setup on physical hardware ! We upgraded RAM + SSDs, but … ! Couple of years later, we found out that: There’s a physical limit of how much RAM and SSD discs you can put into a machine (2 x 2xCPU 2.3GHz 10-core 128GB RAM at start, in the end: 384GB RAM)

Slide 9

Slide 9 text

The problem, 2 years ago ! There was almost no place left on the SSDs (2.5 TB) ! We were stuck on Postgres 9.6 because ! No space for duplicating the db on the server with a newer Postgres version ! No idea of how long a in-place upgrade to a new version would mean (downtime > 6h+: impossible) ! No secondary system to duplicate Swat.io to perform upgrade tests

Slide 10

Slide 10 text

While we were discussing at least some new server options with our hosting provider back then … … an AWS account manager was reaching out to me.

Slide 11

Slide 11 text

The bright promise ! We were already AWS customer back then (only S3) ! While talking about life in general, the sales manager asked regarding current challenges ! When I mentioned “database is stuck on Postgres 9.6 and we need a new server” he was alerted and saw there’s a huge opportunity ! He dropped the famous sentence “Johannes, we have a tool called Database Migration service - with that you can migrate to a newer Postgres database version on AWS on the fly - without downtime” ! I was laughing out loud and said: “Sure thing”

Slide 12

Slide 12 text

AWS Aurora Unparalleled high performance and availability at global scale with full MySQL and PostgreSQL compatibility https://aws.amazon.com/rds/aurora/ at 1/10th the cost of commercial databases. Aurora has 5x the throughput of MySQL and 3x of PostgreSQL. https://aws.amazon.com/rds/aurora/ built-in security, continuous backups, serverless compute, up to 15 read replicas, automated multi-Region replication

Slide 13

Slide 13 text

AWS Aurora, translated ! It's basically Postgres (or MySQL) ! But on steroids ! As managed service ! highly available ! with tons of AWS optimisations for the cloud

Slide 14

Slide 14 text

The bright future ! Well, we gave it a try ! We teamed up with an AWS partner (tecRacer) and agreed to go all in on AWS ! We investigated the project and found out that it should be doable

Slide 15

Slide 15 text

The process, high-level → It took us another 9 months to finally fully migrate to AWS → We tried multiple attempts and had some fuck ups in the process → In the end, we really upgraded to Postgres 13 without any problems (on that day) → We were happily ever after (almost)

Slide 16

Slide 16 text

The process, in-depth Pre-Requirements 1. Set up a Site-To-Site VPN between old hosting environment and AWS 2. Keep a look at disk space on-prem on legacy servers

Slide 17

Slide 17 text

The process, in-depth Runbook 1. Stop Swat.io 2. Create a replication slot 3. Start a dump (in total: runtime 19h) 4. Start Swat.io 5. When dump is done: Upload to S3 (1h) 6. Import Dump on AWS machine (16h) 7. AWS DMS Change Data Capture start (Replication lag of 36h) 8. When 0 latency: VACCUUM ANALYZE 9. Manually Fix Sequences (auto increments not properly set with sync mode we have chosen) 10. Turn off old app / servers 11. Turn on AWS app / servers 12. Cross your fingers

Slide 18

Slide 18 text

The fuckups ! It took quite some tunings on the AWS DMS bastion host sizing and AWS DMS configuration parameters ! We had some tries where replication lag did not went down ! We did not know about the sequences in the beginning ! We had some crashes of the old app because AWS DMS was overloading the system (DoS attack basically)

Slide 19

Slide 19 text

The advantages ! Down/Upscale Instances easily (we started with bigger machines in the beginning and downgraded then within minutes to a more cost-sensitive instance type) ! (also while migrating! " ) ! Snapshots all the way ! No need buying new SSDs ! Spin up a new cluster anytime for tests / upgrades ! In the meantime, we already upgraded to Aurora Postgres 15 without any problems (16 pending) ! It's really just Postgres, we did not change a single query

Slide 20

Slide 20 text

The learnings ! Listen to sales managers, but don't always blindly trust them ! However, sometimes you need to aim for a moonshot ! Be careful with Aurora Pricing! ! "IO Optimized" are helpful for saving money ! Boy, this thing is expensive! ! Way higher costs " ! Aurora RDS Proxy always lags in Postgres Version support ! You should turn off AWS instances when they are not needed

Slide 21

Slide 21 text

Would we do it again? YES, immediately!

Slide 22

Slide 22 text

Ask me anything That's it! Follow on LinkedIn ! , Follow on Bluesky: @jol.li