Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Elephants in the Cloud

Sponsored · SiteGround - Reliable hosting with speed, security, and support you can count on.
Avatar for Mike Fowler Mike Fowler
October 26, 2017

Elephants in the Cloud

Managing your own PostgreSQL servers is sometimes a burden your business does not want. In this talk we will provide an overview of some of the public cloud offerings available for hosted PostgreSQL and discuss a number of strategies for migrating your databases with a minimum of downtime.

Avatar for Mike Fowler

Mike Fowler

October 26, 2017

More Decks by Mike Fowler

Other Decks in Technology

Transcript

  1. • Senior Site Reliability Engineer in the Public Cloud Practice

    of claranet • Background in Software Engineering, Systems Engineering, System & Database Administration • Been using PostgreSQL since 7.4 • Contributed to YAWL, PostgreSQL & Terraform open source projects – XMLEXISTS/xpath_exists() – xml_is_well_formed() • Regular speaker at PGDay UK About Me
  2. • Hosted PostgreSQL • Overview of public cloud hosting options

    • Database migration strategies Overview
  3. • Your database somewhere else • A managed service –

    Some providers offer full DBA support – Cloud providers give only the infrastructure • Typically provisioned through an API or GUI – i.e. a self-service environment What is hosted PostgreSQL?
  4. • Reduces adoption costs • Installation & configuration is already

    done – Generally sane defaults, some tuning often required • Needn’t worry about physical servers • Opex instead of Capex • Most routine DBA tasks are done for you • Easier to grow Benefits of Hosted PostgreSQL
  5. • Less control • Latency • Some features are disabled

    • Migrating existing databases is hard • Potential for vendor lock-in • Resource limits Drawbacks of Hosted PostgreSQL
  6. • Automatic scaling • It will fix your slow queries/reports/application

    • You don’t need a DBA – They need to be planning for scale – They need to help address slowness Fallacies of Hosted PostgreSQL
  7. • We’ll look only at Public Cloud offerings • Current

    major offerings – Amazon Relation Database Service (RDS) – Amazon Aurora – Heroku • Future major offerings – Google Cloud SQL – Microsoft Azure Hosting Options
  8. • PostgreSQL 9.3.12 – 9.6.3 supported • Numerous instance types

    – Costs range from $0.018 to $7.96 per hour – Select from 1 vCPU up to 64 vCPUs, all 64-bit – Memory ranges from 1GB to 488GB • Flexible storage options – Choose between SSD or Provisioned IOPS – Up to 6TB with up to 30,000 IOPS Amazon RDS
  9. • High availability multi-availability zone option – Synchronous replica –

    Automatic failover (~2 minutes) • Up to 5 read-only replicas (asynchronous replication) • Configurable automatic backups with PITR • Monthly uptime percentage of 99.95 per instance – Allows for approximately 22 minutes downtime Amazon RDS
  10. • Became generally available yesterday (2017-10-25) • Currently available in

    3 US regions and Ireland • 6 instance types costing from $0.29 to $9.28 per hour • Compatible with PostgreSQL 9.6 • Up to 2x throughput of conventional PostgreSQL • Up to 16 read replicas with sub-10ms replica lag • Auto-growing filesystem up to 64TB – Filesystem is shared between 3 availability zones Amazon Aurora
  11. • Supports PostgreSQL 9.3, 9.4, 9.5 & 9.6 (10 in

    beta) • Simpler pricing based on choice of tier ($0-8.5k pcm) • Tier dictates resource limits – Maximum number of rows (Hobby only) – Cache size (1GB - 240GB) – Storage limit (64GB - 1TB) – Connection limit (120 - 500) – Rollback (4 days – 1 week) Heroku
  12. • Fork & Follow • Some of your data may

    end up in the US – Logs (can be blocked at creation time) – Snapshots & Dataclips • Not possible to replicate out – No permission for Bucardo, Londiste & Slony – Remote slave is prohibited – Only way is dump & restore Heroku
  13. • Currently in Beta (no SLA) • Only supports PostgreSQL

    9.6 • No replication support • Posed to be a serious rival to RDS – Billing per minute – Automatic scaling of filesystem – Similar variety of instance types Google Cloud SQL
  14. • Currently in preview (no SLA) • Supports PostgreSQL 9.5

    & 9.6 • Replication is seamless – Automated failover – PITR • Selectable compute units – Roughly equates to maximum connections Microsoft Azure
  15. • At Rest (AES-256) – Enabled by default for Google

    & Azure – User enabled at create for Amazon (RDS/Aurora) – Only available in premium plans for Heroku • In transit (SSL) – Enforced on Heroku & Azure – Can be enforced on Google & Amazon • pgcrypto is available on all platforms Encryption
  16. • Providers dictate the availability of extensions – Well documented

    except for Aurora • Only step required is to run CREATE EXTENSION • Many extensions are widely available – pgcrypto – PostGIS – pg_stat_statements – hstore – plpgsql Extensions
  17. • Dump & Restore • Replication failover • Amazon’s Database

    Migration Service • PITR + Logical decoding Migration Strategies
  18. • Simplest strategy – Perceived as low risk for data

    loss – Less “moving parts” • Just a pg_dump & pg_restore • Downtime is function of database size Dump & Restore
  19. • Move historic data ahead of time – Opportunity to

    clear out unused data – Consider introducing partitions • Consider moving the dump closer to the target – e.g. Upload to EC2 instance in the same region as the RDS instance and run pg_restore from there • Over provision resources – Gives higher throughput during data load – Downscale once operational Strategies to Minimise Downtime
  20. • No one supports external masters! • Trigger based replication

    failover – Slony, Londiste & Bucardo • Can be used on most any version of PostgreSQL • Some restrictions apply – DDL is not supported – Rows must be uniquely identifiable Replication Failover
  21. • Presents some risk to production environment – Initial overhead

    of replicating each table • Gradually add tables to the configuration to spread the load – Per-transaction overhead • Write latency to remote slave • Heavy write workload could lead to high replication lag • This also works to replicate out of RDS but not Heroku Replication Failover
  22. • Web service supporting migration to/from AWS – Can be

    RDS or an EC2 instance – External DB can be anywhere • Perform a one time load or continual load – Change Data Capture • “Supports” heterogeneous migrations – Oracle, MySQL, SQL Server, MongoDB – Schema Conversion Tool exists to assist AWS Database Migration Service
  23. • PostgreSQL as a source database – User only needs

    to see all data • PostgreSQL as a streaming source database – Tables must have a primary key – Requires 9.4 with wal_level = logical – Must have room for a new replication slot – User must be a SUPERUSER AWS Database Migration Service
  24. • PostgreSQL as a target database – Import your schema

    using pg_dump – session_replication_role needs to be set to replica – Set target table preparation mode to DO_NOTHING • There are drawbacks of using DMS – Replication lag – Process is fiddly AWS Database Migration Service
  25. • Most involved approach, least downtime • Combines point-in-time recovery

    with the changes captured by logical decoding to create a replica • Need to be running at least PostgreSQL 9.4 with WAL level logical and have WAL archiving configured • DDL not supported, still need unique rows • Recommend barman for managing WAL http://www.pgbarman.org/ • Recommend decoder_raw as logical decoding plugin github.com/michaelpq/pg_plugins/tree/master/decoder_raw PITR & Logical decoding
  26. 1. Create a logical replication slot SELECT * FROM pg_create_logical_replication_slot

    ('logical_slot', 'decoder_raw'); 2. Note the transaction ID (catalog_xmin) SELECT catalog_xmin FROM pg_replication_slots WHERE slot_name = ‘logical_slot’; PITR & Logical decoding
  27. 3. Perform a barman backup $ barman backup master 4.

    Perform a barman PITR $ barman recover –target-xid (catalog_xmin - 1) master latest 5. Start database and verify correct recovery PITR & Logical decoding
  28. 5. Perform pg_dump on the readonly barman node 6. Restore

    to public cloud 7. Read output of logical decoding and write to cloud PITR & Logical decoding
  29. • Hosted PostgreSQL gives you high performance PostgreSQL without the

    hassle of hardware, maintenance and configuration • Opex instead of Capex • Consider the limitations of your intended platform • There are multiple options for migration Summary
  30. Questions? Please remember to leave feedback about my talk, other

    talks and the conference in general at https://2017.pgconf.eu/f Mike Fowler gh-mlfowler mlfowler mike dot fowler at claranet dot uk