Life in the Fast Lane: Speeding up an AWS RDS for PostgreSQL Migration

Every once in a while, you find yourself in a situation where you need to migrate data from one PostgreSQL database instance to another. Whether it’s 10 gigabytes or 1,000 gigabytes, being able to execute the dump/restore process as quickly as possible is always desirable.

In this talk, we’ll go through a handful of steps that’ll minimize the amount of time it takes to dump the contents of a local PostgreSQL database instance and restore it to Amazon RDS for PostgreSQL. First, we’ll walk through the process of exporting the contents of a local PostgreSQL instance with pg_dump. Then, we’ll go over a variety of EC2 and RDS specific tweaks. Finally, we’ll use pg_restore to load it into the target PostgreSQL RDS database as quickly and efficiently as possible.

Hector Castro

July 14, 2017

Transcript

  1. NAME
     pg_dump - extract a PostgreSQL database into a script file or other archive file

     SYNOPSIS
     pg_dump [connection-option...] [option...] [dbname]

     DESCRIPTION
     pg_dump is a utility for backing up a PostgreSQL database. It makes consistent
     backups even if the database is being used concurrently. pg_dump does not block
     other users accessing the database (readers or writers). pg_dump only dumps a
     single database. To backup global objects that are common to all databases in a
     cluster, such as roles and tablespaces, use pg_dumpall(1). Dumps can be output in
     script or archive file formats. Script dumps are plain-text files containing the
     SQL commands required to reconstruct the database to the state it was in at the
     time it was saved. To restore from such a script, feed it to psql(1). Script files
     can be used to reconstruct the database even on other machines and other
     architectures; with some modifications, even on other SQL database products.
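     The deck shows the restore command later but never the dump invocation itself. A
     hedged sketch of a dump that produces the custom-format osm.dump archive used
     throughout the rest of the talk (the osm database name and a local connection are
     assumptions):

     $ pg_dump -Fc -f osm.dump osm

     With the directory format, the dump step itself can also run in parallel:

     $ pg_dump -Fd -j 16 -f osm_dir osm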
  2. NAME
     pg_restore - restore a PostgreSQL database from an archive file created by pg_dump

     SYNOPSIS
     pg_restore [connection-option...] [option...] [filename]

     DESCRIPTION
     pg_restore is a utility for restoring a PostgreSQL database from an archive created
     by pg_dump(1) in one of the non-plain-text formats. It will issue the commands
     necessary to reconstruct the database to the state it was in at the time it was
     saved. The archive files also allow pg_restore to be selective about what is
     restored, or even to reorder the items prior to being restored. The archive files
     are designed to be portable across architectures. pg_restore can operate in two
     modes. If a database name is specified, pg_restore connects to that database and
     restores archive contents directly into the database. Otherwise, a script containing
     the SQL commands necessary to rebuild the database is created and written to a file
     or standard output. This script output is equivalent to the plain text output format
     of pg_dump. Some of the options controlling the output are therefore analogous to
     pg_dump options.
  3. "Amazon Relational Database Service (Amazon RDS) is a web service that makes it
     easier to set up, operate, and scale a relational database in the cloud. It provides
     cost-efficient, resizable capacity for an industry-standard relational database and
     manages common database administration tasks."
     –Amazon Web Services
  4. https://rds.us-east-1.amazonaws.com/
     ?Action=CreateDBInstance
     &AllocatedStorage=64
     &DBInstanceClass=db.m4.large
     &DBInstanceIdentifier=business-db
     &Engine=Postgres
     &MasterUserPassword=s3cr3t
     &MasterUsername=business
     &SignatureMethod=HmacSHA256
     &SignatureVersion=4
     &Version=2014-09-01
     &X-Amz-Algorithm=AWS4-HMAC-SHA256
     &X-Amz-Credential=AKIADQKE4SARGYLE/20140424/us-east-1/rds/aws4_request
     &X-Amz-Date=20140424T194844Z
     &X-Amz-SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date
     &X-Amz-Signature=bee4aabc750bf7dad0cd9e22b952bd6089d91e2a16592c2293
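     The Query API request above maps directly onto the AWS CLI. A hedged sketch of the
     equivalent call, reusing the identifiers and values from the request (illustrative,
     not a command from the talk):

     $ aws rds create-db-instance \
         --db-instance-identifier business-db \
         --db-instance-class db.m4.large \
         --engine postgres \
         --allocated-storage 64 \
         --master-username business \
         --master-user-password s3cr3t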
  5. Instance Type    Memory (GB)  vCPU  Network Performance  Price (Hourly)*
     db.m4.large      8            2     Moderate             $0.182
     db.m4.xlarge     16           4     High                 $0.365
     db.m4.2xlarge    32           8     High                 $0.730
     db.m4.4xlarge    64           16    High                 $1.461
     db.m4.10xlarge   160          40    10 Gigabit           $3.654
     * Prices from us-east-1 as of 7/4/2016
  6. Instance Type    Memory (GB)  vCPU  Network Performance  Price (Hourly)*
     db.r3.large      15.25        2     Moderate             $0.250
     db.r3.xlarge     30.5         4     Moderate             $0.500
     db.r3.2xlarge    61           8     High                 $0.995
     db.r3.4xlarge    122          16    High                 $1.990
     db.r3.8xlarge    244          32    10 Gigabit           $3.980
     * Prices from us-east-1 as of 7/4/2016
  7. Instance Type    Memory (GB)  vCPU  Network Performance  Price (Hourly)*
     db.t2.micro      1            1     Low to Moderate      $0.018
     db.t2.small      2            1     Low to Moderate      $0.036
     db.t2.medium     4            2     Low to Moderate      $0.073
     db.t2.large      8            2     Low to Moderate      $0.145
     * Prices from us-east-1 as of 7/4/2016
  8. Parameter             Value            Unit
     maintenance_work_mem  524288-4194304   Kilobytes
     max_wal_size          256              16 Megabytes
     checkpoint_timeout    1800             Seconds
     synchronous_commit    off              N/A
     wal_buffers           8192             8 Kilobytes
     autovacuum            0                N/A
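     On RDS these settings live in a DB parameter group rather than postgresql.conf. A
     hedged sketch of creating and populating such a group with the AWS CLI (the group
     name pg-import-tuned and the postgres9.6 family are assumptions; wal_buffers is a
     static parameter, so it is applied pending-reboot):

     $ aws rds create-db-parameter-group \
         --db-parameter-group-name pg-import-tuned \
         --db-parameter-group-family postgres9.6 \
         --description "Parameters tuned for bulk restore"
     $ aws rds modify-db-parameter-group \
         --db-parameter-group-name pg-import-tuned \
         --parameters \
           "ParameterName=maintenance_work_mem,ParameterValue=4194304,ApplyMethod=immediate" \
           "ParameterName=max_wal_size,ParameterValue=256,ApplyMethod=immediate" \
           "ParameterName=checkpoint_timeout,ParameterValue=1800,ApplyMethod=immediate" \
           "ParameterName=synchronous_commit,ParameterValue=off,ApplyMethod=immediate" \
           "ParameterName=wal_buffers,ParameterValue=8192,ApplyMethod=pending-reboot" \
           "ParameterName=autovacuum,ParameterValue=0,ApplyMethod=immediate"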
  9. osm=# SELECT * FROM pg_stat_activity;
     -[ RECORD 1 ]----+--------------------------------
     datid            | 16385
     datname          | hector
     pid              | 93232
     usesysid         | 16384
     usename          | hector
     application_name | psql
     client_addr      |
     client_hostname  |
     client_port      | -1
     backend_start    | 2017-07-03 22:02:36.071841-04
     xact_start       | 2017-07-03 22:11:08.038188-04
     query_start      | 2017-07-03 22:11:08.038188-04
     state_change     | 2017-07-03 22:11:08.038192-04
     wait_event_type  |
     wait_event       |
     state            | active
     backend_xid      |
     backend_xmin     | 1871
     query            | SELECT * FROM pg_stat_activity;
     osm=#
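     During a long restore, pg_stat_activity is the quickest way to see what each parallel
     job is doing. A hedged sketch that narrows the view to the target database (assuming
     the dump is being restored into a database named osm):

     osm=# SELECT pid, state, wait_event_type, query
             FROM pg_stat_activity
            WHERE datname = 'osm';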
  10. $ aws configure set default.s3.max_concurrent_requests 16
      $ aws configure set default.s3.max_queue_size 10000
      $ aws configure set default.s3.multipart_threshold 64MB
      $ aws configure set default.s3.multipart_chunksize 16MB
      $ aws s3 cp osm.dump s3://business-db/osm.dump
  11. ✓ At least 10 Gigabit network performance
      ✓ At least enough instance storage for the database dump
      ✓ Launch in same availability zone as target database
      ✓ Use spot market, if you're feeling lucky
      ✓ If within a private subnet, enable an S3 VPC endpoint (see the sketch below)
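     For the last item, an S3 gateway VPC endpoint keeps the dump transfer off any NAT
     path. A hedged sketch with the AWS CLI (the VPC and route table IDs are placeholders,
     not values from the talk):

     $ aws ec2 create-vpc-endpoint \
         --vpc-id vpc-0123456789abcdef0 \
         --service-name com.amazonaws.us-east-1.s3 \
         --route-table-ids rtb-0123456789abcdef0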
  12. $ sudo mkfs.ext4 /dev/xvdb
      $ sudo mount /dev/xvdb /mnt
      $ sudo chown -R ec2-user:ec2-user /mnt/
      $ aws configure set default.s3.max_concurrent_requests 16
      $ aws configure set default.s3.max_queue_size 10000
      $ aws configure set default.s3.multipart_threshold 64MB
      $ aws configure set default.s3.multipart_chunksize 16MB
      $ aws s3 cp s3://business-db/osm.dump /mnt/osm.dump
  13. ✓ At least 10 Gigabit network performance
      ✓ Set parameter group tuned for imports (see the sketch below)
      ✓ Disable Multi-AZ feature
      ✓ Launch in same availability zone as restore instance
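     Both the parameter group swap and the Multi-AZ toggle can be applied to an existing
     instance in one call. A hedged sketch, reusing the hypothetical identifiers from
     earlier:

     $ aws rds modify-db-instance \
         --db-instance-identifier business-db \
         --db-parameter-group-name pg-import-tuned \
         --no-multi-az \
         --apply-immediately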
  14. $ pg_restore -C -j16 -Fc -h …rds.amazonaws.com -U hector -v \
          -d postgres /mnt/osm.dump 2>&1 | tee restore.log
  15. ✓ Set parameter group back to normal (see the sketch below)
      ✓ Enable Multi-AZ feature
      ✓ Scale back down to desired instance type
      ✓ Keep an eye on metrics once traffic starts back up
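     Reverting mirrors the import-time changes. A hedged sketch, assuming the default
     postgres9.6 parameter group and a return to db.m4.large; because autovacuum was
     disabled during the restore, running ANALYZE afterwards to rebuild planner
     statistics is also a reasonable step:

     $ aws rds modify-db-instance \
         --db-instance-identifier business-db \
         --db-parameter-group-name default.postgres9.6 \
         --db-instance-class db.m4.large \
         --multi-az \
         --apply-immediately

     osm=# ANALYZE;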