Life in the Fast Lane: Speeding up an AWS RDS for PostgreSQL Migration

Every once in a while, you find yourself in a situation where you need to migrate data from one PostgreSQL database instance to another. Whether it’s 10 gigabytes or 1,000 gigabytes, being able to execute the dump/restore process as quickly as possible is always desirable.

In this talk, we’ll go through a handful of steps that’ll minimize the amount of time it takes to dump the contents of a local PostgreSQL database instance and restore it to Amazon RDS for PostgreSQL. First, we’ll walk through the process of exporting the contents of a local PostgreSQL instance with pg_dump. Then, we’ll go over a variety of EC2 and RDS specific tweaks. Finally, we’ll use pg_restore to load it into the target PostgreSQL RDS database as quickly and efficiently as possible.
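
At a high level, the workflow condenses to the commands sketched below (the bucket, database, and user names are the examples used throughout the talk; the RDS endpoint is a placeholder, and the full flags and tuning steps are covered slide by slide):

$ pg_dump -Fc -h localhost -U hector -v -f osm.dump osm          # dump locally in custom format
$ aws s3 cp osm.dump s3://business-db/osm.dump                   # stage the archive in S3
$ aws s3 cp s3://business-db/osm.dump /mnt/osm.dump              # pull it down on an EC2 instance near RDS
$ pg_restore -C -j16 -Fc -h <rds-endpoint> -U hector -d postgres /mnt/osm.dump   # parallel restore into RDS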

Hector Castro

July 14, 2017

Transcript

  1. Life in the Fast Lane Speeding up an AWS RDS

    for PostgreSQL Migration
  2. Azavea builds mobile and web geospatial data applications.

  3. 2001 Founded · 69 Employees · Philly Headquartered · 4 Products · 10% Research Time · B Corporation
  4. Hector Castro (@hectcastro), VP of Engineering

  5. None
  6. Amazon Database Migration Service

  7. Amazon Database Migration Service

  8. NAME
        pg_dump - extract a PostgreSQL database into a script file or other archive file
     SYNOPSIS
        pg_dump [connection-option...] [option...] [dbname]
     DESCRIPTION
        pg_dump is a utility for backing up a PostgreSQL database. It makes consistent
        backups even if the database is being used concurrently. pg_dump does not block
        other users accessing the database (readers or writers). pg_dump only dumps a
        single database. To backup global objects that are common to all databases in a
        cluster, such as roles and tablespaces, use pg_dumpall(1). Dumps can be output in
        script or archive file formats. Script dumps are plain-text files containing the
        SQL commands required to reconstruct the database to the state it was in at the
        time it was saved. To restore from such a script, feed it to psql(1). Script files
        can be used to reconstruct the database even on other machines and other
        architectures; with some modifications, even on other SQL database products.
  9. NAME
        pg_restore - restore a PostgreSQL database from an archive file created by pg_dump
     SYNOPSIS
        pg_restore [connection-option...] [option...] [filename]
     DESCRIPTION
        pg_restore is a utility for restoring a PostgreSQL database from an archive created
        by pg_dump(1) in one of the non-plain-text formats. It will issue the commands
        necessary to reconstruct the database to the state it was in at the time it was
        saved. The archive files also allow pg_restore to be selective about what is
        restored, or even to reorder the items prior to being restored. The archive files
        are designed to be portable across architectures. pg_restore can operate in two
        modes. If a database name is specified, pg_restore connects to that database and
        restores archive contents directly into the database. Otherwise, a script containing
        the SQL commands necessary to rebuild the database is created and written to a file
        or standard output. This script output is equivalent to the plain text output format
        of pg_dump. Some of the options controlling the output are therefore analogous to
        pg_dump options.
  10. None
  11. Amazon RDS

  12. Amazon Relational Database Service

  13. Amazon Relational Database Service (Amazon RDS) is a web service that makes it easier
      to set up, operate, and scale a relational database in the cloud. It provides
      cost-efficient, resizable capacity for an industry-standard relational database and
      manages common database administration tasks.
      –Amazon Web Services
  14. Amazon Relational Database Service (Amazon RDS) is a web service that makes it easier
      to set up, operate, and scale a relational database in the cloud. It provides
      cost-efficient, resizable capacity for an industry-standard relational database and
      manages common database administration tasks.
      –Amazon Web Services
  15. https://rds.us-east-1.amazonaws.com/
      ?Action=CreateDBInstance
      &AllocatedStorage=64
      &DBInstanceClass=db.m4.large
      &DBInstanceIdentifier=business-db
      &Engine=Postgres
      &MasterUserPassword=s3cr3t
      &MasterUsername=business
      &SignatureMethod=HmacSHA256
      &SignatureVersion=4
      &Version=2014-09-01
      &X-Amz-Algorithm=AWS4-HMAC-SHA256
      &X-Amz-Credential=AKIADQKE4SARGYLE/20140424/us-east-1/rds/aws4_request
      &X-Amz-Date=20140424T194844Z
      &X-Amz-SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date
      &X-Amz-Signature=bee4aabc750bf7dad0cd9e22b952bd6089d91e2a16592c2293
  16. None
  17. None
  18. None
  19. None
  20. Amazon Relational Database Service (Amazon RDS) is a web service that makes it easier
      to set up, operate, and scale a relational database in the cloud. It provides
      cost-efficient, resizable capacity for an industry-standard relational database and
      manages common database administration tasks.
      –Amazon Web Services
  21. None
  22. https://rds.us-east-1.amazonaws.com/
      ?Action=CreateDBInstance
      &AllocatedStorage=64
      &DBInstanceClass=db.m4.large
      &DBInstanceIdentifier=business-db
      &Engine=Postgres
      &MasterUserPassword=s3cr3t
      &MasterUsername=business
      &SignatureMethod=HmacSHA256
      &SignatureVersion=4
      &Version=2014-09-01
      &X-Amz-Algorithm=AWS4-HMAC-SHA256
      &X-Amz-Credential=AKIADQKE4SARGYLE/20140424/us-east-1/rds/aws4_request
      &X-Amz-Date=20140424T194844Z
      &X-Amz-SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date
      &X-Amz-Signature=bee4aabc750bf7dad0cd9e22b952bd6089d91e2a16592c2293
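
For reference, the same CreateDBInstance request can be expressed with the AWS CLI. The sketch below mirrors the example query string above; it is illustrative only, not a recommendation of instance size or of passing passwords on the command line:

      $ aws rds create-db-instance \
          --db-instance-identifier business-db \
          --db-instance-class db.m4.large \
          --engine postgres \
          --allocated-storage 64 \
          --master-username business \
          --master-user-password s3cr3t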
  23. Instance Type    Memory (GB)   vCPU   Network Performance   Price (Hourly) *
      db.m4.large        8            2     Moderate              $0.182
      db.m4.xlarge      16            4     High                  $0.365
      db.m4.2xlarge     32            8     High                  $0.730
      db.m4.4xlarge     64           16     High                  $1.461
      db.m4.10xlarge   160           40     10 Gigabit            $3.654
      * Prices from us-east-1 as of 7/4/2016
  24. Instance Type    Memory (GB)   vCPU   Network Performance   Price (Hourly) *
      db.r3.large       15.25         2     Moderate              $0.250
      db.r3.xlarge      30.5          4     Moderate              $0.500
      db.r3.2xlarge     61            8     High                  $0.995
      db.r3.4xlarge    122           16     High                  $1.990
      db.r3.8xlarge    244           32     10 Gigabit            $3.980
      * Prices from us-east-1 as of 7/4/2016
  25. Instance Type    Memory (GB)   vCPU   Network Performance   Price (Hourly) *
      db.t2.micro       1             1     Low to Moderate       $0.018
      db.t2.small       2             1     Low to Moderate       $0.036
      db.t2.medium      4             2     Low to Moderate       $0.073
      db.t2.large       8             2     Low to Moderate       $0.145
      * Prices from us-east-1 as of 7/4/2016
  26. https://rds.us-east-1.amazonaws.com/
      ?Action=CreateDBInstanceReadReplica
      &DBInstanceIdentifier=business-db-replica
      &SignatureMethod=HmacSHA256
      &SignatureVersion=4
      &SourceDBInstanceIdentifier=business-db
      &Version=2014-09-01
      &X-Amz-Algorithm=AWS4-HMAC-SHA256
      &X-Amz-Credential=AKIADQKE4SARGYLE/20140425/us-east-1/rds/aws4_request
      &X-Amz-Date=20140425T170525Z
      &X-Amz-SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date
      &X-Amz-Signature=a5bc7bb9648272e9967c76fc582b308d3ee37d6c4f7a4eb62c
  27. https://rds.us-east-1.amazonaws.com/
      ?Action=CreateDBInstanceReadReplica
      &DBInstanceIdentifier=business-db-replica
      &SignatureMethod=HmacSHA256
      &SignatureVersion=4
      &SourceDBInstanceIdentifier=business-db
      &Version=2014-09-01
      &X-Amz-Algorithm=AWS4-HMAC-SHA256
      &X-Amz-Credential=AKIADQKE4SARGYLE/20140425/us-east-1/rds/aws4_request
      &X-Amz-Date=20140425T170525Z
      &X-Amz-SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date
      &X-Amz-Signature=a5bc7bb9648272e9967c76fc582b308d3ee37d6c4f7a4eb62c
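
The read replica request above maps to a one-liner in the AWS CLI as well; a sketch using the identifiers from the slide:

      $ aws rds create-db-instance-read-replica \
          --db-instance-identifier business-db-replica \
          --source-db-instance-identifier business-db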
  28. Amazon Relational Database Service (Amazon RDS) is a web service that makes it easier
      to set up, operate, and scale a relational database in the cloud. It provides
      cost-efficient, resizable capacity for an industry-standard relational database and
      manages common database administration tasks.
      –Amazon Web Services
  29. None
  30. None
  31. None
  32. None
  33. !

  34. RDS Parameter Groups

  35. https://www.postgresql.org/docs/current/static/populate.html

  36. https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL.Procedural.Importing.html

  37. "

  38. Parameter              Value            Unit
      maintenance_work_mem   524288-4194304   Kilobytes
      max_wal_size           256              16 Megabytes
      checkpoint_timeout     1800             Seconds
      synchronous_commit     off              N/A
      wal_buffers            8192             8 Kilobytes
      autovacuum             0                N/A
  39. Parameter              Value            Unit
      maintenance_work_mem   524288-4194304   Kilobytes
      max_wal_size           256              16 Megabytes
      checkpoint_timeout     1800             Seconds
      synchronous_commit     off              N/A
      wal_buffers            8192             8 Kilobytes
      autovacuum             0                N/A
  40. Parameter              Value            Unit
      maintenance_work_mem   524288-4194304   Kilobytes
      max_wal_size           256              16 Megabytes
      checkpoint_timeout     1800             Seconds
      synchronous_commit     off              N/A
      wal_buffers            8192             8 Kilobytes
      autovacuum             0                N/A
  41. Parameter              Value            Unit
      maintenance_work_mem   524288-4194304   Kilobytes
      max_wal_size           256              16 Megabytes
      checkpoint_timeout     1800             Seconds
      synchronous_commit     off              N/A
      wal_buffers            8192             8 Kilobytes
      autovacuum             0                N/A
  42. Parameter              Value            Unit
      maintenance_work_mem   524288-4194304   Kilobytes
      max_wal_size           256              16 Megabytes
      checkpoint_timeout     1800             Seconds
      synchronous_commit     off              N/A
      wal_buffers            8192             8 Kilobytes
      autovacuum             0                N/A
  43. Parameter              Value            Unit
      maintenance_work_mem   524288-4194304   Kilobytes
      max_wal_size           256              16 Megabytes
      checkpoint_timeout     1800             Seconds
      synchronous_commit     off              N/A
      wal_buffers            8192             8 Kilobytes
      autovacuum             0                N/A
  44. Parameter              Value            Unit
      maintenance_work_mem   524288-4194304   Kilobytes
      max_wal_size           256              16 Megabytes
      checkpoint_timeout     1800             Seconds
      synchronous_commit     off              N/A
      wal_buffers            8192             8 Kilobytes
      autovacuum             0                N/A
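
On RDS these values are applied through a custom parameter group rather than postgresql.conf. A rough sketch with the AWS CLI; the pg-import-tuned group name and the postgres9.6 family are illustrative assumptions, and static parameters such as wal_buffers need ApplyMethod=pending-reboot plus an instance reboot:

      $ aws rds create-db-parameter-group \
          --db-parameter-group-name pg-import-tuned \
          --db-parameter-group-family postgres9.6 \
          --description "Parameters tuned for bulk restore"
      $ aws rds modify-db-parameter-group \
          --db-parameter-group-name pg-import-tuned \
          --parameters "ParameterName=maintenance_work_mem,ParameterValue=1048576,ApplyMethod=immediate" \
                       "ParameterName=checkpoint_timeout,ParameterValue=1800,ApplyMethod=immediate" \
                       "ParameterName=synchronous_commit,ParameterValue=off,ApplyMethod=immediate" \
                       "ParameterName=autovacuum,ParameterValue=0,ApplyMethod=immediate"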
  45. None
  46. None
  47. $ ssh hector@current-database

  48. osm=# SELECT * FROM pg_stat_activity;
      -[ RECORD 1 ]----+--------------------------------
      datid            | 16385
      datname          | hector
      pid              | 93232
      usesysid         | 16384
      usename          | hector
      application_name | psql
      client_addr      |
      client_hostname  |
      client_port      | -1
      backend_start    | 2017-07-03 22:02:36.071841-04
      xact_start       | 2017-07-03 22:11:08.038188-04
      query_start      | 2017-07-03 22:11:08.038188-04
      state_change     | 2017-07-03 22:11:08.038192-04
      wait_event_type  |
      wait_event       |
      state            | active
      backend_xid      |
      backend_xmin     | 1871
      query            | SELECT * FROM pg_stat_activity;
      osm=#
  49. $ pg_dump -Fc -h localhost -U hector -v -f osm.dump osm 2>&1 | tee dump.log
  50. $ pg_dump -Fc -h localhost -U hector -v -f osm.dump osm 2>&1 | tee dump.log
  51. $ pg_dump -Fc -h localhost -U hector -v -f osm.dump osm 2>&1 | tee dump.log
  52. $ pg_dump -Fc -h localhost -U hector -v -f osm.dump osm 2>&1 | tee dump.log
  53. $ pg_dump -Fc -h localhost -U hector -v -f osm.dump osm 2>&1 | tee dump.log
  54. $ pg_dump -Fc -h localhost -U hector -v -f osm.dump osm 2>&1 | tee dump.log
  55. $ pg_dump -Fc -h localhost -U hector -v -f osm.dump osm 2>&1 | tee dump.log
  56. $ pg_dump -Fc -h localhost -U hector -v -f osm.dump osm 2>&1 | tee dump.log
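
A quick annotation of the flags in that command (nothing new, just what each one does):

      # -Fc : write a custom-format archive, which pg_restore's parallel mode requires
      # -h  : connect to the local PostgreSQL instance
      # -U  : connect as the hector role
      # -v  : verbose output so progress is visible
      # -f  : write the archive to osm.dump; the trailing osm is the database to dump
      # tee : keep a copy of the verbose output in dump.log while it scrolls by
      $ pg_dump -Fc -h localhost -U hector -v -f osm.dump osm 2>&1 | tee dump.log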
  57. $ aws configure set default.s3.max_concurrent_requests 16
      $ aws configure set default.s3.max_queue_size 10000
      $ aws configure set default.s3.multipart_threshold 64MB
      $ aws configure set default.s3.multipart_chunksize 16MB
      $ aws s3 cp osm.dump s3://business-db/osm.dump
  58. $ aws configure set default.s3.max_concurrent_requests 16
      $ aws configure set default.s3.max_queue_size 10000
      $ aws configure set default.s3.multipart_threshold 64MB
      $ aws configure set default.s3.multipart_chunksize 16MB
      $ aws s3 cp osm.dump s3://business-db/osm.dump
  59. $ aws configure set default.s3.max_concurrent_requests 16
      $ aws configure set default.s3.max_queue_size 10000
      $ aws configure set default.s3.multipart_threshold 64MB
      $ aws configure set default.s3.multipart_chunksize 16MB
      $ aws s3 cp osm.dump s3://business-db/osm.dump
  60. $ aws configure set default.s3.max_concurrent_requests 16
      $ aws configure set default.s3.max_queue_size 10000
      $ aws configure set default.s3.multipart_threshold 64MB
      $ aws configure set default.s3.multipart_chunksize 16MB
      $ aws s3 cp osm.dump s3://business-db/osm.dump
  61. $ aws configure set default.s3.max_concurrent_requests 16
      $ aws configure set default.s3.max_queue_size 10000
      $ aws configure set default.s3.multipart_threshold 64MB
      $ aws configure set default.s3.multipart_chunksize 16MB
      $ aws s3 cp osm.dump s3://business-db/osm.dump
  62. $ aws configure set default.s3.max_concurrent_requests 16
      $ aws configure set default.s3.max_queue_size 10000
      $ aws configure set default.s3.multipart_threshold 64MB
      $ aws configure set default.s3.multipart_chunksize 16MB
      $ aws s3 cp osm.dump s3://business-db/osm.dump
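
The four s3 settings above trade memory and CPU on the uploading host for transfer throughput. A brief annotation of what each one controls (the values shown on the slides are starting points, not tuned recommendations):

      #   max_concurrent_requests - transfer threads running in parallel (the CLI default is 10)
      #   max_queue_size          - tasks the CLI keeps queued ahead of those threads
      #   multipart_threshold     - objects this large or larger are split into parts
      #   multipart_chunksize     - size of each uploaded part
      $ aws s3 cp osm.dump s3://business-db/osm.dump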
  63. None
  64. None
  65. ✓ At least 10 Gigabit network performance
      ✓ At least enough instance storage for database dump
      ✓ Launch in same availability zone as target database
      ✓ Use spot market, if you’re feeling lucky
      ✓ If within private subnet, enable S3 VPC endpoint
  66. ✓ At least 10 Gigabit network performance
      ✓ At least enough instance storage for database dump
      ✓ Launch in same availability zone as target database
      ✓ Use spot market, if you’re feeling lucky
      ✓ If within private subnet, enable S3 VPC endpoint
  67. ✓ At least 10 Gigabit network performance
      ✓ At least enough instance storage for database dump
      ✓ Launch in same availability zone as target database
      ✓ Use spot market, if you’re feeling lucky
      ✓ If within private subnet, enable S3 VPC endpoint
  68. ✓ At least 10 Gigabit network performance
      ✓ At least enough instance storage for database dump
      ✓ Launch in same availability zone as target database
      ✓ Use spot market, if you’re feeling lucky
      ✓ If within private subnet, enable S3 VPC endpoint
  69. ✓ At least 10 Gigabit network performance
      ✓ At least enough instance storage for database dump
      ✓ Launch in same availability zone as target database
      ✓ Use spot market, if you’re feeling lucky
      ✓ If within private subnet, enable S3 VPC endpoint
  70. ✓ At least 10 Gigabit network performance
      ✓ At least enough instance storage for database dump
      ✓ Launch in same availability zone as target database
      ✓ Use spot market, if you’re feeling lucky
      ✓ If within private subnet, enable S3 VPC endpoint
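
For the last checklist item, a gateway VPC endpoint keeps the S3 transfer on the AWS network instead of routing it through a NAT path. A sketch (the VPC and route table IDs are placeholders):

      $ aws ec2 create-vpc-endpoint \
          --vpc-id vpc-0123456789abcdef0 \
          --service-name com.amazonaws.us-east-1.s3 \
          --route-table-ids rtb-0123456789abcdef0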
  71. $ ssh ec2-user@beefy-ec2-instance

  72. $ sudo mkfs.ext4 /dev/xvdb
      $ sudo mount /dev/xvdb /mnt
      $ sudo chown -R ec2-user:ec2-user /mnt/
      $ aws configure set default.s3.max_concurrent_requests 16
      $ aws configure set default.s3.max_queue_size 10000
      $ aws configure set default.s3.multipart_threshold 64MB
      $ aws configure set default.s3.multipart_chunksize 16MB
      $ aws s3 cp s3://business-db/osm.dump /mnt/osm.dump
  73. $ sudo mkfs.ext4 /dev/xvdb
      $ sudo mount /dev/xvdb /mnt
      $ sudo chown -R ec2-user:ec2-user /mnt/
      $ aws configure set default.s3.max_concurrent_requests 16
      $ aws configure set default.s3.max_queue_size 10000
      $ aws configure set default.s3.multipart_threshold 64MB
      $ aws configure set default.s3.multipart_chunksize 16MB
      $ aws s3 cp s3://business-db/osm.dump /mnt/osm.dump
  74. $ sudo mkfs.ext4 /dev/xvdb
      $ sudo mount /dev/xvdb /mnt
      $ sudo chown -R ec2-user:ec2-user /mnt/
      $ aws configure set default.s3.max_concurrent_requests 16
      $ aws configure set default.s3.max_queue_size 10000
      $ aws configure set default.s3.multipart_threshold 64MB
      $ aws configure set default.s3.multipart_chunksize 16MB
      $ aws s3 cp s3://business-db/osm.dump /mnt/osm.dump
  75. $ sudo mkfs.ext4 /dev/xvdb
      $ sudo mount /dev/xvdb /mnt
      $ sudo chown -R ec2-user:ec2-user /mnt/
      $ aws configure set default.s3.max_concurrent_requests 16
      $ aws configure set default.s3.max_queue_size 10000
      $ aws configure set default.s3.multipart_threshold 64MB
      $ aws configure set default.s3.multipart_chunksize 16MB
      $ aws s3 cp s3://business-db/osm.dump /mnt/osm.dump
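
The first three commands prepare the instance-store (ephemeral) volume that ships with the larger instance types; an annotated sketch (the /dev/xvdb device name depends on the instance, so treat it as a placeholder):

      # Format the ephemeral volume, mount it, and hand it to the unprivileged user
      # so the dump can be downloaded there without sudo.
      $ sudo mkfs.ext4 /dev/xvdb
      $ sudo mount /dev/xvdb /mnt
      $ sudo chown -R ec2-user:ec2-user /mnt/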
  76. None
  77. None
  78. ✓ At least 10 Gigabit network performance
      ✓ Set parameter group tuned for imports
      ✓ Disable multi-AZ feature
      ✓ Launch in same availability zone as restore instance
  79. ✓ At least 10 Gigabit network performance
      ✓ Set parameter group tuned for imports
      ✓ Disable multi-AZ feature
      ✓ Launch in same availability zone as restore instance
  80. ✓ At least 10 Gigabit network performance
      ✓ Set parameter group tuned for imports
      ✓ Disable multi-AZ feature
      ✓ Launch in same availability zone as restore instance
  81. ✓ At least 10 Gigabit network performance
      ✓ Set parameter group tuned for imports
      ✓ Disable multi-AZ feature
      ✓ Launch in same availability zone as restore instance
  82. ✓ At least 10 Gigabit network performance
      ✓ Set parameter group tuned for imports
      ✓ Disable multi-AZ feature
      ✓ Launch in same availability zone as restore instance
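
The parameter group and Multi-AZ items can be applied with a single modify call; a sketch reusing the illustrative pg-import-tuned group from earlier (static parameters still require a reboot to take effect):

      $ aws rds modify-db-instance \
          --db-instance-identifier business-db \
          --db-parameter-group-name pg-import-tuned \
          --no-multi-az \
          --apply-immediately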
  83. $ hostname
      beefy-ec2-instance

  84. $ pg_restore -C -j16 -Fc -h …rds.amazonaws.com -U hector -v \
        -d postgres /mnt/osm.dump 2>&1 | tee restore.log
  85. $ pg_restore -C -j16 -Fc -h …rds.amazonaws.com -U hector -v \
        -d postgres /mnt/osm.dump 2>&1 | tee restore.log
  86. $ pg_restore -C -j16 -Fc -h …rds.amazonaws.com -U hector -v \
        -d postgres /mnt/osm.dump 2>&1 | tee restore.log
  87. $ pg_restore -C -j16 -Fc -h …rds.amazonaws.com -U hector -v \
        -d postgres /mnt/osm.dump 2>&1 | tee restore.log
  88. $ pg_restore -C -j16 -Fc -h …rds.amazonaws.com -U hector -v \
        -d postgres /mnt/osm.dump 2>&1 | tee restore.log
  89. $ pg_restore -C -j16 -Fc -h …rds.amazonaws.com -U hector -v \
        -d postgres /mnt/osm.dump 2>&1 | tee restore.log
  90. $ pg_restore -C -j16 -Fc -h …rds.amazonaws.com -U hector -v \
        -d postgres /mnt/osm.dump 2>&1 | tee restore.log
  91. $ pg_restore -C -j16 -Fc -h …rds.amazonaws.com -U hector -v \
        -d postgres /mnt/osm.dump 2>&1 | tee restore.log
  92. $ pg_restore -C -j16 -Fc -h …rds.amazonaws.com -U hector -v \
        -d postgres /mnt/osm.dump 2>&1 | tee restore.log
  93. $ pg_restore -C -j16 -Fc -h …rds.amazonaws.com -U hector -v \
        -d postgres /mnt/osm.dump 2>&1 | tee restore.log
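
An annotated version of the restore command (the RDS endpoint is elided on the slides, so a placeholder is used here):

      # -C          : emit CREATE DATABASE and restore into the newly created database
      # -j16        : run 16 parallel jobs so data loads and index builds fan out
      # -Fc         : the input is a pg_dump custom-format archive
      # -d postgres : with -C, this existing database is only used to issue CREATE DATABASE
      # tee         : keep the verbose output in restore.log
      $ pg_restore -C -j16 -Fc -h <rds-endpoint>.rds.amazonaws.com -U hector -v \
          -d postgres /mnt/osm.dump 2>&1 | tee restore.log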
  94. None
  95. $

  96. osm=# VACUUM (VERBOSE, ANALYZE);

  97. ✓ Set parameter group back to normal
      ✓ Enable multi-AZ feature
      ✓ Scale back down to desired instance type
      ✓ Keep an eye on metrics once traffic starts back up
  98. ✓ Set parameter group back to normal
      ✓ Enable multi-AZ feature
      ✓ Scale back down to desired instance type
      ✓ Keep an eye on metrics once traffic starts back up
  99. ✓ Set parameter group back to normal
      ✓ Enable multi-AZ feature
      ✓ Scale back down to desired instance type
      ✓ Keep an eye on metrics once traffic starts back up
  100. ✓ Set parameter group back to normal
      ✓ Enable multi-AZ feature
      ✓ Scale back down to desired instance type
      ✓ Keep an eye on metrics once traffic starts back up
  101. ✓ Set parameter group back to normal
      ✓ Enable multi-AZ feature
      ✓ Scale back down to desired instance type
      ✓ Keep an eye on metrics once traffic starts back up
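
As with the import tuning, the cleanup items map onto a single modify call; a sketch in which the default.postgres9.6 group name and the db.m4.large class stand in for whatever the instance normally runs with:

      $ aws rds modify-db-instance \
          --db-instance-identifier business-db \
          --db-parameter-group-name default.postgres9.6 \
          --multi-az \
          --db-instance-class db.m4.large \
          --apply-immediately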
  102. None
  103. %

  104. Thank You