OSCON 2013 talk on AWS and MongoDB

Charity Majors

July 26, 2013

Transcript

  1. Charity Majors
    @mipsytipsy

  2. MongoDB on AWS:
    Operational Best Practices

  3. overview
    resources
    provisioning
    disaster mitigation techniques

  4. replica sets

  5. replica sets
    • Odd number of votes
    • Distribute across AZs
    • More votes are better than fewer
    • Use arbiters for extra votes

  6. basic replica set

  7. 2-node replica set
    with arbiter

  8. arbiters
    • Mongod processes that do nothing but vote
    • Highly reliable
    • Lightweight; you can run many arbiters on a single
    node
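    A minimal sketch of standing up and registering an arbiter; the hostnames, ports, and paths are assumptions, not from the talk:
      # on the arbiter host: one mongod per arbiter, each with its own port and dbpath
      $ mongod --replSet myset --port 27021 --dbpath /var/lib/mongo-arb1 \
               --nojournal --smallfiles --fork --logpath /var/log/mongo-arb1.log
      # from the primary: add it as a voting, non-data-bearing member
      $ mongo --eval 'rs.addArb("arbiter-host.example.com:27021")'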

  9. replica set
    with snapshot node

  10. EBS snapshots
    • Set priority = 0
    • Set hidden = 1
    • Consider setting votes = 0
    • Lock mongo or stop mongod before snapshot
    • Consider running continuous compaction on
    snapshot node
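    A minimal sketch of the reconfig plus the lock/snapshot/unlock cycle described above; the member index, volume ID, and CLI flags are assumptions:
      # on the primary: make member 2 the hidden, non-voting snapshot node
      $ mongo --eval 'cfg = rs.conf();
                      cfg.members[2].priority = 0;
                      cfg.members[2].hidden = true;
                      cfg.members[2].votes = 0;
                      rs.reconfig(cfg)'
      # on the snapshot node: flush and lock, snapshot the data volume, then unlock
      $ mongo --eval 'db.fsyncLock()'
      $ aws ec2 create-snapshot --volume-id vol-0abc1234 --description "mongo $(date +%F)"
      $ mongo --eval 'db.fsyncUnlock()'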

  11. other backup options
    • EBS snapshots
    • LVM snapshots
    • Mongodump
    • MongoDB backups as a service

  12. EC2 and disks

  13. memory
    • Memory is your primary scaling constraint
    • Your working set should fit into RAM
    • In 2.4, estimate with the workingSet section of serverStatus (sketch below)
    • Page faults? Your working set may not fit
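    A sketch of that 2.4 working-set estimator; the interpretation comment assumes the default 4 KB page size:
      $ mongo --eval 'printjson(db.serverStatus({ workingSet: 1 }).workingSet)'
      # e.g. { "note" : "thisIsAnEstimate", "pagesInMemory" : ..., "overSeconds" : ... }
      # pagesInMemory is counted in 4 KB pages; compare the estimate against physical RAM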

  14. disk options
    • EBS
    • Dedicated SSD
    • Provisioned IOPS
    • Ephemeral

  15. EBS classic vs. EBS with PIOPS
    [comparison charts]
    ... just say no to EBS

  16. SSD
    (hi1.4xlarge)
    • 8 cores
    • 60 gigs RAM
    • 2 1-TB SSD drives
    • 120k random reads/sec
    • 85k random writes/sec
    • expensive! $2300/mo on demand

  17. PIOPS
    • Up to 2000 IOPS/volume
    • Up to 1024 GB/volume
    • Variability of < 0.1%
    • Costs double regular EBS
    • Supports snapshots
    • RAID together multiple volumes
    for more storage/performance
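    A hedged sketch of building a RAID set out of PIOPS volumes; the sizes, IOPS, device names, and RAID level are placeholder choices, not recommendations from the talk:
      # create and attach four 200 GB io1 volumes at 2000 IOPS each (one shown)
      $ aws ec2 create-volume --availability-zone us-east-1a --size 200 \
            --volume-type io1 --iops 2000
      $ aws ec2 attach-volume --volume-id vol-0abc1234 --instance-id i-0def5678 --device /dev/xvdf
      # stripe the attached volumes together for more storage and throughput
      $ sudo mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi
      $ sudo mkfs.ext4 /dev/md0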

  18. estimating PIOPS
    • estimate how many IOPS to provision with the “tps” column of sar -d 1
    • multiply that by 2-3x depending on your spikiness
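    For example, with a placeholder device name:
      $ sar -d -p 1 60 | grep xvdf     # the tps column is roughly the IOPS you use today
      # if steady-state tps is ~600, provisioning 1200-1800 PIOPS leaves room for spikes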

  19. Ephemeral
    Storage
    • Cheap
    • Fast
    • No network latency
    • No snapshot capability
    • Data is lost forever if you stop or
    resize the instance

  20. filesystem
    • Use ext4
    • Raise file descriptor limits
    • Raise connection limits
    • Mount with noatime and nodiratime
    • Consider putting the journal on a separate volume
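    One possible shape for these settings, assuming a placeholder device, mount point, user, and limit values:
      # mount the data volume without atime updates
      $ sudo mount -o noatime,nodiratime /dev/md0 /var/lib/mongodb
      # raise open-file limits for the mongod user (via /etc/security/limits.conf)
      $ echo 'mongodb soft nofile 64000' | sudo tee -a /etc/security/limits.conf
      $ echo 'mongodb hard nofile 64000' | sudo tee -a /etc/security/limits.conf
      # mongod has its own connection cap as well; raise it with --maxConns if needed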

  21. blockdev
    • Your default blockdev read-ahead setting is probably wrong
    • Too large? you will underuse memory
    • Too small? you will hit the disk too much
    • Experiment.
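    For example, with a placeholder device and value (measure before and after):
      $ sudo blockdev --getra /dev/md0      # current read-ahead, in 512-byte sectors
      $ sudo blockdev --setra 32 /dev/md0   # small read-ahead suits mongo's random access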

  22. provisioning

  23. infrastructure is code
    • Chef
    • Puppet
    • CloudFormation
    • Scripts (e.g. MongoLab’s
    mongoctl)

  24. highlights of mongo chef cookbook
    • Configures EBS raid for you
    • Supports PIOPS
    • Handles multiple clusters, sharding, arbiters
    • Built-in snapshot support
    • Provisions new nodes automagically from latest
    completed RAID snapshot set for cluster

  25. provisioning from snapshot
    • Fast and easy
    • Takes < 5 minutes using knife-ec2
    • Will not reset padding factors
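    A hedged knife-ec2 sketch; the AMI, flavor, zone, and role names are placeholders:
      $ knife ec2 server create -I ami-0123abcd -f m3.2xlarge -Z us-east-1a \
            -N db-mongo-04 -r 'role[mongodb-cluster]'
      # the cookbook then brings the node up from the latest completed RAID snapshot set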

  26. snapshot caveats:
    • EBS snapshots will lazily load blocks from S3
    • run “dd” on each of the data files to pull blocks down
    • Always warm up a secondary before promoting
    • warm up both indexes and data
    • http://blog.parse.com/2013/03/07/techniques-for-warming-up-mongodb/
    • in mongodb 2.2 and above you can use the touch command:
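    Sketches of both warm-up approaches; the paths, database, and collection names are placeholders:
      # pull every block of the restored data files down from S3
      $ for f in /var/lib/mongodb/mydb.*; do dd if="$f" of=/dev/null bs=1M; done
      # 2.2+: load one collection's data and indexes into RAM
      $ mongo mydb --eval 'printjson(db.runCommand({ touch: "mycollection", data: true, index: true }))'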

  27. provisioning with initial sync
    • Compacts and repairs your collections and databases
    • Hard on your primary, does a full table scan of all data
    • On > 2.2.0 you can sync from a secondary by button-
    mashing rs.syncFrom() on startup
    • Or use iptables to block secondary from viewing
    primary (all versions)
    • Resets all padding factors to 1
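    Hedged sketches of both ways to keep an initial sync off the primary; the hosts and addresses are placeholders:
      # 2.2+: right after the empty secondary starts, point it at another secondary
      $ mongo --eval 'rs.syncFrom("secondary-2.example.com:27017")'
      # any version: temporarily block this node's view of the primary, then sync
      $ sudo iptables -A OUTPUT -d 10.0.0.5 -p tcp --dport 27017 -j DROP
      # ...after the initial sync finishes, remove the rule
      $ sudo iptables -D OUTPUT -d 10.0.0.5 -p tcp --dport 27017 -j DROP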

  28. fragmentation is terrible

  29. fragmentation
    • Your RAM gets fragmented too!
    • Leads to underuse of memory
    • Deletes are not the only source of fragmentation
    • Repair, compact, or resync regularly
    • Or consider using powerof2 padding factor
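    The power-of-two option is a per-collection flag set with collMod (2.2+); the collection name is a placeholder:
      $ mongo mydb --eval 'printjson(db.runCommand({ collMod: "mycollection", usePowerOf2Sizes: true }))'
      # new and moved documents get power-of-two allocations, so freed space is easier to reuse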

  30. 3 ways to fix fragmentation:
    • Re-sync a secondary from scratch
    • resets your padding factors
    • hard on your primary; rs.syncFrom() a secondary
    • Repair a secondary
    • resets your padding factors
    • may take longer than your oplog age
    • Run continuous compaction on your snapshot
    node
    • won’t reset padding factors
    • but it also won’t reclaim disk space
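    A minimal sketch of continuous compaction on the hidden snapshot node (see the Parse compaction post in the resources); the database name and scheduling are assumptions:
      # run from cron on the hidden, non-voting node only; compact blocks the member
      $ mongo mydb --eval '
          db.getCollectionNames().forEach(function(c) {
            if (c.indexOf("system.") === 0) return;     // skip system collections
            printjson(db.runCommand({ compact: c }));   // defragment extents in place
          })'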

  31. query profiling

  32. Finding bad queries
    • db.currentOp()
    • mongodb.log
    • profiling collection

  33. db.currentOp()
    • Check the queue size
    • Any indexes building?
    • Sort by num_seconds
    • Sort by num_yields, locktype
    • Consider adding comments to your queries
    • Run explain() on queries that are long-running
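    For example, a sketch that prints only long-running client queries (the 10-second cutoff is arbitrary):
      $ mongo --eval '
          db.currentOp().inprog.forEach(function(op) {
            if (op.op === "query" && op.secs_running > 10)
              print(op.opid, op.secs_running + "s", op.ns, tojson(op.query));
          })'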

  34. mongodb.log
    • Configure output with --slowms
    • Look for high execution time, nscanned, nreturned
    • See which queries are holding long locks
    • Match connection ids to IPs
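    For example, with placeholder paths and thresholds:
      $ mongod --config /etc/mongodb.conf --slowms 200             # log every op slower than 200ms
      $ grep -E '[0-9]{4,}ms$' /var/log/mongodb/mongodb.log | tail # ops that took a second or more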

  35. system.profile collection
    • Enable profiling with db.setProfilingLevel()
    • Does not persist through restarts
    • Like mongodb.log, but queryable
    • Writes to this collection incur some cost
    • Use db.system.profile.find() to get slow queries for
    a certain collection, time range, execution time, etc
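    A sketch: turn on level-1 profiling (slow operations only), then query the results; the thresholds and namespace are placeholders:
      $ mongo mydb --eval 'db.setProfilingLevel(1, 100)'   # profile ops slower than 100ms
      $ mongo mydb --eval '
          db.system.profile.find({ ns: "mydb.mycollection", millis: { $gt: 500 } })
                           .sort({ ts: -1 }).limit(5).forEach(printjson)'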

  36. failure scenarios.

  37. ... when queries pile up ...
    • Know what your tipping point looks like
    • Don’t elect a new primary or restart
    • Do kill queries before the tipping point
    • Write your kill script before you need it
    • Don’t kill internal mongo operations, only queries.
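    A sketch of the kind of kill script the slide means: kill only client queries past a threshold, never replication or other internal operations; the 60-second cutoff is an assumption:
      $ mongo --eval '
          db.currentOp().inprog.forEach(function(op) {
            if (op.op !== "query") return;                       // never touch internal ops
            if (!op.secs_running || op.secs_running < 60) return;
            if (op.ns && op.ns.indexOf("local.") === 0) return;  // extra guard for oplog readers
            print("killing", op.opid, op.ns);
            db.killOp(op.opid);
          })'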

  38. can’t elect a primary?
    • Never run with an even number of votes (max 7)
    • You need > 50% of votes to elect a primary
    • Set your priority levels explicitly if you need
    warmup
    • Consider delegating voting to arbiters
    • Set snapshot nodes to be nonvoting if possible.
    • Check your mongo log. Is something vetoing? Do
    they have an inconsistent view of the cluster state?
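    A quick way to sanity-check the vote math and look for vetoes; the log path is an assumption, and votes/priority default to 1 when unset:
      $ mongo --eval '
          rs.conf().members.forEach(function(m) {
            print(m.host, "votes:", m.votes === undefined ? 1 : m.votes,
                  "priority:", m.priority === undefined ? 1 : m.priority);
          })'
      $ grep -i veto /var/log/mongodb/mongodb.log | tail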

  39. secondaries crashing?
    • Some rare mongo bugs will cause all secondaries
    to crash unrecoverably
    • Never kill oplog tailers or other internal database
    operations, this can also trash secondaries
    • Arbiters are more stable than secondaries,
    consider using them to form a quorum with your
    primary

  40. replication stops?
    • Other rare bugs will stop replication or cause
    secondaries to exit on a corrupt op
    • The correct way to fix this is to re-snapshot off
    the primary and rebuild your secondaries.
    • However, you can sometimes *dangerously* repair
    a secondary:
    1. stop mongo
    2. bring it back up in standalone mode
    3. repair the offending collection
    4. restart mongo again as part of the replica set
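    Those four steps as shell commands, assuming an Ubuntu-style service name and default paths; mongod has no single-collection repair, so this repairs the whole database holding the bad collection, and only makes sense on a secondary you can afford to lose:
      $ sudo service mongodb stop
      # bring it up standalone (no --replSet) on another port so the set leaves it alone
      $ sudo -u mongodb mongod --dbpath /var/lib/mongodb --port 27018 --fork \
            --logpath /var/log/mongodb/repair.log
      $ mongo --port 27018 mydb --eval 'printjson(db.runCommand({ repairDatabase: 1 }))'
      $ mongo --port 27018 admin --eval 'db.shutdownServer()'
      $ sudo service mongodb start      # rejoin the replica set and catch up from the oplog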

  41. Glossary of resources
    • Opscode AWS cookbook
    • https://github.com/opscode-cookbooks/aws
    • edelight MongoDB cookbook
    • https://github.com/edelight/chef-mongodb
    • Parse MongoDB cookbook fork
    • https://github.com/ParsePlatform/Ops/tree/primary/chef/cookbooks/mongodb
    • ChefConf presentation on mongo + chef
    • http://www.youtube.com/watch?v=dBk5RyExsOE

  42. Glossary of resources
    • MongoLab’s mongoctl
    • https://github.com/mongolab/mongoctl
    • Cloudformation templates
    • http://docs.mongodb.org/ecosystem/tutorial/automate-deployment-with-cloudformation/#automate-deployment-with-cloudformation/
    • Parse warmup scripts
    • http://blog.parse.com/2013/03/07/techniques-for-warming-up-mongodb/
    • Parse compaction scripts
    • http://blog.parse.com/2013/03/26/always-be-compacting/

  43. Charity Majors
    @mipsytipsy