OSCON 2013 talk on AWS and MongoDB

Charity Majors

July 26, 2013

Transcript

  1. replica sets • Odd number of votes • Distribute across AZs • More votes are better than fewer • Use arbiters for extra votes Friday, July 26, 13
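
The advice to spread votes across AZs can be checked mechanically. The following is a minimal sketch, not something from the talk: the function name and the AZ labels are hypothetical, and it assumes the goal is that a majority of votes stays reachable after losing any single AZ.

```python
def survives_single_az_loss(votes_by_az):
    """Return True if a strict majority of votes remains after losing any one AZ.

    votes_by_az maps an AZ label to the number of replica set votes placed there.
    """
    if not votes_by_az:
        return False
    total = sum(votes_by_az.values())
    majority = total // 2 + 1  # more than 50% of all votes
    return all(total - lost >= majority for lost in votes_by_az.values())


# Hypothetical layouts: the AZ names are placeholders.
layouts = {
    "3 votes in 2 AZs": {"us-east-1a": 2, "us-east-1b": 1},
    "3 votes in 3 AZs": {"us-east-1a": 1, "us-east-1b": 1, "us-east-1c": 1},
}
for name, layout in layouts.items():
    print(name, survives_single_az_loss(layout))
```

With only two AZs, whichever AZ holds the larger share of votes is a single point of failure, which is one reason a lightweight arbiter in a third AZ can be useful.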
  2. arbiters • Mongod processes that do nothing but vote • Highly reliable • Lightweight; you can run many arbiters on a single node
  3. EBS snapshots • Set priority = 0 • Set hidden = 1 • Consider setting votes = 0 • Lock mongo or stop mongod before snapshot • Consider running continuous compaction on snapshot node
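
The snapshot-node settings above map onto fields of a replica set member document. This is a hedged sketch that only builds the document as a Python dict; the helper name and the host string are hypothetical, and applying the result (for example through a replica set reconfiguration) is left out.

```python
def snapshot_member(member_id, host, nonvoting=True):
    """Build a replica set member document for a dedicated snapshot node.

    priority 0 keeps the node from ever becoming primary, and hidden keeps
    client reads away from it. votes 0 is optional, matching the slide's
    "consider setting votes = 0".
    """
    member = {"_id": member_id, "host": host, "priority": 0, "hidden": True}
    if nonvoting:
        member["votes"] = 0
    return member


# Hypothetical host name, for illustration only.
print(snapshot_member(3, "snapshot-node.example.internal:27017"))
```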
  4. other backup options • EBS snapshots • LVM snapshots • Mongodump • MongoDB backups as a service
  5. memory • Memory is your primary scaling constraint • Your working set should fit into RAM • In 2.4, estimate with: • Page faults? Your working set may not fit
  6. SSD (hi1.4xlarge) • 8 cores • 60 gigs RAM • 2 1-TB SSD drives • 120k random reads/sec • 85k random writes/sec • Expensive! $2300/mo on demand
  7. PIOPS • Up to 2000 IOPS/volume • Up to 1024 GB/volume • Variability of < 0.1% • Costs double regular EBS • Supports snapshots • RAID together multiple volumes for more storage/performance
  8. estimating PIOPS • Estimate how many IOPS to provision with the “tps” column of sar -d 1 • Multiply that by 2-3x depending on your spikiness
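
The estimating rule can be turned into a small calculation. This sketch makes several assumptions not stated in the talk: the tps values have already been collected from sar -d 1 into a list, the baseline is their mean, and the default multiplier of 2.5 is simply the middle of the 2-3x range. The 2000 IOPS per-volume ceiling is the figure quoted in the PIOPS slide.

```python
import math
import statistics

MAX_IOPS_PER_VOLUME = 2000  # per-volume ceiling quoted in the PIOPS slide


def estimate_piops(tps_samples, headroom=2.5):
    """Return (iops_to_provision, volumes_needed) from observed tps samples.

    headroom is the spikiness multiplier; the slide suggests 2x to 3x.
    """
    baseline = statistics.mean(tps_samples)
    target = math.ceil(baseline * headroom)
    volumes = math.ceil(target / MAX_IOPS_PER_VOLUME)
    return target, volumes


# Made-up tps readings, for illustration only.
print(estimate_piops([400, 520, 480, 600]))
print(estimate_piops([1500], headroom=3))
```

When the target exceeds what one volume can deliver, the volume count indicates how many PIOPS volumes would need to be combined in a RAID set.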
  9. Ephemeral Storage • Cheap • Fast • No network latency • No snapshot capability • Data is lost forever if you stop or resize the instance
  10. filesystem • Use ext4 • Raise file descriptor limits • Raise connection limits • Mount with noatime and nodiratime • Consider putting the journal on a separate volume
  11. blockdev • Your default blockdev is probably wrong • Too large? You will underuse memory • Too small? You will hit the disk too much • Experiment.
  12. infrastructure is code • Chef • Puppet • CloudFormation • Scripts (e.g. MongoLab’s mongoctl)
  13. highlights of mongo chef cookbook • Configures EBS RAID for you • Supports PIOPS • Handles multiple clusters, sharding, arbiters • Built-in snapshot support • Provisions new nodes automagically from latest completed RAID snapshot set for cluster
  14. provisioning from snapshot • Fast and easy • Takes < 5 minutes using knife-ec2 • Will not reset padding factors
  15. snapshot caveats: • EBS snapshot will lazily-load blocks from S3 • Run “dd” on each of the data files to pull blocks down • Always warm up a secondary before promoting • Warm up both indexes and data • http://blog.parse.com/2013/03/07/techniques-for-warming-up-mongodb/ • In MongoDB 2.2 and above you can use the touch command
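
The "dd" step in the snapshot caveats amounts to reading every data file from start to finish so that EBS pulls the blocks down from S3. The sketch below is a Python stand-in for that idea rather than the script used in the talk; the function names are hypothetical, and it only walks regular files directly under the given directory.

```python
import os
import tempfile


def warm_file(path, block_size=1024 * 1024):
    """Read a file start to finish, discarding the data; return bytes read."""
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def warm_directory(dbpath):
    """Warm every regular file directly under dbpath; return {name: bytes}."""
    warmed = {}
    for name in sorted(os.listdir(dbpath)):
        path = os.path.join(dbpath, name)
        if os.path.isfile(path):
            warmed[name] = warm_file(path)
    return warmed


if __name__ == "__main__":
    # Demo against a throwaway directory; on a real node, dbpath would be
    # the mongod data directory restored from the snapshot.
    with tempfile.TemporaryDirectory() as dbpath:
        with open(os.path.join(dbpath, "demo.0"), "wb") as handle:
            handle.write(b"\0" * 4096)
        print(warm_directory(dbpath))
```

Reading the files this way only forces the blocks onto the volume and into the OS page cache; it does not distinguish indexes from data.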
  16. provisioning with initial sync • Compacts and repairs your collections and databases • Hard on your primary, does a full table scan of all data • On > 2.2.0 you can sync from a secondary by button-mashing rs.syncFrom() on startup • Or use iptables to block secondary from viewing primary (all versions) • Resets all padding factors to 1
  17. fragmentation • Your RAM gets fragmented too! • Leads to underuse of memory • Deletes are not the only source of fragmentation • Repair, compact, or resync regularly • Or consider using powerof2 padding factor
  18. 3 ways to fix fragmentation: • Re-sync a secondary from scratch (resets your padding factors; hard on your primary, so rs.syncFrom() a secondary) • Repair a secondary (resets your padding factors; may take longer than your oplog age) • Run continuous compaction on your snapshot node (won’t reset padding factors, but it also won’t reclaim disk space)
  19. db.currentOp() • Check the queue size • Any indexes building? • Sort by num_seconds • Sort by num_yields, locktype • Consider adding comments to your queries • Run explain() on queries that are long-running
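
Sorting the db.currentOp() output by running time can be sketched as below. The sample entries are invented, and the sketch assumes the list has already been fetched from the inprog array of the currentOp result. In that output the running time appears as the secs_running field and the yield count as numYields, which is what the slide's num_seconds and num_yields are taken to mean here.

```python
def longest_running(inprog, limit=5):
    """Return active operations sorted by secs_running, longest first."""
    active = [op for op in inprog if op.get("active")]
    active.sort(key=lambda op: op.get("secs_running", 0), reverse=True)
    return active[:limit]


# Invented sample entries shaped like currentOp "inprog" documents.
sample = [
    {"opid": 11, "active": True, "op": "query", "ns": "app.users",
     "secs_running": 42, "numYields": 310},
    {"opid": 12, "active": True, "op": "update", "ns": "app.events",
     "secs_running": 3, "numYields": 0},
    {"opid": 13, "active": False, "op": "query", "ns": "app.users"},
]
for op in longest_running(sample):
    print(op["opid"], op["op"], op["ns"], op["secs_running"], op["numYields"])
```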
  20. mongodb.log • Configure output with --slowms • Look for high execution time, nscanned, nreturned • See which queries are holding long locks • Match connection ids to IPs
  21. system.profile collection • Enable profiling with db.setProfilingLevel() • Does not persist through restarts • Like mongodb.log, but queryable • Writes to this collection incur some cost • Use db.system.profile.find() to get slow queries for a certain collection, time range, execution time, etc.
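
A query against system.profile for one collection, a time range, and a minimum execution time could be assembled as follows. This is a sketch that only builds the filter document; the helper name and the namespace are hypothetical, and it relies on the ns, ts, and millis fields of profiler entries. The resulting dict is the kind of filter that would be passed to find() on system.profile.

```python
from datetime import datetime


def slow_query_filter(namespace, start, end, min_millis):
    """Build a system.profile filter for slow operations on one namespace."""
    return {
        "ns": namespace,                    # "database.collection"
        "ts": {"$gte": start, "$lt": end},  # when the operation was recorded
        "millis": {"$gte": min_millis},     # execution time in milliseconds
    }


# Hypothetical namespace and time window.
window_start = datetime(2013, 7, 26, 9, 0)
window_end = datetime(2013, 7, 26, 10, 0)
print(slow_query_filter("app.users", window_start, window_end, 500))
```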
  22. ... when queries pile up ... • Know what your tipping point looks like • Don’t elect a new primary or restart • Do kill queries before the tipping point • Write your kill script before you need it • Don’t kill internal mongo operations, only queries.
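
The selection logic of a kill script can be written and tested before it is needed. The sketch below only picks candidate opids from a currentOp-style list and does not kill anything. The 30-second threshold is arbitrary, and treating every operation on a local.* namespace as internal (which covers oplog tailers) is an assumption of this example, not a rule from the talk.

```python
def kill_candidates(inprog, max_secs=30):
    """Return opids of long-running client queries that are safe to consider killing."""
    candidates = []
    for op in inprog:
        if not op.get("active"):
            continue
        if op.get("op") != "query":
            continue  # only queries, never writes or internal commands
        if op.get("ns", "").startswith("local."):
            continue  # assumption: local.* (e.g. oplog tailers) is internal
        if op.get("secs_running", 0) >= max_secs:
            candidates.append(op["opid"])
    return candidates


# Invented sample entries shaped like currentOp "inprog" documents.
sample = [
    {"opid": 21, "active": True, "op": "query", "ns": "app.users", "secs_running": 95},
    {"opid": 22, "active": True, "op": "query", "ns": "local.oplog.rs", "secs_running": 4000},
    {"opid": 23, "active": True, "op": "update", "ns": "app.users", "secs_running": 120},
    {"opid": 24, "active": True, "op": "query", "ns": "app.events", "secs_running": 2},
]
print(kill_candidates(sample))
```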
  23. can’t elect a primary? • Never run with an even number of votes (max 7) • You need > 50% of votes to elect a primary • Set your priority levels explicitly if you need warmup • Consider delegating voting to arbiters • Set snapshot nodes to be nonvoting if possible. • Check your mongo log. Is something vetoing? Do they have an inconsistent view of the cluster state?
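
The voting rules in this slide reduce to simple arithmetic. The following sketch uses hypothetical function names and encodes only what the slide states: an election needs more than 50% of all votes, the vote count should be odd, and 7 is the maximum.

```python
MAX_VOTES = 7  # maximum quoted in the slide


def can_elect_primary(total_votes, reachable_votes):
    """True if the reachable voters hold a strict majority of all votes."""
    return reachable_votes > total_votes / 2


def vote_problems(total_votes):
    """Return a list of configuration problems with the given vote count."""
    problems = []
    if total_votes % 2 == 0:
        problems.append("even number of votes")
    if total_votes > MAX_VOTES:
        problems.append("more than %d votes" % MAX_VOTES)
    return problems


print(can_elect_primary(3, 2))  # two of three votes reachable
print(can_elect_primary(4, 2))  # exactly half is not a majority
print(vote_problems(4))
```

The even case shows why an extra vote helps: with 4 votes, a clean 2/2 partition leaves neither side able to elect a primary.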
  24. secondaries crashing? • Some rare mongo bugs will cause all secondaries to crash unrecoverably • Never kill oplog tailers or other internal database operations; this can also trash secondaries • Arbiters are more stable than secondaries; consider using them to form a quorum with your primary
  25. replication stops? • Other rare bugs will stop replication or cause secondaries to exit without a corrupt op • The correct way to fix this is to re-snapshot off the primary and rebuild your secondaries. • However, you can sometimes *dangerously* repair a secondary: (1) stop mongo, (2) bring it back up in standalone mode, (3) repair the offending collection, (4) restart mongo again as part of the replica set
  26. Glossary of resources • Opscode AWS cookbook • https://github.com/opscode-cookbooks/aws • edelight MongoDB cookbook • https://github.com/edelight/chef-mongodb • Parse MongoDB cookbook fork • https://github.com/ParsePlatform/Ops/tree/primary/chef/cookbooks/mongodb • ChefConf presentation on mongo + chef • http://www.youtube.com/watch?v=dBk5RyExsOE
  27. Glossary of resources • MongoLab’s mongoctl • https://github.com/mongolab/mongoctl • CloudFormation templates • http://docs.mongodb.org/ecosystem/tutorial/automate-deployment-with-cloudformation/#automate-deployment-with-cloudformation/ • Parse warmup scripts • http://blog.parse.com/2013/03/07/techniques-for-warming-up-mongodb/ • Parse compaction scripts • http://blog.parse.com/2013/03/26/always-be-compacting/