OSCON 2013 talk on AWS and MongoDB

Charity Majors

July 26, 2013

Transcript

  1. Charity Majors
    @mipsytipsy

  2. MongoDB on AWS:
    Operational Best Practices

  3. overview
    resources
    provisioning
    disaster mitigation techniques

  4. replica sets

  5. replica sets
    • Odd number of votes
    • Distribute across AZs
    • More votes are better than fewer
    • Use arbiters for extra votes

  6. basic replica set

  7. 2-node replica set
    with arbiter

  8. arbiters
    • Mongod processes that do nothing but vote
    • Highly reliable
    • Lightweight; you can run many arbiters on a single
    node
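    A minimal sketch of standing up and registering an arbiter; the hostnames, ports, and paths are assumptions, not from the talk:
      # on the arbiter host: one mongod per arbiter, each with its own port and dbpath
      $ mongod --replSet myset --port 27021 --dbpath /var/lib/mongo-arb1 \
               --nojournal --smallfiles --fork --logpath /var/log/mongo-arb1.log
      # from the primary: add it as a voting, non-data-bearing member
      $ mongo --eval 'rs.addArb("arbiter-host.example.com:27021")'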

  9. replica set
    with snapshot node

  10. EBS snapshots
    • Set priority = 0
    • Set hidden = 1
    • Consider setting votes = 0
    • Lock mongo or stop mongod before snapshot
    • Consider running continuous compaction on
    snapshot node
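    A minimal sketch of the reconfig plus the lock/snapshot/unlock cycle described above; the member index, volume ID, and CLI flags are assumptions:
      # on the primary: make member 2 the hidden, non-voting snapshot node
      $ mongo --eval 'cfg = rs.conf();
                      cfg.members[2].priority = 0;
                      cfg.members[2].hidden = true;
                      cfg.members[2].votes = 0;
                      rs.reconfig(cfg)'
      # on the snapshot node: flush and lock, snapshot the data volume, then unlock
      $ mongo --eval 'db.fsyncLock()'
      $ aws ec2 create-snapshot --volume-id vol-0abc1234 --description "mongo $(date +%F)"
      $ mongo --eval 'db.fsyncUnlock()'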

  11. other backup options
    • EBS snapshots
    • LVM snapshots
    • Mongodump
    • MongoDB backups as a service

  12. EC2 and disks

  13. memory
    • Memory is your primary scaling constraint
    • Your working set should fit into RAM
    • In 2.4, estimate with the workingSet section of serverStatus (sketch below)
    • Page faults? Your working set may not fit
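    A sketch of that 2.4 working-set estimator; the interpretation comment assumes the default 4 KB page size:
      $ mongo --eval 'printjson(db.serverStatus({ workingSet: 1 }).workingSet)'
      # e.g. { "note" : "thisIsAnEstimate", "pagesInMemory" : ..., "overSeconds" : ... }
      # pagesInMemory is counted in 4 KB pages; compare the estimate against physical RAM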

  14. disk options
    • EBS
    • Dedicated SSD
    • Provisioned IOPS
    • Ephemeral

  15. EBS classic vs. EBS with PIOPS
    [comparison charts]
    ... just say no to EBS

  16. SSD
    (hi1.4xlarge)
    • 8 cores
    • 60 gigs RAM
    • 2 1-TB SSD drives
    • 120k random reads/sec
    • 85k random writes/sec
    • expensive! $2300/mo on demand

  17. PIOPS
    • Up to 2000 IOPS/volume
    • Up to 1024 GB/volume
    • Variability of < 0.1%
    • Costs double regular EBS
    • Supports snapshots
    • RAID together multiple volumes
    for more storage/performance
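    A hedged sketch of building a RAID set out of PIOPS volumes; the sizes, IOPS, device names, and RAID level are placeholder choices, not recommendations from the talk:
      # create and attach four 200 GB io1 volumes at 2000 IOPS each (one shown)
      $ aws ec2 create-volume --availability-zone us-east-1a --size 200 \
            --volume-type io1 --iops 2000
      $ aws ec2 attach-volume --volume-id vol-0abc1234 --instance-id i-0def5678 --device /dev/xvdf
      # stripe the attached volumes together for more storage and throughput
      $ sudo mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi
      $ sudo mkfs.ext4 /dev/md0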

  18. estimating PIOPS
    • estimate how many IOPS to provision with the “tps” column of sar -d 1
    • multiply that by 2-3x depending on your spikiness
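    For example, with a placeholder device name:
      $ sar -d -p 1 60 | grep xvdf     # the tps column is roughly the IOPS you use today
      # if steady-state tps is ~600, provisioning 1200-1800 PIOPS leaves room for spikes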

  19. Ephemeral
    Storage
    • Cheap
    • Fast
    • No network latency
    • No snapshot capability
    • Data is lost forever if you stop or
    resize the instance

  20. filesystem
    • Use ext4
    • Raise file descriptor limits
    • Raise connection limits
    • Mount with noatime and nodiratime
    • Consider putting the journal on a separate volume
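    One possible shape for these settings, assuming a placeholder device, mount point, user, and limit values:
      # mount the data volume without atime updates
      $ sudo mount -o noatime,nodiratime /dev/md0 /var/lib/mongodb
      # raise open-file limits for the mongod user (via /etc/security/limits.conf)
      $ echo 'mongodb soft nofile 64000' | sudo tee -a /etc/security/limits.conf
      $ echo 'mongodb hard nofile 64000' | sudo tee -a /etc/security/limits.conf
      # mongod has its own connection cap as well; raise it with --maxConns if needed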

  21. blockdev
    • Your default blockdev read-ahead setting is probably wrong
    • Too large? you will underuse memory
    • Too small? you will hit the disk too much
    • Experiment.
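    For example, with a placeholder device and value (measure before and after):
      $ sudo blockdev --getra /dev/md0      # current read-ahead, in 512-byte sectors
      $ sudo blockdev --setra 32 /dev/md0   # small read-ahead suits mongo's random access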

  22. provisioning

  23. infrastructure is code
    • Chef
    • Puppet
    • CloudFormation
    • Scripts (e.g. MongoLab’s
    mongoctl)

  24. highlights of mongo chef cookbook
    • Configures EBS raid for you
    • Supports PIOPS
    • Handles multiple clusters, sharding, arbiters
    • Built-in snapshot support
    • Provisions new nodes automagically from latest
    completed RAID snapshot set for cluster

  25. provisioning from snapshot
    • Fast and easy
    • Takes < 5 minutes using knife-ec2
    • Will not reset padding factors
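    A hedged knife-ec2 sketch; the AMI, flavor, zone, and role names are placeholders:
      $ knife ec2 server create -I ami-0123abcd -f m3.2xlarge -Z us-east-1a \
            -N db-mongo-04 -r 'role[mongodb-cluster]'
      # the cookbook then brings the node up from the latest completed RAID snapshot set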

  26. snapshot caveats:
    • EBS snapshots will lazily load blocks from S3
    • run “dd” on each of the data files to pull blocks down
    • Always warm up a secondary before promoting
    • warm up both indexes and data
    • http://blog.parse.com/2013/03/07/techniques-for-warming-up-mongodb/
    • in mongodb 2.2 and above you can use the touch command:
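    Sketches of both warm-up approaches; the paths, database, and collection names are placeholders:
      # pull every block of the restored data files down from S3
      $ for f in /var/lib/mongodb/mydb.*; do dd if="$f" of=/dev/null bs=1M; done
      # 2.2+: load one collection's data and indexes into RAM
      $ mongo mydb --eval 'printjson(db.runCommand({ touch: "mycollection", data: true, index: true }))'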

  27. provisioning with initial sync
    • Compacts and repairs your collections and databases
    • Hard on your primary, does a full table scan of all data
    • On > 2.2.0 you can sync from a secondary by button-
    mashing rs.syncFrom() on startup
    • Or use iptables to block secondary from viewing
    primary (all versions)
    • Resets all padding factors to 1
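    Hedged sketches of both ways to keep an initial sync off the primary; the hosts and addresses are placeholders:
      # 2.2+: right after the empty secondary starts, point it at another secondary
      $ mongo --eval 'rs.syncFrom("secondary-2.example.com:27017")'
      # any version: temporarily block this node's view of the primary, then sync
      $ sudo iptables -A OUTPUT -d 10.0.0.5 -p tcp --dport 27017 -j DROP
      # ...after the initial sync finishes, remove the rule
      $ sudo iptables -D OUTPUT -d 10.0.0.5 -p tcp --dport 27017 -j DROP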

  28. fragmentation is terrible

  29. fragmentation
    • Your RAM gets fragmented too!
    • Leads to underuse of memory
    • Deletes are not the only source of fragmentation
    • Repair, compact, or resync regularly
    • Or consider using powerof2 padding factor
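    The power-of-two option is a per-collection flag set with collMod (2.2+); the collection name is a placeholder:
      $ mongo mydb --eval 'printjson(db.runCommand({ collMod: "mycollection", usePowerOf2Sizes: true }))'
      # new and moved documents get power-of-two allocations, so freed space is easier to reuse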

  30. 3 ways to fix fragmentation:
    • Re-sync a secondary from scratch
    • resets your padding factors
    • hard on your primary; rs.syncFrom() a secondary
    • Repair a secondary
    • resets your padding factors
    • may take longer than your oplog age
    • Run continuous compaction on your snapshot
    node
    • won’t reset padding factors
    • but it also won’t reclaim disk space
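    A minimal sketch of continuous compaction on the hidden snapshot node (see the Parse compaction post in the resources); the database name and scheduling are assumptions:
      # run from cron on the hidden, non-voting node only; compact blocks the member
      $ mongo mydb --eval '
          db.getCollectionNames().forEach(function(c) {
            if (c.indexOf("system.") === 0) return;     // skip system collections
            printjson(db.runCommand({ compact: c }));   // defragment extents in place
          })'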

  31. query profiling

  32. Finding bad queries
    • db.currentOp()
    • mongodb.log
    • profiling collection

  33. db.currentOp()
    • Check the queue size
    • Any indexes building?
    • Sort by num_seconds
    • Sort by num_yields, locktype
    • Consider adding comments to your queries
    • Run explain() on queries that are long-running
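    For example, a sketch that prints only long-running client queries (the 10-second cutoff is arbitrary):
      $ mongo --eval '
          db.currentOp().inprog.forEach(function(op) {
            if (op.op === "query" && op.secs_running > 10)
              print(op.opid, op.secs_running + "s", op.ns, tojson(op.query));
          })'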

  34. mongodb.log
    • Configure output with --slowms
    • Look for high execution time, nscanned, nreturned
    • See which queries are holding long locks
    • Match connection ids to IPs
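    For example, with placeholder paths and thresholds:
      $ mongod --config /etc/mongodb.conf --slowms 200             # log every op slower than 200ms
      $ grep -E '[0-9]{4,}ms$' /var/log/mongodb/mongodb.log | tail # ops that took a second or more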

  35. system.profile collection
    • Enable profiling with db.setProfilingLevel()
    • Does not persist through restarts
    • Like mongodb.log, but queryable
    • Writes to this collection incur some cost
    • Use db.system.profile.find() to get slow queries for
    a certain collection, time range, execution time, etc
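    A sketch: turn on level-1 profiling (slow operations only), then query the results; the thresholds and namespace are placeholders:
      $ mongo mydb --eval 'db.setProfilingLevel(1, 100)'   # profile ops slower than 100ms
      $ mongo mydb --eval '
          db.system.profile.find({ ns: "mydb.mycollection", millis: { $gt: 500 } })
                           .sort({ ts: -1 }).limit(5).forEach(printjson)'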

  36. failure scenarios.

  37. ... when queries pile up ...
    • Know what your tipping point looks like
    • Don’t elect a new primary or restart
    • Do kill queries before the tipping point
    • Write your kill script before you need it
    • Don’t kill internal mongo operations, only queries.
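    A sketch of the kind of kill script the slide means: kill only client queries past a threshold, never replication or other internal operations; the 60-second cutoff is an assumption:
      $ mongo --eval '
          db.currentOp().inprog.forEach(function(op) {
            if (op.op !== "query") return;                       // never touch internal ops
            if (!op.secs_running || op.secs_running < 60) return;
            if (op.ns && op.ns.indexOf("local.") === 0) return;  // extra guard for oplog readers
            print("killing", op.opid, op.ns);
            db.killOp(op.opid);
          })'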

  38. can’t elect a primary?
    • Never run with an even number of votes (max 7)
    • You need > 50% of votes to elect a primary
    • Set your priority levels explicitly if you need
    warmup
    • Consider delegating voting to arbiters
    • Set snapshot nodes to be nonvoting if possible.
    • Check your mongo log. Is something vetoing? Do
    they have an inconsistent view of the cluster state?
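    A quick way to sanity-check the vote math and look for vetoes; the log path is an assumption, and votes/priority default to 1 when unset:
      $ mongo --eval '
          rs.conf().members.forEach(function(m) {
            print(m.host, "votes:", m.votes === undefined ? 1 : m.votes,
                  "priority:", m.priority === undefined ? 1 : m.priority);
          })'
      $ grep -i veto /var/log/mongodb/mongodb.log | tail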

  39. secondaries crashing?
    • Some rare mongo bugs will cause all secondaries
    to crash unrecoverably
    • Never kill oplog tailers or other internal database
    operations, this can also trash secondaries
    • Arbiters are more stable than secondaries,
    consider using them to form a quorum with your
    primary

  40. replication stops?
    • Other rare bugs will stop replication or cause
    secondaries to exit on a corrupt op
    • The correct way to fix this is to re-snapshot off
    the primary and rebuild your secondaries.
    • However, you can sometimes *dangerously* repair
    a secondary:
    1. stop mongo
    2. bring it back up in standalone mode
    3. repair the offending collection
    4. restart mongo again as part of the replica set
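    Those four steps as shell commands, assuming an Ubuntu-style service name and default paths; mongod has no single-collection repair, so this repairs the whole database holding the bad collection, and only makes sense on a secondary you can afford to lose:
      $ sudo service mongodb stop
      # bring it up standalone (no --replSet) on another port so the set leaves it alone
      $ sudo -u mongodb mongod --dbpath /var/lib/mongodb --port 27018 --fork \
            --logpath /var/log/mongodb/repair.log
      $ mongo --port 27018 mydb --eval 'printjson(db.runCommand({ repairDatabase: 1 }))'
      $ mongo --port 27018 admin --eval 'db.shutdownServer()'
      $ sudo service mongodb start      # rejoin the replica set and catch up from the oplog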

  41. Glossary of resources
    • Opscode AWS cookbook
    • https://github.com/opscode-cookbooks/aws
    • edelight MongoDB cookbook
    • https://github.com/edelight/chef-mongodb
    • Parse MongoDB cookbook fork
    • https://github.com/ParsePlatform/Ops/tree/primary/chef/cookbooks/mongodb
    • ChefConf presentation on mongo + chef
    • http://www.youtube.com/watch?v=dBk5RyExsOE

  42. Glossary of resources
    • MongoLab’s mongoctl
    • https://github.com/mongolab/mongoctl
    • Cloudformation templates
    • http://docs.mongodb.org/ecosystem/tutorial/automate-deployment-with-cloudformation/#automate-deployment-with-cloudformation/
    • Parse warmup scripts
    • http://blog.parse.com/2013/03/07/techniques-for-warming-up-mongodb/
    • Parse compaction scripts
    • http://blog.parse.com/2013/03/26/always-be-compacting/

  43. Charity Majors
    @mipsytipsy