
ChefConf 2013: Managing multiple MongoDB clusters with Chef


Charity Majors

April 26, 2013

Transcript

  1. Managing multiple MongoDB clusters with Chef

     • Replica sets
     • Multiple clusters
     • Sharding
     • Arbiters
     • Provisioning
     • Snapshots
     • Fragmentation
     • Monitoring
     • State of the mongo cookbook
     • Glossary of tools
  2. Basic replica set

     This configuration can survive any single node or Availability Zone outage.
  3. How do I chef that?

     • Grab the mongodb and aws cookbooks.
     • Make a wrapper cookbook (mongodb_parse) with a recipe to construct our default replica set.
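     A minimal sketch of what such a wrapper recipe might contain, assuming the edelight cookbook's
     mongodb::replicaset recipe; the recipe name and the dbpath default below are illustrative, not
     taken from the slides:

     # mongodb_parse/recipes/default_replset.rb -- sketch of the wrapper recipe
     node.default['mongodb']['dbpath'] = '/var/lib/mongodb'  # assumed attribute name
     include_recipe 'aws'                  # Opscode AWS cookbook, used later for EBS volumes
     include_recipe 'mongodb::replicaset'  # edelight recipe: configures mongod as a replica set member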
  4. Now make a role for your cluster that uses your default replica set recipe...
     and launch some nodes.
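     Roughly what such a role might look like in the Ruby role DSL; the role, recipe, and
     cluster_name attribute names are illustrative:

     # roles/mongodb_mydb.rb -- sketch of a cluster role
     name 'mongodb_mydb'
     description 'Replica set members for the mydb cluster'
     run_list 'recipe[mongodb_parse::default_replset]'
     default_attributes(
       'mongodb' => { 'cluster_name' => 'mydb' }  # assumed attribute used to name/discover the cluster
     )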
  5. Initiate the replica set in the mongo shell, and you're set!
     You now have one empty mongo cluster with no data.
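     If you later want Chef to handle this step too, a guarded execute resource is one way to do it;
     this is a sketch, not part of the slides' workflow:

     # Run rs.initiate() exactly once, only if the replica set is not yet initiated
     execute 'rs_initiate' do
       command %q{mongo --quiet --eval 'rs.initiate()'}
       not_if  %q{mongo --quiet --eval 'rs.status().ok' | grep -q 1}
     end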
  6. Ok. But what if I have data?

     Backup options:
     • snapshot
     • mongodump
     Provisioning options:
     • snapshot
     • secondary sync
     • mongorestore (in theory)
     Snapshots are great and you should use them.
  7. Provisioning with initial sync

     • Compacts and repairs your collections and databases.
     • Hard on your primary: does a full table scan of all data.
     • On > 2.2.0 you can sync from a secondary by button-mashing rs.syncFrom("host:port") on startup.
     • Or use iptables to block the secondary from viewing the primary (all versions).
     • Not riskless: it resets the padding factor for all collections to 1.
  8. Configuring RAID for EBS

     • Decide how many volumes to RAID (this is hard to change later!).
     • Include mongodb::raid_data in your replica set recipe.
     • Set a role attribute for mongodb[:vols] to indicate how many volumes to RAID.
     • Set a role attribute for mongodb[:volsize] to indicate the size of the volumes.
     • Set the mongodb[:use_piops] attribute if you want to use PIOPS volumes.
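     As a concrete sketch, the role excerpt might look like this (the values and the GB unit for
     volsize are assumptions, not from the slides):

     # Excerpt from the cluster role -- RAID attributes consumed by mongodb::raid_data
     default_attributes(
       'mongodb' => {
         'vols'      => 4,     # number of EBS volumes in the array -- hard to change later
         'volsize'   => 200,   # size of each volume (GB assumed)
         'use_piops' => true   # use EBS Provisioned IOPS volumes
       }
     )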
  9. Adding snapshots

     • Pick a node to be your snapshot host.
     • Specify that node as the mongodb[:backup_host] in your cluster role.
     • Get the volume ids that are mounted on /var/lib/mongodb and add them to your cluster role attributes as backups[:mongo_volumes].
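     Continuing the same role excerpt (the hostname and volume ids are placeholders):

     # Excerpt from the cluster role -- snapshot source and the volumes to snapshot
     default_attributes(
       'mongodb' => { 'backup_host' => 'mydb-snap-1.example.com' },
       'backups' => {
         'mongo_volumes' => ['vol-aaaa1111', 'vol-bbbb2222', 'vol-cccc3333', 'vol-dddd4444']
       }
     )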
  10. mongodb::backups

     How it works:
     • Installs a cron job on the backup host.
     • The cron job does an ec2-consistent-snapshot of the volumes specified in backups[:mongo_volumes].
     • Locks mongo during the snapshot.
     • Tags a "daily" snapshot once a day, so you can easily prune hourly snapshot sets while keeping RAID array sets coherent.
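     The installed job is roughly shaped like the following cron resource; the schedule and command
     line are simplified, and the real recipe also handles the mongod lock and the daily tagging
     described above:

     # Sketch of the snapshot cron job on the backup host
     cron 'mongodb-ebs-snapshot' do
       minute  '0'    # hourly
       user    'root'
       command "ec2-consistent-snapshot #{node['backups']['mongo_volumes'].join(' ')}"
       only_if { node['mongodb']['backup_host'] == node['fqdn'] }
     end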
  11. Provisioning nodes from snapshot

     How it works:
     • Checks to see if the mongo $dbpath is mounted.
     • If not, grabs the latest completed set of snapshots for the volumes in backups[:mongo_volumes].
     • Provisions and attaches a new volume from each snapshot.
     • Assembles the RAID array and mounts it on $dbpath.
     To reprovision, just delete the aws attributes from the node, detach the old volumes, and re-run chef-client.
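     In Chef terms the logic reads roughly like the sketch below, using the aws cookbook's
     aws_ebs_volume resource plus the core mdadm and mount resources; the snapshot lookup helper,
     device names, and filesystem are all illustrative:

     # Sketch: rebuild the RAID array from the latest snapshot set (AWS credentials omitted)
     raid_devices = []
     node['backups']['mongo_volumes'].each_with_index do |vol, i|
       dev = "/dev/xvd#{('f'.ord + i).chr}"      # /dev/xvdf, /dev/xvdg, ...
       raid_devices << dev
       aws_ebs_volume "mongodb-data-#{i}" do
         snapshot_id latest_snapshot_for(vol)    # hypothetical helper that queries EC2
         device      dev
         action      [:create, :attach]
       end
     end

     mdadm '/dev/md0' do
       devices raid_devices
       level   10
       action  [:create, :assemble]
     end

     mount '/var/lib/mongodb' do
       device '/dev/md0'
       fstype 'xfs'
       action [:mount]
     end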
  12. Multiple MongoDB clusters

     Why use multiple clusters instead of just sharding?
     • Different types of data (e.g. separate application data from analytics)
     • Different performance characteristics or hardware requirements
     • When collection-level sharding isn't appropriate and collections need to stay locally intact
     • Staging or test clusters
     • Remove as much as possible from the critical path
  13. Multiple clusters? Easy!

     Just make a role for each cluster, with a distinct cluster name, backup host, and backup volumes.
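     For example, a second cluster's role might differ only in its attributes (all names and ids
     below are placeholders):

     # roles/mongodb_analytics.rb -- another cluster, same recipe, different attributes
     name 'mongodb_analytics'
     run_list 'recipe[mongodb_parse::default_replset]'
     default_attributes(
       'mongodb' => {
         'cluster_name' => 'analytics',
         'backup_host'  => 'analytics-snap-1.example.com'
       },
       'backups' => { 'mongo_volumes' => ['vol-1111aaaa', 'vol-2222bbbb'] }
     )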
  14. What if you need to shard?

     Make a new recipe in your wrapper cookbook that includes mongodb::shard, and use that recipe in the cluster's role. Set a shard name per role.
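     A sketch of that recipe plus the per-role shard name; the shard_name attribute below is an
     assumption about the cookbook's attribute layout:

     # mongodb_parse/recipes/sharded_replset.rb
     include_recipe 'mongodb_parse::default_replset'
     include_recipe 'mongodb::shard'     # edelight cookbook's shard recipe

     # roles/mongodb_userdata_shard1.rb (excerpt)
     # default_attributes('mongodb' => { 'shard_name' => 'shard1' })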
  15. Arbiters

     • Arbiters are mongod processes that don't do anything but vote.
     • They are awesome. They give you more flexibility and reliability.
     • To provision an arbiter, use the LWRP (see the sketch after this list).
     • If you have lots of clusters, you may want to run arbiters for multiple clusters on each arbiter host.
     • Arbiters tend to be more reliable than data nodes because they have less to do.
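     One way this can look, running one arbiter per cluster on a shared arbiter host; this assumes a
     mongodb_instance-style resource, and parameter names vary between cookbook versions, so treat it
     as illustrative:

     # Several arbiters on one host, one per cluster, each on its own port
     %w[mydb analytics sessions].each_with_index do |cluster, i|
       mongodb_instance "arbiter_#{cluster}" do
         mongodb_type 'arbiter'
         port         27018 + i
         dbpath       "/var/lib/mongodb/arbiter_#{cluster}"
       end
     end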
  16. Managing votes with arbiters

     • Three arbiter processes on each arbiter node, one arbiter per cluster.
     • You can have a maximum of seven votes per replica set.
     • Now you can survive all secondaries dying, or an AZ outage.
     • If you have even one healthy node per RS, you can continue to serve traffic.
  17. More about snapshots

     • Snapshot often.
     • Set the snapshot node to priority = 0, hidden = 1.
     • Lock Mongo or stop mongod during the snapshot.
     • Always warm up a snapshot before promoting.
       • In MongoDB 2.4 you can use db.touch().
       • Warm up both indexes and data; use dd on the data files to pull them from S3.
       • http://blog.parse.com/2013/03/07/techniques-for-warming-up-mongodb/
     • Run continuous compaction on your snapshot node.
  18. Fragmentation

     • Your RAM gets fragmented too!
     • Leads to underuse of memory.
     • Deletes are not the only source of fragmentation.
     • Use db.<collection>.stats to find the padding factor (between 1 and 2; the higher, the more fragmentation).
     • Repair, compact, or reslave regularly (db.printReplicationInfo() gives the length of your oplog, to see if repair is a viable option).
  19. 3 ways to fix fragmentation

     • Re-sync a secondary from scratch
       • limited by the size of your data & oplog
       • very hard on your primary; use rs.syncFrom() or iptables to sync from a secondary
     • Repair your node
       • also limited by the size of your oplog
       • can cause small discrepancies in your data
     • Run continuous compaction on your snapshot node
       • http://blog.parse.com/2013/03/26/always-be-compacting/
  20. Monitoring

     Most people start with MMS.
     • Hosted solution from 10gen
     • Single python script
     • Very pretty graphs
     • Auto-detects replica sets from a single node
     • Can do alerting
     • Populated by db.serverStatus()
  21. Nagios

     • Nagios should alert (at least) if a node actually goes down, or if secondaries are lagging.
     • We use check_mongodb.py extensively.
     • Automatically apply all mongo checks to all mongo nodes with a hostgroup.
  22. Ganglia

     • The goal: graph everything MMS does, and more.
     • Use the mongodb gmond python modules.
     • Our fork of the ganglia cookbook supports multiple clusters. Add the cluster name as a role attribute.
  23. ... and assign cluster names to ports in your ganglia wrapper cookbook.
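     A hypothetical sketch of such a mapping, assuming an attributes file in the ganglia wrapper
     cookbook; the attribute layout and ports are purely illustrative, not Parse's actual schema:

     # ganglia wrapper cookbook, attributes/default.rb
     default['ganglia']['clusters'] = {
       'mongodb_mydb'      => 8650,
       'mongodb_analytics' => 8651,
       'mongodb_sessions'  => 8652
     }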
  24. Future plans for the mongodb cookbook

     • Get Parse's changes fully upstreamed
     • Use LWRPs for mongod, mongos, mongoc
     • Add better support for ephemeral storage
     • Populate backup volume attributes from the backup host
     • Support bringing up nodes with secondary initial sync
     • Choose which secondary to sync from via attribute
     • Optionally auto-join the cluster
     • Make EBS RAID a LWRP
     • Add ebs_optimized support for PIOPS
     • ... and more.
  25. Glossary of resources

     • Opscode AWS cookbook
       • https://github.com/opscode-cookbooks/aws
     • edelight MongoDB cookbook
       • https://github.com/edelight/chef-mongodb
     • Parse MongoDB cookbook fork
       • https://github.com/ParsePlatform/Ops/tree/master/chef/cookbooks/mongodb
     • Parse compaction scripts and warmup scripts
       • http://blog.parse.com/2013/03/07/techniques-for-warming-up-mongodb/
       • http://blog.parse.com/2013/03/26/always-be-compacting/
  26. Glossary of resources (cont’d)

     • Opscode nagios cookbook
       • https://github.com/opscode-cookbooks/nagios
     • nagios check_mongodb.py check
       • https://github.com/mzupan/nagios-plugin-mongodb
     • Heavywater ganglia cookbook
       • http://github.com/hw-cookbooks/ganglia
     • Parse ganglia cookbook fork
       • https://github.com/ParsePlatform/Ops/tree/master/chef/cookbooks/ganglia
     • Ganglia gmond modules
       • https://github.com/ganglia/gmond_python_modules/tree/master/mongodb