Operational MongoDB - Brendan McAdams, Software Engineer, 10gen

Operations: Understanding MongoDB & Keeping it Happy
Brendan McAdams, 10gen, Inc.
[email protected] @rit
Monday, October 10, 11

Wireless Information
SSID: swisscom
Login: EVENT/10GEN
Password: 10GEN

Hardware (You know, for running MongoDB on)

Memory
• MongoDB revolves around memory mapped files

The operating system maps files on the filesystem to virtual memory
• 200 gigs of MongoDB files creates 200 gigs of virtual memory
• The OS controls what data is in RAM
• When a piece of data isn't found in RAM, a page fault occurs (Expensive + Locking!)
• The OS goes to disk to fetch the data
• Indexes are part of the regular database files
• Deployment Trick: Pre-warm your database (pre-warming your cache) to prevent cold start slowdown

Big Things To Watch For
• % index miss
• faults / sec
• flushes / sec

MongoDB will take advantage of multiple cores
• For working set queries, CPU usage is typically low

Full Tablescans
• Surprise: Queries which don't hit indexes make heavy use of CPU & Disk
• Deployment Trick: Avoid counting & computing on the fly by caching & precomputing data
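As a rough illustration of the "precompute instead of counting on the fly" trick, the sketch below keeps a running counter document that is incremented on every write, so reads never have to scan or count. The collection names `posts` and `stats`, the `author` field, and both function names are hypothetical; `db` is assumed to be a mongo shell database handle.

```javascript
// Sketch only: "posts", "stats", "author" and "postCount" are made-up names.
// Instead of running a count over posts for every page view, keep one
// counter document per author and bump it at write time.
function recordPost(db, post) {
  db.posts.insert(post);
  // Third argument `true` requests an upsert: the counter document is
  // created on first use and incremented afterwards.
  db.stats.update({ _id: post.author }, { $inc: { postCount: 1 } }, true);
}

// Reading the precomputed value is a single lookup by _id.
function postCount(db, author) {
  var doc = db.stats.findOne({ _id: author });
  return doc ? doc.postCount : 0;
}
```

The trade-off is one extra write per insert in exchange for reads that stay cheap as the collection grows.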

DB Profiling is your Friend
• Ensure your queries are being executed correctly
• Enable profiling
  • db.setProfilingLevel(n)
  • n=1: slow operations, n=2: all operations
• Viewing profile information
  • db.system.profile.find({info: /test.foo/})
  • http://www.mongodb.org/display/DOCS/Database+Profiler
• Query execution plan:
  • db.xx.find({..}).explain()
  • http://www.mongodb.org/display/DOCS/Optimization
• Make sure your queries are properly indexed.
• Deployment Trick: Start mongod with --notablescan to disable tablescans
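The profiling steps above could be wrapped in two small shell helpers, sketched here. The helper names and the 100 ms threshold in the usage note are illustrative assumptions; the sketch relies on `db.setProfilingLevel(level, slowms)` and on profile entries carrying `millis` and `ts` fields.

```javascript
// Sketch only: helper names are hypothetical. `db` is a mongo shell database handle.

// Level 1 records only operations slower than slowMs milliseconds.
function enableSlowOpProfiling(db, slowMs) {
  return db.setProfilingLevel(1, slowMs);
}

// Fetch the most recent profiled operations that took longer than slowMs.
function recentSlowOps(db, slowMs, maxResults) {
  return db.system.profile
    .find({ millis: { $gt: slowMs } })
    .sort({ ts: -1 })      // newest first
    .limit(maxResults)
    .toArray();
}
```

In the shell this might be used as `enableSlowOpProfiling(db, 100)` followed later by `recentSlowOps(db, 100, 10).forEach(printjson)`.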

Indexes
• Index on "Foo, Bar, Baz" works for "Foo", "Foo, Bar" and "Foo, Bar, Baz"
• The Query Optimizer figures out the order but can't do things in reverse
• You can pass hints to force a specific index:
  db.collection.find({ username: 'foo', city: 'New York' }).hint({username: 1})
• Missing values are indexed as "null"
  • This includes unique indexes
• Deployment Trick: 1.8 has Sparse and Covered Indexes!
• Dig into system.indexes
• Common Sense Trick: Make sure your queries are properly indexed!
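The prefix rule for compound indexes can be illustrated with a small plain JavaScript function. This is a simplified model for intuition, not how the query optimizer is implemented; the function name is hypothetical.

```javascript
// Simplified model: given the ordered fields of a compound index and the
// fields a query filters on, return the leading index fields the query can use.
// The order of fields in the query does not matter; the order in the index does.
function usableIndexPrefix(indexFields, queryFields) {
  var count = 0;
  for (var i = 0; i < indexFields.length; i++) {
    if (queryFields.indexOf(indexFields[i]) === -1) break; // gap: prefix ends here
    count++;
  }
  return indexFields.slice(0, count);
}

var index = ["foo", "bar", "baz"];
usableIndexPrefix(index, ["bar", "foo"]); // ["foo", "bar"]
usableIndexPrefix(index, ["bar", "baz"]); // [] because "foo" is missing
```

A query that skips the leading field gets an empty prefix, which is the "can't do things in reverse" case from the slide.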

Map Reduce
• Currently single threaded; runs in parallel across shards
• Deployment Trick: Use the new aggregation output options

Working set is crucial!!!
• Working set should be, as much as possible, in memory
• Your entire dataset need not be!

Disks & I/O
• Disk I/O becomes your definer of performance in non-working-set queries

Surprise: Faster disks are better than slow disks. More disks are also better.
• RAID is good for a variety of reasons
• Our Recommendations ...

RAID 10 (Mirrored sets inside a striped set; minimum 4 disks)
• Improved write performance
• Survives single disk failure
• Downside: requires double the storage
  • e.g. four 20 gig disks give you 40 gigs of usable space
• LVM of RAID 10 on EBS seems to smooth out performance and reliability best for MongoDB
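The RAID 10 capacity arithmetic is simple enough to write down. The helper below is a hypothetical sketch that assumes equally sized disks arranged in mirrored pairs.

```javascript
// Sketch only: assumes all disks are the same size and mirrored in pairs.
// Half of the raw capacity is spent on the mirror copies.
function raid10UsableGb(diskCount, diskSizeGb) {
  if (diskCount < 4 || diskCount % 2 !== 0) {
    throw new Error("RAID 10 needs an even number of disks, minimum 4");
  }
  return (diskCount / 2) * diskSizeGb;
}

raid10UsableGb(4, 20); // 40, matching the example on the slide
```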

RAID 5 or 6
• 1 or 2 additional disks required for parity
• Can survive 1 or 2 disk failures
• Implementations seem inconsistent, buyer beware

Flash (SSD)
• Expensive, but getting cheaper
• Significantly reduced seek time and increased I/O throughput
• Random Writes and Sequential Reads are still a weak point

OS
• For production: use a 64-bit OS and a 64-bit MongoDB build
• 32-bit has a 2 gig limit, imposed by the operating system for memory mapped files
• Clients can be 32-bit
• MongoDB supports (little endian only):
  • Linux, FreeBSD, OS X (on Intel, not PowerPC)
  • Windows
  • Solaris (Intel only; Joyent offers a cloud service which works for Mongo)

Put your journal on a separate spindle if possible

Server Status
Some tools for examining server status

mongostat: a free tool which comes with MongoDB
• Shows I/O counters, time spent in locks, etc.

Similarly, iostat ships on most Linux machines (or can be installed)
• iostat [args] <seconds per poll>
  • -x for extended report
• Disk can be a bottleneck in large datasets where working set > RAM
• ~200-300Mb/s on XL EC2 instances, but YMMV (EBS is slower)
• On Amazon, latency spikes of 400-600ms are common (No, this is not a good thing)

iostat

Files & Filesystems

All data & namespace files are stored in the 'data' directory (--dbpath)
• You can create symbolic links to keep different databases on different disks
• Best to aggregate your IO across multiple disks
• File Allocation

Extent Allocation (diagram)
• Data files foo.0, foo.1, foo.2, ending in zero-filled preallocated space
• Extents are allocated per namespace: foo.$freelist, foo.baz, foo.bar, foo.test
• ns details stored in foo.ns

Record Allocation (diagram)
• Record: Header (Size, Offset, Next, Prev), BSON Data, Padding
• Deleted Record: (Size, Offset, Next)

Logfiles
• --logpath <file>
• Rotation can be requested of MongoDB...
  • db.runCommand("logRotate")
  • kill -SIGUSR1 <mongod pid>
  • killall -SIGUSR1 mongod
• Won't work for ./mongod > [file] syntax

Filesystems
• MongoDB is filesystem neutral
• ext3, ext4 and XFS are most used
• BUT: ext4, XFS or any other filesystem with posix_fallocate() is preferred and best

EC2
• Many distros default to ext3 (but Amazon AMI now uses ext4 by default)
• For best performance reformat to ext4 / XFS
• Make sure you use a recent version of ext4
• Striping (MDADM / LVM) aggregates I/O
• See previous recommendations about RAID 10

Maintenance
• When doing a lot of updates or deletes, compaction may be needed occasionally on indices and datafiles
  • db.repairDatabase()
• Replica Sets:
  • Rolling repairs: start nodes up with the --repair param
• Deployment Trick: For large bulk data operations, consider removing indexes and re-adding them later! (Better: a new DB may help)

New "compact" command
• Previously, in MongoDB:
  • Compaction was a per-database operation ("repair")
  • Repair required disk space double the database's size
• 2.0 adds a new "compact" command
  • Compacts and defragments individual collections
  • Requires much less disk space overhead to accomplish compaction (a small amount simply to create new extent(s))
  • Markedly faster than "repair"
  • Still blocks: use with care!
  • Secondaries running a repair will now automatically demote to "recovery" state
• db.runCommand({compact: "collectionName"})

Dump Collection Stats
• db.<collectionName>.stats()

> db.getCollectionNames().forEach(function(x) {
...     print("Collection: " + x);
...     printjson(db[x].stats());
... })

Scale out (diagram; axis labels: write, read)
  shard1    shard2    shard3
  rep_a1    rep_a2    rep_a3
  rep_b1    rep_b2    rep_b3
  rep_c1    rep_c2    rep_c3
  mongos / config server (three instances)

Backups

Best driven from a slave
• Eliminates impact on master during backup
• Hidden Nodes in 1.8

mongodump / mongorestore
• Binary, compact object dump
• Each consistent object is written
• NOT necessarily consistent from start to finish (unless you lock the database)
• mongorestore to restore a binary dump
• Deployment Trick #1: the database doesn't have to be up to restore; you can use dbpath
• Deployment Trick #2: mongodump with replSetName/<hostlist> will automatically read from a slave!

filelock / fsync
• lock: blocks writes
  • db.runCommand({fsync: 1, lock: 1})
• fsync to flush buffers to disk
• backup
• then, unlock
  • db.$cmd.sys.unlock.findOne();
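One risk with the lock, backup, unlock sequence is forgetting the unlock when the backup step fails. The sketch below wraps the two commands from the slide in a helper that always unlocks. `withWriteLock` and `backupFn` are hypothetical names; in practice the backup itself (a file copy or snapshot) usually runs outside the shell, so `backupFn` stands in for whatever triggers or waits for it.

```javascript
// Sketch only: `withWriteLock` and `backupFn` are hypothetical.
// `db` is a mongo shell database handle; the fsync/lock command is issued
// against the admin database.
function withWriteLock(db, backupFn) {
  var admin = db.getSiblingDB("admin");
  var res = admin.runCommand({ fsync: 1, lock: 1 }); // flush to disk, block writes
  if (!res.ok) {
    throw new Error("fsync+lock failed: " + JSON.stringify(res));
  }
  try {
    return backupFn();
  } finally {
    // Runs whether the backup succeeded or threw, so writes are never left blocked.
    admin.$cmd.sys.unlock.findOne();
  }
}
```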

With Journaling, you can run an LVM or EBS snapshot and recover later without locking
• EBS can disappear (several incidents this year)
• S3 for longer term backups
• USE AMAZON AVAILABILITY ZONES
• DR / HA is important. The cloud is neither a panacea nor a magic disaster recovery recipe

Shell Functions
• Leaving off the () in the shell prints the function:

> db.coll.find
function (query, fields, limit, skip) {
    return new DBQuery(this._mongo, this._db, this,
                       this._fullName, this._massageObject(query),
                       fields, limit, skip);
}

Backing up Sharded Clusters
• Small Cluster / Easy Route ...
  • mongodump via mongos
  • Restore is tantamount to rebuilding the sharded setup

Backing up Sharded Clusters
• Large Clusters Backup
• Turn off the balancer

// connect to mongos (not a config server!)
> use config
> db.settings.update( { _id: "balancer" }, { $set : { stopped: true } } , true );

• Stop one (and only one) config server (makes config db read-only; cluster still read/write; don't lock+fsync)
• Backup stopped config server
• Backup each shard
• Restart config server
• Restart Balancer

> use config
> db.settings.update( { _id: "balancer" }, { $set : { stopped: false } } , true );
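Since the stop and restart commands differ only in one boolean, they could be folded into a single helper, sketched below. The function name is hypothetical; the update itself is the one shown on the slide, and `mongosDb` is assumed to be a database handle obtained through a mongos connection.

```javascript
// Sketch only: `setBalancerStopped` is a hypothetical helper name.
// Must be run through mongos, not directly against a config server.
function setBalancerStopped(mongosDb, stopped) {
  var config = mongosDb.getSiblingDB("config");
  // Upsert (third argument true) so the settings document is created if missing.
  config.settings.update({ _id: "balancer" }, { $set: { stopped: stopped } }, true);
  // Read the document back so the caller can confirm the current state.
  return config.settings.findOne({ _id: "balancer" });
}
```

A backup script might call `setBalancerStopped(db, true)` before the backup steps and `setBalancerStopped(db, false)` after the config server is restarted.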

Restoring Sharded Clusters
• Small Cluster / Easy Way
  • Set up servers again (including config server)
  • Restore via mongorestore (all data including config metadata is in the backup and will restore)

Restoring Sharded Clusters
• Large Cluster (Restore entire cluster)
  • Make sure all servers are stopped (mongod, mongos, etc.)
  • Restore all sharding and config servers individually
  • Restart cluster

Sharding Administration (Briefly)

Config Servers are Crucial to Sharding
• All of the information about the sharding setup is stored in the config servers; it's important you don't lose them
• You may have 1 or 3 config servers; this is the only valid configuration (Two Phase Commit)
• Production deployments should always have 3
• If any config server fails ...
  • Chunk splitting will stop
  • Migration / balancing will stop
  • ... until all 3 servers are back up
  • This can lead to unbalanced shard situations
• Through mongos, the config info is in the "config" db

Keeping Your Balance
• The Balancer is crucial to good sharding
• Basic unit of transfer: "chunk"
• Default size of 64 MB proves to be a "sweet spot"
  • More: Migration takes too long, queries la
  • Less: Overhead of moving doesn't pay off
• The idea is to keep a balance of data & load on each server. Even is good!
• Once a threshold of "imbalance" is reached, the balancer kicks in
  • Usually about 8 chunks: you don't want to balance on a one-doc difference.

Balancer Migrations
• The balancer migrates chunks one at a time
  • Known as balancer "rounds"
• Balancing rounds continue until the difference between any two shards is only 2 chunks
• Common Question: "Why isn't collection $x being balanced?!"
  • Commonly, it just doesn't need to be. There is not enough chunk difference, and the cost of balancing would outweigh the benefit.
  • Alternately, the balancer may be running but not progressing
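To make the start and stop thresholds concrete, here is a toy simulation in plain JavaScript. It is a model of the rule as described on these slides (start at a difference of about 8 chunks, move one chunk per round, stop at a difference of 2), not the balancer's real implementation; the function name and default values are assumptions for illustration.

```javascript
// Toy model only: not MongoDB's balancer code.
// chunkCounts is an array with the number of chunks on each shard.
function simulateBalancer(chunkCounts, startThreshold, stopDiff) {
  var counts = chunkCounts.slice();
  var moves = 0;
  var spread = function () {
    return Math.max.apply(null, counts) - Math.min.apply(null, counts);
  };
  // Below the start threshold the balancer does nothing at all.
  if (spread() < startThreshold) {
    return { counts: counts, moves: moves };
  }
  // One chunk per round, from the fullest shard to the emptiest.
  while (spread() > stopDiff) {
    var from = counts.indexOf(Math.max.apply(null, counts));
    var to = counts.indexOf(Math.min.apply(null, counts));
    counts[from] -= 1;
    counts[to] += 1;
    moves += 1;
  }
  return { counts: counts, moves: moves };
}

simulateBalancer([20, 10, 10], 8, 2); // { counts: [14, 13, 13], moves: 6 }
simulateBalancer([12, 10, 10], 8, 2); // no moves: difference is below the threshold
```

The second call shows the usual answer to "why isn't my collection being balanced": the difference is simply too small to justify the migrations.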

@mongodb
German Translators Needed for MongoDB Docs!
conferences, appearances, and meetups
http://www.10gen.com/events
http://bit.ly/mongo=
Facebook | Twitter | LinkedIn
http://linkd.in/joinmongo
download at mongodb.org
mms.10gen.com (Free MongoDB Monitoring by 10gen)
We're Hiring!
[email protected] (twitter: @rit)
