Mongo London 5/11: Deployment Tips & Tricks

Deployment Tips & Tricks and bits and bobs of Internals, from a talk I gave on MongoDB at Mongo London in May, 2011.

Brendan McAdams

May 05, 2011

Transcript

  1. The operating system maps files on the filesystem to virtual memory
     • (200 gigs of MongoDB files create 200 gigs of virtual memory)
     • The OS controls what data is in RAM
     • When a piece of data isn't found in RAM, a page fault occurs (Expensive + Locking!)
     • The OS goes to disk to fetch the data
     • Indexes are part of the regular database files
     • Deployment Trick: Pre-warm your database (pre-warming your cache) to prevent cold start slowdown
     Thursday, May 5, 2011
  2. Big Things To Watch For
     • % index miss
     • faults / sec
     • flushes / sec
  3. MongoDB will take advantage of multiple cores
     • For working set queries, CPU usage is typically low
  4. Full Tablescans
     • Surprise: Queries which don't hit indexes make heavy use of CPU & Disk
     • Deployment Trick: Avoid counting & computing on the fly by caching & precomputing data
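The "avoid counting on the fly" trick can be sketched without a database: rather than scanning every record to answer a count, keep a running tally that is updated at write time. Everything below is a hypothetical in-memory stand-in (`visits`, `visitCounts`, `recordVisit` and friends are illustration names, not MongoDB APIs); against MongoDB, the write-time step would typically be an `$inc` update on a counter document.

```javascript
// Hypothetical in-memory stand-ins for a collection and a counter document.
const visits = [];              // raw events (what a full tablescan would read)
const visitCounts = new Map();  // precomputed per-page totals

// Write path: store the event and bump the precomputed counter together.
function recordVisit(page) {
  visits.push({ page });
  visitCounts.set(page, (visitCounts.get(page) || 0) + 1);
}

// Slow read path: counts by scanning every event, so cost grows with data size.
function countOnTheFly(page) {
  return visits.filter((v) => v.page === page).length;
}

// Fast read path: a single lookup, independent of how many events exist.
function countPrecomputed(page) {
  return visitCounts.get(page) || 0;
}

recordVisit("/home");
recordVisit("/home");
recordVisit("/about");
console.log(countOnTheFly("/home"), countPrecomputed("/home"));
```

The trade-off is a little extra work on every write in exchange for reads that never touch the raw events.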
  5. DB Profiling is your Friend
     • Ensure your queries are being executed correctly
     • Enable profiling
       • db.setProfilingLevel(n)
       • n=1: slow operations, n=2: all operations
     • Viewing profile information
       • db.system.profile.find({info: /test.foo/})
       • http://www.mongodb.org/display/DOCS/Database+Profiler
     • Query execution plan:
       • db.xx.find({..}).explain()
       • http://www.mongodb.org/display/DOCS/Optimization
     • Make sure your queries are properly indexed.
     • Deployment Trick: Start mongod with --notablescan to disable tablescans
  6. Indexes
     • An index on “Foo, Bar, Baz” works for “Foo”, “Foo, Bar” and “Foo, Bar, Baz”
     • The Query Optimizer figures out the order but can’t do things in reverse
     • You can pass hints to force a specific index:
       db.collection.find({username: 'foo', city: 'New York'}).hint({'username': 1})
     • Missing values are indexed as “null”
       • This includes unique indexes
     • Deployment Trick: 1.8 has Sparse and Covered Indexes!
     • system.indexes!
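The prefix rule for compound indexes can be illustrated with a small check. This is a simplified model of the rule as stated on the slide, not MongoDB's actual query optimizer (the real planner can, for example, still use an index for just its leading fields); `canUseIndex` is a hypothetical helper name.

```javascript
// Returns true when the queried fields form a leading prefix of the
// compound index. The order of fields inside the query does not matter,
// mirroring "the Query Optimizer figures out the order".
function canUseIndex(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  if (wanted.size === 0 || wanted.size > indexFields.length) return false;
  // The first wanted.size index fields must be exactly the queried fields.
  return indexFields.slice(0, wanted.size).every((f) => wanted.has(f));
}

const index = ["foo", "bar", "baz"];
console.log(canUseIndex(index, ["foo"]));         // true
console.log(canUseIndex(index, ["bar", "foo"]));  // true
console.log(canUseIndex(index, ["bar"]));         // false
console.log(canUseIndex(index, ["bar", "baz"]));  // false
```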
  7. Map Reduce
     • Currently single threaded; runs in parallel across shards
     • Deployment Trick: Use the new aggregation output options
  8. Working set is crucial!!!
     • Working set should be, as much as possible, in memory
     • Your entire dataset need not be!
  9. Disks & I/O
     • Disk I/O becomes the defining factor for performance in non-working-set queries
  10. Surprise: Faster disks are better than slow disks. More disks are also better
     • RAID is good for a variety of reasons
     • Our Recommendations ...
  11. RAID 10 (Mirrored sets inside a striped set; minimum 4 disks)
     • Improved write performance
     • Survives single disk failure
     • Downside: needs double the storage
       • e.g. four 20 gig disks give you 40 gigs of usable space
     • LVM of RAID 10 on EBS seems to smooth out performance and reliability best for MongoDB
  12. RAID 5 or 6
     • 1 or 2 additional disks required for parity
     • Can survive 1 or 2 disk failures
     • Implementations seem inconsistent, buyer beware
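The storage overhead of the RAID levels on the two slides above comes down to simple arithmetic, sketched here for identical disks. `usableCapacity` is a hypothetical helper; it uses the standard definitions (RAID 10 mirrors every block once, RAID 5 spends one disk's worth of space on parity, RAID 6 spends two) and ignores filesystem and controller overhead.

```javascript
// Usable capacity for `disks` identical disks of `diskSize` (any unit).
function usableCapacity(level, disks, diskSize) {
  switch (level) {
    case "raid10":
      if (disks < 4 || disks % 2 !== 0) {
        throw new Error("RAID 10 needs an even number of disks, minimum 4");
      }
      return (disks / 2) * diskSize; // half the disks hold mirror copies
    case "raid5":
      if (disks < 3) throw new Error("RAID 5 needs at least 3 disks");
      return (disks - 1) * diskSize; // one disk's worth of parity
    case "raid6":
      if (disks < 4) throw new Error("RAID 6 needs at least 4 disks");
      return (disks - 2) * diskSize; // two disks' worth of parity
    default:
      throw new Error("unknown RAID level: " + level);
  }
}

// The slide's example: four 20 gig disks in RAID 10.
console.log(usableCapacity("raid10", 4, 20)); // 40
```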
  13. Flash (SSD)
     • Expensive, but getting cheaper
     • Significantly reduced seek time and increased I/O throughput
     • Random Writes and Sequential Reads are still a weak point
  14. OS
     • For production: use a 64 bit OS and a 64 bit MongoDB build
       • 32 bit has a 2 gig limit, imposed by the operating system for memory-mapped files
       • Clients can be 32 bit
     • MongoDB supports (little endian only):
       • Linux, FreeBSD, OS X (on Intel, not PowerPC)
       • Windows
       • Solaris (Intel only; Joyent offers a cloud service which works for Mongo)
  15. MongoStat - free tool which comes with MongoDB
     • Shows I/O counters, time spent in locks, etc.
  16. Similarly, iostat ships on most Linux machines (or can be installed)
     • iostat [args] <seconds per poll>
       • -x for extended report
     • Disk can be a bottleneck in large datasets where working set > RAM
     • ~200-300Mb/s on XL EC2 instances, but YMMV (EBS is slower)
     • On Amazon, latency spikes are common, 400-600ms (No, this is not a good thing)
  17. Use MongoDB’s Built-in Profiler
     • Ensure your queries are being executed correctly
     • Enable profiling
       • db.setProfilingLevel(n)
       • n=1: slow operations, n=2: all operations
     • Viewing profile information
       • db.system.profile.find({info: /test.foo/})
       • http://www.mongodb.org/display/DOCS/Database+Profiler
     • Query execution plan:
       • db.xx.find({..}).explain()
       • http://www.mongodb.org/display/DOCS/Optimization
     • Deployment / Common Sense Trick: Make sure your queries are properly indexed!
  18. All data & namespace files are stored in the 'data' directory (--dbpath)
     • You can create symbolic links to keep different databases on different disks
     • Best to aggregate your I/O across multiple disks
     • File Allocation
  19. Extent Allocation
     [Diagram: data files foo.0, foo.1, foo.2; extents are allocated per namespace (foo.test, foo.bar, foo.baz, foo.$freelist); the remainder is zero-filled preallocated space; ns details are stored in foo.ns]
  20. Record Allocation
     [Diagram: a record consists of a Header (Size, Offset, Next, Prev), BSON Data and Padding; a Deleted Record keeps (Size, Offset, Next)]
  21. Logfiles
     • --logpath <file>
     • Rotation can be requested of MongoDB...
       • db.runCommand("logRotate")
       • kill -SIGUSR1 <mongod pid>
       • killall -SIGUSR1 mongod
     • Won't work for ./mongod > [file] syntax
  22. Filesystems
     • MongoDB is filesystem neutral
     • ext3, ext4 and XFS are most used
     • BUT....
     • ext4, XFS or any other filesystem with posix_fallocate() are preferred and best
  23. EC2
     • Many distros default to ext3 (but Amazon AMI now uses ext4 by default)
     • For best performance reformat to ext4 / XFS
     • Make sure you use a recent version of ext4
     • Striping (MDADM / LVM) aggregates I/O
     • See previous recommendations about RAID 10
  24. Maintenance
     • When doing a lot of updates or deletes....
       • Compaction may be needed occasionally on indices and datafiles
       • db.repairDatabase()
     • Replica Sets:
       • Rolling repairs, start nodes up with --repair param
     • Deployment Trick: For large bulk data operations consider removing indexes and re-adding them later! (better: a new DB may help)
  25. Scale out
     [Diagram: reads and writes pass through three mongos / config server pairs to three shards (shard1, shard2, shard3); each shard is a replica set of three nodes (rep_a, rep_b, rep_c)]
  26. Best driven from a slave
     • Eliminates impact on master during backup
     • Hidden Nodes in 1.8
  27. mongodump / mongorestore
     • Binary, compact object dump
     • Each consistent object is written
     • NOT necessarily consistent from start to finish (unless you lock the database)
     • mongorestore to restore binary dump
     • Deployment Trick #1: database doesn't have to be up to restore, can use dbpath
     • Deployment Trick #2: mongodump with replSetName/<hostlist> will automatically read from a slave!
  28. filelock / fsync
     • lock: blocks writes
       • db.runCommand({fsync: 1, lock: 1})
     • fsync to flush buffers to disk
     • backup
     • then, unlock
       • db.$cmd.sys.unlock.findOne();
  29. With Journaling, you can run an LVM or EBS snapshot and recover later without locking
     • EBS can disappear (See: last week)
     • S3 for longer term backups
     • USE AMAZON AVAILABILITY ZONES
     • DR / HA
  30. Shell Functions
     • Leaving off the () in the shell prints the function:
       > db.coll.find
       function (query, fields, limit, skip) {
           return new DBQuery(this._mongo, this._db, this, this._fullName,
                              this._massageObject(query), fields, limit, skip);
       }
  31. _id
     • If not specified, drivers will add a default: ObjectId("4bface1a2231316e04f3c434")
     • Made up of: timestamp, machine id, process id, counter
     • http://www.mongodb.org/display/DOCS/Object+IDs
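The four parts of the ObjectId on the slide can be pulled apart by hand. This is a minimal sketch assuming the legacy 12-byte layout (4-byte timestamp, 3-byte machine id, 2-byte process id, 3-byte counter); `parseObjectId` is a hypothetical helper, not a driver API. The machine and process ids are read big-endian here purely for display and are best treated as opaque values.

```javascript
// Splits a 24-character hex ObjectId into the four fields named on the slide.
function parseObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  const timestamp = parseInt(hex.slice(0, 8), 16); // seconds since the Unix epoch
  return {
    timestamp,
    createdAt: new Date(timestamp * 1000),
    machineId: parseInt(hex.slice(8, 14), 16),
    processId: parseInt(hex.slice(14, 18), 16),
    counter: parseInt(hex.slice(18, 24), 16),
  };
}

const parts = parseObjectId("4bface1a2231316e04f3c434");
console.log(parts.createdAt.toISOString(), parts.machineId, parts.processId, parts.counter);
```

Because the timestamp leads the value, ObjectIds generated later sort after earlier ones to within one-second resolution.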
  32. BSON Encoding
     {_id: ObjectId(XXXXXXXXXXXX), hello: "world"}
     \x27\x00\x00\x00                            (document length)
     \x07 _ i d \x00 X X X X X X X X X X X X     (ObjectId element)
     \x02 h e l l o \x00 \x06\x00\x00\x00 w o r l d \x00     (string element)
     \x00                                        (document terminator)
     http://bsonspec.org
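The byte layout on the slide can be reproduced with a few lines of Node. This is a hand-rolled sketch covering only the two element types shown (ObjectId and string), not a general BSON encoder; the helper names are hypothetical, and the ObjectId value is borrowed from the earlier `_id` slide because this slide only shows placeholder bytes.

```javascript
// C string: UTF-8 bytes followed by a single NUL byte.
function cstring(s) {
  return Buffer.concat([Buffer.from(s, "utf8"), Buffer.from([0])]);
}

// Little-endian 32-bit integer, as used for BSON lengths.
function int32le(n) {
  const b = Buffer.alloc(4);
  b.writeInt32LE(n, 0);
  return b;
}

// Element type 0x07: ObjectId, 12 raw bytes after the field name.
function objectIdElement(name, hex) {
  return Buffer.concat([Buffer.from([0x07]), cstring(name), Buffer.from(hex, "hex")]);
}

// Element type 0x02: string, prefixed with its byte length (NUL included).
function stringElement(name, value) {
  const body = cstring(value);
  return Buffer.concat([Buffer.from([0x02]), cstring(name), int32le(body.length), body]);
}

// Document: int32 total length, the elements, then a NUL terminator.
function bsonDocument(elements) {
  const body = Buffer.concat(elements);
  return Buffer.concat([int32le(body.length + 5), body, Buffer.from([0])]);
}

const doc = bsonDocument([
  objectIdElement("_id", "4bface1a2231316e04f3c434"),
  stringElement("hello", "world"),
]);
console.log(doc.length, doc.toString("hex"));
```

The total comes to 39 bytes, which is the `\x27` at the start of the slide's byte string.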
  33. Insert Message (TCP / IP)
     message length      \x68\x00\x00\x00
     message id          \xXX\xXX\xXX\xXX
     response id         \x00\x00\x00\x00
     op code (insert)    \xd2\x07\x00\x00
     reserved            \x00\x00\x00\x00
     collection name     f o o . t e s t \x00
     document(s)         BSON Data
     http://www.mongodb.org/display/DOCS/Mongo+Wire+Protocol
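The insert message layout above can be assembled byte by byte. This sketch follows the legacy insert operation as shown on the slide (op code 2002, i.e. `\xd2\x07\x00\x00`); later server versions moved to a different message format, so treat it as an illustration of the layout rather than something to send to a current server. `buildInsertMessage` is a hypothetical helper, and the empty BSON document is used only to keep the example self-contained.

```javascript
const OP_INSERT = 2002; // 0x07d2, little-endian on the wire

function buildInsertMessage(requestId, fullCollectionName, bsonDocs) {
  const name = Buffer.concat([Buffer.from(fullCollectionName, "utf8"), Buffer.from([0])]);
  // Body: 4 reserved zero bytes, NUL-terminated collection name, then documents.
  const body = Buffer.concat([Buffer.alloc(4), name, ...bsonDocs]);

  const header = Buffer.alloc(16);
  header.writeInt32LE(16 + body.length, 0); // message length, header included
  header.writeInt32LE(requestId, 4);        // message id
  header.writeInt32LE(0, 8);                // response id (0 for a request)
  header.writeInt32LE(OP_INSERT, 12);       // op code
  return Buffer.concat([header, body]);
}

// The smallest valid BSON document: length 5 plus terminator, no elements.
const emptyDoc = Buffer.from([0x05, 0x00, 0x00, 0x00, 0x00]);
const msg = buildInsertMessage(1, "foo.test", [emptyDoc]);
console.log(msg.length, msg.toString("hex"));
```

On the slide the message length is `\x68` (104 bytes), which leaves 75 bytes for the BSON payload after the 16-byte header, 4 reserved bytes and the 9-byte collection name.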
  34. We’re Hiring!
     • Twitter: @mongodb
     • Conferences, appearances, and meetups: http://www.10gen.com/events
     • Facebook | Twitter | LinkedIn: http://bit.ly/mongoB, http://linkd.in/joinmongo
     • Download at mongodb.org
     • Contact: twitter @rit
     • AFTER: Drinks at “The Slaughtered Lamb”