A Brief Tour of MongoDB: Intros, Ops & Internals (Mongo Munich, June 2011)

Mongo Munich Meetup
A Brief Tour of MongoDB: Intros, Ops & Internals
Brendan McAdams - 10gen, Inc.
[email protected] @rit
Monday, July 4, 2011

Introductions
• Brendan McAdams <[email protected]>
• Started using MongoDB (in production) ~Feb. 2009
• Engineer at 10gen - “The Company”
• Scala support (maintain and develop drivers, improve and assist third party frameworks, community steering)
• Java support (contribute to maintenance of drivers, features. Focus is on improving integration for non-Java JVM languages w/ our Java toolchain)
• Hadoop support (develop & maintain MongoDB’s Hadoop integration layers, assist deployments)
• Support (Free community & commercial)
• Community Outreach (Meetups, Conferences)
• Training & Consulting

Let’s Face It ... SQL Sucks. For some problems at least.

I know! Let’s use an ORM! Congratulations: Now we’ve got 2 problems!

Stuffing an object graph into a relational model is like fitting a square peg into a round hole.

Sure, we can use an ORM. But who are we really fooling? ... and who/what are we going to wake up next to in the morning?

NoSQL Really Means... non-relational, next-generation operational datastores and databases ... focus on the “non-relational” bit.

Horizontally Scalable Architectures
no joins + no complex transactions

New Data Models
no joins + no complex transactions

New Data Models
Better Ways to Build Applications?

Data Models
• Key / Value: memcached, Dynamo
• Tabular: BigTable
• Document Oriented: MongoDB, CouchDB

[Figure: scalability & performance plotted against depth of functionality, with points for memcached, key/value and RDBMS]

[Figure: MongoDB, Key/Value Store, Relational Database]

Flexible “Schemas”

    { "author": "brendan", "text": "..." }

    { "author": "brendan", "text": "...", "tags": ["mongodb", "nosql"] }

Here is a “simple” SQL Model

    mysql> select * from book;
    +----+----------------------------------------------------------+
    | id | title                                                    |
    +----+----------------------------------------------------------+
    |  1 | The Demon-Haunted World: Science as a Candle in the Dark |
    |  2 | Cosmos                                                   |
    |  3 | Programming in Scala                                     |
    +----+----------------------------------------------------------+
    3 rows in set (0.00 sec)

    mysql> select * from bookauthor;
    +---------+-----------+
    | book_id | author_id |
    +---------+-----------+
    |       1 |         1 |
    |       2 |         1 |
    |       3 |         2 |
    |       3 |         3 |
    |       3 |         4 |
    +---------+-----------+
    5 rows in set (0.00 sec)

    mysql> select * from author;
    +----+-----------+------------+-------------+-------------+---------------+
    | id | last_name | first_name | middle_name | nationality | year_of_birth |
    +----+-----------+------------+-------------+-------------+---------------+
    |  1 | Sagan     | Carl       | Edward      | NULL        |          1934 |
    |  2 | Odersky   | Martin     | NULL        | DE          |          1958 |
    |  3 | Spoon     | Lex        | NULL        | NULL        |          NULL |
    |  4 | Venners   | Bill       | NULL        | NULL        |          NULL |
    +----+-----------+------------+-------------+-------------+---------------+
    4 rows in set (0.00 sec)

Joins are great and all ...
• Potentially organizationally messy
• Structure of a single object is NOT immediately clear to someone glancing at the shell data
• We have to flatten our object out into three tables
• 7 separate inserts just to add “Programming in Scala”
• Once we turn the relational data back into objects ...
• We still need to convert it to data for our frontend
• I don’t know about you, but I have better things to do with my time.

The Same Data in MongoDB

    > db.books.find().forEach(printjson)
    {
        "_id" : ObjectId("4dfa6baa9c65dae09a4bbda3"),
        "title" : "The Demon-Haunted World: Science as a Candle in the Dark",
        "author" : [
            {
                "first_name" : "Carl",
                "last_name" : "Sagan",
                "middle_name" : "Edward",
                "year_of_birth" : 1934
            }
        ]
    }
    {
        "_id" : ObjectId("4dfa6baa9c65dae09a4bbda4"),
        "title" : "Cosmos",
        "author" : [
            {
                "first_name" : "Carl",
                "last_name" : "Sagan",
                "middle_name" : "Edward",
                "year_of_birth" : 1934
            }
        ]
    }

The Same Data in MongoDB (Part 2)

    {
        "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
        "title" : "Programming in Scala",
        "author" : [
            {
                "first_name" : "Martin",
                "last_name" : "Odersky",
                "nationality" : "DE",
                "year_of_birth" : 1958
            },
            {
                "first_name" : "Lex",
                "last_name" : "Spoon"
            },
            {
                "first_name" : "Bill",
                "last_name" : "Venners"
            }
        ]
    }

Access to the embedded objects is integral

    > db.books.find({"author.first_name": "Martin", "author.last_name": "Odersky"})
    {
        "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
        "title" : "Programming in Scala",
        "author" : [
            {
                "first_name" : "Martin",
                "last_name" : "Odersky",
                "nationality" : "DE",
                "year_of_birth" : 1958
            },
            {
                "first_name" : "Lex",
                "last_name" : "Spoon"
            },
            {
                "first_name" : "Bill",
                "last_name" : "Venners"
            }
        ]
    }

As is manipulation of the embedded data

    > db.books.update({"author.first_name": "Bill", "author.last_name": "Venners"},
    ...                 {$set: {"author.$.company": "Artima, Inc."}})
    > db.books.update({"author.first_name": "Martin", "author.last_name": "Odersky"},
    ...                 {$set: {"author.$.company": "Typesafe, Inc."}})
    > db.books.findOne({"title": /Scala$/})
    {
        "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
        "author" : [
            {
                "company" : "Typesafe, Inc.",
                "first_name" : "Martin",
                "last_name" : "Odersky",
                "nationality" : "DE",
                "year_of_birth" : 1958
            },
            {
                "first_name" : "Lex",
                "last_name" : "Spoon"
            },
            {
                "company" : "Artima, Inc.",
                "first_name" : "Bill",
                "last_name" : "Venners"
            }
        ],
        "title" : "Programming in Scala"
    }

Hardware & MongoDB

Memory
• MongoDB revolves around memory mapped files

Operating System maps files on the Filesystem to Virtual Memory
• (200 gigs of MongoDB files creates 200 gigs of virtual memory)
• OS controls what data is in RAM
• When a piece of data isn't found in RAM, a page fault occurs (Expensive + Locking!)
• OS goes to disk to fetch the data
• Indexes are part of the regular database files
• Deployment Trick: Pre-warm your database (pre-warming your cache) to prevent cold start slowdown

Big Things To Watch For
• % index miss
• faults / sec
• flushes / sec

MongoDB will take advantage of multiple cores
• For working set queries, CPU usage is typically low

Full Tablescans
• Surprise: Queries which don't hit indexes make heavy use of CPU & Disk
• Deployment Trick: Avoid counting & computing on the fly by caching & precomputing data

DB Profiling is your Friend
• Ensure your queries are being executed correctly
• Enable profiling
  • db.setProfilingLevel(n)
  • n=1: slow operations, n=2: all operations
• Viewing profile information
  • db.system.profile.find({info: /test.foo/})
  • http://www.mongodb.org/display/DOCS/Database+Profiler
• Query execution plan:
  • db.xx.find({..}).explain()
  • http://www.mongodb.org/display/DOCS/Optimization
• Make sure your Queries are properly indexed.
• Deployment Trick: Start mongod with --notablescan to disable tablescans

Indexes
• An index on “Foo, Bar, Baz” works for “Foo”, “Foo, Bar” and “Foo, Bar, Baz”
• The Query Optimizer figures out the order but can’t do things in reverse
• You can pass hints to force a specific index: db.collection.find({username: 'foo', city: 'New York'}).hint({'username': 1})
• Missing values are indexed as “null”
  • This includes unique indexes
• Deployment Trick: 1.8 has Sparse and Covered Indexes!
• system.indexes !
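The prefix rule for compound indexes can be sketched in plain JavaScript. This only illustrates the rule as stated on the slide; it is not how the query optimizer is implemented, and `usableIndexPrefix` plus the array-of-field-names representation are hypothetical.

```javascript
// Hypothetical helper: given the ordered fields of a compound index and the
// fields a query filters on, return the leading part of the index the query
// can use. Illustrates the slide's rule only; not the server's optimizer.
function usableIndexPrefix(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  const prefix = [];
  for (const field of indexFields) {
    if (!wanted.has(field)) break; // a gap ends the usable prefix
    prefix.push(field);
  }
  return prefix;
}

const index = ["foo", "bar", "baz"];
console.log(usableIndexPrefix(index, ["foo"]));        // [ 'foo' ]
console.log(usableIndexPrefix(index, ["bar", "foo"])); // [ 'foo', 'bar' ]
console.log(usableIndexPrefix(index, ["bar", "baz"])); // [] because "foo" is missing
```

The order of fields in the query does not matter here, which matches the "optimizer figures out the order" bullet; a query that skips the leading field gets no usable prefix.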

Map Reduce
• Currently Single Threaded; runs in parallel across shards
• Deployment Trick: Use the new aggregation output options

Working set is crucial!!!
• Working set should be, as much as possible, in memory
• Your entire dataset need not be!

Disks & I/O
• Disk I/O becomes the defining factor for performance in non-working-set queries

Surprise: Faster disks are better than slow disks. More is also better
• RAID is good for a variety of reasons
• Our Recommendations ...

RAID 10 (Mirrored sets inside a striped set; minimum 4 disks)
• Improved write performance
• Survives single disk failure
• Downside: Needs double the storage
• e.g. four 20 gig disks give you 40 gigs of usable space
• LVM of RAID 10 on EBS seems to smooth out performance and reliability best for MongoDB
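The storage arithmetic for RAID 10 can be checked with a few lines of JavaScript. This is a sketch assuming two-way mirrors; `raid10UsableGigs` is an illustrative name, not part of any tool.

```javascript
// RAID 10 mirrors disks in pairs and stripes across the pairs, so (assuming
// two-way mirrors) only half of the raw capacity is usable.
function raid10UsableGigs(diskCount, gigsPerDisk) {
  if (diskCount < 4 || diskCount % 2 !== 0) {
    throw new Error("RAID 10 needs an even number of disks, minimum 4");
  }
  return (diskCount / 2) * gigsPerDisk;
}

console.log(raid10UsableGigs(4, 20)); // 40, matching the slide's example
```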

RAID 10 is NOT RAID 0+1
• Striping on top of Mirrors vs. Mirrors on top of Striping
• (The order on diagrams can be confusing)
• [Diagram: This is RAID 0+1, not RAID 10]
• [Diagram: This is RAID 10, not RAID 0+1]

RAID 5 or 6
• 1 or 2 additional disks required for parity
• Can survive 1 or 2 disk failures
• Implementations seem inconsistent, buyer beware

Flash (SSD)
• Expensive, but getting cheaper
• Significantly reduced seek time and increased I/O Throughput
• Random Writes and Sequential Reads are still a weak point

OS
• For production: Use a 64 bit OS and a 64 bit MongoDB Build
• 32 Bit has a 2 gig limit; imposed by the operating systems for memory mapped files
• Clients can be 32 bit
• MongoDB Supports (little endian only)
  • Linux, FreeBSD, OS X (on Intel, not PowerPC)
  • Windows
  • Solaris (Intel only, Joyent offers a cloud service which works for Mongo)

Put your journal on a separate spindle if possible

Server Status
Some tools for examining server status

MongoStat - free tool which comes with MongoDB
• Shows I/O counters, time spent in locks, etc

Similarly, iostat ships on most Linux machines (or can be installed) [sysstat package on Ubuntu]
• iostat [args] <seconds per poll>
• -x for extended report
• Disk can be a bottleneck in large datasets where working set > RAM
• ~200-300Mb/s on XL EC2 instances, but YMMV (EBS is slower)
• On Amazon, latency spikes are common, 400-600ms (No, this is not a good thing)

Use MongoDB’s Built-in Profiler
• Ensure your queries are being executed correctly
• Enable profiling
  • db.setProfilingLevel(n)
  • n=1: slow operations, n=2: all operations
• Viewing profile information
  • db.system.profile.find({info: /test.foo/})
  • http://www.mongodb.org/display/DOCS/Database+Profiler
• Query execution plan:
  • db.xx.find({..}).explain()
  • http://www.mongodb.org/display/DOCS/Optimization
• Deployment / Common Sense Trick: Make sure your Queries are properly indexed!

Filesystems

All data & namespace files are stored in the 'data' directory (--dbpath)
• You can create symbolic links to keep different databases on different disks
• Best to aggregate your IO across multiple disks
• File Allocation

_id
• if not specified drivers will add default: ObjectId("4bface1a2231316e04f3c434")
• fields, in order: timestamp, machine id, process id, counter
• http://www.mongodb.org/display/DOCS/Object+IDs
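The four fields named on the slide can be pulled out of the example ObjectId with a short sketch. It assumes the 12-byte layout of 4-byte timestamp, 3-byte machine id, 2-byte process id and 3-byte counter; `parseObjectId` is a hypothetical helper, not a driver API.

```javascript
// Split a 24-character ObjectId hex string into the four fields named on the
// slide, assuming the layout: 4-byte timestamp, 3-byte machine id,
// 2-byte process id, 3-byte counter. Machine id and process id are kept as
// raw hex because their internal byte order is not interpreted here.
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // big-endian seconds since epoch
  return {
    timestamp: new Date(seconds * 1000),
    machineId: hex.slice(8, 14),
    processId: hex.slice(14, 18),
    counter: parseInt(hex.slice(18, 24), 16),
  };
}

const parts = parseObjectId("4bface1a2231316e04f3c434");
console.log(parts.timestamp.toISOString());
console.log(parts.machineId, parts.processId, parts.counter);
```

Because the timestamp leads, ObjectIds sort roughly by creation time.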

BSON Encoding

    { _id: ObjectId(XXXXXXXXXXXX), hello: "world" }

    \x27\x00\x00\x00
    \x07 _id\x00 X X X X X X X X X X X X
    \x02 h e l l o \x00 \x06\x00\x00\x00 w o r l d \x00
    \x00

http://bsonspec.org
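The byte layout on this slide can be reproduced with a minimal encoder sketch. It assumes Node.js (for `Buffer`) and handles only the two element types shown, string (0x02) and ObjectId (0x07); the `{ oid: "..." }` wrapper and all function names are inventions of this sketch, not a driver API.

```javascript
// Minimal BSON encoder sketch for the two element types on the slide.
// ObjectIds are represented as { oid: "<24 hex chars>" } (an assumption).
function cstring(s) {
  return Buffer.concat([Buffer.from(s, "utf8"), Buffer.from([0])]);
}

function int32le(n) {
  const b = Buffer.alloc(4);
  b.writeInt32LE(n, 0);
  return b;
}

function encodeElement(name, value) {
  if (typeof value === "string") {
    const body = cstring(value); // string bytes plus trailing NUL
    return Buffer.concat([Buffer.from([0x02]), cstring(name), int32le(body.length), body]);
  }
  if (value && typeof value.oid === "string" && /^[0-9a-f]{24}$/i.test(value.oid)) {
    return Buffer.concat([Buffer.from([0x07]), cstring(name), Buffer.from(value.oid, "hex")]);
  }
  throw new Error("unsupported type for field " + name);
}

function encodeDocument(doc) {
  const elements = Buffer.concat(Object.keys(doc).map((k) => encodeElement(k, doc[k])));
  // total length = 4-byte length prefix + elements + 1-byte terminator
  return Buffer.concat([int32le(elements.length + 5), elements, Buffer.from([0])]);
}

const bytes = encodeDocument({ _id: { oid: "4bface1a2231316e04f3c434" }, hello: "world" });
console.log(bytes.length); // 39, i.e. the \x27 length prefix on the slide
console.log(bytes.toString("hex"));
```

The leading int32 counts the whole document including itself and the trailing \x00, which is why a 12-byte ObjectId plus a five-character string comes to \x27 (39) bytes.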

Extent Allocation
[Figure: data files foo.0, foo.1, foo.2 with zero-filled preallocated space; extents allocated per namespace (foo.$freelist, foo.baz, foo.bar, foo.test); ns details stored in foo.ns]

Record Allocation
[Figure: record layout: Header (Size, Offset, Next, Prev), BSON Data, Padding; Deleted Record (Size, Offset, Next)]

Insert Message (TCP / IP)

    message length    \x68\x00\x00\x00
    message id        \xXX\xXX\xXX\xXX
    response id       \x00\x00\x00\x00
    op code (insert)  \xd2\x07\x00\x00
    reserved          \x00\x00\x00\x00
    collection name   f o o . t e s t \x00
    document(s)       BSON Data

http://www.mongodb.org/display/DOCS/Mongo+Wire+Protocol
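The framing of the insert message can be sketched as follows, assuming Node.js (for `Buffer`). `buildInsertMessage` is a hypothetical helper that only assembles the bytes; it does not open a socket, and the BSON document is passed in already encoded.

```javascript
// Frame an OP_INSERT message as laid out on the slide. All integers are
// little-endian int32.
const OP_INSERT = 2002; // 0x07d2, i.e. \xd2\x07\x00\x00 on the wire

function buildInsertMessage(requestId, fullCollectionName, bsonDocs) {
  const name = Buffer.concat([Buffer.from(fullCollectionName, "utf8"), Buffer.from([0])]);
  const body = Buffer.concat([name, ...bsonDocs]);
  const header = Buffer.alloc(20);
  header.writeInt32LE(20 + body.length, 0); // message length, header included
  header.writeInt32LE(requestId, 4);        // message id
  header.writeInt32LE(0, 8);                // response id (unused in a request)
  header.writeInt32LE(OP_INSERT, 12);       // op code
  header.writeInt32LE(0, 16);               // reserved
  return Buffer.concat([header, body]);
}

// BSON for { hello: "world" } (22 bytes), the example given on bsonspec.org.
const helloWorld = Buffer.from("160000000268656c6c6f0006000000776f726c640000", "hex");
const message = buildInsertMessage(1, "foo.test", [helloWorld]);
console.log(message.length);          // 20 + 9 + 22 = 51
console.log(message.readInt32LE(12)); // 2002
```

Under this framing, the \x68 (104) length on the slide would correspond to a 75-byte BSON document: 104 minus 20 bytes of fixed fields minus 9 bytes for "foo.test" and its terminator.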

Logfiles
• --logpath <file>
• Rotation can be requested of MongoDB...
  • db.runCommand("logRotate")
  • kill -SIGUSR1 <mongod pid>
  • killall -SIGUSR1 mongod
• Won't work for ./mongod > [file] syntax

Filesystems
• MongoDB is filesystem neutral
• ext3, ext4 and XFS are most used
• BUT....
• ext4, XFS or any other filesystem with posix_fallocate() are preferred and best

EC2
• Many distros default to ext3 (but Amazon AMI now uses ext4 by default)
• For best performance reformat to EXT4 / XFS
• Make sure you use a recent version of EXT4
• Striping (MDADM / LVM) aggregates I/O
• See previous recommendations about RAID 10

Maintenance
• When doing a lot of updates or deletes....
• Compaction may be needed occasionally on indices and datafiles
  • db.repairDatabase()
• Replica Sets:
  • Rolling repairs, start nodes up with --repair param
• Deployment Trick: For large bulk data operations consider removing indexes and re-adding them later! (better: a new DB may help)

Dump Collection Stats
• db.<collectionName>.stats()

    > db.getCollectionNames().forEach(function(x) {
    ...     print("Collection: " + x);
    ...     printjson(db[x].stats());
    ... })

Scale out
[Figure: three shards (shard1, shard2, shard3), each with three replicas (rep_a, rep_b, rep_c), behind three mongos / config server pairs; axes labeled write and read]

Backups

Best driven from a slave
• Eliminates impact on master during backup
• Hidden Nodes in 1.8

mongodump / mongorestore
• binary, compact object dump
• each consistent object is written
• NOT necessarily consistent from start to finish (Unless you lock the database)
• mongorestore to restore binary dump
• Deployment Trick #1: database doesn't have to be up to restore, can use dbpath
• Deployment Trick #2: mongodump with replSetName/<hostlist> will automatically read from a slave!

filelock / fsync
• lock: blocks writes
• db.runCommand({fsync: 1, lock: 1})
• fsync to flush buffers to disk
• backup
• then, unlock
• db.$cmd.sys.unlock.findOne();

With Journaling, you can run an LVM or EBS snapshot and recover later without locking
• EBS Can disappear (See: last week)
• S3 for longer term backups
• USE AMAZON AVAILABILITY ZONES
• DR / HA

MongoDB is a Single-Master System
• A database is served by members of a “replica set”
• The system elects a primary (master)
• Failure of the master is detected, and a new master is elected
• Application writes get an error if there is no quorum to elect a new master
• Reads continue to be fulfilled

A Word on Scalability

MongoDB Replica Set

MongoDB Supports Sharding
• A collection can be sharded
• Each shard is served by its own replica set
• New shards (each a replica set) can be added at any time
• Shard key ranges are automatically balanced

MongoDB – Sharded Deployment

MongoDB Storage Management
• Data is kept in memory-mapped files
• Servers should have a lot of memory
• Files are allocated as needed
• Documents in a collection are kept on a list using a geographical addressing scheme
• Indexes (B*-trees) point to documents using geographical addresses

MongoDB Server Management
• Replica set members are aware of each other
• A majority of votes is required to elect a new primary
• Members can be assigned priorities to affect the election
• e.g., an “invisible” replica can be created with zero priority for backup purposes
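The majority rule for elections comes down to one comparison. The sketch below is only the arithmetic, not the election protocol itself; `canElectPrimary` and the vote counts are illustrative.

```javascript
// A primary can be elected only when a strict majority of all votes in the
// set is reachable.
function canElectPrimary(reachableVotes, totalVotes) {
  return reachableVotes > totalVotes / 2;
}

console.log(canElectPrimary(2, 3)); // true: 2 of 3 is a majority
console.log(canElectPrimary(1, 2)); // false: an even split is not a majority
console.log(canElectPrimary(2, 4)); // false
```

This is why an even split leaves the set without a primary, and writes then fail until a majority is reachable again.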

• 2,000+ Production Deployments and growing.
• NYTimes, MTV, Shutterfly, Foursquare, Craigslist, Disney, and more in Production.
• Real, full indexes including sparse, covered & geospatial.

MongoDB Users
• http://www.10gen.com/customers
• http://www.10gen.com/presentations
• craigslist: http://www.10gen.com/presentation/mongosf2011/craigslist
• bit.ly: http://blip.tv/mongodb/bit-ly-user-history-auto-sharded-3723147
• shutterfly: http://www.10gen.com/presentation/mongosv2010/shutterfly

@mongodb
conferences, appearances, and meetups
http://www.10gen.com/events
http://bit.ly/mongoB
Facebook | Twitter | LinkedIn
http://linkd.in/joinmongo
download at mongodb.org
We’re Hiring !
[email protected] (twitter: @rit)
