Challenges with MongoDB - GAO, Stone. Tech Lead of UMeng.com

Challenges with MongoDB. MongoDB Beijing 2012. Stone Gao. Monday, April 2, 2012

About Me: Tech Lead at Umeng.com

MongoDB is Awesome
• Document-oriented storage
• Full Index Support
• Replication & High Availability
• Auto-Sharding
• Querying
• Fast In-Place Updates
• Map/Reduce
• GridFS

But... This talk is not Yet Another Talk about its awesomeness, but about challenges with MongoDB.

Outline
1. Global Write Lock Sucks
2. Auto-Sharding is not that Reliable
3. Schema-less is Over Rated
4. Community Contribution is Quite Low
5. Attitude Matters

1. Global Write Lock Sucks
Image: http://www.clker.com/cliparts/3/3/5/D/X/b/locked-exclamation-mark-padlock-hi.png

1. Global Write Lock Sucks
Diagram: mongod (db-1 ... db-n, each with collection1 and collection2 holding doc1 and doc2) vs. mysqld (db-1 ... db-n, each with table1 and table2 holding doc1 and doc2).
DB process lock (mongod) vs. row lock (mysqld): a single global write lock for the entire server (process).
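To make the cost of lock granularity concrete, here is a minimal sketch of an idealized model (not a measurement, and not MongoDB code). It assumes writes that share a lock run one after another, while writes under different locks run fully in parallel. The workload and the names `makespan`, `db1`, `events`, and so on are made up for illustration.

```python
from collections import defaultdict


def makespan(writes, lock_key):
    """Wall-clock time to finish all writes when writes sharing a lock are
    serialized and writes under different locks run fully in parallel."""
    busy = defaultdict(float)
    for db, collection, millis in writes:
        busy[lock_key(db, collection)] += millis
    return max(busy.values(), default=0.0)


# Hypothetical workload: (database, collection, write duration in ms).
writes = [
    ("db1", "events", 4.0),
    ("db1", "users", 2.0),
    ("db2", "events", 3.0),
    ("db2", "users", 1.0),
]

# One lock for the whole server process: every write waits for every other.
global_lock = makespan(writes, lambda db, coll: "server")
# One lock per database: only writes to the same database wait for each other.
db_lock = makespan(writes, lambda db, coll: db)
# One lock per collection: only writes to the same collection wait.
collection_lock = makespan(writes, lambda db, coll: (db, coll))

print(global_lock, db_lock, collection_lock)
```

The model ignores reads, lock yielding, and the number of cores, so it only shows the direction of the effect: the coarser the lock, the more unrelated writes queue behind each other.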

1. Global Write Lock Sucks
Intel SSD 320 RAID10 (39.5K read IOPS / 23K write IOPS) & mongostat: nearly all data is in RAM, yet the lock ratio is pretty high and there is a bunch of queued writes (qw).

Possible Solutions/Workarounds #1: Wait for lock-related issues on JIRA
• SERVER-1240 : Collection level locking. Planning Bucket A, Vote (154). https://jira.mongodb.org/browse/SERVER-1240
• SERVER-1241 : Intra collection locking (maybe extent). Planning Bucket A, Vote (25). https://jira.mongodb.org/browse/SERVER-1241
• SERVER-2563 : When hitting disk, yield lock - phase 1. Fixed in 1.9.1, Vote (25). https://jira.mongodb.org/browse/SERVER-2563
  • Yield any time we actually have to hit disk: if a memory-mapped page is not in RAM, then we should yield. Cases listed: update by _id, remove, long cursor iteration.
• SERVER-1169 : Record level locking. Rejected, Vote (1). https://jira.mongodb.org/browse/SERVER-1169
and more ...

Possible Solutions/Workarounds #2: One Collection per DB to Reduce Lock Ratio
But you can go no further than that. Auto-Sharding to the rescue?

2. Auto-Sharding is not that Reliable
Image: http://www.autoinsurancecompanies.com/wp-content/uploads/2011/11/reliable.jpg

Problems with Auto-Sharding
• MongoDB can't figure out how many docs are in a collection after sharding
• Balancer deadlock
  [Balancer] skipping balancing round during ongoing split or move activity.)
  [Balancer] dist_lock lock failed because taken by....
  [Balancer] Assertion failure cm s/balance.cpp...
• Uneven shard load distribution
• ...
(Note: I did the experiment before 2.0, so some of these issues might be fixed or improved in newer versions of MongoDB, because it's evolving very fast.)

Possible Solutions/Workarounds #1: Manual Chunk Pre-Splitting
0) Turn off the balancer (balancing won't understand your locations, but it shouldn't matter b/c you're using hashed shard keys)
1) Shard the empty collection over the shard key { location : 1, hash : 1 }
2) run db.runCommand({ split : "<coll>", middle : { "location":"DEN", "hash": "8000...0" }})
3) run db.runCommand({ split : "<coll>", middle : { "location":"SC", "hash": "0000...0" }})
4) move those empty chunks to whatever shards you want
- Greg Studer
References:
http://www.mongodb.org/display/DOCS/Splitting+Shard+Chunks
https://groups.google.com/d/msg/mongodb-user/tYBFKSMM3cU/TiYtoOiNMgEJ
http://blog.zawodny.com/2011/03/06/mongodb-pre-splitting-for-faster-data-loading-and-importing/
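The split commands in the pre-splitting steps above can be generated rather than typed by hand. The following is a minimal sketch that only builds the command documents. It assumes the `hash` field is a fixed-width hex string (32 digits, as an MD5 digest would be; the slide elides the real width), and the helper names `split_points` and `split_commands` and the namespace `mydb.events` are hypothetical. Sending the commands to a mongos is not shown.

```python
def split_points(chunks, hex_digits=32):
    """Return the chunks-1 evenly spaced interior boundaries of a hex hash space."""
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    space = 16 ** hex_digits
    return [format(space * i // chunks, "0%dx" % hex_digits) for i in range(1, chunks)]


def split_commands(namespace, locations, chunks_per_location, hex_digits=32):
    """Build one `split` command document per chunk boundary.

    Each location gets a boundary at the all-zero hash (the start of its
    range) plus the evenly spaced interior boundaries.
    """
    commands = []
    for location in sorted(locations):
        boundaries = ["0" * hex_digits] + split_points(chunks_per_location, hex_digits)
        for boundary in boundaries:
            # "split" is inserted first because a command's name must be its first key.
            commands.append({
                "split": namespace,
                "middle": {"location": location, "hash": boundary},
            })
    return commands


# Hypothetical namespace; a 4-digit hash is used only to keep the output short.
commands = split_commands("mydb.events", ["SC", "DEN"], 2, hex_digits=4)
for command in commands:
    print(command)
```

With two chunks per location, this produces the same kind of boundaries as steps 2 and 3 (a midpoint split for one location and a zero-hash split for the next), plus the remaining ones for symmetry.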

Possible Solutions/Workarounds #2: SERVER-2001 : Option to hash shard key
https://jira.mongodb.org/browse/SERVER-2001
Unresolved, Fix Version/s: 2.1.1, Vote (27)
Image: https://github.com/twitter/gizzard/raw/master/doc/forwarding_table.png
“The lack of hashing based read/write distribution amongst available shards is a huge issue for us now. We're actually considering implementing an app-side layer to do this but that obviously has a number of serious drawbacks.” - Remon van Vliet
“Seems like a good idea : we implemented hashed shard key on client-side : operation rate sky rocked ( x3 and less variability). Balancing is moreover quicker and done during our very heavy insertion process : perfect !” - Grégoire Seux
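Until the server can hash shard keys itself, the client-side approach mentioned in the second quote can be approximated by storing a digest next to the natural key and sharding on that digest. This is a minimal sketch under assumptions: MD5 is chosen only because it spreads keys evenly (the quote does not say which hash was used), and the field names `device_id` and `hash` are hypothetical.

```python
import hashlib


def with_hashed_key(doc, source_field, hash_field="hash"):
    """Return a copy of doc with the MD5 hex digest of doc[source_field] added."""
    digest = hashlib.md5(str(doc[source_field]).encode("utf-8")).hexdigest()
    out = dict(doc)
    out[hash_field] = digest
    return out


# Hypothetical document: the digest, not device_id, would be used in the shard key.
doc = with_hashed_key({"device_id": "abc-123", "event": "launch"}, "device_id")
print(doc["hash"])
```

The trade-off is that range queries on the natural key no longer map to a contiguous set of chunks, which is one of the drawbacks of doing this in the application.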

Possible Solutions/Workarounds #3: Plain-old Application Level Sharding
Image: https://github.com/twitter/gizzard/raw/master/doc/forwarding_table.png
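Application-level sharding boils down to a lookup from key to shard that the application owns. Below is a minimal sketch loosely modelled on the forwarding-table idea in the linked image: a sorted list of lower bounds over the hash space, each owned by one shard. The class `ForwardingTable`, the shard names, and the use of MD5 are assumptions for illustration, not gizzard's or MongoDB's implementation.

```python
import bisect
import hashlib


class ForwardingTable:
    """Maps a key to a shard via sorted, inclusive lower bounds on its hash."""

    def __init__(self, ranges):
        # ranges: iterable of (lower_bound, shard_name); the lowest bound must be 0.
        ordered = sorted(ranges)
        if not ordered or ordered[0][0] != 0:
            raise ValueError("the first range must start at 0")
        self._bounds = [bound for bound, _ in ordered]
        self._shards = [shard for _, shard in ordered]

    def shard_for(self, key):
        # Interpret the 128-bit MD5 digest as an integer position in the hash space.
        position = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
        # The owning range is the last one whose lower bound is <= position.
        return self._shards[bisect.bisect_right(self._bounds, position) - 1]


# Hypothetical layout: the lower half of the hash space on shard-a, the upper half on shard-b.
table = ForwardingTable([(0, "shard-a"), (2 ** 127, "shard-b")])
print(table.shard_for("a"), table.shard_for("abc"))
```

Rebalancing then means editing the table and moving the affected data yourself, which is exactly the work auto-sharding was supposed to take off your hands.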

3. Schema-less is Over Rated
Image: http://images.sodahead.com/polls/001635729/1863780_overrated_answer_2_xlarge.jpeg

Schema-less is Over Rated
Schema-free (schema-less) is not free. It means repeating the schema in every document (record)!

Possible Solutions/Workarounds #1: Use Short Key Names
ref : http://christophermaier.name/blog/2011/05/22/MongoDB-key-names
Long key names:
{"sequence":"AHAHSPGPGSAVKLPAPHSVGKSALR", "location":{ "chromosome":"19", "strand":"-", "begin":"51067007", "end":"51067085" }}
Short key names:
{"s":"AHAHSPGPGSAVKLPAPHSVGKSALR", "l":{ "c":"19", "s":"-", "b":"51067007", "e":"51067085" }}
For 1.6 billion documents: 243 GB with long key names vs. 183 GB with short key names. 60 GB saved!
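Short key names hurt readability, so one option is to keep long names in application code and translate at the storage boundary. Here is a minimal sketch of such a translation layer for the document shown above. The nested `SCHEMA` table and the functions `shorten` and `expand` are hypothetical helpers, not part of any MongoDB driver. The table is nested because the alias "s" means `sequence` at the top level and `strand` inside `location`.

```python
# Long name -> (short name, alias table for the sub-document, or None).
SCHEMA = {
    "sequence": ("s", None),
    "location": ("l", {
        "chromosome": ("c", None),
        "strand": ("s", None),
        "begin": ("b", None),
        "end": ("e", None),
    }),
}


def shorten(doc, schema):
    """Rename long keys to their short aliases before writing a document."""
    out = {}
    for key, value in doc.items():
        short, sub = schema.get(key, (key, None))  # unknown keys pass through
        nested = sub is not None and isinstance(value, dict)
        out[short] = shorten(value, sub) if nested else value
    return out


def expand(doc, schema):
    """Rename short aliases back to long keys after reading a document."""
    reverse = {short: (long_name, sub) for long_name, (short, sub) in schema.items()}
    out = {}
    for key, value in doc.items():
        long_name, sub = reverse.get(key, (key, None))
        nested = sub is not None and isinstance(value, dict)
        out[long_name] = expand(value, sub) if nested else value
    return out


long_doc = {
    "sequence": "AHAHSPGPGSAVKLPAPHSVGKSALR",
    "location": {"chromosome": "19", "strand": "-", "begin": "51067007", "end": "51067085"},
}
short_doc = shorten(long_doc, SCHEMA)
print(short_doc)
```

As a rough sanity check on the numbers above: the six long names total 40 characters and the short ones 6, so each document saves 34 bytes of key names, which over 1.6 billion documents is about 54 billion bytes, the same order as the 60 GB reported.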

Possible Solutions/Workarounds #2: SERVER-863 : Tokenize the field names
https://jira.mongodb.org/browse/SERVER-863
Planned but not scheduled, Vote (66)
“Most collections, even if they don’t contain the same structure , they contain similar. So it would make a lot of sense and save a lot of space to tokenize the field names.”
“The overall benefit as mentioned by other users is that you reduce the amount of storage/RAM taken up by redundant data in each document (so you can use less resources per request, hence gain more throughput and capacity), while importantly also freeing the developer from having to pick short and hard to read field names as a workaround for a technical limitation.” - Andrew Armstrong

Possible Solutions/Workarounds #3: SERVER-164 : Option to store data compressed
https://jira.mongodb.org/browse/SERVER-164
Planned but not scheduled, Vote (126)
“The way oracle handles this is transparent to the database server at the block engine level. They compress the blocks similar to how SAN store's handle it rather than at a record level. They use zlib type compression and the overhead is less than 5 percent. Due to the IO access reduction in both number of blocks touched, and amount of data transferred, the overall effect is a cumulative speed increase. Should MongoDB do it this way? Maybe? But at the end of the day, the architecture must make Mongo more scalable, as well as increase the ability limit the storage footprint.” - Michael D. Joy
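To get a feel for why block-level compression helps with repeated field names, here is a minimal sketch that compresses a block of similar documents with zlib and compares sizes. It is an illustration only: it serializes to JSON rather than BSON, the documents are synthetic variations of the example from the short-key-names slide, and it says nothing about how MongoDB or Oracle would implement this.

```python
import json
import zlib


def block_sizes(docs, level=6):
    """Return (raw_bytes, compressed_bytes) for a block of documents."""
    raw = "\n".join(json.dumps(doc, sort_keys=True) for doc in docs).encode("utf-8")
    return len(raw), len(zlib.compress(raw, level))


# Synthetic block: 1000 documents with identical key names and varying positions.
docs = [
    {
        "sequence": "AHAHSPGPGSAVKLPAPHSVGKSALR",
        "location": {
            "chromosome": "19",
            "strand": "-",
            "begin": str(51067007 + i),
            "end": str(51067085 + i),
        },
    }
    for i in range(1000)
]

raw_size, compressed_size = block_sizes(docs)
print(raw_size, compressed_size)
```

Because the key names repeat in every document, the compressor encodes them almost for free, which is the same redundancy that short key names and field-name tokenization attack by other means.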

4. Community Contribution is Quite Low
Image: http://www.thompsoncrg.com/wp-content/themes/zoomtechnic/images/slide/img3.jpg

Community Contribution is Quite Low
https://github.com/mongodb/mongo/graphs/impact
https://github.com/mongodb/mongo/contributors

5. Attitude Matters
MongoDB already has the sweetest API in the NoSQL world. I wish more effort were invested in fixing the hard problems: locking, sharding, storage engine...
http://www.mongodb.org/display/DOCS/SQL+to+Mongo+Mapping+Chart

We are hiring
• Backend Engineer (MongoDB, Hadoop, HBase, Storm, Scala, Java, Ruby, Clojure)
• Data Mining Engineer
• DevOps Engineer
• Front End Engineer
We are doing big data analytics. [email protected]

Contact
• Email : [email protected] [email protected]
• Twitter: @stonegao

Thanks. Q & A
