across multiple attributes • Ex. Type and Genre • Good fit for storing multiple product types with different attributes • Include things like “type” in your shard key
in Javascript • Complex MRs might not be fun to write • Runs in the single-‐threaded Javascript engine • Could be resource intensive for simple operations • Not robust enough for complex operations
• Simpler tasks should be easier • Skip writing Javascript • Skip executing Javascript • Plus, get some support for complex document structures • In development and testing now (2.1.x), stable release soon (2.2)
• Describe a chain of operations • We’re going to continue adding functions • Implemented in the core server (C++) so it works faster and scales better
pipe) • Documents are passed through the pipeline to produce/compute a result • Pipeline operations chained together • Aggregate information • ETL data into different forms
• $project: extract fields, compute values • $unwind: split arrays • $group: put items into defined buckets • $sort: order documents • $limit: only process N documents • $skip: start after X documents
• Avoids collection scanning and pulling in more than you need • Use $sort as early as possible • Query optimizer can be used to choose an index instead of sorting the result itself • Support in drivers via db.runCommand() • Break up the work, documents are limited to 16MB
Building and running MR examples with adapter • http://www.mongodb.org/display/DOCS/Hadoop +Quick+Start • Walkthrough of using streaming • http://blog.mongodb.org/post/24610529795/hadoop-‐ streaming-‐support-‐for-‐mongodb
• Business analytics platforms • Enterprise and Community editions • Ad-‐hoc analysis and reporting, explore data from MongoDB • Integration with other parts of the big data stack • http://www.mongodb.org/display/DOCS/ Business+Intelligence