Upgrade to Pro — share decks privately, control downloads, hide ads and more …

MongoDB Aggregation Framework Overview

MongoDB Aggregation Framework Overview

Presented at DeNormalised London

cj_harris5

March 23, 2012
Tweet

More Decks by cj_harris5

Other Decks in Technology

Transcript

  1. Chris Harris Email : [email protected] Twitter : cj_harris5 DeNormalised London:

    Aggregation Framework Overview Wednesday, 21 March 12
  2. Terminology RDBMS MongoDB Table Collection Row(s) JSON Document Index Index

    Join Embedding & Linking Partition Shard Partition Key Shard Key Wednesday, 21 March 12
  3. Here is a “simple” SQL Model mysql> select * from

    book; +----+----------------------------------------------------------+ | id | title | +----+----------------------------------------------------------+ | 1 | The Demon-Haunted World: Science as a Candle in the Dark | | 2 | Cosmos | | 3 | Programming in Scala | +----+----------------------------------------------------------+ 3 rows in set (0.00 sec) mysql> select * from bookauthor; +---------+-----------+ | book_id | author_id | +---------+-----------+ | 1 | 1 | | 2 | 1 | | 3 | 2 | | 3 | 3 | | 3 | 4 | +---------+-----------+ 5 rows in set (0.00 sec) mysql> select * from author; +----+-----------+------------+-------------+-------------+---------------+ | id | last_name | first_name | middle_name | nationality | year_of_birth | +----+-----------+------------+-------------+-------------+---------------+ | 1 | Sagan | Carl | Edward | NULL | 1934 | | 2 | Odersky | Martin | NULL | DE | 1958 | | 3 | Spoon | Lex | NULL | NULL | NULL | | 4 | Venners | Bill | NULL | NULL | NULL | +----+-----------+------------+-------------+-------------+---------------+ 4 rows in set (0.00 sec) Wednesday, 21 March 12
  4. The Same Data in MongoDB { "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"), "title"

    : "Programming in Scala", "author" : [ { "first_name" : "Martin", "last_name" : "Odersky", "nationality" : "DE", "year_of_birth" : 1958 }, { "first_name" : "Lex", "last_name" : "Spoon" }, { "first_name" : "Bill", "last_name" : "Venners" } ] } Wednesday, 21 March 12
  5. What problem are we solving? • Map/Reduce can be used

    for aggregation… • Currently being used for totaling, averaging, etc • Map/Reduce is a big hammer • Simpler tasks should be easier • Shouldn’t need to write JavaScript • Avoid the overhead of JavaScript engine • We’re seeing requests for help in handling complex documents • Select only matching subdocuments or arrays Wednesday, 21 March 12
  6. How will we solve the problem? • New aggregation framework

    • Declarative framework (no JavaScript) • Describe a chain of operations to apply • Expression evaluation • Return computed values • Framework: new operations added easily • C++ implementation Wednesday, 21 March 12
  7. Aggregation - Pipelines • Aggregation requests specify a pipeline •

    A pipeline is a series of operations • Members of a collection are passed through a pipeline to produce a result • ps -ef | grep -i mongod Wednesday, 21 March 12
  8. Example - twitter { "_id" : ObjectId("4f47b268fb1c80e141e9888c"), "user" : {

    "friends_count" : 73, "location" : "Brazil", "screen_name" : "Bia_cunha1", "name" : "Beatriz Helena Cunha", "followers_count" : 102, } } • Find the # of followers and # friends by location Wednesday, 21 March 12
  9. Example - twitter db.tweets.aggregate( {$match: {"user.friends_count": { $gt: 0 },

    "user.followers_count": { $gt: 0 } } }, {$project: { location: "$user.location", friends: "$user.friends_count", followers: "$user.followers_count" } }, {$group: {_id: "$location", friends: {$sum: "$friends"}, followers: {$sum: "$followers"} } } ); Wednesday, 21 March 12
  10. Example - twitter db.tweets.aggregate( {$match: {"user.friends_count": { $gt: 0 },

    "user.followers_count": { $gt: 0 } } }, {$project: { location: "$user.location", friends: "$user.friends_count", followers: "$user.followers_count" } }, {$group: {_id: "$location", friends: {$sum: "$friends"}, followers: {$sum: "$followers"} } } ); Predicate Wednesday, 21 March 12
  11. Example - twitter db.tweets.aggregate( {$match: {"user.friends_count": { $gt: 0 },

    "user.followers_count": { $gt: 0 } } }, {$project: { location: "$user.location", friends: "$user.friends_count", followers: "$user.followers_count" } }, {$group: {_id: "$location", friends: {$sum: "$friends"}, followers: {$sum: "$followers"} } } ); Predicate Parts of the document you want to project Wednesday, 21 March 12
  12. Example - twitter db.tweets.aggregate( {$match: {"user.friends_count": { $gt: 0 },

    "user.followers_count": { $gt: 0 } } }, {$project: { location: "$user.location", friends: "$user.friends_count", followers: "$user.followers_count" } }, {$group: {_id: "$location", friends: {$sum: "$friends"}, followers: {$sum: "$followers"} } } ); Predicate Parts of the document you want to project Function to apply to the result set Wednesday, 21 March 12
  13. Example - twitter { "result" : [ { "_id" :

    "Far Far Away", "friends" : 344, "followers" : 789 }, ... ], "ok" : 1 } Wednesday, 21 March 12
  14. Projections • $project can reshape results • Include or exclude

    fields • Computed fields • Arithmetic expressions • Pull fields from nested documents to the top • Push fields from the top down into new virtual documents Wednesday, 21 March 12
  15. Unwinding • $unwind can “stream” arrays • Array values are

    doled out one at time in the context of their surrounding documents • Makes it possible to filter out elements before returning Wednesday, 21 March 12
  16. Grouping • $group aggregation expressions • Define a grouping key

    as the _id of the result • Total grouped column values: $sum • Average grouped column values: $avg • Collect grouped column values in an array or set: $push, $addToSet • Other functions • $min, $max, $first, $last Wednesday, 21 March 12
  17. Sorting • $sort can sort documents • Sort specifications are

    the same as today, e.g., $sort:{ key1: 1, key2: -1, …} Wednesday, 21 March 12
  18. Computed Expressions • Available in $project operations • Prefix expression

    language • $add:[“$field1”, “$field2”] • $ifNull:[“$field1”, “$field2”] • Nesting: $add:[“$field1”, $ifNull:[“$field2”, “$field3”]] • Other functions…. • $divide, $mod, $multiply Wednesday, 21 March 12
  19. Computed Expressions • String functions • $toUpper, $toLower, $substr •

    Date field extraction • $year, $month, $day, $hour... • Date arithmetic • $ifNull • Ternary conditional • Return one of two values based on a predicate Wednesday, 21 March 12