Upgrade to Pro — share decks privately, control downloads, hide ads and more …

MongoDB Aggregation Framework

MongoDB Aggregation Framework

An introduction to the new MongoDB aggregation framework.

Tyler Brock

April 11, 2012
Tweet

More Decks by Tyler Brock

Other Decks in Programming

Transcript

  1. Map Reduce • Used to perform complex analytics tasks on

    massive amounts of data • Users are currently using it for aggregation… • totaling, averaging, etc Map/Reduce is a big hammer Wednesday, April 11, 12
  2. • It should be easier to do simple aggregations •

    Shouldn’t need to write JavaScript • Avoid the overhead of JavaScript engine Problem Wednesday, April 11, 12
  3. New Aggregation Framework • Declarative • No JavaScript required •

    C++ implementation • Higher performance than JavaScript • Expression evaluation • Return computed values • Framework: we can add new operations easily Wednesday, April 11, 12
  4. Pipeline • Series of operations • Members of a collection

    are passed through a pipeline to produce a result Wednesday, April 11, 12
  5. The Aggregation Command • Takes two arguments • Aggregate --

    name of collection • Pipeline -- array of pipeline operators db.runCommand( { aggregate : "article", pipeline : [ {$op1, $op2, ...} ] } ); Wednesday, April 11, 12
  6. Aggregation helper db.article.aggregate( { $pipeline_op1 }, { $pipeline_op2 }, {

    $pipeline_op3 }, { $pipeline_op4 }, ... ); Wednesday, April 11, 12
  7. Pipeline Operators • $match • $sort • $limit • $skip

    • $project • $unwind • $group Old Faves New Hotness Wednesday, April 11, 12
  8. $match • Uses a query predicate (like .find({…})) as a

    filter { title : "this is my title" , author : "bob" , posted : new Date(1079895594000) , pageViews : 5 , tags : [ "fun" , "good" , "fun" ] , } { $match : { author : "bob" } } { $match : { pgv : { $gt : 50, $lte : 90 } } } Wednesday, April 11, 12
  9. • Sorts input documents • Requires sort key -- specified

    like index keys { $sort : { name : 1, age: -1 } } $sort Wednesday, April 11, 12
  10. $limit • Limits the number of JSON documents $skip •

    Skips a number of JSON documents { $limit : 5 } { $skip : 5 } Wednesday, April 11, 12
  11. $project • Project can reshape a document • add, remove,

    rename, move • Similar to .find()’s field selection syntax • But much more powerful • Can generate computed values Wednesday, April 11, 12
  12. $project (include and exclude fields) { $project : { title

    : 1 , /* include this field, if it exists */ author : 1 , /* include this field, if it exists */ "comments.author" : 1 } } { $project : { title : 0 , /* exclude this field */ author : 0 , /* exclude this field */ } } Wednesday, April 11, 12
  13. $project (computed fields) { $project : { title : 1,

    /* include this field if it exists */ doctoredPageViews : { $add: ["$pageViews", 10] } } } Wednesday, April 11, 12
  14. Computed Fields • Prefix expression language • Add two fields

    • $add:[“$field1”, “$field2”] • Provide a value for a missing field • $ifnull:[“$field1”, “$field2”] • Nesting • $add:[“$field1”, $ifnull:[“$field2”, “$field3”]] • Date field extraction • Get year, month, day, hour, etc, from Date • Date arithmetic Wednesday, April 11, 12
  15. $project (rename and pull fields up) { $project : {

    title : 1 , page_views : "$pageViews" , /* rename this field */ upgrade : "$other.foo" /* move to top level */ } } Wednesday, April 11, 12
  16. $project (push fields down) { $project : { title :

    1 , stats : { pv : "$pageViews", /* rename this from the top-level */ } } } Wednesday, April 11, 12
  17. $unwind • Produces document for each value in an array

    where the array value is single array element { title : "this is my title" , author : "bob" , posted : new Date(1079895594000) , pageViews : 5 , tags : [ "fun" , "good" , "awesome" ] , comments : [ { author :"joe" , text : "this is cool" } , { author :"sam" , text : "this is bad" } ], other : { foo : 5 } } Wednesday, April 11, 12
  18. { ... tags : "fun" ... }, { ... tags

    : "good" ... } { ... tags : "awesome" ... } Wednesday, April 11, 12
  19. $unwind db.article.aggregate( { $project : { author : 1 ,

    /* include this field */ title : 1 , /* include this field */ tags : 1 /* include this field */ }}, { $unwind : "$tags" } ); Wednesday, April 11, 12
  20. { "result" : [ { "_id" : ObjectId("4e6e4ef557b77501a49233f6"), "title" :

    "this is my title", "author" : "bob", "tags" : "fun" }, { "_id" : ObjectId("4e6e4ef557b77501a49233f6"), "title" : "this is my title", "author" : "bob", "tags" : "good" }, { "_id" : ObjectId("4e6e4ef557b77501a49233f6"), "title" : "this is my title", "author" : "bob", "tags" : "fun" } ], "ok" : 1 } Wednesday, April 11, 12
  21. Grouping • $group aggregation expressions • Total of column values:

    $sum • Average of column values: $avg • Collect column values in an array: $push { $group : { _id: "$author", fieldname: { $aggfunc: “$field” } } } Wednesday, April 11, 12
  22. $group example db.article.aggregate( { $group : { _id : "$author",

    viewsPerAuthor : { $sum : "$pageViews" } }} ); Wednesday, April 11, 12
  23. { "result" : [ { "_id" : "jane", "viewsPerAuthor" :

    6 }, { "_id" : "dave", "viewsPerAuthor" : 7 }, { "_id" : "bob", "viewsPerAuthor" : 5 } ], "ok" : 1 } Wednesday, April 11, 12
  24. Pulling it all together { title : "this is my

    title" , author : "bob" , posted : new Date(1079895594000) , pageViews : 5 , tags : [ "fun" , "good" , "fun" ] } { tag : “fun” authors: [ ..., ..., ... ] }, { tag: “good” authors: [ ..., ..., ... ] } Wednesday, April 11, 12
  25. db.article.aggregate( { $project : { author : 1, tags :

    1, }}, { $unwind : "$tags" }, { $group : { _id : “$tags”, authors : { $addToSet : "$author" } }} ); Wednesday, April 11, 12
  26. "result" : [ { "_id" : { "tags" : "cool"

    }, "authors" : [ "jane","dave" ] }, { "_id" : { "tags" : "fun" }, "authors" : [ "dave", "bob" ] }, { "_id" : { "tags" : "good" }, "authors" : [ "bob" ] }, { "_id" : { "tags" : "awful" }, "authors" : [ "jane" ] } ] Wednesday, April 11, 12
  27. Usage Tips • Use $match in a pipeline as early

    as possible • The query optimizer can then be used to choose an index and avoid scanning the entire collection Wednesday, April 11, 12
  28. Driver Support • Initial version is a command • For

    any language, build a JSON database object, and execute the command • { aggregate : <collection>, pipeline : [ ] } • Beware of command result size limit Wednesday, April 11, 12
  29. Sharding support • Initial release will support sharding • Mongos

    analyzes pipeline, and forwards operations up to first $group or $sort to shards; combines shard server results and continues Wednesday, April 11, 12
  30. Common SQL • Distinct • aggregate({ $group: { _id: "$author"

    }}) • Count • aggregate({ $group: {_id:null, count: {$sum:1}}}]) • Sum • aggregate({ $group: {_id:null, total: {$sum: "$price"}}}) Wednesday, April 11, 12