
Aggregation Framework | Mikhail Burtylev

Mikhail Burtylev
Meetup #6

Minsk MongoDB User Group

September 18, 2012

Transcript

  1. Problems to solve

    Map/Reduce:
    •  A big hammer: simple tasks should be easier.
    •  Requires writing JavaScript.
  2. Solution (v2.1+)

    •  Declarative framework: no JavaScript required; describe a chain of operations to apply.
    •  Expression evaluation: return computed values.
    •  C++ implementation: higher performance than JavaScript.
  3. Components

    Pipelines: a pipeline is a series of operations. The members of a collection are passed through a pipeline to produce a result.
    Expressions: computed values (fields & operators), written in document format using the $ prefix for operators and field references.
  4. Invocation

    Command: db.runCommand({aggregate: "users", pipeline: [{$op1: {...}}, {$op2: {...}}, ...]});
    Helper: db.users.aggregate({$pipeline_operation1: {...}}, {$pipeline_operation2: {...}}, ...); (an array of stages is also accepted)
  5. Result

    { result: [ {…}, … ], ok: 1 }
    The whole result is returned as a single document, so it is subject to the document size limit (16MB).
  6. $match

    Filters documents; similar to find({...}). $where and geospatial operations are not allowed.
    db.users.aggregate([ {$match: {age: {$gt: 16}}} ]);
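To make the filtering semantics concrete without a running server, here is a toy model in plain JavaScript (runnable with Node.js). The helpers `match` and `matchDoc` and the sample users are hypothetical; only plain equality and the $gt/$gte/$lt/$lte operators are modeled. It is a sketch of the idea, not MongoDB's query engine.

```javascript
// Toy in-memory model of $match (not MongoDB's query engine).
// Supports plain equality and the $gt/$gte/$lt/$lte operators only.
const comparators = {
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
};

const isOperator = (k) => Object.prototype.hasOwnProperty.call(comparators, k);

function matchDoc(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    const usesOperators =
      cond !== null && typeof cond === "object" && Object.keys(cond).every(isOperator);
    if (!usesOperators) return value === cond; // plain equality
    // In this toy, a missing field never satisfies a range comparison.
    return value !== undefined &&
      Object.entries(cond).every(([op, arg]) => comparators[op](value, arg));
  });
}

// Keep only the documents that satisfy the query, like a $match stage.
function match(docs, query) {
  return docs.filter((doc) => matchDoc(doc, query));
}

const users = [
  { name: "Ann", age: 15 },
  { name: "Bob", age: 17 },
  { name: "Cid", age: 30 },
];
// Mirrors db.users.aggregate([{$match: {age: {$gt: 16}}}]) on this sample.
console.log(match(users, { age: { $gt: 16 } }).map((u) => u.name).join(", ")); // Bob, Cid
```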
  7. Projections

    {$project: {
        title: 1,                              /* include field if it exists */
        _id: 0,                                /* exclude */
        age_month: {$multiply: ["$age", 12]},  /* calculated field */
        user: "$name",                         /* rename */
        city: "$home.city",                    /* move to top */
        add: {age: "$age"}                     /* sub-document */
    }}
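The projection above can be modeled in plain JavaScript to show what each rule does to one document. This is a minimal sketch under stated assumptions: `project`, `evalExpr` and `getPath` are hypothetical helpers, the sample user is made up, and only the rule kinds used on the slide are handled (exclusion is modeled for `_id` only).

```javascript
// Toy model of the $project stage from the slide (not MongoDB's implementation).
// Handles: 1 (include if present), _id: 0 (exclude _id), "$a.b" field paths,
// {$multiply: [...]}, and nested documents of expressions (sub-documents).
function getPath(doc, path) {
  // Walk "home.city" one key at a time; missing links yield undefined.
  return path.split(".").reduce((v, key) => (v == null ? undefined : v[key]), doc);
}

function evalExpr(doc, expr) {
  if (typeof expr === "string" && expr.startsWith("$")) return getPath(doc, expr.slice(1));
  if (expr !== null && typeof expr === "object") {
    if ("$multiply" in expr) {
      return expr.$multiply.reduce((acc, e) => acc * evalExpr(doc, e), 1);
    }
    const sub = {}; // a document of expressions builds a sub-document
    for (const [k, e] of Object.entries(expr)) sub[k] = evalExpr(doc, e);
    return sub;
  }
  return expr; // literal value
}

function project(doc, spec) {
  const out = {};
  if (spec._id !== 0 && "_id" in doc) out._id = doc._id; // _id is kept unless excluded
  for (const [field, rule] of Object.entries(spec)) {
    if (field === "_id") continue;
    if (rule === 1 || rule === true) {
      if (field in doc) out[field] = doc[field]; // include field if it exists
    } else {
      out[field] = evalExpr(doc, rule); // computed, renamed or moved field
    }
  }
  return out;
}

const user = { _id: 7, title: "Dr", name: "Ann", age: 30, home: { city: "Minsk" } };
const spec = {
  title: 1,
  _id: 0,
  age_month: { $multiply: ["$age", 12] },
  user: "$name",
  city: "$home.city",
  add: { age: "$age" },
};
console.log(JSON.stringify(project(user, spec)));
// {"title":"Dr","age_month":360,"user":"Ann","city":"Minsk","add":{"age":30}}
```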
  8. $unwind

    Outputs array elements one at a time, each in the context of its surrounding document.
    •  Multiple documents can be generated from a single source document.
    •  No output for an absent or empty array.
    •  Error for non-array fields.
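The three rules for $unwind can be checked with a small in-memory model. It is a sketch only: `unwind` is a hypothetical helper that handles top-level fields, and the sample authors are made up; it follows the behavior described on the slide rather than any particular server version.

```javascript
// Toy model of $unwind as described on the slide (not MongoDB's implementation):
// one output document per array element, nothing for a missing or empty
// array, and an error for a non-array value. Top-level fields only.
function unwind(docs, fieldPath) {
  const field = fieldPath.replace(/^\$/, ""); // "$tags" -> "tags"
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined) continue; // absent field: no output
    if (!Array.isArray(value)) {
      throw new Error(`$unwind: field "${field}" is not an array`);
    }
    for (const element of value) {
      // Copy the surrounding document, replacing the array by one element.
      out.push({ ...doc, [field]: element });
    }
  }
  return out;
}

const authors = [
  { name: "Ann", tags: ["db", "js"] },
  { name: "Bob", tags: [] }, // empty array: produces nothing
  { name: "Cid" },           // absent field: produces nothing
];
console.log(JSON.stringify(unwind(authors, "$tags")));
// [{"name":"Ann","tags":"db"},{"name":"Ann","tags":"js"}]
```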
  9. $group

    Aggregates items into buckets defined by a key (_id), which can be:
    •  a single field
    •  a document
    •  a constant value
    Other fields are computed with $sum, $avg, $min, $max, $first, $last, $push, $addToSet (no sub-documents allowed).
  10. Unwinding, Grouping

    db.authors.aggregate(
        {$project: {name: 1, tags: 1, _id: 0}},
        {$unwind: "$tags"},
        {$group: {
            _id: "$tags",                    /* single field */
            author_count: {$sum: 1},         /* count */
            authors: {$addToSet: "$name"}    /* array construction */
        }}
    );
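The pipeline on this slide can be traced on a tiny made-up dataset with plain JavaScript. `groupBy` is a hypothetical helper supporting a single "$field" key with `$sum` and `$addToSet` only; it is a sketch of the bucketing idea, not MongoDB's implementation.

```javascript
// Toy model of $project -> $unwind -> $group with {$sum: 1} and $addToSet.
function groupBy(docs, keyPath, accumulators) {
  const key = keyPath.slice(1); // "$tags" -> "tags"
  const buckets = new Map();    // group key -> output document
  for (const doc of docs) {
    const k = doc[key];
    if (!buckets.has(k)) {
      const init = { _id: k };
      for (const [name, acc] of Object.entries(accumulators)) {
        init[name] = "$sum" in acc ? 0 : [];
      }
      buckets.set(k, init);
    }
    const bucket = buckets.get(k);
    for (const [name, acc] of Object.entries(accumulators)) {
      if ("$sum" in acc) {
        // {$sum: 1} counts documents; {$sum: "$field"} adds field values.
        bucket[name] += typeof acc.$sum === "string" ? doc[acc.$sum.slice(1)] : acc.$sum;
      } else if ("$addToSet" in acc) {
        const v = doc[acc.$addToSet.slice(1)];
        if (!bucket[name].includes(v)) bucket[name].push(v); // set: no duplicates
      }
    }
  }
  return [...buckets.values()];
}

const authors = [
  { _id: 1, name: "Ann", tags: ["db", "js"] },
  { _id: 2, name: "Bob", tags: ["db"] },
];
const projected = authors.map(({ name, tags }) => ({ name, tags }));           // $project
const unwound = projected.flatMap((d) => d.tags.map((t) => ({ ...d, tags: t }))); // $unwind
const result = groupBy(unwound, "$tags", {
  author_count: { $sum: 1 },
  authors: { $addToSet: "$name" },
});
console.log(JSON.stringify(result));
// [{"_id":"db","author_count":2,"authors":["Ann","Bob"]},{"_id":"js","author_count":1,"authors":["Ann"]}]
```

The toy keeps insertion order for groups and set members; MongoDB does not promise any particular order for $group output, so a $sort stage would be needed to rely on one.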
  11. $limit

    Only allows the specified number of documents to pass. Similar to find(…).limit(…).
    db.users.aggregate([ {$sort: {age: -1}}, {$limit: 10} ]);
  12. $skip

    Skips over the specified number of documents. Similar to find(…).skip(…).
    db.users.aggregate([ {$sort: {age: -1}}, {$skip: 10} ]);
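On an in-memory array, $limit and $skip after a descending $sort behave like slicing a sorted copy. The sketch below uses made-up users aged 20 to 44 and hypothetical helpers `sortDesc`, `limit` and `skip`; it also shows that $skip followed by $limit yields a "page" of results.

```javascript
// Toy model of {$sort: {age: -1}} followed by $limit or $skip on an in-memory
// array (numeric sort key only; not MongoDB's implementation).
function sortDesc(docs, field) {
  return [...docs].sort((a, b) => b[field] - a[field]); // copy, then sort descending
}
const limit = (docs, n) => docs.slice(0, n); // $limit: pass at most n documents
const skip = (docs, n) => docs.slice(n);     // $skip: drop the first n documents

// Made-up sample: 25 users aged 20..44.
const users = Array.from({ length: 25 }, (_, i) => ({ name: `u${i}`, age: 20 + i }));
const sorted = sortDesc(users, "age");

const oldest10 = limit(sorted, 10);        // [{$sort: {age: -1}}, {$limit: 10}]
const afterFirst10 = skip(sorted, 10);     // [{$sort: {age: -1}}, {$skip: 10}]
const page2 = limit(skip(sorted, 10), 10); // $skip then $limit: the second "page"
console.log(oldest10[0].age, oldest10.length);         // 44 10
console.log(afterFirst10[0].age, afterFirst10.length); // 34 15
console.log(page2[0].age, page2[page2.length - 1].age); // 34 25
```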
  13. New Operations

    New operations can be added to the Aggregation Framework in future versions, for example:
    $out: saves the document stream to a collection. It functions like a tee, so intermediate results can be saved.
  14. Expressions

    Return computed values. Used with $project and $group. Can be nested:
    $multiply: ["$age", 12]
    $multiply: [ {$ifNull: ["$age", 10]}, 12]
    Categories: Boolean, Comparison, Arithmetic, String, Date, Conditional.
  15. Boolean operators

    $not
    $and, $or (take an array of one or more values; short-circuit logic)
    Conversion of BSON values to booleans:
    •  null, undefined, 0: false
    •  non-zero values, dates, strings, objects: true
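The conversion rules above differ from JavaScript's own truthiness (an empty string is falsy in JavaScript but a string counts as true here). A minimal sketch, with hypothetical helper names, that encodes the slide's rules and builds $and/$or/$not on top of them:

```javascript
// Toy model of the slide's truthiness rules (not MongoDB's implementation):
// null, undefined, 0 (and false) -> false; non-zero numbers, dates,
// strings (including ""), objects and arrays -> true.
function toBool(v) {
  if (v === null || v === undefined) return false;
  if (typeof v === "number") return v !== 0;
  if (typeof v === "boolean") return v;
  return true; // strings, Dates, objects, arrays
}

// Operands here are already-evaluated values.
const $and = (vals) => vals.every(toBool); // every() stops at the first false value
const $or = (vals) => vals.some(toBool);   // some() stops at the first true value
const $not = (v) => !toBool(v);

console.log($and([1, "x", new Date()]), $or([0, null]), $not(0)); // true false true
```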
  16. Comparison operators

    Compare numbers, strings and dates. Two operands in an array.
    $cmp, $eq, $ne
    $gt, $gte, $lt, $lte
    {$gt: ["$field1", "$field2"]}
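Each comparison operator takes a two-element array and can be derived from the sign of $cmp (-1, 0 or 1). A sketch in plain JavaScript, assuming both operands have the same type; cross-type BSON ordering and non-ASCII string collation are not modeled, and `cmp`/`ops` are hypothetical names.

```javascript
// Toy model of $cmp for same-type operands (numbers, ASCII strings, Dates),
// with the other comparison operators derived from its sign.
function cmp([a, b]) {
  const x = a instanceof Date ? a.getTime() : a; // compare dates by timestamp
  const y = b instanceof Date ? b.getTime() : b;
  return x < y ? -1 : x > y ? 1 : 0;
}

const ops = {
  $eq: (args) => cmp(args) === 0,
  $ne: (args) => cmp(args) !== 0,
  $gt: (args) => cmp(args) > 0,
  $gte: (args) => cmp(args) >= 0,
  $lt: (args) => cmp(args) < 0,
  $lte: (args) => cmp(args) <= 0,
};

// Evaluate {$gt: ["$field1", "$field2"]} against one made-up document.
const doc = { field1: 5, field2: 3 };
console.log(ops.$gt([doc.field1, doc.field2])); // true
console.log(cmp(["abc", "abd"]), cmp([new Date(0), new Date(0)])); // -1 0
```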
  17. Date operators

    Extract values from date objects:
    $dayOfYear, $dayOfMonth, $dayOfWeek
    $year, $month, $week
    $hour, $minute, $second
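Several of these operators use conventions worth knowing: $dayOfWeek counts 1 = Sunday to 7 = Saturday, and $week numbers weeks from Sunday, with days before the first Sunday of the year in week 0 (the strftime `%U` convention). A plain JavaScript sketch of those three, computed in UTC; the helper names are hypothetical and the example date is the meetup date.

```javascript
// Toy JavaScript equivalents of three date operators, computed in UTC.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// $dayOfYear: 1..366
function dayOfYear(d) {
  const jan1 = Date.UTC(d.getUTCFullYear(), 0, 1);
  return Math.floor((d.getTime() - jan1) / MS_PER_DAY) + 1;
}

// $dayOfWeek: 1 = Sunday ... 7 = Saturday (getUTCDay() uses 0 = Sunday)
function dayOfWeek(d) {
  return d.getUTCDay() + 1;
}

// $week: 0..53; weeks start on Sunday, days before the first Sunday are week 0
function week(d) {
  const yday = dayOfYear(d) - 1; // 0-based day of the year
  return Math.floor((yday + 7 - d.getUTCDay()) / 7);
}

const meetup = new Date(Date.UTC(2012, 8, 18, 19, 0)); // September 18, 2012 (month is 0-based)
console.log(dayOfYear(meetup), dayOfWeek(meetup), week(meetup)); // 262 3 38
```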
  18. Conditional operators

    Null value substitution: {$ifNull: ["$f1", "$f2"]}
    Ternary conditional: {$cond: [ <bool expression>, <true-case>, <false-case> ]}
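Because expressions nest, a recursive evaluator is a natural way to picture them. The sketch below handles "$field" references, literals, $ifNull, $cond, $multiply and $gte, enough to run the nested $multiply/$ifNull example from the Expressions slide and a $cond; `evaluate` and the sample documents are hypothetical, and this is not MongoDB's evaluator.

```javascript
// Toy recursive evaluator for a few expression operators.
function evaluate(doc, expr) {
  if (typeof expr === "string" && expr.startsWith("$")) return doc[expr.slice(1)];
  if (expr === null || typeof expr !== "object") return expr; // literal
  if ("$ifNull" in expr) {
    const [value, replacement] = expr.$ifNull;
    const v = evaluate(doc, value);
    return v === null || v === undefined ? evaluate(doc, replacement) : v;
  }
  if ("$cond" in expr) {
    const [test, ifTrue, ifFalse] = expr.$cond;
    const t = evaluate(doc, test);
    // Truthiness as on the Boolean operators slide: null, undefined, 0, false are false.
    const truthy = !(t === null || t === undefined || t === 0 || t === false);
    return truthy ? evaluate(doc, ifTrue) : evaluate(doc, ifFalse);
  }
  if ("$multiply" in expr) {
    return expr.$multiply.reduce((acc, e) => acc * evaluate(doc, e), 1);
  }
  if ("$gte" in expr) {
    const [a, b] = expr.$gte.map((e) => evaluate(doc, e));
    return a >= b;
  }
  throw new Error("unsupported expression: " + JSON.stringify(expr));
}

const ageMonths = { $multiply: [{ $ifNull: ["$age", 10] }, 12] };
console.log(evaluate({ age: 2 }, ageMonths), evaluate({}, ageMonths)); // 24 120
const label = { $cond: [{ $gte: ["$age", 16] }, "adult", "minor"] };
console.log(evaluate({ age: 20 }, label), evaluate({ age: 12 }, label)); // adult minor
```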
  19. Optimization

    Early filtering: place $match early (an appropriate index can then be used to avoid scanning the entire collection).
    Sorting: the $sort operator can take advantage of an index when placed at the beginning of the pipeline or before the following aggregation operators: $project, $unwind, $group.
  20. Sharding

    mongos:
    •  analyzes the pipeline and forwards the operations up to $group or $sort to the shards;
    •  merges the $sort and $group results;
    •  processes the remaining operations.
    An early $match can exclude shards. Requires more CPU resources.
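The split described above can be pictured as a two-phase count: each shard groups its own documents, and mongos merges the partial results by adding them. This is a conceptual sketch with made-up shard contents and hypothetical helpers `partialCount` and `mergeCounts`, not how mongos is actually implemented.

```javascript
// Phase 1 (per shard): count documents per city.
function partialCount(docs, key) {
  const counts = new Map();
  for (const d of docs) counts.set(d[key], (counts.get(d[key]) || 0) + 1);
  return counts;
}

// Phase 2 (merge): a total count is the sum of the partial counts.
function mergeCounts(partials) {
  const total = new Map();
  for (const p of partials) {
    for (const [k, n] of p) total.set(k, (total.get(k) || 0) + n);
  }
  return total;
}

const shardA = [{ city: "Minsk" }, { city: "Brest" }];
const shardB = [{ city: "Minsk" }, { city: "Minsk" }];
const merged = mergeCounts([partialCount(shardA, "city"), partialCount(shardB, "city")]);
console.log(JSON.stringify(Object.fromEntries(merged))); // {"Minsk":3,"Brest":1}
```

Counts and sums merge by addition, but an average cannot be merged by averaging the per-shard averages; the partial sums and counts have to be carried to the merge step instead.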
  21. Order of operations

    [
        {$match: {…}},      /* filter data */
        {$sort: {…}},       /* sort documents */
        {$limit: …},        /* limit document stream */
        {$group: {…}},      /* group data */
        {$sort: {…}},       /* sort by calc. values */
        {$project: {…}}     /* reshape data */
    ]
  22. SQL & MongoDB

    db.users.aggregate([
        {$match: {age: {$gte: 16}}},
        {$group: {_id: "$city", count: {$sum: 1}}},
        {$match: {count: {$lt: 10000}}},
        {$sort: {count: -1}},
        {$project: {_id: 0, city: "$_id", adult_count: "$count"}}
    ]);

    select - $project
    where/having - $match
    group by - $group
    order by - $sort

    select city, count(id) from users where age >= 16 group by city having count(id) < 10000 order by 2 desc
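Running the same steps over an in-memory array makes the SQL correspondence visible clause by clause. The sample users are made up, and the code is a plain JavaScript sketch rather than MongoDB itself. Note that the sort has to use `count`: the `adult_count` field only exists after the final $project.

```javascript
// Made-up sample data.
const users = [
  { name: "a", age: 30, city: "Minsk" },
  { name: "b", age: 12, city: "Minsk" },
  { name: "c", age: 40, city: "Minsk" },
  { name: "d", age: 18, city: "Brest" },
];
const LIMIT = 10000; // the slide's HAVING threshold

const adults = users.filter((u) => u.age >= 16); // WHERE age >= 16 -> $match

const counts = new Map(); // GROUP BY city -> $group with count: {$sum: 1}
for (const u of adults) counts.set(u.city, (counts.get(u.city) || 0) + 1);

const rows = [...counts]
  .map(([city, count]) => ({ _id: city, count }))
  .filter((g) => g.count < LIMIT)                       // HAVING count(id) < 10000 -> $match
  .sort((x, y) => y.count - x.count)                    // ORDER BY 2 DESC -> $sort on count
  .map((g) => ({ city: g._id, adult_count: g.count })); // SELECT list -> $project

console.log(JSON.stringify(rows));
// [{"city":"Minsk","adult_count":2},{"city":"Brest","adult_count":1}]
```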
  23. Limitations

    The pipeline cannot operate on values of the following types:
    •  Binary
    •  Symbol
    •  MinKey
    •  MaxKey
    •  DBRef
    •  Code
    •  CodeWScope
  24. Limitations

    If any single aggregation operation consumes more than 10% of system RAM, the operation produces an error (above 5%, a warning).
  25. Limitations

    $where and geospatial operations can't be used in $match queries as part of the aggregation pipeline.