Slide 1

Slide 1 text

Aggregation Framework

Slide 2

Slide 2 text

Problems to solve
Map/Reduce
• Big hammer: simple tasks should be easier.
• Requires JavaScript.

Slide 3

Slide 3 text

Problems to solve
Count/distinct/group
• Not enough.

Slide 4

Slide 4 text

Problems to solve
Handling complex documents
• Selecting only matching subdocuments or arrays.

Slide 5

Slide 5 text

Solution (v2.1+)
Declarative framework
• No JavaScript required
• Describe a chain of operations to apply
Expression evaluation
• Return computed values
C++ implementation
• Higher performance than JavaScript

Slide 6

Slide 6 text

Components
Pipelines
• A pipeline is a series of operations.
• The members of a collection are passed through a pipeline to produce a result.
Expressions
• Computed values (fields & operators).
• Written in document format, using the $ prefix for operators and field references.

Slide 7

Slide 7 text

Invocation
Command:
  db.runCommand({ aggregate: "users", pipeline: [{$op1}, {$op2}, ...] });
Helper:
  db.users.aggregate({$pipeline_operation1}, {$pipeline_operation2}, ...);

Slide 8

Slide 8 text

Result
  { result: [ {…}, … ], ok: 1 }
The result is returned as a single document, so the document size limit (16 MB) applies.

Slide 9

Slide 9 text

Pipeline operations
• $match
• $project
• $unwind
• $group
• $sort
• $limit
• $skip

Slide 10

Slide 10 text

$match
Filter documents. Similar to find({...}).
$where and geospatial operations are not allowed.
  db.users.aggregate([ {$match: {age: {$gt: 16}}} ]);

Slide 11

Slide 11 text

$project
Reshape documents. Similar to the optional projection argument of find().
• Include/exclude/rename fields.
• Computed fields.
• Nested documents.

Slide 12

Slide 12 text

Projections
  {$project: {
      title: 1,                              /* include field if it exists */
      _id: 0,                                /* exclude */
      age_month: {$multiply: ["$age", 12]},  /* calculated field */
      user: "$name",                         /* rename */
      city: "$home.city",                    /* move to top */
      add: {age: "$age"}                     /* sub-document */
  }}
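
To make the reshaping concrete, here is a minimal plain-JavaScript sketch of what the $project stage on this slide produces for one input document. It is an illustration only, not MongoDB code: `projectUser` is a hypothetical helper, and the input is assumed to contain every field the projection refers to (missing-field handling is not modeled).

```javascript
// Hypothetical helper: mimics the slide's $project for a single document.
function projectUser(doc) {
  const out = {};
  if ("title" in doc) out.title = doc.title; // title: 1 (copied only if present)
  // _id: 0 -> _id is simply not copied
  out.age_month = doc.age * 12;              // {$multiply: ["$age", 12]}
  out.user = doc.name;                       // rename: name -> user
  out.city = doc.home.city;                  // nested field moved to top level
  out.add = { age: doc.age };                // new sub-document
  return out;
}

const projected = projectUser({
  _id: 1,
  title: "Dr",
  name: "Ann",
  age: 2,
  home: { city: "Oslo" },
});
console.log(projected);
```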

Slide 13

Slide 13 text

$unwind
Hands out array elements one at a time in the context of their surrounding documents.
• Multiple documents can be generated from a single source document.
• Emits nothing for an absent or empty array.
• Raises an error for non-array fields.
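
The three rules above can be sketched in plain JavaScript over an in-memory array. This is only an illustration of the described behavior, not how the server implements it; `unwind` is a hypothetical helper, and treating every non-array value (including null) as an error is an assumption.

```javascript
// Hypothetical helper: emits one copy of the parent document per array element.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined) continue; // absent field: nothing is emitted
    if (!Array.isArray(value)) {
      throw new TypeError(`cannot unwind non-array field "${field}"`);
    }
    for (const element of value) {     // an empty array emits nothing
      out.push({ ...doc, [field]: element });
    }
  }
  return out;
}

const authorsToUnwind = [
  { name: "ann", tags: ["db", "js"] },
  { name: "bob", tags: [] },
  { name: "cy" },
];
const unwound = unwind(authorsToUnwind, "tags");
console.log(unwound); // two documents, both for "ann"
```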

Slide 14

Slide 14 text

$group
Aggregates items into buckets defined by a key (_id):
• single field
• document
• constant value
Other fields are computed: $sum, $avg, $min, $max, $first, $last, $push, $addToSet (no sub-documents allowed)

Slide 15

Slide 15 text

Unwinding, grouping
  db.authors.aggregate(
      {$project: {name: 1, tags: 1, _id: 0}},
      {$unwind: "$tags"},
      {$group: {
          _id: "$tags",                   /* single field */
          author_count: {$sum: 1},        /* count */
          authors: {$addToSet: "$name"}   /* array construction */
      }});
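
As a rough picture of what the $group stage in this pipeline does, the sketch below buckets already-unwound documents by tag in plain JavaScript. `groupByTag` and the sample data are hypothetical, and the output order shown here comes from the JavaScript Map, not from any ordering guarantee of $group.

```javascript
// Hypothetical helper: one bucket per tag, with a counter and a set of names.
function groupByTag(docs) {
  const buckets = new Map();
  for (const { name, tags } of docs) {
    if (!buckets.has(tags)) {
      buckets.set(tags, { _id: tags, author_count: 0, authors: new Set() });
    }
    const bucket = buckets.get(tags);
    bucket.author_count += 1; // {$sum: 1}
    bucket.authors.add(name); // {$addToSet: "$name"}
  }
  // Convert each Set to an array so the result looks like a document.
  return [...buckets.values()].map((b) => ({ ...b, authors: [...b.authors] }));
}

const unwoundAuthors = [
  { name: "ann", tags: "db" },
  { name: "ann", tags: "js" },
  { name: "bob", tags: "db" },
];
const grouped = groupByTag(unwoundAuthors);
console.log(grouped);
```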

Slide 16

Slide 16 text

$sort
Sorts documents. Similar to find(…).sort({…}).
  db.users.aggregate([ {$sort: {age: -1}} ]);

Slide 17

Slide 17 text

$limit
Only allow the specified number of documents to pass. Similar to find(…).limit(…).
  db.users.aggregate([ {$sort: {age: -1}}, {$limit: 10} ]);

Slide 18

Slide 18 text

$skip
Skips over the specified number of documents. Similar to find(…).skip(…).
  db.users.aggregate([ {$sort: {age: -1}}, {$skip: 10} ]);
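
$sort, $skip and $limit are often combined for paging. The plain-JavaScript sketch below shows the same three steps on an in-memory array; `page` and its parameters are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: returns one page of users, oldest first.
function page(users, pageNumber, pageSize) {
  return [...users]                  // copy, so the input is not reordered
    .sort((a, b) => b.age - a.age)   // {$sort: {age: -1}}
    .slice(pageNumber * pageSize)    // {$skip: pageNumber * pageSize}
    .slice(0, pageSize);             // {$limit: pageSize}
}

const usersToPage = [{ age: 30 }, { age: 10 }, { age: 20 }, { age: 40 }];
const secondPage = page(usersToPage, 1, 2);
console.log(secondPage); // ages 20 and 10
```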

Slide 19

Slide 19 text

New operations
Aggregation Framework: new operations can be added in future versions.
$out
• Saves the document stream to a collection.
• Functions like a tee, so that intermediate results can be saved.

Slide 20

Slide 20 text

Expressions
Return computed values. Used with $project and $group. Can be nested.
  $multiply: ["$age", 12]
  $multiply: [ {$ifNull: ["$age", 10]}, 12]
Kinds of operators:
• Boolean
• Comparison
• Arithmetic
• String
• Date
• Conditional
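
Nesting is easier to see with a toy evaluator. The sketch below is a plain-JavaScript illustration, not MongoDB's implementation: `evaluate` is a hypothetical function that supports only field paths, $multiply, $add and $ifNull, and it evaluates every argument eagerly.

```javascript
// Hypothetical toy evaluator for a handful of expression operators.
function evaluate(expr, doc) {
  if (typeof expr === "string" && expr.startsWith("$")) {
    // Field path such as "$home.city": walk the document key by key.
    return expr
      .slice(1)
      .split(".")
      .reduce((value, key) => (value == null ? undefined : value[key]), doc);
  }
  if (expr !== null && typeof expr === "object" && !Array.isArray(expr)) {
    const [op] = Object.keys(expr);
    const args = expr[op].map((arg) => evaluate(arg, doc)); // nested expressions
    switch (op) {
      case "$multiply":
        return args.reduce((a, b) => a * b, 1);
      case "$add":
        return args.reduce((a, b) => a + b, 0);
      case "$ifNull":
        return args[0] == null ? args[1] : args[0];
      default:
        throw new Error(`unsupported operator ${op}`);
    }
  }
  return expr; // anything else is a literal
}

const ageInMonths = { $multiply: [{ $ifNull: ["$age", 10] }, 12] };
console.log(evaluate(ageInMonths, { age: 2 })); // 24
console.log(evaluate(ageInMonths, {}));         // 120 (age defaults to 10)
```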

Slide 21

Slide 21 text

Boolean operators
$not
$and, $or (input array of one or more values, short-circuit logic)
Conversion of BSON values to Boolean:
• null, undefined, 0: false
• Non-zero values, dates, strings, objects: true
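
The conversion rule differs from JavaScript's own truthiness. The sketch below follows the slide's list literally in plain JavaScript, which means every string, including the empty one, counts as true. `toBool`, `and` and `or` are hypothetical helpers, and treating the Boolean value false as false is an added assumption.

```javascript
// Hypothetical helper: the slide's conversion rule, read literally.
function toBool(value) {
  return !(value === null || value === undefined || value === 0 || value === false);
}

// Array.prototype.every and some stop at the first deciding element,
// which gives the short-circuit behavior mentioned on the slide.
function and(values) {
  return values.every(toBool);
}

function or(values) {
  return values.some(toBool);
}

console.log(toBool(""));        // true, although "" is falsy in JavaScript
console.log(and([1, "a", {}])); // true
console.log(or([0, null]));     // false
```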

Slide 22

Slide 22 text

Comparison operators
Compare numbers, strings and dates. Two operands in array.
$cmp, $eq, $ne, $gt, $gte, $lt, $lte
  {$gt: ["$field1", "$field2"]}

Slide 23

Slide 23 text

Arithmetic operators
$add, $multiply: multiple numbers in array.
$subtract, $divide, $mod: two numbers in array.

Slide 24

Slide 24 text

String operators
$strcasecmp: case-insensitive comparison
$toLower, $toUpper
$substr
Not encoding aware! (Latin alphabet only)
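
"Not encoding aware" matters as soon as a string contains multi-byte characters. The plain-JavaScript sketch below cuts a UTF-8 string by byte offsets, which is the assumption made here about how a byte-oriented substring behaves; `substrBytes` is a hypothetical helper, not the server's $substr.

```javascript
const encoder = new TextEncoder();
const decoder = new TextDecoder("utf-8");

// Hypothetical helper: substring by UTF-8 byte offsets, not by characters.
function substrBytes(text, start, length) {
  const bytes = encoder.encode(text);
  return decoder.decode(bytes.slice(start, start + length));
}

console.log(substrBytes("hello", 0, 2));  // "he": one byte per character
console.log(substrBytes("привет", 0, 2)); // "п": this letter takes two bytes
console.log(substrBytes("привет", 0, 1)); // half a character, not a valid letter
```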

Slide 25

Slide 25 text

Date operators
Extract values from date objects.
$dayOfYear, $dayOfMonth, $dayOfWeek
$year, $month, $week
$hour, $minute, $second
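
For readers used to the JavaScript Date API, the sketch below computes comparable values in plain JavaScript, in UTC. `dateParts` is a hypothetical helper; the offsets reflect that $month counts from 1 and $dayOfWeek runs from 1 (Sunday) to 7 (Saturday), while JavaScript counts both from 0. $week is left out.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: date parts in UTC, numbered like the date operators.
function dateParts(date) {
  const year = date.getUTCFullYear();
  const startOfYear = Date.UTC(year, 0, 1);
  const startOfDay = Date.UTC(year, date.getUTCMonth(), date.getUTCDate());
  return {
    year,
    month: date.getUTCMonth() + 1,   // JavaScript months are 0-11
    dayOfMonth: date.getUTCDate(),
    dayOfWeek: date.getUTCDay() + 1, // JavaScript days are 0 (Sunday) to 6
    dayOfYear: Math.round((startOfDay - startOfYear) / MS_PER_DAY) + 1,
    hour: date.getUTCHours(),
    minute: date.getUTCMinutes(),
    second: date.getUTCSeconds(),
  };
}

const parts = dateParts(new Date("2012-03-01T13:45:10Z"));
console.log(parts);
```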

Slide 26

Slide 26 text

Conditional operators
Null value substitution
  {$ifNull: ["$f1", "$f2"]}
Ternary conditional
  {$cond: [<condition>, <value if true>, <value if false>]}

Slide 27

Slide 27 text

Optimization
Early filtering: put $match at the start of the pipeline, so that an appropriate index can be used to avoid scanning the entire collection.
Sorting: $sort can take advantage of an index when placed at the beginning of the pipeline or before the following aggregation operators: $project, $unwind, $group.

Slide 28

Slide 28 text

Sharding
Mongos
• analyzes the pipeline, and forwards operations up to $group or $sort to shards;
• merges $sort and $group results;
• processes remaining operations.
Early $match can exclude shards.
Requires more CPU resources.
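
The merge step can be pictured as combining partial results. The sketch below is a plain-JavaScript illustration under the assumption that each shard returns per-key counts from its own $group; `mergeShardCounts` and the sample data are hypothetical, and this is not the actual mongos code.

```javascript
// Hypothetical helper: sums per-key counts returned by several shards.
function mergeShardCounts(shardResults) {
  const totals = new Map();
  for (const partial of shardResults) {
    for (const { _id, count } of partial) {
      totals.set(_id, (totals.get(_id) || 0) + count);
    }
  }
  return [...totals].map(([_id, count]) => ({ _id, count }));
}

const merged = mergeShardCounts([
  [{ _id: "Oslo", count: 2 }, { _id: "Rome", count: 1 }], // shard A
  [{ _id: "Oslo", count: 3 }],                            // shard B
]);
console.log(merged);
```

Sums and counts merge this simply; an average, for example, would need each shard to report both a sum and a count.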

Slide 29

Slide 29 text

Order of operations
  [
    {$match: {…}},     /* filter data */
    {$sort: {…}},      /* sort documents */
    {$limit: …},       /* limit document stream */
    {$group: {…}},     /* group data */
    {$sort: {…}},      /* sort by calc. values */
    {$project: {…}}    /* reshape data */
  ]

Slide 30

Slide 30 text

SQL & MongoDB
SQL:
  select city, count(id) from users where age >= 16 group by city having count(id) < 10000 order by 2 desc
MongoDB:
  db.users.aggregate([
      {$match: {age: {$gte: 16}}},
      {$group: {_id: "$city", count: {$sum: 1}}},
      {$match: {count: {$lt: 10000}}},
      {$sort: {count: -1}},   /* adult_count exists only after $project */
      {$project: {_id: 0, city: "$_id", adult_count: "$count"}}]);
Mapping:
• select - $project
• where/having - $match
• group by - $group
• order by - $sort
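
The same query can be traced step by step on an in-memory array. This plain-JavaScript sketch mirrors the stages above for illustration only; `citiesByAdultCount`, the `maxCount` parameter and the sample data are hypothetical.

```javascript
// Hypothetical helper: where, group by, having, order by, select.
function citiesByAdultCount(users, maxCount) {
  const counts = new Map();
  for (const user of users) {
    if (user.age >= 16) {                                     // where / first $match
      counts.set(user.city, (counts.get(user.city) || 0) + 1); // group by / $group
    }
  }
  return [...counts]
    .filter(([, count]) => count < maxCount)                  // having / second $match
    .sort(([, a], [, b]) => b - a)                            // order by / $sort
    .map(([city, adult_count]) => ({ city, adult_count }));   // select / $project
}

const sampleUsers = [
  { city: "Oslo", age: 20 },
  { city: "Oslo", age: 30 },
  { city: "Rome", age: 40 },
  { city: "Rome", age: 10 },
  { city: "Bern", age: 18 },
  { city: "Bern", age: 19 },
  { city: "Bern", age: 50 },
];
const report = citiesByAdultCount(sampleUsers, 3);
console.log(report); // Oslo with 2 adults, then Rome with 1; Bern is filtered out
```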

Slide 31

Slide 31 text

Limitations
The pipeline cannot operate on values of the following types:
• Binary
• Symbol
• MinKey
• MaxKey
• DBRef
• Code
• CodeWScope

Slide 32

Slide 32 text

Limitations
Output from the pipeline cannot exceed 16 MB.

Slide 33

Slide 33 text

Limitations
If any single aggregation operation consumes more than 10% of system RAM, the operation produces an error (warning at 5%).

Slide 34

Slide 34 text

Limitations
$where and geospatial operations cannot be used in $match queries as part of the aggregation pipeline.

Slide 35

Slide 35 text

Limitations
String operators are not encoding aware (Latin alphabet only).

Slide 36

Slide 36 text

Used materials
• http://docs.mongodb.org/manual/aggregation/
• http://www.slideshare.net/cwestin63/mongodbs-new-aggregation-framework
• http://www.slideshare.net/mongodb/introduction-to-the-new-aggregation-framework
• http://www.mongodb.org/display/DOCS/Aggregation
• http://stackoverflow.com/questions/12337319/mongodb-group-group-and-mapreduce
• http://blog.mongodb.org/post/16015854270/operations-in-the-new-aggregation-framework
• https://jira.mongodb.org/browse/SERVER-3253
• http://habrahabr.ru/post/139643/

Slide 37

Slide 37 text

Questions?