Slide 1

Slide 1 text

Welcome to Mongo San Diego
Summer 2012
Open source, high performance database

Slide 2

Slide 2 text

• now – Aggregation

Slide 3

Slide 3 text

MongoDB 2.2: Almost there!

Slide 4

Slide 4 text

• 2.2 release candidate available now (2.2-rc0)
  Please try it out (not in production!) and report bugs
• Fix bugs -> cut a new release candidate
  – Is it ready/good?
    • NO: fix more bugs, cut the next release candidate
    • YES: release, move on to working on 2.3

Slide 5

Slide 5 text

• Aggregation

Slide 6

Slide 6 text

MongoDB 2.2: Aggregation Framework

Slide 7

Slide 7 text

• Common operations on complex data
  – totaling, averaging, min, max, etc.
  – ability to return a subset of array values
  – grouping documents or subdocuments
  – answering questions across subdocuments
  – sorting across subdocuments, etc.
• Currently hard in MongoDB
  – Map/Reduce jobs (JavaScript, slow, hard)
  – Handle in application code

Slide 8

Slide 8 text

• Our new aggregation framework
  – Declarative framework
    • No JavaScript required
  – Describe a chain of operations to apply
  – Expression evaluation
    • Return computed values
  – C++ implementation
    • Higher performance than JavaScript

Slide 9

Slide 9 text


Slide 10

Slide 10 text

[Diagram: many documents pass through a Map function (emit), then a Reduce function (reduce), into a result collection]

Slide 11

Slide 11 text

[Diagram: documents pass through a chain of pipeline operators to produce a result set]

Slide 12

Slide 12 text

• Aggregation requests specify a pipeline
• A pipeline is a series of operations
• Conceptually, the members of a collection are passed through a pipeline to produce a result
  – Similar to a command-line pipe
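To make the command-line pipe analogy concrete, here is a minimal plain-JavaScript model (runnable in Node, no MongoDB involved) in which each stage is a function from an array of documents to a new array, and a pipeline is their left-to-right composition. The `runPipeline` helper, the two toy stages, and the `articles` data are hypothetical illustrations, not MongoDB's implementation.

```javascript
// Hypothetical model, not MongoDB's implementation: a stage is a function
// from an array of documents to a new array of documents.
function runPipeline(docs, stages) {
  // Feed each stage's output into the next stage, like `a | b | c` in a shell.
  return stages.reduce((current, stage) => stage(current), docs);
}

// Two toy stages written as plain functions.
const onlyDave = (docs) => docs.filter((d) => d.author === "dave");
const titlesOnly = (docs) => docs.map((d) => ({ title: d.title }));

// Hypothetical sample collection.
const articles = [
  { author: "dave", title: "Pipelines" },
  { author: "ann", title: "Indexes" },
  { author: "dave", title: "Sharding" },
];

const result = runPipeline(articles, [onlyDave, titlesOnly]);
console.log(JSON.stringify(result)); // [{"title":"Pipelines"},{"title":"Sharding"}]
```

Stage order matters: running `titlesOnly` first drops the `author` field, so nothing survives the filter afterwards.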

Slide 13

Slide 13 text

[Diagram: pipeline operators in sequence: $match, $project, $unwind -> $group -> $match, $project, $sort, $limit, ...]

Slide 14

Slide 14 text

db.collection.aggregate( [
    { $match : … },
    { $group : … },
    { $limit : … },
    // etc.
] )

Slide 15

Slide 15 text

• $match
  – Uses a query predicate (like .find({…})) as a filter
    { $match : { author : "dave" } }
    { $match : { score : { $gt : 50, $lte : 90 } } }
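As a rough mental model of how `$match` applies a query predicate, the sketch below evaluates flat predicates like the two examples above against plain JavaScript objects. The `matches` and `match` helpers and the sample documents are hypothetical; only equality and `$gt`/`$gte`/`$lt`/`$lte` are handled, and MongoDB's real query semantics (nested paths, arrays, cross-type comparisons) are not modeled.

```javascript
// Minimal, hypothetical evaluator for flat query predicates such as
// { author : "dave" } or { score : { $gt : 50, $lte : 90 } }.
const comparators = {
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
};

function matches(doc, predicate) {
  // Every field condition must hold (implicit AND).
  return Object.entries(predicate).every(([field, condition]) => {
    const value = doc[field];
    if (condition !== null && typeof condition === "object") {
      // Operator document: every operator on this field must hold as well.
      return Object.entries(condition).every(([op, operand]) => {
        if (!Object.prototype.hasOwnProperty.call(comparators, op)) {
          throw new Error(`unsupported operator ${op}`);
        }
        return comparators[op](value, operand);
      });
    }
    return value === condition; // plain equality
  });
}

// A $match stage keeps only the documents that satisfy the predicate.
const match = (predicate) => (docs) => docs.filter((doc) => matches(doc, predicate));

// Hypothetical sample documents.
const docs = [
  { author: "dave", score: 50 },
  { author: "ann", score: 75 },
  { author: "dave", score: 90 },
  { author: "dave", score: 95 },
];

const byDave = match({ author: "dave" })(docs);
const inRange = match({ score: { $gt: 50, $lte: 90 } })(docs);
console.log(byDave.length, inRange.map((d) => d.score).join(",")); // 3 75,90
```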

Slide 16

Slide 16 text

• $project
  – Uses a sample document to determine the shape of the result (similar to .find()’s optional argument)
    • Include or exclude fields
    • Computed fields
      – Arithmetic expressions, including built-in functions
      – Pull fields from nested documents to the top
      – Push fields from the top down into new virtual documents
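Each of the `$project` capabilities listed above can be illustrated with one hand-written function. The projection quoted in the comment is an assumed example written for this sketch, and `projectArticle` and the `article` document are hypothetical; this is not a general `$project` implementation.

```javascript
// Hand-written equivalent of a projection along the lines of
//   { $project : { title : 1,
//                  authorName : "$author.name",
//                  stats : { views : "$pageViews", likes : "$likes" },
//                  engagement : { $add : [ "$pageViews", "$likes" ] } } }
function projectArticle(doc) {
  return {
    _id: doc._id, // $project keeps _id unless it is explicitly excluded
    title: doc.title, // included field
    authorName: doc.author.name, // pulled up from a nested document
    stats: { views: doc.pageViews, likes: doc.likes }, // pushed down into a new subdocument
    engagement: doc.pageViews + doc.likes, // computed with an arithmetic expression
  };
}

// Hypothetical input; fields not named in the projection (email, body) are dropped.
const article = {
  _id: 1,
  title: "Pipelines",
  author: { name: "dave", email: "dave@example.com" },
  pageViews: 100,
  likes: 7,
  body: "...",
};

console.log(JSON.stringify(projectArticle(article)));
// {"_id":1,"title":"Pipelines","authorName":"dave","stats":{"views":100,"likes":7},"engagement":107}
```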

Slide 17

Slide 17 text

• $unwind
  – Hands out array elements one at a time
    { $unwind : "$myarray" }
• $unwind “streams” arrays
  – Array values are doled out one at a time in the context of their surrounding document
  – Makes it possible to filter out elements before returning
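The streaming behavior of `$unwind` can be sketched in plain JavaScript: each array element is emitted in its own copy of the surrounding document, after which an ordinary filter can drop individual elements. The `unwind` helper and `posts` data are hypothetical; as an assumption of this sketch, documents whose field is missing or not an array simply produce no output (the real stage's handling of non-array values may differ).

```javascript
// Plain-JavaScript model of { $unwind : "$tags" }: every element of the array
// is emitted in its own copy of the surrounding document.
function unwind(field) {
  return (docs) =>
    docs.flatMap((doc) => {
      const values = doc[field];
      // Sketch assumption: a missing or non-array field produces no output.
      if (!Array.isArray(values)) return [];
      return values.map((value) => ({ ...doc, [field]: value }));
    });
}

// Hypothetical input: an empty array also yields nothing.
const posts = [
  { _id: 1, title: "A", tags: ["db", "nosql", "js"] },
  { _id: 2, title: "B", tags: [] },
];

const unwound = unwind("tags")(posts);
console.log(unwound.length); // 3

// Because elements now arrive one at a time, a later filter can drop individual ones.
const onlyDb = unwound.filter((doc) => doc.tags === "db");
console.log(JSON.stringify(onlyDb)); // [{"_id":1,"title":"A","tags":"db"}]
```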

Slide 18

Slide 18 text

• $group
  – Aggregates items into buckets defined by a key

Slide 19

Slide 19 text

• $group aggregation expressions
  – Define a grouping key as the _id of the result
  – Total grouped column values: $sum
  – Average grouped column values: $avg
  – Collect grouped column values in an array or set: $push, $addToSet
  – Other functions
    • $min, $max, $first, $last
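The grouping expressions listed above can be modeled by bucketing documents on a key and folding each bucket into one result document. The `$group` spec in the comment is an assumed example, and `groupByAuthor` and `posts` are hypothetical. MongoDB does not promise an order for the output groups or for `$addToSet` values; this sketch keeps insertion order only because JavaScript's `Map` and `Set` do.

```javascript
// Hypothetical plain-JavaScript model of
//   { $group : { _id : "$author",
//                total : { $sum : "$score" },    avg : { $avg : "$score" },
//                scores : { $push : "$score" },  tags : { $addToSet : "$tag" },
//                lowest : { $min : "$score" },   highest : { $max : "$score" },
//                first : { $first : "$score" },  last : { $last : "$score" } } }
function groupByAuthor(docs) {
  // Bucket documents by the grouping key.
  const buckets = new Map();
  for (const doc of docs) {
    if (!buckets.has(doc.author)) buckets.set(doc.author, []);
    buckets.get(doc.author).push(doc);
  }
  // Fold each bucket into one result document.
  return [...buckets].map(([key, group]) => {
    const scores = group.map((d) => d.score);
    const total = scores.reduce((sum, s) => sum + s, 0);
    return {
      _id: key, // the grouping key becomes the _id of the result
      total,
      avg: total / scores.length,
      scores, // $push: every value, duplicates included
      tags: [...new Set(group.map((d) => d.tag))], // $addToSet: duplicates removed
      lowest: Math.min(...scores),
      highest: Math.max(...scores),
      first: scores[0], // $first/$last depend on input order (e.g. after a $sort)
      last: scores[scores.length - 1],
    };
  });
}

// Hypothetical input documents.
const posts = [
  { author: "dave", score: 80, tag: "db" },
  { author: "ann", score: 95, tag: "js" },
  { author: "dave", score: 60, tag: "db" },
  { author: "dave", score: 70, tag: "ops" },
];

const grouped = groupByAuthor(posts);
console.log(JSON.stringify(grouped[0]));
```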

Slide 20

Slide 20 text

• $sort
  – Sort documents
  – Sort specifications are the same as today, e.g., $sort : { key1 : 1, key2 : -1, … }
    { $sort : { "total" : -1 } }
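A sort specification like `{ key1 : 1, key2 : -1 }` can be read as a comparator that checks keys in order and flips the comparison for `-1`. The sketch below, with hypothetical `comparatorFor`/`sortStage` helpers and made-up data, shows a descending sort on `total` with a tie broken by `_id`. It compares values with JavaScript's `<`/`>` and does not model MongoDB's ordering across different BSON types.

```javascript
// Build a comparator from a sort specification such as { total : -1, _id : 1 }.
// Keys are compared in the order they appear in the spec (JavaScript preserves
// insertion order for non-integer string keys); 1 = ascending, -1 = descending.
function comparatorFor(spec) {
  const keys = Object.entries(spec);
  return (a, b) => {
    for (const [key, direction] of keys) {
      if (a[key] < b[key]) return -direction;
      if (a[key] > b[key]) return direction;
    }
    return 0; // equal on every key
  };
}

// A $sort stage returns a sorted copy and leaves its input untouched.
const sortStage = (spec) => (docs) => [...docs].sort(comparatorFor(spec));

// Hypothetical $group output with a tie on "total".
const totals = [
  { _id: "c", total: 5 },
  { _id: "a", total: 9 },
  { _id: "b", total: 5 },
];

const ordered = sortStage({ total: -1, _id: 1 })(totals);
console.log(ordered.map((d) => d._id).join(",")); // a,b,c
```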

Slide 21

Slide 21 text

• $limit
  – Only allow the specified number of documents to pass
    { $limit : 20 }

Slide 22

Slide 22 text

• $skip
  – Skip over the specified number of documents
    { $skip : 10 }
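`$skip` and `$limit` behave like slicing the document stream, and combining them gives simple paging. The sketch below uses hypothetical `skip`/`limit` helpers on made-up data and also shows that the order of the two stages changes the result.

```javascript
// Plain-JavaScript model of { $skip : n } and { $limit : n } as pipeline stages.
const skip = (n) => (docs) => docs.slice(n);
const limit = (n) => (docs) => docs.slice(0, n);

// Hypothetical input: 35 documents numbered 0..34.
const docs = Array.from({ length: 35 }, (_, i) => ({ n: i }));

// Skip 10, then let at most 20 through: documents 10..29.
const page = limit(20)(skip(10)(docs));
console.log(page.length, page[0].n, page[page.length - 1].n); // 20 10 29

// Reversing the order changes the result: limit to 0..19, then skip 10 leaves 10..19.
const reversed = skip(10)(limit(20)(docs));
console.log(reversed.length); // 10
```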

Slide 23

Slide 23 text

• Available in $project operations
• Prefix expression language
  – Add two fields: $add : [ "$field1", "$field2" ]
  – Provide a value for a missing field: $ifNull : [ "$field1", "$field2" ]
  – Nesting: $add : [ "$field1", { $ifNull : [ "$field2", "$field3" ] } ]
  – Other functions…
• And we can easily add more as required
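The prefix expression language is naturally evaluated recursively: a `"$name"` string refers to a field, an `{ $op : [args] }` object applies an operator to its evaluated arguments, and anything else is a literal. The `evaluate` function below is a hypothetical sketch that implements only `$add` and `$ifNull` and only top-level field paths; it is meant to show why nesting works, not to mirror MongoDB's evaluator.

```javascript
// Minimal, hypothetical evaluator for the prefix expression language:
//   "$name"          -> value of a top-level field in the document
//   { $op : [args] } -> apply an operator to its evaluated arguments
//   anything else    -> a literal value
// Only $add and $ifNull are implemented in this sketch.
function evaluate(expr, doc) {
  if (typeof expr === "string" && expr.startsWith("$")) {
    return doc[expr.slice(1)];
  }
  if (expr !== null && typeof expr === "object" && !Array.isArray(expr)) {
    const [op, args] = Object.entries(expr)[0];
    // Arguments are evaluated recursively, which is what makes nesting work.
    const values = args.map((arg) => evaluate(arg, doc));
    switch (op) {
      case "$add":
        return values.reduce((sum, v) => sum + v, 0);
      case "$ifNull":
        // First argument unless it is null or missing, otherwise the second.
        return values[0] === null || values[0] === undefined ? values[1] : values[0];
      default:
        throw new Error(`unsupported operator ${op}`);
    }
  }
  return expr;
}

const full = { field1: 2, field2: 3 };
const partial = { field1: 2, field3: 5 }; // field2 is missing

console.log(evaluate({ $add: ["$field1", "$field2"] }, full)); // 5
console.log(evaluate({ $add: ["$field1", { $ifNull: ["$field2", "$field3"] }] }, partial)); // 7
```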

Slide 24

Slide 24 text

• String functions
  – $toUpper, $toLower, $substr
• Date field extraction
  – Get year, month, day, hour, etc., from an ISODate
• Date arithmetic
• Null value substitution (like MySQL ifnull(), Oracle nvl())
• Ternary conditional
  – Return one of two values based on a predicate
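Date field extraction pulls calendar components out of a stored date. The sketch below is a plain-JavaScript analogue of operators such as `$year`, `$month`, `$dayOfMonth`, and `$hour`, which MongoDB evaluates in UTC; the `extractDateParts` helper and the timestamp are hypothetical. Note the off-by-one trap it guards against: JavaScript months are 0-based, while `$month` returns 1 to 12.

```javascript
// Plain-JavaScript analogue of date field extraction ($year, $month,
// $dayOfMonth, $hour). MongoDB evaluates these in UTC, so the UTC getters are used.
function extractDateParts(date) {
  return {
    year: date.getUTCFullYear(),
    month: date.getUTCMonth() + 1, // getUTCMonth() is 0-based; $month returns 1-12
    day: date.getUTCDate(),
    hour: date.getUTCHours(),
  };
}

// Hypothetical timestamp, standing in for an ISODate field.
const parts = extractDateParts(new Date("2012-08-15T19:30:00Z"));
console.log(JSON.stringify(parts)); // {"year":2012,"month":8,"day":15,"hour":19}
```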

Slide 25

Slide 25 text

db.scores.aggregate( [
    { $project : {
        newGrade : {
            $cond : [ { $gt : [ "$score", 90 ] }, "A",
                { $cond : [ { $gt : [ "$score", 80 ] }, "B",
                    { $cond : [ { $gt : [ "$score", 70 ] }, "C",
                        { $cond : [ { $gt : [ "$score", 60 ] }, "D", "F" ] }
                    ] }
                ] }
            ]
        }
    } },
    { $group : { _id : "$newGrade", total : { $sum : 1 } } },
    { $sort : { _id : 1 } }
] )
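The nested `$cond` in the grading example is a threshold chain: each level tests one cutoff and falls through to the next. The sketch below restates the whole pipeline in plain JavaScript with hypothetical `newGrade`/`gradeCounts` helpers and made-up scores standing in for `db.scores`, which makes the boundary behavior easy to check.

```javascript
// The nested $cond chain, written as ordinary JavaScript: each $cond tests one
// threshold with $gt (strictly greater) and falls through to the next test.
function newGrade(score) {
  if (score > 90) return "A";
  if (score > 80) return "B";
  if (score > 70) return "C";
  if (score > 60) return "D";
  return "F";
}

// { $group : { _id : "$newGrade", total : { $sum : 1 } } } counts documents per
// grade, and { $sort : { _id : 1 } } orders the groups by grade.
function gradeCounts(docs) {
  const counts = new Map();
  for (const { score } of docs) {
    const grade = newGrade(score);
    counts.set(grade, (counts.get(grade) || 0) + 1);
  }
  return [...counts]
    .map(([grade, total]) => ({ _id: grade, total }))
    .sort((a, b) => a._id.localeCompare(b._id));
}

// Hypothetical scores, standing in for the documents of db.scores.
const sample = [95, 90, 85, 72, 61, 60, 40].map((score) => ({ score }));
console.log(JSON.stringify(gradeCounts(sample)));
// [{"_id":"A","total":1},{"_id":"B","total":2},{"_id":"C","total":1},{"_id":"D","total":1},{"_id":"F","total":2}]
```

A score of exactly 90 lands in "B" and exactly 60 in "F", because `$gt` is a strict comparison.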

Slide 26

Slide 26 text

• Use $match in a pipeline as early as possible
  – The query optimizer can then choose to scan an index and avoid scanning the entire collection (an index can only be used when the stage is at the beginning of the pipeline)
• Use $sort in a pipeline as early as possible
  – The query optimizer can then choose an index to scan instead of sorting the result

Slide 27

Slide 27 text

• Initial version is a command
  – For any language, build the command document and execute the command
    • In the shell: db.runCommand({ aggregate : "<collection>", pipeline : [ … ] });
  – Beware of the command result size limit
    • The whole result is returned in a single document, so it must fit within the 16MB document size limit

Slide 28

Slide 28 text

• Initial release will support sharding
• mongos analyzes the pipeline and forwards the operations up to the first $group or $sort to the shards; it then combines the shard servers' results and returns them
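The split described above can be pictured as cutting the stage list at the first `$group` or `$sort`. The sketch below is a simplified, hypothetical model of that cut (`splitPipeline`, `shardPart`, and `mergePart` are illustrative names). It assumes the cut happens at the first such stage and does not model what mongos actually does with the `$group` itself, such as any partial grouping on the shards.

```javascript
// Simplified, hypothetical model of dividing a pipeline for a sharded collection:
// stages before the first $group or $sort run on every shard, and that stage
// plus everything after it run where mongos combines the shard results.
function splitPipeline(pipeline) {
  const cut = pipeline.findIndex((stage) => "$group" in stage || "$sort" in stage);
  if (cut === -1) {
    // No $group or $sort: the whole pipeline can run on the shards.
    return { shardPart: pipeline, mergePart: [] };
  }
  return { shardPart: pipeline.slice(0, cut), mergePart: pipeline.slice(cut) };
}

const { shardPart, mergePart } = splitPipeline([
  { $match: { author: "dave" } },
  { $project: { score: 1 } },
  { $group: { _id: "$score", total: { $sum: 1 } } },
  { $limit: 5 },
]);
console.log(shardPart.length, mergePart.length); // 2 2
```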

Slide 29

Slide 29 text

• Final bug fixes now
  – Available to try out in the 2.2-rc0 release candidate
• Expect to see this in production soon
  – 2.2 GA

Slide 30

Slide 30 text

• More optimizations
• $out pipeline operation
  – Saves the document stream to a collection
  – Similar to M/R $out, with sharded output
  – Functions like a tee, so that intermediate results can be saved

Slide 31

Slide 31 text

MongoDB San Diego: Enjoy your evening!