An Evening with MongoDB - San Diego 2012: Welcome and New Aggregation Framework

1 Summer 2012 Open source, high performance database
Welcome to Mongo San Diego

2 •  now – Aggregation Framework, Asya Kamsky, 10gen
•  7:00 – Real User Monitoring with MongoDB, Eric Azoulay, Neustar
•  7:45 – Schema Design Principles and Practice, Matt Shopsin, 10gen
•  8:30 – Building Mobile Apps with HTML5 & MongoDB, Max Katz, Tiggzi
•  9:00 – Get your Spatial On with MongoDB in the Cloud, Steve Citron-Pousty, Red Hat
•  9:45 pm - 10:30 pm After Party

3 MongoDB 2.2: Almost there!

4 •  2.2 release candidate available now (2.2-rc0)
  – Please try it out (not in production!) and report bugs
•  Fix bugs -> cut a new release candidate
  – Is it ready/good?
    •  NO: fix more bugs, cut the next release candidate
    •  YES: release, move on to working on 2.3

5 •  Aggregation Framework
•  TTL (time-to-live) collections
•  Geo (data-center) aware sharding
•  Better concurrency (lock yielding on page fault)
•  More granular write-lock (no more global lock!)
•  Better query performance
•  Better isolation of different components
•  and much, much more

6 MongoDB 2.2: Aggregation Framework

7 •  Common operations on complex data
  – totaling, averaging, min, max, etc.
  – ability to return a subset of array values
  – grouping documents or subdocuments
  – answering questions across subdocuments
  – sorting across subdocuments, etc.
•  Currently hard in MongoDB
  – Map/Reduce jobs (JavaScript, slow, hard)
  – Handle in application code

8 •  Our new aggregation framework
  – Declarative framework
    •  No JavaScript required
  – Describe a chain of operations to apply
  – Expression evaluation
    •  Return computed values
  – C++ implementation
    •  Higher performance than JavaScript

10 [Diagram: Map/Reduce data flow. Documents -> map function (emit) -> reduce function (reduce) -> result collection]

11 [Diagram: aggregation pipeline data flow. Documents -> series of pipeline operators -> result set]

12 •  Aggregation requests specify a pipeline
•  A pipeline is a series of operations
•  Conceptually, the members of a collection are passed through a pipeline to produce a result
  – Similar to a command-line pipe
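
The command-line pipe analogy can be sketched in plain JavaScript. This is only an in-memory illustration of the concept, not how the server implements pipelines; `runPipeline`, `matchStage`, `limitStage`, and the sample documents are hypothetical.

```javascript
// Hypothetical in-memory model: a stage is a function from documents to documents.
const matchStage = (predicate) => (docs) => docs.filter(predicate);
const limitStage = (n) => (docs) => docs.slice(0, n);

// Feed the output of each stage into the next one, like `cmd1 | cmd2 | cmd3`.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Made-up sample collection.
const articles = [
  { author: "dave", score: 80 },
  { author: "ann", score: 95 },
  { author: "dave", score: 60 },
];

// Keep dave's articles, then let only the first one through.
const firstByDave = runPipeline(articles, [
  matchStage((doc) => doc.author === "dave"),
  limitStage(1),
]);
```

Each stage only sees what the previous stage emitted, which is why the order of stages matters.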

13 [Diagram: three groups of pipeline operators in sequence]
•  Pipeline Operators: $match, $project, $unwind
•  Pipeline Operators: $group
•  Pipeline Operators: $match, $project, $sort, $limit, ...

14 db.collection.aggregate( [ {$match: … }, {$group: … }, {$limit: … }, etc ] )

15 •  $match
  – Uses a query predicate (like .find({…})) as a filter
{ $match : { author : "dave" } }
{ $match : { score : { $gt : 50, $lte : 90 } } }
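
To make the two predicates concrete, here is a minimal in-memory sketch of how they filter documents. The `matches` helper is hypothetical and covers only plain equality plus `$gt` and `$lte`; the sample documents are made up.

```javascript
// Hypothetical helper: test one document against a small subset of query syntax
// (plain equality plus the $gt and $lte operators used on the slide).
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (condition !== null && typeof condition === "object") {
      // An operator document such as { $gt : 50, $lte : 90 }: all operators must hold.
      return Object.entries(condition).every(([op, bound]) => {
        if (op === "$gt") return value > bound;
        if (op === "$lte") return value <= bound;
        throw new Error("operator not covered by this sketch: " + op);
      });
    }
    return value === condition; // plain equality
  });
}

// Made-up sample documents.
const scores = [
  { author: "dave", score: 50 },
  { author: "dave", score: 90 },
  { author: "ann", score: 75 },
  { author: "ann", score: 95 },
];

// { $match : { author : "dave" } }
const byDave = scores.filter((doc) => matches(doc, { author: "dave" }));
// { $match : { score : { $gt : 50, $lte : 90 } } }
const midRange = scores.filter((doc) => matches(doc, { score: { $gt: 50, $lte: 90 } }));
```

Note that the range is open at the bottom and closed at the top: a score of 50 is dropped, a score of 90 passes.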

16 •  $project
  – Uses a sample document to determine the shape of the result (similar to .find()’s optional argument)
    •  Include or exclude fields
    •  Computed fields
      –  Arithmetic expressions, including built-in functions
      –  Pull fields from nested documents to the top
      –  Push fields from the top down into new virtual documents
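
A sketch of what such a sample document could look like, followed by a plain-JavaScript equivalent for one input document to show the resulting shape. All field names (`title`, `author.name`, `score`, `bonus`, `pageViews`) and the `projectOne` helper are assumptions made for illustration.

```javascript
// Stage as it could be passed to aggregate(); the field names are made up.
const projectStage = {
  $project: {
    title: 1,                              // include a field as-is
    authorName: "$author.name",            // pull a nested field to the top
    total: { $add: ["$score", "$bonus"] }, // computed field
    stats: { views: "$pageViews" },        // push a top-level field into a new subdocument
  },
};

// Hypothetical plain-JavaScript equivalent for a single document.
function projectOne(doc) {
  return {
    _id: doc._id, // _id is kept unless it is explicitly excluded
    title: doc.title,
    authorName: doc.author.name,
    total: doc.score + doc.bonus,
    stats: { views: doc.pageViews },
  };
}

const input = {
  _id: 1, title: "Intro", author: { name: "dave" },
  score: 40, bonus: 2, pageViews: 7, draft: true,
};
const shaped = projectOne(input); // "draft" is not listed, so it is dropped
```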

17 •  $unwind
  – Hands out array elements one at a time
{ $unwind : "$myarray" }
•  $unwind "streams" arrays
  – Array values are doled out one at a time in the context of their surrounding document
  – Makes it possible to filter out elements before returning
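
The streaming behavior can be illustrated in memory: every array element produces one copy of the surrounding document. The `unwind` helper and the sample data are hypothetical, and the sketch assumes the field is either an array or missing.

```javascript
// Hypothetical in-memory version of { $unwind : "$tags" }.
// Assumes doc[field] is an array or missing; documents without elements produce no output.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] || []).map((element) => ({ ...doc, [field]: element }))
  );
}

// Made-up sample documents.
const posts = [
  { title: "a", tags: ["fun", "good"] },
  { title: "b", tags: ["good"] },
];

// Three documents come out: ("a", "fun"), ("a", "good"), ("b", "good").
const unwound = unwind(posts, "tags");

// After unwinding, individual elements can be filtered like ordinary fields.
const funOnly = unwound.filter((doc) => doc.tags === "fun");
```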

18 •  $group
  – Aggregates items into buckets defined by a key

19 •  $group aggregation expressions
  – Define a grouping key as the _id of the result
  – Total grouped column values: $sum
  – Average grouped column values: $avg
  – Collect grouped column values in an array or set: $push, $addToSet
  – Other functions
    •  $min, $max, $first, $last
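
As an illustration of bucketing, the sketch below mimics in memory what a stage like `{ $group : { _id : "$author", total : { $sum : "$score" }, average : { $avg : "$score" }, titles : { $push : "$title" } } }` would compute. The field names, the `groupByAuthor` helper, and the sample data are assumptions.

```javascript
// Hypothetical in-memory grouping: one bucket per distinct author.
function groupByAuthor(docs) {
  const buckets = new Map();
  for (const doc of docs) {
    if (!buckets.has(doc.author)) {
      buckets.set(doc.author, { _id: doc.author, total: 0, count: 0, titles: [] });
    }
    const bucket = buckets.get(doc.author);
    bucket.total += doc.score;     // $sum
    bucket.count += 1;             // needed to derive $avg
    bucket.titles.push(doc.title); // $push
  }
  // The grouping key becomes the _id of each result document.
  return [...buckets.values()].map(({ _id, total, count, titles }) => ({
    _id, total, average: total / count, titles,
  }));
}

// Made-up sample documents.
const grouped = groupByAuthor([
  { author: "dave", title: "a", score: 80 },
  { author: "ann", title: "b", score: 95 },
  { author: "dave", title: "c", score: 60 },
]);
```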

20 •  $sort
  – Sort documents
  – Sort specifications are the same as today, e.g., $sort:{ key1: 1, key2: -1, …}
{ $sort : { "total" : -1 } }

21 •  $limit
  – Only allow the specified number of documents to pass
{ $limit : 20 }

22 •  $skip
  – Skip over the specified number of documents
{ $skip : 10 }
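
$sort, $skip, and $limit can be chained, for example to page through sorted results. The sketch below shows the stages next to an in-memory equivalent; the numbers and sample data are made up and smaller than on the slides so the result is easy to follow.

```javascript
// Stages as they could appear in a pipeline (values chosen for illustration).
const pagingStages = [
  { $sort: { total: -1 } }, // highest total first
  { $skip: 2 },             // drop the first two documents
  { $limit: 2 },            // let the next two through
];

// Made-up sample documents.
const totals = [5, 9, 1, 7, 3, 8].map((total) => ({ total }));

// In-memory equivalent of the three stages, applied in the same order.
const page = [...totals]
  .sort((a, b) => b.total - a.total) // $sort : { total : -1 }
  .slice(2)                          // $skip : 2
  .slice(0, 2);                      // $limit : 2
```

Swapping $skip and $limit changes the result: limiting to two documents first and then skipping two would leave nothing.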

23 •  Available in $project operations
•  Prefix expression language
  – Add two fields: $add:["$field1", "$field2"]
  – Provide a value for a missing field: $ifNull:["$field1", "$field2"]
  – Nesting: $add:["$field1", {$ifNull:["$field2", "$field3"]}]
  – Other functions…
•  And we can easily add more as required
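
Prefix notation means the operator comes first and its arguments follow in an array, and arguments can themselves be expressions. A tiny hypothetical evaluator shows how the nested example resolves; it covers only `$add`, `$ifNull`, and top-level field references, and is not the server's implementation.

```javascript
// Hypothetical evaluator for a tiny subset of the prefix expression language.
function evaluate(expr, doc) {
  if (typeof expr === "string" && expr.startsWith("$")) {
    return doc[expr.slice(1)]; // "$field" reads a top-level field (undefined if missing)
  }
  if (expr !== null && typeof expr === "object" && !Array.isArray(expr)) {
    // { $operator : [ arg1, arg2, ... ] }: evaluate the arguments first, then apply the operator.
    const [operator, args] = Object.entries(expr)[0];
    const values = args.map((arg) => evaluate(arg, doc));
    if (operator === "$add") return values.reduce((sum, v) => sum + v, 0);
    if (operator === "$ifNull") return values[0] == null ? values[1] : values[0];
    throw new Error("operator not covered by this sketch: " + operator);
  }
  return expr; // literal value
}

const nested = { $add: ["$field1", { $ifNull: ["$field2", "$field3"] }] };

// field2 is present, so it is used: 1 + 10
const withField2 = evaluate(nested, { field1: 1, field2: 10, field3: 100 });
// field2 is missing, so field3 is substituted: 1 + 100
const withoutField2 = evaluate(nested, { field1: 1, field3: 100 });
```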

24 •  String functions
  – toUpper, toLower, substr
•  Date field extraction
  – Get year, month, day, hour, etc, from ISODate
•  Date arithmetic
•  Null value substitution (like MySQL ifnull(), Oracle nvl())
•  Ternary conditional
  – Return one of two values based on a predicate
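
A sketch of how the string and date functions could appear in a $project stage, next to plain-JavaScript equivalents for one document. The field names `name` and `created` and the sample values are assumptions.

```javascript
// Stage as it could be passed to aggregate(); field names are made up.
const stringDateStage = {
  $project: {
    upperName: { $toUpper: "$name" },
    prefix: { $substr: ["$name", 0, 3] }, // arguments: string, start, length
    year: { $year: "$created" },
    month: { $month: "$created" },
  },
};

// Made-up sample document.
const doc = { name: "dave", created: new Date("2012-07-19T19:00:00Z") };

// Plain-JavaScript equivalents of the four computed fields.
const computed = {
  upperName: doc.name.toUpperCase(),
  prefix: doc.name.substring(0, 3),      // same as start 0, length 3
  year: doc.created.getUTCFullYear(),
  month: doc.created.getUTCMonth() + 1,  // $month counts from 1, getUTCMonth() from 0
};
```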

25 db.scores.aggregate( [
  { "$project" : { "newGrade" :
      { $cond : [ { "$gt" : [ "$score", 90 ] }, "A",
        { $cond : [ { $gt : [ "$score", 80 ] }, "B",
          { "$cond" : [ { "$gt" : [ "$score", 70 ] }, "C",
            { $cond : [ { $gt : [ "$score", 60 ] }, "D", "F" ] }
          ] }
        ] }
      ] }
  } },
  { $group : { _id : "$newGrade", "total" : { $sum : 1 } } },
  { $sort : { "_id" : 1 } }
] )

26 •  Use $match in a pipeline as early as possible
  – The query optimizer can then choose to scan an index and avoid scanning the entire collection
•  Use $sort in a pipeline as early as possible
  – The query optimizer can then be used to choose an index to scan instead of sorting the result

27 •  Initial version is a command
  – For any language, build a JSON database object, and execute the command
•  In the shell: db.runCommand({ aggregate : <collection-name>, pipeline : [ … ] });
  – Beware of command result size limit
    •  Document size limit is 16MB
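
A sketch of such a command object, built as plain data before it is handed to the driver or shell. The collection name `scores` and the pipeline contents are assumptions chosen for illustration.

```javascript
// The command is an ordinary document: collection name plus an array of stages.
const command = {
  aggregate: "scores", // collection name (assumed for this example)
  pipeline: [
    { $match: { score: { $gt: 50 } } },
    { $group: { _id: "$author", total: { $sum: "$score" } } },
  ],
};

// In the mongo shell (not executed in this sketch):
//   var reply = db.runCommand(command);
// The whole reply is a single document, which is why the 16MB limit applies to it.
```

Because the command is just a document, any driver that can run database commands can send it, even before a dedicated helper exists in that language.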

28 •  Initial release will support sharding
•  mongos analyzes the pipeline, and forwards operations up to $group or $sort to shards; combines shard server results and returns them

29 •  Final bug fixes now
  – Available to play with in dev version 2.2-rc0
•  Expect to see this in production soon:
  – 2.2 GA

30 •  More optimizations
•  $out pipeline operation
  – Saves the document stream to a collection
  – Similar to the Map/Reduce out option, with sharded output
  – Functions like a tee, so that intermediate results can be saved

31 MongoDB San Diego: Enjoy your evening!
