
An Evening with MongoDB - San Diego 2012: Welcome and New Aggregation Framework

mongodb
July 25, 2012

Asya Kamsky, 10gen
Welcome and Kickoff!
We're working on a new aggregation framework for MongoDB that will make it much easier to do simple tasks like counting, averaging, and finding minima or maxima while grouping by keys in a collection. The new aggregation features are not a replacement for map-reduce, but they will make a number of common tasks much easier, without having to resort to the big hammer that is map-reduce. After introducing the syntax and usage patterns of the new framework, we will give some demonstrations of aggregation with it.

Transcript

  1. Summer 2012
     Open source, high performance database
     Welcome to Mongo San Diego
  2. •  now – Aggregation Framework, Asya Kamsky, 10gen
     •  7:00 – Real User Monitoring with MongoDB, Eric Azoulay, Neustar
     •  7:45 – Schema Design Principles and Practice, Matt Shopsin, 10gen
     •  8:30 – Building Mobile Apps with HTML5 & MongoDB, Max Katz, Tiggzi
     •  9:00 – Get your Spatial On with MongoDB in the Cloud, Steve Citron-Pousty, Red Hat
     •  9:45 pm - 10:30 pm  After Party
  3. MongoDB 2.2: Almost there!

  4. •  2.2 release candidate available now (2.2-rc0)
        Please try it out (not in production!) and report bugs
     •  Fix bugs -> cut a new release candidate
        – Is it ready/good?
           •  NO: fix more bugs, cut the next release candidate
           •  YES: release, move on to working on 2.3
  5. •  Aggregation Framework
     •  TTL (time-to-live) collections
     •  Geo (data-center) aware sharding
     •  Better concurrency (lock yielding on page fault)
     •  More granular write-lock (no more global lock!)
     •  Better query performance
     •  Better isolation of different components
     •  and much, much more
  6. MongoDB 2.2: Aggregation Framework

  7. •  Common operations on complex data
        – totaling, averaging, min, max, etc.
        – ability to return a subset of array values
        – grouping documents or subdocuments
        – answering questions across subdocuments
        – sorting across subdocuments, etc.
     •  Currently hard in MongoDB
        – Map/Reduce jobs (JavaScript, slow, hard)
        – Handle in application code
  8. •  Our new aggregation framework
        – Declarative framework
           •  No JavaScript required
        – Describe a chain of operations to apply
        – Expression evaluation
           •  Return computed values
        – C++ implementation
           •  Higher performance than JavaScript
  9. [No text extracted from this slide]

  10. [Diagram: map-reduce. Documents pass through a map function (emit) and a reduce function (reduce) into a result collection]
  11. [Diagram: aggregation pipeline. Documents pass through a chain of pipeline operators into a result set]
  12. •  Aggregation requests specify a pipeline
      •  A pipeline is a series of operations
      •  Conceptually, the members of a collection are passed through a pipeline to produce a result
         – Similar to a command-line pipe
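The command-line pipe analogy can be sketched in plain JavaScript with no database involved. The helper name `runPipeline` and the two sample stage functions are hypothetical, chosen only for illustration; in the real framework the stages run inside the server.

```javascript
// Hypothetical helper: pass a list of documents through each stage in order,
// the way a shell pipe feeds one command's output into the next command.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Illustrative stages written as plain functions (not MongoDB operators).
const onlyDave = (docs) => docs.filter((d) => d.author === "dave");
const firstTwo = (docs) => docs.slice(0, 2);

// Made-up sample documents.
const posts = [
  { author: "dave", score: 80 },
  { author: "li", score: 95 },
  { author: "dave", score: 60 },
  { author: "dave", score: 75 },
];

const result = runPipeline(posts, [onlyDave, firstTwo]);
console.log(result);
```

Each stage sees only the output of the stage before it, so the order of stages changes the result.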
  13. Pipeline Operators: $match, $project, $unwind
      Pipeline Operators: $group
      Pipeline Operators: $match, $project, $sort, $limit, ...
  14. db.collection.aggregate( [ {$match: … }, {$group: … }, {$limit: …}, etc ] )
  15. •  $match
         – Uses a query predicate (like .find({…})) as a filter
         { $match : { author : "dave" } }
         { $match : { score : { $gt : 50, $lte : 90 } } }
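To make the filtering behavior concrete, here is a minimal in-memory sketch of the two $match examples. The `matches` function is a hypothetical stand-in that covers only equality, $gt and $lte; it is not how the server evaluates predicates, and the sample documents are made up.

```javascript
// The two filters from the slide, written as plain stage documents.
const byAuthor = { $match: { author: "dave" } };
const byScore = { $match: { score: { $gt: 50, $lte: 90 } } };

// Hypothetical in-memory check covering only equality, $gt and $lte.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    // A plain value means an equality test.
    if (cond === null || typeof cond !== "object") return value === cond;
    // An object means one or more comparison operators, all of which must hold.
    return Object.entries(cond).every(([op, bound]) => {
      if (op === "$gt") return value > bound;
      if (op === "$lte") return value <= bound;
      throw new Error("operator not covered by this sketch: " + op);
    });
  });
}

const docs = [
  { author: "dave", score: 45 },
  { author: "dave", score: 90 },
  { author: "li", score: 70 },
];

const daves = docs.filter((d) => matches(d, byAuthor.$match));
const midScores = docs.filter((d) => matches(d, byScore.$match));
console.log(daves.length, midScores.length);
```

In the shell, the same stage documents would go into the pipeline array passed to db.collection.aggregate.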
  16. •  $project
         – Uses a sample document to determine the shape of the result (similar to .find()’s optional argument)
            •  Include or exclude fields
            •  Computed fields
               –  Arithmetic expressions, including built-in functions
               –  Pull fields from nested documents to the top
               –  Push fields from the top down into new virtual documents
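A rough sketch of reshaping with $project follows. It assumes a stage that includes one field and pulls one nested field to the top. The `project` function, the field names and the sample document are all hypothetical, and the sketch ignores details such as how the server handles _id.

```javascript
// A $project stage document: 1 includes a field, and a "$a.b" string
// pulls a nested field up to the top level under a new name.
const stage = { $project: { title: 1, authorName: "$author.name" } };

// Hypothetical in-memory version covering only those two kinds of rule.
function project(doc, spec) {
  const out = {};
  for (const [name, rule] of Object.entries(spec)) {
    if (rule === 1) {
      if (name in doc) out[name] = doc[name];
    } else if (typeof rule === "string" && rule.startsWith("$")) {
      // Follow a dotted path such as "$author.name" one key at a time.
      let value = doc;
      for (const key of rule.slice(1).split(".")) {
        value = value == null ? undefined : value[key];
      }
      out[name] = value;
    } else {
      throw new Error("rule not covered by this sketch");
    }
  }
  return out;
}

const post = {
  title: "Pipelines",
  author: { name: "dave", city: "San Diego" },
  draft: true,
};

const shaped = project(post, stage.$project);
console.log(shaped);
```

Fields that the specification does not mention, such as draft here, do not appear in the output.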
  17. •  $unwind
         – Hands out array elements one at a time
         { $unwind : "$myarray" }
      •  $unwind “streams” arrays
         – Array values are doled out one at a time in the context of their surrounding document
         – Makes it possible to filter out elements before returning
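The way $unwind hands out array elements can be sketched in memory as follows. The `unwind` function, the tags field and the sample documents are hypothetical. Treating a missing or non-array field as producing no output is a simplification of this sketch, not a statement about server behavior.

```javascript
// An $unwind stage document naming the array field to stream.
const stage = { $unwind: "$tags" };

// Hypothetical in-memory version: emit one copy of the surrounding document
// per array element, with the array field replaced by that single element.
function unwind(docs, path) {
  const field = path.slice(1); // drop the leading "$"
  const out = [];
  for (const doc of docs) {
    const values = Array.isArray(doc[field]) ? doc[field] : [];
    for (const value of values) {
      out.push({ ...doc, [field]: value });
    }
  }
  return out;
}

const posts = [
  { title: "a", tags: ["db", "nosql"] },
  { title: "b", tags: ["db"] },
];

const unwound = unwind(posts, stage.$unwind);
console.log(unwound);
```

Because each output document carries a single tag, a later $match or $group stage can work on individual array elements.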
  18. •  $group
         – Aggregates items into buckets defined by a key
  19. •  $group aggregation expressions
         – Define a grouping key as the _id of the result
         – Total grouped column values: $sum
         – Average grouped column values: $avg
         – Collect grouped column values in an array or set: $push, $addToSet
         – Other functions
            •  $min, $max, $first, $last
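As an illustration of bucketing by key, the sketch below emulates a $group stage in memory for three of the listed accumulators ($sum, $avg, $max). The `group` function, the author and score fields, and the sample data are hypothetical, and the emulation handles only a single "$field" grouping key.

```javascript
// A $group stage document: the grouping key becomes the _id of each result.
const stage = {
  $group: {
    _id: "$author",
    total: { $sum: "$score" },
    average: { $avg: "$score" },
    highest: { $max: "$score" },
  },
};

// Hypothetical in-memory version covering only $sum, $avg and $max.
function group(docs, spec) {
  const { _id: keyPath, ...accumulators } = spec;
  const buckets = new Map();
  for (const doc of docs) {
    const key = doc[keyPath.slice(1)];
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(doc);
  }
  const results = [];
  for (const [key, members] of buckets) {
    const row = { _id: key };
    for (const [name, acc] of Object.entries(accumulators)) {
      const [op, path] = Object.entries(acc)[0];
      const values = members.map((d) => d[path.slice(1)]);
      const sum = values.reduce((a, b) => a + b, 0);
      if (op === "$sum") row[name] = sum;
      else if (op === "$avg") row[name] = sum / values.length;
      else if (op === "$max") row[name] = Math.max(...values);
      else throw new Error("accumulator not covered by this sketch: " + op);
    }
    results.push(row);
  }
  return results;
}

const docs = [
  { author: "dave", score: 80 },
  { author: "li", score: 95 },
  { author: "dave", score: 60 },
];

const grouped = group(docs, stage.$group);
console.log(grouped);
```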
  20. •  $sort
         – Sort documents
         – Sort specifications are the same as today, e.g., $sort:{ key1: 1, key2: -1, …}
         { $sort : { "total" : -1 } }
  21. •  $limit
         – Only allow the specified number of documents to pass
         { $limit : 20 }
  22. •  $skip
         – Skip over the specified number of documents
         { $skip : 10 }
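$sort, $skip and $limit are often combined to page through results. The sketch below emulates the three stages in memory. The `applyStage` function and the sample totals are hypothetical, and the comparison logic handles only simple comparable values.

```javascript
// Sort by total descending, drop the first result, keep the next two.
const pipeline = [{ $sort: { total: -1 } }, { $skip: 1 }, { $limit: 2 }];

// Hypothetical in-memory version of the three stages.
function applyStage(docs, stage) {
  if ("$sort" in stage) {
    const keys = Object.entries(stage.$sort);
    // Copy before sorting so the input array is left untouched.
    return [...docs].sort((a, b) => {
      for (const [field, direction] of keys) {
        if (a[field] < b[field]) return -direction;
        if (a[field] > b[field]) return direction;
      }
      return 0;
    });
  }
  if ("$skip" in stage) return docs.slice(stage.$skip);
  if ("$limit" in stage) return docs.slice(0, stage.$limit);
  throw new Error("stage not covered by this sketch");
}

const totals = [
  { _id: "a", total: 5 },
  { _id: "b", total: 9 },
  { _id: "c", total: 7 },
  { _id: "d", total: 1 },
];

const page = pipeline.reduce(applyStage, totals);
console.log(page);
```

Swapping the order of $skip and $limit would give a different page, since each stage only sees what the previous one let through.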
  23. •  Available in $project operations
      •  Prefix expression language
         – Add two fields: $add:["$field1", "$field2"]
         – Provide a value for a missing field: $ifNull: ["$field1", "$field2"]
         – Nesting: $add:["$field1", {$ifNull:["$field2", "$field3"]}]
         – Other functions…
      •  And we can easily add more as required
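To show how a prefix expression nests, here is a small recursive evaluator covering only field references, literals, $add and $ifNull. The `evaluate` function is a hypothetical illustration and not the server's expression engine. The nested expression is written with braces around the inner operator so that it is a valid JavaScript object.

```javascript
// Hypothetical evaluator for a tiny subset of the expression language.
function evaluate(expr, doc) {
  // A string starting with "$" is a reference to a field of the document.
  if (typeof expr === "string" && expr.startsWith("$")) {
    return doc[expr.slice(1)];
  }
  // Anything that is not an object is a literal value.
  if (expr === null || typeof expr !== "object") return expr;
  // Otherwise the object has one operator key and an array of operands.
  const [op, args] = Object.entries(expr)[0];
  const values = args.map((arg) => evaluate(arg, doc));
  if (op === "$add") return values.reduce((a, b) => a + b, 0);
  if (op === "$ifNull") return values[0] == null ? values[1] : values[0];
  throw new Error("operator not covered by this sketch: " + op);
}

// field1 plus field2, falling back to field3 when field2 is missing.
const nested = { $add: ["$field1", { $ifNull: ["$field2", "$field3"] }] };

const withFallback = evaluate(nested, { field1: 2, field3: 5 });
const withoutFallback = evaluate(nested, { field1: 2, field2: 1, field3: 5 });
console.log(withFallback, withoutFallback);
```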
  24. •  String functions
         – toUpper, toLower, substr
      •  Date field extraction
         – Get year, month, day, hour, etc., from ISODate
      •  Date arithmetic
      •  Null value substitution (like MySQL ifnull(), Oracle nvl())
      •  Ternary conditional
         – Return one of two values based on a predicate
  25. db.scores.aggregate( [
        { "$project" : { "newGrade" :
            { $cond : [ { "$gt" : [ "$score", 90 ] }, "A",
                { $cond : [ { $gt : [ "$score", 80 ] }, "B",
                    { "$cond" : [ { "$gt" : [ "$score", 70 ] }, "C",
                        { $cond : [ { $gt : [ "$score", 60 ] }, "D", "F" ] }
                    ] }
                ] }
            ] }
        } },
        { $group : { _id : "$newGrade", "total" : { $sum : 1 } } },
        { $sort : { "_id" : 1 } }
      ] )
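For readers who find the nested $cond hard to follow, the sketch below computes the same grade histogram in plain JavaScript. The function name `letterGrade` and the sample scores are made up for illustration. Note that the comparisons are strict, so a score of exactly 90 falls into "B".

```javascript
// Plain-JavaScript equivalent of the nested $cond expression.
function letterGrade(score) {
  if (score > 90) return "A";
  if (score > 80) return "B";
  if (score > 70) return "C";
  if (score > 60) return "D";
  return "F";
}

// Made-up sample scores standing in for the scores collection.
const scores = [95, 85, 90, 72, 61, 40, 88];

// Equivalent of the $group stage: count documents per grade.
const counts = {};
for (const score of scores) {
  const grade = letterGrade(score);
  counts[grade] = (counts[grade] || 0) + 1;
}

// Equivalent of the $sort stage: order the buckets by _id ascending.
const histogram = Object.keys(counts)
  .sort()
  .map((grade) => ({ _id: grade, total: counts[grade] }));
console.log(histogram);
```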
  26. •  Use $match in a pipeline as early as possible
         – The query optimizer can then choose to scan an index and avoid scanning the entire collection
      •  Use $sort in a pipeline as early as possible
         – The query optimizer can then be used to choose an index to scan instead of sorting the result
  27. •  Initial version is a command
         – For any language, build a JSON database object, and execute the command
      •  In the shell: db.runCommand({ aggregate : <collection-name>, pipeline : [ … ] });
         – Beware of command result size limit
            •  Document size limit is 16MB
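A command document for an aggregation can be built as an ordinary object in any language. The sketch below builds one in JavaScript. The collection name "scores" and the author and score fields are assumptions for illustration, and nothing is sent to a server.

```javascript
// Command document: the command name is the first key, its value is the
// collection name, and the pipeline is an array of stage documents.
const command = {
  aggregate: "scores",
  pipeline: [
    { $match: { score: { $gt: 50 } } },
    { $group: { _id: "$author", total: { $sum: 1 } } },
  ],
};

// In the mongo shell this object would be passed to db.runCommand(command).
// Here it is only serialized, to show that it is a plain JSON-style document.
const asText = JSON.stringify(command);
console.log(asText);
```

Because the whole result comes back as a single command reply, it is worth adding a $limit or an early $match when the output could approach the document size limit.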
  28. •  Initial release will support sharding
      •  Mongos analyzes pipeline, and forwards operations up to $group or $sort to shards; combines shard server results and returns them
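One way to picture the split is a function that cuts the pipeline at the first $group or $sort stage. This is only a sketch of the idea. The function `splitForShards` is hypothetical, it reads "up to" as excluding the $group or $sort stage itself (an assumption), and it does not model any partial grouping or sorting that a shard might perform.

```javascript
// Hypothetical helper: stages before the first $group or $sort are treated as
// the part sent to the shards; the rest is treated as the part run on the
// combined shard results.
function splitForShards(pipeline) {
  const isSplitPoint = (stage) => "$group" in stage || "$sort" in stage;
  const index = pipeline.findIndex(isSplitPoint);
  if (index === -1) return { shardPart: pipeline, mergePart: [] };
  return {
    shardPart: pipeline.slice(0, index),
    mergePart: pipeline.slice(index),
  };
}

// Made-up pipeline for illustration.
const pipeline = [
  { $match: { author: "dave" } },
  { $project: { score: 1 } },
  { $group: { _id: "$score", total: { $sum: 1 } } },
  { $sort: { total: -1 } },
  { $limit: 5 },
];

const { shardPart, mergePart } = splitForShards(pipeline);
console.log(shardPart.length, mergePart.length);
```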
  29. •  Final bug fixes now
         – available to play with in dev version 2.2-rc0
      •  Expect to see this in production soon:
         – 2.2 GA
  30. •  More optimizations
      •  $out pipeline operation
         – Saves the document stream to a collection
         – Similar to M/R $out, with sharded output
         – Functions like a tee, so that intermediate results can be saved
  31. MongoDB San Diego: Enjoy your evening!