
An Evening with MongoDB - San Diego 2012: Welcome and New Aggregation Framework

mongodb
July 25, 2012

Asya Kamsky, 10gen
Welcome and Kickoff!
MongoDB is getting a new aggregation framework that makes simple tasks such as counting, averaging, and finding minima or maxima while grouping by keys in a collection much easier. The new aggregation features are not a replacement for map-reduce, but they cover many common cases without resorting to the big hammer that is map-reduce. After introducing the syntax and usage patterns of the new framework, we will give some demonstrations of aggregation with it.

Transcript

  1. 1
    Summer 2012
    Open source, high performance database
    Welcome to
    Mongo San Diego

  2. 2
    •  now – Aggregation […]
    •  7:00 – Real User Monitoring with MongoDB, Eric Azoulay, Neustar
    •  7:45 – Schema Design Principles and Practice, Matt Shopsin, 10gen
    •  8:30 – Building Mobile Apps with HTML5 & MongoDB, Max Katz, Tiggzi
    •  9:00 – Get your Spatial […], Steve Citron-Pousty, Red Hat
    •  9:45 pm - 10:30 pm After Party

  3. 3
    MongoDB 2.2: Almost there!

  4. 4
    •  2.2 release candidate available now (2.2-rc0)

    Please try it out (not in production!) and report bugs

    •  Fix bugs -> cut a new release candidate
    – Is it ready/good?
    •  NO: fix more bugs, cut the next release candidate
    •  YES: release, move on to working on 2.3

  5. 5
    •  Aggregation […]
    •  TTL ([…])
    •  Geo (data-center) aware sharding
    •  Better concurrency (lock yielding on page fault)
    •  More granular write-lock (no more global lock!)
    •  Better query performance
    •  Better isolation […]
    •  and much, much more

  6. 6
    MongoDB 2.2: Aggregation Framework

  7. 7
    •  Common operations on complex data
    – totaling, averaging, min, max, etc
    – ability to return a subset of array values
    – grouping documents or subdocuments
    – answering questions across subdocuments
    – sorting across subdocuments, etc.
    •  Currently hard in MongoDB
    – Map/Reduce jobs (JavaScript, slow, hard)
    – Handle in application code
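To make the map/reduce cost concrete, here is a minimal sketch of a per-author total written in map/reduce style. The `articles` data and the local `emit` stand-in are assumptions for illustration only; inside MongoDB, `emit` is supplied by the server and both functions run in its JavaScript engine.

```javascript
// Hypothetical sample documents (not from the talk).
const articles = [
  { author: "dave", score: 80 },
  { author: "dave", score: 60 },
  { author: "ann", score: 90 },
];

// Local stand-in for the emit() that MongoDB provides to map functions.
const emitted = new Map();
function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}

// The map function runs once per document, with the document bound to `this`.
function map() {
  emit(this.author, this.score);
}

// The reduce function folds all values emitted for one key into one value.
function reduce(key, values) {
  return values.reduce((sum, value) => sum + value, 0);
}

// Drive the two functions by hand, roughly as the server would.
for (const doc of articles) map.call(doc);
const totals = {};
for (const [key, values] of emitted) totals[key] = reduce(key, values);

console.log(totals); // { dave: 140, ann: 90 }
```

Even for a plain sum, two functions and a grouping step are needed, which is the overhead the new framework aims to remove.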

  8. 8
    •  Our new aggregation framework
    – Declarative framework
    •  No JavaScript required
    – Describe a chain of operations to apply
    – Expression evaluation
    •  Return computed values
    – C++ implementation
    •  Higher performance than JavaScript

  9. 9

  10. 10
    [Diagram: many documents → Map function (emit) → Reduce function (reduce) → Result collection]

  11. 11
    [Diagram: many documents → a series of pipeline operators → Result set]

  12. 12
    •  Aggregation requests specify a pipeline
    •  A pipeline is a series of operations
    •  Conceptually, the members of a collection
    are passed through a pipeline to produce
    a result
    – Similar to a command-line pipe
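The pipe analogy can be sketched in plain JavaScript by treating each stage as a function from an array of documents to a new array. This is only a conceptual model, not how the server is implemented; `runPipeline`, `onlyDave`, `firstTwo` and the `posts` data are hypothetical names made up for the example.

```javascript
// Feed the output of each stage into the next, like `a | b | c` in a shell.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

// Hypothetical stages, loosely analogous to $match and $limit.
const onlyDave = (docs) => docs.filter((doc) => doc.author === "dave");
const firstTwo = (docs) => docs.slice(0, 2);

// Hypothetical sample collection.
const posts = [
  { author: "dave", title: "a" },
  { author: "ann", title: "b" },
  { author: "dave", title: "c" },
  { author: "dave", title: "d" },
];

const piped = runPipeline(posts, [onlyDave, firstTwo]);
console.log(piped.map((doc) => doc.title)); // [ 'a', 'c' ]
```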

  13. 13
    [Diagram: pipeline operators in three groups: $match, $project, $unwind → $group → $match, $project, $sort, $limit, ...]

  14. 14
    db.collection.aggregate(
    [ {$match: … },
    {$group: … },
    {$limit: …}, // etc.
    ] )

  15. 15
    •  $match
    – Uses a query predicate (like .find({…})) as a
    filter
    { $match : { author : "dave" } }
    { $match : { score : { $gt : 50, $lte : 90 } } }
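As a rough illustration of what the two `$match` stages select, the sketch below applies the same predicates to an in-memory array with `Array.prototype.filter`. The `scores` data is hypothetical, and the filter calls only mimic the semantics; they are not the server's matching code.

```javascript
// Hypothetical sample documents.
const scores = [
  { author: "dave", score: 40 },
  { author: "dave", score: 75 },
  { author: "ann", score: 90 },
  { author: "ann", score: 95 },
];

// Stage documents as shown on the slide.
const matchAuthor = { $match: { author: "dave" } };
const matchRange = { $match: { score: { $gt: 50, $lte: 90 } } };

// Plain-JavaScript equivalents of the two predicates.
const byAuthor = scores.filter(
  (doc) => doc.author === matchAuthor.$match.author
);
const range = matchRange.$match.score;
const byRange = scores.filter(
  (doc) => doc.score > range.$gt && doc.score <= range.$lte
);

console.log(byAuthor.length); // 2
console.log(byRange.map((doc) => doc.score)); // [ 75, 90 ]
```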

  16. 16
    •  $project
    – Uses a sample document to determine the
    shape of the result (similar to .find()’s optional
    argument)
    •  Include or exclude fields
    •  Computed fields
    –  Arithmetic expressions, including built-in functions
    –  Pull fields from nested documents to the top
    –  Push fields from the top down into new virtual documents
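The sketch below shows one `$project` stage covering each bullet, next to a plain-JavaScript function that builds the same shape for a single document. The `article` document and all field names are hypothetical. Note that the real stage also passes `_id` through unless it is excluded; the sample document has none.

```javascript
// Hypothetical input document.
const article = {
  title: "Intro",
  author: "dave",
  pageViews: 5,
  stats: { likes: 3 },
};

// A $project stage document exercising each kind of reshaping.
const projectStage = {
  $project: {
    title: 1, // include a field as-is
    morePageViews: { $add: ["$pageViews", 10] }, // computed field
    likes: "$stats.likes", // pull a nested field to the top
    meta: { by: "$author" }, // push a top-level field into a new subdocument
  },
};

// Plain-JavaScript rendering of the shape that stage describes.
const project = (doc) => ({
  title: doc.title,
  morePageViews: doc.pageViews + 10,
  likes: doc.stats.likes,
  meta: { by: doc.author },
});

const projected = project(article);
console.log(projected);
```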

  17. 17
    •  $unwind
    – Hands out array elements one at a time
    { $unwind : "$myarray" }
    •  $unwind “streams” arrays
    – Array values are doled out one at a time in the
    context of their surrounding document
    – Makes it possible to filter out elements before
    returning
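A minimal model of `$unwind`, assuming a hypothetical `postsWithTags` collection: every array element yields one output document that keeps the surrounding fields. The `unwind` helper is written for this example and only approximates the stage's behaviour for documents that do have the array.

```javascript
// Hypothetical sample documents with an array field.
const postsWithTags = [
  { title: "a", tags: ["mongodb", "aggregation"] },
  { title: "b", tags: ["sharding"] },
];

// The stage document this sketch models.
const unwindStage = { $unwind: "$tags" };

// One output document per array element, each in the context of its parent.
const unwind = (docs, field) =>
  docs.flatMap((doc) =>
    doc[field].map((element) => ({ ...doc, [field]: element }))
  );

const unwound = unwind(postsWithTags, "tags");
console.log(unwound.length); // 3
console.log(unwound[0]); // { title: 'a', tags: 'mongodb' }

// Once unwound, individual elements can be filtered like any other field.
const withoutSharding = unwound.filter((doc) => doc.tags !== "sharding");
console.log(withoutSharding.length); // 2
```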

  18. 18
    •  $group
    – Aggregates items into buckets defined by a
    key

  19. 19
    •  $group aggregation expressions
    – Define a grouping key as the _id of the result
    – Total grouped column values: $sum
    – Average grouped column values: $avg
    – Collect grouped column values in an array or
    set: $push, $addToSet
    – Other functions
    •  $min, $max, $first, $last
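To show how the accumulators fit together, here is a `$group` stage document alongside a hand-written JavaScript equivalent. The `entries` data, the output field names and the `groupByAuthor` helper are all assumptions for the example; the helper hard-codes this one grouping rather than interpreting the stage document.

```javascript
// Hypothetical sample documents.
const entries = [
  { author: "dave", score: 80, tag: "db" },
  { author: "dave", score: 60, tag: "db" },
  { author: "ann", score: 90, tag: "ops" },
];

// The grouping key goes in _id; every other field is an accumulator.
const groupStage = {
  $group: {
    _id: "$author",
    total: { $sum: "$score" },
    average: { $avg: "$score" },
    tags: { $addToSet: "$tag" },
    best: { $max: "$score" },
  },
};

// Hand-written equivalent: bucket by author, then fold each bucket.
function groupByAuthor(docs) {
  const buckets = new Map();
  for (const doc of docs) {
    if (!buckets.has(doc.author)) buckets.set(doc.author, []);
    buckets.get(doc.author).push(doc);
  }
  return [...buckets].map(([author, members]) => {
    const values = members.map((member) => member.score);
    const total = values.reduce((sum, value) => sum + value, 0);
    return {
      _id: author,
      total,
      average: total / values.length,
      tags: [...new Set(members.map((member) => member.tag))],
      best: Math.max(...values),
    };
  });
}

const grouped = groupByAuthor(entries);
console.log(grouped);
```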

  20. 20
    •  $sort
    – Sort documents
    – Sort specifications are the same as today,
    e.g., $sort:{ key1: 1, key2: -1, …}
    { $sort : { "total" : -1 } }

  21. 21
    •  $limit
    – Only allow the specified number of documents
    to pass
    { $limit : 20 }

  22. 22
    •  $skip
    – Skip over the specified number of documents
    { $skip : 10 }
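`$sort`, `$skip` and `$limit` are commonly combined for paging. The sketch below models that combination on a hypothetical `totalsByAuthor` array; the comparator and `slice` call are plain-JavaScript stand-ins for the three stages, not the server's code.

```javascript
// Hypothetical grouped results.
const totalsByAuthor = [
  { _id: "ann", total: 90 },
  { _id: "bob", total: 120 },
  { _id: "cat", total: 90 },
  { _id: "dave", total: 140 },
];

// The stages being modeled: total descending, then _id ascending as tie-breaker.
const paging = [{ $sort: { total: -1, _id: 1 } }, { $skip: 1 }, { $limit: 2 }];

const skip = 1;
const limit = 2;
const page = [...totalsByAuthor] // copy so the input array is not reordered
  .sort((a, b) => b.total - a.total || a._id.localeCompare(b._id))
  .slice(skip, skip + limit);

console.log(page.map((doc) => doc._id)); // [ 'bob', 'ann' ]
```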

  23. 23
    •  Available in $project operations
    •  Prefix expression language
    – Add two fields: { $add : ["$field1", "$field2"] }
    – Provide a value for a missing field: { $ifNull :
    ["$field1", "$field2"] }
    – Nesting: { $add : ["$field1", { $ifNull : ["$field2",
    "$field3"] }] }
    – Other functions….
    •  And we can easily add more as required
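The prefix notation is easy to read as a small tree: an operator name mapped to a list of arguments, each of which is a field reference, a literal, or another expression. The toy `evaluate` function below is an assumption-laden sketch that supports only `$add` and `$ifNull` on top-level fields; it illustrates the notation and is not the framework's evaluator.

```javascript
// Toy evaluator for prefix expressions (supports $add and $ifNull only).
function evaluate(expr, doc) {
  // "$name" refers to a top-level field of the current document.
  if (typeof expr === "string" && expr.startsWith("$")) {
    return doc[expr.slice(1)];
  }
  // { $op: [args...] } applies an operator to its evaluated arguments.
  if (expr !== null && typeof expr === "object" && !Array.isArray(expr)) {
    const [op] = Object.keys(expr);
    const args = expr[op].map((arg) => evaluate(arg, doc));
    if (op === "$add") return args.reduce((sum, value) => sum + value, 0);
    if (op === "$ifNull") return args[0] == null ? args[1] : args[0];
    throw new Error("unsupported operator: " + op);
  }
  // Anything else is treated as a literal value.
  return expr;
}

// The nested expression from the slide.
const nested = { $add: ["$field1", { $ifNull: ["$field2", "$field3"] }] };

console.log(evaluate(nested, { field1: 1, field3: 5 })); // 6 (field2 missing)
console.log(evaluate(nested, { field1: 1, field2: 2, field3: 5 })); // 3
```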

  24. 24
    •  String functions
    – toUpper, toLower, substr
    •  Date field extraction
    – Get year, month, day, hour, etc, from ISODate
    •  Date arithmetic
    •  Null value substitution (like MySQL ifnull(),
    Oracle nvl())
    •  Ternary conditional
    – Return one of two values based on a
    predicate
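A sketch of how these functions might appear together in one `$project` stage, with the plain-JavaScript computation each one corresponds to. The `user` document and the output field names are hypothetical; the operator spellings (`$toUpper`, `$substr`, `$year`, `$ifNull`, `$cond`) are the ones I understand the 2.2 framework to use, so check them against the release documentation.

```javascript
// Hypothetical input document.
const user = {
  name: "dave",
  joined: new Date("2012-07-25T19:00:00Z"),
  nick: null,
  score: 72,
};

// One stage using a string function, date extraction, null
// substitution, and the ternary conditional.
const projectExpressions = {
  $project: {
    upper: { $toUpper: "$name" },
    prefix: { $substr: ["$name", 0, 2] }, // start index, then length
    year: { $year: "$joined" },
    display: { $ifNull: ["$nick", "$name"] },
    passed: { $cond: [{ $gte: ["$score", 60] }, "yes", "no"] },
  },
};

// The same computations in plain JavaScript.
const computed = {
  upper: user.name.toUpperCase(),
  prefix: user.name.slice(0, 2),
  year: user.joined.getUTCFullYear(),
  display: user.nick == null ? user.name : user.nick,
  passed: user.score >= 60 ? "yes" : "no",
};

console.log(computed);
```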

  25. 25
    db.scores.aggregate( [
        { $project : { newGrade :
            { $cond : [ { $gt : [ "$score", 90 ] }, "A",
                { $cond : [ { $gt : [ "$score", 80 ] }, "B",
                    { $cond : [ { $gt : [ "$score", 70 ] }, "C",
                        { $cond : [ { $gt : [ "$score", 60 ] }, "D", "F" ] }
                    ] }
                ] }
            ] }
        } },
        { $group : { _id : "$newGrade", total : { $sum : 1 } } },
        { $sort : { _id : 1 } }
    ] )
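For readers tracing the nested `$cond` chain, the sketch below restates the same bucketing as an ordinary function and counts a few hypothetical scores. Each test is a strict "greater than", so a score of exactly 90 lands in "B". The `sampleScores` values and helper names are made up for the example.

```javascript
// Mirrors the nested $cond chain: every comparison is strictly "greater than".
function letterGrade(score) {
  if (score > 90) return "A";
  if (score > 80) return "B";
  if (score > 70) return "C";
  if (score > 60) return "D";
  return "F";
}

// Hypothetical scores, including the boundary values 90 and 70.
const sampleScores = [95, 90, 85, 71, 70, 42];

// Equivalent of { $group: { _id: "$newGrade", total: { $sum: 1 } } }.
const counts = {};
for (const score of sampleScores) {
  const grade = letterGrade(score);
  counts[grade] = (counts[grade] || 0) + 1;
}

// Equivalent of { $sort: { _id: 1 } }.
const gradeTotals = Object.keys(counts)
  .sort()
  .map((grade) => ({ _id: grade, total: counts[grade] }));

console.log(gradeTotals);
```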

  26. 26
    •  Use $match in a pipeline as early as
    possible
    – The query optimizer can then choose to scan
    an index and avoid scanning the entire
    collection
    •  Use $sort in a pipeline as early as possible
    – The query optimizer can then be used to
    choose an index to scan instead of sorting the
    result
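Index selection cannot be shown without a server, but the other benefit of filtering early, less work for every later stage, can be sketched in memory. The `events` data and stage functions below are hypothetical; the counter simply records how many documents the more expensive stage had to touch in each ordering.

```javascript
// Hypothetical sample documents.
const events = [
  { type: "view", ms: 12 },
  { type: "click", ms: 30 },
  { type: "view", ms: 7 },
  { type: "view", ms: 20 },
  { type: "click", ms: 5 },
];

// Counts how many documents the computing stage processes.
let touched = 0;

// Stand-in for a computing stage such as $project.
const addSeconds = (docs) =>
  docs.map((doc) => {
    touched += 1;
    return { ...doc, seconds: doc.ms / 1000 };
  });

// Stand-in for { $match: { type: "click" } }.
const onlyClicks = (docs) => docs.filter((doc) => doc.type === "click");

// Filter last: every document is projected first.
touched = 0;
const late = onlyClicks(addSeconds(events));
const touchedLate = touched;

// Filter first: only matching documents reach the projection.
touched = 0;
const early = addSeconds(onlyClicks(events));
const touchedEarly = touched;

console.log(touchedLate, touchedEarly); // 5 2
```

Both orderings produce the same two documents; only the amount of intermediate work differs.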

  27. 27
    •  Initial version is a command
    – For any language, build a JSON database
    object, and execute the command
    •  In the shell: db.runCommand({ aggregate :
    "<collection>", pipeline : [ … ] });
    – Beware of command result size limit
    •  Document size limit is 16MB
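A sketch of the command form, assuming a hypothetical `scores` collection. Because no server is available here, a hand-written `reply` stands in for what `db.runCommand(command)` would return; its `{ result: [...], ok: 1 }` shape is my understanding of the 2.2 command reply and should be treated as an assumption. The `unwrap` helper is hypothetical.

```javascript
// The command document: collection name plus an array of pipeline stages.
const command = {
  aggregate: "scores", // hypothetical collection name
  pipeline: [
    { $match: { score: { $gt: 50 } } },
    { $group: { _id: "$author", total: { $sum: "$score" } } },
  ],
};

// In the mongo shell this would be: var reply = db.runCommand(command);
// Here a hand-written reply stands in for the server's response.
const reply = { result: [{ _id: "dave", total: 140 }], ok: 1 };

// Hypothetical helper: return the result array or fail loudly.
function unwrap(commandReply) {
  if (commandReply.ok !== 1) {
    throw new Error("aggregate command failed: " + commandReply.errmsg);
  }
  return commandReply.result;
}

console.log(unwrap(reply)); // [ { _id: 'dave', total: 140 } ]
```

Since the whole result comes back inside a single reply document, it is subject to the document size limit noted above.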

  28. 28
    •  Initial release will support sharding
    •  Mongos analyzes pipeline, and forwards
    operations up to $group or $sort to shards;
    combines shard server results and returns
    them
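The splitting idea can be sketched as a function that cuts a pipeline at the first `$group` or `$sort`. This is a simplification under stated assumptions: the slide does not say how the boundary stage itself is handled, and this sketch keeps it entirely on the merging side. `splitForShards` and the sample pipeline are hypothetical and do not reflect the actual mongos code.

```javascript
// Split a pipeline into the part sent to every shard and the part
// run on the combined results (boundary stage kept in the merge part).
function splitForShards(pipeline) {
  const boundary = pipeline.findIndex(
    (stage) => "$group" in stage || "$sort" in stage
  );
  if (boundary === -1) {
    // No $group or $sort: every stage can run on the shards.
    return { shardPart: pipeline, mergePart: [] };
  }
  return {
    shardPart: pipeline.slice(0, boundary),
    mergePart: pipeline.slice(boundary),
  };
}

// Hypothetical pipeline.
const shardedPipeline = [
  { $match: { author: "dave" } },
  { $project: { author: 1, score: 1 } },
  { $group: { _id: "$author", total: { $sum: "$score" } } },
  { $sort: { total: -1 } },
];

const plan = splitForShards(shardedPipeline);
console.log(plan.shardPart.length, plan.mergePart.length); // 2 2
```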

  29. 29
    •  final bug fixes now
    – available to play with in dev version 2.2-rc0
    •  Expect to see this in production soon:
    – 2.2 GA

  30. 30
    •  More optimizations
    •  $out pipeline operation
    – Saves the document stream to a collection
    – Similar to M/R $out, with sharded output
    – Functions like a tee, so that intermediate
    results can be saved

  31. 31
    MongoDB San Diego:
    Enjoy your evening!
