$30 off During Our Annual Pro Sale. View Details »

Aggregation Framework Performance Pitfalls

Naomi Pentrel
November 06, 2018

Aggregation Framework Performance Pitfalls

The aggregation framework is incredibly powerful. But with great power comes great responsibility. For application developers, this means ensuring that queries are performant. In this talk, I will share with you some key tips to avoid the most common aggregation framework performance pitfalls in your own deployments.

Naomi Pentrel

November 06, 2018
Tweet

More Decks by Naomi Pentrel

Other Decks in Technology

Transcript

  1. By Naomi Pentrel
    Aggregation
    Framework:
    Performance Pitfalls
    30min in the life of a Support Engineer
    @naomi_pen
    ➜ db.collection.aggregate(
    [ { }, ... ]
    )
    {…}
    {…}
    {…}
    {…}
    {…}
    {…}
    {…}
    {…}
    {…}
    {…}
    {…}
    {…}
    {…}
    {…}
    {…}

    View Slide

  2. @naomi_pen
    A Reminder
    Automatic Optimizations
    The Customer
    Support Tickets
    5 Tips (#4 will SHOCK you)
    3
    4
    5
    1
    2

    View Slide

  3. @naomi_pen
    MongoDB's Aggregation Framework
    allows you to manipulate and
    transform data in ways that are not
    possible with normal queries.

    View Slide

  4. Aggregation Framework: A Reminder
    Aggregation Pipeline: In Theory
    @naomi_pen
    # Pipeline Stages (think unix pipes):
    ➜ db.collection.aggregate( [ { }, ... ] )
    # Operator expressions:
    ➜ { : [ , ... ] }
    ➜ { : }
    {…}
    {…}
    {…}
    {…}
    {…}
    {…}
    {…}
    {…}
    {…}
    {…}
    {…}
    {…}
    {…}
    {…}
    {…}

    View Slide

  5. Aggregation Framework: A Reminder
    Example: Grouping Orders
    @naomi_pen
    ➜ db.orders.aggregate( [
    { $match: { status: "A"} },
    { $group: {
    _id: "$cust_id",
    total: { $sum: "$amount" }
    }
    },
    ] )
    pipeline specification
    Operator

    View Slide

  6. Aggregation Framework: A Reminder
    Example: Grouping Orders
    @naomi_pen
    Stages

    View Slide

  7. Aggregation Framework: A Reminder
    Pipeline Stages
    @naomi_pen
    $addFields

    $bucket

    $bucketAuto

    $collStats

    $count

    $currentOp

    $facet

    $geoNear

    $graphLookup

    $group

    $indexStats

    $limit

    $listLocalSessions

    $listSessions

    $lookup

    $match

    $out

    $project

    $redact

    $replaceRoot

    $sample

    $skip

    $sort

    $sortByCount 

    $unwind

    View Slide

  8. Aggregation Framework: A Reminder
    Pipeline Operators
    Arithmetic
    Expression
    Operators
    $abs
    $add
    $ceil
    $divide
    $exp
    $floor
    $ln
    $log
    $log10
    $mod
    $multiply
    $pow
    $sqrt
    $subtract
    $trunc

    Array
    Expression
    Operators
    $arrayElemAt
    $arrayToObject
    $concatArrays
    $filter
    $in
    $indexOfArray
    $isArray
    $map
    $objectToArray
    $range
    $reduce
    $reverseArray
    $size
    $slice
    $zip
    Boolean
    Expression
    Operators
    $and
    $not
    $or
    Comparison
    Expression
    Operators
    $cmp
    $eq
    $gt
    $gte
    $lt
    $lte
    $ne

    Conditional
    Expression
    Operators
    $cond
    $ifNull
    $switch
    Date Expression
    Operators
    $dateFromParts
    $dateFromString
    $dateToParts
    $dateToString
    $dayOfMonth
    $dayOfWeek
    $dayOfYear
    $hour
    $isoDayOfWeek
    $isoWeek
    $isoWeekYear
    $millisecond
    $minute
    $month
    $second
    $toDate
    $week
    $year
    $add
    $subtract
    Literal
    Expression
    Operator
    $literal
    Object
    Expression
    Operators
    $mergeObjects
    $objectToArray
    Set Expression
    Operators
    $allElementsTru
    e
    $anyElementTru
    e
    $setDifference
    $setEquals
    $setIntersection
    $setIsSubset
    $setUnion

    String
    Expression
    Operators
    $concat
    $dateFromString
    $dateToString
    $indexOfBytes
    $indexOfCP
    $ltrim
    $rtrim
    $split
    $strLenBytes
    $strLenCP
    $strcasecmp
    $substrBytes
    $substrCP
    $toLower
    $toString
    $trim
    $toUpper
    Text Expression
    Operator
    $meta
    Type Expression
    Operators
    $convert
    $toBool
    $toDate
    $toDecimal
    $toDouble
    $toInt
    $toLong
    $toObjectId
    $toString
    $type
    Accumulators
    ($group)
    $addToSet
    $avg
    $first
    $last
    $max
    $mergeObjects
    $min
    $push
    $stdDevPop
    $stdDevSamp
    $sum
    Accumulators
    ($project)
    $avg
    $max
    $min
    $stdDevPop
    $stdDevSamp
    $sum
    Variable
    Expression
    Operators
    $let
    @naomi_pen
    A bit much? Here’s the reference:
    bit.ly/MongoDBAR

    View Slide

  9. Aggregation Framework: A Reminder
    Explain…?
    @naomi_pen
    ➜ db.sales.aggregate(
    [
    {
    $group: {
    _id: "$date", revenue: {$sum: “$amount"}
    }
    }
    ],
    {explain: true}
    )

    View Slide

  10. Aggregation Framework: A Reminder
    Explain…?
    @naomi_pen
    ➜ db.sales.explain("executionStats").aggregate(
    [
    {
    $group: {
    _id: “$date", revenue: {$sum: "$amount"}
    }
    }
    ]
    )

    View Slide

  11. @naomi_pen
    1 Aggregations require more RAM and CPU
    Truths about Aggregations
    1
    2
    3
    The amount of data to process is large
    The operation will be run frequently
    The operation is intended to service a real-time app feature
    2 Each stage has a manipulated copy of the data
    3 May use lots of memory; allocated outside WT
    4 Aggregations need consideration, especially if:

    View Slide

  12. @naomi_pen
    A Reminder
    Automatic Optimizations
    The Customer
    Support Tickets
    5 Tips (#4 will SHOCK you)
    3
    4
    5
    1
    2

    View Slide

  13. Automatic Optimizations
    The Optimizer will transform your queries
    @naomi_pen
    • $project or $addFields + $match Sequence Optimization
    • $sort + $match Sequence Optimization
    • $redact + $match Sequence Optimization
    • $project + $skip Sequence Optimization
    • $sort + $limit Coalescence
    • $limit + $limit Coalescence
    • $skip + $skip Coalescence
    • $match + $match Coalescence
    • $lookup + $unwind Coalescence
    • $sort + $skip + $limit Sequence

    View Slide

  14. @naomi_pen
    A Reminder
    Automatic Optimizations
    The Customer
    Learnings
    5 Tips (#4 will SHOCK you)
    4
    5
    1
    3
    2

    View Slide

  15. The Customer
    Meet Robin
    @naomi_pen
    Robin is a Software Engineer at Petflix
    They love MongoDB for its ease of use
    Robin made the decision to move Petflix to
    MongoDB

    View Slide

  16. The Customer
    @naomi_pen

    View Slide

  17. @naomi_pen
    {
    _id: "tt0944947",
    title: "Game of Throws",
    genres: [
    "Drama", "Adventure", "Fantasy"
    ],
    director: "Jeremy Doggo",
    rating: 8.6,
    description: "Nine noble dog families
    fight over throws, (…)",
    actors: [
    "Emilia Samoyed", "Peter Terrier", …
    ],
    year: "2011"

    }
    {
    _id: {
    user: "naomi",
    time_started: "1541364977160",
    },
    movie_id: "tt0944947",
    location: "Germany"
    }
    {
    _id: "naomi",
    email: "[email protected]",

    }
    Videos
    Views
    Users
    The Customer: Data model

    View Slide

  18. @naomi_pen
    A Reminder
    Automatic Optimizations
    The Customer
    Support Tickets
    5 Tips (#4 will SHOCK you)
    5
    1
    2
    4
    3

    View Slide

  19. Support Ticket 1
    @naomi_pen
    CPU Spikes with Query
    Hi there,
    We have an issue with one of our queries. We count movies for two genres,
    and list titles for each genre in our new UI. But it’s slow and causes spikes:
    db.movies.aggregate([
    { $unwind: "$genres" },
    { $match: { "genres": { $in: ["Fantasy", "Adventure"] } } },
    { $group: {"_id": "$genres", "count": {$sum: 1}, "titles": {$push: "$title"}}}
    ])
    Thanks,
    Robin

    Robin from Petflix 1 minute ago

    View Slide

  20. Support Ticket 1
    Investigate the query
    @naomi_pen
    ➜ db.movies.aggregate([
    { $unwind: "$genres" },
    { $match: {
    "genres": {
    $in: ["Fantasy", "Adventure"] }
    } },
    { $group: {
    "_id": "$genres",
    "count":{$sum: 1},
    "titles": {$push: "$title"}
    } }
    ])

    View Slide

  21. Support Ticket 1
    Investigate the logs*
    @naomi_pen
    2018-11-04T15:11:59.670-0800 I COMMAND [conn2] command petflix.movies
    appName: "MongoDB Shell" command: aggregate { aggregate: "movies",
    pipeline: [ { $unwind: "$genres" }, { $match: { genres: { $in: [ "Fantasy",
    "Adventure" ] } } }, { $group: { _id: "$genres", count: { $sum: 1.0 }, titles:
    { $push: "$title" } } } ], cursor: {}, lsid: { id: UUID("610579f8-
    c68e-44f7-881d-2ddc407e5594") }, $db: "petflix" } planSummary: COLLSCAN
    keysExamined:0 docsExamined:49997 cursorExhausted:1 numYields:392
    nreturned:2 reslen:813 locks:{ Global: { acquireCount: { r: 397 } }, Database:
    { acquireCount: { r: 397 } }, Collection: { acquireCount: { r: 397 } } }
    protocol:op_msg 121ms
    *Queries only appear in logs if they run for 100+ ms. Queries not in the logs can still cause load!

    View Slide

  22. Support Ticket 1
    Investigate the query
    @naomi_pen
    ➜ db.movies.aggregate([
    { $unwind: "$genres" },
    { $match: {
    "genres": {
    $in: ["Fantasy", "Adventure"] }
    } },
    { $group: {
    "_id": "$genres",
    "count":{$sum: 1},
    "titles": {$push: "$title"}
    } }
    ])

    View Slide

  23. Support Ticket 1
    { $unwind: "$genres" }
    @naomi_pen
    {
    _id: "tt0944947",
    title: "Game of Throws",
    genres: [
    "Drama", "Adventure",
    "Fantasy"
    ],
    director: "Jeremy Doggo",
    rating: 8.6,
    description: "Nine noble dog
    families fight over throws, (…)",
    actors: [
    "Emilia Samoyed", "Peter
    Terrier", …
    ],
    year: "2011"

    }
    {
    _id: "tt0944947",
    title: "Game of Throws",
    genres: "Drama",
    director: "Jeremy Doggo",
    rating: 8.6,
    description: "Nine noble dog
    families fight over throws, (…)",
    actors: [
    "Emilia Samoyed", "Peter
    Terrier", …
    ],
    year: "2011"

    }
    {
    _id: "tt0944947",
    title: "Game of Throws",
    genres: "Adventure",
    director: "Jeremy Doggo",
    rating: 8.6,
    description: "Nine noble dog
    families fight over throws, (…)",
    actors: [
    "Emilia Samoyed", "Peter
    Terrier", …
    ],
    year: "2011"

    }
    {
    _id: "tt0944947",
    title: "Game of Throws",
    genres: "Fantasy",
    director: "Jeremy Doggo",
    rating: 8.6,
    description: "Nine noble dog
    families fight over throws, (…)",
    actors: [
    "Emilia Samoyed", "Peter
    Terrier", …
    ],
    year: "2011"

    }
    {
    $match: {
    "genres": {
    $in: ["Fantasy", "Adventure"]
    }
    }

    View Slide

  24. Support Ticket 1
    { $unwind: "$genres" }
    @naomi_pen
    {
    _id: "tt1032765",
    title: "Catmanji",
    genres: [
    "Comedy", "Family",
    ],
    director: "Kit Cat",
    rating: 7.9,
    description: "Four cats play a
    game that gets way too real (…)",
    actors: [
    "Cat McCatface", "Kitty Fish"…
    ],
    year: "2018"

    }
    {
    _id: "tt1032765",
    title: "Catmanji",
    genres: "Comedy",
    director: "Kit Cat",
    rating: 7.9,
    description: "Four cats play a
    game that gets way too real (…)",
    actors: [
    "Cat McCatface", "Kitty Fish"…
    ],
    year: "2018"

    }
    {
    _id: "tt1032765",
    title: "Catmanji",
    genres: "Family",
    director: "Kit Cat",
    rating: 7.9,
    description: "Four cats play a
    game that gets way too real (…)",
    actors: [
    "Cat McCatface", "Kitty Fish"…
    ],
    year: "2018"

    }
    {
    $match: {
    "genres": {
    $in: [
    "Fantasy",
    “Adventure”
    ]
    }
    }
    }

    View Slide

  25. Support Ticket 1
    Solution: Duplicate the $match stage & ensure index
    @naomi_pen
    ➜ db.movies.aggregate([
    { $match: {
    "genres": { $in: ["Fantasy", "Adventure"] }
    } },
    { $unwind: "$genres" },
    { $match: {
    "genres": { $in: ["Fantasy", "Adventure"] }
    } }
    { $group: { "_id": "$genres", "count":{$sum: 1}, "titles": {$push: "$title"} } }
    ])

    View Slide

  26. Support Ticket 1
    @naomi_pen
    Re: CPU Spikes with Query
    We’re happy to help. To improve your query’s performance you need to limit input as much as
    possible. In your query the match after the unwind discards all documents that aren't in those
    genres. If you add an additional match before the unwind, then documents that don’t have those
    genres are never unwound, because they never enter the pipeline.
    db.movies.aggregate([
    { $match: {"genres": {$in: ["Fantasy", "Family"]}}}
    { $unwind: "genres"},
    { $match: {"genres": {$in: ["Fantasy", "Family"]}}}
    { "$group": {"_id": "$genres", "count": {"$sum": 1}, "titles": {$push: "$title"} } ] )
    As a bonus, the initial match stage can also make use of existing indexes, so ensure you have
    an index on genres. Does this help?
    Cheers, Naomi
    Naomi from MongoDB

    View Slide

  27. Support Ticket 1
    Now in the logs*
    @naomi_pen
    2018-11-04T15:14:51.491-0800 I COMMAND [conn2] command petflix.movies
    appName: "MongoDB Shell" command: aggregate { aggregate: "movies",
    pipeline: [ { $match: { genres: { $in: [ "Fantasy", "Adventure" ] } } }, { $unwind:
    "$genres" }, { $match: { genres: { $in: [ "Fantasy", "Adventure" ] } } }, { $group:
    { _id: "$genres", count: { $sum: 1.0 }, titles: { $push: "$title" } } } ], cursor: {}, lsid:
    { id: UUID("610579f8-c68e-44f7-881d-2ddc407e5594") }, $db: "petflix" }
    planSummary: IXSCAN { genres: 1 } keysExamined:28 docsExamined:18
    cursorExhausted:1 numYields:0 nreturned:2 reslen:813 locks:{ Global:
    { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 2 } }, Collection:
    { acquireCount: { r: 2 } } } protocol:op_msg 1ms
    *Logs by default only log queries than run longer than 100ms.

    View Slide

  28. Support Ticket 2
    @naomi_pen
    New Join Query is Slow
    Hi team,
    We're again experiencing performance issues. Can you look into this?
    db.movies.aggregate([
    { $match: {"genres": "Drama"} },
    { $lookup: { from : "views",
    localField: "_id",
    foreignField: "movie_id",
    as: "viewsData" } },
    { $project: { viewsData: 1, title: 1} },
    { $sort: { "viewsData.time_started": -1}} ])
    Cheers, Robin
    Robin from Petflix 3 minute ago

    View Slide

  29. Support Ticket 2
    Investigate the query
    @naomi_pen
    ➜ db.movies.aggregate([
    { $match: {"genres": "Drama"} },
    { $lookup: { from : "views",
    localField: "_id",
    foreignField: "movie_id",
    as: "viewsData" } },
    { $project: { viewsData: 1, title: 1} },
    { $sort: { "viewsData.time_started": -1} }
    ])

    View Slide

  30. Support Ticket 2
    Investigate the logs*
    @naomi_pen
    2018-11-04T16:35:00.235-0800 I COMMAND [conn4] command petflix.views
    appName: "MongoDB Shell" command: aggregate { aggregate: "movies",
    pipeline: [ { $match: { genres: { $in: [ "Drama" ] } } }, { $lookup: { from: "views",
    localField: "_id", foreignField: "movie_id", as: "viewsData" } }, { $project:
    { viewsData: 1.0, title: 1.0 } }, { $sort: { viewsData.time_started: -1.0 } } ], cursor:
    {}, lsid: { id: UUID("617b7851-4985-47e8-a647-991173e60667") }, $db:
    "petflix" } planSummary: IXSCAN { genres: 1 } cursorid:8208857423668239231
    keysExamined:24997 docsExamined:24997 hasSortStage:1 numYields:200310
    nreturned:51 reslen:15709 locks:{ Global: { acquireCount: { r: 250310 } },
    Database: { acquireCount: { r: 250310 } }, Collection: { acquireCount: { r:
    250309 } } } protocol:op_msg 11172ms
    *Queries only appear in logs if they run for 100+ ms. Queries not in the logs can still cause load!

    View Slide

  31. Support Ticket 2
    Investigate the query & indexes
    @naomi_pen
    ➜ db.movies.aggregate([
    { $match: {"genres": "Drama"} },
    { $lookup: { from : "views",
    localField: "_id",
    foreignField: "movie_id",
    as: "viewsData" } },
    { $project: { viewsData: 1, title: 1} },
    { $sort: { "viewsData.time_started": -1} }
    ])
    ➜ db.movies.getIndexes()
    # result: _id, genres
    ➜ db.views.getIndexes()
    # result: _id

    View Slide

  32. Support Ticket 2
    Solution: Index views.movie_id, optionally add limit
    @naomi_pen
    ➜ db.movies.aggregate([
    { $match: {"genres": { $in : ["Drama"] }} },
    { $lookup: { from : "views",
    localField: "_id",
    foreignField: "movie_id",
    as: "viewsData" } },
    { $sort: { "viewsData.time_started": -1}},
    { $limit: 50 }
    ])

    View Slide

  33. Support Ticket 2
    @naomi_pen
    Re: New Join Query is Slow
    Hi Robin,
    Of course! We looked into the query and have a few suggestions:
    1. The foreignField specified in the $lookup should be indexed. Please add an index on
    views.movie_id. This should boost performance quite a bit.
    2. If possible you could also add a $limit stage to improve the performance of the sort.
    Even with these changes, this query may still be slow. If this is a common query pattern,
    we would suggest considering changes to your data model. Does this help?

    Best,

    Naomi
    Naomi from MongoDB

    View Slide

  34. Support Ticket 2
    Now in the logs
    @naomi_pen
    2018-11-05T03:43:22.109-0800 I COMMAND [conn4] command petflix.views
    appName: "MongoDB Shell" command: aggregate { aggregate: "movies",
    pipeline: [ { $match: { genres: { $in: [ "Drama" ] } } }, { $lookup: { from: "views",
    localField: "_id", foreignField: "movie_id", as: "viewsData" } }, { $sort:
    { viewsData.time_started: -1.0 } }, { $limit: 50.0 } ], cursor: {}, lsid: { id:
    UUID("617b7851-4985-47e8-a647-991173e60667") }, $db: "petflix" }
    planSummary: IXSCAN { genres: 1 } keysExamined:24997 docsExamined:
    24997 hasSortStage:1 cursorExhausted:1 numYields:208 nreturned:51 reslen:
    9100 locks:{ Global: { acquireCount: { r: 50206 } }, Database: { acquireCount: { r:
    50206 } }, Collection: { acquireCount: { r: 50205 } } } protocol:op_msg 1245ms

    View Slide

  35. Support Ticket 3
    @naomi_pen
    Help with Metrics
    Hello old friend,
    Thanks so much for all your help so far. I think I got most of the aggregation
    framework performance pitfalls figured out by now…
    Unfortunately our Marketing team is pushing us hard to deliver metrics around
    how many movies are streamed per country at any given time. We tried to run the
    queries but I think if we run this query every few seconds it will bring down our
    deployment. Is there any magic solution?
    Thanks,
    Robin
    Robin from Petflix 5 minute ago

    View Slide

  36. Support Ticket 3
    @naomi_pen
    Re: Help with Metrics
    Hi Robin,
    I understand your concern. We agree that the additional load might be a bit much for
    your deployment. We would suggest you consider using a dedicated analytics node for
    these queries. That way the additional load would not impact your other nodes.
    Let us know if this would suit you. If you have any questions let me know.

    Best,

    Naomi
    Naomi from MongoDB

    View Slide

  37. Support Ticket 4
    @naomi_pen
    URGENT
    HI,
    Our analytics node was going great. So great in fact we decided to use the
    analytics node for a search feature. Now everything is burning. But we don't want
    to completely turn it off. There's only some queries that are really bad. Can we
    somehow limit a queries impact? HELP
    Thx,
    R
    Robin from Petflix 1 minute ago

    View Slide

  38. Support Ticket 4
    @naomi_pen
    Re: URGENT
    Hi Robin,
    You can consider setting maxTimeMS (http://bit.ly/maxTimeMS, http://bit.ly/
    maxTimeMS2) on the queries.

    This setting will protect the database server by ending operations that take longer than
    maxTimeMS. That said, be aware that:

    • In some cases, partial results are returned by terminated operations.
    • maxTimeMS is not a guarantee. In high-queue situations, operations that are queued
    but not active may not be terminated until they become active again.

    Please note that this is a safeguard, not a solution. To help alleviate the load you should
    add indexes for commonly used query patterns (to all nodes or only the analytics node).

    Best, Naomi
    Naomi from MongoDB

    View Slide

  39. Support Ticket 5
    @naomi_pen
    How do I query this…?
    We started tracking views information by deviceIDs in the collection `deviceviews`. Now we're trying to
    find all views where `movie_completed` is false. However, I don't know how to do that since the
    children of `views` are uuids…
    { 

    "_id": ObjectId("5be090ca8dd84581f7120e5f"),
    "deviceId": { user: "Naomi", device: "iPhone", first_seen: "1541364977160" },
    "views": {
    "e5a52daf-f02a-4eca-acaa-690498c8590e": {
    "id": "e5a52daf-f02a-4eca-acaa-690498c8590e",
    "movie_title": "It's a wonderful dog life",
    "movie_completed": false,
    "time_started": "1541443718297"
    },

    } }
    Robin from Petflix 7 minute ago

    View Slide

  40. Support Ticket 5
    Investigate the data model
    @naomi_pen
    { 

    "_id": ObjectId("5be090ca8dd84581f7120e5f"),
    "deviceId": { user: "Naomi", device: "iPhone", first_seen: "1541364977160" },
    "views": {
    "e5a52daf-f02a-4eca-acaa-690498c8590e": {
    "id": "e5a52daf-f02a-4eca-acaa-690498c8590e",
    "movie_title": "It's a wonderful dog life",
    "movie_completed": false,
    "time_started": "1541443718297"
    },

    } }

    View Slide

  41. Support Ticket 5
    @naomi_pen
    Re: How do I query this…?
    You can solve this with $objectToArray (code attached) but this would result in extra calculation load. We
    would suggest restructuring your schema instead:
    {
    "_id": "e5a52daf-f02a-4eca-acaa-690498c8590e",
    "movie_title": "It’s a wonderful dog life",
    "movie_completed": false,
    "time_started": "1541443718297"
    "deviceId": {
    user: "Naomi",
    device: "iPhone",
    first_seen: "1541364977160"
    },
    }
    }

    Naomi from MongoDB

    View Slide

  42. Support Ticket 5
    $project: {
    views: {
    $objectToArray: "$views"
    },
    deviceId: 1
    }
    @naomi_pen
    { 

    "_id": ObjectId("5be090ca8d…"),
    "deviceId": {
    user: "Naomi",
    device: "iPhone",
    first_seen: "1541364977160"
    },
    "views": {
    "e5a52daf-…": {
    "id": "e5a52daf-…",
    "movie_title": "It's a wonderful dog life",
    "movie_completed": false,
    "time_started": "1541443718297"
    },
    "f7d34ced-…": {

    },

    }
    }
    { 

    "_id": ObjectId("5be090ca8d…"),
    "deviceId": {
    user: "Naomi",
    device: "iPhone",
    first_seen: "1541364977160"
    },
    "views" : [
    {
    "k" : "e5a52daf-…",
    "v" : {
    "id" : "e5a52daf-…",
    "movie_title" : "It's a wonderful dog life",
    "movie_completed" : false,
    "time_started" : "1541443718297"
    }
    }, {
    "k" : "f7d34ced-…",
    "v" : {

    }
    }
    ]
    }

    View Slide

  43. Support Ticket 5
    Investigate the query
    @naomi_pen
    ➜ db.deviceviews.aggregate([
    {
    $project: {
    views: { $objectToArray: "$views" },
    deviceId: 1
    }
    }, {
    $unwind: "$views"
    }, {
    $match: { "views.v.movie_completed": false }
    }
    ])

    View Slide

  44. @naomi_pen
    A Reminder
    Automatic Optimizations
    The Customer
    Support Tickets
    5 Tips (#4 will SHOCK you)
    4
    1
    2
    5
    3

    View Slide

  45. @naomi_pen
    1 Indexable stages before unindexable stages
    Start with stages that can use indexes (e.g. $sort). Index the foreignField of a $lookup.
    Aggregated Aggregation Tips
    3
    2
    Start with selective stages. If matching on fields in an array after an $unwind, $match before too.
    $match documents early
    3
    Place $limit close to $sort to allow for more efficient sorting.
    Use $limit
    5
    maxTimeMS can be used as a safeguard.
    Use a dedicated analytics node
    4
    Expensive aggregations that run often may indicate a need for a schema change.
    Consider changing the data model

    View Slide

  46. @naomi_pen
    Thank you!
    Hi folks,
    We’re off for today. Thank you so much for all your work!!
    Have a wonderful day,
    Robin
    Robin from Petflix 1 minute ago

    View Slide

  47. @naomi_pen
    By Naomi Pentrel
    Thank you!
    Find me later with any questions!


    Slides are available at speakerdeck.com/npentrel
    @naomi_pen

    View Slide