Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Real-time Data Aggregation with MongoDB

Real-time Data Aggregation with MongoDB

Presented October 5, 2013 at PHP South Africa: https://joind.in/talk/view/9465

Presented May 24, 2013 at Symfony Live: Portland: https://joind.in/talk/view/8663

Reveal.js presentation published at: http://jmikola.github.io/slides/mongodb_real_time_aggregation/

Jeremy Mikola

October 05, 2013
Tweet

More Decks by Jeremy Mikola

Other Decks in Programming

Transcript

  1. REAL-TIME DATA AGGREGATION WITH
    Jeremy Mikola
    @jmikola

    View full-size slide

  2. AGENDA
    Available Tools
    Schemas
    Commands
    MapReduce
    Aggregation Framework
    Pipeline, Expressions
    Usage, Sharding
    Looking Ahead

    View full-size slide

  3. SETTING THE STAGE
    We have data stored in MongoDB.
    We need to do ad-hoc reporting,
    grouping, common aggregations, etc.
    What can we use for this?

    View full-size slide

  4. DATA WAREHOUSING
    SQL for reporting and analytics
    Infrastructure complications
    Additional maintenance
    Data duplication
    ETL processes
    Real time?

    View full-size slide

  5. SCHEMA-BASED
    Documents hold pre-aggregated data
    Processing at time of insert/update
    Optimized for real-time queries
    In-place updates avoid fragmentation
    by
    MongoDB for Analytics John Nunemaker

    View full-size slide

  6. SINGLE-PURPOSE
    COMMANDS
    count
    distinct
    group

    View full-size slide

  7. Count
    d
    b
    .
    r
    e
    c
    o
    r
    d
    s
    .
    i
    n
    s
    e
    r
    t
    (
    { a
    : 1
    , b
    : 0 }
    )
    ;
    d
    b
    .
    r
    e
    c
    o
    r
    d
    s
    .
    i
    n
    s
    e
    r
    t
    (
    { a
    : 1
    , b
    : 1 }
    )
    ;
    d
    b
    .
    r
    e
    c
    o
    r
    d
    s
    .
    i
    n
    s
    e
    r
    t
    (
    { a
    : 1
    , b
    : 4 }
    )
    ;
    d
    b
    .
    r
    e
    c
    o
    r
    d
    s
    .
    i
    n
    s
    e
    r
    t
    (
    { a
    : 2
    , b
    : 2 }
    )
    ;
    d
    b
    .
    r
    e
    c
    o
    r
    d
    s
    .
    c
    o
    u
    n
    t
    (
    )
    ; /
    / R
    e
    t
    u
    r
    n
    s 4
    d
    b
    .
    r
    e
    c
    o
    r
    d
    s
    .
    c
    o
    u
    n
    t
    (
    { a
    : 1 }
    )
    ; /
    / R
    e
    t
    u
    r
    n
    s 3

    View full-size slide

  8. Distinct
    d
    b
    .
    o
    r
    d
    e
    r
    s
    .
    i
    n
    s
    e
    r
    t
    (
    { c
    u
    s
    t
    o
    m
    e
    r
    : "
    A
    1
    2
    3
    "
    , a
    m
    o
    u
    n
    t
    : 5
    0
    0
    , s
    t
    a
    t
    u
    s
    : "
    A
    " }
    )
    ;
    d
    b
    .
    o
    r
    d
    e
    r
    s
    .
    i
    n
    s
    e
    r
    t
    (
    { c
    u
    s
    t
    o
    m
    e
    r
    : "
    A
    1
    2
    3
    "
    , a
    m
    o
    u
    n
    t
    : 2
    5
    0
    , s
    t
    a
    t
    u
    s
    : "
    A
    " }
    )
    ;
    d
    b
    .
    o
    r
    d
    e
    r
    s
    .
    i
    n
    s
    e
    r
    t
    (
    { c
    u
    s
    t
    o
    m
    e
    r
    : "
    B
    2
    1
    2
    "
    , a
    m
    o
    u
    n
    t
    : 2
    0
    0
    , s
    t
    a
    t
    u
    s
    : "
    A
    " }
    )
    ;
    d
    b
    .
    o
    r
    d
    e
    r
    s
    .
    i
    n
    s
    e
    r
    t
    (
    { c
    u
    s
    t
    o
    m
    e
    r
    : "
    A
    1
    2
    3
    "
    , a
    m
    o
    u
    n
    t
    : 3
    0
    0
    , s
    t
    a
    t
    u
    s
    : "
    D
    " }
    )
    ;
    d
    b
    .
    r
    e
    c
    o
    r
    d
    s
    .
    d
    i
    s
    t
    i
    n
    c
    t
    (
    )
    ; /
    / R
    e
    t
    u
    r
    n
    s [
    "
    A
    1
    2
    3
    "
    , "
    B
    2
    1
    2
    "
    ]

    View full-size slide

  9. Group
    d
    b
    .
    r
    e
    c
    o
    r
    d
    s
    .
    i
    n
    s
    e
    r
    t
    (
    { a
    : 1
    , c
    o
    u
    n
    t
    : 4 }
    )
    ;
    d
    b
    .
    r
    e
    c
    o
    r
    d
    s
    .
    i
    n
    s
    e
    r
    t
    (
    { a
    : 1
    , c
    o
    u
    n
    t
    : 2 }
    )
    ;
    d
    b
    .
    r
    e
    c
    o
    r
    d
    s
    .
    i
    n
    s
    e
    r
    t
    (
    { a
    : 1
    , c
    o
    u
    n
    t
    : 4 }
    )
    ;
    d
    b
    .
    r
    e
    c
    o
    r
    d
    s
    .
    i
    n
    s
    e
    r
    t
    (
    { a
    : 2
    , c
    o
    u
    n
    t
    : 3 }
    )
    ;
    d
    b
    .
    r
    e
    c
    o
    r
    d
    s
    .
    i
    n
    s
    e
    r
    t
    (
    { a
    : 2
    , c
    o
    u
    n
    t
    : 1 }
    )
    ;
    d
    b
    .
    r
    e
    c
    o
    r
    d
    s
    .
    i
    n
    s
    e
    r
    t
    (
    { a
    : 1
    , c
    o
    u
    n
    t
    : 5 }
    )
    ;
    d
    b
    .
    r
    e
    c
    o
    r
    d
    s
    .
    i
    n
    s
    e
    r
    t
    (
    { a
    : 4
    , c
    o
    u
    n
    t
    : 4 }
    )
    ;
    d
    b
    .
    r
    e
    c
    o
    r
    d
    s
    .
    g
    r
    o
    u
    p
    (
    {
    k
    e
    y
    : { a
    : 1 }
    ,
    c
    o
    n
    d
    : { a
    : { $
    l
    t
    : 3 } }
    ,
    r
    e
    d
    u
    c
    e
    : f
    u
    n
    c
    t
    i
    o
    n
    (
    c
    u
    r
    , r
    e
    s
    u
    l
    t
    ) { r
    e
    s
    u
    l
    t
    .
    c
    o
    u
    n
    t +
    = c
    u
    r
    .
    c
    o
    u
    n
    t }
    ,
    i
    n
    i
    t
    i
    a
    l
    : { c
    o
    u
    n
    t
    : 0 }
    }
    )
    ;
    /
    / R
    e
    t
    u
    r
    n
    s [ { a
    : 1
    , c
    o
    u
    n
    t
    : 1
    5 }
    , { a
    : 2
    , c
    o
    u
    n
    t
    : 4 } ]

    View full-size slide

  10. MAPREDUCE
    Extremely versatile, powerful
    Intended for complex data analysis
    Overkill for simple aggregation tasks
    Averages
    Summation
    Grouping

    View full-size slide

  11. MapReduce in MongoDB
    Implemented with JavaScript
    Convenient, but difficult to debug
    for batches
    Better concurrency with V8
    Incremental map-reduce

    View full-size slide

  12. MapReduce in Action: MongoQP
    MongoDB includes a
    Use map-reduce for query log analysis
    Parse BSON query objects with JavaScript
    Group and aggregate similar queries
    database profiler

    View full-size slide

  13. Profile Data
    {
    "
    o
    p
    " : "
    q
    u
    e
    r
    y
    "
    ,
    "
    n
    s
    " : "
    d
    b
    .
    c
    o
    l
    l
    e
    c
    t
    i
    o
    n
    "
    ,
    "
    q
    u
    e
    r
    y
    " : { … }
    ,
    "
    n
    t
    o
    r
    e
    t
    u
    r
    n
    " : 0
    ,
    "
    n
    t
    o
    s
    k
    i
    p
    " : 0
    ,
    "
    n
    s
    c
    a
    n
    n
    e
    d
    " : 1
    1
    4
    2
    6
    ,
    "
    l
    o
    c
    k
    S
    t
    a
    t
    s
    " : { … }
    ,
    "
    n
    r
    e
    t
    u
    r
    n
    e
    d
    " : 0
    ,
    "
    r
    e
    s
    p
    o
    n
    s
    e
    L
    e
    n
    g
    t
    h
    " : 2
    0
    ,
    "
    m
    i
    l
    l
    i
    s
    " : 1
    2
    ,
    "
    t
    s
    " : I
    S
    O
    D
    a
    t
    e
    (
    "
    2
    0
    1
    3
    -
    0
    5
    -
    2
    3
    T
    2
    1
    :
    2
    4
    :
    3
    9
    .
    3
    2
    7
    Z
    "
    )
    ,
    }

    View full-size slide

  14. And now for something completely different…

    View full-size slide

  15. AGGREGATION
    FRAMEWORK
    Declared with BSON, like queries
    Executes in C++, not JavaScript
    Flexible, functional, and simple
    Operation pipeline
    Computational expressions

    View full-size slide

  16. PIPELINE
    Process a stream of documents
    Original input is a collection
    Final output is a result document
    Series of operators
    Filter or transform data
    Input/output chain
    p
    s a
    x | g
    r
    e
    p m
    o
    n
    g
    o
    d | h
    e
    a
    d -
    n 1

    View full-size slide

  17. PIPELINE OPERATORS
    $
    m
    a
    t
    c
    h
    $
    p
    r
    o
    j
    e
    c
    t
    $
    g
    r
    o
    u
    p
    $
    u
    n
    w
    i
    n
    d
    $
    s
    o
    r
    t
    $
    l
    i
    m
    i
    t
    $
    s
    k
    i
    p
    $
    g
    e
    o
    N
    e
    a
    r

    View full-size slide

  18. OUR EXAMPLE DATA
    Library Books
    { _
    i
    d
    : 3
    7
    5
    ,
    t
    i
    t
    l
    e
    : "
    T
    h
    e G
    r
    e
    a
    t G
    a
    t
    s
    b
    y
    "
    ,
    I
    S
    B
    N
    : "
    9
    7
    8
    1
    8
    5
    7
    1
    5
    0
    1
    9
    3
    "
    ,
    a
    v
    a
    i
    l
    a
    b
    l
    e
    : t
    r
    u
    e
    ,
    p
    a
    g
    e
    s
    : 2
    1
    8
    ,
    c
    h
    a
    p
    t
    e
    r
    s
    : 9
    ,
    s
    u
    b
    j
    e
    c
    t
    s
    : [
    "
    L
    o
    n
    g I
    s
    l
    a
    n
    d
    "
    ,
    "
    N
    e
    w Y
    o
    r
    k
    "
    ,
    "
    1
    9
    2
    0
    s
    "
    ]
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    E
    n
    g
    l
    i
    s
    h
    "
    ,
    p
    u
    b
    l
    i
    s
    h
    e
    r
    : {
    c
    i
    t
    y
    : "
    L
    o
    n
    d
    o
    n
    "
    ,
    n
    a
    m
    e
    : "
    R
    a
    n
    d
    o
    m H
    o
    u
    s
    e
    "
    }
    }

    View full-size slide

  19. $MATCH
    Filter documents
    Uses existing query syntax

    View full-size slide

  20. $MATCH
    Matching field values
    { t
    i
    t
    l
    e
    : "
    T
    h
    e G
    r
    e
    a
    t G
    a
    t
    s
    b
    y
    "
    ,
    p
    a
    g
    e
    s
    : 2
    1
    8
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    E
    n
    g
    l
    i
    s
    h
    "
    }
    { t
    i
    t
    l
    e
    : "
    W
    a
    r a
    n
    d P
    e
    a
    c
    e
    "
    ,
    p
    a
    g
    e
    s
    : 1
    4
    4
    0
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    R
    u
    s
    s
    i
    a
    n
    "
    }
    { t
    i
    t
    l
    e
    : "
    A
    t
    l
    a
    s S
    h
    r
    u
    g
    g
    e
    d
    "
    ,
    p
    a
    g
    e
    s
    : 1
    0
    8
    8
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    E
    n
    g
    l
    i
    s
    h
    "
    }


    { $
    m
    a
    t
    c
    h
    : {
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    R
    u
    s
    s
    i
    a
    n
    "
    }
    }
    { t
    i
    t
    l
    e
    : "
    W
    a
    r a
    n
    d P
    e
    a
    c
    e
    "
    ,
    p
    a
    g
    e
    s
    : 1
    4
    4
    0
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    R
    u
    s
    s
    i
    a
    n
    "
    }

    View full-size slide

  21. $MATCH
    Matching with query operators
    { t
    i
    t
    l
    e
    : "
    T
    h
    e G
    r
    e
    a
    t G
    a
    t
    s
    b
    y
    "
    ,
    p
    a
    g
    e
    s
    : 2
    1
    8
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    E
    n
    g
    l
    i
    s
    h
    "
    }
    { t
    i
    t
    l
    e
    : "
    W
    a
    r a
    n
    d P
    e
    a
    c
    e
    "
    ,
    p
    a
    g
    e
    s
    : 1
    4
    4
    0
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    R
    u
    s
    s
    i
    a
    n
    "
    }
    { t
    i
    t
    l
    e
    : "
    A
    t
    l
    a
    s S
    h
    r
    u
    g
    g
    e
    d
    "
    ,
    p
    a
    g
    e
    s
    : 1
    0
    8
    8
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    E
    n
    g
    l
    i
    s
    h
    "
    }


    { $
    m
    a
    t
    c
    h
    : {
    p
    a
    g
    e
    s
    : { $
    g
    t
    : 1
    0
    0
    0 }
    }
    }
    { t
    i
    t
    l
    e
    : "
    W
    a
    r a
    n
    d P
    e
    a
    c
    e
    "
    ,
    p
    a
    g
    e
    s
    : 1
    4
    4
    0
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    R
    u
    s
    s
    i
    a
    n
    "
    }
    { t
    i
    t
    l
    e
    : "
    A
    t
    l
    a
    s S
    h
    r
    u
    g
    g
    e
    d
    "
    ,
    p
    a
    g
    e
    s
    : 1
    0
    8
    8
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    E
    n
    g
    l
    i
    s
    h
    "
    }

    View full-size slide

  22. $PROJECT
    Reshape documents
    Include, exclude or rename fields
    Inject computed fields
    Manipulate sub-document fields

    View full-size slide

  23. $PROJECT
    Including and excluding fields
    { _
    i
    d
    : 3
    7
    5
    ,
    t
    i
    t
    l
    e
    : "
    T
    h
    e G
    r
    e
    a
    t G
    a
    t
    s
    b
    y
    "
    ,
    I
    S
    B
    N
    : "
    9
    7
    8
    1
    8
    5
    7
    1
    5
    0
    1
    9
    3
    "
    ,
    a
    v
    a
    i
    l
    a
    b
    l
    e
    : t
    r
    u
    e
    ,
    p
    a
    g
    e
    s
    : 2
    1
    8
    ,
    c
    h
    a
    p
    t
    e
    r
    s
    : 9
    ,
    s
    u
    b
    j
    e
    c
    t
    s
    : [
    "
    L
    o
    n
    g I
    s
    l
    a
    n
    d
    "
    ,
    "
    N
    e
    w Y
    o
    r
    k
    "
    ,
    "
    1
    9
    2
    0
    s
    "
    ]
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    E
    n
    g
    l
    i
    s
    h
    "
    }


    { $
    p
    r
    o
    j
    e
    c
    t
    : {
    _
    i
    d
    : 0
    ,
    t
    i
    t
    l
    e
    : 1
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : 1
    }
    }
    { t
    i
    t
    l
    e
    : "
    T
    h
    e G
    r
    e
    a
    t G
    a
    t
    s
    b
    y
    "
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    E
    n
    g
    l
    i
    s
    h
    "
    }

    View full-size slide

  24. $PROJECT
    Renaming and computing fields
    { _
    i
    d
    : 3
    7
    5
    ,
    t
    i
    t
    l
    e
    : "
    T
    h
    e G
    r
    e
    a
    t G
    a
    t
    s
    b
    y
    "
    ,
    I
    S
    B
    N
    : "
    9
    7
    8
    1
    8
    5
    7
    1
    5
    0
    1
    9
    3
    "
    ,
    a
    v
    a
    i
    l
    a
    b
    l
    e
    : t
    r
    u
    e
    ,
    p
    a
    g
    e
    s
    : 2
    1
    8
    ,
    c
    h
    a
    p
    t
    e
    r
    s
    : 9
    ,
    s
    u
    b
    j
    e
    c
    t
    s
    : [
    "
    L
    o
    n
    g I
    s
    l
    a
    n
    d
    "
    ,
    "
    N
    e
    w Y
    o
    r
    k
    "
    ,
    "
    1
    9
    2
    0
    s
    "
    ]
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    E
    n
    g
    l
    i
    s
    h
    "
    }


    { $
    p
    r
    o
    j
    e
    c
    t
    : {
    a
    v
    g
    P
    a
    g
    e
    s
    P
    e
    r
    C
    h
    a
    p
    t
    e
    r
    : {
    $
    d
    i
    v
    i
    d
    e
    : [
    "
    $
    p
    a
    g
    e
    s
    "
    ,
    "
    $
    c
    h
    a
    p
    t
    e
    r
    s
    "
    ]
    }
    ,
    l
    a
    n
    g
    : "
    $
    l
    a
    n
    g
    u
    a
    g
    e
    "
    }
    }
    { _
    i
    d
    : 3
    7
    5
    ,
    a
    v
    g
    P
    a
    g
    e
    s
    P
    e
    r
    C
    h
    a
    p
    t
    e
    r
    : 2
    4
    .
    2
    2
    2
    2
    2
    2
    2
    2
    2
    2
    ,
    l
    a
    n
    g
    : "
    E
    n
    g
    l
    i
    s
    h
    "
    }

    View full-size slide

  25. $PROJECT
    Creating and extracting sub-document fields
    { _
    i
    d
    : 3
    7
    5
    ,
    t
    i
    t
    l
    e
    : "
    T
    h
    e G
    r
    e
    a
    t G
    a
    t
    s
    b
    y
    "
    ,
    I
    S
    B
    N
    : "
    9
    7
    8
    1
    8
    5
    7
    1
    5
    0
    1
    9
    3
    "
    ,
    a
    v
    a
    i
    l
    a
    b
    l
    e
    : t
    r
    u
    e
    ,
    p
    a
    g
    e
    s
    : 2
    1
    8
    ,
    c
    h
    a
    p
    t
    e
    r
    s
    : 9
    ,
    s
    u
    b
    j
    e
    c
    t
    s
    : [
    "
    L
    o
    n
    g I
    s
    l
    a
    n
    d
    "
    ,
    "
    N
    e
    w Y
    o
    r
    k
    "
    ,
    "
    1
    9
    2
    0
    s
    "
    ]
    ,
    p
    u
    b
    l
    i
    s
    h
    e
    r
    : {
    c
    i
    t
    y
    : "
    L
    o
    n
    d
    o
    n
    "
    ,
    n
    a
    m
    e
    : "
    R
    a
    n
    d
    o
    m H
    o
    u
    s
    e
    "
    }
    }


    { $
    p
    r
    o
    j
    e
    c
    t
    : {
    t
    i
    t
    l
    e
    : 1
    ,
    s
    t
    a
    t
    s
    : {
    p
    a
    g
    e
    s
    : "
    $
    p
    a
    g
    e
    s
    "
    ,
    c
    h
    a
    p
    t
    e
    r
    s
    : "
    $
    c
    h
    a
    p
    t
    e
    r
    s
    "
    ,
    }
    ,
    p
    u
    b
    _
    c
    i
    t
    y
    : "
    $
    p
    u
    b
    l
    i
    s
    h
    e
    r
    .
    c
    i
    t
    y
    "
    }
    }
    { _
    i
    d
    : 3
    7
    5
    ,
    t
    i
    t
    l
    e
    : "
    T
    h
    e G
    r
    e
    a
    t G
    a
    t
    s
    b
    y
    "
    ,
    s
    t
    a
    t
    s
    : {
    p
    a
    g
    e
    s
    : 2
    1
    8
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    E
    n
    g
    l
    i
    s
    h
    "
    }
    ,
    p
    u
    b
    _
    c
    i
    t
    y
    : "
    L
    o
    n
    d
    o
    n
    "
    }

    View full-size slide

  26. $GROUP
    Group documents by an ID
    Field reference, object, constant
    Other output fields are computed
    $
    m
    a
    x
    , $
    m
    i
    n
    , $
    a
    v
    g
    , $
    s
    u
    m
    $
    a
    d
    d
    T
    o
    S
    e
    t
    , $
    p
    u
    s
    h
    $
    f
    i
    r
    s
    t
    , $
    l
    a
    s
    t

    View full-size slide

  27. $GROUP
    Calculating an average
    { t
    i
    t
    l
    e
    : "
    T
    h
    e G
    r
    e
    a
    t G
    a
    t
    s
    b
    y
    "
    ,
    p
    a
    g
    e
    s
    : 2
    1
    8
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    E
    n
    g
    l
    i
    s
    h
    "
    }
    { t
    i
    t
    l
    e
    : "
    W
    a
    r a
    n
    d P
    e
    a
    c
    e
    "
    ,
    p
    a
    g
    e
    s
    : 1
    4
    4
    0
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    R
    u
    s
    s
    i
    a
    n
    "
    }
    { t
    i
    t
    l
    e
    : "
    A
    t
    l
    a
    s S
    h
    r
    u
    g
    g
    e
    d
    "
    ,
    p
    a
    g
    e
    s
    : 1
    0
    8
    8
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    E
    n
    g
    l
    i
    s
    h
    "
    }


    { $
    g
    r
    o
    u
    p
    : {
    _
    i
    d
    : "
    $
    l
    a
    n
    g
    u
    a
    g
    e
    "
    ,
    a
    v
    g
    P
    a
    g
    e
    s
    : { $
    a
    v
    g
    : "
    $
    p
    a
    g
    e
    s
    " }
    }
    }
    { _
    i
    d
    : "
    R
    u
    s
    s
    i
    a
    n
    "
    ,
    a
    v
    g
    P
    a
    g
    e
    s
    : 1
    4
    4
    0
    }
    { _
    i
    d
    : "
    E
    n
    g
    l
    i
    s
    h
    "
    ,
    a
    v
    g
    P
    a
    g
    e
    s
    : 6
    5
    3
    }

    View full-size slide

  28. $GROUP
    Summating fields and counting
    { t
    i
    t
    l
    e
    : "
    T
    h
    e G
    r
    e
    a
    t G
    a
    t
    s
    b
    y
    "
    ,
    p
    a
    g
    e
    s
    : 2
    1
    8
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    E
    n
    g
    l
    i
    s
    h
    "
    }
    { t
    i
    t
    l
    e
    : "
    W
    a
    r a
    n
    d P
    e
    a
    c
    e
    "
    ,
    p
    a
    g
    e
    s
    : 1
    4
    4
    0
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    R
    u
    s
    s
    i
    a
    n
    "
    }
    { t
    i
    t
    l
    e
    : "
    A
    t
    l
    a
    s S
    h
    r
    u
    g
    g
    e
    d
    "
    ,
    p
    a
    g
    e
    s
    : 1
    0
    8
    8
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    E
    n
    g
    l
    i
    s
    h
    "
    }


    { $
    g
    r
    o
    u
    p
    : {
    _
    i
    d
    : "
    $
    l
    a
    n
    g
    u
    a
    g
    e
    "
    ,
    n
    u
    m
    T
    i
    t
    l
    e
    s
    : { $
    s
    u
    m
    : 1 }
    ,
    s
    u
    m
    P
    a
    g
    e
    s
    : { $
    s
    u
    m
    : "
    $
    p
    a
    g
    e
    s
    " }
    }
    }
    { _
    i
    d
    : "
    R
    u
    s
    s
    i
    a
    n
    "
    ,
    n
    u
    m
    T
    i
    t
    l
    e
    s
    : 1
    ,
    s
    u
    m
    P
    a
    g
    e
    s
    : 1
    4
    4
    0
    }
    { _
    i
    d
    : "
    E
    n
    g
    l
    i
    s
    h
    "
    ,
    n
    u
    m
    T
    i
    t
    l
    e
    s
    : 2
    ,
    s
    u
    m
    P
    a
    g
    e
    s
    : 1
    3
    0
    6
    }

    View full-size slide

  29. $GROUP
    Collecting distinct values
    { t
    i
    t
    l
    e
    : "
    T
    h
    e G
    r
    e
    a
    t G
    a
    t
    s
    b
    y
    "
    ,
    p
    a
    g
    e
    s
    : 2
    1
    8
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    E
    n
    g
    l
    i
    s
    h
    "
    }
    { t
    i
    t
    l
    e
    : "
    W
    a
    r a
    n
    d P
    e
    a
    c
    e
    "
    ,
    p
    a
    g
    e
    s
    : 1
    4
    4
    0
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    R
    u
    s
    s
    i
    a
    n
    "
    }
    { t
    i
    t
    l
    e
    : "
    A
    t
    l
    a
    s S
    h
    r
    u
    g
    g
    e
    d
    "
    ,
    p
    a
    g
    e
    s
    : 1
    0
    8
    8
    ,
    l
    a
    n
    g
    u
    a
    g
    e
    : "
    E
    n
    g
    l
    i
    s
    h
    "
    }


    { $
    g
    r
    o
    u
    p
    : {
    _
    i
    d
    : "
    $
    l
    a
    n
    g
    u
    a
    g
    e
    "
    ,
    t
    i
    t
    l
    e
    s
    : { $
    a
    d
    d
    T
    o
    S
    e
    t
    : "
    $
    t
    i
    t
    l
    e
    " }
    }
    }
    { _
    i
    d
    : "
    R
    u
    s
    s
    i
    a
    n
    "
    ,
    t
    i
    t
    l
    e
    s
    : [ "
    W
    a
    r a
    n
    d P
    e
    a
    c
    e
    " ]
    }
    { _
    i
    d
    : "
    E
    n
    g
    l
    i
    s
    h
    "
    ,
    t
    i
    t
    l
    e
    s
    : [
    "
    A
    t
    l
    a
    s S
    h
    r
    u
    g
    g
    e
    d
    "
    ,
    "
    T
    h
    e G
    r
    e
    a
    t G
    a
    t
    s
    b
    y
    "
    ]
    }

    View full-size slide

  30. $UNWIND
    Operate on an array field
    For each element, yield a document
    Pipe to $
    g
    r
    o
    u
    p to aggregate array values

    View full-size slide

  31. $UNWIND
    Yielding multiple documents from one
    { _
    i
    d
    : 3
    7
    5
    ,
    t
    i
    t
    l
    e
    : "
    T
    h
    e G
    r
    e
    a
    t G
    a
    t
    s
    b
    y
    "
    ,
    s
    u
    b
    j
    e
    c
    t
    s
    : [
    "
    L
    o
    n
    g I
    s
    l
    a
    n
    d
    "
    ,
    "
    N
    e
    w Y
    o
    r
    k
    "
    ,
    "
    1
    9
    2
    0
    s
    "
    ]
    }


    { $
    u
    n
    w
    i
    n
    d
    : "
    $
    s
    u
    b
    j
    e
    c
    t
    s
    " }
    { _
    i
    d
    : 3
    7
    5
    ,
    t
    i
    t
    l
    e
    : "
    T
    h
    e G
    r
    e
    a
    t G
    a
    t
    s
    b
    y
    "
    ,
    s
    u
    b
    j
    e
    c
    t
    s
    : "
    L
    o
    n
    g I
    s
    l
    a
    n
    d
    "
    }
    { _
    i
    d
    : 3
    7
    5
    ,
    t
    i
    t
    l
    e
    : "
    T
    h
    e G
    r
    e
    a
    t G
    a
    t
    s
    b
    y
    "
    ,
    s
    u
    b
    j
    e
    c
    t
    s
    : "
    N
    e
    w Y
    o
    r
    k
    "
    }
    { _
    i
    d
    : 3
    7
    5
    ,
    t
    i
    t
    l
    e
    : "
    T
    h
    e G
    r
    e
    a
    t G
    a
    t
    s
    b
    y
    "
    ,
    s
    u
    b
    j
    e
    c
    t
    s
    : "
    1
    9
    2
    0
    s
    "
    }

    View full-size slide

  32. $SORT, $LIMIT, $SKIP
    Sort documents by one or more fields
    Same order syntax as
    Limit and skip follow cursor behavior
    cursors

    View full-size slide

  33. $SORT
    Sort all documents in the pipeline
    { t
    i
    t
    l
    e
    : "
    T
    h
    e G
    r
    e
    a
    t G
    a
    t
    s
    b
    y
    " }
    { t
    i
    t
    l
    e
    : "
    B
    r
    a
    v
    e N
    e
    w W
    o
    r
    l
    d
    " }
    { t
    i
    t
    l
    e
    : "
    T
    h
    e G
    r
    a
    p
    e
    s o
    f W
    r
    a
    t
    h
    " }
    { t
    i
    t
    l
    e
    : "
    A
    n
    i
    m
    a
    l F
    a
    r
    m
    " }
    { t
    i
    t
    l
    e
    : "
    L
    o
    r
    d o
    f t
    h
    e F
    l
    i
    e
    s
    " }
    { t
    i
    t
    l
    e
    : "
    F
    a
    t
    h
    e
    r
    s a
    n
    d S
    o
    n
    s
    " }
    { t
    i
    t
    l
    e
    : "
    I
    n
    v
    i
    s
    i
    b
    l
    e M
    a
    n
    " }
    { t
    i
    t
    l
    e
    : "
    F
    a
    h
    r
    e
    n
    h
    e
    i
    t 4
    5
    1
    " }


    { $
    s
    o
    r
    t
    : { t
    i
    t
    l
    e
    : 1 }
    }
    { t
    i
    t
    l
    e
    : "
    A
    n
    i
    m
    a
    l F
    a
    r
    m
    " }
    { t
    i
    t
    l
    e
    : "
    B
    r
    a
    v
    e N
    e
    w W
    o
    r
    l
    d
    " }
    { t
    i
    t
    l
    e
    : "
    F
    a
    h
    r
    e
    n
    h
    e
    i
    t 4
    5
    1
    " }
    { t
    i
    t
    l
    e
    : "
    F
    a
    t
    h
    e
    r
    s a
    n
    d S
    o
    n
    s
    " }
    { t
    i
    t
    l
    e
    : "
    I
    n
    v
    i
    s
    i
    b
    l
    e M
    a
    n
    " }
    { t
    i
    t
    l
    e
    : "
    L
    o
    r
    d o
    f t
    h
    e F
    l
    i
    e
    s
    " }
    { t
    i
    t
    l
    e
    : "
    T
    h
    e G
    r
    a
    p
    e
    s o
    f W
    r
    a
    t
    h
    " }
    { t
    i
    t
    l
    e
    : "
    T
    h
    e G
    r
    e
    a
    t G
    a
    t
    s
    b
    y
    " }

    View full-size slide

  34. $LIMIT
    Limit documents through the pipeline
    { t
    i
    t
    l
    e
    : "
    A
    n
    i
    m
    a
    l F
    a
    r
    m
    " }
    { t
    i
    t
    l
    e
    : "
    B
    r
    a
    v
    e N
    e
    w W
    o
    r
    l
    d
    " }
    { t
    i
    t
    l
    e
    : "
    F
    a
    h
    r
    e
    n
    h
    e
    i
    t 4
    5
    1
    " }
    { t
    i
    t
    l
    e
    : "
    F
    a
    t
    h
    e
    r
    s a
    n
    d S
    o
    n
    s
    " }
    { t
    i
    t
    l
    e
    : "
    I
    n
    v
    i
    s
    i
    b
    l
    e M
    a
    n
    " }
    { t
    i
    t
    l
    e
    : "
    L
    o
    r
    d o
    f t
    h
    e F
    l
    i
    e
    s
    " }
    { t
    i
    t
    l
    e
    : "
    T
    h
    e G
    r
    a
    p
    e
    s o
    f W
    r
    a
    t
    h
    " }
    { t
    i
    t
    l
    e
    : "
    T
    h
    e G
    r
    e
    a
    t G
    a
    t
    s
    b
    y
    " }


    { $
    l
    i
    m
    i
    t
    : 5 }
    { t
    i
    t
    l
    e
    : "
    A
    n
    i
    m
    a
    l F
    a
    r
    m
    " }
    { t
    i
    t
    l
    e
    : "
    B
    r
    a
    v
    e N
    e
    w W
    o
    r
    l
    d
    " }
    { t
    i
    t
    l
    e
    : "
    F
    a
    h
    r
    e
    n
    h
    e
    i
    t 4
    5
    1
    " }
    { t
    i
    t
    l
    e
    : "
    F
    a
    t
    h
    e
    r
    s a
    n
    d S
    o
    n
    s
    " }
    { t
    i
    t
    l
    e
    : "
    I
    n
    v
    i
    s
    i
    b
    l
    e M
    a
    n
    " }

    View full-size slide

  35. $SKIP
    Skip over documents in the pipeline
    { t
    i
    t
    l
    e
    : "
    A
    n
    i
    m
    a
    l F
    a
    r
    m
    " }
    { t
    i
    t
    l
    e
    : "
    B
    r
    a
    v
    e N
    e
    w W
    o
    r
    l
    d
    " }
    { t
    i
    t
    l
    e
    : "
    F
    a
    h
    r
    e
    n
    h
    e
    i
    t 4
    5
    1
    " }
    { t
    i
    t
    l
    e
    : "
    F
    a
    t
    h
    e
    r
    s a
    n
    d S
    o
    n
    s
    " }
    { t
    i
    t
    l
    e
    : "
    I
    n
    v
    i
    s
    i
    b
    l
    e M
    a
    n
    " }


    { $
    s
    k
    i
    p
    : 2 }
    { t
    i
    t
    l
    e
    : "
    F
    a
    h
    r
    e
    n
    h
    e
    i
    t 4
    5
    1
    " }
    { t
    i
    t
    l
    e
    : "
    F
    a
    t
    h
    e
    r
    s a
    n
    d S
    o
    n
    s
    " }
    { t
    i
    t
    l
    e
    : "
    I
    n
    v
    i
    s
    i
    b
    l
    e M
    a
    n
    " }

    View full-size slide

  36. $GEONEAR
    Only used to start the pipeline
    Returns documents near a point, in order
    Proxies the command
    Supports the same options
    Adds a d
    i
    s
    t
    a
    n
    c
    e
    F
    i
    e
    l
    d option
    geoNear

    View full-size slide

  37. EXPRESSIONS
    Return computed values
    Used with $
    p
    r
    o
    j
    e
    c
    t and $
    g
    r
    o
    u
    p
    Reference fields using $ (e.g. "
    $
    x
    "
    )
    Expressions may be nested

    View full-size slide

  38. EXPRESSIONS
    Logic
    $
    a
    n
    d
    , $
    o
    r
    , $
    n
    o
    t

    Comparison
    $
    c
    m
    p
    , $
    e
    q
    , $
    g
    t

    Arithmetic
    $
    a
    d
    d
    , $
    d
    i
    v
    i
    d
    e

    String
    $
    s
    t
    r
    c
    a
    s
    e
    c
    m
    p
    , $
    s
    u
    b
    s
    t
    r

    Date
    $
    y
    e
    a
    r
    , $
    d
    a
    y
    O
    f
    M
    o
    n
    t
    h

    Conditional
    $
    c
    o
    n
    d
    , $
    i
    f
    N
    u
    l
    l

    View full-size slide

  39. USAGE
    a
    g
    g
    r
    e
    g
    a
    t
    e database command
    c
    o
    l
    l
    e
    c
    t
    i
    o
    n
    .
    a
    g
    g
    r
    e
    g
    a
    t
    e
    (
    ) shell helper
    in PHP
    MongoCollection::aggregate()

    View full-size slide

  40. COLLECTION METHOD
    d
    b
    .
    b
    o
    o
    k
    s
    .
    a
    g
    g
    r
    e
    g
    a
    t
    e
    (
    [
    { $
    s
    o
    r
    t
    : { c
    r
    e
    a
    t
    e
    d
    : 1 }
    }
    ,
    { $
    u
    n
    w
    i
    n
    d
    : "
    $
    s
    u
    b
    j
    e
    c
    t
    s
    " }
    ,
    { $
    g
    r
    o
    u
    p
    : { _
    i
    d
    : "
    $
    s
    u
    b
    j
    e
    c
    t
    s
    "
    , n
    : { $
    s
    u
    m
    : 1 }
    ,
    f
    c
    : { $
    f
    i
    r
    s
    t
    : "
    $
    c
    r
    e
    a
    t
    e
    d
    " } }
    }
    ,
    { $
    p
    r
    o
    j
    e
    c
    t
    : { _
    i
    d
    : 1
    , n
    : 1
    , f
    c
    : { $
    y
    e
    a
    r
    : "
    $
    f
    c
    " }
    }
    }
    ]
    )
    ;

    {
    r
    e
    s
    u
    l
    t
    : [
    { "
    _
    i
    d
    "
    : "
    F
    a
    n
    t
    a
    s
    y
    "
    , "
    n
    u
    m
    "
    : 6
    , "
    f
    c
    "
    : 2
    0
    0
    8 }
    ,
    { "
    _
    i
    d
    "
    : "
    H
    i
    s
    t
    o
    r
    i
    c
    a
    l
    "
    , "
    n
    u
    m
    "
    : 7
    , "
    f
    c
    "
    : 2
    0
    1
    2 }
    ,
    { "
    _
    i
    d
    "
    : "
    W
    o
    r
    l
    d L
    i
    t
    e
    r
    a
    t
    u
    r
    e
    "
    , "
    n
    "
    : 2
    , "
    f
    c
    "
    : 2
    0
    0
    9 }
    /
    / O
    t
    h
    e
    r r
    e
    s
    u
    l
    t
    s f
    o
    l
    l
    o
    w

    ]
    ,
    o
    k
    : 1
    }

    View full-size slide

  41. DATABASE COMMAND
    d
    b
    .
    r
    u
    n
    C
    o
    m
    m
    a
    n
    d
    (
    { a
    g
    g
    r
    e
    g
    a
    t
    e
    : "
    b
    o
    o
    k
    s
    "
    , p
    i
    p
    e
    l
    i
    n
    e
    : [
    { $
    s
    o
    r
    t
    : { c
    r
    e
    a
    t
    e
    d
    : 1 }
    }
    ,
    { $
    u
    n
    w
    i
    n
    d
    : "
    $
    s
    u
    b
    j
    e
    c
    t
    s
    " }
    ,
    { $
    g
    r
    o
    u
    p
    : { _
    i
    d
    : "
    $
    s
    u
    b
    j
    e
    c
    t
    s
    "
    , n
    : { $
    s
    u
    m
    : 1 }
    ,
    f
    c
    : { $
    f
    i
    r
    s
    t
    : "
    $
    c
    r
    e
    a
    t
    e
    d
    " } }
    }
    ,
    { $
    p
    r
    o
    j
    e
    c
    t
    : { _
    i
    d
    : 1
    , n
    : 1
    , f
    c
    : { $
    y
    e
    a
    r
    : "
    $
    f
    c
    " }
    }
    }
    ]
    }
    )
    ;

    {
    r
    e
    s
    u
    l
    t
    : [
    { "
    _
    i
    d
    "
    : "
    F
    a
    n
    t
    a
    s
    y
    "
    , "
    n
    u
    m
    "
    : 6
    , "
    f
    c
    "
    : 2
    0
    0
    8 }
    ,
    { "
    _
    i
    d
    "
    : "
    H
    i
    s
    t
    o
    r
    i
    c
    a
    l
    "
    , "
    n
    u
    m
    "
    : 7
    , "
    f
    c
    "
    : 2
    0
    1
    2 }
    ,
    { "
    _
    i
    d
    "
    : "
    W
    o
    r
    l
    d L
    i
    t
    e
    r
    a
    t
    u
    r
    e
    "
    , "
    n
    "
    : 2
    , "
    f
    c
    "
    : 2
    0
    0
    9 }
    /
    / O
    t
    h
    e
    r r
    e
    s
    u
    l
    t
    s f
    o
    l
    l
    o
    w

    ]
    ,
    o
    k
    : 1
    }

    View full-size slide

  42. LIMITATIONS
    Result limited by BSON document size
    Final command result
    Intermediate shard results
    Pipeline operator memory limits

    View full-size slide

  43. Aggregation Framework in Action
    Stock ticker data
    Real-time candlestick charts
    Open/close rates per minute
    Plot peak points and averages
    Time-series visualization

    View full-size slide

  44. EXAMPLE DATA
    {
    "
    _
    i
    d
    " : O
    b
    j
    e
    c
    t
    I
    d
    (
    "
    4
    f
    4
    b
    8
    9
    1
    6
    f
    b
    1
    c
    8
    0
    e
    1
    4
    1
    e
    a
    6
    2
    0
    1
    "
    )
    ,
    "
    a
    s
    k
    " : 1
    .
    3
    0
    0
    2
    8
    ,
    "
    b
    i
    d
    " : 1
    .
    3
    0
    0
    2
    ,
    "
    t
    s
    " : I
    S
    O
    D
    a
    t
    e
    (
    "
    2
    0
    1
    2
    -
    0
    2
    -
    1
    6
    T
    1
    2
    :
    4
    8
    :
    0
    0
    Z
    "
    )
    }

    View full-size slide

  45. http://git.io/mongodb-demo

    View full-size slide

  46. SHARDING
    Split the pipeline at first $
    g
    r
    o
    u
    p or $
    s
    o
    r
    t
    Shards execute pipeline up to that point
    mongos merges results and continues
    Early $
    m
    a
    t
    c
    h may excuse shards

    View full-size slide

  47. SHARDING
    [
    { $
    m
    a
    t
    c
    h
    : { /
    * f
    i
    l
    t
    e
    r b
    y s
    h
    a
    r
    d k
    e
    y *
    / }
    }
    ,
    { $
    g
    r
    o
    u
    p
    : { /
    * g
    r
    o
    u
    p b
    y s
    o
    m
    e f
    i
    e
    l
    d *
    / }
    }
    ,
    { $
    s
    o
    r
    t
    : { /
    * s
    o
    r
    t b
    y s
    o
    m
    e f
    i
    e
    l
    d *
    / }
    }
    ,
    { $
    p
    r
    o
    j
    e
    c
    t
    : { /
    * r
    e
    s
    h
    a
    p
    e r
    e
    s
    u
    l
    t *
    / }
    }
    ]

    View full-size slide

  48. SHARDING
    shard1
    $
    m
    a
    t
    c
    h
    $
    g
    r
    o
    u
    p
    1

    shard2
    $
    m
    a
    t
    c
    h
    $
    g
    r
    o
    u
    p
    1

    shard3
    mongos
    $
    g
    r
    o
    u
    p
    2
    $
    s
    o
    r
    t
    $
    p
    r
    o
    j
    e
    c
    t

    Result

    View full-size slide

  49. LOOKING
    AHEAD

    View full-size slide

  50. Framework Use Cases
    Basic aggregation queries
    Ad-hoc reporting
    Real-time analytics
    Visualizing time series data

    View full-size slide

  51. Future Enhancements
    Coming in MongoDB 2.6
    operator for output control
    Down the road
    $
    o
    u
    t
    Pipeline explain facility
    Optimizing $
    m
    a
    t
    c
    h position
    Grouping sorted input

    View full-size slide

  52. Enabling Developers
    Doing more within MongoDB, faster
    Refactoring MapReduce and groupings
    Replace pages of JavaScript
    Longer aggregation pipelines
    Quick aggregations from the shell

    View full-size slide

  53. THANKS!
    Server and drivers
    Aggregation documentation
    MapReduce query profiler demo app
    Time-series visualization demo app
    MongoDB shell enhancements for hackers
    Questions?

    View full-size slide

  54. Photo Credits
    http://dilbert.com/strips/comic/2012-09-05
    http://img.timeinc.net/time/photoessays/2009/monty_python/monty_python_02.jpg
    http://elevatedscholars.files.wordpress.com/2011/01/dscf0459.jpg
    http://purenintendo.com/wp-content/uploads/2009/10/i_20476.jpg
    http://i198.photobucket.com/albums/aa77/Stinger1118/Video%20Games/SuperMarioBrosWarpZone.jpg
    http://www.blogcdn.com/www.joystiq.com/media/2009/10/marioart102613.jpg

    View full-size slide