
Elasticsearch - aggregations

Elasticsearch Inc
May 26, 2014

"Elasticsearch - aggregations" at Berlin Buzzwords 2014

Transcript

  1. Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited
    Adrien Grand
    @jpountz
    aggregations

  2. outline
    • what aggregations are
    • why we built them
    • how they work
    • what the trade-offs are

  3. aggregations
    • analytics
    histograms, distributions, statistics
    • over any partition of your data
    anything that can be selected with queries/filters
    • in near real time
    computed on the fly, ~1s refresh interval
    • that can be composed
    unlike facets
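
The point that aggregations run over any partition selected by a query can be sketched as a search request body. This assumes the JSON `query`/`aggs` syntax of the Elasticsearch search API; the aggregation name `by_category` is a hypothetical label, and the field names are borrowed from the e-commerce example later in the deck.

```python
import json

# Sketch of a search request: the aggregation only sees the documents
# matched by the query. "by_category" is a hypothetical aggregation name.
request = {
    "query": {"match": {"designation": "dress"}},  # selects the partition
    "aggs": {
        "by_category": {"terms": {"field": "category"}}  # runs on the matches
    },
}

body = json.dumps(request)
print(body)
```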

  4. bucket / metrics
    • bucket
    terms
    histogram
    range
    filter
    geohash grid
    • metrics
    stats
    min / max / avg / sum
    percentiles
    cardinality
    root aggregation: collects everything
    inner aggregation: bucket
    leaf aggregation: bucket or metric
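
The root/inner/leaf rule can be expressed as a small recursive check over a request-style aggregation tree. This is an illustrative sketch, not Elasticsearch's request parser: the type sets only cover the names listed on the slide (spelled as in request syntax, e.g. `geohash_grid`), and the `check` function and sample aggregation names are hypothetical.

```python
# Aggregation types from the slide, grouped by kind (illustrative subset).
BUCKET = {"terms", "histogram", "range", "filter", "geohash_grid"}
METRIC = {"stats", "min", "max", "avg", "sum", "percentiles", "cardinality"}


def check(aggs: dict) -> bool:
    """Return True if only bucket aggregations have sub-aggregations.

    `aggs` maps aggregation names to definitions such as
    {"terms": {...}, "aggs": {...}}.
    """
    for definition in aggs.values():
        kinds = [k for k in definition if k != "aggs"]
        if len(kinds) != 1 or kinds[0] not in BUCKET | METRIC:
            return False  # exactly one known aggregation type per definition
        children = definition.get("aggs", {})
        if kinds[0] in METRIC and children:
            return False  # a metric can only be a leaf
        if not check(children):
            return False
    return True


# Parameters such as the histogram interval are omitted; only the shape matters.
tree = {"per_day": {"histogram": {"field": "timestamp"},
                    "aggs": {"visitors": {"cardinality": {"field": "source_ip"}}}}}
bad = {"visitors": {"cardinality": {"field": "source_ip"},
                    "aggs": {"cheapest": {"min": {"field": "price"}}}}}
print(check(tree), check(bad))
```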

  5. traffic analysis
    {
      "source_ip": "77.104.12.13",
      "timestamp": "2014-05-25T23:44:12.779Z"
    }
    [chart: unique visitors per day, Mon to Sun]
    root
      histogram (timestamp)
        cardinality (source_ip)
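
What this tree computes can be modeled in plain Python over a few made-up documents: one bucket per day, then the number of distinct `source_ip` values in each bucket. The sketch counts exactly with a set, whereas Elasticsearch's cardinality aggregation returns an approximate count, so treat it as a model of the semantics only. The function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Toy documents shaped like the slide's traffic log; the values are made up.
docs = [
    {"source_ip": "77.104.12.13", "timestamp": "2014-05-25T23:44:12.779Z"},
    {"source_ip": "77.104.12.13", "timestamp": "2014-05-25T08:01:00.000Z"},
    {"source_ip": "10.0.0.7", "timestamp": "2014-05-25T09:30:00.000Z"},
    {"source_ip": "77.104.12.13", "timestamp": "2014-05-26T10:00:00.000Z"},
]


def unique_visitors_per_day(docs):
    buckets = defaultdict(set)  # histogram(timestamp): one bucket per day
    for doc in docs:
        day = datetime.strptime(doc["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ").date()
        buckets[day].add(doc["source_ip"])  # cardinality(source_ip) per bucket
    return {day: len(ips) for day, ips in sorted(buckets.items())}


print(unique_visitors_per_day(docs))
```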

  6. performance analysis
    {
      "resp_time": 205,
      "timestamp": "2014-05-25T23:44:12.779Z"
    }
    [chart: median, 90th and 99th percentiles of resp_time over time, 0:00 to 21:00]
    root
      histogram (timestamp)
        percentiles (resp_time)
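
The same pattern with a percentiles leaf can be sketched as follows, assuming timestamps have already been truncated to the hour and using exact nearest-rank percentiles. Elasticsearch's percentiles aggregation computes approximations, so real results can differ slightly. The data and function names are made up for illustration.

```python
import math
from collections import defaultdict

# Toy (hour, resp_time) pairs standing in for documents; values are made up.
docs = [(0, 120), (0, 205), (0, 180), (0, 950), (3, 90), (3, 110), (3, 400)]


def percentile(sorted_values, p):
    # Nearest-rank: smallest value such that at least p% of values are <= it.
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def percentiles_over_time(docs, ps=(50, 90, 99)):
    buckets = defaultdict(list)  # histogram(timestamp), hourly buckets
    for hour, resp_time in docs:
        buckets[hour].append(resp_time)
    result = {}
    for hour, values in sorted(buckets.items()):
        values.sort()  # percentiles(resp_time) inside each bucket
        result[hour] = {p: percentile(values, p) for p in ps}
    return result


print(percentiles_over_time(docs))
```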

  7. e-commerce
    {
      "category": "Dresses",
      "site": "Zalando",
      "brand": "Desigual",
      "designation": "dress",
      "price": 85
    }
    • Dresses: 23 offers, 9 sites
      • Urbanist: 12 offers, min_price: 60
      • Desigual: 8 offers, min_price: 85
      • Life: 3 offers, min_price: 52
    • Shoes: 19 offers, 3 sites
    • Skirts: 8 offers, 5 sites
    root
      terms (category)
        cardinality (site)
        terms (brand)
          min (price)
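
The whole e-commerce tree fits in a single nested request body. A sketch assuming the JSON `aggs` syntax of the search API; the aggregation names (`categories`, `sites`, `brands`, `min_price`) are hypothetical, while the aggregation types and fields come from the slide.

```python
import json

# Request mirroring the slide's tree; aggregation names are hypothetical.
request = {
    "size": 0,  # return aggregations only, no hits
    "aggs": {
        "categories": {
            "terms": {"field": "category"},              # bucket per category
            "aggs": {
                "sites": {"cardinality": {"field": "site"}},  # distinct sites
                "brands": {
                    "terms": {"field": "brand"},         # bucket per brand
                    "aggs": {"min_price": {"min": {"field": "price"}}},
                },
            },
        }
    },
}

print(json.dumps(request, indent=2))
```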

  12. why on elasticsearch?
    • powerful when combined with search
    data exploration
    • search engines have had faceted search for a very long time
    storage is optimized for such a workload
    • aggregations are a new iteration
    with increased capabilities / flexibility

  13. why is it fast?
    • data stored to make information retrieval fast
    yet indexing remains faster than you might expect
    • optimized data structures
    compressed columnar storage (field data / doc values)
    strings are enums (per segment)
    • single pass on your data
    no matter how many levels of aggregations there are
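
The "strings are enums (per segment)" point can be illustrated with ordinals: each distinct string in a segment gets a small integer, documents store those integers, and counting terms becomes array indexing in one pass, with strings resolved only at the end. This is a simplified model with hypothetical function names, not Lucene's actual data structures.

```python
def build_segment(values):
    terms = sorted(set(values))                # per-segment sorted term dictionary
    ordinal = {t: i for i, t in enumerate(terms)}
    doc_ords = [ordinal[v] for v in values]    # one small integer per document
    return terms, doc_ords


def count_terms(terms, doc_ords, matching_docs):
    counts = [0] * len(terms)                  # one counter per ordinal
    for doc in matching_docs:                  # single pass over the matches
        counts[doc_ords[doc]] += 1
    # Ordinals are turned back into strings only once, at the end.
    return {terms[o]: c for o, c in enumerate(counts) if c}


terms, doc_ords = build_segment(["Shoes", "Clothing", "Shoes", "Sports", "Sports"])
print(count_terms(terms, doc_ords, matching_docs=range(5)))
```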

  14. how it works (shard level)
    inverted index
      -> top hits collector
      -> aggregations collector
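
This flow can be modeled as one loop over the matching (doc, score) pairs that feeds every collector, so computing aggregations does not require a second pass over the data. The class and function names below are hypothetical stand-ins, not the actual Lucene or Elasticsearch collector classes.

```python
import heapq


class TopHitsCollector:
    """Keeps the k best-scoring documents in a min-heap."""

    def __init__(self, k):
        self.k, self.heap = k, []

    def collect(self, doc, score):
        heapq.heappush(self.heap, (score, doc))
        if len(self.heap) > self.k:
            heapq.heappop(self.heap)  # drop the current worst hit

    def top(self):
        return [doc for score, doc in sorted(self.heap, reverse=True)]


class TermsCollector:
    """Counts documents per value of a column (a toy terms aggregation)."""

    def __init__(self, column):
        self.column, self.counts = column, {}

    def collect(self, doc, score):
        key = self.column[doc]
        self.counts[key] = self.counts.get(key, 0) + 1


def search(matches, collectors):
    # `matches` stands in for what the inverted index produces;
    # every collector sees each match exactly once.
    for doc, score in matches:
        for c in collectors:
            c.collect(doc, score)


category = ["Shoes", "Clothing", "Shoes", "Sports", "Sports"]
hits, terms = TopHitsCollector(k=2), TermsCollector(category)
search([(0, 1.2), (1, 0.4), (2, 2.5), (3, 0.9), (4, 1.7)], [hits, terms])
print(hits.top(), terms.counts)
```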

  15. how it works (shard level)
    documents in a bucket of the parent aggregation:
    Category  Price
    Shoes     60
    Clothing  80
    Shoes     50
    Sports    10
    Sports    35
    terms (category)
      min (price)

  16. how it works (shard level)
    Category  Price
    Shoes     60
    Clothing  80
    Shoes     50
    Sports    10
    Sports    35
    Bucket    Count  Min price
    Shoes     1      60
    terms (category)
      min (price)

  17. how it works (shard level)
    Category  Price
    Shoes     60
    Clothing  80
    Shoes     50
    Sports    10
    Sports    35
    Bucket    Count  Min price
    Shoes     1      60
    Clothing  1      80
    terms (category)
      min (price)

  18. how it works (shard level)
    Category  Price
    Shoes     60
    Clothing  80
    Shoes     50
    Sports    10
    Sports    35
    Bucket    Count  Min price
    Shoes     2      50
    Clothing  1      80
    terms (category)
      min (price)

  19. how it works (shard level)
    Category  Price
    Shoes     60
    Clothing  80
    Shoes     50
    Sports    10
    Sports    35
    Bucket    Count  Min price
    Shoes     2      50
    Clothing  1      80
    Sports    1      10
    terms (category)
      min (price)

  20. how it works (shard level)
    Category  Price
    Shoes     60
    Clothing  80
    Shoes     50
    Sports    10
    Sports    35
    Bucket    Count  Min price
    Shoes     2      50
    Clothing  1      80
    Sports    2      10
    terms (category)
      min (price)
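
The walk-through in slides 15 to 20 can be replayed in a few lines: each document is visited once, creates or updates its category bucket, and updates that bucket's min(price). The function name is hypothetical; the printed states follow the same sequence as the bucket tables on those slides.

```python
# Documents from the slides, visited in order.
docs = [("Shoes", 60), ("Clothing", 80), ("Shoes", 50), ("Sports", 10), ("Sports", 35)]


def terms_min(docs):
    buckets = {}  # category -> [doc_count, min_price]
    for category, price in docs:
        if category not in buckets:
            buckets[category] = [0, float("inf")]  # bucket created on first hit
        bucket = buckets[category]
        bucket[0] += 1                             # terms: doc count
        bucket[1] = min(bucket[1], price)          # min(price) sub-aggregation
        print(category, bucket)                    # state after each document
    return {k: tuple(v) for k, v in buckets.items()}


shard_result = terms_min(docs)
```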

  21. how it works (cluster level)
    Bucket       Count  Min price
    shard 1
    Clothing     5      45
    Shoes        3      60
    Accessories  12     5
    shard 2
    Shoes        2      50
    Clothing     1      80
    Sports       2      10

  22. how it works (cluster level)
    Bucket       Count  Min price
    shard 1
    Clothing     5      45
    Shoes        3      60
    Accessories  12     5
    shard 2
    Shoes        2      50
    Clothing     1      80
    Sports       2      10
    merged
    Shoes        5      50

  23. how it works (cluster level)
    Bucket       Count  Min price
    shard 1
    Clothing     5      45
    Shoes        3      60
    Accessories  12     5
    shard 2
    Shoes        2      50
    Clothing     1      80
    Sports       2      10
    merged
    Shoes        5      50
    Clothing     6      45

  24. how it works (cluster level)
    Bucket       Count  Min price
    shard 1
    Clothing     5      45
    Shoes        3      60
    Accessories  12     5
    shard 2
    Shoes        2      50
    Clothing     1      80
    Sports       2      10
    merged
    Shoes        5      50
    Clothing     6      45
    Sports       2      10

  25. how it works (cluster level)
    Bucket       Count  Min price
    shard 1
    Clothing     5      45
    Shoes        3      60
    Accessories  12     5
    shard 2
    Shoes        2      50
    Clothing     1      80
    Sports       2      10
    merged
    Shoes        5      50
    Clothing     6      45
    Sports       2      10
    Accessories  12     5
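
The cluster-level reduce can be sketched with the two shard results from these slides: buckets with the same key are merged by summing their counts and taking the minimum of their mins. The sketch assumes each shard returns all of its buckets; in practice a terms aggregation only returns the top buckets of each shard (tunable with `shard_size`), which is one reason merged counts can be approximate. The function name is hypothetical.

```python
# Per-shard results: category -> (doc_count, min_price), taken from the slides.
shard_1 = {"Clothing": (5, 45), "Shoes": (3, 60), "Accessories": (12, 5)}
shard_2 = {"Shoes": (2, 50), "Clothing": (1, 80), "Sports": (2, 10)}


def reduce_shards(*shards):
    merged = {}
    for shard in shards:
        for key, (count, min_price) in shard.items():
            if key in merged:
                c, m = merged[key]
                merged[key] = (c + count, min(m, min_price))  # sum counts, min of mins
            else:
                merged[key] = (count, min_price)
    return merged


print(reduce_shards(shard_1, shard_2))
# {'Clothing': (6, 45), 'Shoes': (5, 50), 'Accessories': (12, 5), 'Sports': (2, 10)}
```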

  26. goodies
    • support for document relations
    via nested documents and the nested/reverse_nested aggs
    no parent/child support (yet?)
    • significant_terms
    find the uncommonly common
    • upcoming top_hits aggregation in 1.3
    compute top hits on each bucket
    • performance / memory usage improved in 1.2
    upgrade if you rely on aggregations
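
For significant_terms, "uncommonly common" means a term is much more frequent in the query results (the foreground set) than in the whole index (the background set). The sketch below scores a term that way, modeled on the JLH heuristic assumed here to be the significant_terms default; the exact formula and edge-case handling in Elasticsearch may differ, and the `jlh` function is a hypothetical name.

```python
def jlh(fg_count, fg_total, bg_count, bg_total):
    """Score how over-represented a term is in the foreground set."""
    fg = fg_count / fg_total  # frequency among the query results
    bg = bg_count / bg_total  # frequency in the whole index
    if bg == 0 or fg <= bg:
        return 0.0            # not over-represented: no score
    # absolute change in frequency times relative change in frequency
    return (fg - bg) * (fg / bg)


rare_but_frequent_here = jlh(10, 100, 100, 100_000)  # 10% vs 0.1%
common_everywhere = jlh(50, 100, 50_000, 100_000)    # 50% vs 50%
print(rare_but_frequent_here, common_everywhere)
```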

  27. thank you!
