Deep dive into Aggregations

Elasticsearch 1.0 features a completely new way of doing analytics called Aggregations. Aggregations are more powerful than their predecessor, facets, but their concepts take a bit more time to grasp. This talk introduces Aggregations step by step and walks through some use cases that show how easy it is to extract useful information from your data.

Boaz Leskes

April 29, 2014

Transcript

  1. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
    Deep dive into analytics using Aggregations
    Boaz Leskes
    @bleskes

  2.
    Elasticsearch: an end-to-end search and analytics platform.

  3.
    • full text search
    • highlighted search snippets
    • search-as-you-type
    • did-you-mean suggestions

  4.
    • combines full text search with geolocation
    • uses more-like-this to find related questions and answers

  5.
    • search repositories, users, issues, pull requests
    • search 130 billion lines of code
    • track all alerts, events, logs

  6.
    • index and analyse 5TB of log data every day

  7.
    • combine visitor logs with social network data
    • real-time feedback to editors

  9.
    Feature summary
    • Fully-featured search
        Relevance-ranked text search
        Scalable search
        High-performance geo, temporal, numeric range and key lookup
        Highlighting
        Support for complex document types (nested structures) *
        Spelling suggestions
        Powerful query DSL *
        “Standing” queries *
        Real-time results *
        Extensible via plugins *
    • Powerful faceting/analysis
        Summarise large sets by any combination of time, geo, category and more. *
        “Kibana” visualisation tool *
    • Management
        Simple and robust deployments *
        REST APIs for handling all aspects of administration/monitoring *
        “Marvel” console for monitoring and administering clusters *
        Special features to manage the life cycle of content *
    • Integration
        Hadoop (MapReduce, Hive, Pig, Cascading…) *
        Client libraries (Python, Java, Ruby, JavaScript…)
        Data connectors (Twitter, JMS…)
        Logstash ETL framework *
    • Support
        Development and Production support with tiered levels
        Support staff are the core developers of the product *
    * Features we see as differentiators

  10.
    data === json
    Let’s talk data

  11.
    {
      "created_at": "Mon Apr 28 12:31:48 +0000 2014",
      "id": 460758276159062000,
      "text": "Prepping up for my talk tomorrow at #noslq14 Cologne, where I’ll spend the coming two days. Drop by for everything #elasticsearch related.",
      "user": {
        "id": 15037017,
        "name": "Boaz Leskes",
        "screen_name": "bleskes",
        "location": "Amsterdam",
        "description": "Coder at Elasticsearch",
        "time_zone": "Amsterdam"
      },
      "geo": null,
      "retweet_count": 1,
      "entities": {
        "hashtags": [
          { "text": "noslq14" },
          { "text": "elasticsearch" }
        ],
        "symbols": [],
        "urls": [],
        "user_mentions": []
      },
      "favorited": false,
      "retweeted": false,
      "lang": "en"
    }

  12.
    {
      "dt": "2014-03-03T02:01:48.026Z",
      "url": "http://www.theguardian.com/film/2014/mar/03/oscars-2014-winners",
      "queryString": "",
      "host": "www.theguardian.com",
      "path": "/film/2014/mar/03/oscars-2014-winners-list",
      "section": "film",
      "platform": "r2",
      "userAgent": {
        "type": "Browser",
        "family": "Safari 5.1.9",
        "os": "OS X 10.6.8",
        "device": "Personal computer"
      },
      "documentReferrer": "http://www.theguardian.com/world",
      "browser": {
        "id": "gA6RUFLhWNQvWdt0rW4r78Fg",
        "isNew": false
      },
      "referringHost": "theguardian.com",
      "referringPath": "/world",
      "isContent": true,
      "contentPublicationDate": "2014-03-03",
      "countryCode": "US",
      "countryName": "United States",
      "location": {
        "lonlat": [-73.4409, 41.2094]
      }
    }

  13.
    Data can be anything
    • Questions
    • Code
    • Logs
    • Credit card transactions
    • Click logs
    • …

  14.
    aggregations == 50km view == patterns

  15.
    aggregations == 50km view == patterns
    insights

  16.
    a simple UI element…

  17.
    … or more complex …

  18.
    … even more complex?

  19.
    back to search
    Underpinnings

  20.
    Powerful search engine
    GET tweets/tweet/_search
    {
      "query": {
        "filtered": {
          "query": {
            "match": {
              "text": "jumping"
            }
          },
          "filter": {
            "range": {
              "created_at": {
                "from": "2014-01-28T05:16:29+00:00",
                "to": "now"
              }
            }
          }
        }
      }
    }
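
The request above pairs a scored full-text `match` with an unscored `range` filter. As a rough sketch of how a client might assemble the same body, the Python below builds it as a plain dictionary. The helper name `build_tweet_search` is hypothetical and no Elasticsearch client library is assumed; only the JSON shape comes from the slide.

```python
import json
from datetime import datetime, timezone


def build_tweet_search(text, since):
    """Return a request body shaped like the slide's filtered query."""
    return {
        "query": {
            "filtered": {
                # Scored part: which tweets match the text, and how well.
                "query": {"match": {"text": text}},
                # Unscored yes/no part: only tweets created since `since`.
                "filter": {
                    "range": {
                        "created_at": {"from": since.isoformat(), "to": "now"}
                    }
                },
            }
        }
    }


since = datetime(2014, 1, 28, 5, 16, 29, tzinfo=timezone.utc)
body = build_tweet_search("jumping", since)
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the body of `GET tweets/tweet/_search`.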

  21.
    Inverted index
    term              doc ids
    nosql             2 6 8 48 112 379
    128               6 9 10 48
    New York          11 13 14 134 207
    lat=6.9 lon=50    6 9
    F                 2 4 9 36 103 310
  22.
    Our goal
    results: 6, 8, 11, 38, 153

  23.
    Counting
    results: 6, 8, 11, 38, 153
    Accessories, Lenses, Optics, Cameras

  24.
    Counting
    results: 6, 8, 11, 38, 153
    Accessories, Lenses, Optics, Cameras
    counts: 3, 1, 2, 0

  25.
    Field data
    ord  term              doc ids (inverted index)
    1    nosql             2 6 8 48 112 379
    2    128               6 9 10 48
    3    New York          11 13 14 134 207
    4    lat=6.9 lon=50    6 9
    5    F                 2 4 9 36 103 310

    doc id  term ords (field data)
    2       1 5
    4       5
    6       1 2 4
    8       1
    9       2 4 5
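
Field data is the inverted index turned around: for each document, the ordinals of the terms it contains. The sketch below rebuilds the slide's document rows from its posting lists and then uses them to count terms over a result set, which is the lookup that counting needs. Function names are hypothetical, and the in-memory layout here is an illustration rather than Elasticsearch's actual data structure.

```python
from collections import defaultdict

# Terms and posting lists from the slide; ordinals are 1-based.
TERMS = ["nosql", "128", "New York", "lat=6.9 lon=50", "F"]
POSTINGS = [
    [2, 6, 8, 48, 112, 379],
    [6, 9, 10, 48],
    [11, 13, 14, 134, 207],
    [6, 9],
    [2, 4, 9, 36, 103, 310],
]


def uninvert(postings):
    """Build doc id -> list of term ordinals from term -> doc ids."""
    field_data = defaultdict(list)
    for ordinal, doc_ids in enumerate(postings, start=1):
        for doc_id in doc_ids:
            field_data[doc_id].append(ordinal)
    return dict(field_data)


def count_terms(doc_ids, field_data, terms):
    """Count how many of the given documents contain each term."""
    counts = {}
    for doc_id in doc_ids:
        for ordinal in field_data.get(doc_id, []):
            term = terms[ordinal - 1]
            counts[term] = counts.get(term, 0) + 1
    return counts


field_data = uninvert(POSTINGS)
for doc_id in (2, 4, 6, 8, 9):
    print(doc_id, field_data[doc_id])
print(count_terms([6, 8, 9], field_data, TERMS))
```

The first five printed rows match the document rows on the slide.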

  26.
    analysis lego
    Introducing aggregations

  27.
    Before there were facets
    • facets are awesome
    • they have served well and long
    • but… they are not scalable from a functionality perspective

  28.
    Analysis lego

  29.
    Buckets and metrics
    results: 6, 8, 11, 38, 153
    buckets: Accessories, Lenses, Optics, Cameras
    metrics: 3, 1, 2, 0

  30.
    Buckets and metrics
    results: 6, 8, 11, 38, 153
    buckets: Accessories, Lenses, Optics, Cameras
    metrics: 3, 1, 2, 0
    [figure also shows year labels: 2013, 2012, 2012, 2013]

  31.
    json style
    GET localhost:9200/_search
    {
      "aggs" : {
        "countries" : {
          "terms" : { "field" : "country" },
          "aggs" : {
            "subjects" : {
              "terms" : {
                "field" : "subject"
              },
              "aggs" : {
                "avg_score" : {
                  "avg" : {
                    "field" : "score"
                  }
                }
              }
            }
          }
        }
      }
    }
    {
      "hits" : { ... },
      "aggregations" : {
        "countries" : {
          "buckets" : [
            {
              "key" : "USA",
              "doc_count" : 5,
              "aggregations" : {
                "subjects" : {
                  "buckets" : [
                    {
                      "key" : "Mathematics",
                      "doc_count" : 3,
                      "aggregations" : {
                        "avg_score" : {
                          "value" : 87.5
                        }
                      }
                    },
                    ...
                  ]
                }
              }
            },
            ...
          ]
        }
      }
    }
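
The nesting above reads as "bucket by country, then by subject inside each country, then compute one metric per innermost bucket". The Python sketch below mimics that over a handful of in-memory documents. The documents and function names are made up for illustration (only the field names come from the slide), and the returned dictionary is a simplified shape, not the Elasticsearch response format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical documents; only the field names come from the slide.
DOCS = [
    {"country": "USA", "subject": "Mathematics", "score": 90.0},
    {"country": "USA", "subject": "Mathematics", "score": 85.0},
    {"country": "USA", "subject": "Physics", "score": 70.0},
    {"country": "NL", "subject": "Mathematics", "score": 80.0},
]


def terms_buckets(docs, field):
    """Bucket step: group documents by the value of one field."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc)
    return buckets


def countries_subjects_avg(docs):
    """Nest two bucket steps, then compute an avg metric per leaf bucket."""
    out = {}
    for country, country_docs in terms_buckets(docs, "country").items():
        subjects = {}
        for subject, subject_docs in terms_buckets(country_docs, "subject").items():
            subjects[subject] = {
                "doc_count": len(subject_docs),  # comes for free with a bucket
                "avg_score": mean(d["score"] for d in subject_docs),
            }
        out[country] = {"doc_count": len(country_docs), "subjects": subjects}
    return out


result = countries_subjects_avg(DOCS)
print(result["USA"]["subjects"]["Mathematics"])
```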

  32.
    measure all the things
    • doc count (free!)
    • avg
    • min
    • max
    • sum
    • count
    • stats
    • extended stats
    • cardinality
    • percentiles

  33.
    Buckets
    • global
    • filter
    • missing
    • terms
    • range
    • date range
    • ip range
    • histogram
    • date histogram
    • geo distance
    • nested
    • geohash grid
    • significant terms

  34.
    Example - range bucket
    GET localhost:9200/grades/grade/_search
    {
      "aggs" : {
        "age_groups" : {
          "range" : {
            "field" : "age",
            "ranges" : [
              { "from" : 5, "to" : 10 },
              { "from" : 10 }
            ]
          },
          "aggs" : {
            "avg_grade" : {
              "avg" : {
                "field" : "grade"
              }
            }
          }
        }
      }
    }
    "age_groups": {
      "buckets": [
        {
          "from": 5,
          "to": 10,
          "doc_count": 911,
          "avg_grade": {
            "value": 81.603
          }
        },
        {
          "from": 10,
          "doc_count": 2276,
          "avg_grade": {
            "value": 82.357
          }
        }
      ]
    }
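
As a rough illustration of what the range bucket does, the sketch below assigns hypothetical documents to the slide's two ranges and computes the `avg_grade` metric per bucket. It treats `from` as inclusive and `to` as exclusive, which is how the range aggregation defines its bounds. The sample documents and the function name `range_buckets` are assumptions made for the example.

```python
from statistics import mean

# Hypothetical documents; field names follow the slide.
DOCS = [
    {"age": 4, "grade": 70.0},
    {"age": 5, "grade": 80.0},
    {"age": 9, "grade": 90.0},
    {"age": 10, "grade": 60.0},
    {"age": 15, "grade": 100.0},
]

RANGES = [{"from": 5, "to": 10}, {"from": 10}]


def range_buckets(docs, field, ranges, metric_field):
    """One bucket per range: `from` is inclusive, `to` is exclusive."""
    buckets = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        members = [
            d for d in docs
            if (lo is None or d[field] >= lo) and (hi is None or d[field] < hi)
        ]
        bucket = dict(r)
        bucket["doc_count"] = len(members)
        avg = mean(d[metric_field] for d in members) if members else None
        bucket["avg_grade"] = {"value": avg}
        buckets.append(bucket)
    return buckets


buckets = range_buckets(DOCS, "age", RANGES, "grade")
for bucket in buckets:
    print(bucket)
```

The document with age 4 lands in no bucket, and age 10 falls only in the second range because the first range's upper bound is exclusive.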

  35.
    Example - histogram
    GET grades/grade/_search
    {
      "aggs" : {
        "grades_distribution" : {
          "histogram" : {
            "field" : "grade",
            "interval" : 10
          }
        }
      }
    }

  36.
    Example - histogram
    GET grades/grade/_search
    {
      "aggs" : {
        "grades_distribution" : {
          "histogram" : {
            "field" : "grade",
            "interval" : 10
          }
        }
      }
    }
    "aggregations": {
      "grades_distribution": {
        "buckets": [
          { "key": 60, "doc_count": 467 },
          { "key": 70, "doc_count": 873 },
          { "key": 80, "doc_count": 930 },
          { "key": 90, "doc_count": 915 }
        ]
      }
    }
    [Bar chart of doc_count per bucket: 60 → 467, 70 → 873, 80 → 930, 90 → 915; y-axis from 0 to 1000]
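
A histogram bucket key is the value rounded down to a multiple of the interval. The short sketch below shows that rule on a few made-up grades; the function name `histogram` and the sample values are assumptions, and this version only emits buckets that contain at least one value.

```python
import math
from collections import Counter


def histogram(values, interval):
    """Bucket each value under key = floor(value / interval) * interval."""
    counts = Counter(math.floor(v / interval) * interval for v in values)
    return [{"key": key, "doc_count": counts[key]} for key in sorted(counts)]


# Hypothetical grades, two per bucket.
grades = [61, 69.5, 70, 79, 80, 88, 90, 99]
buckets = histogram(grades, interval=10)
for bucket in buckets:
    print(bucket)
```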

  37.
    analytics as search
    Significant terms

  38.
    Common crimes
    GET ukcrimes/_search
    {
      "query": { },
      "aggregations" : {
        "map" : {
          "geohash_grid" : {
            "field" : "location",
            "precision" : 5
          },
          "aggregations" : {
            "most_popular_crime_type" : {
              "terms" : {
                "field" : "crime_type", "size" : 1
              }
            }
          }
        }
      }
    }
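
The `geohash_grid` bucket groups documents by the grid cell their location falls in, and `precision` 5 means a five-character geohash. The sketch below implements the standard public geohash encoding and then picks the most frequent crime type per cell, which is roughly what the `terms` sub-aggregation with `size` 1 asks for. The sample documents, their crime types, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat, lon, precision=5):
    """Encode a point as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits = []
    use_lon = True  # bits alternate between the axes, longitude first
    while len(bits) < precision * 5:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid  # keep the upper half
        else:
            bits.append(0)
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):
        index = 0
        for bit in bits[i:i + 5]:
            index = (index << 1) | bit
        chars.append(BASE32[index])
    return "".join(chars)


def most_common_per_cell(docs, precision=5):
    """Bucket docs by geohash cell, then keep the top crime type per cell."""
    cells = defaultdict(Counter)
    for doc in docs:
        lat, lon = doc["location"]
        cells[geohash(lat, lon, precision)][doc["crime_type"]] += 1
    return {cell: counts.most_common(1)[0][0] for cell, counts in cells.items()}


# Hypothetical documents: (lat, lon) pairs and made-up crime types.
DOCS = [
    {"location": (57.64911, 10.40744), "crime_type": "burglary"},
    {"location": (57.6492, 10.4075), "crime_type": "burglary"},
    {"location": (57.6493, 10.4076), "crime_type": "bike theft"},
    {"location": (51.5074, -0.1278), "crime_type": "bike theft"},
]

top = most_common_per_cell(DOCS)
print(geohash(57.64911, 10.40744))  # u4pru
print(top)
```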

  39.
    Geo-what?

  40.
    Geo-what?

  41.
    Geo-what?

  42.
    The common terms problem

  43.
    Uncommonly common
    GET ukcrimes/_search
    {
      "query": { },
      "aggregations" : {
        "map" : {
          "geohash_grid" : {
            "field" : "location",
            "precision" : 5
          },
          "aggregations" : {
            "most_popular_crime_type" : {
              "significant_terms" : {
                "field" : "crime_type", "size" : 1
              }
            }
          }
        }
      }
    }
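
The difference between `terms` and `significant_terms` can be sketched by comparing how often a term occurs inside one bucket (the foreground) with how often it occurs overall (the background). The scoring rule below, absolute change in share multiplied by relative change, is one simple heuristic chosen for illustration; the scoring Elasticsearch actually applies may differ. The crime types and counts are invented.

```python
from collections import Counter


def significant_terms(foreground, background, size=1):
    """Rank terms that are more frequent in foreground than in background."""
    fg_counts, bg_counts = Counter(foreground), Counter(background)
    scored = []
    for term, fg in fg_counts.items():
        fg_share = fg / len(foreground)
        bg_share = bg_counts[term] / len(background)
        if bg_share == 0 or fg_share <= bg_share:
            continue  # not over-represented in this bucket
        # Illustrative score: absolute change times relative change.
        score = (fg_share - bg_share) * (fg_share / bg_share)
        scored.append((score, term))
    scored.sort(reverse=True)
    return [term for _, term in scored[:size]]


# Hypothetical counts: all crimes (background) and one grid cell (foreground).
background = ["anti-social behaviour"] * 80 + ["burglary"] * 15 + ["bike theft"] * 5
foreground = ["anti-social behaviour"] * 6 + ["burglary"] * 1 + ["bike theft"] * 3

print(Counter(foreground).most_common(1)[0][0])   # most common in the cell
print(significant_terms(foreground, background))  # uncommonly common in the cell
```

With these numbers the most common type in the cell is the one that is common everywhere, while the significant term is the one whose share in the cell is far above its overall share.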

  44.
    Uncommonly common

  45.
    Demo!

  46.
    thank you!
    http://elasticsearch.com/support
    @elasticsearch , @bleskes
    http://elasticsearch.org/resources
