Deep dive into Aggregations

Elasticsearch 1.0 features a completely new way of doing analytics called Aggregations. Aggregations are more powerful than their predecessor, facets, but their concepts take a bit more time to grasp. This talk introduces Aggregations step by step and shows some use cases that demonstrate how easy it is to extract useful information from your data.

Boaz Leskes

April 29, 2014

Transcript

  1. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited. Deep dive into analytics using Aggregation. Boaz Leskes, @bleskes
  2. Elasticsearch: an end-to-end search and analytics platform.
  3. • full text search • highlighted search snippets • search-as-you-type • did-you-mean suggestions
  4. • combines full text search with geolocation • uses more-like-this to find related questions and answers
  5. • search repositories, users, issues, pull requests • search 130 billion lines of code • track all alerts, events, logs
  6. • index and analyse 5TB of log data every day
  7. • combine visitor logs with social network data • real-time feedback to editors
  9. Feature summary (* = features we see as differentiators)
     • Fully-featured search
         - Relevance-ranked text search
         - Scalable search
         - High-performance geo, temporal, numeric range and key lookup
         - Highlighting
         - Support for complex document types (nested structures) *
         - Spelling suggestions
         - Powerful query DSL *
         - “Standing” queries *
         - Real-time results *
         - Extensible via plugins *
     • Powerful faceting/analysis
         - Summarise large sets by any combinations of time, geo, category and more. *
         - “Kibana” visualisation tool *
     • Management
         - Simple and robust deployments *
         - REST APIs for handling all aspects of administration/monitoring *
         - “Marvel” console for monitoring and administering clusters *
         - Special features to manage the life cycle of content *
     • Integration
         - Hadoop (MapRed, Hive, Pig, Cascading..) *
         - Client libraries (Python, Java, Ruby, javascript…)
         - Data connectors (Twitter, JMS…)
         - Logstash ETL framework *
     • Support
         - Development and Production support with tiered levels
         - Support staff are the core developers of the product *
  10. Let’s talk data: data === json
  11. {
        "created_at": "Mon Apr 28 12:31:48 +0000 2014",
        "id": 460758276159062000,
        "text": "Prepping up for my talk tomorrow at #noslq14 Cologne, where I’ll spend the coming two days. Drop by for everything #elasticsearch related.",
        "user": {
          "id": 15037017,
          "name": "Boaz Leskes",
          "screen_name": "bleskes",
          "location": "Amsterdam",
          "description": "Coder at Elasticsearch",
          "time_zone": "Amsterdam"
        },
        "geo": null,
        "retweet_count": 1,
        "entities": {
          "hashtags": [ { "text": "noslq14" }, { "text": "elasticsearch" } ],
          "symbols": [],
          "urls": [],
          "user_mentions": []
        },
        "favorited": false,
        "retweeted": false,
        "lang": "en"
      }
  12. {
        "dt": "2014-03-03T02:01:48.026Z",
        "url": "http://www.theguardian.com/film/2014/mar/03/oscars-2014-winners",
        "queryString": "",
        "host": "www.theguardian.com",
        "path": "/film/2014/mar/03/oscars-2014-winners-list",
        "section": "film",
        "platform": "r2",
        "userAgent": {
          "type": "Browser",
          "family": "Safari 5.1.9",
          "os": "OS X 10.6.8",
          "device": "Personal computer"
        },
        "documentReferrer": "http://www.theguardian.com/world",
        "browser": { "id": "gA6RUFLhWNQvWdt0rW4r78Fg", "isNew": false },
        "referringHost": "theguardian.com",
        "referringPath": "/world",
        "isContent": true,
        "contentPublicationDate": "2014-03-03",
        "countryCode": "US",
        "countryName": "United States",
        "location": { "lonlat": [-73.4409, 41.2094] }
      }
  13. Data can be anything • Questions • Code • Logs • Credit card transactions • Click logs • …
  14. aggregations == 50km view == patterns
  15. aggregations == 50km view == patterns insights
  16. a simple UI element…
  17. … or more complex …
  18. … even more complex?
  19. Underpinnings: back to search
  20. Powerful search engine
      GET tweets/tweet/_search
      {
        "query": {
          "filtered": {
            "query": { "match": { "text": "jumping" } },
            "filter": {
              "range": {
                "created_at": {
                  "from": "2014-01-28T05:16:29+00:00",
                  "to": "now"
                }
              }
            }
          }
        }
      }
  21. Inverted index
      term             doc ids
      nosql            2 6 8 48 112 379
      128              6 9 10 48
      New York         11 13 14 134 207
      lat=6.9 lon=50   6 9
      F                2 4 9 36 103 310
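
The idea behind the inverted index slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Lucene or Elasticsearch store their index: the function names `build_inverted_index` and `search_all` are hypothetical, and the sample documents are a small made-up set.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all(index, terms):
    """Return ids of documents containing every term (posting-list intersection)."""
    postings = [set(index.get(term, ())) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical documents: doc id -> terms extracted from its fields.
docs = {
    2: ["nosql", "F"],
    4: ["F"],
    6: ["nosql", "128", "lat=6.9 lon=50"],
    8: ["nosql"],
    9: ["128", "lat=6.9 lon=50", "F"],
}

index = build_inverted_index(docs)
print(index["nosql"])                       # posting list for one term
print(search_all(index, ["nosql", "128"]))  # documents matching both terms
```

Looking up a term is a single dictionary access, and combining a query with a filter comes down to intersecting posting lists.
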
  22. Our goal. Results: 6, 8, 11, 38, 153.
  23. Counting. Results: 6, 8, 11, 38, 153. Categories: Accessories, Lenses, Optics, Cameras.
  24. Counting. Results: 6, 8, 11, 38, 153. Categories: Accessories, Lenses, Optics, Cameras. Counts: 3, 1, 2, 0.
  25. Field data
      Inverted index (term number, term, doc ids):
      1  nosql            2 6 8 48 112 379
      2  128              6 9 10 48
      3  New York         11 13 14 134 207
      4  lat=6.9 lon=50   6 9
      5  F                2 4 9 36 103 310
      Field data (doc id, term numbers):
      2  1 5
      4  5
      6  1 2 4
      8  1
      9  2 4 5
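
Counting terms for a result set needs the opposite lookup direction from search: from document to terms. The following Python sketch shows that "un-inverting" step and the counting that follows. The names `build_field_data` and `count_terms` are hypothetical, and the assignment of documents to categories is an assumption chosen so the totals match the 3, 1, 2, 0 shown on the counting slide.

```python
from collections import Counter


def build_field_data(inverted_index):
    """Un-invert an index: map each doc id to the list of terms it contains."""
    field_data = {}
    for term, doc_ids in inverted_index.items():
        for doc_id in doc_ids:
            field_data.setdefault(doc_id, []).append(term)
    return field_data


def count_terms(field_data, result_ids):
    """Count how often each term occurs among the documents of a result set."""
    counts = Counter()
    for doc_id in result_ids:
        counts.update(field_data.get(doc_id, []))
    return counts


# Hypothetical posting lists (term -> doc ids); the pairing is assumed.
inverted_index = {
    "Accessories": [6, 8, 153],
    "Lenses": [11],
    "Optics": [8, 38],
    "Cameras": [2, 4],
}

field_data = build_field_data(inverted_index)
counts = count_terms(field_data, [6, 8, 11, 38, 153])
print(counts)
```

Each result document is visited once, so the cost of counting grows with the size of the result set rather than with the number of distinct terms.
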
  26. Introducing aggregations: analysis lego
  27. Before there were facets • facets are awesome • they served well and long • but… they are not scalable from a functionality perspective
  28. Analysis lego
  29. Buckets and metrics. Results: 6, 8, 11, 38, 153. Buckets: Accessories, Lenses, Optics, Cameras. Metrics: 3, 1, 2, 0.
  30. Buckets and metrics. Results: 6, 8, 11, 38, 153. Buckets: Accessories, Lenses, Optics, Cameras. Metrics: 3, 1, 2, 0. Year labels: 2013, 2012, 2012, 2013.
  31. json style
      GET localhost:9200/_search
      {
        "aggs" : {
          "countries" : {
            "terms" : { "field" : "country" },
            "aggs" : {
              "subjects" : {
                "terms" : { "field" : "subject" },
                "aggs" : {
                  "avg_score" : { "avg" : { "field" : "score" } }
                }
              }
            }
          }
        }
      }

      {
        "hits" : { ... },
        "aggregations" : {
          "countries" : {
            "buckets" : [
              {
                "key" : "USA",
                "doc_count" : 5,
                "aggregations" : {
                  "subjects" : {
                    "buckets" : [
                      {
                        "key" : "Mathematics",
                        "doc_count" : 3,
                        "aggregations" : {
                          "avg_score" : { "value" : 87.5 }
                        }
                      },
                      ...
                    ]
                  }
                }
              },
              ...
            ]
          }
        }
      }
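
To make the nesting in the slide above concrete, here is a rough in-memory Python equivalent: bucket by country, bucket each country by subject, then compute an average score per subject. It is a sketch of the concept only, not of how Elasticsearch executes the request. The helper names and the sample documents are hypothetical.

```python
from collections import defaultdict


def terms_buckets(docs, field):
    """Group documents by the value of `field`, like a terms bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        if field in doc:
            buckets[doc[field]].append(doc)
    return buckets


def avg(docs, field):
    """Average of `field` over the documents, like an avg metric."""
    values = [doc[field] for doc in docs if field in doc]
    return sum(values) / len(values) if values else None


def countries_subjects_avg(docs):
    """Nest a subject bucket and an avg metric inside each country bucket."""
    result = {}
    for country, country_docs in terms_buckets(docs, "country").items():
        subjects = {}
        for subject, subject_docs in terms_buckets(country_docs, "subject").items():
            subjects[subject] = {
                "doc_count": len(subject_docs),
                "avg_score": avg(subject_docs, "score"),
            }
        result[country] = {"doc_count": len(country_docs), "subjects": subjects}
    return result


# Hypothetical documents, invented for the example.
docs = [
    {"country": "USA", "subject": "Mathematics", "score": 90},
    {"country": "USA", "subject": "Mathematics", "score": 85},
    {"country": "USA", "subject": "Physics", "score": 70},
    {"country": "Germany", "subject": "Mathematics", "score": 80},
]

result = countries_subjects_avg(docs)
print(result["USA"]["subjects"]["Mathematics"])
```

Every bucket is just a set of documents, so any bucket or metric can be applied again inside it. That is the "lego" property the talk refers to.
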
  32. measure all the things • doc count (free!) • avg • min • max • sum • count • stats • extended stats • cardinality • percentiles
  33. Buckets • global • filter • missing • terms • range • date range • ip range • histogram • date histogram • geo distance • nested • geohash grid • significant terms
  34. Example - range bucket
      GET localhost:9200/grades/grade/_search
      {
        "aggs" : {
          "age_groups" : {
            "range" : {
              "field" : "age",
              "ranges" : [
                { "from" : 5, "to" : 10 },
                { "from" : 10 }
              ]
            },
            "aggs" : {
              "avg_grade" : { "avg" : { "field" : "grade" } }
            }
          }
        }
      }

      "age_groups": {
        "buckets": [
          { "from": 5, "to": 10, "doc_count": 911, "avg_grade": { "value": 81.603 } },
          { "from": 10, "doc_count": 2276, "avg_grade": { "value": 82.357 } }
        ]
      }
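
The range bucket above can be mimicked in plain Python. The sketch assumes the usual range semantics, where "from" is inclusive and "to" is exclusive, and an omitted bound means the range is open on that side. The function name `range_aggregation` and the sample data are hypothetical.

```python
def range_aggregation(docs, field, ranges, metric_field):
    """Bucket docs by numeric ranges and average `metric_field` per bucket."""
    buckets = []
    for r in ranges:
        lo = r.get("from")  # inclusive lower bound, None means unbounded
        hi = r.get("to")    # exclusive upper bound, None means unbounded
        values = [
            doc[metric_field]
            for doc in docs
            if (lo is None or doc[field] >= lo) and (hi is None or doc[field] < hi)
        ]
        bucket = dict(r)
        bucket["doc_count"] = len(values)
        bucket["avg"] = sum(values) / len(values) if values else None
        buckets.append(bucket)
    return buckets


# Hypothetical documents, invented for the example.
docs = [
    {"age": 3, "grade": 60},
    {"age": 5, "grade": 80},
    {"age": 9, "grade": 90},
    {"age": 10, "grade": 70},
    {"age": 12, "grade": 90},
]

buckets = range_aggregation(
    docs, "age", [{"from": 5, "to": 10}, {"from": 10}], "grade"
)
print(buckets)
```

The document with age 3 falls into neither range and is not counted, and the one with age 10 lands only in the second bucket because the upper bound of the first is exclusive.
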
  35. Example - histogram
      GET grades/grade/_search
      {
        "aggs" : {
          "grades_distribution" : {
            "histogram" : { "field" : "grade", "interval" : 10 }
          }
        }
      }
  36. Example - histogram
      GET grades/grade/_search
      {
        "aggs" : {
          "grades_distribution" : {
            "histogram" : { "field" : "grade", "interval" : 10 }
          }
        }
      }

      "aggregations": {
        "grades_distribution": {
          "buckets": [
            { "key": 60, "doc_count": 467 },
            { "key": 70, "doc_count": 873 },
            { "key": 80, "doc_count": 930 },
            { "key": 90, "doc_count": 915 }
          ]
        }
      }

      (Bar chart of doc_count per bucket: 60: 467, 70: 873, 80: 930, 90: 915.)
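
A histogram bucket key is the value rounded down to a multiple of the interval. The Python sketch below shows that rule on a handful of invented grades. The function name `histogram` is hypothetical, and the sketch only returns buckets that contain at least one value.

```python
import math
from collections import Counter


def histogram(values, interval):
    """Bucket values by rounding each one down to a multiple of `interval`."""
    counts = Counter(math.floor(v / interval) * interval for v in values)
    return [{"key": key, "doc_count": counts[key]} for key in sorted(counts)]


# Hypothetical grades, invented for the example.
grades = [61, 67, 70, 75, 79, 80, 88, 93, 99, 100]

buckets = histogram(grades, 10)
print(buckets)
```

With an interval of 10, a grade of 79 goes into the bucket with key 70, and a grade of 100 opens its own bucket with key 100.
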
  37. Significant terms: analytics as search
  38. Common crimes
      GET ukcrimes/_search
      {
        "query": { },
        "aggregations" : {
          "map" : {
            "geohash_grid" : {
              "field" : "location",
              "precision" : 5
            },
            "aggregations" : {
              "most_popular_crime_type" : {
                "terms" : { "field" : "crime_type", "size" : 1 }
              }
            }
          }
        }
      }
  39. Geo-what?
  40. Geo-what?
  41. Geo-what?
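
A geohash turns a latitude/longitude pair into a short string that names a grid cell; longer strings name smaller cells. The Python sketch below implements the standard public geohash encoding as an illustration of what the "precision" setting controls. The function name `geohash_encode` is hypothetical, and the code is not taken from Elasticsearch.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=5):
    """Encode a point as a geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


cell = geohash_encode(57.64911, 10.40744, precision=5)
print(cell)
```

Because each extra character only refines the previous cell, all points whose geohash shares a prefix lie in the same larger cell, which is what makes geohashes convenient bucket keys.
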
  42. The common terms problem
  43. Uncommonly common
      GET ukcrimes/_search
      {
        "query": { },
        "aggregations" : {
          "map" : {
            "geohash_grid" : {
              "field" : "location",
              "precision" : 5
            },
            "aggregations" : {
              "most_popular_crime_type" : {
                "significant_terms" : { "field" : "crime_type", "size" : 1 }
              }
            }
          }
        }
      }
  44. Uncommonly common
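
The difference between "most common" and "uncommonly common" can be shown with a small Python sketch. It compares how frequent a term is in a subset (the foreground, for example one map cell) with how frequent it is overall (the background). The scoring formula here is a simple heuristic chosen for illustration and should not be read as the exact score Elasticsearch computes. The function name `significant_terms` and the crime data are hypothetical.

```python
from collections import Counter


def significant_terms(foreground, background, size=1):
    """Rank terms that are over-represented in `foreground` versus `background`."""
    fg_counts = Counter(foreground)
    bg_counts = Counter(background)
    scored = []
    for term, fg_count in fg_counts.items():
        fg_pct = fg_count / len(foreground)
        bg_pct = bg_counts[term] / len(background)
        if bg_pct == 0 or fg_pct <= bg_pct:
            continue  # not over-represented, or absent from the background
        # Illustrative heuristic: absolute change times relative change.
        score = (fg_pct - bg_pct) * (fg_pct / bg_pct)
        scored.append((score, term))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:size]]


# Hypothetical data: crime types in the whole index and in one map cell.
background = (
    ["anti-social behaviour"] * 60 + ["burglary"] * 30 + ["bicycle theft"] * 10
)
foreground = (
    ["anti-social behaviour"] * 5 + ["burglary"] * 1 + ["bicycle theft"] * 4
)

most_common = Counter(foreground).most_common(1)[0][0]
most_significant = significant_terms(foreground, background, size=1)
print(most_common, most_significant)
```

In this made-up cell the most common crime is the same one that dominates everywhere, while the significant term is the one that occurs far more often here than in the data as a whole.
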
  45. Demo!
  46. thank you! http://elasticsearch.com/support @elasticsearch, @bleskes http://elasticsearch.org/resources