Slide 1

Slide 1 text

Elasticsearch Aggregation Kanji Yomoda (@k-yomo) May 2022

Slide 2

Slide 2 text

Confidential & Proprietary 2021 Agenda ● Aggregation types ● random_sampler aggregation ● How to build facets in e-commerce

Slide 3

Slide 3 text

Types of aggregations
● Metric aggregations => calculate metrics, such as a sum or average, from field values.
● Bucket aggregations => group documents into buckets based on field values, ranges, or other criteria.
● Pipeline aggregations => take input from other aggregations instead of from documents or fields.
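
To make the three types concrete, here is a minimal sketch of one search request body that uses each of them. The field names (`price`, `brand`, `ordered_at`) and the aggregation names are assumptions made for illustration; they do not come from the slides.

```python
import json

# Hypothetical index of orders with `price` (number), `brand` (keyword)
# and `ordered_at` (date) fields. None of these names come from the slides.
request_body = {
    "size": 0,  # return only aggregation results, no search hits
    "aggs": {
        # Metric aggregation: a single number computed from field values.
        "avg_price": {"avg": {"field": "price"}},
        # Bucket aggregation: group documents by field value.
        "by_brand": {"terms": {"field": "brand", "size": 10}},
        # Bucket aggregation with a nested metric aggregation per bucket.
        "sales_per_month": {
            "date_histogram": {
                "field": "ordered_at",
                "calendar_interval": "month",
            },
            "aggs": {"revenue": {"sum": {"field": "price"}}},
        },
        # Pipeline aggregation: reads the output of another aggregation
        # (here, the monthly revenue) rather than documents.
        "best_month": {
            "max_bucket": {"buckets_path": "sales_per_month>revenue"}
        },
    },
}

print(json.dumps(request_body, indent=2))
```

The printed JSON could be sent as the body of a `_search` request; `best_month` only works because `sales_per_month` and its `revenue` sub-aggregation exist in the same request.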

Slide 4

Slide 4 text

Metrics aggregations

Slide 5

Slide 5 text

Metrics aggregations ● Avg ● Boxplot ● Cardinality ● Extended stats ● Geo-bounds ● Geo-centroid ● Geo-Line ● Matrix stats ● Max ● Median absolute deviation ● Min ● Percentile ranks ● Percentiles ● Rate ● Scripted metric ● Stats ● String stats ● Sum ● T-test ● Top hits ● Top metrics ● Value count ● Weighted avg

Slide 6

Slide 6 text

Bucket aggregations

Slide 7

Slide 7 text

Bucket aggregations ● Adjacency matrix ● Auto-interval date histogram ● Categorize text ● Children ● Composite ● Date histogram ● Date range ● Diversified sampler ● Filter ● Filters ● Geo-distance ● Geohash grid ● Geohex grid ● Geotile grid ● Global ● Histogram ● IP prefix ● IP range ● Missing ● Multi Terms ● Nested ● Parent ● Random sampler ● Range ● Rare terms ● Reverse nested ● Sampler ● Significant terms ● Significant text ● Terms ● Variable width histogram ● Subtleties of bucketing range fields

Slide 8

Slide 8 text

Pipeline aggregations

Slide 9

Slide 9 text

Pipeline aggregations ● Extended stats bucket ● Inference bucket ● Max bucket ● Min bucket ● Moving function ● Moving percentiles ● Normalize ● Percentiles bucket ● Serial differencing ● Stats bucket ● Sum bucket ● Average bucket ● Bucket script ● Bucket count K-S test ● Bucket correlation ● Bucket selector ● Bucket sort ● Change point ● Cumulative cardinality ● Cumulative sum ● Derivative

Slide 10

Slide 10 text

How aggregation works

Slide 11

Slide 11 text

random_sampler aggregation

Slide 12

Slide 12 text

random_sampler aggregation

Slide 13

Slide 13 text

random_sampler aggregation
● probability = roughly seeing only 0.1% of the documents (about 1 in every 1,000 docs)
● seed = needed to get consistent results
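
The request that these annotations describe is not preserved in the slide text, so the following is only a sketch of what such a request body might look like. It wraps ordinary aggregations in a `random_sampler` aggregation with a `probability` of 0.001 (0.1% of the documents) and a fixed `seed`. The helper name `with_random_sampler`, the wrapper name `sampling`, the `price` field, and the seed value 42 are all assumptions.

```python
import json


def with_random_sampler(aggs, probability=0.001, seed=42):
    """Wrap `aggs` so they run on a random sample of the documents.

    probability=0.001 means roughly 1 document in every 1,000 is sampled.
    A fixed seed keeps the sampled subset the same between calls.
    """
    # Local sanity check only; Elasticsearch applies its own validation.
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return {
        "size": 0,  # aggregation results only
        "aggs": {
            "sampling": {  # hypothetical name for the sampler bucket
                "random_sampler": {"probability": probability, "seed": seed},
                "aggs": aggs,  # sub-aggregations see only the sample
            }
        },
    }


# Hypothetical metric over a hypothetical `price` field.
body = with_random_sampler({"avg_price": {"avg": {"field": "price"}}})
print(json.dumps(body, indent=2))
```

Dropping the `seed` would let each call sample a different subset, which is why the slide marks it as needed for consistent results.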

Slide 14

Slide 14 text

How random_sampler aggregation works

Slide 15

Slide 15 text

random_sampler metrics: 64 million documents spread across many shards

Slide 16

Slide 16 text

How to build facets in e-commerce

Slide 17

Slide 17 text

Spec
● Facet group: (name: colors, facets: [{white: 10}, {black: 6}, ...])
● Show the top N facets for each metadata field (category, brand, color, etc.)
● When a facet group is itself filtered, still show the counts for all of its facets
● Facet counts in each group reflect the other applied filters
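
As a rough illustration of the facet group shape in this spec, the sketch below converts the `buckets` of a terms aggregation result into a facet group limited to the top N facets. The class and function names (`Facet`, `FacetGroup`, `to_facet_group`) are hypothetical, and the sample buckets simply reuse the white/black counts from the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Facet:
    value: str
    count: int


@dataclass
class FacetGroup:
    name: str
    facets: List[Facet]


def to_facet_group(name: str, terms_result: dict, top_n: int) -> FacetGroup:
    """Build a facet group from a terms aggregation result.

    `terms_result` is expected to look like
    {"buckets": [{"key": "white", "doc_count": 10}, ...]}.
    """
    buckets = terms_result.get("buckets", [])
    facets = [Facet(str(b["key"]), b["doc_count"]) for b in buckets]
    # Highest counts first, then keep only the top N.
    facets.sort(key=lambda f: f.count, reverse=True)
    return FacetGroup(name, facets[:top_n])


colors = to_facet_group(
    "colors",
    {
        "buckets": [
            {"key": "white", "doc_count": 10},
            {"key": "black", "doc_count": 6},
            {"key": "red", "doc_count": 2},  # made-up extra bucket
        ]
    },
    top_n=2,
)
print(colors)
```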

Slide 18

Slide 18 text

Filters aggregation × post_filter
Search
● Category aggs: brand filter, color filter
● Brand aggs: color filter
● Color aggs: brand filter
● post_filter: brand filter, color filter
Response
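
The layout above can be sketched as a request builder: each facet group is aggregated under every selected filter except its own, while `post_filter` applies all selected filters to the hits. This is an assumption-laden sketch, not the author's query. It uses the single `filter` aggregation per group rather than the `filters` aggregation named in the slide title, and the field list, function names, and sample values are hypothetical.

```python
import json
from typing import Dict, List, Optional

# Hypothetical keyword fields used as facet groups.
FACET_FIELDS = ["category", "brand", "color"]


def combined_filter(selected: Dict[str, List[str]], skip: Optional[str] = None) -> dict:
    """AND together the selected filters, optionally skipping one field."""
    clauses = [
        {"terms": {field: values}}
        for field, values in selected.items()
        if values and field != skip
    ]
    if not clauses:
        return {"match_all": {}}
    return {"bool": {"filter": clauses}}


def build_facet_search(selected: Dict[str, List[str]], top_n: int = 10) -> dict:
    aggs = {}
    for field in FACET_FIELDS:
        aggs[field] = {
            # Apply the other groups' filters but not this group's own,
            # so its remaining options keep their counts.
            "filter": combined_filter(selected, skip=field),
            "aggs": {"values": {"terms": {"field": field, "size": top_n}}},
        }
    return {
        "query": {"match_all": {}},  # placeholder for the real search query
        "aggs": aggs,
        # post_filter narrows the hits without affecting the aggregations.
        "post_filter": combined_filter(selected),
    }


request = build_facet_search({"brand": ["acme"], "color": ["white"]})
print(json.dumps(request, indent=2))
```

With brand and color selected, the category aggregation is filtered by both, the brand aggregation only by color, and the color aggregation only by brand, which mirrors the diagram.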

Slide 19

Slide 19 text

Example queries http://brunozrk.github.io/building-faceted-search-with-elasticsearch-for-e-commerce-part-4/

Slide 20

Slide 20 text

References ● Aggregations | Elasticsearch Guide [8.2] ● Aggregate data faster with new the random_sampler aggregation ● Building faceted search with elasticsearch for e-commerce: part 1

Slide 21

Slide 21 text

Thanks!