Slide 1

Slide 1 text

using monoids for large scale business stats
ashwanth kumar (@_ashwanthkumar), principal engineer

Slide 2

Slide 2 text

overview
- Stats for Batch Jobs
- Stats for Streaming Workloads
- Generalizing aggregations with Monoids
- Abel
- Some cool logos!

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

but stats for MR jobs?

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

stats as map-reduce jobs
Used Scalding (from Twitter)
+ Simple to express aggregations

Slide 7

Slide 7 text

stats as map-reduce jobs
Used Scalding (from Twitter)
+ Simple to express aggregations
- Have to include intermediary data in the output just for stats

Slide 8

Slide 8 text

stats as map-reduce jobs
Used Scalding (from Twitter)
+ Simple to express aggregations
- Have to include intermediary data in the output just for stats
- Have to think about writing stats after writing production code

Slide 9

Slide 9 text

stats as map-reduce jobs
Used Scalding (from Twitter)
+ Simple to express aggregations
- Have to include intermediary data in the output just for stats
- Have to think about writing stats after writing production code
- Not updated at least till the next run (Not “realtime”)
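For context on the "simple to express" point, here is a minimal Scalding sketch of a per-key aggregation; the job name, input layout, and field positions are assumptions for illustration, not the actual Indix job:

import com.twitter.scalding._

// counts records per site; sumByKey reduces with the Long addition monoid
class SiteCountsJob(args: Args) extends Job(args) {
  TypedPipe.from(TextLine(args("input")))            // one TSV record per line (assumed layout)
    .map { line => (line.split("\t")(0), 1L) }       // (site, 1)
    .sumByKey                                        // monoid-style reduction per site
    .write(TypedTsv[(String, Long)](args("output")))
}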

Slide 10

Slide 10 text

but what about services running forever?

Slide 11

Slide 11 text

(riemann.io)

Slide 12

Slide 12 text

Service A Service B Service C

Slide 13

Slide 13 text

stats for streaming workloads
Push computed rollups to InfluxDB
+ Realtime

Slide 14

Slide 14 text

stats for streaming workloads
Push computed rollups to InfluxDB
+ Realtime
+ Allows arbitrary functions as rollups (code as config)

Slide 15

Slide 15 text

stats for streaming workloads
Push computed rollups to InfluxDB
+ Realtime
+ Allows arbitrary functions as rollups (code as config)
- Not distributable, since it allows arbitrary functions

Slide 16

Slide 16 text

stats for streaming workloads
Push computed rollups to InfluxDB
+ Realtime
+ Allows arbitrary functions as rollups (code as config)
- Not distributable, since it allows arbitrary functions
- Stats emission and rollups are in separate places, making it difficult to test and keep them in sync

Slide 17

Slide 17 text

lessons so far

Slide 18

Slide 18 text

stats for map-reduce jobs
- Have to include intermediary data in the output just for stats
- Have to think about writing stats after writing production code
- Not updated at least till the next run (Not “realtime”)

stats for streaming workloads
- Not distributable, since it allows arbitrary functions
- Stats emission and rollups are in separate places, making it difficult to test and keep them in sync

Slide 19

Slide 19 text

what do we really want in stats?

Slide 20

Slide 20 text

aggregations
distribution / parallelism
real time

Slide 21

Slide 21 text

aggregates as monoids

Slide 22

Slide 22 text

aggregates as monoids

Slide 23

Slide 23 text

monoids
An operation is considered a monoid if:
(x . y) . z = x . (y . z)   (associativity, aka semigroup)
identity . x = x . identity = x   (identity)

trait Semigroup[T] {
  def plus(left: T, right: T): T
}

trait Monoid[T] extends Semigroup[T] {
  def zero: T
}
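As a tiny concrete instance of these traits (the object name is mine), integer addition forms a monoid with 0 as the identity:

// addition over Ints satisfies both laws above
object IntAdditionMonoid extends Monoid[Int] {
  def zero: Int = 0
  def plus(left: Int, right: Int): Int = left + right
}

// associativity: plus(plus(2, 6), 6) == plus(2, plus(6, 6)) == 14
// identity:      plus(zero, 5) == plus(5, zero) == 5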

Slide 24

Slide 24 text

monoids
A monoid can also be commutative:
x . y = y . x
The commutative property of monoids is used for parallel processing on large datasets.

Slide 25

Slide 25 text

monoids - count / sum
sum is associative:
sum(sum(2, 6), 6) == sum(2, sum(6, 6))
sum(8, 6) == sum(2, 12)
14 == 14

Slide 26

Slide 26 text

monoids - average
An average of averages is not an average, i.e. avg is not associative:
avg(avg(2, 6), 6) != avg(2, avg(6, 6))
avg(4, 6) != avg(2, 6)
5 != 4

Slide 27

Slide 27 text

monoids - average
But Average can be associative, if we have total & count individually:

case class Average(total: Double, count: Long) {
  def toAvg: Double = total / count
}
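A sketch of the monoid this implies, reusing the Monoid trait from the earlier slide (the object name AverageMonoid is mine):

object AverageMonoid extends Monoid[Average] {
  // identity: contributes nothing to either total or count
  def zero: Average = Average(0.0, 0L)

  // associative (and commutative): just add totals and counts
  def plus(left: Average, right: Average): Average =
    Average(left.total + right.total, left.count + right.count)
}

// avg of 2, 6, 6: plus(plus(Average(2, 1), Average(6, 1)), Average(6, 1)) == Average(14.0, 3)
// Average(14.0, 3).toAvg == 4.67, independent of how the merges were grouped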

Slide 28

Slide 28 text

monoids : themes

Slide 29

Slide 29 text

monoids : parallelism

Slide 30

Slide 30 text

parallel aggregations
[Diagram: a stream of numbers split into three chunks; each chunk is summed independently into Σ_A, Σ_B, Σ_C, and Σ = Σ_A + Σ_B + Σ_C]
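A minimal sketch of the diagram in plain Scala (the chunk size and values are mine): each chunk is summed independently, and associativity guarantees the partial sums combine to the same total:

val numbers = Vector(3, 4, 7, 2, 1, 3, 8, 7, 5, 1)

// split into chunks and reduce each one independently (these could run on separate workers)
val partials = numbers.grouped(4).map(_.sum).toVector   // Σ_A, Σ_B, Σ_C

// combine the partial results; same answer as a single sequential pass
val total = partials.sum
assert(total == numbers.sum)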

Slide 31

Slide 31 text

monoids : approximates

Slide 32

Slide 32 text

monoids - approximates
- While sum is accurate, distinct counts in constant memory are not

Slide 33

Slide 33 text

monoids - approximates
- While sum is accurate, distinct counts in constant memory are not
- Approximate structures like HyperLogLog can find unique counts in constant memory (and a known error bound)

Slide 34

Slide 34 text

monoids - approximates
- While sum is accurate, distinct counts in constant memory are not
- Approximate structures like HyperLogLog can find unique counts in constant memory (and a known error bound)
- 2 or more HLLs can be merged, and their merge is both associative and commutative, so it can be expressed as a monoid
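A minimal sketch of the HLL merge using twitter/algebird (which abel uses, per a later slide); the bit size, values, and exact call names are best-effort assumptions about algebird's API:

import com.twitter.algebird.HyperLogLogMonoid

val hll = new HyperLogLogMonoid(12)   // 12 index bits, roughly 1-2% standard error

// one HLL per partition of UPC strings
val partitionA = hll.sum(Seq("upc-1", "upc-2").map(s => hll.create(s.getBytes("UTF-8"))))
val partitionB = hll.sum(Seq("upc-2", "upc-3").map(s => hll.create(s.getBytes("UTF-8"))))

// merging is associative and commutative, i.e. a commutative monoid
val merged = hll.plus(partitionA, partitionB)
println(merged.approximateSize.estimate)   // ~3 distinct UPCs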

Slide 35

Slide 35 text

approximate stats now are better than accurate stats tomorrow

Slide 36

Slide 36 text

learnings so far
- Stats naturally can be expressed as Monoids

Slide 37

Slide 37 text

learnings so far
- Stats naturally can be expressed as Monoids
- Since Monoids are associative (and some are also commutative), we can exploit them for massively parallel processing

Slide 38

Slide 38 text

learnings so far
- Stats naturally can be expressed as Monoids
- Since Monoids are associative (and some are also commutative), we can exploit them for massively parallel processing
- We need our stats to be real time, even if they're approximate for some metrics, as long as the error bounds are known

Slide 39

Slide 39 text

abel

Slide 40

Slide 40 text

abel
- Written in Scala
- Backed by RocksDB
- Uses twitter/algebird for HLL
- Uses Kafka for stats delivery
- Consumes stats in (near) realtime
- Exposes aggregations over HTTP
- Crunches 1M events in less than 15 seconds on 1 machine

Slide 41

Slide 41 text

abel data flow
[Diagram: metrics such as Count(“a”, 1L), Unique(“a”, 1L), Count(“c”, 1L), Count(“b”, 1L) flowing to stats.service.ix]

Slide 42

Slide 42 text

abel: internals

Slide 43

Slide 43 text

abel internals
Metric = Key * Aggregate (Semigroup)

case class Metric[T <: Aggregate[T]](key: Key, value: T with Aggregate[T])

Slide 44

Slide 44 text

abel internals

trait Aggregate[T <: Aggregate[_]] { self: T =>
  def plus(another: T): T
  def show: JsValue
}
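A sketch of a concrete aggregate plugging into this trait; SumAggregate is a hypothetical name, and JsNumber is assumed to come from whatever JSON library backs JsValue here (e.g. spray-json):

import spray.json.{JsNumber, JsValue}

// a simple additive aggregate; plus is associative and commutative
case class SumAggregate(value: Long) extends Aggregate[SumAggregate] {
  def plus(another: SumAggregate): SumAggregate = SumAggregate(value + another.value)
  def show: JsValue = JsNumber(value)
}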

Slide 45

Slide 45 text

abel internals
Key = Name * Tags * Granularity * Timestamp

case class Time(time: Long, granularity: Long)
case class Key(name: String, tags: SortedSet[String], time: Time = Time.Forever)

Slide 46

Slide 46 text

abel internals

client.send(Metric(
  Key(
    name = "unique-upcs-per-hour",
    tags = SortedSet("site:www.amazon.com"),
    time = Time.ThisHour
  ),
  UniqueCount("825633348769")
))

Slide 47

Slide 47 text

abel internals
Let's find the unique count of a UPC occurring per site and across all sites, at the granularities of every hour, every day, and overall. That would need 6 metrics per record (2 site dimensions x 3 time granularities).

Slide 48

Slide 48 text

abel internals

client.send(Metrics(
  "unique-upcs",
  (tag("site:www.amazon.com") | `#`) * (perhour | perday | forever) * now,
  UniqueCount("825633348769")
))
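To make the cross-product concrete, here is a sketch of how that combinator could expand into the 6 keys, reusing the Key and Time case classes from the earlier slides; Time.Today and the empty tag set standing in for "across all sites" (`#`) are assumptions, not abel's actual API:

import scala.collection.immutable.SortedSet

val siteDimensions = Seq(SortedSet("site:www.amazon.com"), SortedSet.empty[String])  // per site, and across all sites
val granularities  = Seq(Time.ThisHour, Time.Today, Time.Forever)                    // perhour | perday | forever

val keys = for {
  tags <- siteDimensions
  time <- granularities
} yield Key("unique-upcs", tags, time)

// 2 site dimensions x 3 granularities = 6 keys, each paired with UniqueCount("825633348769")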

Slide 49

Slide 49 text

abel: distributed (experimental)
made possible with suuchi (github.com/ashwanthkumar/suuchi)

Slide 50

Slide 50 text

distributed abel
- Peer-to-peer system built using Suuchi
- Kafka consumer auto-rebalances the partitions across instances
- Uses the scatter-gather primitive in Suuchi to perform query-time reductions of the metrics before serving them to users
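A minimal sketch of that query-time scatter-gather reduction; the node list and queryNode function are hypothetical stand-ins for illustration, not Suuchi's actual API:

// ask every node for its partial aggregate for a key,
// then fold the answers together with the monoid before replying to the user
def scatterGather[T](nodes: Seq[String], key: Key, monoid: Monoid[T])
                    (queryNode: (String, Key) => T): T =
  nodes.map(node => queryNode(node, key))
       .reduceOption(monoid.plus)
       .getOrElse(monoid.zero)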

Slide 51

Slide 51 text

distributed abel architecture
[Diagram: clients send metrics like Count(“a”, 1L), Unique(“a”, 1L), Count(“c”, 1L), Count(“b”, 1L) to stats.service.ix; DNS based load balancing spreads them across nodes 1.1.1.1, 1.1.1.2 and 1.1.1.3, each merging values with monoid.plus]

Slide 52

Slide 52 text

twitter/algebird ashwanthkumar/suuchi

Slide 53

Slide 53 text

questions? https://github.com/ashwanthkumar/large-scale-business-stats-talk

Slide 54

Slide 54 text

-- . - .-

Slide 55

Slide 55 text

suuchi
toolkit for building distributed function-shipping applications
github.com/ashwanthkumar/suuchi

Slide 56

Slide 56 text

rocksdb

Slide 57

Slide 57 text

rocksdb
- Open source by Facebook
- Fast persistent KV store
- Server workloads
- Embeddable
- Optimized for SSDs
- Fork of LevelDB
- Modelled after BigTable
- LSM Tree based
- SST files
- Written in C++

Slide 58

Slide 58 text

rocksdb
- Simple C++ API
- Has bindings in C, Java, Go, Python
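For a feel of the embeddable API, a minimal sketch using the Java binding from Scala (the database path here is arbitrary):

import org.rocksdb.{Options, RocksDB}

RocksDB.loadLibrary()
val options = new Options().setCreateIfMissing(true)
val db = RocksDB.open(options, "/tmp/rocksdb-example")

db.put("key".getBytes, "value".getBytes)
println(new String(db.get("key".getBytes)))   // prints "value"

db.close()
options.close()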

Slide 59

Slide 59 text

rocksdb @indix

Slide 60

Slide 60 text

rocksdb @indix
- Serving our API in production for 3+ years
- Search on hierarchical documents
- Dynamic fields didn't scale well on Solr
- Brand / Store / Category counts for a filter
- Price History Service
- More than a billion prices, served online to REST queries

Slide 61

Slide 61 text

rocksdb @indix
- Stats (as Monoids) Storage System
- All we wanted was approximate aggregates in real time
- HTML Archive System
- Stores ~120TB of URL- and timestamp-indexed HTML pages
- Real-time scheduler for our crawlers
- Finds out which 20 URLs to crawl now, out of 3+ billion URLs
- Helps the crawler crawl 20+ million URLs every day

Slide 62

Slide 62 text

recursive reduction

Slide 63

Slide 63 text

recursive reduction
- sum / multiplication
- (sorted) top-K elements
- operations on a graph, e.g. link reach on the Twitter graph
- the function should be associative and optionally commutative
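As one concrete example of these reductions (K and the values are mine), merging sorted top-K lists is associative and commutative:

// keep only the K largest elements; merging two top-K lists and re-trimming is associative
def topK(k: Int)(left: Vector[Int], right: Vector[Int]): Vector[Int] =
  (left ++ right).sorted(Ordering[Int].reverse).take(k)

val merge = topK(3) _
val a = Vector(9, 4, 1)
val b = Vector(7, 6, 2)
val c = Vector(8, 3)

assert(merge(merge(a, b), c) == merge(a, merge(b, c)))   // both give Vector(9, 8, 7)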

Slide 64

Slide 64 text

EOF