Slide 1

Slide 1 text

Using Monoids for Large Scale Aggregates Scala.io 2017

Slide 2

Slide 2 text

Sriram Ramachandrasekaran @brewkode Principal Engineer, Indix

Slide 3

Slide 3 text

Ashwanth Kumar @_ashwanthkumar Principal Engineer, Indix

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

class Crawler {
  def crawl(url: String): Unit = {
    agent.doCrawl(url)
    // one more URL crawled
    metric.count("urls_crawled", 1L)
  }
}

Slide 6

Slide 6 text

1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1

Slide 7

Slide 7 text

(1 + 1 + 1 + 1) + 1 + 1 + 1 + 1 + 1 = 4 + 1 + 1 + 1 + 1 + 1

Slide 8

Slide 8 text

(1 + 1 + 1 + 1) + 1 + 1 + 1 + 1 + 1 = 4 + (1 + 1 + 1 + 1) + 1 = 8 + 1 = 9 Total URLs Crawled

Slide 9

Slide 9 text

(1+1+1+1)+(1+1+1+1)+1 = 4+4+1 = (4+5) = (8+1) = 9 Total URLs Crawled

Slide 10

Slide 10 text

Associativity: (1+1+1+1)+(1+1+1+1)+1 = 4+4+1 = (4+5) = (8+1) = 9 Total URLs Crawled
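The grouping argument on this slide can be checked with a minimal Scala sketch. The object and value names are illustrative only and do not come from the talk.

```scala
object CountAssociativity {
  // Nine crawl events, each contributing a count of 1.
  val events: List[Long] = List.fill(9)(1L)

  // Sum strictly left to right.
  val sequential: Long = events.foldLeft(0L)(_ + _)

  // Sum in groups of four first, then add the partial sums (4 + 4 + 1).
  val partials: List[Long] = events.grouped(4).map(_.sum).toList
  val grouped: Long = partials.sum
}
```

Both `sequential` and `grouped` give the same total because addition is associative, which is what allows partial sums to be computed independently.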

Slide 11

Slide 11 text

class Crawler {
  def crawl(url: String): Unit = {
    val page = agent.doCrawl(url)
    // feed each response time into a running average
    metric.average("response_times", page.responseTime)
  }
}

Slide 12

Slide 12 text

1200 3600 4800

Slide 13

Slide 13 text

1200, 3600, 4800 as [sum, count] pairs: [1200,1] + [3600,1] + [4800,1]

Slide 14

Slide 14 text

[1200,1] + [3600,1] = [4800,2], leaving [4800,2] + [4800,1]

Slide 15

Slide 15 text

[1200,1] + [3600,1] + [4800,1] = [4800,2] + [4800,1] = [9600,3], and 9600 / 3 = 3200 Average Response Time
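A small sketch of the [sum, count] idea, using the three response times from the slides. The `Avg` type and its members are hypothetical names chosen for illustration, not Abel's API.

```scala
object AverageSketch {
  // An average carried as a (sum, count) pair so partial results can be combined.
  final case class Avg(sum: Long, count: Long) {
    def plus(other: Avg): Avg = Avg(sum + other.sum, count + other.count)
    // Only meaningful when count > 0.
    def value: Double = sum.toDouble / count
  }

  object Avg {
    val zero: Avg = Avg(0L, 0L)                   // identity element
    def of(sample: Long): Avg = Avg(sample, 1L)   // a single observation
  }

  val responseTimes: List[Long] = List(1200L, 3600L, 4800L)

  val total: Avg =
    responseTimes.map(Avg.of).foldLeft(Avg.zero)((a, b) => a.plus(b))
}
```

The division happens only once, at read time. Everything before it is plain component-wise addition, so the pairs can be merged in any grouping.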

Slide 16

Slide 16 text

Generalizing Sum and Average
● Takes 2 numbers and produces another number (binary operation)
  - Add: simple add of two numbers
  - Average: maintain two values, sum and count, and “add” each of them
● Ordering of operations doesn’t matter (commutative)
● Grouping of operations doesn’t matter (associative)
● Ignores 0s (adding zero changes nothing)

Slide 17

Slide 17 text

Abstraction
● We are dealing with Sets
● Associative binary operations
● Identity element exists (for addition, it’s zero)

Slide 18

Slide 18 text

Abstraction
● We are dealing with Sets
● Associative binary operations
● Identity element exists (for addition, it’s zero)
= Monoid

Slide 19

Slide 19 text

Abstraction
● We are dealing with Sets
● Associative binary operations
● Identity element exists (for addition, it’s zero)
● Add Commutativity to the mix
= Commutative Monoid
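The three ingredients listed here map onto a small type class. This is a generic sketch written for illustration; it is not the definition used by Abel or Algebird, and the names `Monoid`, `longSum` and `combineAll` are assumptions.

```scala
object MonoidSketch {
  // A monoid: a set A with an associative binary operation and an identity element.
  trait Monoid[A] {
    def zero: A
    def plus(x: A, y: A): A
  }

  object Monoid {
    // Addition on Long is a commutative monoid with identity 0.
    implicit val longSum: Monoid[Long] = new Monoid[Long] {
      val zero: Long = 0L
      def plus(x: Long, y: Long): Long = x + y
    }
  }

  // Fold any sequence of values down to a single aggregate.
  def combineAll[A](values: Seq[A])(implicit m: Monoid[A]): A =
    values.foldLeft(m.zero)(m.plus)
}
```

For example, `MonoidSketch.combineAll(Seq(1L, 2L, 3L))` folds the values with `plus`, and an empty sequence falls back to `zero`.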

Slide 20

Slide 20 text

Aggregations at Scale
● Associative and Commutative
  ○ Makes it an EMBARRASSINGLY PARALLEL problem
● User queries are handled via Scatter Gather
  ○ Reduce on individual nodes
  ○ Re-reduce on the results and return as the response
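The reduce and re-reduce steps can be sketched in a few lines. Nodes are modelled here as in-memory sequences, which is an assumption made to keep the example self-contained; a real deployment would fan the query out over the network.

```scala
object ScatterGatherSketch {
  // nodes: the values held by each node; zero and plus define the monoid.
  def query[A](nodes: Seq[Seq[A]], zero: A)(plus: (A, A) => A): A = {
    // Reduce: every node folds only its own local values.
    val partials: Seq[A] = nodes.map(_.foldLeft(zero)(plus))
    // Re-reduce: the coordinator combines one partial result per node.
    partials.foldLeft(zero)(plus)
  }
}
```

Because the operation is associative, the result matches a single fold over all values, regardless of how they are spread across nodes.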

Slide 21

Slide 21 text

Talk is cheap, show me the code!

Slide 22

Slide 22 text

Introducing Abel

Slide 23

Slide 23 text

Abel
● Monoid based aggregations
● Durable delivery via Kafka
● Persistence via RocksDB
● User queries handled via Scatter Gather
● Scala all the way
twitter/algebird, ashwanthkumar/suuchi

Slide 24

Slide 24 text

Abel Data Flow (diagram): Count("a", 1L), Count("c", 1L), Unique("ua", "a"), Count("b", 1L) and Average("a", 5682) flow to stats.service.ix

Slide 25

Slide 25 text

Abel Internals
Metric = Key * Aggregate (Monoid)
case class Metric[T <: Aggregate[T]](key: Key, value: T with Aggregate[T])

Slide 26

Slide 26 text

Abel Internals
Metric = Key * Aggregate (Monoid)
trait Aggregate[T <: Aggregate[_]] { self: T =>
  def plus(another: T): T
  def show: JsValue
}
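To make the trait concrete, here is a hedged sketch of one possible aggregate. Two simplifications are assumptions of this sketch and not Abel's code: `show` returns a `String` in place of `JsValue` so that no JSON library is needed, and the type bound is written as `Aggregate[T]`. The `Count` class is hypothetical.

```scala
object AggregateSketch {
  // Same shape as the trait on the slide, simplified as described above.
  trait Aggregate[T <: Aggregate[T]] { self: T =>
    def plus(another: T): T
    def show: String
  }

  // Hypothetical aggregate: a running count.
  final case class Count(value: Long) extends Aggregate[Count] {
    def plus(another: Count): Count = Count(value + another.value)
    def show: String = value.toString
  }
}
```

The self-type `self: T` ties each aggregate to its own concrete type, so a `Count` can only be combined with another `Count`.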

Slide 27

Slide 27 text

Abel Internals
Key = Name * Tags * Time
case class Time(time: Long, granularity: Long)
case class Key(name: String, tags: SortedSet[String], time: Time = Time.Forever)

Slide 28

Slide 28 text

Abel Internals
client.send(Metric(
  Key(
    name = "unique-url-per-hour",
    tags = SortedSet("www.amazon.com"),
    time = Time.ThisHour
  ),
  UniqueCount("http://...")
))

Slide 29

Slide 29 text

Abel Internals
To find the unique count of URLs crawled per site, for every day and forever:
client.send(Metrics(
  "unique-urls",
  tag("site:www.amazon.com") * (perday | forever) * now,
  UniqueCount("http://...")
))

Slide 30

Slide 30 text

Abel Internals
To find the unique count of URLs crawled per site and across sites, for every day and forever:
client.send(Metrics(
  "unique-urls",
  (tag("site:www.amazon.com") | `#`) * (perday | forever) * now,
  UniqueCount("http://...")
))

Slide 31

Slide 31 text

Abel Internals
To find the unique count of URLs crawled per site and across sites, for every day and forever:
client.send(Metrics(
  "unique-urls",
  (tag("site:www.amazon.com") | `#`) * (perday | forever) * now,
  UniqueCount("http://...")
))
It is implemented as a Ring.
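One way to picture how `|` and `*` could expand into several keys is the sketch below. It is not Abel's implementation: `Dims` and `dim` are hypothetical names, the `now` factor is left out, and sets under union and product model only the addition, multiplication and distributivity parts of a ring (there is no additive inverse here).

```scala
object KeyExpansionSketch {
  // A set of alternative key fragments; each fragment is a list of parts.
  final case class Dims(alternatives: Set[List[String]]) {
    // "|" behaves like addition: either alternative.
    def |(other: Dims): Dims = Dims(alternatives ++ other.alternatives)
    // "*" behaves like multiplication: every combination of both sides.
    def *(other: Dims): Dims =
      Dims(for (a <- alternatives; b <- other.alternatives) yield a ++ b)
  }

  def dim(name: String): Dims = Dims(Set(List(name)))

  // (site | all sites) * (per day | forever) expands to four combinations.
  val expanded: Dims =
    (dim("site:www.amazon.com") | dim("#")) * (dim("perday") | dim("forever"))
}
```

Under this reading, a single `send` call fans out into one key per combination, which matches the "per site and across sites, for every day and forever" description.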

Slide 32

Slide 32 text

Abel v1 (diagram): Count("a", 1L), Count("c", 1L) and Average("a", 5682) flow to stats.service.ix

Slide 33

Slide 33 text

Abel v1 (diagram): Count("a", 1L), Count("c", 1L), Unique("ua", "a"), Count("b", 1L) and Average("a", 5682) flow to stats.service.ix

Slide 34

Slide 34 text

Abel in Distributed Mode (diagram): DNS based Load Balancing, with the A record for stats.service.ix pointing at 1.1.1.1, 1.1.1.2 and 1.1.1.3. Count("a", 1L), Average("a", 5682), Count("c", 1L), Count("b", 1L) and Unique("ua", "a") are spread across the three nodes, and each node applies aggregate.plus.

Slide 35

Slide 35 text

Scatter Gather - Average
Reduce on each node gives one (sum, count) pair per node: (123, 8), (3, 1), (12303, 24)

Slide 36

Slide 36 text

Scatter Gather - Average
Re-reduce: (123, 8) + (3, 1) + (12303, 24) = (12429, 33), and 12429 / 33 = 376.6
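The numbers on this slide also show why nodes return (sum, count) pairs and not finished averages. The sketch below re-reduces the three partials and, for contrast, computes the naive average of per-node averages. Value names are illustrative.

```scala
object ReReduceSketch {
  // Partial (sum, count) pairs returned by the three nodes on the slide.
  val partials: List[(Long, Long)] = List((123L, 8L), (3L, 1L), (12303L, 24L))

  // Re-reduce: add sums and counts component-wise.
  val (sum, count) = partials.foldLeft((0L, 0L)) {
    case ((s, c), (ps, pc)) => (s + ps, c + pc)
  }

  // Correct overall average: divide once, at the very end.
  val average: Double = sum.toDouble / count

  // Naive alternative: average the per-node averages (gives a different answer).
  val naive: Double =
    partials.map { case (s, c) => s.toDouble / c }.sum / partials.size
}
```

Averaging the averages ignores how many samples each node saw, so it disagrees with the true value whenever the counts differ.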

Slide 37

Slide 37 text

Abel
● Monoid based aggregations
● Durable delivery via Kafka
● Persistence via RocksDB
● User queries handled via Scatter Gather
● Scala all the way
twitter/algebird, ashwanthkumar/suuchi

Slide 38

Slide 38 text

Monoid Cheatsheet
Stat / Metric Type | Abstraction
Count of URLs | Sum
Average Response Time | Sum with Count & Total
Unique count of URLs crawled | HyperLogLog
HTTP Response Code Distribution | Count-Min Sketch
Top K Websites with poor response time | Heap with K elements
Website response time percentiles | QTree (loosely based on q-digest)
Histogram of response times | Array (to model bins) and slotwise Sum
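As an illustration of the last row, a histogram can be modelled as a fixed-size vector of bin counts combined slot by slot. The bin boundaries below are made-up values for the example, and the names are hypothetical.

```scala
object HistogramSketch {
  // Hypothetical bin upper bounds in milliseconds; one extra open-ended bin follows.
  val upperBounds: Vector[Long] = Vector(1000L, 2000L, 5000L)

  // Identity element: all bins empty.
  val zero: Vector[Long] = Vector.fill(upperBounds.length + 1)(0L)

  // A histogram holding a single sample in its matching bin.
  def of(responseTime: Long): Vector[Long] = {
    val idx = upperBounds.indexWhere(responseTime <= _)
    val slot = if (idx == -1) upperBounds.length else idx
    zero.updated(slot, 1L)
  }

  // Slot-wise sum: the monoid operation for histograms with identical bins.
  def plus(a: Vector[Long], b: Vector[Long]): Vector[Long] =
    a.zip(b).map { case (x, y) => x + y }
}
```

Since every slot is an independent sum, histograms built on different nodes can be merged exactly like plain counts, provided all nodes agree on the bins.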

Slide 39

Slide 39 text

Credits: VinothKumar Raman @eventaken, Swathi Ravichandran @swathrav
Thank you

Slide 40

Slide 40 text

Meta

Slide 41

Slide 41 text

Ring
● Abelian group under Addition
  ○ Associative
  ○ Commutative
  ○ Identity
  ○ Inverse
● Monoid under multiplication
  ○ Associative
  ○ Multiplicative Identity
● Multiplication is distributive with respect to addition
  ○ (a + b) . c = (a . c) + (b . c) Right Distributivity
  ○ a . (b + c) = (a . b) + (a . c) Left Distributivity
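The ring laws can be written down as a small interface plus law checks. This is a generic sketch with the integers as the example instance; the trait and method names are assumptions and are not taken from Abel or Algebird.

```scala
object RingSketch {
  trait Ring[A] {
    def zero: A                 // additive identity
    def one: A                  // multiplicative identity
    def plus(x: A, y: A): A
    def negate(x: A): A         // additive inverse
    def times(x: A, y: A): A
  }

  // The integers under + and * form a ring.
  val bigIntRing: Ring[BigInt] = new Ring[BigInt] {
    val zero: BigInt = BigInt(0)
    val one: BigInt = BigInt(1)
    def plus(x: BigInt, y: BigInt): BigInt = x + y
    def negate(x: BigInt): BigInt = -x
    def times(x: BigInt, y: BigInt): BigInt = x * y
  }

  // (a + b) . c == (a . c) + (b . c)
  def rightDistributes[A](r: Ring[A])(a: A, b: A, c: A): Boolean =
    r.times(r.plus(a, b), c) == r.plus(r.times(a, c), r.times(b, c))

  // a . (b + c) == (a . b) + (a . c)
  def leftDistributes[A](r: Ring[A])(a: A, b: A, c: A): Boolean =
    r.times(a, r.plus(b, c)) == r.plus(r.times(a, b), r.times(a, c))
}
```

Checks like these only test specific values. They are useful as sanity tests for a new instance, not as a proof that the laws hold.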

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

suuchi toolkit for building distributed function shipping applications github.com/ashwanthkumar/suuchi

Slide 45

Slide 45 text

Slides designed by www.swathiravichandran.com