Aggregators are Applicative Functors
Functor: has a map method:
def map(t: A[T])(fn: T => U): A[U]
Applicative: has a join method:
def join(t: A[T], u: A[U]): A[(T, U)]
Monad: has a flatMap method:
def flatMap(t: A[T])(fn: T => A[U]): A[U]
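To make the map and join signatures above concrete, here is a minimal sketch in plain Scala. The `Agg` trait, the `AggSketch` object, and the `sum`, `count`, and `mean` values are hypothetical names made up for illustration; this is a simplified stand-in and not Algebird's actual `Aggregator` trait.

```scala
object AggSketch {
  // Hypothetical, simplified aggregator: A = input, B = intermediate, C = output.
  trait Agg[A, B, C] { self =>
    def prepare(a: A): B
    def combine(l: B, r: B): B // assumed to be associative
    def present(b: B): C

    // Run over a non-empty collection in a single pass.
    def apply(items: Iterable[A]): C =
      present(items.iterator.map(prepare(_)).reduce(combine(_, _)))

    // Functor: transform only the final output.
    def map[D](fn: C => D): Agg[A, B, D] = new Agg[A, B, D] {
      def prepare(a: A): B = self.prepare(a)
      def combine(l: B, r: B): B = self.combine(l, r)
      def present(b: B): D = fn(self.present(b))
    }

    // Applicative: pair two aggregators over the same input.
    def join[B2, C2](that: Agg[A, B2, C2]): Agg[A, (B, B2), (C, C2)] =
      new Agg[A, (B, B2), (C, C2)] {
        def prepare(a: A): (B, B2) = (self.prepare(a), that.prepare(a))
        def combine(l: (B, B2), r: (B, B2)): (B, B2) =
          (self.combine(l._1, r._1), that.combine(l._2, r._2))
        def present(b: (B, B2)): (C, C2) =
          (self.present(b._1), that.present(b._2))
      }
  }

  val sum: Agg[Int, Int, Int] = new Agg[Int, Int, Int] {
    def prepare(a: Int): Int = a
    def combine(l: Int, r: Int): Int = l + r
    def present(b: Int): Int = b
  }

  val count: Agg[Int, Long, Long] = new Agg[Int, Long, Long] {
    def prepare(a: Int): Long = 1L
    def combine(l: Long, r: Long): Long = l + r
    def present(b: Long): Long = b
  }

  // join computes both results in one pass; map turns the pair into a mean.
  val mean: Agg[Int, (Int, Long), Double] =
    sum.join(count).map { case (s, n) => s.toDouble / n }
}
```

In this sketch, `AggSketch.mean(List(1, 2, 3, 4))` walks the list once and yields 2.5.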
Slide 35
Slide 35 text
Let’s go to the REPL
http://bit.ly/AggregatingWithAlice
https://gist.github.com/johnynek/814fc1e77aad1d295bb7
Slide 36
Slide 36 text
Aggregators “just work” with Scala collections
Aggregators are built into Scalding
Aggregators are easy to use with Spark
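As a rough illustration of the Scala collections case, the sketch below applies a joined Algebird aggregator directly to a `List`. It assumes `algebird-core` is on the classpath; the `CollectionsExample` object and the sample word list are made up for this example.

```scala
import com.twitter.algebird.Aggregator

object CollectionsExample {
  // Sample data, invented for illustration.
  val words = List("alice", "rabbit", "hatter", "queen")

  // Count the items and find the longest word length in one pass.
  // composePrepare adapts the Int aggregator so it accepts Strings.
  val countAndLongest =
    Aggregator.size.join(Aggregator.max[Int].composePrepare[String](_.length))

  // An aggregator can be applied like a function to a collection.
  val result: (Long, Int) = countAndLongest(words)
}
```

Here `result` should be the pair of the item count and the longest length, (4, 6) for this list.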
Slide 37
Slide 37 text
@Twitter
Algebird with Spark:
https://github.com/twitter/algebird/pull/397
Slide 39
Slide 39 text
Key Points
1) Aggregators encapsulate very general query logic, independent of how it is executed (in memory, Scalding, Spark, you name it)
2) Aggregators compose, so you can define the parts you use and easily glue them together
3) Algebird has many advanced, well-tested Aggregators: TopK, HyperLogLog, CountMinSketch, Mean, Stddev, …
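The first key point can be sketched in plain Scala: one aggregator definition, run under two execution strategies. Everything here (`ExecutionSketch`, `Agg`, `runLocal`, `runPartitioned`) is a hypothetical name chosen for illustration, and the partitioned runner only imitates what a distributed engine might do; it is not how Scalding or Spark are actually wired up.

```scala
object ExecutionSketch {
  // Hypothetical minimal aggregator: three functions bundled together.
  // combine is assumed to be associative.
  final case class Agg[A, B, C](
      prepare: A => B,
      combine: (B, B) => B,
      present: B => C
  )

  // Strategy 1: a single in-memory pass over a non-empty sequence.
  def runLocal[A, B, C](agg: Agg[A, B, C], items: Seq[A]): C =
    agg.present(items.map(agg.prepare).reduce(agg.combine))

  // Strategy 2: reduce each non-empty partition on its own,
  // then merge the partial results and present once at the end.
  def runPartitioned[A, B, C](agg: Agg[A, B, C], parts: Seq[Seq[A]]): C = {
    val partials = parts
      .filter(_.nonEmpty)
      .map(p => p.map(agg.prepare).reduce(agg.combine))
    agg.present(partials.reduce(agg.combine))
  }

  // Mean as (sum, count) pairs, so partial results can be merged.
  val mean: Agg[Int, (Long, Long), Double] = Agg[Int, (Long, Long), Double](
    a => (a.toLong, 1L),
    (l, r) => (l._1 + r._1, l._2 + r._2),
    { case (s, n) => s.toDouble / n }
  )

  val data: Seq[Int] = 1 to 10
  val local: Double = runLocal(mean, data)
  val partitioned: Double = runPartitioned(mean, data.grouped(3).toSeq)
}
```

Because `combine` is associative, both strategies give the same answer (5.5 for this data), which is what lets the query logic stay separate from the execution engine.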
Slide 40
Slide 40 text
Oscar Boykin @posco / oscar@twitter.com
Algebird has these aggregators and more:
https://github.com/twitter/algebird