P. Oscar Boykin
January 10, 2015

# Aggregators: modeling data queries functionally

This talk introduces Aggregators, an abstraction that makes it easy to build composable queries against big data sets. The Algebird library gives you access to many powerful aggregators that you can compose and use with Scalding, Spark or in your own code.

## Transcript

3. ### @Twitter How to compute the size of a list in Map/Reduce?

2 3 5 7 11 13 17

4. ### How to compute the size of a list in Map/Reduce?

2 3 5 7 11 13 17 → `map(x => 1)` → 1 1 1 1 1 1 1

5. ### How to compute the size of a list in Map/Reduce?

1 1 1 1 1 1 1 → `reduce {(x, y) => x + y}` → 2 2 2 1 → 4 3 → 7
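
The counting slides come down to two collection operations. Here is a minimal sketch on a plain Scala `List`, assuming the data fits in memory. It is an illustration, not Scalding code, and it uses the seven numbers from the slides:

```scala
// The input list drawn on the slides.
val primes = List(2, 3, 5, 7, 11, 13, 17)

// map: replace every element with 1.
val ones = primes.map(_ => 1)

// reduce: add the ones. Since + is associative, the partial sums may be
// grouped in any way, e.g. ((1 + 1) + (1 + 1)) + ((1 + 1) + 1).
val count = ones.reduce((x, y) => x + y)

println(count) // prints 7
```
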

9. ### Getting the average

2 3 5 7 11 13 17 → `map(x => (1, x))` → (1,2) (1,3) (1,5) (1,7) (1,11) (1,13) (1,17)

10. ### Getting the average

(1,2) (1,3) (1,5) (1,7) (1,11) (1,13) (1,17) → `reduce(Semigroup.plus)` → (2,5) (2,12) (2,24) (1,17) → (4,17) (3,41) → (7,58)

11. ### Getting the average

(7,58) → `map { case (c, s) => s / c.toDouble }` → ≈ 8.2857
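
The average uses the same map/reduce/map pattern with a `(count, sum)` pair in the middle. Here is a minimal sketch on a local `List`. The hand-written `plusPair` is a hypothetical helper that stands in for the tuple semigroup's `plus`, which the slide calls as `Semigroup.plus`:

```scala
val primes = List(2, 3, 5, 7, 11, 13, 17)

// map: pair each element with a count of 1, giving (count, sum) pairs.
val pairs: List[(Int, Int)] = primes.map(x => (1, x))

// reduce: add pairs component-wise. This hypothetical helper stands in
// for the tuple semigroup's plus (Semigroup.plus on the slide).
def plusPair(a: (Int, Int), b: (Int, Int)): (Int, Int) = (a._1 + b._1, a._2 + b._2)
val (c, s) = pairs.reduce((a, b) => plusPair(a, b))

// map: present the final (count, sum) as an average.
val average: Double = s / c.toDouble

println((c, s)) // prints (7,58)
println(average)
```
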

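13. ### `trait Aggregator`

```scala
import com.twitter.algebird.Semigroup

trait Aggregator[In, Middle, Out] {
  def prepare(i: In): Middle       // map each input into the middle type
  def semigroup: Semigroup[Middle] // associative reduce over middle values
  def present(m: Middle): Out      // map the reduced value to the output
}
```

https://github.com/twitter/algebird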
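
The following self-contained sketch makes the three phases concrete without an Algebird dependency. `Semigroup` is a minimal local stand-in for Algebird's type class. The `apply` method and the `average` value are hypothetical additions for illustration; the slide's trait has only the three abstract members.

```scala
// Minimal local stand-in for Algebird's Semigroup: an associative plus.
trait Semigroup[T] { def plus(l: T, r: T): T }

// The trait from the slide, plus a hypothetical `apply` helper.
trait Aggregator[In, Middle, Out] {
  def prepare(i: In): Middle
  def semigroup: Semigroup[Middle]
  def present(m: Middle): Out

  // Run the three phases over an in-memory collection:
  // map (prepare), reduce (semigroup), map (present).
  def apply(xs: Iterable[In]): Out =
    present(xs.map(prepare).reduce((l, r) => semigroup.plus(l, r)))
}

// The average from the earlier slides as a single Aggregator.
val average = new Aggregator[Int, (Long, Long), Double] {
  def prepare(i: Int): (Long, Long) = (1L, i.toLong)
  val semigroup: Semigroup[(Long, Long)] = new Semigroup[(Long, Long)] {
    def plus(l: (Long, Long), r: (Long, Long)): (Long, Long) = (l._1 + r._1, l._2 + r._2)
  }
  def present(m: (Long, Long)): Double = m._2.toDouble / m._1
}

val result = average(List(2, 3, 5, 7, 11, 13, 17)) // 58.0 / 7
```

This `apply` relies on `reduce`, so it throws on an empty collection. Returning an `Option` (for example via `reduceOption`) is one way to handle that case.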

19. ### Not such a new idea. Scalding had a `mapReduceMap` function in the first release:

22. ### “Does not compose” is the new “is a piece of crap”

Paraphrasing Dan Rosen (@mergeconflict)

32. ### Joined Aggregator

Aggregator * Aggregator = Aggregator

map (`prepare`) → reduce (`semigroup`) → map (`present`)

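
One way to read “Aggregator * Aggregator = Aggregator” is to pair up the middle values so that both aggregators run in a single pass. In the self-contained sketch below, `Semigroup`, `join`, and the `agg` helper are hypothetical local definitions, not Algebird's API:

```scala
// Minimal local stand-in for Algebird's Semigroup.
trait Semigroup[T] { def plus(l: T, r: T): T }

trait Aggregator[In, Middle, Out] { self =>
  def prepare(i: In): Middle
  def semigroup: Semigroup[Middle]
  def present(m: Middle): Out

  def apply(xs: Iterable[In]): Out =
    present(xs.map(prepare).reduce((l, r) => semigroup.plus(l, r)))

  // Hypothetical join: pair the middle values so both aggregators run in
  // the same pass; each phase is built from the two corresponding phases.
  def join[M2, O2](that: Aggregator[In, M2, O2]): Aggregator[In, (Middle, M2), (Out, O2)] =
    new Aggregator[In, (Middle, M2), (Out, O2)] {
      def prepare(i: In): (Middle, M2) = (self.prepare(i), that.prepare(i))
      val semigroup: Semigroup[(Middle, M2)] = new Semigroup[(Middle, M2)] {
        def plus(l: (Middle, M2), r: (Middle, M2)): (Middle, M2) =
          (self.semigroup.plus(l._1, r._1), that.semigroup.plus(l._2, r._2))
      }
      def present(m: (Middle, M2)): (Out, O2) = (self.present(m._1), that.present(m._2))
    }
}

// Hypothetical helper that builds an Aggregator from three functions.
def agg[I, M, O](p: I => M)(sum: (M, M) => M)(out: M => O): Aggregator[I, M, O] =
  new Aggregator[I, M, O] {
    def prepare(i: I): M = p(i)
    val semigroup: Semigroup[M] = new Semigroup[M] { def plus(l: M, r: M): M = sum(l, r) }
    def present(m: M): O = out(m)
  }

val count = agg[Int, Long, Long](_ => 1L)(_ + _)(identity)
val total = agg[Int, Long, Long](_.toLong)(_ + _)(identity)

// One pass over the data computes both results.
val both = count.join(total)
val (n, s) = both(List(2, 3, 5, 7, 11, 13, 17))
```

Each phase of the joined aggregator is built from the corresponding phases of its parts, so the result is again an ordinary Aggregator and can be joined further.
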
33. ### Aggregators are Applicative Functors

- Functor: has a map method: `def map[T, U](t: A[T])(fn: T => U): A[U]`
- Applicative: has a join method: `def join[T, U](t: A[T], u: A[U]): A[(T, U)]`
- Monad: has a flatMap method: `def flatMap[T, U](t: A[T])(fn: T => A[U]): A[U]`

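
The functor part can be sketched as a `map` that changes only `present` and leaves `prepare` and the semigroup untouched, so the reduce phase is unaffected. The trait, the local `Semigroup`, and the `map` method below are hypothetical stand-ins for illustration, not Algebird's API:

```scala
// Minimal local stand-in for Algebird's Semigroup.
trait Semigroup[T] { def plus(l: T, r: T): T }

trait Aggregator[In, Middle, Out] { self =>
  def prepare(i: In): Middle
  def semigroup: Semigroup[Middle]
  def present(m: Middle): Out

  def apply(xs: Iterable[In]): Out =
    present(xs.map(prepare).reduce((l, r) => semigroup.plus(l, r)))

  // Functor map (hypothetical): reuse prepare and the semigroup unchanged,
  // and post-process only the presented output.
  def map[U](fn: Out => U): Aggregator[In, Middle, U] =
    new Aggregator[In, Middle, U] {
      def prepare(i: In): Middle = self.prepare(i)
      def semigroup: Semigroup[Middle] = self.semigroup
      def present(m: Middle): U = fn(self.present(m))
    }
}

// Aggregates Ints into (count, sum) and presents that pair as-is.
val countAndSum = new Aggregator[Int, (Long, Long), (Long, Long)] {
  def prepare(i: Int): (Long, Long) = (1L, i.toLong)
  val semigroup: Semigroup[(Long, Long)] = new Semigroup[(Long, Long)] {
    def plus(l: (Long, Long), r: (Long, Long)): (Long, Long) = (l._1 + r._1, l._2 + r._2)
  }
  def present(m: (Long, Long)): (Long, Long) = m
}

// map turns it into an average; the reduce phase is the same as before.
val average = countAndSum.map { case (c, s) => s.toDouble / c }
val result = average(List(2, 3, 5, 7, 11, 13, 17))
```
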

36. ### Aggregators “just work” with Scala collections

- Aggregators are built into Scalding
- Aggregators are easy to use with Spark
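
One plausible reason the same aggregator carries over from local collections to Scalding or Spark is that its reduce step needs only an associative `plus`. Partial results computed on separate chunks can then be merged afterward. The sketch below shows that property on plain Scala collections. The `prepare`, `plus`, `present`, and `reduceChunk` functions are hypothetical and use no Scalding or Spark API:

```scala
// Hypothetical phases of an average aggregator over Ints.
def prepare(x: Int): (Long, Long) = (1L, x.toLong)
def plus(l: (Long, Long), r: (Long, Long)): (Long, Long) = (l._1 + r._1, l._2 + r._2)
def present(m: (Long, Long)): Double = m._2.toDouble / m._1

// Reduce a chunk of input to a single middle value.
def reduceChunk(xs: Seq[Int]): (Long, Long) = xs.map(prepare).reduce((a, b) => plus(a, b))

val data = (1 to 1000).toVector

// One pass over everything, as on a single machine.
val whole = reduceChunk(data)

// Reduce two chunks independently (as separate workers might), then merge.
val (left, right) = data.splitAt(300)
val merged = plus(reduceChunk(left), reduceChunk(right))

println(whole == merged) // prints true
println(present(merged)) // prints 500.5
```
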