ScalaDays 2013: How Kiama helps with Hadoop computing

A walk down the beach: How Kiama helps with Hadoop computing
ScalaDays

What’s in it for you?
Big data
Kiama

Big data

Big data tools Storm

Scoobi

A collection-like DSL
Types annotated on the example: String, (Int, Int), (Int, Iterable[Int]), (Int, Int)

Hadoop

Hadoop / Java

Scoobi

What is Scoobi? A compiler!

Hadoop

A framework
[Diagram: word count on Hadoop, with mappers, a combiner and reducers]
Inputs: “Hello”, “World”, “World”, “World”, “Hello”, ...
Mapper output: (“Hello”, 1), (“World”, 1)
Grouped: (“Hello”, [1, 1, 1]), (“World”, [1, 1, 1, 1])
Reduced: (“Hello”, 3), (“World”, 4)
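The word count on this slide can be mimicked in a few lines of plain Python. This is only a sketch of the mapper / combiner / reducer roles, not Hadoop's API: the function names (`mapper`, `combine`, `group_by_key`, `run_split`, `word_count`) and the way the input is split are assumptions made for illustration.

```python
from collections import defaultdict

def mapper(word):
    # Emit one (word, 1) pair per input word.
    return (word, 1)

def combine(values):
    # Summing is associative, so the same function can serve as
    # the combiner (local pre-aggregation) and as the reducer.
    return sum(values)

def group_by_key(pairs):
    # Collect all values that share a key, like the shuffle phase.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

def run_split(words):
    # Work done for one input split: map, then combine locally.
    local = group_by_key(mapper(w) for w in words)
    return [(key, combine(values)) for key, values in local.items()]

def word_count(splits):
    # Gather the partial counts of every split, group them, reduce them.
    partial = [pair for split in splits for pair in run_split(split)]
    return {key: combine(values) for key, values in group_by_key(partial).items()}

# Hypothetical input: 3 x "Hello" and 4 x "World" spread over three splits.
splits = [["Hello", "World", "World"], ["World", "Hello"], ["Hello", "World"]]
counts = word_count(splits)
```

With this input, `counts` holds 3 for “Hello” and 4 for “World”, as on the slide.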

Scoobi => Hadoop

Step one: create nodes
Easy :-)
Load, ParallelDo, GroupByKey, Combine, Materialise

Scoobi => Hadoop

Translation?
Scoobi nodes: Load, ParallelDo, GroupByKey, Combine, Materialise
Hadoop: mappers, combiner, reducers
(“Hello”, 1), (“World”, 1)
(“Hello”, [1, 1, 1]), (“World”, [1, 1, 1, 1])
(“Hello”, 3), (“World”, 4)

Step two: simplify
Before: Node 0, ParallelDo 1 (A => B), ParallelDo 2 (B => C), Node 3
After: Node 0, fused ParallelDo (A => C), Node 3
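Fusing two ParallelDo nodes amounts to composing their functions: A => B followed by B => C becomes A => C. The sketch below shows that idea on a toy graph. The `Load` and `ParallelDo` classes and the `fuse` function are hypothetical stand-ins, not Scoobi's node types, and the check on the number of uses that a real fusion rule needs in a DAG is deliberately left out.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Load:
    name: str

@dataclass(frozen=True)
class ParallelDo:
    source: Any                 # the node this ParallelDo reads from
    fn: Callable[[Any], Any]    # the function applied to each element

def fuse(node):
    """Collapse every chain of ParallelDo nodes into a single ParallelDo."""
    if isinstance(node, ParallelDo):
        inner = fuse(node.source)
        if isinstance(inner, ParallelDo):
            first, second = inner.fn, node.fn
            # A => B then B => C becomes one function A => C.
            return ParallelDo(inner.source, lambda x: second(first(x)))
        return ParallelDo(inner, node.fn)
    return node

# ParallelDo 1: str => int, ParallelDo 2: int => int.
graph = ParallelDo(ParallelDo(Load("input"), len), lambda n: n * 2)
fused = fuse(graph)
```

After the call, `fused` reads directly from the `Load` node and its function maps "abc" to 6.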

Kiama
tree rewriting, pattern matching, partial function, rule combinators

Kiama tree

Kiama tree
team
  person1: Eric, G., Torreborre, number1(xxx xxx xxx)
  person2: Ben, M., Lever, number1(xxx xxx xxx)

Kiama strategies

Kiama strategies
“Family” combinators
[Diagram: parent with child1, child2, child3]

Kiama strategies
“Control” combinators
s1 ‘or’ s2
s1 ‘and’ s2
s1 ‘non-deterministic or’ s2
if s1 then s2 else s3

Kiama strategies
“Control” combinators
try something
iterate
succeed if s fails
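The control combinators listed on these two slides can be modelled with strategies as plain functions that return either a new term or a failure marker. The sketch below is a Python model of the idea, not Kiama's API: `FAIL`, `seq`, `choice`, `cond`, `attempt`, `repeat` and `not_` are names chosen here, and the non-deterministic choice is omitted.

```python
FAIL = object()  # hypothetical marker meaning "the strategy did not apply"

def identity(term):
    return term

def seq(s1, s2):
    # s1 'and' s2: apply s1, then s2 to its result; fail if either fails.
    def run(term):
        result = s1(term)
        return FAIL if result is FAIL else s2(result)
    return run

def choice(s1, s2):
    # s1 'or' s2: try s1, fall back to s2 on the original term.
    def run(term):
        result = s1(term)
        return s2(term) if result is FAIL else result
    return run

def cond(s1, s2, s3):
    # if s1 then s2 else s3 (s2 sees the result of s1, s3 the original term).
    def run(term):
        result = s1(term)
        return s3(term) if result is FAIL else s2(result)
    return run

def attempt(s):
    # try something: never fails, leaves the term unchanged if s fails.
    return choice(s, identity)

def repeat(s):
    # iterate: apply s until it fails, return the last successful term.
    def run(term):
        while True:
            result = s(term)
            if result is FAIL:
                return term
            term = result
    return run

def not_(s):
    # succeed (with the unchanged term) if s fails.
    def run(term):
        return term if s(term) is FAIL else FAIL
    return run

def halve(n):
    # Example rule: only applies to even numbers.
    return n // 2 if n % 2 == 0 else FAIL

result = repeat(halve)(40)
```

Here `repeat(halve)(40)` goes 40, 20, 10, 5 and stops at 5. Calling it with 0 would never terminate, since halving 0 always succeeds, which is one way to hit the infinite loops a rewriting library has to warn about.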

Kiama strategies
Traversals
from the top of the tree to the leaves

Kiama strategies
Traversals
from the leaves to the top of the tree
start from the bottom, stop as soon as a strategy succeeds

Kiama strategies
Example
Capitalize all last names in the team
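As an illustration of this example, here is a small Python sketch of a bottom-up "everywhere" traversal applied to a team tree. The node classes, `everywhere_bottom_up` and `capitalize_last_name` are all hypothetical, and "capitalize" is assumed to mean upper-casing the whole name. It mirrors the idea of a rewrite rule plus a traversal, not the Kiama code shown in the talk.

```python
from dataclasses import dataclass, fields, is_dataclass, replace

@dataclass(frozen=True)
class FirstName:
    value: str

@dataclass(frozen=True)
class LastName:
    value: str

@dataclass(frozen=True)
class Person:
    first: FirstName
    last: LastName

@dataclass(frozen=True)
class Team:
    members: tuple

def everywhere_bottom_up(rule, term):
    """Rebuild the tree, applying rule to every node, children first."""
    if isinstance(term, tuple):
        term = tuple(everywhere_bottom_up(rule, child) for child in term)
    elif is_dataclass(term):
        rebuilt = {f.name: everywhere_bottom_up(rule, getattr(term, f.name))
                   for f in fields(term)}
        term = replace(term, **rebuilt)
    return rule(term)

def capitalize_last_name(term):
    # The rule only matches LastName nodes; everything else is returned as-is.
    if isinstance(term, LastName):
        return LastName(term.value.upper())
    return term

team = Team((Person(FirstName("Eric"), LastName("Torreborre")),
             Person(FirstName("Ben"), LastName("Lever"))))
rewritten = everywhere_bottom_up(capitalize_last_name, team)
```

The original `team` is left untouched: `rewritten` is a new tree, which is the "copy of the old one" behaviour to keep in mind when traceability matters.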

Kiama strategies
Scoobi examples
All Scoobi strategies

Kiama strategies
Scoobi examples
Truncate the graph

Kiama strategies, warnings
Infinite loops!
Can create semantically invalid trees
Equality is used everywhere. Define equals and hashCode wisely => use unique ids on nodes
The new tree is a copy of the old one. Traceability?
Node attributes might be traversable!

Kiama: rewriting
Fusion rule for a DAG
[Diagram: Load 1, ParallelDo 1, ParallelDo 2, ParallelDo 3, with arrows for the dataflow direction and the graph direction]
uses(pd1).size <= 1

Kiama: rewriting
Rewriting duplicates!
[Diagram, before: Root, PD1, Load 1, PD2, PD3, with arrows for the dataflow direction and the graph direction]
[Diagram, after: Root, PD2’, PD3’, with PD1’ and Load 1’ each appearing twice]
=> use the MemoRewriter!
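The duplication problem can be reproduced on a toy graph: when a rewrite rebuilds every node it visits, a node shared by two parents is rebuilt once per path. Memoising the result per original node keeps the sharing. The sketch below only illustrates that idea. The `Node` class, `rewrite` and `prime` are hypothetical and this is not how Kiama's MemoRewriter is implemented.

```python
class Node:
    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = tuple(inputs)

def rewrite(node, rule, memo=None):
    """Rebuild the graph bottom-up; pass a dict as memo to preserve sharing."""
    if memo is not None and id(node) in memo:
        return memo[id(node)]
    rebuilt = rule(Node(node.name, [rewrite(i, rule, memo) for i in node.inputs]))
    if memo is not None:
        # Keyed by the identity of the original node, which stays alive
        # for the whole rewrite because the original graph references it.
        memo[id(node)] = rebuilt
    return rebuilt

def prime(node):
    # Trivial rule: mark every node as rewritten.
    return Node(node.name + "'", node.inputs)

load1 = Node("Load 1")
pd1 = Node("PD1", [load1])          # shared by PD2 and PD3
root = Node("Root", [Node("PD2", [pd1]), Node("PD3", [pd1])])

naive = rewrite(root, prime)            # PD1' is built twice
shared = rewrite(root, prime, memo={})  # PD1' is built once and reused
```

In `naive`, the two parents point at distinct copies of PD1’, while in `shared` they point at the same object.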

Pattern matching / implicits / case classes
Doesn’t play well with pattern matching and rewriting!

Scoobi => Hadoop

Step three: define MapReduce jobs
Load: A
ParallelDo 1: (K, V)
GroupByKey: (K, Iterable[V])
Combine: (K, V)
ParallelDo 2: B
Mapper, combiner, Reducer: MapReduce job!

Step three: job dependencies
[Diagram: Load, ParallelDo 1, GroupByKey1, ParallelDo 2, GroupByKey2, ParallelDo 3, grouped into Mscr1 and Mscr2]
Mscr1 depends on Mscr2!
Find dependencies?

Graph algorithm: layering
[Diagram: Output1, Output2, Output3]
1. selection
2. longest path to leaves
3. group by longest path
Longest path?

Kiama: attribute grammars
node: Attributable
children: child1, child2
parent
attribute
Attribute definition
Function
Values are memoised!

Kiama: attribute grammars
[Diagram: Output 1, Output 2, Output 3, each with its leaves]

Kiama: attribute grammars
[Diagram: Output 1, Output 2, Output 3, each with its leaves, annotated longest = 1, longest = 2, longest = 2]

Kiama: attribute grammars
Longest path to a node

Kiama: attribute grammars
Layers
all nodes in the tree
group by longest path
join with longest path
return nodes only
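The layering steps can be sketched with a memoised "longest path to the leaves" function, which plays the role of an attribute whose value is computed once per node. The graph below is made up for the example (it is chosen so that the three outputs get longest = 1, 2 and 2), and `GRAPH`, `longest_path` and `layers` are hypothetical names. This is plain Python, not Kiama's attribute grammar API.

```python
from functools import lru_cache

# Hypothetical graph: each node maps to the nodes it reads from.
GRAPH = {
    "load": (),
    "pd": ("load",),
    "output1": ("load",),
    "output2": ("pd",),
    "output3": ("pd", "load"),
}

@lru_cache(maxsize=None)
def longest_path(node):
    # Memoised like an attribute: computed once per node, then reused.
    inputs = GRAPH[node]
    if not inputs:
        return 0
    return 1 + max(longest_path(i) for i in inputs)

def layers(nodes):
    # Group the selected nodes by their longest path to the leaves,
    # then return the groups ordered by that length, nodes only.
    by_length = {}
    for node in nodes:
        by_length.setdefault(longest_path(node), []).append(node)
    return [by_length[length] for length in sorted(by_length)]

result = layers(["output1", "output2", "output3"])
```

Nodes in the same layer have no path between them, so the jobs they belong to can be scheduled together, while a later layer has to wait for the earlier ones.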

Scoobi => Hadoop

Step four: channels definition
[Diagram: Load, ParallelDo 1 to ParallelDo 5, GroupByKey1, GroupByKey2, split into a Mapper side and a Reducer side, with one input channel, two output channels, and tags tag1 and tag2]

Kiama: attribute grammars
Cycles
[Diagram: n1, n2, n3, n4, n5 with co-parent links and an arrow for the graph direction]

Kiama: attribute grammars
Transitive uses
[Diagram: root and children, nodes n1, n2, n3, n5]

Kiama: attribute grammars
Transitive uses
usesTable:
n1 -> Set()
n2 -> Set(n1)
n3 -> Set(n1)
n4 -> Set(n2)
n5 -> Set(n2, n3)
Scalaz FTW!
[Diagram: n1, n2, n3, n4, n5]

Scalaz
Update of a Map[K, Collection[V]]
1. If the key doesn’t exist, add a new key with an empty collection
1.1 Add the new element to the new collection
2. Otherwise add the new element to the existing collection

Scalaz
Monoids!
Can add 2 maps together, because we can add to a Set!
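The monoid idea removes the "does the key exist?" branching: if sets can be appended (by union), then maps of sets can be appended too, key by key. The sketch below models that in Python and rebuilds the usesTable from the earlier slide. `append_maps`, `add_entry` and the list of edges are hypothetical names and data; in Scalaz itself the append would presumably be written with the `|+|` operator, which the transcript does not show.

```python
from functools import reduce

def append_maps(m1, m2):
    """Monoid-style append: keep every key, union the sets of shared keys."""
    merged = {key: set(values) for key, values in m1.items()}
    for key, values in m2.items():
        merged[key] = merged.get(key, set()) | set(values)
    return merged

def add_entry(table, edge):
    # Adding one element is just appending a one-entry map.
    key, value = edge
    return append_maps(table, {key: {value}})

# Entries taken from the usesTable slide, as (key, element) pairs.
edges = [("n2", "n1"), ("n3", "n1"), ("n4", "n2"), ("n5", "n2"), ("n5", "n3")]
uses_table = reduce(add_entry, edges, {"n1": set()})
```

The three-step update from the previous slide collapses into a single `append_maps` call, and the empty map acts as the identity element.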

Kiama: attribute grammars
Mutation :-(
Step 1: n1, n2, n3 (children, parent)
Step 2: n0, n1, n2, n3, n4 (children, parent)
Also for “attr”!

Kiama: attribute grammars
Directed graphs != trees
[Diagram: Root, ParallelDo, Materialise1, Materialise2, Load1, Load2, with arrows for the graph direction and the dataflow direction]
Where to start?

Kiama: rewriting + attribute grammars
[Diagram, before: Root, Combine 1, Load 1, PD2, PD3, with arrows for the dataflow direction and the graph direction]
[Diagram, after: Root, PD1’, Load 1, PD2, PD3]
Relations change!
parent?!

Scoobi

Step five: execution
[Diagram: Layer1, Layer2, mscr1 to mscr5, sink, value]

Recap
Big data
Kiama

Recap
It is possible to abstract the Hadoop “map-reduce” paradigm to a “distributed collections” one
Scala is a flexible language to represent those computations
Kiama helps tremendously for:
•  modifying the graph
•  computing node properties

Thank you!

PS You like the Aussie way of life + Scala?
=> Apply at NICTA!
