ScalaDays 2013: How Kiama helps with Hadoop computing

This talk gives a brief overview of the BigData tooling landscape and of where Scoobi, a Scala library of distributed collections for Hadoop, stands in it. It then shows the challenges of translating Scoobi abstractions into Hadoop constructs, and how Scala, as a programming language, and Kiama (http://code.google.com/kiama), as a graph-processing library, can be leveraged to support this translation. In particular:
- How rewriting rules based on partial functions are a very succinct way to pre-process and optimise the computation graph.
- How attribute grammars can be used to implement general graph traversal algorithms.

Eric Torreborre

June 11, 2013

Transcript

  1. A framework

    [Diagram: word count in Hadoop MapReduce. Eight Mappers read the words "Hello" and "World" and emit pairs such as ("Hello", 1) and ("World", 1); an optional combiner pre-aggregates them; two Reducers receive ("Hello", [1, 1, 1]) and ("World", [1, 1, 1, 1]) and output ("Hello", 3) and ("World", 4).]
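
To make the diagram concrete, here is a minimal in-memory sketch of the same word count in plain Scala (not Hadoop or Scoobi code). Each inner `Seq` stands in for one Mapper's input split, and `useCombiner` switches the local pre-aggregation on or off. All names are hypothetical.

```scala
// Plain Scala, no Hadoop: each inner Seq plays the role of one Mapper's input
// split, the optional combiner pre-sums a Mapper's output locally, the shuffle
// groups pairs by key, and each Reducer call sums the values for one key.
object WordCountSim {
  // Map phase: every word becomes a (word, 1) pair
  def mapPhase(words: Seq[String]): Seq[(String, Int)] =
    words.map(w => (w, 1))

  // Combiner: sums the counts per key within one Mapper's output
  def combine(pairs: Seq[(String, Int)]): Seq[(String, Int)] =
    pairs.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2).sum) }.toSeq

  // Shuffle: collects the values for each key across all Mappers
  def shuffle(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
    pairs.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }

  // Reduce phase: one call per key, here a sum
  def reducePhase(grouped: Map[String, Seq[Int]]): Map[String, Int] =
    grouped.map { case (k, vs) => (k, vs.sum) }

  def run(splits: Seq[Seq[String]], useCombiner: Boolean): Map[String, Int] = {
    val mapped = splits.map(mapPhase)
    val emitted = if (useCombiner) mapped.flatMap(combine) else mapped.flatten
    reducePhase(shuffle(emitted))
  }
}
```

With the combiner enabled, the Reducer for "World" receives partial sums (for instance [1, 2, 1]) instead of [1, 1, 1, 1]; the final counts are unchanged only because summing is associative and commutative.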
  2. Step one: create nodes

    Easy :-) Load, ParallelDo, GroupByKey, Combine, Materialise
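
A sketch of why this step is easy, assuming a hypothetical `Pipeline` wrapper and node classes named after the slide (Scoobi's actual `DList` API and internal node types differ): each operation just wraps the current node in a new one, so writing the program builds the graph. Functions are represented by their names to keep the example printable.

```scala
// Hypothetical node types named after the slide; Scoobi's real classes differ.
sealed trait Comp
final case class Load(path: String) extends Comp
final case class ParallelDo(in: Comp, fn: String) extends Comp
final case class GroupByKey(in: Comp) extends Comp
final case class Combine(in: Comp, fn: String) extends Comp
final case class Materialise(in: Comp) extends Comp

// Each call only records a node on top of the previous one: nothing runs yet.
final case class Pipeline(node: Comp) {
  def parallelDo(fn: String): Pipeline = Pipeline(ParallelDo(node, fn))
  def groupByKey: Pipeline = Pipeline(GroupByKey(node))
  def combine(fn: String): Pipeline = Pipeline(Combine(node, fn))
  def materialise: Comp = Materialise(node)
}

object Pipeline {
  def load(path: String): Pipeline = Pipeline(Load(path))
}
```

For example, `Pipeline.load("words.txt").parallelDo("toPairs").groupByKey.combine("sum").materialise` yields the nested node `Materialise(Combine(GroupByKey(ParallelDo(Load("words.txt"), "toPairs")), "sum"))`, which later steps can analyse and rewrite.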
  3. Translation?

    [Diagram: the nodes Load, ParallelDo, GroupByKey, Combine and Materialise next to the word-count MapReduce job from slide 1: eight Mappers emitting ("Hello", 1) and ("World", 1), a combiner, and two Reducers turning ("Hello", [1, 1, 1]) and ("World", [1, 1, 1, 1]) into ("Hello", 3) and ("World", 4).]
  4. Step two: simplify

    Node 0 -> ParallelDo 1 -> ParallelDo 2 -> Node 3 becomes Node 0 -> fused ParallelDo -> Node 3: a function A => B followed by a function B => C is fused into a single function A => C.
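
The fusion step can be sketched with a hypothetical two-node graph type in plain Scala. The element type is fixed to `Int` for brevity, and `fuse`, `countDos` and `eval` are illustrative names, not Scoobi's or Kiama's. Fusing is just function composition: `f andThen g`.

```scala
// Hypothetical minimal graph: a source and ParallelDo nodes carrying a function
// (element type fixed to Int for brevity).
sealed trait Node
final case class Source(name: String) extends Node
final case class ParallelDo(in: Node, f: Int => Int) extends Node

object Fusion {
  // Simplifies the input first, then, if that input is itself a ParallelDo,
  // merges the two: f (A => B) followed by g (B => C) becomes f andThen g (A => C).
  def fuse(n: Node): Node = n match {
    case ParallelDo(in, g) =>
      fuse(in) match {
        case ParallelDo(in2, f) => ParallelDo(in2, f andThen g)
        case other              => ParallelDo(other, g)
      }
    case other => other
  }

  // Helpers for checking the result: number of ParallelDo nodes in the chain
  def countDos(n: Node): Int = n match {
    case ParallelDo(in, _) => 1 + countDos(in)
    case _                 => 0
  }

  // Runs the chain on a single value, treating the source as identity
  def eval(n: Node, x: Int): Int = n match {
    case ParallelDo(in, f) => f(eval(in, x))
    case _                 => x
  }
}
```

A chain of three ParallelDo nodes collapses into one, and `eval` gives the same answer on both versions.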
  5. Kiama tree

    [Diagram: a tree rooted at team, with children person1 and person2; their leaves are name parts (Eric, G., Torreborre; Ben, M., Lever) and phone numbers such as number1(xxx xxx xxx).]
  6. Kiama strategies

    “Control” combinators: s1 ‘or’ s2; s1 ‘and’ s2; s1 ‘non-deterministic or’ s2; if s1 then s2 else s3.
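
A library-free sketch of these combinators, assuming a strategy is a function from a term to an `Option` of the rewritten term (`None` meaning failure). The names `orElse`, `andThen`, `rule` and `ifThenElse` are hypothetical; Kiama provides the same ideas as operators on its own `Strategy` type (for instance `<+` for deterministic choice and `<*` for sequencing). The non-deterministic choice is left out.

```scala
// A strategy either succeeds with a new term or fails (None).
// This mirrors the idea behind Kiama's strategies but is NOT Kiama's API.
final case class Strategy(run: Any => Option[Any]) {
  // "or": try this strategy, fall back to `other` on the same term if it fails
  def orElse(other: Strategy): Strategy =
    Strategy(t => run(t).orElse(other.run(t)))

  // "and": apply this strategy, then `other` to its result; fails if either fails
  def andThen(other: Strategy): Strategy =
    Strategy(t => run(t).flatMap(other.run))
}

object Strategy {
  // A rewrite rule from a partial function: fails where the function is undefined
  def rule(pf: PartialFunction[Any, Any]): Strategy = Strategy(pf.lift)

  val id: Strategy = Strategy(t => Some(t))
  val fail: Strategy = Strategy(_ => None)

  // "if s1 then s2 else s3": if s1 succeeds, continue with s2 on its result,
  // otherwise run s3 on the original term
  def ifThenElse(s1: Strategy, s2: Strategy, s3: Strategy): Strategy =
    Strategy(t => s1.run(t) match {
      case Some(t1) => s2.run(t1)
      case None     => s3.run(t)
    })
}
```

Building rules from partial functions is what makes the rewriting rules so succinct: the rule's domain is exactly the set of terms it matches.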
  7. Kiama strategies

    Traversals from the leaves to the top of the tree: start from the bottom and stop as soon as a rewrite succeeds.
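
A bottom-up, stop-at-first-success traversal, similar in spirit to Kiama's `oncebu`, can be sketched on a hypothetical binary tree type: children are tried before their parent, and once one rewrite succeeds the rest of the tree is left untouched.

```scala
// Hypothetical tree type for the example
sealed trait Tree
final case class Leaf(v: Int) extends Tree
final case class Fork(l: Tree, r: Tree) extends Tree

object OnceBottomUp {
  // Applies `rule` once, at the first bottom-most, left-most position where it
  // is defined. Returns None if the rule applies nowhere in the tree.
  def oncebu(rule: PartialFunction[Tree, Tree])(t: Tree): Option[Tree] = t match {
    case Fork(l, r) =>
      oncebu(rule)(l).map(Fork(_, r))                 // try the left subtree first
        .orElse(oncebu(rule)(r).map(Fork(l, _)))      // then the right subtree
        .orElse(rule.lift(t))                         // only then the node itself
    case leaf => rule.lift(leaf)
  }
}
```

With a rule that doubles leaves greater than 1, `Fork(Leaf(1), Fork(Leaf(2), Leaf(3)))` becomes `Fork(Leaf(1), Fork(Leaf(4), Leaf(3)))`: `Leaf(3)` is not rewritten because the traversal stopped at `Leaf(2)`.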
  8. Kiama strategies, warnings

    - Infinite loops!
    - Rewriting can create semantically invalid trees.
    - Equality is used everywhere: define equals and hashCode wisely => use unique ids on nodes.
    - The new tree is a copy of the old one. Traceability?
    - Node attributes might be traversable!
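
Why unique ids matter, in a small sketch: two loads of the same path are different nodes in a computation graph, but default case-class equality makes them equal, which can confuse memoisation and rewriting. The `Ids` counter and the id-based `equals`/`hashCode` in this sketch are one hypothetical way to follow the slide's advice.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical global id generator
object Ids {
  private val counter = new AtomicInteger(0)
  def next(): Int = counter.incrementAndGet()
}

// Structural equality: two loads of the same path are indistinguishable
final case class StructuralLoad(path: String)

// Identity by id: every node created is distinct, even with the same fields
final class Load(val path: String) {
  val id: Int = Ids.next()

  override def equals(other: Any): Boolean = other match {
    case that: Load => this.id == that.id
    case _          => false
  }

  // Must agree with equals so that hash-based sets and maps behave correctly
  override def hashCode(): Int = id
}
```

`StructuralLoad("in") == StructuralLoad("in")` is true, whereas two `new Load("in")` instances are different and both survive in a `Set`.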
  9. Kiama: rewriting

    Fusion rule for a DAG: [Diagram: Load 1, ParallelDo 1, ParallelDo 2 and ParallelDo 3, with arrows for the dataflow direction and the graph direction.] Fusion condition: uses(pd1).size <= 1.
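
A sketch of the fusion condition on a DAG, assuming a hypothetical node type `N` identified by a string id: `uses` inverts the input edges to find every direct consumer of a node, and `canFuse` only allows fusion when the inner ParallelDo has at most one consumer, so that fusing does not duplicate its work.

```scala
// Hypothetical DAG node: an id, a kind, and the nodes it reads from
final case class N(id: String, kind: String, inputs: List[N])

object DagFusion {
  // uses(n): ids of the nodes that read n directly, over the whole graph
  def uses(roots: List[N]): Map[String, Set[String]] = {
    def all(n: N): List[N] = n :: n.inputs.flatMap(all)
    val nodes = roots.flatMap(all).distinct
    nodes.flatMap(n => n.inputs.map(i => i.id -> n.id))
      .groupBy(_._1)
      .map { case (used, edges) => used -> edges.map(_._2).toSet }
  }

  // A ParallelDo reading another ParallelDo may be fused with it only when the
  // inner ParallelDo has at most one user; otherwise fusing would duplicate it.
  def canFuse(outer: N, usesTable: Map[String, Set[String]]): Boolean =
    outer.kind == "ParallelDo" && (outer.inputs match {
      case List(inner) if inner.kind == "ParallelDo" =>
        usesTable.getOrElse(inner.id, Set.empty).size <= 1
      case _ => false
    })
}
```

When both `pd2` and `pd3` read `pd1`, `canFuse(pd2, ...)` is false; with `pd2` as the only reader it becomes true.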
  10. Kiama: rewriting

    Rewriting duplicates! [Diagram: a DAG in which PD2 and PD3 both use PD1, which reads Load 1, with arrows for the dataflow direction and the graph direction. After rewriting from Root, PD1' and Load 1' appear twice.] => use the MemoRewriter!
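
The duplication problem and its fix can be sketched without Kiama. Rewriting a DAG as if it were a tree copies a shared node once per path, while caching results by node identity rewrites each node once and keeps the sharing. This only illustrates the memoisation idea behind the slide's MemoRewriter remark; it is not Kiama's implementation, and `Node` and `SharingRewrite` are hypothetical.

```scala
import java.util.IdentityHashMap

// Hypothetical DAG node (reference equality); `inputs` may share sub-nodes.
final class Node(val label: String, val inputs: List[Node])

object SharingRewrite {
  // Naive rewrite: rebuilds every path separately, so a node reachable from two
  // parents is copied twice and the sharing is lost.
  def naive(n: Node, f: String => String): Node =
    new Node(f(n.label), n.inputs.map(naive(_, f)))

  // Memoised rewrite: caches results by node identity, so each original node
  // is rewritten exactly once and shared nodes stay shared.
  def memo(n: Node, f: String => String): Node = {
    val cache = new IdentityHashMap[Node, Node]()
    def go(m: Node): Node = {
      val cached = cache.get(m)
      if (cached != null) cached
      else {
        val r = new Node(f(m.label), m.inputs.map(go))
        cache.put(m, r)
        r
      }
    }
    go(n)
  }
}
```

After `naive`, the two copies of PD1' are different objects; after `memo`, PD2' and PD3' point to the very same PD1'.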
  11. Step three: define MapReduce jobs

    [Diagram: Load (A) -> ParallelDo 1 ((K, V)) -> GroupByKey ((K, Iterable[V])) -> Combine ((K, V)) -> ParallelDo 2 (B), grouped into the Mapper, combiner and Reducer of one MapReduce job.]
  12. Step three: job dependencies

    [Diagram: Load, ParallelDo 1, GroupByKey1, ParallelDo 2, GroupByKey2 and ParallelDo 3, split into two jobs, Mscr1 and Mscr2.] Mscr1 depends on Mscr2! How do we find the dependencies?
  13. Graph algorithm: layering

    [Diagram: Output1, Output2 and Output3.] 1. Selection; 2. longest path to the leaves; 3. group by longest path. Longest path?
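
The three layering steps can be sketched on a hypothetical DAG type in which each node lists the nodes it depends on. If node a depends on node b, then `longest(a) > longest(b)`, so two nodes in the same layer never depend on each other and the layers can run in ascending order. This naive version recomputes `longest` for shared nodes; memoising it is what attribute grammars provide.

```scala
// Hypothetical DAG node: each node lists the nodes it depends on (towards leaves)
final case class Node(name: String, deps: List[Node])

object Layering {
  // 2. Length of the longest path from `n` down to a leaf (a leaf has 0)
  def longest(n: Node): Int =
    if (n.deps.isEmpty) 0 else 1 + n.deps.map(longest).max

  // 1. `selected` holds the interesting nodes (for example the outputs);
  // 3. group them by longest path, lowest layer first
  def layers(selected: List[Node]): List[List[Node]] =
    selected.groupBy(longest).toList.sortBy(_._1).map(_._2)
}
```

With `output2` depending on `output1`, and `output1` and `output3` depending only on loads, the layers are `[output1, output3]` followed by `[output2]`.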
  14. Kiama: attribute grammars

    [Diagram: a node (Attributable) with child1 and child2, linked by children and parent relations.] An attribute definition is a function; its values are memoised!
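
A memoised attribute can be approximated in a few lines: a function from nodes to values whose results are cached by node identity. This hypothetical `Attribute` class only illustrates the caching; Kiama's `attr` does more than this.

```scala
import java.util.IdentityHashMap

// Hypothetical stand-in for a memoised attribute: results are cached per node
// identity, so each node's value is computed at most once.
final class Attribute[N <: AnyRef, V](f: N => V) extends (N => V) {
  private val cache = new IdentityHashMap[N, V]()
  private var count = 0

  // Number of times `f` actually ran (for demonstration only)
  def evaluations: Int = count

  def apply(n: N): V =
    if (cache.containsKey(n)) cache.get(n)
    else {
      count += 1
      val v = f(n)
      cache.put(n, v)
      v
    }
}

object AttrExample {
  final class Node(val children: List[Node])

  // Longest path from a node down to a leaf, defined recursively in terms of
  // the same attribute on the children
  lazy val longest: Attribute[Node, Int] = new Attribute[Node, Int](n =>
    if (n.children.isEmpty) 0 else 1 + n.children.map(longest).max)
}
```

When a node is shared (here `mid` appears twice under `root`), its value is computed only once, and asking again for `root` costs nothing.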
  15. Kiama: attribute grammars

    [Diagram: Output 1, Output 2 and Output 3, each pointing to its leaves.]
  16. Kiama: attribute grammars

    [Diagram: Output 1, Output 2 and Output 3 annotated with their longest paths to the leaves: longest = 1, longest = 2, longest = 2.]
  17. Kiama: attribute grammars

    Layers of all the nodes in the tree: group by longest path, join with the longest path, return the nodes only.
  18. Step four: channels definition

    [Diagram: Load, ParallelDo 1 to 5, GroupByKey1 and GroupByKey2 arranged in one Mapper/Reducer job, with input and output channels labelled tag1 and tag2.]
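
One way to picture several channels in one job, as a hypothetical sketch rather than Scoobi's implementation: the Mapper tags each emitted pair with the channel it belongs to, the shuffle groups by (tag, key), and the values are then split back per channel so that each reduction sees only its own data.

```scala
// Hypothetical tagged record emitted by a Mapper serving several channels
final case class Tagged(tag: Int, key: String, value: Int)

object Channels {
  // Shuffle on (tag, key): the same key in two channels stays separate
  def shuffle(emitted: Seq[Tagged]): Map[(Int, String), Seq[Int]] =
    emitted.groupBy(t => (t.tag, t.key)).map { case (k, ts) => k -> ts.map(_.value) }

  // Split the grouped data back per channel: tag -> (key -> values)
  def byChannel(grouped: Map[(Int, String), Seq[Int]]): Map[Int, Map[String, Seq[Int]]] =
    grouped.groupBy(_._1._1).map { case (tag, m) =>
      tag -> m.map { case ((_, key), vs) => key -> vs }
    }
}
```

The key "a" appears in both channels, yet channel 1 only receives the values 1 and 2, while channel 2 receives 5.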
  19. Cycles

    [Diagram: nodes n1 to n5 along the graph direction, with several co-parent links.] Kiama: attribute grammars
  20. Kiama: attribute grammars

    Transitive uses. usesTable:

      n1 -> Set()
      n2 -> Set(n1)
      n3 -> Set(n1)
      n4 -> Set(n2)
      n5 -> Set(n2, n3)

    Scalaz FTW! [Diagram: n1 above n2 and n3; n4 below n2; n5 below both n2 and n3.]
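
Given a direct uses table like the one on the slide, the transitive uses of a node can be computed with a simple worklist. This is a plain-Scala sketch with hypothetical names; the `seen` set keeps it terminating even if the graph has cycles.

```scala
object TransitiveUses {
  // usesTable(n) = the nodes that directly use n (its parents in the graph).
  // Returns every node that uses n directly or indirectly.
  def transitiveUses(usesTable: Map[String, Set[String]])(n: String): Set[String] = {
    @annotation.tailrec
    def loop(frontier: Set[String], seen: Set[String]): Set[String] =
      if (frontier.isEmpty) seen
      else {
        // users of the current frontier that have not been visited yet
        val next = frontier.flatMap(usesTable.getOrElse(_, Set.empty[String])) -- seen
        loop(next, seen ++ next)
      }
    loop(Set(n), Set.empty)
  }
}
```

With the slide's table, n5 is used transitively by n2, n3 and n1, while n1 is used by nothing.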
  21. Scalaz

    Update of a Map[K, Collection[V]]:
      1. If the key doesn't exist, add a new key with an empty collection.
         1.1 Add the new element to the new collection.
      2. Otherwise, add the new element to the existing collection.
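
In plain Scala the same update fits in one expression. With Scalaz, appending maps with `|+|` merges the collections stored under equal keys, which is presumably what "Scalaz FTW" alludes to; this sketch sticks to the standard library, and `addBinding` and `usesTable` are hypothetical names.

```scala
object MultiMap {
  // Adds `v` under `k`: starts from an empty Set when the key is missing,
  // otherwise extends the existing Set.
  def addBinding[K, V](m: Map[K, Set[V]], k: K, v: V): Map[K, Set[V]] =
    m.updated(k, m.getOrElse(k, Set.empty[V]) + v)

  // Builds a uses table from (used, user) edges with the same update
  def usesTable[N](edges: Seq[(N, N)]): Map[N, Set[N]] =
    edges.foldLeft(Map.empty[N, Set[N]]) { case (m, (used, user)) => addBinding(m, used, user) }
}
```

Built from edges, nodes that nobody uses (such as n1 on the previous slide) simply have no entry, so callers should read the table with `getOrElse(n, Set.empty)`.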
  22. Kiama: attribute grammars

    Mutation :-( [Diagram: Step 1: n1, n2 and n3 with their children/parent links. Step 2: the same nodes plus n0 and n4, with different children/parent links.] Also for “attr”!
  23. Kiama: attribute grammars

    Directed graphs != trees. [Diagram: ParallelDo, Materialise1, Load1, Load2 and Materialise2, with arrows for the graph direction and the dataflow direction.] Where to start? Root
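
A sketch of the "Root" answer, with a hypothetical node type: find the sinks (nodes that no other node reads from) and put an artificial root above them, so that tree-oriented traversals have a single starting point.

```scala
// Hypothetical DAG node: a name and the nodes it reads from
final case class Node(name: String, inputs: List[Node])

object Rooting {
  // Nodes that no other node reads from: the outputs of the DAG
  def sinks(all: List[Node]): List[Node] = {
    val used = all.flatMap(_.inputs).toSet
    all.filterNot(used)
  }

  // An artificial Root above every sink gives traversals one entry point
  def withRoot(all: List[Node]): Node = Node("Root", sinks(all))
}
```

For a graph in which Materialise1 and Materialise2 are the only nodes nobody reads, the artificial root gets exactly those two as children.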
  24. Kiama: rewriting + attribute grammars

    [Diagram: before and after rewriting. Root, Combine 1, Load 1, PD2 and PD3 become Root, PD1', Load 1, PD2 and PD3, with arrows for the dataflow direction and the graph direction.] Relations change! parent?!
  25. Step five: execution

    [Diagram: Layer1 and Layer2 containing mscr1 to mscr5, plus a sink and a value.]
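
Executing the layers can be sketched with standard Scala futures, assuming each job is opaque (`execute` is a hypothetical callback) and that the jobs within a layer are independent: a layer's jobs start concurrently, and the next layer starts only when all of them have finished.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object LayeredExecution {
  // Runs each layer only after the previous one has completed; the jobs inside
  // a layer are started concurrently. The one-hour timeout is arbitrary.
  def run(layers: List[List[String]], execute: String => Unit): Unit =
    layers.foreach { layer =>
      val running = layer.map(job => Future(execute(job)))
      Await.result(Future.sequence(running), 1.hour)
    }
}
```

Blocking with `Await` between layers keeps the sketch simple; a real scheduler could instead chain the futures of successive layers.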
  26. Recap

    - It is possible to abstract the Hadoop “map-reduce” paradigm into a “distributed collections” one.
    - Scala is a flexible language for representing those computations.
    - Kiama helps tremendously with:
      - modifying the graph
      - computing node properties