Scala solution for the funclub Berlin meetup

George Leontiev

February 28, 2013

Transcript

  1. Scala solution

    How to make your Scala control effects à la Haskell

    George Leontiev, deltamethod GmbH
    February 28, 2013
    (λx.folonexlambda-calcul.us)@ folone.info
  2. Introduction

    https://github.com/folone/funclub-words

    Note: I intentionally made it more "interesting" to show more neat scalaz stuff.
    I won't cover everything, though. If something seems strange, please ask.
  3. Core

    Main functions
        wordCount :: String → Map (String, Int)
        acceptedChars :: Char → Boolean

    Helper functions
        time :: (a → IO b) → IO b
        close :: Closeable a ⇒ a → IO ()
  4. Core

        def acceptedChars(c: Char) = {
          // Collapse the nested tuple of checks into a single Boolean.
          val sum: (((Boolean, Boolean), Boolean)) => Boolean = _ match {
            case ((a, b), c) => a || b || c
          }
          // Run all three predicates on the same character.
          val fun = ((_: Char).isLetterOrDigit) &&&
                    ((_: Char).isWhitespace) &&&
                    ((_: Char) == '-')
          (fun >>> sum)(c)
        }

    http://www.haskell.org/arrows/index.html
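The `&&&` and `>>>` combinators on this slide come from scalaz's arrow syntax. As a rough illustration only, the sketch below rebuilds the same predicate with the standard library: `FanOut` and its `&&&` are hypothetical stand-ins written for this example (not the scalaz definitions), and `andThen` plays the role of `>>>`.

```scala
object ArrowSketch {
  // Hypothetical stand-in for the fan-out combinator:
  // feed the same input to two functions and pair the results.
  implicit class FanOut[A, B](f: A => B) {
    def &&&[C](g: A => C): A => (B, C) = a => (f(a), g(a))
  }

  val isLetterOrDigit: Char => Boolean = _.isLetterOrDigit
  val isWhitespace: Char => Boolean    = _.isWhitespace
  val isDash: Char => Boolean          = _ == '-'

  // Fan out to three checks, then collapse the nested tuple with ||.
  val accepted: Char => Boolean =
    (isLetterOrDigit &&& isWhitespace &&& isDash) andThen {
      case ((a, b), c) => a || b || c
    }
}
```

The nested tuple shape `((a, b), c)` falls out of `&&&` being left-associative, which is why the slide's `sum` function matches on that pattern.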
  5. Core

        def wordCount(text: String): Map[String, Int] =
          text.filter(acceptedChars)
            // split words
            .toLowerCase.split("\\W").toList
            // group
            .groupBy(identity)
            // calculate group sizes
            .map { case (key, value) => key.trim -> value.length }
  6. Typeclass instances

        val N = 10

        implicit val mapInstances = new Show[List[(String, Int)]] {
          override def shows(l: List[(String, Int)]) =
            l.filterNot(_._1.isEmpty)  // drop the empty "word"
              .sortBy(-_._2)           // most frequent first
              .take(N)
              .foldLeft("") { case (acc, (key, value)) =>
                acc + "\n" + key + ": " + value
              }
        }
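To make the typeclass mechanics on this slide concrete without scalaz on the classpath, here is a minimal sketch. The `Show` trait and `ShowOps` wrapper below are simplified, hypothetical versions written for this example (scalaz's real `Show` has more members), `N` is set to 2 to keep the output short, and the rows are joined with `mkString` rather than the slide's `foldLeft`, so there is no leading newline.

```scala
object ShowSketch {
  // Hypothetical, simplified typeclass: how to render an A as text.
  trait Show[A] { def shows(a: A): String }

  // Adds a .shows method to any value that has a Show instance in scope.
  implicit class ShowOps[A](a: A)(implicit S: Show[A]) {
    def shows: String = S.shows(a)
  }

  val N = 2 // assumption for this sketch; the slide uses 10

  // Instance for word/count pairs: top N by count, one "word: count" per line.
  implicit val topWords: Show[List[(String, Int)]] =
    new Show[List[(String, Int)]] {
      def shows(l: List[(String, Int)]): String =
        l.filterNot(_._1.isEmpty)
          .sortBy(-_._2)
          .take(N)
          .map { case (key, value) => s"$key: $value" }
          .mkString("\n")
    }
}
```

With `import ShowSketch._` in scope, `List("a" -> 1, "b" -> 3).shows` resolves the instance implicitly, which is the same mechanism `result.toList.shows` relies on in the later slides.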
  7. Executing

        // function :: String → IO String
        def mainIO(path: String) = for {
          result <- time(function(path))
          _      <- putStrLn(result)
        } yield ()

        def main(args: Array[String]) = {
          val path = args(0)
          // Yuck!
          mainIO(path).unsafePerformIO()
        }
  8. Executing

        package info.folone.words

        import scalaz._, Scalaz._

        object Main {
          def main(args: Array[String]): Unit = {
            val path = args(0)
            // Functions into a monoid form a monoid themselves,
            // so the three entry points combine into one action.
            val action = WordsMemory.mainIO _ |+|
                         WordsStream.mainIO _ |+|
                         WordMachine.mainIO _
            // Yuck!
            action(path).unsafePerformIO()
          }
        }
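Combining the three `mainIO` functions with `|+|` works because a function into a monoid is itself a monoid: apply each function to the same argument and combine the results. The sketch below illustrates that idea with the standard library only. The `Monoid` trait, `combineAll`, and the two sample functions are hypothetical names for this example, and a plain `String` result stands in for the `IO` action used on the slide.

```scala
object FunctionMonoidSketch {
  // Hypothetical minimal typeclass, not the scalaz definition.
  trait Monoid[A] {
    def zero: A
    def append(x: A, y: A): A
  }

  implicit val stringMonoid: Monoid[String] = new Monoid[String] {
    val zero = ""
    def append(x: String, y: String): String = x + y
  }

  // A => B is a monoid whenever B is: combine the results pointwise.
  implicit def functionMonoid[A, B](implicit mb: Monoid[B]): Monoid[A => B] =
    new Monoid[A => B] {
      val zero: A => B = _ => mb.zero
      def append(f: A => B, g: A => B): A => B = a => mb.append(f(a), g(a))
    }

  def combineAll[A](xs: A*)(implicit m: Monoid[A]): A =
    xs.foldLeft(m.zero)(m.append)

  // Stand-ins for the per-strategy entry points on the slide.
  val inMemory: String => String  = path => s"memory($path);"
  val streaming: String => String = path => s"stream($path);"

  // One function that runs both strategies on the same path, in order.
  val action: String => String = combineAll(inMemory, streaming)
}
```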
  9. All set

    Let's see how far we can push this solution.
  10. First attempt

        def wholeFile(path: String): IO[String] =
          IO { Source.fromFile(path) }.bracket(close) { source =>
            IO {
              // Reads the entire file into memory at once.
              val text = source.mkString
              val result = wordCount(text)
              result.toList.shows
            }
          }
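The `bracket(close)` call is what guarantees the file handle is released whether or not the body fails. As a sketch of that pattern without the `IO` type, the `bracket` helper below is a hypothetical eager version built on `try`/`finally`; it is not the scalaz method, which stays lazy inside `IO`.

```scala
object BracketSketch {
  // Hypothetical eager helper: acquire a resource, use it,
  // and always release it, even if `use` throws.
  def bracket[R, A](acquire: => R)(release: R => Unit)(use: R => A): A = {
    val resource = acquire
    try use(resource)
    finally release(resource)
  }

  // Example use: read a whole file and close the Source afterwards.
  def readAll(path: String): String =
    bracket(scala.io.Source.fromFile(path))(_.close())(_.mkString)
}
```

Like the slide's `wholeFile`, `readAll` still loads the entire file into memory; the helper only addresses resource cleanup.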
  11. First attempt

    Works fine, but eats all the heap on a large enough file.
  12. Second attempt

        def byLine(path: String): IO[String] =
          IO { Source.fromFile(path) }.bracket(close) { source =>
            IO {
              // Count each line separately, then merge the per-line maps.
              val stream = source.getLines.toStream
              val result = stream.map(wordCount)
                .foldLeft(Map.empty[String, Int]) {
                  case (acc, v) => acc |+| v
                }
              result.toList.shows
            }
          }
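The line-by-line approach can be sketched with the standard library alone. In the example below, `countWords`, `merge`, and `countLines` are hypothetical names for this illustration; `merge` is a hand-written replacement for `|+|` on `Map[String, Int]`, and `countWords` is a simplified tokenizer rather than the deck's `wordCount`.

```scala
object ByLineSketch {
  // Simplified per-line count (assumption: words are runs of \w characters).
  def countWords(line: String): Map[String, Int] =
    line.toLowerCase
      .split("\\W+")
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.length }

  // Hand-written stand-in for |+| on maps: add counts for shared keys.
  def merge(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (word, n)) =>
      acc.updated(word, acc.getOrElse(word, 0) + n)
    }

  // Only one line and the running totals are held in memory at a time.
  def countLines(lines: Iterator[String]): Map[String, Int] =
    lines.map(countWords).foldLeft(Map.empty[String, Int])(merge)
}
```

Passing `source.getLines()` as the iterator keeps memory use proportional to the number of distinct words rather than to the file size.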
  13. Second attempt

    Just what is this |+|?
  14. Typeclasses

        instance Show [(String, Int)] where ...
        instance Monoid b ⇒ Monoid (Map a b) where ...

    http://debasishg.blogspot.de/2010/06/scala-implicits-type-classes-here-i.html
  15. Monoids

    (S, ⊗, 1)

        ∀a, b ∈ S : a ⊗ b ∈ S
        ∀a, b, c ∈ S : (a ⊗ b) ⊗ c = a ⊗ (b ⊗ c)
        ∀a ∈ S : 1 ⊗ a = a ⊗ 1 = a

    http://apocalisp.wordpress.com/2010/06/14/on-monoids/
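The three laws (closure, associativity, identity) translate directly into a small typeclass. The sketch below is a hypothetical, minimal `Monoid` written for this illustration rather than the scalaz one. It shows how a `Map[K, V]` instance can be derived from a `V` instance, which is the idea behind using `|+|` on word-count maps.

```scala
object MonoidSketch {
  // Hypothetical minimal typeclass: `zero` is the identity, `append` is ⊗.
  trait Monoid[A] {
    def zero: A
    def append(x: A, y: A): A
  }

  implicit val intAddition: Monoid[Int] = new Monoid[Int] {
    val zero = 0
    def append(x: Int, y: Int): Int = x + y
  }

  // Maps form a monoid when their values do:
  // values under the same key are combined with the value monoid.
  implicit def mapMonoid[K, V](implicit mv: Monoid[V]): Monoid[Map[K, V]] =
    new Monoid[Map[K, V]] {
      val zero: Map[K, V] = Map.empty
      def append(x: Map[K, V], y: Map[K, V]): Map[K, V] =
        y.foldLeft(x) { case (acc, (k, v)) =>
          acc.updated(k, acc.get(k).fold(v)(mv.append(_, v)))
        }
    }

  // Infix syntax, named after the operator used in the deck.
  implicit class MonoidOps[A](x: A)(implicit m: Monoid[A]) {
    def |+|(y: A): A = m.append(x, y)
  }
}
```

With `import MonoidSketch._`, `Map("a" -> 1) |+| Map("a" -> 2, "b" -> 1)` adds the counts for `"a"` and keeps `"b"` unchanged.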
  16. Second attempt

    Pretty good, but can we do better?
  17. Iteratees

    Scala machines (https://github.com/runarorama/scala-machines)
    https://dl.dropbox.com/u/4588997/Machines.pdf

    Gave similar performance on a by-line basis. It is three times faster, though, if we provide a Process to split the input by words and then monoidally merge single-element Maps.
  18. Iteratees – same as Stream

        def wordFreq(path: String) =
          getFileLines(new File(path), id outmap wordCount) execute
  19. Iteratees – 3x faster

        def splitWords(text: String): List[String] =
          text.filter(acceptedChars).toLowerCase.split("\\W").toList

        // Await a line, emit each of its words, and repeat.
        val words: Process[String, String] = (for {
          s <- await[String]
          _ <- traversePlan_(splitWords(s))(emit)
        } yield ()) repeatedly

        def wordCount(path: String) =
          getFileLines(new File(path),
            (id split words) outmap (_.fold(
              l => (1, Map.empty[String, Int]),
              w => (0, Map(w -> 1))
            ))) execute
  20. Wordcounting software

    Scoobi: http://nicta.github.com/scoobi/
    Spark: http://spark-project.org/
    Scalding: https://github.com/twitter/scalding/wiki/Type-safe-api-reference
  21. Wordcounting

    I did not have time to try any of those. But it turns out this code should work with them "as is".
  22. That's it

    Questions?