Monads and Monoids: from daily Java to Big Data analytics in Scala

[Presented at JavaDay 2015]

After two decades of evolution, Java 8 finally took a step towards functional programming. What can Java learn from mature functional languages? How can obscure mathematical abstractions such as Monad or Monoid be leveraged in practice? People usually find them scary and difficult to understand. Oleksiy will explain these concepts in simple words and show that they are a powerful tool applicable in many domains, from daily Java and Scala routines to Big Data analytics with Storm or Hadoop.

Oleksii Diagiliev

October 02, 2015

Transcript

  1. • Lead software engineer at EPAM
     • Working on scalable computing and data grids (GigaSpaces, Storm, Spark)
     • Blog: http://dyagilev.org
  6. • Abstract Algebra (1900s?) and Category Theory (1940s)
     • Mathematicians study abstract structures and relationships between them
     • Early 1990s: Eugenio Moggi described the general use of monads to structure programs
     • Early 1990s: monads appeared in Haskell, a purely functional language, along with other concepts such as Functor, Monoid, Arrow, etc.
     • 2003: Martin Odersky creates Scala, a language that unifies object-oriented and functional paradigms. Influenced by Haskell, Java, Erlang, etc.
     • 2014: Java 8 released. Functional programming support: lambdas, streams
  7. • How abstractions from math (Category Theory, Abstract Algebra) help in functional programming and Big Data
     • How to leverage them and become a better programmer
  8. Example #1

         User user = findUser(userId);
         if (user != null) {
             Address address = user.getAddress();
             if (address != null) {
                 String zipCode = address.getZipCode();
                 if (zipCode != null) {
                     City city = findCityByZipCode(zipCode);
                     if (city != null) {
                         return city.getName();
                     }
                 }
             }
         }
         return null;
  9. Refactored with Optional

         Optional<String> cityName = findUser(userId)
             .flatMap(user -> user.getAddress())
             .flatMap(address -> address.getZipCode())
             .flatMap(zipCode -> findCityByZipCode(zipCode))
             .map(city -> city.getName());

     … which may not return a result.
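The Optional chain can be tried out with a minimal, self-contained sketch. The domain classes, the hard-coded lookup data and the `CityLookup` wrapper are assumptions made for illustration; only the method names come from the slide.

```java
import java.util.Optional;

// Hypothetical domain model; fields may be null, getters wrap them in Optional.
final class City {
    private final String name;
    City(String name) { this.name = name; }
    String getName() { return name; }
}

final class Address {
    private final String zipCode; // may be null
    Address(String zipCode) { this.zipCode = zipCode; }
    Optional<String> getZipCode() { return Optional.ofNullable(zipCode); }
}

final class User {
    private final Address address; // may be null
    User(Address address) { this.address = address; }
    Optional<Address> getAddress() { return Optional.ofNullable(address); }
}

final class CityLookup {
    // Assumed lookup with hard-coded data: user 1 has an address, user 2 has none.
    static Optional<User> findUser(int userId) {
        switch (userId) {
            case 1: return Optional.of(new User(new Address("01001")));
            case 2: return Optional.of(new User(null));
            default: return Optional.empty();
        }
    }

    // Assumed lookup: only one zip code is known.
    static Optional<City> findCityByZipCode(String zipCode) {
        return "01001".equals(zipCode) ? Optional.of(new City("Kyiv")) : Optional.empty();
    }

    // Same shape as the slide: each step runs only if the previous one produced a value.
    static Optional<String> cityName(int userId) {
        return findUser(userId)
            .flatMap(User::getAddress)
            .flatMap(Address::getZipCode)
            .flatMap(CityLookup::findCityByZipCode)
            .map(City::getName);
    }
}
```

A missing user, a missing address and an unknown zip code all collapse into the same empty Optional, so the caller handles absence in one place.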
  12. • container with a type M<T> (e.g. Optional<T>)
      • method M<U> flatMap(T -> M<U>) (e.g. flatMap(T -> Optional<U>))
      • constructor to put T into M<T>; same as a static method M<T> unit(T) (e.g. Optional.of(x))

      1. Left identity: unit(x).flatMap(f) = f(x)
      2. Right identity: m.flatMap(x -> unit(x)) = m
      3. Associativity: m.flatMap(f).flatMap(g) = m.flatMap(x -> f(x).flatMap(g))

      Bonus: now we can define M<U> map(T -> U)

          M<U> map(f) { return flatMap(x -> unit(f(x))) }
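The three laws and the derived `map` can be spot-checked against `java.util.Optional`. This is a sketch rather than a proof: `OptionalLaws`, `HALF` and `SHOW` are hypothetical names, and the checks only cover the sample functions shown.

```java
import java.util.Optional;
import java.util.function.Function;

final class OptionalLaws {
    // unit puts a plain (non-null) value into the container.
    static <T> Optional<T> unit(T x) { return Optional.of(x); }

    // Two sample functions of the shape T -> M<U>.
    static final Function<Integer, Optional<Integer>> HALF =
        n -> n % 2 == 0 ? Optional.of(n / 2) : Optional.empty();
    static final Function<Integer, Optional<String>> SHOW =
        n -> n >= 0 ? Optional.of("#" + n) : Optional.empty();

    // Left identity: unit(x).flatMap(f) = f(x)
    static boolean leftIdentity(int x) {
        return unit(x).flatMap(HALF).equals(HALF.apply(x));
    }

    // Right identity: m.flatMap(x -> unit(x)) = m
    static boolean rightIdentity(Optional<Integer> m) {
        return m.flatMap(x -> unit(x)).equals(m);
    }

    // Associativity: m.flatMap(f).flatMap(g) = m.flatMap(x -> f(x).flatMap(g))
    static boolean associativity(Optional<Integer> m) {
        Optional<String> left = m.flatMap(HALF).flatMap(SHOW);
        Optional<String> right = m.flatMap(x -> HALF.apply(x).flatMap(SHOW));
        return left.equals(right);
    }

    // map derived from flatMap and unit, as on the slide.
    static <T, U> Optional<U> map(Optional<T> m, Function<T, U> f) {
        return m.flatMap(x -> unit(f.apply(x)));
    }
}
```

One caveat: this derived `map` throws if `f` returns null, because `Optional.of` rejects null, whereas the built-in `Optional.map` turns a null result into an empty Optional.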
  15. Java: looks ugly

          Optional<User> user = findUser(userId);
          Optional<Order> order = findOrder(orderId);
          Optional<Payment> payment = findPayment(orderId);
          Optional<Placement> placement = user
              .flatMap(u -> order.flatMap(o -> payment.map(p -> submitOrder(u, o, p))));

      Scala: built-in monad support

          val placement = for {
            u <- findUser(userId)
            o <- findOrder(orderId)
            p <- findPayment(orderId)
          } yield submitOrder(u, o, p)

      • Scala: for-comprehension
      • Haskell: do-notation
      • F#: computation expressions
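Java has no comparable syntax, but the nesting can be hidden behind a small helper. The sketch below assumes a hypothetical `map3` function and `Function3` interface; neither is part of the JDK.

```java
import java.util.Optional;

final class Optionals {
    // Hypothetical three-argument function type (the JDK stops at BiFunction).
    @FunctionalInterface
    interface Function3<A, B, C, R> {
        R apply(A a, B b, C c);
    }

    // Combines three Optionals with f; the result is empty if any input is empty.
    static <A, B, C, R> Optional<R> map3(Optional<A> a, Optional<B> b, Optional<C> c,
                                         Function3<A, B, C, R> f) {
        return a.flatMap(x -> b.flatMap(y -> c.map(z -> f.apply(x, y, z))));
    }
}
```

With it, the call site would read `Optionals.map3(user, order, payment, (u, o, p) -> submitOrder(u, o, p))`. Unlike the for-comprehension, all three Optionals are computed before the helper is called, so a later lookup cannot be skipped when an earlier one comes back empty.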
  17. trait Parser[T] extends (String => ParseResult[T])

          sealed abstract class ParseResult[T]
          case class Success[T](result: T, rest: String) extends ParseResult[T]
          case class Failure() extends ParseResult[Nothing]

          val letter: Parser[Char] = …
          val digit: Parser[Char] = …
          val space: Parser[Char] = …

          // members of Parser[T]
          def map[U](f: T => U): Parser[U] = parser { in => this(in) map f }
          def flatMap[U](f: T => Parser[U]): Parser[U] = parser { in => this(in) withNext f }
          def * : Parser[List[T]] = …

          val userParser = for {
            firstName <- letter.*
            _ <- space
            lastName <- letter.*
            _ <- space
            phone <- digit.*
          } yield User(firstName, lastName, phone)

      Example input: “John Doe 0671112222”
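The same idea can be sketched in Java to show that `map` and `flatMap` are all a parser needs in order to compose. This is not a port of the Scala code above: `MiniParser`, `Parsed`, `many` and `MiniParsers` are hypothetical names, failure is modelled as an empty Optional, and the result is a plain list of strings rather than a `User`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// A successful parse: the parsed value plus the unconsumed input.
final class Parsed<T> {
    final T value;
    final String rest;
    Parsed(T value, String rest) { this.value = value; this.rest = rest; }
}

// A parser returns Optional.empty() on failure.
@FunctionalInterface
interface MiniParser<T> {
    Optional<Parsed<T>> parse(String in);

    default <U> MiniParser<U> map(Function<T, U> f) {
        return in -> parse(in).map(p -> new Parsed<U>(f.apply(p.value), p.rest));
    }

    // Run this parser, then let f choose the parser for the remaining input.
    default <U> MiniParser<U> flatMap(Function<T, MiniParser<U>> f) {
        return in -> parse(in).flatMap(p -> f.apply(p.value).parse(p.rest));
    }

    // Zero or more repetitions; assumes this parser consumes input when it succeeds.
    default MiniParser<List<T>> many() {
        return in -> {
            List<T> values = new ArrayList<>();
            String rest = in;
            Optional<Parsed<T>> next = parse(rest);
            while (next.isPresent()) {
                values.add(next.get().value);
                rest = next.get().rest;
                next = parse(rest);
            }
            return Optional.of(new Parsed<List<T>>(values, rest));
        };
    }
}

final class MiniParsers {
    // Parser for a single character that satisfies the given test.
    static MiniParser<Character> charWhere(Predicate<Character> test) {
        return in -> (!in.isEmpty() && test.test(in.charAt(0)))
            ? Optional.of(new Parsed<Character>(in.charAt(0), in.substring(1)))
            : Optional.empty();
    }

    static final MiniParser<Character> LETTER = charWhere(c -> Character.isLetter(c));
    static final MiniParser<Character> DIGIT = charWhere(c -> Character.isDigit(c));
    static final MiniParser<Character> SPACE = charWhere(c -> c == ' ');

    static String text(List<Character> chars) {
        StringBuilder sb = new StringBuilder();
        for (char c : chars) sb.append(c);
        return sb.toString();
    }

    // The for-comprehension written out as nested flatMap/map calls.
    static final MiniParser<List<String>> PERSON =
        LETTER.many().flatMap(first ->
        SPACE.flatMap(s1 ->
        LETTER.many().flatMap(last ->
        SPACE.flatMap(s2 ->
        DIGIT.many().map(phone ->
            Arrays.asList(text(first), text(last), text(phone)))))));
}
```

Parsing "John Doe 0671112222" with `MiniParsers.PERSON` yields the three fields and an empty remainder, while input without the separating spaces fails as a whole.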
  18. • scala.Option, java.Optional: absence of value
      • scala.List, java.Stream: multiple results
      • scala.Future, scalaz.Task, java.CompletableFuture: asynchronous computations
      • scalaz.Reader: read from shared environment
      • scalaz.Writer: collect data in addition to computed values
      • scalaz.State: maintain state
      • scala.Try, scalaz.\/: handling failures
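Two of the Java entries can be illustrated briefly. In this sketch `Stream.flatMap` handles multiple results and `CompletableFuture.thenCompose` plays the role of `flatMap` for asynchronous computations; the class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

final class OtherMonads {
    // Multiple results: each sentence yields a stream of words, flatMap flattens them.
    static List<String> words(List<String> sentences) {
        return sentences.stream()
            .flatMap(s -> Arrays.stream(s.split(" ")))
            .collect(Collectors.toList());
    }

    // Asynchronous computations: the second step starts once the first has a value.
    static CompletableFuture<Integer> lengthOfGreeting(String name) {
        return CompletableFuture.supplyAsync(() -> "Hello, " + name)
            .thenCompose(greeting -> CompletableFuture.supplyAsync(() -> greeting.length()));
    }
}
```

In both cases the function passed in returns a value already wrapped in the container, which is exactly the `T -> M<U>` shape used by `Optional.flatMap`.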
  19. • Remove boilerplate
      • Modularity: separate computations from combination strategy
      • Composability: compose computations from simple ones
      • Improve maintainability
      • Better readability
      • Vocabulary
  20. [Diagram: Lambda Architecture. Labels: New data, Data stream, All data, Batch processing, Batch view, Real-time processing, Real-time view, Serving layer, Query and merge]
  23. • Write job logic once and run on many platforms (Hadoop, Storm)
      • Library authors talk about monoids all the time

          def wordCount[P <: Platform[P]]
            (source: Producer[P, String], store: P#Store[String, Long]) =
              source.flatMap { sentence =>
                toWords(sentence).map(_ -> 1L)
              }.sumByKey(store)

          def sumByKey(store: P#Store[K, V])(implicit semigroup: Semigroup[V]): Summer[P, K, V] = …
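To see why `sumByKey` asks for a semigroup, here is an in-memory Java stand-in. It is only a sketch of the idea: `WordCountSketch` and its `sumByKey` are hypothetical, run on a single machine, and take the semigroup operation as a plain `BinaryOperator`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.stream.Stream;

final class WordCountSketch {
    // Values that share a key are combined with the semigroup operation "plus".
    static <K, V> Map<K, V> sumByKey(Stream<Map.Entry<K, V>> pairs, BinaryOperator<V> plus) {
        Map<K, V> store = new HashMap<>();
        pairs.forEach(e -> store.merge(e.getKey(), e.getValue(), plus));
        return store;
    }

    // Each word becomes a (word, 1) pair; the counts are then summed per word.
    static Map<String, Long> wordCount(List<String> sentences) {
        Stream<Map.Entry<String, Long>> pairs = sentences.stream()
            .flatMap(sentence -> Arrays.stream(sentence.split("\\s+")))
            .map(word -> new SimpleEntry<String, Long>(word, 1L));
        return sumByKey(pairs, Long::sum);
    }
}
```

The job logic only says "combine values per key". How to combine them is supplied separately, which is what lets the same logic run on different platforms.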
  24. Given a set S and a binary operation +, we say that (S, +) is a Semigroup if ∀ a, b, c ∈ S:
      • Closure: a + b ∈ S
      • Associativity: (a + b) + c = a + (b + c)

      A Monoid is a semigroup with an identity element:
      • Identity: ∃ e ∈ S: e + a = a + e = a

      Examples:
      • 3 * 2 (numbers under multiplication, 1 is the identity element)
      • 1 + 5 (numbers under addition, 0 is the identity element)
      • “ab” + “cd” (strings under concatenation, empty string is the identity element)
      • many more
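The definition translates almost directly into code. The sketch below assumes a hypothetical `Monoid` interface with `zero` and `plus`, and provides the three example monoids from the slide; it is not the API of any particular library.

```java
import java.util.List;

// Hypothetical interface: an identity element plus an associative operation.
interface Monoid<T> {
    T zero();
    T plus(T a, T b);

    // Folds a list from the identity element; an empty list yields zero().
    default T sum(List<T> items) {
        T acc = zero();
        for (T item : items) {
            acc = plus(acc, item);
        }
        return acc;
    }
}

final class Monoids {
    // Numbers under addition, 0 is the identity element.
    static final Monoid<Integer> INT_ADDITION = new Monoid<Integer>() {
        public Integer zero() { return 0; }
        public Integer plus(Integer a, Integer b) { return a + b; }
    };

    // Numbers under multiplication, 1 is the identity element.
    static final Monoid<Integer> INT_MULTIPLICATION = new Monoid<Integer>() {
        public Integer zero() { return 1; }
        public Integer plus(Integer a, Integer b) { return a * b; }
    };

    // Strings under concatenation, the empty string is the identity element.
    static final Monoid<String> STRING_CONCAT = new Monoid<String>() {
        public String zero() { return ""; }
        public String plus(String a, String b) { return a + b; }
    };
}
```

The interface cannot enforce associativity or the identity law; those remain obligations of each instance.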
  25. [Diagram: input data → map (×4) → reduce (×3) → output]
      • Given a sequence of elements of a monoid M, we can reduce them to a final value
      • Associativity ensures that we can parallelize the computation (not exactly true)
      • Identity lets us skip elements that don’t affect the result
  26. Associativity: (a + b) + c = a + (b + c)

      General Associativity Theorem: https://proofwiki.org/wiki/General_Associativity_Theorem

      Given a + b + c + d + e + f + g + h, you can place parentheses anywhere:
      ((a + b) + (c + d)) + (e + f + g + h)
      or
      (a + b + c + d) + (e + f + g + h)
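The effect of regrouping can be observed with Java streams. This sketch uses string concatenation, which is associative but not commutative, so it shows that associativity alone is enough as long as the order of elements is kept. The class name `Regrouping` is made up for the example.

```java
import java.util.Arrays;
import java.util.List;

final class Regrouping {
    static final List<String> ITEMS = Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h");

    // Left-to-right reduction: (((a + b) + c) + ...) + h
    static String sequential() {
        return ITEMS.stream().reduce("", String::concat);
    }

    // Reduce each half separately, then combine: (a + b + c + d) + (e + f + g + h)
    static String twoHalves() {
        String left = ITEMS.subList(0, 4).stream().reduce("", String::concat);
        String right = ITEMS.subList(4, 8).stream().reduce("", String::concat);
        return left + right;
    }

    // A parallel stream chooses its own grouping but keeps the encounter order.
    static String parallel() {
        return ITEMS.parallelStream().reduce("", String::concat);
    }
}
```

All three methods produce the same string. `Stream.reduce` documents this requirement itself: the accumulator must be associative and the first argument must be its identity.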
  27. [Diagram: timeline of elements a, b, c, d, e, f, g, h at time 0 to 7 (labels: 1h, now); batch processing covers a + b + c + d + e + f, real-time processing covers the most recent elements]
      • Real-time sums from 0, each batch
      • Batch proc. recomputes total sum
  28. [Diagram: timeline of elements a, b, c, d, e, f, g, h at time 0 to 7 (labels: 1h, now); batch processing covers a + b + c + d + e + f, real-time processing covers g and h]
      Query and sum real-time + batch: (a + b + c + d + e + f) + g + h (this is where a Semigroup is required)
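A small sketch makes the query-time merge concrete. It assumes the events are plain numbers and that the batch view covers a prefix of the event sequence while the real-time view covers the remainder; `BatchPlusRealtime` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

final class BatchPlusRealtime {
    // Eight events standing in for a..h.
    static final List<Long> EVENTS = Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L);

    static long sum(List<Long> values) {
        return values.stream().reduce(0L, Long::sum);
    }

    // The batch view covers events before the boundary, the real-time view the rest.
    static long query(int batchBoundary) {
        long batchView = sum(EVENTS.subList(0, batchBoundary));
        long realtimeView = sum(EVENTS.subList(batchBoundary, EVENTS.size()));
        return batchView + realtimeView; // merge step at query time
    }
}
```

Wherever the boundary between the two views falls, the merged answer equals the sum over all events, because the grouping of an associative operation does not matter.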
  29. A Bloom filter is a space-efficient probabilistic data structure for testing whether an element is present in a set.

      Consists of:
      • k hash functions: h1, h2, …, hk
      • a bit array of m bits: 0 0 0 0 0 0 0 0 0 0 0 0

      Operations:
      • Insert an element
      • Query whether an element is present. The answer is either No or Maybe (false positives are possible)
  30. Set the bit value to 1 at positions h1(x), h2(x), …, hk(x)

      0 0 1 0 0 0 0 1 0 1 0 0
  31. Check if all bits at positions h1(x), h2(x), …, hk(x) are set to 1

      0 0 1 0 1 0 1 1 0 0 0 0
  32. Filter A: {x1, x2, x3}
      0 0 1 0 1 0 0 1 0 0 0 0

      Filter B: {x4, x5, x6}
      1 0 1 0 0 0 0 0 1 0 0 0

      Filter A + B (+ is bitwise OR): {x1, x2, x3, x4, x5, x6}
      1 0 1 0 1 0 0 1 1 0 0 0
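The insert, query and merge operations fit in a short Java sketch. `ToyBloomFilter` is a hypothetical class, and deriving the k hash functions from `hashCode()` is an assumption made for brevity; a real implementation would use stronger, independent hash functions.

```java
import java.util.BitSet;

// Toy Bloom filter with m bits and k hash functions (not production quality).
final class ToyBloomFilter {
    private final int m;
    private final int k;
    private final BitSet bits;

    ToyBloomFilter(int m, int k) { this(m, k, new BitSet(m)); }

    private ToyBloomFilter(int m, int k, BitSet bits) {
        this.m = m;
        this.k = k;
        this.bits = bits;
    }

    // i-th hash function: mixes the element's hashCode with the index i.
    private int index(Object element, int i) {
        int h = element.hashCode() * 31 + i * 0x9E3779B9;
        h ^= (h >>> 16);
        return Math.floorMod(h, m);
    }

    // Insert: set the bit at every hash position.
    void insert(Object element) {
        for (int i = 0; i < k; i++) {
            bits.set(index(element, i));
        }
    }

    // Query: false means "No", true means "Maybe".
    boolean mightContain(Object element) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(index(element, i))) {
                return false;
            }
        }
        return true;
    }

    // The "+" operation: bitwise OR of two filters with the same m and k.
    ToyBloomFilter plus(ToyBloomFilter other) {
        if (m != other.m || k != other.k) {
            throw new IllegalArgumentException("filters must share m and k");
        }
        BitSet merged = (BitSet) bits.clone();
        merged.or(other.bits);
        return new ToyBloomFilter(m, k, merged);
    }
}
```

Bitwise OR is associative and commutative, and an empty filter acts as the identity, so filters built on separate partitions of the data can be merged in any order.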
  33. A few can be found in Algebird (Abstract Algebra for Scala): https://github.com/twitter/algebird/
      • Bloom Filter
      • HyperLogLog
      • CountMinSketch
      • TopK
      • etc.
  34. • Monad is just a useful pattern in functional programming
      • You don’t need to understand Category Theory to use Monads
      • Once you grasp the idea, you will see this pattern everywhere
      • Semigroup (commutative) and monoid define properties useful in distributed computing and Lambda Architecture
      • It’s all about associativity and commutativity. No nonsense!