Monads and Monoids: from daily Java to Big Data analytics in Scala

[Presented at JavaDay 2015]

Finally, after two decades of evolution, Java 8 has taken a step towards functional programming. What can Java learn from other, more mature functional languages? How can we leverage obscure mathematical abstractions such as Monad or Monoid in practice? People usually find them scary and difficult to understand. Oleksiy will explain these concepts in simple words and show that they are a powerful tool applicable in many domains, from daily Java and Scala routines to Big Data analytics with Storm or Hadoop.

Oleksii Diagiliev

October 02, 2015

Transcript

  1. Oleksiy Dyagilev

  2. • Lead software engineer at EPAM
    • Working on scalable computing and data grids (GigaSpaces, Storm, Spark)
    • Blog: http://dyagilev.org

  7. • Abstract Algebra (1900s?) and Category Theory (1940s)
    • Mathematicians study abstract structures and the relationships between them
    • Early 1990s: Eugenio Moggi described the general use of monads to structure programs
    • Early 1990s: monads appeared in Haskell, a purely functional language, along with other concepts such as Functor, Monoid, Arrow, etc.
    • 2003: Martin Odersky creates Scala, a language that unifies the object-oriented and functional paradigms, influenced by Haskell, Java, Erlang, etc.
    • 2014: Java 8 is released, with functional programming support: lambdas, streams

  8. • How abstractions from Math (Category Theory, Abstract Algebra) help in functional programming & Big Data
    • How to leverage them and become a better programmer

  17. Example #1
    User user = findUser(userId);
    if (user != null) {
        Address address = user.getAddress();
        if (address != null) {
            String zipCode = address.getZipCode();
            if (zipCode != null) {
                City city = findCityByZipCode(zipCode);
                if (city != null) {
                    return city.getName();
                }
            }
        }
    }
    return null;

  18. Refactored with Optional
    Optional<String> cityName = findUser(userId)
        .flatMap(user -> user.getAddress())
        .flatMap(address -> address.getZipCode())
        .flatMap(zipCode -> findCityByZipCode(zipCode))
        .map(city -> city.getName());
    Each step is a computation that may not return a result.
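
A minimal runnable sketch of the refactored chain, assuming hypothetical `User`, `Address` and `City` classes whose getters return `Optional` (the slides do not show these types). The in-memory lookup maps and the sample zip code are illustrative only:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical domain classes: every lookup that may fail returns Optional.
class City {
    private final String name;
    City(String name) { this.name = name; }
    String getName() { return name; }
}

class Address {
    private final String zipCode; // may be null
    Address(String zipCode) { this.zipCode = zipCode; }
    Optional<String> getZipCode() { return Optional.ofNullable(zipCode); }
}

class User {
    private final Address address; // may be null
    User(Address address) { this.address = address; }
    Optional<Address> getAddress() { return Optional.ofNullable(address); }
}

class CityLookup {
    // Hypothetical in-memory "databases" standing in for real repositories.
    static final Map<Integer, User> USERS = new HashMap<>();
    static final Map<String, City> CITIES = new HashMap<>();
    static {
        USERS.put(1, new User(new Address("01001")));
        USERS.put(2, new User(null)); // a user without an address
        CITIES.put("01001", new City("Kyiv"));
    }

    static Optional<User> findUser(int userId) {
        return Optional.ofNullable(USERS.get(userId));
    }

    static Optional<City> findCityByZipCode(String zipCode) {
        return Optional.ofNullable(CITIES.get(zipCode));
    }

    // Same shape as the slide: no explicit null checks; an empty value
    // at any step short-circuits the rest of the chain.
    static Optional<String> cityName(int userId) {
        return findUser(userId)
            .flatMap(user -> user.getAddress())
            .flatMap(address -> address.getZipCode())
            .flatMap(zipCode -> findCityByZipCode(zipCode))
            .map(city -> city.getName());
    }

    public static void main(String[] args) {
        System.out.println(cityName(1)); // Optional[Kyiv]
        System.out.println(cityName(2)); // Optional.empty
        System.out.println(cityName(3)); // Optional.empty
    }
}
```

`cityName(2)` stops at the missing address and `cityName(3)` at the missing user; no step after the first empty value is executed.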

  19. Example #2
    Stream<Employee> employees = companies.stream()
        .flatMap(company -> company.departments())
        .flatMap(department -> department.employees());
    Each step is a computation that can return several values.
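
A small runnable sketch of Example #2, assuming hypothetical `Company` and `Department` classes whose accessors return a `Stream`. For brevity, employees are plain strings here rather than the slide's `Employee` type:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical domain classes; the slide only shows the flatMap chain.
class Department {
    private final List<String> employees;
    Department(String... employees) { this.employees = Arrays.asList(employees); }
    Stream<String> employees() { return employees.stream(); }
}

class Company {
    private final List<Department> departments;
    Company(Department... departments) { this.departments = Arrays.asList(departments); }
    Stream<Department> departments() { return departments.stream(); }
}

class EmployeeListing {
    // Each step may yield zero, one or many values;
    // flatMap flattens the nested streams into a single stream.
    static List<String> allEmployees(List<Company> companies) {
        return companies.stream()
            .flatMap(company -> company.departments())
            .flatMap(department -> department.employees())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Company> companies = Arrays.asList(
            new Company(new Department("Ann", "Bob"), new Department()),
            new Company(new Department("Carl")));
        System.out.println(allEmployees(companies)); // [Ann, Bob, Carl]
    }
}
```

The empty department simply contributes nothing, just as an empty `Optional` did in Example #1.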

  22. • A container type M<T> (e.g. Optional<T>)
    • A method M<U> flatMap(T -> M<U>) (e.g. Optional<U> flatMap(T -> Optional<U>))
    • A constructor that puts a T into M; equivalently, a static method M<T> unit(T) (e.g. Optional.of(x))
    Laws:
    1. Left identity: unit(x).flatMap(f) = f(x)
    2. Right identity: m.flatMap(x -> unit(x)) = m
    3. Associativity: m.flatMap(f).flatMap(g) = m.flatMap(x -> f(x).flatMap(g))
    Bonus: now we can define M<U> map(T -> U):
    M<U> map(f) { return flatMap(x -> unit(f(x))); }
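
The three laws can be spot-checked for `java.util.Optional`. This sketch assumes two illustrative monadic functions, `half` and `positive` (hypothetical, not from the talk), and derives `map` from `flatMap` and `unit` as in the bonus line. Checking a handful of samples is a sanity test, not a proof:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

class OptionalMonadLaws {
    // unit: put a plain value into the container
    static <T> Optional<T> unit(T x) { return Optional.of(x); }

    // map derived from flatMap and unit
    static <T, U> Optional<U> map(Optional<T> m, Function<T, U> f) {
        return m.flatMap(x -> unit(f.apply(x)));
    }

    // Illustrative monadic functions: both may "fail" with an empty Optional.
    static Optional<Integer> half(int x) {
        return x % 2 == 0 ? Optional.of(x / 2) : Optional.empty();
    }
    static Optional<Integer> positive(int x) {
        return x > 0 ? Optional.of(x) : Optional.empty();
    }

    static boolean lawsHold(int x, Optional<Integer> m) {
        Function<Integer, Optional<Integer>> f = OptionalMonadLaws::half;
        Function<Integer, Optional<Integer>> g = OptionalMonadLaws::positive;
        boolean leftIdentity = unit(x).flatMap(f).equals(f.apply(x));
        boolean rightIdentity = m.flatMap(OptionalMonadLaws::unit).equals(m);
        boolean associativity = m.flatMap(f).flatMap(g)
            .equals(m.flatMap(y -> f.apply(y).flatMap(g)));
        return leftIdentity && rightIdentity && associativity;
    }

    public static void main(String[] args) {
        List<Optional<Integer>> samples = Arrays.asList(
            Optional.of(4), Optional.of(3), Optional.of(-2), Optional.<Integer>empty());
        for (int x : new int[] {4, 3, -2, 0}) {
            for (Optional<Integer> m : samples) {
                if (!lawsHold(x, m)) throw new AssertionError("law violated for " + x + ", " + m);
            }
        }
        System.out.println(map(Optional.of(20), y -> y + 1)); // Optional[21]
    }
}
```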

  25. Java: looks ugly
    Optional<User> user = findUser(userId);
    Optional<Order> order = findOrder(orderId);
    Optional<Payment> payment = findPayment(orderId);
    Optional<Placement> placement = user
        .flatMap(u -> order
        .flatMap(o -> payment
        .map(p -> submitOrder(u, o, p))));
    Scala: built-in monad support
    val placement = for {
      u <- findUser(userId)
      o <- findOrder(orderId)
      p <- findPayment(orderId)
    } yield submitOrder(u, o, p)
    Language support for monadic code:
    • Scala: for-comprehension
    • Haskell: do-notation
    • F#: computation expressions
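
A runnable version of the Java snippet, with hypothetical `String`-typed lookups (a negative id stands for "not found") in place of the `User`, `Order` and `Payment` types. It shows that `submitOrder` runs only when all three values are present:

```java
import java.util.Optional;

class OrderPlacement {
    // Hypothetical lookups: a negative id means "not found".
    static Optional<String> findUser(int userId) {
        return userId >= 0 ? Optional.of("user-" + userId) : Optional.empty();
    }
    static Optional<String> findOrder(int orderId) {
        return orderId >= 0 ? Optional.of("order-" + orderId) : Optional.empty();
    }
    static Optional<String> findPayment(int orderId) {
        return orderId >= 0 ? Optional.of("payment-" + orderId) : Optional.empty();
    }
    static String submitOrder(String u, String o, String p) {
        return u + "/" + o + "/" + p;
    }

    // Nested flatMap/map, mirroring the Java snippet above:
    // an empty value at any level makes the whole result empty.
    static Optional<String> place(int userId, int orderId) {
        Optional<String> user = findUser(userId);
        Optional<String> order = findOrder(orderId);
        Optional<String> payment = findPayment(orderId);
        return user
            .flatMap(u -> order
            .flatMap(o -> payment
            .map(p -> submitOrder(u, o, p))));
    }

    public static void main(String[] args) {
        System.out.println(place(1, 7));  // Optional[user-1/order-7/payment-7]
        System.out.println(place(-1, 7)); // Optional.empty
    }
}
```

The Scala `for` block expresses the same nesting; the compiler rewrites it into `flatMap` and `map` calls.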

  28. trait Parser[T] extends (String => ParseResult[T])
    sealed abstract class ParseResult[+T]
    case class Success[T](result: T, rest: String) extends ParseResult[T]
    case class Failure() extends ParseResult[Nothing]
    val letter: Parser[Char] = …
    val digit: Parser[Char] = …
    val space: Parser[Char] = …
    def map[U](f: T => U): Parser[U] = parser { in => this(in) map f }
    def flatMap[U](f: T => Parser[U]): Parser[U] = parser { in => this(in) withNext f }
    def * : Parser[List[T]] = …
    val userParser = for {
      firstName <- letter.*
      _ <- space
      lastName <- letter.*
      _ <- space
      phone <- digit.*
    } yield User(firstName, lastName, phone)
    Input: “John Doe 0671112222”
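
Java has no for-comprehension, but the same monadic parser can be sketched with plain functions. In this sketch a parser is a `Function<String, Optional<Parsed<T>>>`, where an empty `Optional` plays the role of `Failure`. The names `Parsed`, `satisfy` and `many` are hypothetical, and `many` (the analogue of `*`) collects the matched characters into a `String`. The chained `flatMap` calls mirror the `for` block line by line:

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Result of a successful parse: the value plus the unconsumed input.
final class Parsed<T> {
    final T value;
    final String rest;
    Parsed(T value, String rest) { this.value = value; this.rest = rest; }
}

// A parser maps input to an optional result (empty = failure).
interface Parser<T> extends Function<String, Optional<Parsed<T>>> {

    // unit: succeed with x without consuming any input
    static <T> Parser<T> unit(T x) {
        return in -> Optional.of(new Parsed<T>(x, in));
    }

    // Consume one character that satisfies the predicate.
    static Parser<Character> satisfy(Predicate<Character> p) {
        return in -> {
            if (!in.isEmpty() && p.test(in.charAt(0))) {
                return Optional.of(new Parsed<Character>(in.charAt(0), in.substring(1)));
            }
            return Optional.empty();
        };
    }

    // Run this parser, then let f choose the next parser based on the value.
    default <U> Parser<U> flatMap(Function<T, Parser<U>> f) {
        return in -> apply(in).flatMap(r -> f.apply(r.value).apply(r.rest));
    }

    default <U> Parser<U> map(Function<T, U> f) {
        return flatMap(x -> unit(f.apply(x)));
    }

    // Zero or more repetitions, concatenated into a String.
    // Assumes this parser consumes input on success (otherwise it would loop).
    default Parser<String> many() {
        return in -> {
            StringBuilder sb = new StringBuilder();
            String rest = in;
            Optional<Parsed<T>> r;
            while ((r = apply(rest)).isPresent()) {
                sb.append(r.get().value);
                rest = r.get().rest;
            }
            return Optional.of(new Parsed<String>(sb.toString(), rest));
        };
    }
}

class UserParsing {
    static final Parser<Character> LETTER = Parser.satisfy(Character::isLetter);
    static final Parser<Character> DIGIT = Parser.satisfy(Character::isDigit);
    static final Parser<Character> SPACE = Parser.satisfy(c -> c == ' ');

    // firstName, space, lastName, space, phone: same sequence as the Scala for block.
    static final Parser<String[]> USER =
        LETTER.many().flatMap(first ->
        SPACE.flatMap(s1 ->
        LETTER.many().flatMap(last ->
        SPACE.flatMap(s2 ->
        DIGIT.many().map(phone -> new String[] {first, last, phone})))));

    public static void main(String[] args) {
        Optional<Parsed<String[]>> r = USER.apply("John Doe 0671112222");
        System.out.println(String.join(" | ", r.get().value)); // John | Doe | 0671112222
    }
}
```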

  29. • scala.Option, java.util.Optional: absence of a value
    • scala.List, java.util.stream.Stream: multiple results
    • scala.concurrent.Future, scalaz.concurrent.Task, java.util.concurrent.CompletableFuture: asynchronous computations
    • scalaz.Reader: read from a shared environment
    • scalaz.Writer: collect data in addition to computed values
    • scalaz.State: maintain state
    • scala.util.Try, scalaz.\/: handling failures
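
`CompletableFuture` follows the same pattern under different names: `completedFuture` plays the role of `unit`, `thenCompose` of `flatMap`, and `thenApply` of `map`. A small sketch with two hypothetical asynchronous lookups (`findUserName`, `nameLength`):

```java
import java.util.concurrent.CompletableFuture;

class AsyncMonad {
    // unit: wrap an already-known value
    static <T> CompletableFuture<T> unit(T x) {
        return CompletableFuture.completedFuture(x);
    }

    // Hypothetical asynchronous lookups (run on the common ForkJoinPool).
    static CompletableFuture<String> findUserName(int userId) {
        return CompletableFuture.supplyAsync(() -> "user-" + userId);
    }
    static CompletableFuture<Integer> nameLength(String name) {
        return CompletableFuture.supplyAsync(() -> name.length());
    }

    // thenCompose chains dependent async steps (flatMap);
    // thenApply transforms the eventual result (map).
    static CompletableFuture<String> report(int userId) {
        return findUserName(userId)
            .thenCompose(name -> nameLength(name))
            .thenApply(len -> "length: " + len);
    }

    // The same map step expressed as flatMap + unit.
    static CompletableFuture<String> reportViaUnit(int userId) {
        return findUserName(userId)
            .thenCompose(name -> nameLength(name))
            .thenCompose(len -> unit("length: " + len));
    }

    public static void main(String[] args) {
        System.out.println(report(42).join());        // length: 7
        System.out.println(reportViaUnit(42).join()); // length: 7
    }
}
```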

  30. • Remove boilerplate
    • Modularity: separate computations from the combination strategy
    • Composability: compose complex computations from simple ones
    • Improve maintainability
    • Better readability
    • Shared vocabulary

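  32. [Figure: Lambda Architecture. New data enters as a data stream; batch processing over all data produces a batch view, real-time processing produces a real-time view, and the serving layer answers queries by merging both views]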

  35. • Write job logic once and run it on many platforms (Hadoop, Storm)
    • Library authors talk about monoids all the time
    def wordCount[P <: Platform[P]]
        (source: Producer[P, String], store: P#Store[String, Long]) =
      source.flatMap { sentence =>
        toWords(sentence).map(_ -> 1L)
      }.sumByKey(store)
    def sumByKey(store: P#Store[K, V])(implicit semigroup: Semigroup[V]): Summer[P, K, V] = …
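
The `sumByKey` signature above (it appears to come from Twitter's Summingbird) only asks for a `Semigroup[V]`: a way to combine two values for the same key. A plain-Java, single-machine sketch of the same idea, in which the semigroup is passed as a `BinaryOperator<V>`; `sumByKey` and `wordCount` here are hypothetical helpers, not Summingbird's API:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

class WordCount {
    // Combine all values sharing a key with an associative operation.
    static <K, V> Map<K, V> sumByKey(List<Map.Entry<K, V>> pairs, BinaryOperator<V> semigroup) {
        Map<K, V> store = new HashMap<>();
        for (Map.Entry<K, V> kv : pairs) {
            // merge: insert if the key is new, otherwise combine old and new values
            store.merge(kv.getKey(), kv.getValue(), semigroup);
        }
        return store;
    }

    static Map<String, Long> wordCount(List<String> sentences) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String sentence : sentences) {
            for (String word : sentence.split("\\s+")) {
                if (!word.isEmpty()) pairs.add(new AbstractMap.SimpleEntry<>(word, 1L)); // word -> 1L
            }
        }
        return sumByKey(pairs, Long::sum); // (Long, +) is the semigroup
    }

    public static void main(String[] args) {
        Map<String, Long> counts = wordCount(Arrays.asList("to be or", "not to be"));
        System.out.println(counts.get("to") + " " + counts.get("or")); // 2 1
    }
}
```

Because the combine step is associative, partial counts computed on different machines or time windows can be merged in any grouping.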

  36. Given a set $S$ and a binary operation $+$, we say that $(S, +)$ is a semigroup if $\forall a, b, c \in S$:
    • Closure: $a + b \in S$
    • Associativity: $(a + b) + c = a + (b + c)$
    A monoid is a semigroup with an identity element:
    • Identity: $\exists e \in S : e + a = a + e = a$
    Examples:
    • 3 * 2 (numbers under multiplication, 1 is the identity element)
    • 1 + 5 (numbers under addition, 0 is the identity element)
    • “ab” + “cd” (strings under concatenation, the empty string is the identity element)
    • many more
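
A minimal Java encoding of the definition, assuming a hypothetical `Monoid<T>` interface (not a JDK type) with an identity element and an associative `combine`. The three instances correspond to the examples on the slide:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical Monoid abstraction: an associative combine plus an identity.
interface Monoid<T> {
    T identity();
    T combine(T a, T b);

    static <T> Monoid<T> of(T zero, BinaryOperator<T> op) {
        return new Monoid<T>() {
            public T identity() { return zero; }
            public T combine(T a, T b) { return op.apply(a, b); }
        };
    }

    // Reduce a sequence; an empty sequence yields the identity.
    default T fold(List<T> xs) {
        T acc = identity();
        for (T x : xs) acc = combine(acc, x);
        return acc;
    }
}

class Monoids {
    static final Monoid<Integer> SUM = Monoid.of(0, Integer::sum);
    static final Monoid<Integer> PRODUCT = Monoid.of(1, (a, b) -> a * b);
    static final Monoid<String> CONCAT = Monoid.of("", String::concat);

    public static void main(String[] args) {
        System.out.println(SUM.fold(Arrays.asList(1, 5)));          // 6
        System.out.println(PRODUCT.fold(Arrays.asList(3, 2)));      // 6
        System.out.println(CONCAT.fold(Arrays.asList("ab", "cd"))); // abcd
    }
}
```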

  37. [Figure: input data → map, map, map, map → reduce, reduce, reduce → output]
    Given a sequence of elements of a monoid M, we can reduce them into a final value.
    Associativity ensures that we can parallelize the computation (not exactly true).
    Identity allows us to skip elements that don’t affect the result.
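
Java's own `Stream.reduce(identity, accumulator)` relies on exactly these properties: its documentation requires the accumulator to be associative and the identity value to be a true identity, which is what makes parallel reduction safe. A small check with the `(long, +, 0)` and `(String, concat, "")` monoids; the helper names are illustrative:

```java
import java.util.stream.IntStream;
import java.util.stream.LongStream;
import java.util.stream.Stream;

class ParallelReduce {
    // (long, +, 0) is a monoid: each parallel chunk starts from the identity 0,
    // and associativity lets the partial sums be combined in any grouping.
    static long sum(long n, boolean parallel) {
        LongStream s = LongStream.rangeClosed(1, n);
        return (parallel ? s.parallel() : s).reduce(0L, Long::sum);
    }

    // (String, concat, "") is associative but not commutative; an ordered
    // parallel stream still produces the same string as a sequential one.
    static String concat(int n, boolean parallel) {
        Stream<String> s = IntStream.rangeClosed(1, n).mapToObj(i -> Integer.toString(i));
        return (parallel ? s.parallel() : s).reduce("", String::concat);
    }

    public static void main(String[] args) {
        System.out.println(sum(1_000_000, false) == sum(1_000_000, true)); // true
        System.out.println(concat(20, false).equals(concat(20, true)));   // true
    }
}
```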

  38. Associativity: $(a + b) + c = a + (b + c)$
    General Associativity Theorem
    https://proofwiki.org/wiki/General_Associativity_Theorem
    Given $a + b + c + d + e + f + g + h$, you can place parentheses anywhere:
    $((a + b) + (c + d)) + (e + f + g + h)$
    or
    $(a + b + c + d) + (e + f + g + h)$
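
The theorem can be seen in code: a left-to-right fold and a balanced tree fold (the shape a distributed reduce naturally takes) agree for an associative operation such as string concatenation, and disagree for subtraction, which is not associative. Both helpers are illustrative and assume a non-empty list, since a semigroup has no identity to return for an empty one:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

class Parenthesization {
    // Left fold: ((((a + b) + c) + d) + ...); requires a non-empty list.
    static <T> T foldLeft(List<T> xs, BinaryOperator<T> op) {
        T acc = xs.get(0);
        for (int i = 1; i < xs.size(); i++) acc = op.apply(acc, xs.get(i));
        return acc;
    }

    // Balanced tree: (a + b + c + d) + (e + f + g + h), recursively.
    // The two halves are independent and could be computed on different machines.
    static <T> T foldTree(List<T> xs, BinaryOperator<T> op) {
        if (xs.size() == 1) return xs.get(0);
        int mid = xs.size() / 2;
        return op.apply(foldTree(xs.subList(0, mid), op),
                        foldTree(xs.subList(mid, xs.size()), op));
    }

    public static void main(String[] args) {
        List<String> xs = Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h");
        System.out.println(foldLeft(xs, String::concat)); // abcdefgh
        System.out.println(foldTree(xs, String::concat)); // abcdefgh

        // Subtraction is not associative, so the grouping changes the answer.
        List<Integer> ns = Arrays.asList(8, 4, 2, 1);
        System.out.println(foldLeft(ns, (a, b) -> a - b)); // 1  = ((8 - 4) - 2) - 1
        System.out.println(foldTree(ns, (a, b) -> a - b)); // 3  = (8 - 4) - (2 - 1)
    }
}
```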

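  41. [Figure: events a, b, c, d, e, f, g, h on a time axis (slots 0 to 7, from 1h ago to now), split between batch processing and real-time processing]
    Batch view: a + b + c + d + e + f
    Real-time processing sums from 0 after each batch; batch processing recomputes the total sum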

  42. [Figure: the same timeline of events a to h, with the batch view a + b + c + d + e + f and the real-time view g + h]
    Query: sum real-time + batch
    $(a + b + c + d + e + f) + g + h$
    (this is where a Semigroup is required)
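
A sketch of the serving-layer query, assuming hypothetical per-slot counts for events a to h: the batch view covers the first six slots, the real-time view the remaining two, and any semigroup (here `Long::sum` and `Long::max`) merges them into the same answer as reducing all events at once. Both views must be non-empty because a semigroup has no identity element; with a monoid, an empty real-time view would simply be the identity:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

class LambdaArchitectureQuery {
    // Reduce a non-empty list with a semigroup operation (no identity needed).
    static <T> T reduce(List<T> events, BinaryOperator<T> semigroup) {
        T acc = events.get(0);
        for (int i = 1; i < events.size(); i++) acc = semigroup.apply(acc, events.get(i));
        return acc;
    }

    // [0, batchBoundary) is covered by the batch view, the rest by the
    // real-time view; both parts must be non-empty.
    static <T> T query(List<T> events, int batchBoundary, BinaryOperator<T> semigroup) {
        T batchView = reduce(events.subList(0, batchBoundary), semigroup);                // a + ... + f
        T realtimeView = reduce(events.subList(batchBoundary, events.size()), semigroup); // g + h
        return semigroup.apply(batchView, realtimeView); // (a + ... + f) + (g + h)
    }

    public static void main(String[] args) {
        // Hypothetical per-slot counts for events a..h
        List<Long> counts = Arrays.asList(3L, 1L, 4L, 1L, 5L, 9L, 2L, 6L);
        System.out.println(query(counts, 6, Long::sum)); // 31
        System.out.println(query(counts, 6, Long::max)); // 9
    }
}
```

By associativity, `(a + ... + f) + (g + h)` equals the slide's `(a + ... + f) + g + h` and equals reducing all eight events in one pass.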

  44. A Bloom filter is a space-efficient probabilistic data structure for testing whether an element is present in a set
    0 0 0 0 0 0 0 0 0 0 0 0 (empty bit array)
    Operations:
    • Insert an element
    • Query whether an element is present. The answer is either No or Maybe (false positives are possible)
    Consists of:
    • $k$ hash functions: $h_1, h_2, \ldots, h_k$
    • a bit array of $m$ bits

  45. 0 0 1 0 0 0 0 1 0 1 0 0
    Insert: compute $h_1(x), h_2(x), \ldots, h_k(x)$ and set each of those bits to 1

  46. 0 0 1 0 1 0 1 1 0 0 0 0
    Query: compute $h_1(x), h_2(x), \ldots, h_k(x)$ and check whether all of those bits are set to 1

  47. 0 0 1 0 1 0 0 1 0 0 0 0   Filter A: $\{x_1, x_2, x_3\}$
    1 0 1 0 0 0 0 0 1 0 0 0   Filter B: $\{x_4, x_5, x_6\}$
    + (bitwise OR)
    1 0 1 0 1 0 0 1 1 0 0 0   Filter A + B: $\{x_1, x_2, x_3, x_4, x_5, x_6\}$
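
A toy Bloom filter in Java, assuming `java.util.BitSet` for the bit array and k hash functions simulated by double hashing on `String.hashCode()`. This is a common trick but a simplification; real implementations use stronger hash functions. `plus` is the bitwise OR shown above: it is associative and commutative, and the empty filter is its identity, so filters with the same m and k form a monoid:

```java
import java.util.BitSet;

// Toy Bloom filter (not production quality): m bits, k simulated hash functions.
class BloomFilter {
    final int m, k;
    final BitSet bits;

    BloomFilter(int m, int k) { this(m, k, new BitSet(m)); }
    private BloomFilter(int m, int k, BitSet bits) { this.m = m; this.k = k; this.bits = bits; }

    // i-th hash via double hashing: h1 + i * h2 (mod m)
    private int hash(String x, int i) {
        int h1 = x.hashCode();
        int h2 = (h1 >>> 16) | 1; // odd second hash derived from h1
        return Math.floorMod(h1 + i * h2, m);
    }

    void insert(String x) {
        for (int i = 0; i < k; i++) bits.set(hash(x, i)); // set bit value to 1
    }

    // false = definitely absent; true = maybe present (false positives possible)
    boolean mightContain(String x) {
        for (int i = 0; i < k; i++) if (!bits.get(hash(x, i))) return false;
        return true;
    }

    // Monoid operation: bitwise OR of two filters with the same m and k.
    BloomFilter plus(BloomFilter other) {
        if (m != other.m || k != other.k) throw new IllegalArgumentException("incompatible filters");
        BitSet union = (BitSet) bits.clone();
        union.or(other.bits);
        return new BloomFilter(m, k, union);
    }

    public static void main(String[] args) {
        BloomFilter a = new BloomFilter(1024, 3);
        BloomFilter b = new BloomFilter(1024, 3);
        a.insert("x1"); a.insert("x2"); a.insert("x3");
        b.insert("x4"); b.insert("x5"); b.insert("x6");
        BloomFilter ab = a.plus(b);
        System.out.println(ab.mightContain("x1") && ab.mightContain("x6")); // true
    }
}
```

Merging two filters gives exactly the bits you would get by inserting all elements into one filter, which is why filters built on different machines or time windows can be combined later.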

  48. A few such monoids can be found in Algebird (Abstract Algebra for Scala): https://github.com/twitter/algebird/
    • Bloom Filter
    • HyperLogLog
    • CountMinSketch
    • TopK
    • etc.

  49. • A monad is just a useful pattern in functional programming
    • You don’t need to understand Category Theory to use monads
    • Once you grasp the idea, you will see this pattern everywhere
    • Semigroups (commutative ones in particular) and monoids define properties useful in distributed computing and the Lambda Architecture
    • It’s all about associativity and commutativity. No nonsense!
