Slide 1

Slide 1 text

Oleksiy Dyagilev

Slide 2

Slide 2 text

• Lead software engineer at EPAM
• Working on scalable computing and data grids (GigaSpaces, Storm, Spark)
• Blog: http://dyagilev.org

Slide 3

Slide 3 text

• Abstract Algebra (1900s?) and Category Theory (1940s)
• Mathematicians study abstract structures and the relationships between them

Slide 4

Slide 4 text

• Early 1990s: Eugenio Moggi described the general use of monads to structure programs

Slide 5

Slide 5 text

• Early 1990s: monads appeared in Haskell, a purely functional language, as did other concepts such as Functor, Monoid, Arrow, etc.

Slide 6

Slide 6 text

• 2003: Martin Odersky creates Scala, a language that unifies the object-oriented and functional paradigms. Influenced by Haskell, Java, Erlang, etc.

Slide 7

Slide 7 text

• 2014: Java 8 released, with functional programming support (lambdas, streams)

Slide 8

Slide 8 text

• How abstractions from math (Category Theory, Abstract Algebra) help in functional programming and Big Data
• How to leverage them and become a better programmer

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

Example #1

User user = findUser(userId);
if (user != null) {
    Address address = user.getAddress();
    if (address != null) {
        String zipCode = address.getZipCode();
        if (zipCode != null) {
            City city = findCityByZipCode(zipCode);
            if (city != null) {
                return city.getName();
            }
        }
    }
}
return null;

Slide 18

Slide 18 text

Refactored with Optional: each step may not return a result.

Optional cityName = findUser(userId)
    .flatMap(user -> user.getAddress())
    .flatMap(address -> address.getZipCode())
    .flatMap(zipCode -> findCityByZipCode(zipCode))
    .map(city -> city.getName());
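The chain above can be tried out with a small self-contained sketch. The `User`, `Address`, and `City` records, the in-memory lookup tables, and the sample data are all assumptions made up for illustration; the slides do not show these types.

```java
import java.util.Map;
import java.util.Optional;

class OptionalChain {
    // Hypothetical domain types: every accessor returns Optional instead of null.
    record City(String name) {}
    record Address(Optional<String> zipCode) {}
    record User(Optional<Address> address) {}

    // Hypothetical in-memory data standing in for real lookups.
    static final Map<Integer, User> USERS = Map.of(
        1, new User(Optional.of(new Address(Optional.of("01001")))),
        2, new User(Optional.empty()));
    static final Map<String, City> CITIES = Map.of("01001", new City("Kyiv"));

    static Optional<User> findUser(int userId) {
        return Optional.ofNullable(USERS.get(userId));
    }

    static Optional<City> findCityByZipCode(String zipCode) {
        return Optional.ofNullable(CITIES.get(zipCode));
    }

    // Same shape as the slide: flatMap for steps that may fail, map for the last plain step.
    static Optional<String> cityName(int userId) {
        return findUser(userId)
            .flatMap(User::address)
            .flatMap(Address::zipCode)
            .flatMap(OptionalChain::findCityByZipCode)
            .map(City::name);
    }

    public static void main(String[] args) {
        System.out.println(cityName(1)); // user with a full address
        System.out.println(cityName(2)); // user without an address
        System.out.println(cityName(3)); // unknown user
    }
}
```

Any missing link in the chain short-circuits to an empty Optional, so no null checks are needed.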

Slide 19

Slide 19 text

Example #2: each step can return several values.

Stream employees = companies.stream()
    .flatMap(company -> company.departments())
    .flatMap(department -> department.employees());
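A runnable version of the same idea might look as follows. The `Company`, `Department`, and `Employee` records and the sample data are hypothetical; only the `flatMap` chain mirrors the slide.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamFlatMap {
    // Hypothetical domain types for illustration.
    record Employee(String name) {}
    record Department(List<Employee> staff) {
        Stream<Employee> employees() { return staff.stream(); }
    }
    record Company(List<Department> units) {
        Stream<Department> departments() { return units.stream(); }
    }

    // Each flatMap step may yield zero, one, or many values per input element.
    static List<String> employeeNames(List<Company> companies) {
        return companies.stream()
            .flatMap(Company::departments)
            .flatMap(Department::employees)
            .map(Employee::name)
            .collect(Collectors.toList());
    }

    static List<Company> sample() {
        Department dev = new Department(List.of(new Employee("Ann"), new Employee("Bob")));
        Department empty = new Department(List.of());
        Department ops = new Department(List.of(new Employee("Cid")));
        return List.of(new Company(List.of(dev, empty)), new Company(List.of(ops)));
    }

    public static void main(String[] args) {
        System.out.println(employeeNames(sample()));
    }
}
```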

Slide 20

Slide 20 text

• container with a type M<T> (e.g. Optional<T>)
• method M<U> flatMap(T -> M<U>) (e.g. flatMap(T -> Optional<U>))
• constructor to put T into M<T>; same as a static method M<T> unit(T) (e.g. Optional.of(x))

Slide 21

Slide 21 text

Bonus: now we can define M<U> map(T -> U)

M<U> map(f) { return flatMap(x -> unit(f(x))) }

Slide 22

Slide 22 text

1. Left identity: unit(x).flatMap(f) = f(x)
2. Right identity: m.flatMap(x -> unit(x)) = m
3. Associativity: m.flatMap(f).flatMap(g) = m.flatMap(x -> f(x).flatMap(g))
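The three laws can be spot-checked for `Optional`, taking `Optional.of` as `unit`. This is a sketch on sample values, not a proof; the functions `half` and `show` are made up for the check.

```java
import java.util.Optional;
import java.util.function.Function;

class MonadLaws {
    // Two sample functions of shape T -> Optional<U> (hypothetical, for the check only).
    static final Function<Integer, Optional<Integer>> half =
        n -> n % 2 == 0 ? Optional.of(n / 2) : Optional.empty();
    static final Function<Integer, Optional<String>> show =
        n -> n >= 0 ? Optional.of("#" + n) : Optional.empty();

    // 1. unit(x).flatMap(f) = f(x)
    static boolean leftIdentity(int x) {
        return Optional.of(x).flatMap(half).equals(half.apply(x));
    }

    // 2. m.flatMap(x -> unit(x)) = m
    static boolean rightIdentity(Optional<Integer> m) {
        Optional<Integer> result = m.flatMap(x -> Optional.of(x));
        return result.equals(m);
    }

    // 3. m.flatMap(f).flatMap(g) = m.flatMap(x -> f(x).flatMap(g))
    static boolean associativity(Optional<Integer> m) {
        Optional<String> left = m.flatMap(half).flatMap(show);
        Optional<String> right = m.flatMap(x -> half.apply(x).flatMap(show));
        return left.equals(right);
    }

    // map derived from flatMap and unit. Unlike Optional.map, this version
    // throws if f returns null, because Optional.of rejects null.
    static <T, U> Optional<U> map(Optional<T> m, Function<T, U> f) {
        return m.flatMap(x -> Optional.of(f.apply(x)));
    }

    public static void main(String[] args) {
        System.out.println(leftIdentity(4));
        System.out.println(rightIdentity(Optional.of(7)));
        System.out.println(associativity(Optional.of(8)));
    }
}
```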

Slide 23

Slide 23 text

Java: looks ugly

Optional user = findUser(userId);
Optional order = findOrder(orderId);
Optional payment = findPayment(orderId);
Optional placement = user
    .flatMap(u -> (order.flatMap(o -> (payment.map(p -> submitOrder(u, o, p))))));

Slide 24

Slide 24 text

Built-in monad support in other languages:
• Scala: for-comprehensions
• Haskell: do-notation
• F#: computation expressions

Slide 25

Slide 25 text

Scala: built-in monad support

val placement = for {
  u <- findUser(userId)
  o <- findOrder(orderId)
  p <- findPayment(orderId)
} yield submitOrder(u, o, p)
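Java has no comparable syntax, but the nesting can at least be hidden behind a helper. The sketch below is one possible workaround, not something from the slides: `map3` and `Function3` are hypothetical names introduced here.

```java
import java.util.Optional;

class OptionalCombine {
    // Hypothetical three-argument function type; the JDK only ships Function and BiFunction.
    @FunctionalInterface
    interface Function3<A, B, C, R> {
        R apply(A a, B b, C c);
    }

    // Combines three optional values; the result is empty if any input is empty.
    // Internally this is exactly the nested flatMap/flatMap/map shape.
    static <A, B, C, R> Optional<R> map3(Optional<A> a, Optional<B> b, Optional<C> c,
                                         Function3<A, B, C, R> f) {
        return a.flatMap(x -> b.flatMap(y -> c.map(z -> f.apply(x, y, z))));
    }

    public static void main(String[] args) {
        System.out.println(map3(Optional.of(1), Optional.of(2), Optional.of(3),
                                (x, y, z) -> x + y + z));
        System.out.println(map3(Optional.of(1), Optional.<Integer>empty(), Optional.of(3),
                                (x, y, z) -> x + y + z));
    }
}
```

The call site then reads close to the for-comprehension, at the cost of one helper per arity.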

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

trait Parser[T] extends (String => ParseResult[T])

sealed abstract class ParseResult[T]
case class Success[T](result: T, rest: String) extends ParseResult[T]
case class Failure() extends ParseResult[Nothing]

val letter: Parser[Char] = …
val digit: Parser[Char] = …
val space: Parser[Char] = …

def map[U](f: T => U): Parser[U] = parser { in => this(in) map f }
def flatMap[U](f: T => Parser[U]): Parser[U] = parser { in => this(in) withNext f }
def * : Parser[List[T]] = …

Slide 28

Slide 28 text

val userParser = for {
  firstName <- letter.*
  _         <- space
  lastName  <- letter.*
  _         <- space
  phone     <- digit.*
} yield User(firstName, lastName, phone)

Input: “John Doe 0671112222”
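For readers following along in Java, here is a rough analogue of the parser monad. It is a sketch under several assumptions: failure is modelled as an empty `Optional` rather than a `Failure` case class, `many` collects characters into a `String`, and the names `satisfy`, `many`, and `Result` are invented for this example.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

class MiniParser {
    // A successful parse: the value plus the unconsumed rest of the input.
    record Result<T>(T value, String rest) {}
    record User(String firstName, String lastName, String phone) {}

    interface Parser<T> {
        Optional<Result<T>> parse(String in); // empty Optional means failure

        default <U> Parser<U> map(Function<T, U> f) {
            return in -> parse(in).map(r -> new Result<>(f.apply(r.value()), r.rest()));
        }

        // Run this parser, then feed its value to f and run the parser f returns on the rest.
        default <U> Parser<U> flatMap(Function<T, Parser<U>> f) {
            return in -> parse(in).flatMap(r -> f.apply(r.value()).parse(r.rest()));
        }
    }

    // Parses one character that satisfies the predicate.
    static Parser<Character> satisfy(Predicate<Character> p) {
        return in -> {
            if (in.isEmpty() || !p.test(in.charAt(0))) return Optional.empty();
            return Optional.of(new Result<>(in.charAt(0), in.substring(1)));
        };
    }

    // Zero or more repetitions (the slide's `*`), collected into a String.
    static Parser<String> many(Parser<Character> p) {
        return in -> {
            StringBuilder sb = new StringBuilder();
            String rest = in;
            Optional<Result<Character>> r = p.parse(rest);
            while (r.isPresent()) {
                sb.append(r.get().value());
                rest = r.get().rest();
                r = p.parse(rest);
            }
            return Optional.of(new Result<>(sb.toString(), rest));
        };
    }

    static final Parser<Character> letter = satisfy(c -> Character.isLetter(c));
    static final Parser<Character> digit = satisfy(c -> Character.isDigit(c));
    static final Parser<Character> space = satisfy(c -> c == ' ');

    // The for-comprehension written out as nested flatMap calls ending in map.
    static final Parser<User> userParser =
        many(letter).flatMap(first ->
        space.flatMap(s1 ->
        many(letter).flatMap(last ->
        space.flatMap(s2 ->
        many(digit).map(phone -> new User(first, last, phone))))));

    public static void main(String[] args) {
        System.out.println(userParser.parse("John Doe 0671112222"));
    }
}
```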

Slide 29

Slide 29 text

Monad                                                Purpose
scala.Option, java.Optional                          Absence of value
scala.List, java.Stream                              Multiple results
scala.Future, scalaz.Task, java.CompletableFuture    Asynchronous computations
scalaz.Reader                                        Read from shared environment
scalaz.Writer                                        Collect data in addition to computed values
scalaz.State                                         Maintain state
scala.Try, scalaz.\/                                 Handling failures
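As one example from the list, `CompletableFuture` follows the same pattern under different method names: `thenCompose` plays the role of `flatMap` and `thenApply` the role of `map`. The lookup functions below are hypothetical stand-ins for real asynchronous calls.

```java
import java.util.concurrent.CompletableFuture;

class AsyncMonad {
    // Hypothetical asynchronous lookups; real code would call a service or database.
    static CompletableFuture<Integer> findUserId(String name) {
        return CompletableFuture.supplyAsync(() -> name.length());
    }

    static CompletableFuture<String> findCity(int userId) {
        return CompletableFuture.supplyAsync(() -> "city-" + userId);
    }

    static CompletableFuture<String> cityOf(String name) {
        return findUserId(name)
            .thenCompose(AsyncMonad::findCity) // flatMap: the next step is itself asynchronous
            .thenApply(String::toUpperCase);   // map: a plain function on the result
    }

    public static void main(String[] args) {
        System.out.println(cityOf("john").join());
    }
}
```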

Slide 30

Slide 30 text

• Remove boilerplate
• Modularity: separate computations from combination strategy
• Composability: compose computations from simple ones
• Improve maintainability
• Better readability
• Vocabulary

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

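[Diagram: Lambda Architecture. New data arrives as a data stream and feeds two paths: batch processing over all data, which produces a batch view, and real-time processing, which produces a real-time view. The serving layer queries and merges both views.]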

Slide 33

Slide 33 text

• Write job logic once and run on many platforms (Hadoop, Storm)
• Library authors talk about monoids all the time

Slide 34

Slide 34 text

def wordCount[P <: Platform[P]]
  (source: Producer[P, String], store: P#Store[String, Long]) =
    source.flatMap { sentence =>
      toWords(sentence).map(_ -> 1L)
    }.sumByKey(store)

Slide 35

Slide 35 text

def sumByKey(store: P#Store[K, V])(implicit semigroup: Semigroup[V]): Summer[P, K, V] = …

Slide 36

Slide 36 text

Given a set S and a binary operation +, we say that (S, +) is a Semigroup if ∀ a, b, c ∈ S:
• Closure: a + b ∈ S
• Associativity: (a + b) + c = a + (b + c)

A Monoid is a semigroup with an identity element:
• Identity: ∃ e ∈ S: e + a = a + e = a (for every a ∈ S)

Examples:
• 3 * 2 (numbers under multiplication, 1 is the identity element)
• 1 + 5 (numbers under addition, 0 is the identity element)
• “ab” + “cd” (strings under concatenation, the empty string is the identity element)
• many more
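The definitions translate almost directly into code. The following is a minimal sketch; the `Semigroup` and `Monoid` interfaces and the `monoid` and `fold` helpers are names chosen for this example and are not taken from any library.

```java
import java.util.List;
import java.util.function.BinaryOperator;

class MonoidDemo {
    // Hypothetical interfaces mirroring the definitions above.
    interface Semigroup<T> {
        T combine(T a, T b); // must be associative
    }

    interface Monoid<T> extends Semigroup<T> {
        T identity(); // combine(identity(), a) = combine(a, identity()) = a
    }

    static <T> Monoid<T> monoid(T identity, BinaryOperator<T> op) {
        return new Monoid<>() {
            public T identity() { return identity; }
            public T combine(T a, T b) { return op.apply(a, b); }
        };
    }

    // The three examples from the slide.
    static final Monoid<Integer> SUM = monoid(0, Integer::sum);
    static final Monoid<Integer> PRODUCT = monoid(1, (a, b) -> a * b);
    static final Monoid<String> CONCAT = monoid("", String::concat);

    // Reduce any list to a single value; an empty list yields the identity.
    static <T> T fold(Monoid<T> m, List<T> xs) {
        T acc = m.identity();
        for (T x : xs) acc = m.combine(acc, x);
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(fold(SUM, List.of(1, 5)));
        System.out.println(fold(PRODUCT, List.of(3, 2)));
        System.out.println(fold(CONCAT, List.of("ab", "cd")));
    }
}
```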

Slide 37

Slide 37 text

[Diagram: input data flows through four map steps, then three reduce steps, to the output]

Given a sequence of elements of a monoid M, we can reduce them to a final value.
Associativity ensures that we can parallelize the computation (not exactly true).
Identity lets us skip elements that don’t affect the result.
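Java's own `Stream.reduce(identity, op)` relies on exactly these two properties. The sketch below uses string concatenation, which is associative but not commutative, so the parallel and sequential results agree only because the stream is ordered and the operation is associative. The sample data is made up.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelReduce {
    // Sequential reduction: ((("" + x0) + x1) + x2) ...
    static String concatSequential(List<String> xs) {
        return xs.stream().reduce("", String::concat);
    }

    // Parallel reduction: the runtime splits the list into chunks, reduces each
    // chunk starting from the identity "", and combines the partial results.
    static String concatParallel(List<String> xs) {
        return xs.parallelStream().reduce("", String::concat);
    }

    // Hypothetical sample input: "0", "1", ..., "n-1".
    static List<String> sample(int n) {
        return IntStream.range(0, n)
            .mapToObj(i -> String.valueOf(i))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> xs = sample(20);
        System.out.println(concatSequential(xs).equals(concatParallel(xs)));
    }
}
```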

Slide 38

Slide 38 text

Associativity: (a + b) + c = a + (b + c)

General Associativity Theorem: https://proofwiki.org/wiki/General_Associativity_Theorem

Given a + b + c + d + e + f + g + h, you can place parentheses anywhere:
((a + b) + (c + d)) + (e + f + g + h)
or
(a + b + c + d) + (e + f + g + h)
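A small experiment makes the theorem concrete: split a list at any point, fold each half, and combine the halves. For an associative operation the split point does not matter; for a non-associative one such as subtraction it does. The helper names `foldLeft` and `splitAt` are invented for this sketch.

```java
import java.util.List;
import java.util.function.BinaryOperator;

class Regrouping {
    // Left-to-right fold of a non-empty list: ((x0 + x1) + x2) + ...
    static <T> T foldLeft(BinaryOperator<T> op, List<T> xs) {
        T acc = xs.get(0);
        for (T x : xs.subList(1, xs.size())) acc = op.apply(acc, x);
        return acc;
    }

    // Places one pair of parentheses: (x0 + ... + x(k-1)) + (xk + ... + xn),
    // for 0 < k < xs.size().
    static <T> T splitAt(BinaryOperator<T> op, List<T> xs, int k) {
        T left = foldLeft(op, xs.subList(0, k));
        T right = foldLeft(op, xs.subList(k, xs.size()));
        return op.apply(left, right);
    }

    public static void main(String[] args) {
        List<String> letters = List.of("a", "b", "c", "d", "e", "f", "g", "h");
        // Associative: every split gives the same string.
        System.out.println(splitAt(String::concat, letters, 4));
        // Not associative: grouping changes the answer.
        List<Integer> nums = List.of(8, 4, 2, 1);
        System.out.println(foldLeft((x, y) -> x - y, nums));
        System.out.println(splitAt((x, y) -> x - y, nums, 2));
    }
}
```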

Slide 39

Slide 39 text

[Diagram: the elements a, b, c, d, e, f, g, h combined by seven + operations]

Slide 40

Slide 40 text

[Diagram: the elements a, b, c, d, e, f, g, h combined by seven + operations]

Slide 41

Slide 41 text

[Timeline: elements a, b, c, d, e, f, g, h arrive at times 0 to 7; the time axis is marked "1h" and "now"]

Batch processing: a + b + c + d + e + f
Real-time processing: sums from 0 for each batch
Batch processing recomputes the total sum

Slide 42

Slide 42 text

[Timeline: elements a, b, c, d, e, f, g, h arrive at times 0 to 7; the time axis is marked "1h" and "now"]

Batch processing: a + b + c + d + e + f
Query and sum real-time + batch: (a + b + c + d + e + f) + g + h (this is where a Semigroup is required)
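At query time the serving layer only needs the semigroup operation to merge the two partial results. A minimal sketch for per-key counts, assuming both views are plain maps (the `merge` helper and the sample counts are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

class LambdaMerge {
    // Merges a batch view with a real-time view key by key.
    // Long::sum is the semigroup operation on the values.
    static Map<String, Long> merge(Map<String, Long> batchView, Map<String, Long> realTimeView) {
        Map<String, Long> result = new HashMap<>(batchView);
        realTimeView.forEach((key, value) -> result.merge(key, value, Long::sum));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> batch = Map.of("monad", 6L, "monoid", 1L);   // covers a..f
        Map<String, Long> realTime = Map.of("monad", 2L, "functor", 1L); // covers g, h
        System.out.println(merge(batch, realTime));
    }
}
```

Because the operation is associative, it does not matter where the boundary between the batch and real-time portions falls.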

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

A Bloom filter is a space-efficient probabilistic data structure for testing whether an element is present in a set.

Consists of:
• k hash functions: h1, h2, …, hk
• a bit array of m bits, initially all zero: 0 0 0 0 0 0 0 0 0 0 0 0

Operations:
• Insert an element
• Query whether an element is present. The answer is either No or Maybe (false positives are possible)

Slide 45

Slide 45 text

Insert element e: compute h1(e), h2(e), …, hk(e) and set the bit at each of these positions to 1

0 0 1 0 0 0 0 1 0 1 0 0

Slide 46

Slide 46 text

Query element e: compute h1(e), h2(e), …, hk(e) and check if all bits at these positions are set to 1

0 0 1 0 1 0 1 1 0 0 0 0
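Insert and query can be sketched in a few lines on top of `java.util.BitSet`. The hashing scheme here (deriving k indexes from `String.hashCode` by double hashing) is an assumption chosen for brevity, not the scheme used by the talk or by any particular library, and it is not tuned for a good false-positive rate.

```java
import java.util.BitSet;

class BloomFilter {
    private final int m;       // number of bits
    private final int k;       // number of hash functions
    private final BitSet bits; // all zero initially

    BloomFilter(int m, int k) {
        this.m = m;
        this.k = k;
        this.bits = new BitSet(m);
    }

    // i-th hash function, derived by double hashing: (h1 + i * h2) mod m.
    // Illustrative only; a production filter would use stronger hashes.
    private int index(String element, int i) {
        int h1 = element.hashCode();
        int h2 = (h1 >>> 16) | 1;
        return Math.floorMod(h1 + i * h2, m);
    }

    // Insert: set the bit at every hash position.
    void insert(String element) {
        for (int i = 0; i < k; i++) bits.set(index(element, i));
    }

    // Query: false means definitely absent, true means "maybe present".
    boolean mightContain(String element) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(index(element, i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilter filter = new BloomFilter(64, 3);
        filter.insert("apple");
        System.out.println(filter.mightContain("apple"));
    }
}
```

An inserted element is never reported as absent, since insertion sets exactly the bits that the query later checks.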

Slide 47

Slide 47 text

Filter A: {e1, e2, e3}                         0 0 1 0 1 0 0 1 0 0 0 0
Filter B: {e4, e5, e6}                         1 0 1 0 0 0 0 0 1 0 0 0
Filter A + B: {e1, e2, e3, e4, e5, e6}         1 0 1 0 1 0 0 1 1 0 0 0

Here + is bitwise OR.
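The merge step can be reproduced with the bit patterns from the slide. In this sketch the helpers `fromBits`, `union`, and `show` are invented names; the all-zero bit set acts as the identity element, which is what makes Bloom filters of equal size and equal hash functions a monoid under OR.

```java
import java.util.BitSet;

class BloomUnion {
    // Builds a BitSet from a string such as "001010010000" (index 0 is the leftmost bit).
    static BitSet fromBits(String pattern) {
        BitSet set = new BitSet(pattern.length());
        for (int i = 0; i < pattern.length(); i++) {
            if (pattern.charAt(i) == '1') set.set(i);
        }
        return set;
    }

    // The monoid operation: bitwise OR, leaving both inputs untouched.
    static BitSet union(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.or(b);
        return result;
    }

    // Renders the first m bits as a string of 0s and 1s.
    static String show(BitSet set, int m) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < m; i++) sb.append(set.get(i) ? '1' : '0');
        return sb.toString();
    }

    public static void main(String[] args) {
        BitSet a = fromBits("001010010000"); // Filter A
        BitSet b = fromBits("101000001000"); // Filter B
        System.out.println(show(union(a, b), 12));
    }
}
```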

Slide 48

Slide 48 text

A few such data structures can be found in Algebird (Abstract Algebra for Scala): https://github.com/twitter/algebird/
• Bloom Filter
• HyperLogLog
• CountMinSketch
• TopK
• etc.

Slide 49

Slide 49 text

• A monad is just a useful pattern in functional programming
• You don’t need to understand Category Theory to use monads
• Once you grasp the idea, you will see this pattern everywhere
• Semigroup (commutative) and monoid define properties useful in distributed computing and the Lambda Architecture
• It’s all about associativity and commutativity. No nonsense!

Slide 50

Slide 50 text

No content