Slide 1

The Functional Model for Distributed Computing
CorkDev, 2015-09-07
Johannes Ahlmann

Slide 2

Imagine your application is effortlessly...
• composable
• testable
• easy to reason about
• parallelizable
• concurrent
• maintainable
• distributable
Image: http://www.mindingthebedside.com/wp-content/uploads/2012/12/Meditation-is-as-easy-as-weightlifting.jpg

Slide 3

Challenges we are facing
• Parallelism
  – the same operation on different data
  – centralized
• Concurrency
  – multiple agents
  – decentralized
• Maintainable code
  – composable
  – testable
  – easy to reason about
Common problem: managing state
Image: http://www.bluegreenit.com/wp-content/uploads/2012/01/it-consulting-cloud.jpg

Slide 4

State of the Union
• Many of the languages we use are based on C
• C was originally designed as a “portable assembly”
• Imperative: how to do something
• Manipulates (global) state
• Fixed sequence of steps
• Low-level abstractions: everything is a byte
=> Not well suited to highly distributed systems
Image: https://it.emcelettronica.com/files/node_images/ansic_programmazione_13.jpg

Slide 5

Abstractions Reduce the Mental Load
I will show you a sequence of 10 digits for a short moment. See if you can memorize them.
Image: https://workplacenigeria.files.wordpress.com/2015/06/stress.jpg?w=240

Slide 6

Abstractions Reduce the Mental Load
1512251830

Slide 7

Abstractions Reduce the Mental Load

Slide 8

Abstractions Reduce the Mental Load
• It is said we can hold 5 +/- 2 cognitive units (“chunks”) in working memory
• But we can trick our brains by making the cognitive units larger ;)
  – Christmas Day, 2015, half past six in the evening
  – 15-12-25 18:30
• Juggling the 10 digits without abstractions is how we develop software most of the time
  – manually managing memory, loop counters, locks, etc.
• “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” – Brian Kernighan

Slide 9

“Functional Model”
A couple of weeks ago, Yannis talked about “first-class functions”. There is another side to the “Functional Model”; unfortunately, I don’t have a better name for it:
1. Pure functions vs. effects
2. Immutable data
3. Streams (generators, iterators)

Slide 10

Pure Functions
• f: a -> b
• Takes an “a” and returns a “b”
• Does not access global state and has no side effects
• A call can be substituted with the function body (or with its result)
• Can be used in an expression
• Can be “memoized”
• Is idempotent: calling it again with the same argument returns the same result and changes nothing
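To make the “memoized” bullet concrete, here is a small JavaScript sketch. `memoize`, `square` and `fastSquare` are hypothetical names. Caching is only safe because `square` is pure: a cached result is indistinguishable from a fresh call. The `evaluations` counter is instrumentation added to observe the cache and does not affect any return value.

```javascript
// Wrap a one-argument function with a cache keyed by its argument.
// Only correct for pure functions: same input, same output, no side effects.
function memoize(f) {
  const cache = new Map();
  return x => {
    if (!cache.has(x)) cache.set(x, f(x));
    return cache.get(x);
  };
}

let evaluations = 0; // instrumentation only: counts real evaluations of square
const square = x => { evaluations++; return x * x; };
const fastSquare = memoize(square);

const first = fastSquare(12);  // computed: square runs once
const second = fastSquare(12); // served from the cache: square does not run again
```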

Slide 11

Pure
• stateless
• no sequence, no time
• non-strict
• x = 1 + 4 (equality)
• “x” can be substituted by the expression (referential transparency)
• idempotent
• expressions, algebra

Effects
• stateful
• fixed sequence, time
• strict
• x := x + 1 (assignment)
• “x” = changeable memory “slot”

Pure functions by themselves are useless: we want to interact with storage, the network, the screen, etc. We need both pure functions and (controlled, contained) effects.
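The two columns can be contrasted in a few lines of JavaScript. This is a sketch with made-up names (`add`, `tick`, `counter`): substituting a pure call by its value never changes the result, while the same substitution on an effectful call does, because each call reads and updates a memory slot and therefore depends on order and time.

```javascript
// Pure: the result depends only on the arguments (x = 1 + 4 is an equation).
const add = (a, b) => a + b;

// Effectful: each call updates a changeable memory "slot" (x := x + 1).
let counter = 0;
const tick = () => { counter = counter + 1; return counter; };

// Referential transparency: both expressions mean the same thing.
const pureTwice = add(1, 4) + add(1, 4); // 5 + 5 = 10
const pureDoubled = 2 * add(1, 4);       // 2 * 5 = 10

// The same rewrite breaks with an effect, since every call sees a different state.
const effectTwice = tick() + tick();     // 1 + 2 = 3
const effectDoubled = 2 * tick();        // 2 * 3 = 6
```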

Slide 12

Immutable State
append([1, 2, 3], 4) => [1, 2, 3, 4]
• [1, 2, 3] remains unchanged
• Inherently thread-safe
• Can be shared freely
• “Everything is atomic”
Image: https://pbs.twimg.com/media/CotjQGDWIAAgpx2.jpg
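A JavaScript sketch of the `append` example, assuming copy-on-write via the spread operator. `Object.freeze` (which is shallow) is used only to make the “remains unchanged” bullet checkable: an in-place `push` on the frozen array throws a `TypeError` instead of silently altering shared data.

```javascript
// Return a new array with x appended; the input array is left untouched.
const append = (xs, x) => [...xs, x];

// Freeze the original so that accidental in-place changes become errors.
const xs = Object.freeze([1, 2, 3]);
const ys = append(xs, 4); // [1, 2, 3, 4]

let pushFailed = false;
try {
  xs.push(4); // in-place mutation is rejected because xs is frozen
} catch (e) {
  pushFailed = e instanceof TypeError;
}
// xs is still [1, 2, 3] and can be handed to any number of readers.
```

Copying costs O(n) per append. Persistent data structures avoid most of that cost by sharing structure between the old and new versions.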

Slide 13

Diagram (Effect-Land): pure functions f: a -> b, g: b -> c and h: (b, c) -> d alongside the effectful values IO a, IO b, IO c and IO d.
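One way to read the diagram is that pure functions get “lifted” into Effect-Land: a pure f: a -> b becomes a function from IO a to IO b. The sketch below assumes a minimal, hypothetical `IO` class (not a real library API) whose values describe an effect without running it. The bodies of `f`, `g` and `h` are invented; only their shapes come from the slide.

```javascript
// A hypothetical IO wrapper: an "IO a" describes an effect that yields an "a" when run.
class IO {
  constructor(run) { this.run = run; }
  // Lift a pure a -> b into IO a -> IO b.
  map(f) { return new IO(() => f(this.run())); }
  // Lift a pure (b, c) -> d into (IO b, IO c) -> IO d.
  static map2(ioB, ioC, h) { return new IO(() => h(ioB.run(), ioC.run())); }
}

// Pure functions with the shapes from the slide (bodies are made up).
const f = a => a * 2;            // f: a -> b
const g = b => b + 1;            // g: b -> c
const h = (b, c) => `${b}/${c}`; // h: (b, c) -> d

// The only effect: a simulated read, logged so we can see when it happens.
const log = [];
const ioA = new IO(() => { log.push("read"); return 21; });

const ioB = ioA.map(f);
const ioC = ioB.map(g);
const ioD = IO.map2(ioB, ioC, h);

const readsBeforeRun = log.length; // 0: building the pipeline performed no effects
const d = ioD.run();               // "42/43"
```

Running `ioD` performs the read twice, once for each path through `ioB`. This shows that an `IO` value is a recipe rather than a cached result. Effect systems typically provide a bind (`flatMap`) operation to run an effect once and share its value.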

Slide 14

Streams (Generators, Iterators)

Declarative:
    const xs = [1, 2, 3];
    const res = xs.map(x => x + 1);

Imperative:
    const xs = [1, 2, 3];
    const res = [];
    for (let i = 0; i < xs.length; i++) {
      res.push(xs[i] + 1);
    }

Which do you think is easier to parallelize?
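One answer to the question: because `x => x + 1` is pure and `map` treats every element independently, the input can be split into chunks. Each chunk can be mapped separately, potentially on a different core or machine, and the results concatenated in order. The sketch runs the chunks one after another and only checks that the result matches. `chunk` is a hypothetical helper, and the actual distribution of work is left out.

```javascript
// Split an array into at most n contiguous chunks (hypothetical helper).
function chunk(xs, n) {
  const size = Math.max(1, Math.ceil(xs.length / n)); // never 0, so the loop always advances
  const chunks = [];
  for (let i = 0; i < xs.length; i += size) chunks.push(xs.slice(i, i + size));
  return chunks;
}

const inc = x => x + 1; // pure: the result depends only on x

const xs = [1, 2, 3, 4, 5, 6, 7];
const sequential = xs.map(inc);

// Each chunk could be handed to a different worker; here they simply run in turn.
const partials = chunk(xs, 3).map(part => part.map(inc));
const combined = partials.flat(); // concatenate the partial results in order
```

The imperative loop, by contrast, threads the index `i` and the shared accumulator `res` through every iteration. Splitting it up therefore means coordinating that shared state first.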

Slide 15

Stream Fusion
    xs
      .map(x => x + 1)
      .map(y => y * 2)
If (and only if) the functions are pure, we can
• combine
• reorder
• optimize the entire chain
If evaluation is lazy, we can also optimize across function boundaries, e.g. fuse the two maps into one:
    xs
      .map(x => (x + 1) * 2)
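A small JavaScript sketch of the same chain, assuming `xs = [1, 2, 3]`. The eager version builds an intermediate array. The hand-fused version applies the composed function in a single pass. A lazy generator pipeline, built with the hypothetical helper `lazyMap`, pushes each element through both stages before reading the next one. That per-element flow is what lets a lazy runtime fuse stages without materializing intermediates. The rewrite is valid only because `inc` and `dbl` are pure.

```javascript
const xs = [1, 2, 3];
const inc = x => x + 1;
const dbl = y => y * 2;

// Eager chain: xs.map(inc) builds the intermediate array [2, 3, 4] before doubling it.
const eager = xs.map(inc).map(dbl);

// Fused by hand: one pass with the composed function, no intermediate array.
const fused = xs.map(x => dbl(inc(x)));

// Lazy chain: a generator yields one transformed element at a time, on demand.
function* lazyMap(iterable, f) {
  for (const x of iterable) yield f(x);
}

// Logging wrappers exist only to make the evaluation order visible.
const trace = [];
const tracedInc = x => { trace.push(`inc ${x}`); return inc(x); };
const tracedDbl = y => { trace.push(`dbl ${y}`); return dbl(y); };
const lazy = [...lazyMap(lazyMap(xs, tracedInc), tracedDbl)];
// trace: ["inc 1", "dbl 2", "inc 2", "dbl 3", "inc 3", "dbl 4"]
```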

Slide 16

• Idea: “Express structure of workloads in terms of abstract algebra”
• Parallel transformations (pure functions)
  – lazy
  – build an arbitrary operator graph
  – should be “pure”, idempotent
• Actions (effects)
  – force the transformations
  – collapse the tree
• Resilient Distributed Dataset (RDD)
  – created through transformations
  – immutable
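The split between lazy transformations and forcing actions can be imitated on a single machine. The sketch below assumes a hypothetical `LazyDataset` class. It is not Spark's API, and it records a simple chain rather than a general graph. Transformations only record a step and return a new, frozen dataset. Actions run the recorded steps. The `applied` counter is instrumentation only.

```javascript
// A toy imitation of transformations vs. actions. LazyDataset is hypothetical.
class LazyDataset {
  constructor(source, ops = []) {
    this.source = source; // the input data
    this.ops = ops;       // recorded transformations: a linear "operator graph"
    Object.freeze(this);  // a dataset never changes after creation
  }

  // Transformations: record a step and return a NEW dataset; nothing is computed.
  map(f) { return new LazyDataset(this.source, [...this.ops, xs => xs.map(f)]); }
  filter(p) { return new LazyDataset(this.source, [...this.ops, xs => xs.filter(p)]); }
  flatMap(f) { return new LazyDataset(this.source, [...this.ops, xs => xs.flatMap(f)]); }

  // Actions: run every recorded step, in order, and return a plain value.
  collect() { return this.ops.reduce((xs, op) => op(xs), [...this.source]); }
  count() { return this.collect().length; }
}

let applied = 0; // instrumentation only: counts how often the mapping function runs
const base = new LazyDataset([1, 2, 3, 4]);
const multiplesOf20 = base
  .map(x => { applied++; return x * 10; })
  .filter(x => x % 20 === 0);

const appliedBeforeCollect = applied;   // 0: only the plan was built
const result = multiplesOf20.collect(); // [20, 40]
```

Because `base` is never modified, it can be reused as the starting point for other pipelines. Every action in this toy recomputes the whole chain from the source.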

Slide 17

Spark Example – Word Count
• create an RDD from the lines of a text file in HDFS (textFile)
• split lines into words and flatten the list of lists (flatMap)
• create (word, 1) tuples (map)
• sum the occurrences of each word (reduceByKey)
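The four steps can be traced with plain JavaScript arrays. This sketch does not use Spark: the input lines are made up and stand in for the HDFS file, and a `Map` plays the role of `reduceByKey`. It shows the data at each stage and why the final step parallelizes well.

```javascript
// Plain-JavaScript sketch of the word-count steps; no Spark involved.
const lines = ["to be or not to be", "to do is to be"]; // stands in for the text file

// split lines into words and flatten the list of lists (flatMap)
const words = lines.flatMap(line => line.split(/\s+/).filter(w => w.length > 0));

// create (word, 1) tuples (map)
const pairs = words.map(word => [word, 1]);

// sum the occurrences of each word (the role of reduceByKey). Because + is associative
// and commutative, partial sums can be computed per partition and merged later.
const counts = pairs.reduce(
  (acc, [word, n]) => acc.set(word, (acc.get(word) || 0) + n), // acc is local to this reduce
  new Map()
);
```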

Slide 18

Operator Graph
Image: http://bit.ly/2AZG0V1

Slide 19

Takeaways
• Parallelism and concurrency are here to stay!
• Keep your functions pure
• Keep your data immutable
• Contain your state and effects as much as possible
Image: http://www.ibuycarz.com/upload/1/71/171ea33ce2205d2f.jpg