The Functional Model for Distributed Computing

CorkDev, 2015-09-07, Johannes Ahlmann

Imagine your Application is effortlessly...
• composable
• testable
• easy to reason about
• parallelizable
• easily concurrent
• maintainable
• distributable
Image: http://www.mindingthebedside.com/wp-content/uploads/2012/12/Meditation-is-as-easy-as-weightlifting.jpg

Challenges we are facing
• Parallelism
  – similar operation on different data
  – centralized
• Concurrency
  – multiple agents
  – decentralized
• Maintainable code
  – composable
  – testable
  – easy to reason about
Common Problem: Managing State
Image: http://www.bluegreenit.com/wp-content/uploads/2012/01/it-consulting-cloud.jpg

State of the Union
• Many languages we use are based on C
• Originally designed as a “portable assembly”
• Imperative - How to do something
• Manipulate (global) State
• Fixed sequence of steps
• Low-level abstractions - Everything is a byte
=> Not very well suited for highly distributed systems
Image: https://it.emcelettronica.com/files/node_images/ansic_programmazione_13.jpg

Abstractions Reduce the Mental Load
I will show you a sequence of 10 digits for a short moment. See if you can memorize them.
Image: https://workplacenigeria.files.wordpress.com/2015/06/stress.jpg?w=240

Abstractions Reduce the Mental Load
1512251830

Abstractions Reduce the Mental Load
• It is said we can handle 5 +/- 2 cognitive units
• But we can trick our brain, by making the cognitive units larger ;)
• Christmas Day, 2015, half past six
• 15-12-25 18:30
• Juggling the 10 digits without abstractions is the equivalent of how we develop software most of the time
• Manually managing memory, loop counters, locks, etc.
• “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” – Brian Kernighan

“Functional Model”
A couple of weeks ago Yannis talked about “first-class functions”.
There is another side to the “Functional Model”. Unfortunately I don’t have a better name for it.
1. Pure functions vs. Effects
2. Immutable data
3. Streams (generators, iterators)

Pure Functions
• f: a -> b
• Takes an “a” and returns a “b”
• Does not access global state and has no side effects
• Function invocation can be substituted with the function body
• Can be used in an expression
• Can be “memoized”
• Is idempotent (repeated calls with the same argument give the same result)
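The substitution and memoization properties above can be illustrated with a small Python sketch. The function names (`area`, `slow_square`) and the `calls` counter are made up for this illustration; the counter exists only to make the cache visible and is not something a pure function would normally carry.

```python
from functools import lru_cache

calls = 0  # demo-only bookkeeping to observe how often the body really runs


def area(width: int, height: int) -> int:
    """Pure: the result depends only on the arguments."""
    return width * height


@lru_cache(maxsize=None)
def slow_square(n: int) -> int:
    """The result depends only on n, so caching it is safe."""
    global calls
    calls += 1  # hypothetical instrumentation, not part of the result
    return n * n


# Substitution: the call can be replaced by the function body.
substituted_ok = area(3, 4) == 3 * 4

# Memoization: the second call is answered from the cache.
first = slow_square(12)
second = slow_square(12)
```

Because `slow_square` always returns the same value for the same input, the cached answer is indistinguishable from a fresh computation.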

Pure
• stateless
• no sequence, no time
• non-strict
• x = 1+4 (equality)
• “x” can be substituted by the expression (referential transparency)
• idempotent
• expressions, algebra
Effects
• stateful
• fixed sequence, time
• strict
• x := x + 1 (assignment)
• “x” = changeable memory “slot”
Pure functions by themselves are useless. We want to interact with storage, network, screen etc.
We need both pure functions and (controlled, contained) effects

Immutable State
append([1, 2, 3], 4) => [1, 2, 3, 4]
• [1, 2, 3] remains unchanged
• Inherently thread-safe
• Can be shared freely
• “Everything is atomic”
Image: https://pbs.twimg.com/media/CotjQGDWIAAgpx2.jpg
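A minimal sketch of the `append` example, assuming Python tuples as the immutable sequence type. The helper name `append` mirrors the slide; its signature here is an assumption.

```python
from typing import Tuple


def append(xs: Tuple[int, ...], x: int) -> Tuple[int, ...]:
    """Return a new tuple with x at the end; xs itself is never modified."""
    return xs + (x,)


original = (1, 2, 3)
extended = append(original, 4)
```

Since `original` can never change after creation, it can be handed to any number of threads without locking.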

Pure f: a -> b
Pure g: b -> c
Pure h: (b, c) -> d
Effect-Land: IO a, IO b, IO b, IO c, IO d
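One possible reading of this diagram is a pure core (`f`, `g`, `h`) wrapped by a thin layer that performs the effects. The sketch below assumes that reading; the concrete types (text in, integer, label, message out) and the `run` function are invented for illustration, and in-memory streams stand in for real I/O.

```python
import io
from typing import TextIO


def f(raw: str) -> int:  # a -> b
    return int(raw.strip())


def g(n: int) -> str:  # b -> c
    return "even" if n % 2 == 0 else "odd"


def h(n: int, label: str) -> str:  # (b, c) -> d
    return f"{n} is {label}"


def run(source: TextIO, sink: TextIO) -> None:
    """Effect-land: all reading and writing happens here and nowhere else."""
    a = source.readline()  # effect: obtain an "a"
    b = f(a)
    c = g(b)
    d = h(b, c)
    sink.write(d + "\n")  # effect: emit the "d"


source = io.StringIO("42\n")
sink = io.StringIO()
run(source, sink)
output = sink.getvalue()
```

The pure functions can be tested without any streams at all; only `run` needs an environment.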

Streams (Generators, Iterators)
Declarative
    xs = [1, 2, 3];
    return xs.map(x => x+1);
Imperative
    xs = [1, 2, 3];
    res = [];
    for (int i = 0; i < xs.length; i++) {
        res.append(xs[i] + 1);
    }
    return res;
Which do you think is easier to parallelize?
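To make the question concrete: because the declarative version only states "apply this function to every element", the mapping can be handed to a worker pool without changing the logic. The sketch below uses Python's `concurrent.futures.ThreadPoolExecutor` purely for illustration; the worker count is an arbitrary assumption, and CPU-bound work in CPython would more likely call for processes than threads.

```python
from concurrent.futures import ThreadPoolExecutor


def increment(x: int) -> int:
    """Pure, so the order and location of evaluation do not matter."""
    return x + 1


xs = [1, 2, 3]

# Sequential map.
sequential = list(map(increment, xs))

# Same map, distributed over a pool; Executor.map keeps the input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel = list(pool.map(increment, xs))
```

The imperative loop, by contrast, ties the result to a shared `res` list and a loop counter, both of which would need coordination across workers.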

Stream Fusion
    xs
      .map(x => x+1)
      .map(y => y*2)
Iff the functions are pure, we can
• combine
• reorder
• optimize
the entire chain
If application is lazy, we can optimize across functions as well
    xs
      .map(x => (x+1)*2)
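The fusion step can be sketched by hand in Python: two passes with an intermediate list versus one pass over a composed function. `compose`, `add_one` and `double` are hypothetical helpers, and a real optimizer would perform this rewrite automatically rather than manually.

```python
from typing import Callable


def compose(second: Callable[[int], int], first: Callable[[int], int]) -> Callable[[int], int]:
    """Return a function equivalent to applying first, then second."""
    return lambda x: second(first(x))


def add_one(x: int) -> int:
    return x + 1


def double(y: int) -> int:
    return y * 2


xs = [1, 2, 3]

# Two passes, with an intermediate list in between.
two_passes = [double(y) for y in [add_one(x) for x in xs]]

# One fused pass, equivalent to xs.map(x => (x+1)*2).
fused = [compose(double, add_one)(x) for x in xs]
```

The rewrite is only valid because neither function has side effects; with effects, merging the passes would change the order in which they occur.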

• Idea: “Express structure of workloads in terms of abstract algebra”
• Parallel transformations (pure functions)
  – lazy
  – create arbitrary operator graph
  – should be “pure”, idempotent
• Actions (effects)
  – forcing transformations
  – collapsing the tree
• Resilient Distributed Dataset (RDD)
  – created through transformations
  – immutable
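The split between lazy transformations and forcing actions can be imitated in a few lines of plain Python. This is a toy model and not Spark's API: the class `LazyDataset` and its methods are invented for illustration, and `seen` exists only to show when evaluation actually happens.

```python
from typing import Any, Callable, Iterable, List


class LazyDataset:
    """Toy immutable dataset: transformations build a plan, actions run it."""

    def __init__(self, source: Callable[[], Iterable[Any]]) -> None:
        self._source = source  # factory, so the plan can be re-evaluated

    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        # Transformation: returns a new dataset, evaluates nothing.
        return LazyDataset(lambda: (fn(x) for x in self._source()))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        # Transformation: returns a new dataset, evaluates nothing.
        return LazyDataset(lambda: (x for x in self._source() if pred(x)))

    def collect(self) -> List[Any]:
        # Action: forces the whole chain of transformations.
        return list(self._source())


seen: List[int] = []


def traced_double(x: int) -> int:
    seen.append(x)  # demo-only trace of when evaluation occurs
    return x * 2


base = LazyDataset(lambda: [1, 2, 3, 4])
plan = base.map(traced_double).filter(lambda x: x > 2)
seen_before_action = list(seen)  # nothing has been evaluated yet
result = plan.collect()          # the action collapses the plan
```

Note that `base` is untouched by building `plan`; each transformation yields a new dataset, which matches the "created through transformations, immutable" points above.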

Spark Example – Word Count
• create RDD from HDFS text file lines
• split lines into words, and collapse list-of-lists
• create word tuples
• sum word occurrences
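The code for this slide did not survive in the transcript. As a stand-in, the four steps can be mirrored in plain Python on an in-memory list of lines. This is not the original Spark code: in Spark the steps would typically correspond to the RDD operations `textFile`, `flatMap`, `map` and `reduceByKey`, while the function `word_count` and the sample lines below are assumptions made for illustration.

```python
from collections import Counter
from itertools import chain
from typing import Dict, Iterable


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Step 2: split lines into words and flatten the list-of-lists.
    words = chain.from_iterable(line.split() for line in lines)
    # Step 3: create (word, 1) tuples.
    pairs = ((word, 1) for word in words)
    # Step 4: sum the occurrences per word.
    counts: Counter = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


# Step 1 stand-in: a small list instead of lines read from HDFS.
lines = ["to be or not to be", "to do is to be"]
counts = word_count(lines)
```

Steps 2 and 3 are lazy generators here, so nothing is computed until the summing loop consumes them, loosely echoing the transformation/action split.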

Operator Graph Image: http://bit.ly/2AZG0V1

Takeaways
• Parallelism and concurrency are here to stay!
• Keep your functions pure
• Keep your data immutable
• Contain your state and effects as much as possible
Image: http://www.ibuycarz.com/upload/1/71/171ea33ce2205d2f.jpg
