The Functional Model for Distributed Computing

Fluquid Ltd.
September 07, 2015

Distributed computing is becoming increasingly prevalent with the rise of big data, multicore processors and scale-out architectures.

This talk gives an introduction to core functional principles in the context of distributed computing and Apache Spark.

Transcript

  1. The Functional Model for Distributed Computing
     CorkDev, 2015-09-07, Johannes Ahlmann

  2. Imagine your application is effortlessly...
     • composable
     • testable
     • easy to reason about
     • parallelizable
     • concurrent
     • maintainable
     • distributable
     Image: http://www.mindingthebedside.com/wp-content/uploads/2012/12/Meditation-is-as-easy-as-weightlifting.jpg
  3. Challenges we are facing
     • Parallelism
       – the same operation on different data
       – centralized
     • Concurrency
       – multiple agents
       – decentralized
     • Maintainable code
       – composable
       – testable
       – easy to reason about
     Common problem: managing state
     Image: http://www.bluegreenit.com/wp-content/uploads/2012/01/it-consulting-cloud.jpg
  4. State of the Union
     • Many languages we use are based on C
     • Originally designed as a “portable assembly”
     • Imperative: how to do something
     • Manipulate (global) state
     • Fixed sequence of steps
     • Low-level abstractions: everything is a byte
     => Not well suited to highly distributed systems
     Image: https://it.emcelettronica.com/files/node_images/ansic_programmazione_13.jpg
  5. Abstractions Reduce the Mental Load
     I will show you a sequence of 10 digits for a short moment. See if you can memorize them.
     Image: https://workplacenigeria.files.wordpress.com/2015/06/stress.jpg?w=240
  6. Abstractions Reduce the Mental Load
     1512251830
  7. Abstractions Reduce the Mental Load
  8. Abstractions Reduce the Mental Load
     • It is said we can handle 5 +/- 2 cognitive units
     • But we can trick our brain by making the cognitive units larger ;)
     • Christmas Day, 2015, half past six
     • 15-12-25 18:30
     • Juggling the 10 digits without abstractions is equivalent to how we develop software most of the time
     • Manually managing memory, loop counters, locks, etc.
     • “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” – Brian Kernighan
  9. “Functional Model”
     A couple of weeks ago, Yannis talked about “first-class functions”. There is another side to the “Functional Model”; unfortunately, I don’t have a better name for it:
     1. Pure functions vs. effects
     2. Immutable data
     3. Streams (generators, iterators)
  10. Pure Functions
     • f: a -> b
     • Takes an “a” and returns a “b”
     • Does not access global state and has no side effects
     • A function call can be substituted with the function body (or with its result)
     • Can be used in an expression
     • Can be “memoized”
     • Is idempotent: calling it again with the same input gives the same result and causes no further effects
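
A minimal TypeScript sketch of the "can be memoized" point, assuming single-argument functions whose arguments can serve as `Map` keys. The `memoize` helper and the Fibonacci example are hypothetical illustrations, not code from the talk or from any library; caching is only safe here because the wrapped function is pure.

```typescript
// Hypothetical helper: caches the results of a pure single-argument function.
// Only correct for pure functions: a stored value can stand in for a fresh
// call because the same input always produces the same output.
function memoize<A, B>(f: (a: A) => B): (a: A) => B {
  const cache = new Map<A, B>();
  return (a: A): B => {
    if (cache.has(a)) {
      return cache.get(a) as B;
    }
    const result = f(a);
    cache.set(a, result);
    return result;
  };
}

// Pure but exponential when computed naively; routing the recursion through
// the memoized wrapper means each n is computed only once.
const fib: (n: number) => number = memoize((n: number): number =>
  n < 2 ? n : fib(n - 1) + fib(n - 2)
);

console.log(fib(40)); // 102334155
```

Memoizing an impure function (for example, one that reads the clock or a global variable) would silently return stale results, which is why purity is the precondition.
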
  11. Pure vs. Effects
     Pure:
     • stateless
     • no sequence, no time
     • non-strict
     • x = 1+4 (equality)
     • “x” can be substituted by the expression (referential transparency)
     • idempotent
     • expressions, algebra
     Effects:
     • stateful
     • fixed sequence, time
     • strict
     • x := x + 1 (assignment)
     • “x” = changeable memory “slot”
     Pure functions by themselves are useless: we want to interact with storage, network, screen, etc. We need both pure functions and (controlled, contained) effects.
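
The contrast between `x = 1+4` and `x := x + 1` can be sketched in TypeScript as follows; the names `add` and `next` are hypothetical. A call to the pure `add` can be replaced by its value, while two calls to the effectful `next` return different values because each one updates shared state.

```typescript
// Pure: like x = 1+4, `add(1, 4)` always denotes the value 5.
const add = (a: number, b: number): number => a + b;

const withCalls = add(1, 4) + add(1, 4);
const withValues = 5 + 5; // each call substituted by its value
console.log(withCalls === withValues); // true: referential transparency

// Effectful: like x := x + 1, the result depends on when the call happens.
let counter = 0; // a changeable memory "slot"
const next = (): number => {
  counter = counter + 1; // assignment: mutates state outside the function
  return counter;
};

const first = next();
const second = next();
// The same expression `next()` produced 1 and then 2, so it cannot be
// replaced by a single value without changing the program.
console.log(first, second); // 1 2
```
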
  12. Immutable State
     • append([1, 2, 3], 4) => [1, 2, 3, 4]
     • [1, 2, 3] remains unchanged
     • Inherently thread-safe
     • Can be shared freely
     • “Everything is atomic”
     Image: https://pbs.twimg.com/media/CotjQGDWIAAgpx2.jpg
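
A small sketch of the `append` example, assuming TypeScript's `readonly` arrays plus `Object.freeze` as the immutability mechanism (the talk does not prescribe one). `append` returns a new array and leaves its input untouched, so every version can be shared freely.

```typescript
// Returns a new frozen array; the input is never modified.
function append<T>(xs: readonly T[], x: T): readonly T[] {
  return Object.freeze(xs.concat([x]));
}

const base = Object.freeze([1, 2, 3]);
const withFour = append(base, 4);
const withFive = append(base, 5); // `base` can be reused safely

console.log(base.join(","));     // 1,2,3 (unchanged)
console.log(withFour.join(",")); // 1,2,3,4
console.log(withFive.join(",")); // 1,2,3,5

// `readonly` rejects base.push(6) at compile time, and because the arrays
// are frozen, push would also throw a TypeError at runtime.
```

This sketch copies the whole array on every call, which is O(n); functional languages usually rely on persistent data structures that share unchanged parts between versions instead.
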
  13. Pure f: a -> b; pure g: b -> c; pure h: (b, c) -> d
     Effect-Land (diagram): the same functions applied to effectful values, IO a -> IO b (f), IO b -> IO c (g), (IO b, IO c) -> IO d (h)
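
One way to read the Effect-Land picture, sketched in TypeScript under stated assumptions: `IO<A>` is modelled as a hypothetical zero-argument thunk, `mapIO` lifts a pure `a -> b` into `IO a -> IO b`, and the concrete types chosen for `f`, `g` and `h` are illustrative only. The pure functions are composed first; the single effect (`readInput`, a stand-in for reading a file or socket) happens only when the resulting `IO` value is run.

```typescript
// Hypothetical minimal IO type: a deferred computation that may perform
// effects when (and only when) it is run.
type IO<A> = () => A;

// Lift a pure function a -> b into Effect-Land: IO<a> -> IO<b>.
function mapIO<A, B>(io: IO<A>, fn: (a: A) => B): IO<B> {
  return () => fn(io());
}

// Pure functions with the slide's shapes (types chosen for illustration).
const f = (a: string): number => a.trim().length;  // f: a -> b
const g = (b: number): boolean => b % 2 === 0;     // g: b -> c
const h = (b: number, c: boolean): string =>        // h: (b, c) -> d
  `${b} characters, ${c ? "even" : "odd"}`;

// Pure core: compose f, g and h; b is computed once and reused.
const core = (a: string): string => {
  const b = f(a);
  return h(b, g(b));
};

// Effect at the edge: a hypothetical input source.
const log: string[] = [];
const readInput: IO<string> = () => {
  log.push("read"); // stands in for a real I/O side effect
  return "  hello  ";
};

// Describing the program performs no effect yet...
const program: IO<string> = mapIO(readInput, core);
const logBeforeRun = log.length; // 0
// ...the effect happens only when the IO value is run.
const result = program(); // "5 characters, odd"
```

Real effect systems (for example Haskell's `IO` monad) also provide sequencing of effects via `>>=`, which this sketch omits.
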
  14. Streams (Generators, Iterators)
     Declarative:
       const xs = [1, 2, 3];
       return xs.map(x => x + 1);
     Imperative:
       const xs = [1, 2, 3];
       const res = [];
       for (let i = 0; i < xs.length; i++) { res.push(xs[i] + 1); }
       return res;
     Which do you think is easier to parallelize?
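
The two styles from the slide as runnable TypeScript (the imperative loop iterates up to `xs.length`), plus a hypothetical `chunkedMap` helper hinting at why the declarative form parallelizes easily: because `x => x + 1` is pure and touches one element at a time, the input can be split into chunks that are mapped independently and concatenated in order. Here the chunks run sequentially; no parallel runtime is assumed.

```typescript
const xs: readonly number[] = [1, 2, 3];

// Declarative: state *what* to compute.
const declarative = xs.map((x) => x + 1);

// Imperative: state *how*, with an index and a mutable accumulator.
const imperative: number[] = [];
for (let i = 0; i < xs.length; i++) {
  imperative.push(xs[i] + 1);
}

// Hypothetical helper: split the input into chunks, map each chunk on its
// own, then concatenate. With a pure f the chunks could be processed in any
// order, or on separate workers, without changing the result.
function chunkedMap<A, B>(input: readonly A[], fn: (a: A) => B, chunkSize: number): B[] {
  if (chunkSize < 1) throw new RangeError("chunkSize must be at least 1");
  const chunks: A[][] = [];
  for (let i = 0; i < input.length; i += chunkSize) {
    chunks.push(input.slice(i, i + chunkSize));
  }
  const mapped = chunks.map((chunk) => chunk.map(fn));
  return mapped.reduce((acc, part) => acc.concat(part), [] as B[]);
}

console.log(declarative.join(","));                     // 2,3,4
console.log(imperative.join(","));                      // 2,3,4
console.log(chunkedMap(xs, (x) => x + 1, 2).join(",")); // 2,3,4
```
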
  15. Stream Fusion
     xs.map(x => x+1).map(y => y*2)
     • If the functions are pure, we can
       – combine
       – reorder
       – optimize the entire chain
     • If evaluation is lazy, we can optimize across functions as well:
     xs.map(x => (x+1)*2)
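
A sketch of fusion by function composition in TypeScript; `inc`, `dbl` and `andThen` are hypothetical names. Because both functions are pure, mapping twice and mapping once with the composed function give the same result, but the fused version makes a single pass and allocates no intermediate array.

```typescript
const inc = (x: number): number => x + 1;
const dbl = (y: number): number => y * 2;

// Compose two functions: first f, then g.
const andThen =
  <A, B, C>(f: (a: A) => B, g: (b: B) => C) =>
  (a: A): C =>
    g(f(a));

const input = [1, 2, 3];

// Unfused: two passes over the data and one intermediate array.
const unfused = input.map(inc).map(dbl);

// Fused: one pass, equivalent to input.map(x => (x + 1) * 2).
// This rewrite is only safe because inc and dbl are pure.
const fused = input.map(andThen(inc, dbl));

console.log(unfused.join(",")); // 4,6,8
console.log(fused.join(","));   // 4,6,8
```

If `inc` or `dbl` logged or mutated state, the two versions would interleave those effects differently (all `inc` calls first versus alternating calls), which is why purity is the precondition for fusing.
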
  16. Idea: “Express the structure of workloads in terms of abstract algebra”
     • Parallel transformations (pure functions)
       – lazy
       – build an arbitrary operator graph
       – should be “pure”, idempotent
     • Actions (effects)
       – force the transformations
       – collapse the tree
     • Resilient Distributed Dataset (RDD)
       – created through transformations
       – immutable
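
To make the transformation/action split concrete, here is a small local sketch in TypeScript. `LazyDataset` is a hypothetical class that borrows the shape of Spark's RDD operations (`map` and `filter` as transformations, `collect` and `count` as actions) but is not Spark's implementation: each transformation returns a new immutable dataset that only records the step, and nothing is computed until an action forces the chain.

```typescript
// Hypothetical, local stand-in for an RDD: immutable and lazily evaluated.
class LazyDataset<T> {
  // `compute` describes how to produce the elements; nothing runs until an action.
  private constructor(private readonly compute: () => T[]) {}

  static fromArray<E>(items: readonly E[]): LazyDataset<E> {
    return new LazyDataset<E>(() => items.slice());
  }

  // Transformations: return a *new* dataset that records the step; no work yet.
  map<U>(fn: (t: T) => U): LazyDataset<U> {
    return new LazyDataset<U>(() => this.compute().map((t) => fn(t)));
  }

  filter(pred: (t: T) => boolean): LazyDataset<T> {
    return new LazyDataset<T>(() => this.compute().filter((t) => pred(t)));
  }

  // Actions: force the chain of transformations and return a plain value.
  collect(): T[] {
    return this.compute();
  }

  count(): number {
    return this.compute().length;
  }
}

let evaluated = 0; // counter only to observe when work actually happens
const squares = LazyDataset.fromArray([1, 2, 3, 4, 5])
  .map((x) => {
    evaluated += 1;
    return x * x;
  })
  .filter((x) => x % 2 === 1);

const evaluatedBeforeAction = evaluated; // 0: only the operator graph exists
const oddSquares = squares.collect();    // [1, 9, 25]; evaluated is now 5
```

As in Spark without `cache()` or `persist()`, every action here recomputes the whole chain from the source. Keeping that chain (the lineage) is also what lets Spark rebuild lost partitions, which is where the "Resilient" in RDD comes from.
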
  17. Spark Example – Word Count
     • create an RDD of lines from a text file on HDFS
     • split lines into words and flatten the list of lists
     • create (word, 1) tuples
     • sum the occurrences of each word
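
The four steps can be sketched locally in TypeScript without a cluster. This is an in-memory stand-in, not Spark code: a hard-coded array of hypothetical sample lines replaces the HDFS file that Spark would read with `textFile`, and the steps below mirror the RDD operations `flatMap`, `map` and `reduceByKey`, with `reduceByKey` written here as a hypothetical local helper.

```typescript
// Stand-in for the lines of a text file (hypothetical sample data).
const lines: readonly string[] = ["to be or not to be", "that is the question"];

// flatMap: split each line into words and flatten the list of lists.
const words: string[] = lines.reduce<string[]>(
  (acc, line) => acc.concat(line.split(/\s+/).filter((w) => w.length > 0)),
  []
);

// map: pair every word with the count 1.
const pairs: Array<[string, number]> = words.map((w): [string, number] => [w, 1]);

// reduceByKey: combine all values that share a key. Spark requires the
// combining function to be associative and commutative so that partial
// results from different partitions can be merged in any order.
function reduceByKey<K, V>(kvs: ReadonlyArray<[K, V]>, combine: (a: V, b: V) => V): Map<K, V> {
  const out = new Map<K, V>();
  kvs.forEach(([k, v]) => {
    const prev = out.get(k);
    out.set(k, prev === undefined ? v : combine(prev, v));
  });
  return out;
}

// Sum the occurrences of each word.
const counts = reduceByKey(pairs, (a, b) => a + b);

console.log(counts.get("to"));       // 2
console.log(counts.get("question")); // 1
```

Because splitting and pairing are pure, per-element transformations, Spark can run them independently on each partition; only the per-word sums in `reduceByKey` require moving data between machines (a shuffle).
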
  18. Operator Graph Image: http://bit.ly/2AZG0V1

  19. Takeaways
     • Parallelism and concurrency are here to stay!
     • Keep your functions pure
     • Keep your data immutable
     • Contain your state and effects as much as possible
     Image: http://www.ibuycarz.com/upload/1/71/171ea33ce2205d2f.jpg