The Functional Model for Distributed Computing

Fluquid Ltd.
September 07, 2015

Distributed computing is becoming increasingly prevalent with the rise of big data, multicore processors, and scale-out architectures.

This talk will give an introduction to core functional principles in the context of Distributed Computing and Apache Spark.


Transcript

  1. The Functional Model
    for Distributed Computing
    CorkDev, 2015-09-07
    Johannes Ahlmann


  2. Imagine your Application is effortlessly...
    • composable
    • testable
    • easy to reason about
    • parallelizable
    • concurrent
    • maintainable
    • distributable
    Image: http://www.mindingthebedside.com/wp-content/uploads/2012/12/Meditation-is-as-easy-as-weightlifting.jpg


  3. Challenges we are facing
    • Parallelism
    – similar operation on different data
    – centralized
    • Concurrency
    – multiple agents
    – decentralized
    • Maintainable code
    – composable
    – testable
    – easy to reason about
    Common Problem:
    Managing State
    Image: http://www.bluegreenit.com/wp-content/uploads/2012/01/it-consulting-cloud.jpg


  4. State of the Union
    • Many languages we use are based on C
    • Originally designed as a “portable assembly”
    • Imperative: how to do something
    • Manipulate (global) state
    • Fixed sequence of steps
    • Low-level abstractions: everything is a byte
    => Not well suited to highly distributed systems
    Image: https://it.emcelettronica.com/files/node_images/ansic_programmazione_13.jpg


  5. Abstractions Reduce the Mental Load
    I will show you a sequence of 10 digits for a short moment.
    See if you can memorize them.
    Image: https://workplacenigeria.files.wordpress.com/2015/06/stress.jpg?w=240


  6. Abstractions Reduce the Mental Load
    1512251830


  7. Abstractions Reduce the Mental Load


  8. Abstractions Reduce the Mental Load
    • It is said we can handle 5 +/- 2 cognitive units
    • But we can trick our brain, by making the cognitive units larger ;)
    • Christmas Day, 2015, half past six
    • 15-12-25 18:30
    • Juggling the 10 digits without abstractions is the equivalent of how we develop software most of the time
    • Manually managing memory, loop counters, locks, etc.
    • “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” – Brian Kernighan


  9. “Functional Model”
    A couple of weeks ago Yannis talked about “first-class functions”.
    There is another side to the “Functional Model”.
    Unfortunately I don’t have a better name for it.
    1. Pure functions vs. Effects
    2. Immutable data
    3. Streams (generators, iterators)


  10. Pure Functions
    • f: a -> b
    • Takes an “a” and returns a “b”
    • Does not access global state and has no side-effects
    • Function invocation can be substituted with the function body
    • Can be used in an expression
    • Can be “memoized”
    • Is idempotent (repeated calls with the same input give the same result)
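
As a rough illustration of the "pure" and "memoized" properties listed above, here is a minimal Python sketch. The function name `slow_square` is made up for this example; `functools.lru_cache` is the standard-library memoizer. Caching is only safe here because the result depends on nothing but the argument.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def slow_square(x: int) -> int:
    # Pure: no global state is read or written; the result depends only on x.
    return x * x


first = slow_square(12)   # body runs, result is stored in the cache
second = slow_square(12)  # same argument, so the cached result is returned
info = slow_square.cache_info()  # hit/miss statistics kept by lru_cache
```

If `slow_square` read a global variable or the clock, the cached value could silently go stale, which is why memoization is usually reserved for pure functions.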


  11. Pure
    • stateless
    • no sequence, no time
    • non-strict
    • x = 1+4 (equality)
    • “x” can be substituted by the expression (referential transparency)
    • idempotent
    • expressions, algebra
    Effects
    • stateful
    • fixed sequence, time
    • strict
    • x := x + 1 (assignment)
    • “x” = changeable memory “slot”
    Pure functions by themselves are useless.
    We want to interact with storage, network, screen etc.
    We need both pure functions and (controlled, contained) effects
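
The contrast between equality and assignment can be sketched in Python as follows. The names `add` and `Counter` are hypothetical and chosen only for this illustration: calling the pure function twice is interchangeable with reusing its value, while calling the stateful method twice is not.

```python
def add(a: int, b: int) -> int:
    # Pure: the same inputs always give the same output.
    return a + b


class Counter:
    """Effectful: every call changes a hidden memory slot."""

    def __init__(self) -> None:
        self.value = 0

    def increment(self) -> int:
        self.value += 1  # assignment, in the sense of x := x + 1
        return self.value


x = add(1, 4)
# Referential transparency: add(1, 4) can be replaced by x anywhere.
pure_twice = (add(1, 4), add(1, 4))

counter = Counter()
# The number and order of calls matter, so no such substitution is possible.
effect_twice = (counter.increment(), counter.increment())
```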


  12. Immutable State
    append([1, 2, 3], 4) => [1, 2, 3, 4]
    • [1, 2, 3] remains unchanged
    • Inherently thread-safe
    • Can be shared freely
    • “Everything is atomic”
    Image: https://pbs.twimg.com/media/CotjQGDWIAAgpx2.jpg
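
A minimal Python sketch of the `append` example from this slide, assuming tuples as the immutable sequence type (the slide does not name a language or data structure):

```python
from typing import Tuple


def append(xs: Tuple[int, ...], x: int) -> Tuple[int, ...]:
    # Build and return a new tuple; the input tuple is never modified.
    return xs + (x,)


original = (1, 2, 3)
extended = append(original, 4)
```

Because `original` cannot change after creation, it can be handed to any number of threads without locks.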


  13. Effect-Land (diagram)
    • Pure f: a -> b, between IO a and IO b
    • Pure g: b -> c, between IO b and IO c
    • Pure h: (b, c) -> d, producing IO d


  14. Streams (Generators, Iterators)
    Declarative:
    xs = [1, 2, 3];
    return xs.map(x => x+1);
    Imperative:
    xs = [1, 2, 3];
    res = [];
    for (i = 0; i < xs.length; i++) {
      res.push(xs[i] + 1);
    }
    return res;
    Which do you think is easier to parallelize?
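
One way to see why the declarative form parallelizes more easily: a map over a pure function has no shared loop counter or result list, so the elements can be handed to workers independently. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor`; the name `increment` and the worker count are arbitrary choices for this example. Threads in CPython will not speed up CPU-bound work like this, so treat it as an illustration of independence rather than of performance.

```python
from concurrent.futures import ThreadPoolExecutor


def increment(x: int) -> int:
    # Pure: each element can be processed independently, in any order.
    return x + 1


xs = [1, 2, 3]

# Sequential map, equivalent to xs.map(x => x+1) on the slide.
sequential = [increment(x) for x in xs]

with ThreadPoolExecutor(max_workers=3) as pool:
    # Executor.map yields results in input order, whatever the completion order.
    parallel = list(pool.map(increment, xs))
```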


  15. Stream Fusion
    xs
    .map(x => x+1)
    .map(y => y*2)
    Iff functions are pure, we can
    • combine
    • reorder
    • optimize the entire chain
    If function application is lazy, we can optimize across functions as well, fusing the chain into a single pass:
    xs
    .map(x => (x+1)*2)
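
The fusion step can be written out by hand in Python as a sketch. The helper `compose` and the function names are assumptions made for this example; a real stream library or query optimizer would perform the rewrite automatically.

```python
from typing import Callable


def compose(f: Callable[[int], int], g: Callable[[int], int]) -> Callable[[int], int]:
    # compose(f, g)(x) applies f first, then g.
    return lambda x: g(f(x))


def add_one(x: int) -> int:
    return x + 1


def double(y: int) -> int:
    return y * 2


xs = [1, 2, 3]

# Two passes: an intermediate list is built between the two maps.
two_passes = [double(y) for y in [add_one(x) for x in xs]]

# One pass: the two pure functions are fused into a single map.
fused = list(map(compose(add_one, double), xs))
```

The rewrite is only valid because neither function has side effects; if `add_one` logged or mutated something, changing how the calls interleave could change behavior.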


  16. • Idea: “Express structure of workloads in terms of abstract algebra”
    • Parallel transformations (pure functions)
    – lazy
    – create arbitrary operator graph
    – should be “pure”, idempotent
    • Actions (effects)
    – forcing transformations
    – collapsing the tree
    • Resilient Distributed Dataset (RDD)
    – created through transformations
    – immutable
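
The split between lazy transformations and forcing actions can be imitated in a few lines of plain Python. `LazyDataset` below is a hypothetical toy class written for this illustration, not part of Spark's API: transformations only record an operation and return a new immutable object, and nothing runs until the `collect` action is called. The `calls` list exists purely to make the laziness observable.

```python
from typing import Any, Callable, Iterable, List, Tuple

Op = Tuple[str, Callable[[Any], Any]]


class LazyDataset:
    """Toy stand-in for an immutable dataset with a recorded operator chain."""

    def __init__(self, source: Iterable[Any], ops: Tuple[Op, ...] = ()) -> None:
        self._source = tuple(source)
        self._ops = ops

    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        # Transformation: record the step, return a new dataset, run nothing.
        return LazyDataset(self._source, self._ops + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        return LazyDataset(self._source, self._ops + (("filter", pred),))

    def collect(self) -> List[Any]:
        # Action: force the recorded chain and materialize the result.
        items = iter(self._source)
        for kind, fn in self._ops:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)


calls: List[int] = []  # tracing only, to show when work actually happens


def traced_double(x: int) -> int:
    calls.append(x)
    return x * 2


base = LazyDataset([1, 2, 3, 4])
pipeline = base.map(traced_double).filter(lambda x: x > 4)
calls_before_action = len(calls)  # nothing has been evaluated yet
result = pipeline.collect()
```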


  17. Spark Example – Word Count
    • create RDD from HDFS text file lines
    • split lines into words, and collapse list-of-lists
    • create word tuples
    • sum word occurrences
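
The slide's code is not preserved in this transcript, so the following is only a local, plain-Python analogue of the four steps, with an in-memory list standing in for the HDFS file. In Spark's RDD API the corresponding operations are typically `textFile`, `flatMap`, `map`, and `reduceByKey`; the function name `word_count` and the sample lines are made up for this sketch.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Split lines into words and collapse the list-of-lists (flatMap).
    words = chain.from_iterable(line.split() for line in lines)
    # Create (word, 1) tuples (map).
    pairs = ((word, 1) for word in words)
    # Sum the occurrences per word (reduceByKey).
    counts: Dict[str, int] = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


lines = ["to be or not", "to be"]  # stand-in for the text file's lines
counts = word_count(lines)
```

The first two steps are lazy generators, so no intermediate list of words or tuples is ever built; only the final summation forces evaluation.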


  18. Operator Graph
    Image: http://bit.ly/2AZG0V1


  19. Takeaways
    • Parallelism and concurrency are here to stay!
    • Keep your functions pure
    • Keep your data immutable
    • Contain your state and effects as much as possible
    Image: http://www.ibuycarz.com/upload/1/71/171ea33ce2205d2f.jpg
