How to avoid safety hazards when using closures in Scala

Philipp Haller
September 23, 2022

Transcript

  1. How to avoid safety hazards when using closures in Scala
    Philipp Haller
    Associate Professor
    EECS School and Digital Futures
    KTH Royal Institute of Technology
    Stockholm, Sweden
    Strange Loop 2022
    September 23rd, 2022
    Union Station Hotel, St. Louis, Missouri, USA

  2. Philipp Haller: Background
    • Since 2018 Associate Professor at KTH Royal Inst. of Tech.
      – 2014–2018 Assistant Professor at KTH
    • 2005–2014 Scala language team
      – PhD 2010 EPFL, Switzerland
      – 2012–2014 Typesafe, Inc. (now Lightbend, Inc.)
    • Focus on concurrent and distributed programming
      – Creator of Scala Actors, co-author of Scala’s futures and Scala Async
    • 2019: ACM SIGPLAN Programming Languages Software Award for Scala
      – Core contributors: Martin Odersky, Adriaan Moors, Aleksandar Prokopec, Heather Miller, Iulian Dragos, Nada Amin, Philipp Haller, Sebastien Doeraene, Tiark Rompf

  3. Outline
    • Closures are essential
    • The many ways in which closure-using code can go wrong
      – And how to spot unsafe code
    • How to write safer closure-using code
    • Spores3: safer and more flexible closures for Scala 3
    • Implementing serialization in Spores3
    • Summary

  4. What’s a closure?
    • Important: we assume lexically scoped name binding
    • The anonymous function refers to local variable “threshold” in its lexical context
    • Closure = anonymous function “whose open bindings (free variables) have been closed by the lexical environment” (Peter J. Landin)

    val numbers = List(6, 3, 9, 2, 4)
    val threshold = 5
    val below =
      numbers.filter(num => num < threshold)
    assert(below == List(3, 2, 4))

  5. Closures are essential
    • Context: data processing engines like Apache Spark™

    val textFile = spark.read.textFile("README.md")
    textFile
      .map(line => line.split(" ").size)
      .reduce((a, b) => Math.max(a, b))

  6. Closures are essential
    • Context: concurrent programming in Java

    Future<Double> averageAgeAsync(List<Customer> customers) {
      Callable<Double> task = () -> {
        var averageAge = customers.stream()
          .mapToInt(Customer::getAge)
          .average().getAsDouble();
        return averageAge;
      };
      return executor.submit(task);
    }

  7. Outline
    • Closures are essential
    • The many ways in which closure-using code can go wrong
      – And how to spot unsafe code
    • How to write safer closure-using code
    • Spores3: safer and more flexible closures for Scala 3
    • Implementing serialization in Spores3
    • Summary

  8. Trouble in Paradise
    Using Apache Spark™:

    class SparkExample {
      val distData = sc.parallelize(Array(1, 2, 3, 4, 5))
      def transform(x: Int): Int = x + 1
      def test(): Unit = {
        val transformed = distData.map(elem => transform(elem))
        transformed.collect().foreach(elem => println(elem))
      }
    }

    Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    ...

  9. Trouble in Paradise
    class SparkExample {
      val distData = sc.parallelize(Array(1, 2, 3, 4, 5))
      def transform(x: Int): Int = x + 1
      def test(): Unit = {
        val transformed = distData.map(elem => transform(elem))
        transformed.collect().foreach(elem => println(elem))
      }
    }

    • distData is a distributed data set
      → each remote worker has a piece of the data
    • map sends its argument closure to each worker
      → argument closure must be serialized
    • The argument closure uses transform from the enclosing scope

  10. Trouble in Paradise
    class SparkExample {
      val distData = sc.parallelize(Array(1, 2, 3, 4, 5))
      def transform(x: Int): Int = x + 1
      def test(): Unit = {
        val transformed = distData.map(elem => this.transform(elem))
        transformed.collect().foreach(elem => println(elem))
      }
    }

    • distData is a distributed data set
      → each remote worker has a piece of the data
    • map sends its argument closure to each worker
      → argument closure must be serialized
    • Actually: the closure captures this!
      → this must be serialized when the closure is shipped to the remote workers
      → the type of this is SparkExample, which is not serializable, hence...

    Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    ...
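
The effect described on this slide can be reproduced without Spark. The following is a minimal sketch under assumptions: plain Java serialization stands in for shipping a closure to a worker, and the names `Pipeline`, `canSerialize`, `capturingThis`, and `capturingLocal` are hypothetical and not part of the talk.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Returns true if Java serialization of `obj` succeeds (assumed helper).
def canSerialize(obj: AnyRef): Boolean =
  try {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    out.writeObject(obj)
    out.close()
    true
  } catch {
    case _: NotSerializableException => false
  }

// Deliberately not Serializable, like SparkExample on the slide.
class Pipeline {
  def transform(x: Int): Int = x + 1

  // Calls this.transform, so the closure captures `this`.
  def capturingThis: Int => Int = elem => transform(elem)

  // Puts the logic into a local function value first; the returned
  // closure refers only to `f`, not to the enclosing instance.
  def capturingLocal: Int => Int = {
    val f: Int => Int = x => x + 1
    elem => f(elem)
  }
}
```

One common workaround, shown in `capturingLocal`, is to keep the closure from referring to members of the enclosing class at all. Moving `transform` into a top-level object is another option.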

  11. Problematic uses of closures
    • Using closures in distributed settings is a safety risk
      – Example: serializing closures can result in runtime errors (e.g., java.io.NotSerializableException on the JVM)
    • What about concurrency?

  12. Closures and concurrency
    • Let's revisit our earlier example in Java!

    Future<Double> averageAgeAsync(List<Customer> customers) {
      Callable<Double> task = () -> {
        var averageAge = customers.stream()
          .mapToInt(Customer::getAge)
          .average().getAsDouble();
        return averageAge;
      };
      return executor.submit(task);
    }

    • The closure accesses customers, which might be mutated concurrently
      → Possible data race!

  14. Problematic uses of closures
    • Using closures in distributed settings is a safety risk
      – Example: serializing closures can result in runtime errors (e.g., java.io.NotSerializableException on the JVM)
    • Using closures in concurrent settings is a safety risk
      – Example: running a closure on a concurrent thread could cause a data race if a captured variable refers to a shared mutable object
    • Anything else?
      – Say, you want to send a closure from a frontend running on a JavaScript engine to a backend running on a JVM
        • This requires a portable serialization scheme!

  15. Observations
    • Safety issues stem from unrestricted variable capture
      – Concurrency: capturing and accessing shared mutable objects
      – Distribution: capturing references to non-serializable objects
    • Potential remedies:
      – Restrict the types of captured variables
        • For example, permit only types known to be serializable
      – Provide more capturing modes
        • For example, deeply clone mutable objects upon capture

  16. How to spot unsafe code using closures?
    • Key: the environment of the closure, i.e., its captured variables
    • Some closure code smells:
      – Capturing vars
        • Re-assigned within closure body?
        • In Java, captured variables must be final or effectively final
      – Potentially unsafe types of captured variables
        • Mutable types
        • Types that don’t mesh well with distribution
          – Not serializable, not accessible remotely
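
The "capturing vars" smell can be illustrated with a small sketch. The function names `capturedVarDemo` and `capturedValDemo` are made up for this illustration. The point is that a Scala closure sees later re-assignments of a captured `var`, whereas capturing an immutable copy pins the value at creation time.

```scala
// The closure captures the local `var threshold` itself, not its value.
def capturedVarDemo(): (Boolean, Boolean) = {
  var threshold = 5
  val below: Int => Boolean = num => num < threshold
  val before = below(7)  // evaluated while threshold is 5
  threshold = 10         // the re-assignment is visible inside the closure
  val after = below(7)   // evaluated while threshold is 10
  (before, after)
}

// Copying the var into a val first fixes the value the closure sees.
def capturedValDemo(): (Boolean, Boolean) = {
  var threshold = 5
  val snapshot = threshold
  val below: Int => Boolean = num => num < snapshot
  val before = below(7)
  threshold = 10         // no effect on `below`
  val after = below(7)
  (before, after)
}
```

In the first function the same call `below(7)` changes its answer after the re-assignment; in the second it does not.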

  17. Outline
    • Closures are essential
    • The many ways in which closure-using code can go wrong
      – And how to spot unsafe code
    • How to write safer closure-using code
    • Spores3: safer and more flexible closures for Scala 3
    • Implementing serialization in Spores3
    • Summary

  18. How to write safer closure-using code?
    • Basics first: need to be able to spot the captured variables
      – Closures should not be too big
      – Closures should not capture too many variables
        • 2–4 variables OK, > 7 variables probably not
    • Types of captured variables must be safe
      – Prefer (deeply) immutable types
      – Required properties? Serializability? Concurrency safety?

  19. How to write safer closure-using code? (cont’d)
    • Verify creation of closure
      – What’s the logical snapshot of the memory that the closure should be initialized with?
      – Example:
        • Is it sufficient to initialize the closure’s environment with a copy of a reference?
        • Or should mutable objects be cloned first?
    • Verify semantics of closure’s execution
      – Mutation of environment? Transactional semantics?
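
The snapshot question can be made concrete with a short sketch. The names `Customer`, `averageAgeByReference`, and `averageAgeBySnapshot` are assumptions chosen for this illustration. Because `Customer` is immutable here, copying the buffer into an immutable `List` is enough; a collection of mutable elements would need a deep copy instead.

```scala
import scala.collection.mutable.ArrayBuffer

final case class Customer(name: String, age: Int)

// The environment holds a copy of the reference: mutations made to the
// buffer after the closure is created are visible when it runs.
def averageAgeByReference(customers: ArrayBuffer[Customer]): () => Double =
  () => customers.map(_.age).sum.toDouble / customers.size

// The environment holds an immutable snapshot taken at creation time.
def averageAgeBySnapshot(customers: ArrayBuffer[Customer]): () => Double = {
  val snapshot = customers.toList
  () => snapshot.map(_.age).sum.toDouble / snapshot.size
}
```

If a third customer is appended to the buffer after both closures have been created, only the by-reference variant includes that customer in its average.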

  20. Outline
    • Closures are essential
    • The many ways in which closure-using code can go wrong
      – And how to spot unsafe code
    • How to write safer closure-using code
    • Spores3: safer and more flexible closures for Scala 3
    • Implementing serialization in Spores3
    • Summary

  21. Spores3
    Goals:
      An abstraction that makes closures safer and more flexible
    Requirements:
      – Enable constraining the environment (the captured variables) using types
      – Support serialization based on type classes
      – Enable a portable implementation, including serialization
      – Minimize the use of macros

  22. Idea: Spores
    • Introduce an abstraction, called “spore” [1], which can be seen as a special kind of closure
    • Spores:
      – have an explicit environment
      – track the type of their environment using a type refinement, enabling type-based constraints
      – enable operations on their environment, for example, for serialization and duplication/cloning

    [1] Miller, Haller, and Odersky. Spores: a type-based foundation for closures in the age of concurrency and distribution. ECOOP 2014 (Google Scholar: 47 citations)

  23. New: Spores3
    • A completely new implementation of spores for Scala 3
    • Addresses limitations of original spores for Scala 2:
      – Macro usage by Spores3 is simple and robust
        • Essentially limited to compile-time checks
      – Spores3 is portable from the beginning
    • Proposes a novel approach to serialization based on type classes
      – Flexible, portable and safe

  24. Overview
    • A simple spore without environment:

      val s = Spore((x: Int) => x + 2)

      – The function literal is not permitted to capture anything!
    • The above spore has the following type:

      Spore[Int, Int] { type Env = Nothing }

    • Spore types are subtypes of corresponding function types:

      sealed trait Spore[-T, +R] extends (T => R) {
        type Env
      }

  25. Spores with environments
    • The environment of a spore is initialized explicitly:

      val str = "anonymous function"
      val s2 = Spore(str) {
        env => (x: Int) => x + env.length
      }

      – The environment is initialized with the argument str
      – The environment is accessed using the extra parameter env
    • The above spore s2 has type:

      Spore[Int, Int] { type Env = String }

  26. Why use an extra parameter for the environment?
    • User code needs one more parameter…
    • Yes, but it enables the use of pattern matching (instead of, say, env._1, env._2, …):

      val s = "anonymous function"
      val i = 5
      Spore((s, i)) {
        case (str, num) => (x: Int) => x + str.length - num
      }
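
To make the explicit-environment idea tangible, here is a simplified model. It is a sketch under assumptions, not the Spores3 implementation: `MiniSpore` is a hypothetical name, and unlike Spores3 this toy does not check at compile time that the function literal captures nothing. It only shows how an `Env` type member and an extra environment parameter fit together.

```scala
// Hypothetical, simplified model of a spore (not the library's code).
sealed trait MiniSpore[-T, +R] extends (T => R) {
  type Env
}

object MiniSpore {
  // The body receives the environment as an explicit first parameter.
  def apply[E, T, R](initEnv: E)(body: E => T => R): MiniSpore[T, R] { type Env = E } =
    new MiniSpore[T, R] {
      type Env = E
      private val fun: T => R = body(initEnv)
      def apply(x: T): R = fun(x)
    }
}

val str = "anonymous function"

// The environment type String shows up in the refinement.
val withEnv: MiniSpore[Int, Int] { type Env = String } =
  MiniSpore(str) { env => (x: Int) => x + env.length }

// A tuple environment can be taken apart by pattern matching.
val withTuple: MiniSpore[Int, Int] { type Env = (String, Int) } =
  MiniSpore((str, 5)) { case (s, num) => (x: Int) => x + s.length - num }
```

Because `MiniSpore` extends the function type, both values can be applied like ordinary functions, for example `withEnv(2)`.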

  27. Type-based constraints
    • The Env type member of the Spore trait enables expressing type-based constraints on the spore's environment using context parameters
    • Example: require a spore parameter to only capture thread-safe types:

      /* Run spore `s` concurrently, immediately returning a future
       * which is eventually completed with the result of type `T`.
       */
      def future[T](s: Spore[Unit, T])(using ThreadSafe[s.Env]): Future[T] =
        ...

    • Thread-safe types are types for which instances of type class ThreadSafe exist
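
The constraint above can be tried out with a self-contained toy. Everything here is an assumption made for illustration: `MiniSpore` is a hypothetical stand-in for `Spore`, the `ThreadSafe` instances for `Int` and `String` are chosen arbitrarily, and the body of `future` simply runs the spore on the global execution context.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.*

// Hypothetical stand-in for Spore with an environment type member.
trait MiniSpore[-T, +R] extends (T => R) { type Env }

object MiniSpore {
  def apply[E, T, R](initEnv: E)(body: E => T => R): MiniSpore[T, R] { type Env = E } =
    new MiniSpore[T, R] {
      type Env = E
      def apply(x: T): R = body(initEnv)(x)
    }
}

// Marker type class: an instance exists only for types deemed thread-safe.
trait ThreadSafe[A]
object ThreadSafe {
  given intIsThreadSafe: ThreadSafe[Int] = new ThreadSafe[Int] {}
  given stringIsThreadSafe: ThreadSafe[String] = new ThreadSafe[String] {}
}

// Accepts only spores whose environment type has a ThreadSafe instance.
def future[T](s: MiniSpore[Unit, T])(using ThreadSafe[s.Env]): Future[T] =
  Future(s(()))(ExecutionContext.global)

val safe: MiniSpore[Unit, Int] { type Env = String } =
  MiniSpore("hello") { env => (_: Unit) => env.length }

val unsafe: MiniSpore[Unit, Int] { type Env = ArrayBuffer[Int] } =
  MiniSpore(ArrayBuffer(1, 2, 3)) { env => (_: Unit) => env.sum }

def runSafe(): Int = Await.result(future(safe), 5.seconds)
// future(unsafe) is rejected at compile time:
// there is no given ThreadSafe[ArrayBuffer[Int]].
```

The check happens entirely at compile time, through the dependent type `s.Env` in the using clause.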

  28. Serialization
    • One of the design goals for Spores3 is to support serialization based on type classes/contextual abstractions
      – Flexibility: enable integration with different serialization frameworks (uPickle, Java serialization, Kryo, Jackson, etc.)
      – Portability: support multiple backends/runtime environments
      – Safety: serializability is determined statically
    • Assumptions:
      – Serialization is primarily used for communication between remote nodes
      – Every node is running the same code
      – No transmission of byte code or source code

  29. Serialization in Spores3: Approach
    • Instead of serializing the code of a spore, what's serialized is
      – a unique identifier that enables instantiating the implementation of the spore; and
      – the spore's environment.
    • In practice:
      – Create spore using a named spore builder
      – Spore builder identifies the spore's implementation
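
The "identifier plus environment" approach can be sketched without any serialization library. This is an illustration under assumptions, not how Spores3 is implemented: the names `NamedBuilder`, `Packed`, `registry`, `pack`, and `unpack` are hypothetical, the environment is encoded as a plain string, and an explicit lookup table replaces whatever mechanism the library uses to locate a builder by name.

```scala
// Hypothetical builder: knows its own identifier and how to rebuild the
// function from a serialized environment.
trait NamedBuilder {
  def name: String
  def create(env: String): List[Int] => List[Int]
}

object Prepend extends NamedBuilder {
  val name = "example.Prepend"
  def create(env: String): List[Int] => List[Int] = {
    val n = env.toInt
    xs => n :: xs
  }
}

// Every node runs the same code, so every node has the same table.
val registry: Map[String, NamedBuilder] = Map(Prepend.name -> Prepend)

// What goes over the wire: an identifier and the environment, no code.
final case class Packed(builderName: String, env: String)

def pack(builder: NamedBuilder, env: Int): Packed =
  Packed(builder.name, env.toString)

def unpack(p: Packed): List[Int] => List[Int] =
  registry(p.builderName).create(p.env)
```

The receiving side only needs the two strings in `Packed` to obtain a working function, which is why the assumption that every node runs the same code matters.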

  30. Serializing spores: Example
    • Step 1: define spore using spore builder:

      object Prepend extends
        Spore.Builder[Int, List[Int], List[Int]](
          env => (xs: List[Int]) => env :: xs
        )

      – The spore prepends its environment to the list parameter
    • Step 2: create serializable representation of spore:

      val num: Int = ...
      val data = SporeData(Prepend, Some(num))

      – Some(num) is the environment

  31. Serializing spores: Example (cont'd)
    • Step 3: pickle SporeData (here, using uPickle):

      import upickle.default.*
      import com.phaller.spores.upickle.given
      val data = SporeData(Prepend, Some(num))
      val pickled = write(data)

    • Output (JSON):

      ["com.example.Prepend",1,""]

      – 1 = non-empty environment

  32. Enforcing safe serialization
    • Example: A method sendOff that serializes a spore and sends it across the network to a remote executor
    • Solution:

      def sendOff[N, S <: SporeData[T, T] { type Env = N }]
          (sporeData: S)(using ReadWriter[S]): Unit = {
        ...
      }

  33. Enforcing safe serialization
    • Example: A method sendOff that serializes a spore and sends it across the network to a remote executor
    • Solution:

      def sendOff[N, S <: SporeData[T, T] { type Env = N } : ReadWriter]
          (sporeData: S): Unit = {
        ...
      }

      – ReadWriter is now expressed as a context bound on S
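
The two `sendOff` signatures differ only in how the type class requirement is written. The sketch below shows the same equivalence on a deliberately tiny example; `Writer`, `intWriter`, `sendWithUsing`, and `sendWithBound` are hypothetical names, and `Writer` merely stands in for a serialization type class such as uPickle's `ReadWriter`.

```scala
// Hypothetical serialization type class.
trait Writer[A] {
  def write(a: A): String
}

given intWriter: Writer[Int] = new Writer[Int] {
  def write(a: Int): String = a.toString
}

// Requirement written as an explicit using clause.
def sendWithUsing[S](data: S)(using w: Writer[S]): String =
  w.write(data)

// The same requirement written as a context bound on S.
def sendWithBound[S: Writer](data: S): String =
  summon[Writer[S]].write(data)
```

Either way, calling the method with a type that has no `Writer` instance is a compile-time error, which is the static safety property the slide is after.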

  34. Deserializing spores
    • Step 1: read pickled data with target type PackedSporeData:

      val unpickledData = read[PackedSporeData](pickled)

      – Note: PackedSporeData abstracts from type of environment!
    • Step 2: convert PackedSporeData to spore:

      val unpickledSpore = unpickledData.toSpore[List[Int], List[Int]]

  35. Outline
    • Closures are essential
    • The many ways in which closure-using code can go wrong
      – And how to spot unsafe code
    • How to write safer closure-using code
    • Spores3: safer and more flexible closures for Scala 3
    • Implementing serialization in Spores3
    • Summary

  36. Implementing serialization
    • Recall creation of spore builder:

      object Prepend extends Spore.Builder[Int, List[Int], List[Int]](
        env => (xs: List[Int]) => env :: xs
      )

    • Environment serializer + deserializer obtained when builder is constructed:

      class Builder[E, T, R](body: E => T => R)
          (using ReadWriter[E]) extends TypedBuilder[E, T, R]:

        def createSpore(envOpt: Option[String]): Spore[T, R] =
          val initEnv = read[E](envOpt.get)
          apply(initEnv)

  38. Implementing serialization
    • Recall creation of spore builder:

      object Prepend extends Spore.Builder[Int, List[Int], List[Int]](
        env => (xs: List[Int]) => env :: xs
      )

    • Environment serializer + deserializer obtained when builder is constructed:

      class Builder[E, T, R](body: E => T => R)
          (using ReadWriter[E]) extends TypedBuilder[E, T, R]:

        def createSpore(envOpt: Option[String]): Spore[T, R] =
          val initEnv = read[E](envOpt.get)  // deserialize environment
          apply(initEnv)
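
The key point of this slide is that the builder obtains the environment's serializer once, at construction, and reuses it later. Here is a minimal sketch of that pattern under assumptions: `Codec` is a hypothetical string codec standing in for uPickle's `ReadWriter`, `pickleEnv` is an invented helper, and `createSpore` returns a plain function rather than a `Spore`.

```scala
// Hypothetical codec type class standing in for ReadWriter.
trait Codec[A] {
  def write(a: A): String
  def read(s: String): A
}

given intCodec: Codec[Int] = new Codec[Int] {
  def write(a: Int): String = a.toString
  def read(s: String): Int = s.toInt
}

// The given Codec[E] is resolved where the builder is defined and is
// kept by the builder for later use.
class Builder[E, T, R](body: E => T => R)(using codec: Codec[E]) {
  def pickleEnv(env: E): String = codec.write(env)

  // Like the slide's version, this assumes the environment is present.
  def createSpore(envOpt: Option[String]): T => R = {
    val initEnv = codec.read(envOpt.get) // deserialize environment
    body(initEnv)
  }
}

object Prepend extends Builder[Int, List[Int], List[Int]](
  env => (xs: List[Int]) => env :: xs
)
```

Defining `Prepend` for an environment type without a `Codec` instance would fail to compile, so a builder that exists is always able to read its own environment back.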

  39. Implementing serialization (2)
    • Recall step 2: creating serializable form of a spore:

      val data = SporeData(Prepend, Some(num))
      val pickled = write(data)

    • SporeData factory uses a macro to:
      – check that argument builder is a top-level object
      – obtain fully-qualified name of builder object
    • Serialization of SporeData instance consists of:
      – fully-qualified name of builder object
      – serialized environment

  40. Spores3: Implementation status
    • Open source implementation (Apache License 2.0)
      – GitHub repository: https://github.com/phaller/spores3
    • Supports Scala/JVM and Scala.js
      – Scala Native planned
    • Out-of-the-box integration with uPickle
      – “lightweight JSON and binary (MessagePack) serialization library for Scala”
    • Integration with other serialization libraries planned

  41. Outline
    • Closures are essential
    • The many ways in which closure-using code can go wrong
      – And how to spot unsafe code
    • How to write safer closure-using code
    • Spores3: safer and more flexible closures for Scala 3
    • Implementing serialization in Spores3
    • Summary

  42. Summary
    • Using closures in distributed and concurrent settings is a safety risk
    • How to write safer closure-using code:
      – Check captured variables, their types, and capturing semantics
    • Spores3: safer and more flexible closures for Scala 3
      – A completely new implementation of spores for Scala 3
      – Explicit environment, tracked using type refinement
      – Type-based environment constraints
      – Flexible, portable and safe serialization

    Thank You!
    @philippkhaller github.com/phaller/spores3