How to avoid safety hazards when using closures in Scala

Philipp Haller
October 05, 2022

Transcript

  1. How to avoid safety hazards when using closures in Scala
    Philipp Haller
    Associate Professor
    EECS School and Digital Futures
    KTH Royal Institute of Technology
    Stockholm, Sweden
    ScalaCon 2022
    October 5th, 2022

  2. Philipp Haller: Background
    • Since 2018 Associate Professor at KTH Royal Institute of Technology
      – 2014–2018 Assistant Professor at KTH
    • 2005–2014 Scala language team
      – PhD 2010 EPFL, Switzerland
      – 2012–2014 Typesafe, Inc. (now Lightbend, Inc.)
    • Work on concurrent and distributed programming
      – Creator of Scala Actors, co-author of Scala’s futures and Scala Async
    • Work on type systems
      – Research on capabilities, affine types, consistency types, …

  3. Outline
    • Closures are essential
    • The many ways in which closure-using code can go wrong
      – And how to write safer closure-using code
    • Spores3: safer and more flexible closures for Scala 3
    • Implementing Spores3
    • Summary

  4. What’s a closure?
    • The anonymous function refers to local variable “threshold” in its lexical context
    • Closure = anonymous function “whose open bindings (free variables) have been closed by the lexical environment” (Peter J. Landin)
        val numbers = List(6, 3, 9, 2, 4)
        val threshold = 5
        val below =
          numbers.filter(num => num < threshold)
        assert(below == List(3, 2, 4))

  5. Closures are essential
    • Context: data processing engines like Apache Spark™
        val textFile = spark.read.textFile("README.md")
        textFile
          .map(line => line.split(" ").size)
          .reduce((a, b) => Math.max(a, b))

  6. Closures are essential
    • Context: concurrent programming with futures
        def averageAge(customers: List[Customer]): Future[Float] =
          Future {
            val infos = customers.flatMap { c =>
              customerData.get(c.customerNo) match
                case Some(info) => List(info)
                case None => List()
            }
            val sumAges = infos.foldLeft(0)(_ + _.age).toFloat
            if (infos.nonEmpty) sumAges / infos.size else 0.0f
          }

  7. Outline: The many ways in which closure-using code can go wrong (and how to write safer closure-using code)

  8. Trouble in Paradise
    Using Apache Spark™:
        class SparkExample {
          val distData = sc.parallelize(Array(1, 2, 3, 4, 5))
          def transform(x: Int): Int = x + 1
          def test(): Unit = {
            val transformed = distData.map(elem => transform(elem))
            transformed.collect().foreach(elem => println(elem))
          }
        }

        Exception in thread "main" org.apache.spark.SparkException: Task not serializable
        ...

  9. Trouble in Paradise
        class SparkExample {
          val distData = sc.parallelize(Array(1, 2, 3, 4, 5))
          def transform(x: Int): Int = x + 1
          def test(): Unit = {
            // Uses transform from enclosing scope
            val transformed = distData.map(elem => transform(elem))
            transformed.collect().foreach(elem => println(elem))
          }
        }
    • distData is a distributed data set → each remote worker has a piece of the data
    • map sends its argument closure to each worker → argument closure must be serialized

  10. Trouble in Paradise
        class SparkExample {
          val distData = sc.parallelize(Array(1, 2, 3, 4, 5))
          def transform(x: Int): Int = x + 1
          def test(): Unit = {
            // Actually: closure captures this!
            val transformed = distData.map(elem => this.transform(elem))
            transformed.collect().foreach(elem => println(elem))
          }
        }
    • distData is a distributed data set → each remote worker has a piece of the data
    • map sends its argument closure to each worker → argument closure must be serialized
    • this must be serialized when the closure is shipped to the remote workers
    • The type of this is SparkExample, which is not serializable, hence...
        Exception in thread "main" org.apache.spark.SparkException: Task not serializable
        ...
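
The capture of this can be reproduced without Spark. The following is a minimal sketch, assuming Scala 3 on the JVM, where function literals compile to serializable lambdas. The names Example, capturesThis, capturesLocal and canSerialize are hypothetical and do not come from the talk. Plain Java serialization stands in for what Spark does when it ships a task.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for SparkExample; deliberately NOT Serializable.
class Example {
  def transform(x: Int): Int = x + 1

  // Calls an instance method, so the closure captures `this`.
  def capturesThis: Int => Int = elem => transform(elem)

  // Copies what it needs into a local value; only an Int is captured.
  def capturesLocal: Int => Int = {
    val increment = 1
    elem => elem + increment
  }
}

// Returns true if Java serialization accepts the object graph rooted at `obj`.
def canSerialize(obj: AnyRef): Boolean =
  try {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    out.writeObject(obj)
    out.close()
    true
  } catch {
    case _: NotSerializableException => false
  }
```

Under these assumptions, canSerialize(new Example().capturesThis) is false because the non-serializable Example instance is part of the closure, while the closure returned by capturesLocal serializes. Copying the needed values into locals, or moving transform into a top-level object, are the usual manual workarounds.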

  11. Problematic uses of closures
    • Using closures in distributed settings is a safety risk
      – Example: serializing closures can result in runtime errors (e.g., java.io.NotSerializableException on the JVM)
    • What about concurrency?

  12. Example: Concurrency
        val customerData: mutable.Map[Int, CustomerInfo] = ...
        def averageAge(customers: List[Customer]): Future[Float] =
          Future {
            val infos = customers.flatMap { c =>
              // Possible data race!
              customerData.get(c.customerNo) match
                case Some(info) => List(info)
                case None => List()
            }
            val sumAges = infos.foldLeft(0)(_ + _.age).toFloat
            if (infos.nonEmpty) sumAges / infos.size else 0.0f
          }
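
One manual way to avoid this race is to hand the future an immutable snapshot instead of the shared mutable map. The following is a minimal sketch, not code from the talk. The Customer and CustomerInfo definitions and the sample data are assumptions, and the copy is only safe if it is taken on the thread that owns customerData.

```scala
import scala.collection.mutable
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical data model; the slides do not show these definitions.
final case class Customer(customerNo: Int)
final case class CustomerInfo(age: Int)

given ExecutionContext = ExecutionContext.global

val customerData: mutable.Map[Int, CustomerInfo] =
  mutable.Map(1 -> CustomerInfo(30), 2 -> CustomerInfo(40))

def averageAge(customers: List[Customer]): Future[Float] = {
  // Copy on the calling thread, before the future body starts running.
  val snapshot: Map[Int, CustomerInfo] = customerData.toMap
  Future {
    // The closure now captures only the immutable snapshot.
    val infos = customers.flatMap(c => snapshot.get(c.customerNo))
    val sumAges = infos.foldLeft(0)(_ + _.age).toFloat
    if (infos.nonEmpty) sumAges / infos.size else 0.0f
  }
}
```

The snapshot is shallow, which is sufficient here only because CustomerInfo is itself immutable. Nothing in the types enforces this discipline.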

  14. Problematic uses of closures
    • Using closures in distributed settings is a safety risk
      – Example: serializing closures can result in runtime errors (e.g., java.io.NotSerializableException on the JVM)
    • Using closures in concurrent settings is a safety risk
      – Example: running a closure on a concurrent thread could cause a data race if a captured variable refers to a shared mutable object
    • Anything else?
      – Say, you want to send a closure from a frontend running on a JavaScript engine to a backend running on a JVM
        • requires a portable serialization scheme!

  15. Observations
    • Safety issues stem from unrestricted variable capture
      – Concurrency: capturing and accessing shared mutable objects
      – Distribution: capturing references to non-serializable objects
    • Potential remedies:
      – Restricting types of captured variables
        • For example, permit only types known to be serializable
      – Provide more capturing modes
        • For example, deeply clone mutable objects upon capture

  16. How to write safer closure-using code?
    • Check the captured variables
      – Closures should not capture too many variables
        • 2-4 variables OK, > 7 variables probably not
    • Types of captured variables must be safe
      – Prefer immutable types
      – Check required properties
        • Serializability? Concurrency safety?
    • What’s the logical memory snapshot that the closure should be initialized with?
    Would be nice if the compiler could check this…

  17. Outline: Spores3: safer and more flexible closures for Scala 3

  18. Spores3: A Scala 3 library
    Goals:
    • Automate safety checking of closures
    • Add flexibility, e.g., portable serialization
    Main idea: “spores” [1], a special kind of closure that
      – has an explicit environment
      – tracks the type of its environment using type refinements
    [1] Miller, Haller, and Odersky. Spores: a type-based foundation for closures in the age of concurrency and distribution. ECOOP 2014 (Google Scholar: 47 citations)

  19. Spores3: What’s New?
    • A completely new implementation of spores for Scala 3
    • Addresses limitations of original spores for Scala 2:
      – Macro usage limited to compile-time checks
      – Portable from the beginning
    • Proposes a novel approach to serialization
      – Based on type classes
      – Flexible, portable and safe

  20. Overview
    • A simple spore without environment:
        // Function literal not permitted to capture anything!
        val s = Spore((x: Int) => x + 2)
    • The above spore has the following type:
        Spore[Int, Int] { type Env = Nothing }
    • Spore types are subtypes of corresponding function types:
        sealed trait Spore[-T, +R] extends (T => R) {
          type Env
        }

  21. Spores with environments
    • The environment of a spore is initialized explicitly:
        val str = "anonymous function"
        val s2 = Spore(str) {                  // environment initialized with argument str
          env => (x: Int) => x + env.length    // environment accessed using extra parameter
        }
    • The above spore s2 has type:
        Spore[Int, Int] { type Env = String }
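
To make the typing concrete, here is a simplified model of a spore with an explicit environment. It is a sketch under stated assumptions, not the Spores3 implementation. The trait is not sealed, the factory is an ordinary method rather than a macro, and so nothing in this model prevents the body from capturing variables. Only the shape of the types matches the slides.

```scala
// Simplified model of the Spore trait; the library constructs instances through
// a macro that rejects captured variables, which this model does not do.
trait Spore[-T, +R] extends (T => R) {
  type Env
}

object Spore {
  // Explicit environment: `body` receives `initEnv` as its first argument.
  def apply[E, T, R](initEnv: E)(body: E => T => R): Spore[T, R] { type Env = E } =
    new Spore[T, R] {
      type Env = E
      def apply(x: T): R = body(initEnv)(x)
    }
}

val str = "anonymous function"
val s2 = Spore(str) { env => (x: Int) => x + env.length }

// The refinement records the environment type, so this ascription type-checks.
val typed: Spore[Int, Int] { type Env = String } = s2

// A tuple environment can be destructured by pattern matching.
val s3 = Spore((str, 5)) { case (text, num) => (x: Int) => x + text.length - num }
```

Because Spore extends the function type, s2 and s3 can be passed anywhere an Int => Int is expected.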

  22. Why use an extra parameter for the environment?
    • User code needs one more parameter…
    • Yes, but it enables the use of pattern matching (instead of, say, env._1, env._2, …):
        val s = "anonymous function"
        val i = 5
        Spore((s, i)) {
          case (str, num) => (x: Int) => x + str.length - num
        }

  23. Type-based constraints
    • The Env type member of the Spore trait enables expressing type-based constraints on the spore's environment using context parameters
    • Example: require a spore parameter to only capture thread-safe types:
        /* Run spore `s` concurrently, immediately returning a future
         * which is eventually completed with the result of type `T`.
         */
        def future[T](s: Spore[Unit, T])(using ThreadSafe[s.Env]): Future[T] =
          ...
    • Thread-safe types are types for which instances of type class ThreadSafe exist
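
The constraint can be tried out with a small self-contained model. This is a sketch under assumptions: the Spore trait is a simplified stand-in without capture checking, mkSpore is a hypothetical helper, and the ThreadSafe instances for Int and String are illustrative choices, not the set provided by Spores3.

```scala
import scala.concurrent.{ExecutionContext, Future}

// Simplified stand-in for the Spore trait (no capture checking in this model).
trait Spore[-T, +R] extends (T => R) { type Env }

// Hypothetical helper that builds a Unit-argument spore over environment `env`.
def mkSpore[E, R](env: E)(body: E => R): Spore[Unit, R] { type Env = E } =
  new Spore[Unit, R] {
    type Env = E
    def apply(u: Unit): R = body(env)
  }

// Marker type class: an instance is evidence that values of `A` may be shared across threads.
trait ThreadSafe[A]
object ThreadSafe {
  given intIsThreadSafe: ThreadSafe[Int] = new ThreadSafe[Int] {}
  given stringIsThreadSafe: ThreadSafe[String] = new ThreadSafe[String] {}
}

given ExecutionContext = ExecutionContext.global

// Accepts a spore only if its environment type has a ThreadSafe instance.
def future[T](s: Spore[Unit, T])(using ThreadSafe[s.Env], ExecutionContext): Future[T] =
  Future(s(()))

val ok = mkSpore(20)(n => n + 1)
val result: Future[Int] = future(ok)
```

A spore built over a mutable.ArrayBuffer would be rejected at compile time in this model, because no ThreadSafe instance is in scope for that environment type.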

  24. Serialization
    • One of the design goals for Spores3 is to support serialization based on type classes
      – Flexibility: enable integration with different serialization frameworks (uPickle, Java serialization, Kryo, Jackson, etc.)
      – Portability: support multiple backends/runtime environments
      – Safety: serializability is determined statically
    • Assumptions:
      – Serialization is primarily used for communication between remote nodes
      – Every node is running the same code
      – No transmission of byte code or source code

  25. Serialization in Spores3: Approach
    • Instead of serializing the code of a spore, what's serialized is
      – a unique identifier that enables instantiating the implementation of the spore; and
      – the spore's environment.
    • In practice:
      – Create spore using a named spore builder
      – Spore builder identifies the spore's implementation
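
The identifier-plus-environment idea can be sketched without any serialization library. Everything below is a simplified model under assumptions: Codec is a hypothetical stand-in for a type class such as uPickle's ReadWriter, Builder is not the Spores3 builder, and the explicit registry replaces whatever lookup mechanism the library uses to find a builder by name.

```scala
// Hypothetical minimal codec type class for environments.
trait Codec[E] {
  def write(env: E): String
  def read(text: String): E
}

given intCodec: Codec[Int] = new Codec[Int] {
  def write(env: Int): String = env.toString
  def read(text: String): Int = text.toInt
}

// A named builder pairs a stable identifier with the spore's code.
class Builder[E, T, R](val name: String, body: E => T => R)(using codec: Codec[E]) {
  def build(env: E): T => R = body(env)
  // Only the identifier and the serialized environment go on the wire.
  def toWire(env: E): (String, String) = (name, codec.write(env))
  def fromWire(envText: String): T => R = build(codec.read(envText))
}

object Prepend extends Builder[Int, List[Int], List[Int]](
  "example.Prepend", env => (xs: List[Int]) => env :: xs
)

// Every node runs the same code, so every node can build the same registry.
val registry: Map[String, Builder[?, ?, ?]] = Map(Prepend.name -> Prepend)

// Sender side.
val wire: (String, String) = Prepend.toWire(1)

// Receiver side: look up the code by name, then rebuild the environment.
// The cast mirrors the fact that the receiver must state the expected function type.
val restored: List[Int] => List[Int] =
  registry(wire._1).fromWire(wire._2).asInstanceOf[List[Int] => List[Int]]
```

No code crosses the wire in this scheme, which is why it relies on the assumption that every node runs the same program.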

  26. Serializing spores: Example
    • Step 1: define spore using spore builder:
        // Prepend environment to list parameter
        object Prepend extends
            Spore.Builder[Int, List[Int], List[Int]](
          env => (xs: List[Int]) => env :: xs
        )
    • Step 2: create serializable representation of spore:
        val num: Int = ...
        val data = SporeData(Prepend, Some(num))  // Some(num): environment

  27. Serializing spores: Example (cont'd)
    • Step 3: pickle SporeData (here, using uPickle):
        import upickle.default.*
        import com.phaller.spores.upickle.given
        val data = SporeData(Prepend, Some(num))
        val pickled = write(data)
    • Output (JSON):
        ["com.example.Prepend",1,""]
      (1 = non-empty environment)

  28. Enforcing safe serialization
    • Example: A method sendOff that serializes a spore and sends it across the network to a remote executor
    • Solution:
        def sendOff[N, S ...](sporeData: S)(using ReadWriter[S]): Unit = {
          ...
        }

  29. Enforcing safe serialization
    • Example: A method sendOff that serializes a spore and sends it across the network to a remote executor
    • Solution:
        def sendOff[N, S ... : ReadWriter](sporeData: S): Unit = {  // context bound
          ...
        }
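
The two forms of sendOff differ only in how the type class evidence is requested. The following sketch shows the equivalence with a hypothetical type class Wire in place of uPickle's ReadWriter. The names Wire, sendOffUsing and sendOffBound are illustrative and not part of Spores3.

```scala
// Hypothetical type class standing in for a serializer such as ReadWriter.
trait Wire[S] {
  def encode(value: S): String
}

given intWire: Wire[Int] = new Wire[Int] {
  def encode(value: Int): String = value.toString
}

// Explicit using clause: the evidence is a named context parameter.
def sendOffUsing[S](data: S)(using w: Wire[S]): String =
  w.encode(data)

// Context bound: `S: Wire` asks for the same evidence, summoned on demand.
def sendOffBound[S: Wire](data: S): String =
  summon[Wire[S]].encode(data)
```

In both versions a call such as sendOffBound("text") fails to compile in this sketch, since no Wire[String] instance is defined. This is the sense in which serializability is checked statically.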

  30. Deserializing spores
    • Step 1: read pickled data with target type PackedSporeData:
        val unpickledData = read[PackedSporeData](pickled)
      – Note: PackedSporeData abstracts from type of environment!
    • Step 2: convert PackedSporeData to spore:
        val unpickledSpore = unpickledData.toSpore[List[Int], List[Int]]

  31. Outline: Implementing Spores3

  32. Implementation
    Spore creation:
      – check that body of spore does not capture any variable
      – use macro for (compile-time) capture checking
        object Spore {
          inline def apply[E, T, R](inline initEnv: E)
              (inline body: E => T => R):
              Spore[T, R] { type Env = E } =
            ${ applyCode('initEnv, 'body) }
        }

  33. Macro for capture checking
        private def applyCode[E, T, R]
            (envExpr: Expr[E], bodyExpr: Expr[E => T => R])
            (using Type[E], Type[T], Type[R], Quotes):
            Expr[Spore[T, R] { type Env = E }] = {
          // Check that bodyExpr does not capture anything
          checkBodyExpr(bodyExpr)
          '{
            new Spore[T, R] {
              type Env = E
              def apply(x: T): R = $bodyExpr($envExpr)(x)
              ...
            }
          }
        }

  34. Implementing serialization
    • Recall creation of spore builder:
        object Prepend extends Spore.Builder[Int, List[Int], List[Int]](
          env => (xs: List[Int]) => env :: xs
        )
    • Environment serializer + deserializer obtained when builder is constructed:
        class Builder[E, T, R](body: E => T => R)
            (using ReadWriter[E]) extends TypedBuilder[E, T, R]:
          def createSpore(envOpt: Option[String]): Spore[T, R] =
            val initEnv = read[E](envOpt.get)
            apply(initEnv)

  36. Implementing serialization
    • Recall creation of spore builder:
        object Prepend extends Spore.Builder[Int, List[Int], List[Int]](
          env => (xs: List[Int]) => env :: xs
        )
    • Environment serializer + deserializer obtained when builder is constructed:
        class Builder[E, T, R](body: E => T => R)
            (using ReadWriter[E]) extends TypedBuilder[E, T, R]:
          def createSpore(envOpt: Option[String]): Spore[T, R] =
            val initEnv = read[E](envOpt.get)  // deserialize environment
            apply(initEnv)

  37. Implementing serialization (2)
    • Recall step 2: creating serializable form of a spore:
        val data = SporeData(Prepend, Some(num))
        val pickled = write(data)
    • SporeData factory uses a macro to:
      – check that argument builder is a top-level object
      – obtain fully-qualified name of builder object
    • Serialization of SporeData instance consists of:
      – fully-qualified name of builder object
      – serialized environment

  38. Spores3: Implementation status
    • Open source implementation (Apache License 2.0)
      – Pre-release: "com.phaller" %% "spores3" % "0.1.0"
      – GitHub repository: https://github.com/phaller/spores3
    • Supports Scala/JVM and Scala.js
      – Scala Native planned
    • Out-of-the-box integration with uPickle
      – Integrations with other serialization libraries planned

  39. Outline: Summary

  40. Summary
    • Using closures in distributed and concurrent settings is a safety risk
    • Spores3: safer and more flexible closures for Scala 3
      – A completely new implementation of spores for Scala 3
      – Explicit environment, tracked using type refinement
      – Type-based environment constraints
      – Flexible, portable and safe serialization
    Thank You!
    @philippkhaller https://www.phaller.com
    https://github.com/phaller/spores3