
How to avoid safety hazards when using closures in Scala

Philipp Haller
October 05, 2022

Transcript

  1. How to avoid safety hazards when using closures in Scala

    Philipp Haller, Associate Professor, EECS School and Digital Futures, KTH Royal Institute of Technology, Stockholm, Sweden
    ScalaCon 2022, October 5th, 2022
  2. Philipp Haller: Background

    • Since 2018: Associate Professor at KTH Royal Institute of Technology
      – 2014–2018: Assistant Professor at KTH
    • 2005–2014: Scala language team
      – PhD 2010, EPFL, Switzerland
      – 2012–2014: Typesafe, Inc. (now Lightbend, Inc.)
    • Work on concurrent and distributed programming
      – Creator of Scala Actors, co-author of Scala’s futures and Scala Async
    • Work on type systems
      – Research on capabilities, affine types, consistency types, …
  3. Outline

    • Closures are essential
    • The many ways in which closure-using code can go wrong
      – And how to write safer closure-using code
    • Spores3: safer and more flexible closures for Scala 3
    • Implementing Spores3
    • Summary
  4. What’s a closure?

    • The anonymous function refers to the local variable “threshold” in its lexical context
    • Closure = anonymous function “whose open bindings (free variables) have been closed by the lexical environment” (Peter J. Landin)

        val numbers = List(6, 3, 9, 2, 4)
        val threshold = 5
        val below = numbers.filter(num => num < threshold)
        assert(below == List(3, 2, 4))
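A closure refers to the variable threshold itself, not to a copy of its value taken when the closure was created. The minimal sketch below makes this visible by turning threshold into a var; the var and the helper name filterBeforeAndAfter are illustrative additions, not part of the talk:

```scala
// Filters the same list twice with the same closure,
// updating the captured variable in between.
def filterBeforeAndAfter(): (List[Int], List[Int]) =
  val numbers = List(6, 3, 9, 2, 4)
  var threshold = 5
  // The closure refers to the variable `threshold`, not to its current value.
  val isBelow: Int => Boolean = num => num < threshold
  val before = numbers.filter(isBelow) // uses threshold = 5
  threshold = 7
  val after = numbers.filter(isBelow)  // sees the updated threshold = 7
  (before, after)
```

This late binding is what makes the set of captured variables a safety concern later on: a closure may run after, or concurrently with, updates to what it captured.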
  5. Closures are essential

    • Context: data processing engines like Apache Spark™

        val textFile = spark.read.textFile("README.md")
        textFile
          .map(line => line.split(" ").size)
          .reduce((a, b) => Math.max(a, b))
  6. Closures are essential

    • Context: concurrent programming with futures

        def averageAge(customers: List[Customer]): Future[Float] = Future {
          val infos = customers.flatMap { c =>
            customerData.get(c.customerNo) match
              case Some(info) => List(info)
              case None => List()
          }
          val sumAges = infos.foldLeft(0)(_ + _.age).toFloat
          if (infos.nonEmpty) sumAges / infos.size else 0.0f
        }
  8. Trouble in Paradise

    Using Apache Spark™:

        class SparkExample {
          val distData = sc.parallelize(Array(1, 2, 3, 4, 5))
          def transform(x: Int): Int = x + 1
          def test(): Unit = {
            val transformed = distData.map(elem => transform(elem))
            transformed.collect().foreach(elem => println(elem))
          }
        }

    Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    ...
  9. Trouble in Paradise

        class SparkExample {
          val distData = sc.parallelize(Array(1, 2, 3, 4, 5))
          def transform(x: Int): Int = x + 1
          def test(): Unit = {
            val transformed = distData.map(elem => transform(elem))
            transformed.collect().foreach(elem => println(elem))
          }
        }

    • distData is a distributed data set
      → each remote worker has a piece of the data
    • map sends its argument closure to each worker
      → the argument closure must be serialized

    The closure uses transform from the enclosing scope.
  10. Trouble in Paradise

        class SparkExample {
          val distData = sc.parallelize(Array(1, 2, 3, 4, 5))
          def transform(x: Int): Int = x + 1
          def test(): Unit = {
            val transformed = distData.map(elem => this.transform(elem))
            transformed.collect().foreach(elem => println(elem))
          }
        }

    • distData is a distributed data set
      → each remote worker has a piece of the data
    • map sends its argument closure to each worker
      → the argument closure must be serialized

    Actually, the closure captures this! Calling transform(elem) means this.transform(elem), so this must be serialized when the closure is shipped to the remote workers. The type of this is SparkExample, which is not serializable, hence:

    Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    ...
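The same failure can be reproduced without Spark, using plain Java serialization on the JVM. In this sketch, Example stands in for SparkExample, and isJavaSerializable and AddOne are hypothetical helpers. Serializing the capturing closure fails because doing so requires writing the enclosing Example instance, which is not serializable:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Returns true if Java serialization can write `obj`, false if some
// object reachable from it is not serializable.
def isJavaSerializable(obj: AnyRef): Boolean =
  val out = new ObjectOutputStream(new ByteArrayOutputStream())
  try
    out.writeObject(obj)
    true
  catch case _: NotSerializableException => false
  finally out.close()

// Stand-in for the slide's SparkExample (no Spark needed); it is not Serializable.
class Example:
  def transform(x: Int): Int = x + 1
  // Calls a method of the enclosing instance, so the closure captures `this`.
  def capturing: Int => Int = elem => transform(elem)

// One possible remedy: a self-contained function value that refers to
// nothing in an enclosing instance (case classes are Serializable).
final case class AddOne() extends (Int => Int):
  def apply(x: Int): Int = x + 1
```

AddOne illustrates a common manual workaround; it is not the Spores3 approach, which the later slides describe.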
  11. Problematic uses of closures

    • Using closures in distributed settings is a safety risk
      – Example: serializing closures can result in runtime errors (e.g., java.io.NotSerializableException on the JVM)
    • What about concurrency?
  12. Example: Concurrency

        val customerData: mutable.Map[Int, CustomerInfo] = ...

        def averageAge(customers: List[Customer]): Future[Float] = Future {
          val infos = customers.flatMap { c =>
            customerData.get(c.customerNo) match
              case Some(info) => List(info)
              case None => List()
          }
          val sumAges = infos.foldLeft(0)(_ + _.age).toFloat
          if (infos.nonEmpty) sumAges / infos.size else 0.0f
        }

    Possible data race! The future's closure reads the shared mutable map customerData, which other threads may update concurrently.
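One way to remove this particular race is to let the closure capture an immutable snapshot instead of the shared map. The sketch below assumes that customerData is only mutated on the thread that calls averageAge (otherwise taking the snapshot would itself race); Customer, CustomerInfo, and AgeStats are hypothetical stand-ins for the definitions elided on the slide:

```scala
import scala.collection.mutable
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.*

// Hypothetical domain types standing in for the slide's Customer/CustomerInfo.
final case class Customer(customerNo: Int)
final case class CustomerInfo(age: Int)

object AgeStats:
  given ExecutionContext = ExecutionContext.global

  // Shared mutable state, as on the slide.
  val customerData: mutable.Map[Int, CustomerInfo] = mutable.Map.empty

  def averageAge(customers: List[Customer]): Future[Float] =
    // Copy on the calling thread; the closure below captures only
    // `snapshot` and `customers`, never the mutable `customerData`.
    val snapshot: Map[Int, CustomerInfo] = customerData.toMap
    Future {
      val infos = customers.flatMap(c => snapshot.get(c.customerNo))
      val sumAges = infos.foldLeft(0)(_ + _.age).toFloat
      if infos.nonEmpty then sumAges / infos.size else 0.0f
    }
```

Because the snapshot is taken before the Future starts, updates made after averageAge returns are not observed by that computation.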
  14. Problematic uses of closures

    • Using closures in distributed settings is a safety risk
      – Example: serializing closures can result in runtime errors (e.g., java.io.NotSerializableException on the JVM)
    • Using closures in concurrent settings is a safety risk
      – Example: running a closure on a concurrent thread could cause a data race if a captured variable refers to a shared mutable object
    • Anything else?
      – Say, you want to send a closure from a frontend running on a JavaScript engine to a backend running on a JVM
        • This requires a portable serialization scheme!
  15. Observations

    • Safety issues stem from unrestricted variable capture
      – Concurrency: capturing and accessing shared mutable objects
      – Distribution: capturing references to non-serializable objects
    • Potential remedies:
      – Restricting the types of captured variables
        • For example, permit only types known to be serializable
      – Providing more capturing modes
        • For example, deeply clone mutable objects upon capture
  16. How to write safer closure-using code?

    • Check the captured variables
      – Closures should not capture too many variables
        • 2-4 variables OK, more than 7 variables probably not
    • Types of captured variables must be safe
      – Prefer immutable types
      – Check required properties
        • Serializability? Concurrency safety?
    • What’s the logical memory snapshot that the closure should be initialized with?

    It would be nice if the compiler could check this…
  18. Spores3: A Scala 3 library

    Goals:
    • Automate safety checking of closures
    • Add flexibility, e.g., portable serialization

    Main idea: “spores” [1], a special kind of closure that
      – has an explicit environment
      – tracks the type of its environment using type refinements

    [1] Miller, Haller, and Odersky. Spores: a type-based foundation for closures in the age of concurrency and distribution. ECOOP 2014 (Google Scholar: 47 citations)
  19. Spores3: What’s New?

    • A completely new implementation of spores for Scala 3
    • Addresses limitations of the original spores for Scala 2:
      – Macro usage limited to compile-time checks
      – Portable from the beginning
    • Proposes a novel approach to serialization
      – Based on type classes
      – Flexible, portable and safe
  20. Overview

    • A simple spore without environment (the function literal is not permitted to capture anything!):

        val s = Spore((x: Int) => x + 2)

    • The above spore has the following type:

        Spore[Int, Int] { type Env = Nothing }

    • Spore types are subtypes of corresponding function types:

        sealed trait Spore[-T, +R] extends (T => R) {
          type Env
        }
  21. Spores with environments

    • The environment of a spore is initialized explicitly:

        val str = "anonymous function"
        val s2 = Spore(str) { env => (x: Int) => x + env.length }

      The environment is initialized with the argument str and accessed using the extra parameter env.
    • The above spore s2 has type:

        Spore[Int, Int] { type Env = String }
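To see how the Env type member records the type of the environment, here is a simplified, self-contained model. MiniSpore is a hypothetical name; unlike Spores3, this model performs no compile-time capture checking, so it only illustrates the types involved:

```scala
// Hypothetical, simplified model of the slides' Spore type.
// It does NOT check that `body` captures nothing (Spores3 uses a macro for that).
sealed trait MiniSpore[-T, +R] extends (T => R):
  type Env

object MiniSpore:
  // The environment is passed explicitly and its type is recorded in `Env`.
  def apply[E, T, R](initEnv: E)(body: E => T => R): MiniSpore[T, R] { type Env = E } =
    new MiniSpore[T, R] {
      type Env = E
      def apply(x: T): R = body(initEnv)(x)
    }
```

Since MiniSpore[Int, Int] extends Int => Int, a value such as MiniSpore(str) { env => (x: Int) => x + env.length } can be passed directly to methods like map that expect a function, while its Env member is statically known to be String.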
  22. Why use an extra parameter for the environment?

    • User code needs one more parameter…
    • Yes, but it enables the use of pattern matching (instead of, say, env._1, env._2, …):

        val s = "anonymous function"
        val i = 5
        Spore((s, i)) { case (str, num) => (x: Int) => x + str.length - num }
  23. Type-based constraints

    • The Env type member of the Spore trait enables expressing type-based constraints on the spore's environment using context parameters
    • Example: require a spore parameter to only capture thread-safe types:

        /* Run spore `s` concurrently, immediately returning a future
         * which is eventually completed with the result of type `T`. */
        def future[T](s: Spore[Unit, T])(using ThreadSafe[s.Env]): Future[T] = ...

    Thread-safe types are types for which instances of the type class ThreadSafe exist.
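A minimal sketch of how such a constraint can be enforced with a context parameter. ThreadSafe, its instances, EnvFun, and envFun are hypothetical stand-ins written for this example (Spores3's actual definitions may differ), and the extra ExecutionContext parameter is an added assumption needed to run the Future:

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.*

// Hypothetical type class: instances exist only for types treated as thread-safe.
trait ThreadSafe[T]

object ThreadSafe:
  given ThreadSafe[Int] = new ThreadSafe[Int] {}
  given ThreadSafe[String] = new ThreadSafe[String] {}
  // A pair is thread-safe if both components are.
  given [A, B](using ThreadSafe[A], ThreadSafe[B]): ThreadSafe[(A, B)] =
    new ThreadSafe[(A, B)] {}

// Minimal stand-in for a spore: an explicit environment tracked in `Env`
// (no compile-time capture checking, unlike Spores3).
trait EnvFun[-T, +R] extends (T => R):
  type Env

def envFun[E, T, R](env: E)(body: E => T => R): EnvFun[T, R] { type Env = E } =
  new EnvFun[T, R] {
    type Env = E
    def apply(x: T): R = body(env)(x)
  }

// Compiles only if a ThreadSafe instance exists for the environment type.
def future[T](s: EnvFun[Unit, T])(using ThreadSafe[s.Env], ExecutionContext): Future[T] =
  Future(s(()))
```

With these definitions, calling future with a spore whose environment is, say, a scala.collection.mutable.Map[Int, Int] is rejected at compile time, because no ThreadSafe instance exists for that type.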
  24. Serialization

    • One of the design goals for Spores3 is to support serialization based on type classes
      – Flexibility: enable integration with different serialization frameworks (uPickle, Java serialization, Kryo, Jackson, etc.)
      – Portability: support multiple backends/runtime environments
      – Safety: serializability is determined statically
    • Assumptions:
      – Serialization is primarily used for communication between remote nodes
      – Every node is running the same code
      – No transmission of bytecode or source code
  25. Serialization in Spores3: Approach

    • Instead of serializing the code of a spore, what's serialized is
      – a unique identifier that enables instantiating the implementation of the spore; and
      – the spore's environment.
    • In practice:
      – Create the spore using a named spore builder
      – The spore builder identifies the spore's implementation
  26. Serializing spores: Example

    • Step 1: define the spore using a spore builder (the body prepends the environment to the list parameter):

        object Prepend extends Spore.Builder[Int, List[Int], List[Int]](
          env => (xs: List[Int]) => env :: xs
        )

    • Step 2: create a serializable representation of the spore, passing the environment as the second argument:

        val num: Int = ...
        val data = SporeData(Prepend, Some(num))
  27. Serializing spores: Example (cont'd)

    • Step 3: pickle the SporeData (here, using uPickle):

        import upickle.default.*
        import com.phaller.spores.upickle.given

        val data = SporeData(Prepend, Some(num))
        val pickled = write(data)

    • Output (JSON), where 1 indicates a non-empty environment:

        ["com.example.Prepend",1,"<num>"]
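The essential point of step 3 is that only a name and an environment are written, never code. A hypothetical sketch of that encoding follows; the Codec type class stands in for uPickle's ReadWriter, SporeDataSketch stands in for SporeData, and the ["name",0] shape for an absent environment is an assumption (the slide only shows the non-empty case):

```scala
// Hypothetical stand-in for uPickle's ReadWriter: a minimal string encoder.
trait Codec[A]:
  def write(a: A): String

object Codec:
  given Codec[Int] with
    def write(a: Int): String = a.toString

// Hypothetical serializable form: the builder's name plus an optional environment.
// No code is serialized; the receiver must already have the named builder.
final case class SporeDataSketch[E](builderName: String, env: Option[E])

// Mirrors the shape of the slide's output: [name, 1, env] when an environment
// is present; [name, 0] for an absent environment is an assumption.
def pickle[E](data: SporeDataSketch[E])(using codec: Codec[E]): String =
  data.env match
    case Some(e) => s"""["${data.builderName}",1,"${codec.write(e)}"]"""
    case None    => s"""["${data.builderName}",0]"""
```

Requiring a Codec[E] through a using clause means that trying to pickle an environment type without an encoder fails at compile time rather than at run time.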
  28. Enforcing safe serialization

    • Example: a method sendOff that serializes a spore and sends it across the network to a remote executor
    • Solution:

        def sendOff[T, N, S <: SporeData[T, T] { type Env = N }]
                   (sporeData: S)(using ReadWriter[S]): Unit = { ... }
  29. Enforcing safe serialization

    • Example: a method sendOff that serializes a spore and sends it across the network to a remote executor
    • Solution, using a context bound instead of a using clause:

        def sendOff[T, N, S <: SporeData[T, T] { type Env = N } : ReadWriter]
                   (sporeData: S): Unit = { ... }
  30. Deserializing spores

    • Step 1: read the pickled data with target type PackedSporeData:

        val unpickledData = read[PackedSporeData](pickled)

      Note: PackedSporeData abstracts from the type of the environment!
    • Step 2: convert PackedSporeData to a spore:

        val unpickledSpore = unpickledData.toSpore[List[Int], List[Int]]
  32. Implementation

    Spore creation:
      – check that the body of the spore does not capture any variable
      – use a macro for (compile-time) capture checking

        object Spore {
          inline def apply[E, T, R](inline initEnv: E)
                                   (inline body: E => T => R): Spore[T, R] { type Env = E } =
            ${ applyCode('initEnv, 'body) }
        }
  33. Macro for capture checking

        import scala.quoted.*

        private def applyCode[E, T, R](envExpr: Expr[E], bodyExpr: Expr[E => T => R])
            (using Type[E], Type[T], Type[R], Quotes): Expr[Spore[T, R] { type Env = E }] = {
          checkBodyExpr(bodyExpr) // check that bodyExpr does not capture anything
          '{
            new Spore[T, R] {
              type Env = E
              def apply(x: T): R = $bodyExpr($envExpr)(x)
              ...
            }
          }
        }
  36. Implementing serialization

    • Recall the creation of a spore builder:

        object Prepend extends Spore.Builder[Int, List[Int], List[Int]](
          env => (xs: List[Int]) => env :: xs
        )

    • The environment serializer + deserializer is obtained when the builder is constructed:

        class Builder[E, T, R](body: E => T => R)(using ReadWriter[E])
            extends TypedBuilder[E, T, R]:
          def createSpore(envOpt: Option[String]): Spore[T, R] =
            val initEnv = read[E](envOpt.get) // deserialize environment
            apply(initEnv)
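The receiving side can be sketched in the same spirit. Because the builder obtains its environment codec when it is constructed, it can later decode an environment given only as a string, without the caller knowing the environment's type. Codec, Builder, and the builders lookup table below are hypothetical simplifications; in particular, the map stands in for Spores3's lookup of the builder object by its fully-qualified name:

```scala
// Hypothetical minimal codec standing in for uPickle's ReadWriter[E].
trait Codec[A]:
  def write(a: A): String
  def read(s: String): A

object Codec:
  given Codec[Int] with
    def write(a: Int): String = a.toString
    def read(s: String): Int = s.toInt

// Like the slide's Builder: the environment codec is resolved when the
// builder object is constructed, so decoding later needs no type information.
class Builder[E, T, R](body: E => T => R)(using codec: Codec[E]):
  def apply(env: E): T => R = body(env)
  def createSpore(envOpt: Option[String]): T => R =
    val initEnv = codec.read(envOpt.get) // deserialize environment
    apply(initEnv)

object Prepend extends Builder[Int, List[Int], List[Int]](
  env => (xs: List[Int]) => env :: xs
)

// Hypothetical lookup table (Spores3 instead uses the builder's fully-qualified name).
val builders: Map[String, Builder[?, List[Int], List[Int]]] = Map("Prepend" -> Prepend)
```

As on the slide, createSpore assumes that the environment is present: envOpt.get throws NoSuchElementException for None.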
  37. Implementing serialization (2)

    • Recall step 2: creating the serializable form of a spore:

        val data = SporeData(Prepend, Some(num))
        val pickled = write(data)

    • The SporeData factory uses a macro to:
      – check that the argument builder is a top-level object
      – obtain the fully-qualified name of the builder object
    • Serialization of a SporeData instance consists of:
      – the fully-qualified name of the builder object
      – the serialized environment
  38. Spores3: Implementation status

    • Open-source implementation (Apache License 2.0)
      – Pre-release: "com.phaller" %% "spores3" % "0.1.0"
      – GitHub repository: https://github.com/phaller/spores3
    • Supports Scala/JVM and Scala.js
      – Scala Native planned
    • Out-of-the-box integration with uPickle
      – Integrations with other serialization libraries planned
  40. Summary

    • Using closures in distributed and concurrent settings is a safety risk
    • Spores3: safer and more flexible closures for Scala 3
      – A completely new implementation of spores for Scala 3
      – Explicit environment, tracked using type refinement
      – Type-based environment constraints
      – Flexible, portable and safe serialization

    Thank You!
    @philippkhaller
    https://www.phaller.com
    https://github.com/phaller/spores3