at KTH Royal Inst. of Tech.
  – 2014–2018 Assistant Professor at KTH
• 2005–2014 Scala language team
  – PhD 2010 EPFL, Switzerland
  – 2012–2014 Typesafe, Inc. (now Lightbend, Inc.)
• Work on concurrent and distributed programming
  – Creator of Scala Actors, co-author of Scala’s futures and Scala Async
• Work on type systems
  – Research on capabilities, affine types, consistency types, …
ways in which closure-using code can go wrong
  – And how to write safer closure-using code
• Spores3: safer and more flexible closures for Scala 3
• Implementing Spores3
• Summary
to local variable “threshold” in its lexical context
• Closure = anonymous function “whose open bindings (free variables) have been closed by the lexical environment” (Peter J. Landin)

    val numbers = List(6, 3, 9, 2, 4)
    val threshold = 5
    val below = numbers.filter(num => num < threshold)
    assert(below == List(3, 2, 4))
      distData = sc.parallelize(Array(1, 2, 3, 4, 5))

      def transform(x: Int): Int = x + 1

      def test(): Unit = {
        val transformed = distData.map(elem => transform(elem))
        transformed.collect().foreach(elem => println(elem))
      }
    }

Uses transform from enclosing scope

• distData is a distributed data set → each remote worker has a piece of the data
• map sends its argument closure to each worker → argument closure must be serialized
      distData = sc.parallelize(Array(1, 2, 3, 4, 5))

      def transform(x: Int): Int = x + 1

      def test(): Unit = {
        val transformed = distData.map(elem => this.transform(elem))
        transformed.collect().foreach(elem => println(elem))
      }
    }

• distData is a distributed data set → each remote worker has a piece of the data
• map sends its argument closure to each worker → argument closure must be serialized

Actually: closure captures this!

this must be serialized when the closure is shipped to the remote workers. The type of this is SparkExample which is not serializable, hence...

    Exception in thread "main" org.apache.spark.SparkException: Task not serializable ...
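The capture of this can be reproduced without Spark. The following is a minimal sketch using plain Java serialization on the JVM; Example, Transforms, isJavaSerializable and demoCapture are hypothetical names introduced only for illustration, and the workaround shown (moving the function into a serializable object) is one common option, not something the slides prescribe.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Returns true if Java serialization accepts `obj`, and false if it is
// rejected with a NotSerializableException.
def isJavaSerializable(obj: AnyRef): Boolean = {
  val out = new ObjectOutputStream(new ByteArrayOutputStream())
  try { out.writeObject(obj); true }
  catch { case _: NotSerializableException => false }
  finally out.close()
}

// Hypothetical stand-in for SparkExample: deliberately not Serializable.
class Example {
  def transform(x: Int): Int = x + 1

  // `transform(elem)` is short for `this.transform(elem)`,
  // so this closure captures `this`.
  def capturesThis: Int => Int = elem => transform(elem)
}

// Hypothetical workaround: keep the function in a serializable object,
// so that the closure has nothing non-serializable to capture.
object Transforms extends Serializable {
  val transform: Int => Int = x => x + 1
}

// First component: the closure that captures `this`.
// Second component: the closure that lives in the serializable object.
def demoCapture(): (Boolean, Boolean) = {
  (isJavaSerializable(new Example().capturesThis),
   isJavaSerializable(Transforms.transform))
}
```

Under these assumptions, the first closure should be rejected because serializing it drags in the Example instance, while the second should be accepted. Nothing in the source text of capturesThis mentions this, which is what makes the error easy to miss.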
distributed settings is a safety risk
  – Example: serializing closures can result in runtime errors (e.g., java.io.NotSerializableException on the JVM)
• What about concurrency?
    ...
    def averageAge(customers: List[Customer]): Future[Float] = Future {
      val infos = customers.flatMap { c =>
        customerData.get(c.customerNo) match
          case Some(info) => List(info)
          case None => List()
      }
      val sumAges = infos.foldLeft(0)(_ + _.age).toFloat
      if (infos.nonEmpty) sumAges / infos.size else 0.0f
    }

Possible data race!
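One way to avoid the race is to give the closure an immutable snapshot instead of the shared object. The sketch below assumes that customerData is a shared mutable map, which the slide does not show; Customer, CustomerInfo and demoAverage are hypothetical definitions added so that the example is self-contained.

```scala
import scala.collection.mutable
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.*

// Hypothetical data model; the slide does not show these definitions.
final case class Customer(customerNo: Int)
final case class CustomerInfo(age: Int)

def averageAge(
    customers: List[Customer],
    customerData: mutable.Map[Int, CustomerInfo]
)(using ExecutionContext): Future[Float] = {
  // Copy on the calling thread, before the Future body is scheduled.
  // The closure below captures only this immutable snapshot.
  val snapshot: Map[Int, CustomerInfo] = customerData.toMap
  Future {
    val infos = customers.flatMap(c => snapshot.get(c.customerNo))
    if (infos.nonEmpty) infos.map(_.age).sum.toFloat / infos.size else 0.0f
  }
}

def demoAverage(): Float = {
  given ExecutionContext = ExecutionContext.global
  val data = mutable.Map(1 -> CustomerInfo(20), 2 -> CustomerInfo(40))
  val result = averageAge(List(Customer(1), Customer(2), Customer(3)), data)
  data.clear() // a later mutation cannot affect the running computation
  Await.result(result, 5.seconds)
}
```

The snapshot only removes the race inside the Future body. If other threads may write to the map while toMap runs, that copy itself still needs synchronization.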
distributed settings is a safety risk
  – Example: serializing closures can result in runtime errors (e.g., java.io.NotSerializableException on the JVM)
• Using closures in concurrent settings is a safety risk
  – Example: running a closure on a concurrent thread could cause a data race if a captured variable refers to a shared mutable object
• Anything else?
  – Say, you want to send a closure from a frontend running on a JavaScript engine to a backend running on a JVM
    • requires a portable serialization scheme!
capture
  – Concurrency: capturing and accessing shared mutable objects
  – Distribution: capturing references to non-serializable objects
• Potential remedies:
  – Restricting types of captured variables
    • For example, permit only types known to be serializable
  – Provide more capturing modes
    • For example, deeply clone mutable objects upon capture
the captured variables
  – Closures should not capture too many variables
    • 2-4 variables OK, > 7 variables probably not
• Types of captured variables must be safe
  – Prefer immutable types
  – Check required properties
    • Serializability? Concurrency safety?
• What’s the logical memory snapshot that the closure should be initialized with?

Would be nice if the compiler could check this…
Automate safety checking of closures
• Add flexibility, e.g., portable serialization

Main idea: “spores” [1], a special kind of closure that
  – has an explicit environment
  – tracks the type of its environment using type refinements

[1] Miller, Haller, and Odersky. Spores: a type-based foundation for closures in the age of concurrency and distribution. ECOOP 2014 (Google Scholar: 47 citations)
of spores for Scala 3
• Addresses limitations of original spores for Scala 2:
  – Macro usage limited to compile-time checks
  – Portable from the beginning
• Proposes a novel approach to serialization
  – Based on type classes
  – Flexible, portable and safe
    val s = Spore((x: Int) => x + 2)

Function literal not permitted to capture anything!

The above spore has the following type:

    Spore[Int, Int] { type Env = Nothing }

• Spore types are subtypes of corresponding function types:

    sealed trait Spore[-T, +R] extends (T => R) { type Env }
spore is initialized explicitly:

    val str = "anonymous function"
    val s2 = Spore(str) { env => (x: Int) => x + env.length }

  – Environment initialized with argument str
  – Environment accessed using extra parameter

• The above spore s2 has type:

    Spore[Int, Int] { type Env = String }
• User code needs one more parameter…
• Yes, but it enables the use of pattern matching (instead of, say, env._1, env._2, …):

    val s = "anonymous function"
    val i = 5
    Spore((s, i)) {
      case (str, num) => (x: Int) => x + str.length - num
    }
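The idea of an explicit environment tracked by a type refinement can be modelled in plain Scala 3. The sketch below is a simplified stand-in, not the Spores3 implementation: MiniSpore and demoSpore are hypothetical names, and unlike the real Spore factory this version cannot stop the body from capturing other variables, since that check needs a macro.

```scala
// Simplified stand-in for the Spore trait shown on the slides.
trait MiniSpore[-T, +R] extends (T => R) {
  type Env
}

object MiniSpore {
  // Builds a spore from an explicitly supplied environment. The result type
  // records the environment type as a refinement. No capture checking is
  // performed here (assumption: the real library does this with a macro).
  def apply[E, T, R](initEnv: E)(body: E => T => R): MiniSpore[T, R] { type Env = E } =
    new MiniSpore[T, R] {
      type Env = E
      private val fun: T => R = body(initEnv)
      def apply(x: T): R = fun(x)
    }
}

def demoSpore(): (Int, Int) = {
  val str = "anonymous function" // 18 characters
  val s2 = MiniSpore(str) { env => (x: Int) => x + env.length }

  // The refinement is part of the static type of s2.
  val typed: MiniSpore[Int, Int] { type Env = String } = s2

  // A tuple environment, taken apart by pattern matching.
  val s3 = MiniSpore((str, 5)) { case (s, num) => (x: Int) => x + s.length - num }

  (typed(1), s3(1))
}
```

Because MiniSpore extends the function type, both values can be applied like ordinary functions, while code that cares about the environment can inspect the Env member in the type.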
the Spore trait enables expressing type-based constraints on the spore's environment using context parameters
• Example: require a spore parameter to only capture thread-safe types:

    /* Run spore `s` concurrently, immediately returning a future
     * which is eventually completed with the result of type `T`.
     */
    def future[T](s: Spore[Unit, T])(using ThreadSafe[s.Env]): Future[T] = ...

Thread-safe types are types for which instances of type class ThreadSafe exist
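A constraint of this kind can be sketched in plain Scala 3 with a marker type class. Everything below is an assumption made for illustration: MiniSpore is a simplified stand-in for Spore without capture checking, the ThreadSafe instances for Int and String are chosen arbitrarily, and the extra ExecutionContext parameter is not part of the signature on the slide.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.*

// Simplified stand-in for Spore (no capture checking).
trait MiniSpore[-T, +R] extends (T => R) { type Env }

object MiniSpore {
  def apply[E, T, R](initEnv: E)(body: E => T => R): MiniSpore[T, R] { type Env = E } =
    new MiniSpore[T, R] {
      type Env = E
      private val fun: T => R = body(initEnv)
      def apply(x: T): R = fun(x)
    }
}

// Hypothetical marker type class: an instance exists only for types that
// are considered safe to share between threads.
trait ThreadSafe[T]

object ThreadSafe {
  given ThreadSafe[Int] = new ThreadSafe[Int] {}
  given ThreadSafe[String] = new ThreadSafe[String] {}
}

// Accepts only spores whose environment type has a ThreadSafe instance.
def future[T](s: MiniSpore[Unit, T])(using ThreadSafe[s.Env], ExecutionContext): Future[T] =
  Future(s(()))

def demoFuture(): Int = {
  given ExecutionContext = ExecutionContext.global
  val greeting = "hello"
  val s = MiniSpore(greeting) { env => (_: Unit) => env.length }
  Await.result(future(s), 5.seconds)
}
```

With these definitions, a spore whose environment is, say, a scala.collection.mutable.ArrayBuffer[Int] would be rejected by future at compile time, because no ThreadSafe instance for that type is in scope.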
Spores3 is to support serialization based on type classes
  – Flexibility: enable integration with different serialization frameworks (uPickle, Java serialization, Kryo, Jackson, etc.)
  – Portability: support multiple backends/runtime environments
  – Safety: serializability is determined statically
• Assumptions:
  – Serialization is primarily used for communication between remote nodes
  – Every node is running the same code
  – No transmission of byte code or source code
the code of a spore, what's serialized is
  – a unique identifier that enables instantiating the implementation of the spore; and
  – the spore's environment.
• In practice:
  – Create spore using a named spore builder
  – Spore builder identifies the spore's implementation
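The scheme above (identifier plus environment, no code on the wire) can be illustrated with a small plain-Scala model. This is a sketch under stated assumptions, not the Spores3 API: Codec, Builder, serialize, deserialize and the "name|environment" wire format are all hypothetical, and an explicit registry map stands in for whatever lookup mechanism the library uses.

```scala
// Hypothetical codec type class standing in for a real serialization framework.
trait Codec[E] {
  def encode(env: E): String
  def decode(text: String): E
}

object IntCodec extends Codec[Int] {
  def encode(env: Int): String = env.toString
  def decode(text: String): Int = text.toInt
}

// A named builder holds the code of one spore. Only its name and the
// environment are ever serialized.
trait Builder[T, R] {
  type Env
  def name: String
  def codec: Codec[Env]
  def body(env: Env): T => R
}

object Prepend extends Builder[List[Int], List[Int]] {
  type Env = Int
  val name = "Prepend"
  val codec: Codec[Int] = IntCodec
  def body(env: Int): List[Int] => List[Int] = xs => env :: xs
}

// Wire format (assumption): "<builder name>|<encoded environment>".
def serialize[T, R](b: Builder[T, R])(env: b.Env): String =
  s"${b.name}|${b.codec.encode(env)}"

// Every node runs the same code, so the receiver can hold the same registry.
def deserialize[T, R](wire: String, registry: Map[String, Builder[T, R]]): T => R = {
  val sep = wire.indexOf('|')
  val b = registry(wire.substring(0, sep))
  b.body(b.codec.decode(wire.substring(sep + 1)))
}

def demoRoundTrip(): (String, List[Int]) = {
  val registry: Map[String, Builder[List[Int], List[Int]]] = Map(Prepend.name -> Prepend)
  val wire = serialize(Prepend)(7)
  val revived = deserialize(wire, registry)
  (wire, revived(List(1, 2)))
}
```

The receiver never sees the function body. It only needs the name to find its own copy of the builder, which matches the assumption that every node runs the same code.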
that serializes a spore and sends it across the network to a remote executor
• Solution:

    def sendOff[N, S <: SporeData[T, T] { type Env = N }]
               (sporeData: S)(using ReadWriter[S]): Unit = { ... }
that serializes a spore and sends it across the network to a remote executor
• Solution:

    def sendOff[N, S <: SporeData[T, T] { type Env = N } : ReadWriter]
               (sporeData: S): Unit = { ... }

Context bound
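The two sendOff signatures differ only in how the type class instance is requested. The following minimal sketch shows the same equivalence on a toy example; Encoder, sendUsing and sendBound are hypothetical names that do not appear in Spores3 or uPickle.

```scala
// Hypothetical type class, standing in for ReadWriter.
trait Encoder[A] {
  def encode(a: A): String
}

given Encoder[Int] = new Encoder[Int] {
  def encode(a: Int): String = s"int:$a"
}

// Explicit using clause: the instance is a named context parameter.
def sendUsing[A](value: A)(using enc: Encoder[A]): String =
  enc.encode(value)

// Context bound: shorthand for an anonymous `using Encoder[A]` parameter,
// retrieved with `summon` when it is needed by name.
def sendBound[A: Encoder](value: A): String =
  summon[Encoder[A]].encode(value)
```

Both functions accept exactly the same calls, for example sendUsing(3) and sendBound(3). The context bound is shorter when the instance is only passed on to other functions.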
with target type PackedSporeData:
  – Note: PackedSporeData abstracts from type of environment!
• Step 2: convert PackedSporeData to spore:

    val unpickledData = read[PackedSporeData](pickled)
    val unpickledSpore = unpickledData.toSpore[List[Int], List[Int]]
spore does not capture any variable
  – use macro for (compile-time) capture checking

    object Spore {
      inline def apply[E, T, R](inline initEnv: E)
                               (inline body: E => T => R): Spore[T, R] { type Env = E } =
        ${ applyCode('initEnv)('body) }
serializable form of a spore:

    val data = SporeData(Prepend, Some(num))
    val pickled = write(data)

• SporeData factory uses a macro to:
  – check that argument builder is a top-level object
  – obtain fully-qualified name of builder object
• Serialization of SporeData instance consists of:
  – fully-qualified name of builder object
  – serialized environment
settings is a safety risk
• Spores3: safer and more flexible closures for Scala 3
  – A completely new implementation of spores for Scala 3
  – Explicit environment, tracked using type refinement
  – Type-based environment constraints
  – Flexible, portable and safe serialization

Thank You!
@philippkhaller
https://www.phaller.com
https://github.com/phaller/spores3