Philipp Haller
Associate Professor, EECS School and Digital Futures
KTH Royal Institute of Technology, Stockholm, Sweden

Strange Loop 2022, September 23rd, 2022
Union Station Hotel, St. Louis, Missouri, USA
at KTH Royal Inst. of Tech.
  – 2014–2018 Assistant Professor at KTH
• 2005–2014 Scala language team
  – PhD 2010 EPFL, Switzerland
  – 2012–2014 Typesafe, Inc. (now Lightbend, Inc.)
• Focus on concurrent and distributed programming
  – Creator of Scala Actors, co-author of Scala’s futures and Scala Async

2019: ACM SIGPLAN Programming Languages Software Award for Scala
Core contributors: Martin Odersky, Adriaan Moors, Aleksandar Prokopec, Heather Miller, Iulian Dragos, Nada Amin, Philipp Haller, Sebastien Doeraene, Tiark Rompf
ways in which closure-using code can go wrong
  – And how to spot unsafe code
• How to write safer closure-using code
• Spores3: safer and more flexible closures for Scala 3
• Implementing serialization in Spores3
• Summary
scoped name binding
• The anonymous function refers to local variable “threshold” in its lexical context
• Closure = anonymous function “whose open bindings (free variables) have been closed by the lexical environment” (Peter J. Landin)

    val numbers = List(6, 3, 9, 2, 4)
    val threshold = 5
    val below = numbers.filter(num => num < threshold)
    assert(below == List(3, 2, 4))
      distData = sc.parallelize(Array(1, 2, 3, 4, 5))
      def transform(x: Int): Int = x + 1
      def test(): Unit = {
        val transformed = distData.map(elem => transform(elem))
        transformed.collect().foreach(elem => println(elem))
      }
    }

• distData is a distributed data set → each remote worker has a piece of the data
• map sends its argument closure to each worker → argument closure must be serialized
• The closure uses transform from the enclosing scope
      distData = sc.parallelize(Array(1, 2, 3, 4, 5))
      def transform(x: Int): Int = x + 1
      def test(): Unit = {
        val transformed = distData.map(elem => this.transform(elem))
        transformed.collect().foreach(elem => println(elem))
      }
    }

• distData is a distributed data set → each remote worker has a piece of the data
• map sends its argument closure to each worker → argument closure must be serialized
• Actually: closure captures this!
  – this must be serialized when the closure is shipped to the remote workers
  – The type of this is SparkExample, which is not serializable, hence...

    Exception in thread "main" org.apache.spark.SparkException: Task not serializable ...
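The same failure can be reproduced without Spark. The following is a minimal sketch, assuming Scala 3 on the JVM (where function literals compile to serializable lambdas) and plain Java serialization. The names isSerializable, Example, and Transforms are hypothetical and introduced only for illustration; they do not appear in the talk.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical helper: true if `obj` can be written with Java serialization.
def isSerializable(obj: AnyRef): Boolean =
  val out = new ObjectOutputStream(new ByteArrayOutputStream())
  try
    out.writeObject(obj)
    true
  catch
    case _: NotSerializableException => false
  finally out.close()

// Hypothetical stand-in for SparkExample: deliberately not Serializable.
class Example:
  def transform(x: Int): Int = x + 1
  // Calls this.transform, so the closure captures `this`.
  val capturesThis: Int => Int = elem => transform(elem)

// One possible fix: move transform into a top-level object.
object Transforms:
  def transform(x: Int): Int = x + 1
  // Calls a method of a top-level object; no Example instance is captured.
  val capturesNoInstance: Int => Int = elem => transform(elem)
```

Under these assumptions, isSerializable(new Example().capturesThis) is false because writing the closure also tries to write the captured Example instance, while isSerializable(Transforms.capturesNoInstance) is expected to be true.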
• Using closures in distributed settings is a safety risk
  – Example: serializing closures can result in runtime errors (e.g., java.io.NotSerializableException on the JVM)
• What about concurrency?
• Using closures in distributed settings is a safety risk
  – Example: serializing closures can result in runtime errors (e.g., java.io.NotSerializableException on the JVM)
• Using closures in concurrent settings is a safety risk
  – Example: running a closure on a concurrent thread could cause a data race if a captured variable refers to a shared mutable object
• Anything else?
  – Say, you want to send a closure from a frontend running on a JavaScript engine to a backend running on a JVM
    • requires a portable serialization scheme!
capture
  – Concurrency: capturing and accessing shared mutable objects
  – Distribution: capturing references to non-serializable objects
• Potential remedies:
  – Restricting the types of captured variables
    • For example, permit only types known to be serializable
  – Providing more capturing modes
    • For example, deeply clone mutable objects upon capture
Key: the environment of the closure: its captured variables
• Some closure code smells:
  – Capturing vars
    • Re-assigned within closure body?
    • In Java, captured variables must be final or effectively final
  – Potentially unsafe types of captured variables
    • Mutable types
    • Types that don’t mesh well with distribution
      – Not serializable, not accessible remotely
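To illustrate the "capturing vars" smell, here is a small sketch in Scala 3. The function capturedVarDemo is a hypothetical example, not code from the talk. It shows that a Scala closure captures the variable itself rather than its value at creation time, so re-assignments inside and outside the closure are visible to each other.

```scala
// Hypothetical example: closures that capture and re-assign a local var.
def capturedVarDemo(): (Int, Int) =
  var counter = 0
  // Both closures capture the variable `counter`, not its current value.
  val increment: () => Unit = () => counter += 1
  val read: () => Int = () => counter
  increment()
  increment()
  val afterIncrements = read() // the increments are visible through `read`
  counter = 10                 // a re-assignment outside the closures
  (afterIncrements, read())    // `read` observes the re-assignment too
```

Calling capturedVarDemo() yields (2, 10). If such closures ran on different threads, the unsynchronized accesses to counter would be a data race.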
first: need to be able to spot the captured variables
  – Closures should not be too big
  – Closures should not capture too many variables
    • 2-4 variables OK, > 7 variables probably not
• Types of captured variables must be safe
  – Prefer (deeply) immutable types
  – Required properties? Serializability? Concurrency safety?
• Verify creation of closure
  – What’s the logical snapshot of the memory that the closure should be initialized with?
  – Example:
    • is it sufficient to initialize the closure’s environment with a copy of a reference?
    • or should mutable objects be cloned first?
• Verify semantics of closure’s execution
  – Mutation of environment? Transactional semantics?
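The difference between the two initialization strategies can be sketched as follows. This is an illustrative example using only the Scala standard library; snapshotDemo and its local names are hypothetical.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical example: copy of a reference vs. snapshot taken at creation.
def snapshotDemo(): (Int, Int) =
  val shared = ArrayBuffer(1, 2, 3)
  // Environment initialized with a copy of the reference to the mutable buffer.
  val sumByReference: () => Int = () => shared.sum
  // Environment initialized with an immutable copy taken when the closure is created.
  val snapshot = shared.toList
  val sumBySnapshot: () => Int = () => snapshot.sum
  // A later mutation is visible only through the captured reference.
  shared += 10
  (sumByReference(), sumBySnapshot())
```

Calling snapshotDemo() yields (16, 6): the first closure sees the mutation, the second keeps the state it was created with. Which behavior is correct depends on the intended semantics, and that is the question this step asks.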
and more flexible

Requirements:
  – Enable constraining the environment (the captured variables) using types
  – Support serialization based on type classes
  – Enable a portable implementation, including serialization
  – Minimize the use of macros
[1], which can be seen as a special kind of closure
• Spores:
  – have an explicit environment
  – track the type of their environment using a type refinement, enabling type-based constraints
  – enable operations on their environment, for example, for serialization and duplication/cloning

[1] Miller, Haller, and Odersky. Spores: a type-based foundation for closures in the age of concurrency and distribution. ECOOP 2014 (Google Scholar: 47 citations)
spores for Scala 3
• Addresses limitations of original spores for Scala 2:
  – Macro usage by Spores3 is simple and robust
    • Essentially limited to compile-time checks
  – Spores3 is portable from the beginning
• Proposes a novel approach to serialization based on type classes
  – Flexible, portable and safe
    val s = Spore((x: Int) => x + 2)

  – Function literal not permitted to capture anything!
• The above spore has the following type:

    Spore[Int, Int] { type Env = Nothing }

• Spore types are subtypes of corresponding function types:

    sealed trait Spore[-T, +R] extends (T => R) { type Env }
spore is initialized explicitly:

    val str = "anonymous function"
    val s2 = Spore(str) { env => (x: Int) => x + env.length }

  – Environment initialized with argument str
  – Environment accessed using extra parameter
• The above spore s2 has type:

    Spore[Int, Int] { type Env = String }
• User code needs one more parameter…
• Yes, but it enables the use of pattern matching (instead of, say, env._1, env._2, …):

    val s = "anonymous function"
    val i = 5
    Spore((s, i)) { case (str, num) => (x: Int) => x + str.length - num }
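To make the idea of an explicit, typed environment concrete, here is a self-contained sketch. MiniSpore is a hypothetical model written for this example and is not the Spores3 implementation. In particular, it does not check that the body captures nothing; Spores3 does that with a compile-time check.

```scala
// Hypothetical model of a spore, NOT the Spores3 implementation.
sealed trait MiniSpore[-T, +R] extends (T => R):
  type Env

object MiniSpore:
  // The environment is passed explicitly; the body receives it as an extra parameter.
  def apply[E, T, R](env: E)(body: E => T => R): MiniSpore[T, R] { type Env = E } =
    new MiniSpore[T, R]:
      type Env = E
      def apply(x: T): R = body(env)(x)

// Usage mirroring the examples above: a single value or a tuple as environment.
def miniSporeDemo(): (Int, Int) =
  val s = "anonymous function"
  val i = 5
  val s2: MiniSpore[Int, Int] { type Env = String } =
    MiniSpore(s) { env => (x: Int) => x + env.length }
  val s3 = MiniSpore((s, i)) { case (str, num) => (x: Int) => x + str.length - num }
  (s2(2), s3(2))
```

Because the environment type appears in the refinement { type Env = E }, other code can place constraints on it. Calling miniSporeDemo() yields (20, 15), since "anonymous function" has 18 characters.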
the Spore trait enables expressing type-based constraints on the spore's environment using context parameters
• Example: require a spore parameter to only capture thread-safe types:

    /* Run spore `s` concurrently, immediately returning a future
     * which is eventually completed with the result of type `T`.
     */
    def future[T](s: Spore[Unit, T])(using ThreadSafe[s.Env]): Future[T] = ...

• Thread-safe types are types for which instances of type class ThreadSafe exist
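A type-class constraint of this kind can be sketched without the library. In the example below, ThreadSafe, Task, and runConcurrently are hypothetical definitions made up for illustration; Task stands in for a spore by pairing a function with an explicitly typed environment.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.*

// Hypothetical type class: an instance exists only for types deemed thread-safe.
trait ThreadSafe[T]
object ThreadSafe:
  given intIsThreadSafe: ThreadSafe[Int] = new ThreadSafe[Int] {}
  given stringIsThreadSafe: ThreadSafe[String] = new ThreadSafe[String] {}

// Hypothetical stand-in for a spore: a body plus an explicit environment of type E.
final class Task[E, R](val env: E, body: E => R):
  def run(): R = body(env)

// Compiles only if a ThreadSafe instance exists for the environment type E.
def runConcurrently[E: ThreadSafe, R](task: Task[E, R])(using ExecutionContext): Future[R] =
  Future(task.run())

def threadSafeDemo(): Int =
  given ec: ExecutionContext = ExecutionContext.global
  val task = Task("anonymous function", (env: String) => env.length)
  Await.result(runConcurrently(task), 5.seconds)
```

In this sketch, a Task whose environment is, say, a scala.collection.mutable.ArrayBuffer would be rejected at compile time, because no ThreadSafe instance is defined for that type.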
Spores3 is to support serialization based on type classes/contextual abstractions
  – Flexibility: enable integration with different serialization frameworks (uPickle, Java serialization, Kryo, Jackson, etc.)
  – Portability: support multiple backends/runtime environments
  – Safety: serializability is determined statically
• Assumptions:
  – Serialization is primarily used for communication between remote nodes
  – Every node is running the same code
  – No transmission of byte code or source code
the code of a spore, what's serialized is
  – a unique identifier that enables instantiating the implementation of the spore; and
  – the spore's environment.
• In practice:
  – Create spore using a named spore builder
  – Spore builder identifies the spore's implementation
that serializes a spore and sends it across the network to a remote executor
• Solution:

    def sendOff[N, S <: SporeData[T, T] { type Env = N }]
               (sporeData: S)(using ReadWriter[S]): Unit = { ... }
that serializes a spore and sends it across the network to a remote executor
• Solution (using a context bound):

    def sendOff[N, S <: SporeData[T, T] { type Env = N } : ReadWriter]
               (sporeData: S): Unit = { ... }
with target type PackedSporeData:

    val unpickledData = read[PackedSporeData](pickled)

  – Note: PackedSporeData abstracts from type of environment!
• Step 2: convert PackedSporeData to spore:

    val unpickledSpore = unpickledData.toSpore[List[Int], List[Int]]
serializable form of a spore:

    val data = SporeData(Prepend, Some(num))
    val pickled = write(data)

• SporeData factory uses a macro to:
  – check that argument builder is a top-level object
  – obtain fully-qualified name of builder object
• Serialization of SporeData instance consists of:
  – fully-qualified name of builder object
  – serialized environment
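The scheme of sending only a builder name plus the serialized environment can be modeled in a few lines. The sketch below is a hypothetical illustration, not the Spores3 implementation: Builder, Packed, registry, pack, and unpack are invented for this example, and the registry map stands in for looking up a top-level builder object by its fully-qualified name.

```scala
// Hypothetical model of "builder name + serialized environment", NOT the Spores3 API.
trait Builder[E, T, R]:
  def name: String          // stands in for the fully-qualified name of the builder object
  def body(env: E): T => R  // the spore's implementation, parameterized by its environment

object Prepend extends Builder[Int, List[Int], List[Int]]:
  val name = "Prepend"
  def body(env: Int): List[Int] => List[Int] = xs => env :: xs

// Every node runs the same code, so every node can resolve the same names.
val registry: Map[String, Builder[Int, List[Int], List[Int]]] =
  Map(Prepend.name -> Prepend)

// What goes over the wire: a name and an environment, never byte code or source code.
final case class Packed(builderName: String, env: String)

def pack(builder: Builder[Int, List[Int], List[Int]], env: Int): Packed =
  Packed(builder.name, env.toString)

def unpack(packed: Packed): List[Int] => List[Int] =
  registry(packed.builderName).body(packed.env.toInt)
```

For example, unpack(pack(Prepend, 5))(List(1, 2)) evaluates to List(5, 1, 2). The receiving side rebuilds the function from the name and the environment alone, which is why the assumption that every node runs the same code matters.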
settings is a safety risk
• How to write safer closure-using code:
  – Check captured variables, their types, and capturing semantics
• Spores3: safer and more flexible closures for Scala 3
  – A completely new implementation of spores for Scala 3
  – Explicit environment, tracked using type refinement
  – Type-based environment constraints
  – Flexible, portable and safe serialization

Thank You!
@philippkhaller
github.com/phaller/spores3