School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
13th ACM SIGPLAN Scala Symposium, June 6, 2022, Berlin, Germany
• Using closures in concurrent settings presents safety hazards
  – Example: running a closure on a concurrent thread could cause a data race if a captured variable refers to a shared mutable object
• Using closures in distributed settings exposes limitations
  – Example: sending a closure from a frontend running on a JavaScript engine to a backend running on a JVM requires a portable serialization scheme
• … and is a safety risk
  – Example: serializing closures can result in runtime errors (e.g., java.io.NotSerializableException on the JVM)
    ...
    def averageAge(customers: List[Customer]): Future[Float] = Future {
      val infos = customers.flatMap { c =>
        customerData.get(c.customerNo) match
          case Some(info) => List(info)
          case None => List()
      }
      val sumAges = infos.foldLeft(0)(_ + _.age).toFloat
      if (infos.nonEmpty) sumAges / infos.size else 0.0f
    }

Possible data race!
    factor: Double = 1.2
    def example(): Unit = {
      val fun = { (persons: List[Person]) =>
        val sumAges = persons.map(_.age).reduce(_ + _)
        (sumAges / persons.size) * factor   // uses factor from enclosing scope
      }
      val bytes = serialize(fun)
    }
    def serialize(f: List[Person] => Double): Array[Byte] = {
      ...

Exception in thread "main" java.io.NotSerializableException: Example
    at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1185)
    at java.base/java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1379)
    at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1175)
    at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:15
    at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1510)
    factor: Double = 1.2
    def example(): Unit = {
      val fun = { (persons: List[Person]) =>
        val sumAges = persons.map(_.age).reduce(_ + _)
        (sumAges / persons.size) * this.factor   // actually: capturing this (of type Example)!
      }
      val bytes = serialize(fun)
    }
    def serialize(f: List[Person] => Double): Array[Byte] = {
      ...

Exception in thread "main" java.io.NotSerializableException: Example
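A common workaround for the accidental capture of this is to copy the field into a local value before creating the closure, so that the closure captures only a Double. The following is a minimal sketch of that workaround, not code from the talk: the Person and Example classes and the Java-serialization-based serialize helper are assumptions made for illustration.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

final case class Person(name: String, age: Int)

// Hypothetical helper (an assumption): Java-serializes any object to bytes.
def serialize(obj: AnyRef): Array[Byte] =
  val buffer = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(buffer)
  try out.writeObject(obj)
  finally out.close()
  buffer.toByteArray

// Deliberately NOT Serializable, like the class in the example above.
class Example:
  val factor: Double = 1.2

  // Captures `this`, because `factor` really means `this.factor`.
  def capturingThis(): List[Person] => Double =
    (persons: List[Person]) =>
      val sumAges = persons.map(_.age).reduce(_ + _)
      (sumAges / persons.size) * factor

  // Copies the field into a local value first, so only a Double is captured.
  def capturingLocal(): List[Person] => Double =
    val localFactor = factor
    (persons: List[Person]) =>
      val sumAges = persons.map(_.age).reduce(_ + _)
      (sumAges / persons.size) * localFactor
```

The workaround relies on programmer discipline: nothing in the type List[Person] => Double distinguishes the two closures, which is the gap that constraining the captured environment by type is meant to close.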
capture
  – Concurrency: capture and access shared mutable objects
  – Serialization: capture references to non-serializable objects
• Potential remedies:
  – Restrict types of captured variables
    • For example, permit only types known to be serializable
  – Provide more capturing modes
    • For example, deeply clone a mutable object upon capture
closures safer and more flexible
Requirements:
  – Enable constraining the environment (the captured variables) using types
  – Support serialization based on type classes
  – Enable a portable implementation, including serialization
  – Minimize the use of macros
can be seen as a special kind of closure
• Blocks:
  – have an explicit environment;
  – restrict variable capture to a single variable;
  – track the type of their environment using a type refinement;
  – enable operations on their environment, for example, for serialization and duplication/cloning
    val b = Block((x: Int) => x + 2)   // function literal not permitted to capture anything!

• The above block has the following type:
    Block[Int, Int] { type Env = Nothing }
• Block types are subtypes of corresponding function types:
    sealed trait Block[-T, +R] extends (T => R) { type Env }
block is initialized explicitly:
    val s = "anonymous function"
    val b2 = Block(s) { (x: Int) => x + env.length }
  – Environment initialized with argument s
  – Environment accessed using env
• The above block b2 has type:
    Block[Int, Int] { type Env = String }
the Block trait enables expressing type-based constraints on the block's environment using context parameters
• Example: require a block parameter to only capture thread-safe types:
    /* Run block `b` concurrently, immediately returning a future
     * which is eventually completed with the result of type `T`.
     */
    def future[T](b: Block[Unit, T])(using ThreadSafe[b.Env]): Future[T] = ...
  – Thread-safe types are types for which instances of type class ThreadSafe exist
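The constraint above can be sketched with an ordinary type class. This is a simplified illustration under stated assumptions, not the library's implementation: the Block.withEnv helper, the chosen ThreadSafe instances, and the extra ExecutionContext parameter are all hypothetical, and no capture checking is performed.

```scala
import scala.concurrent.{ExecutionContext, Future}

// Simplified stand-in for the Block trait (no capture checking here).
trait Block[-T, +R] extends (T => R) { type Env }

object Block:
  // Hypothetical helper: the body receives the environment as an explicit argument.
  def withEnv[E, T, R](env: E)(body: (E, T) => R): Block[T, R] { type Env = E } =
    new Block[T, R] {
      type Env = E
      def apply(x: T): R = body(env, x)
    }

// Type class marking types whose values are safe to share between threads.
trait ThreadSafe[T]
object ThreadSafe:
  given intIsThreadSafe: ThreadSafe[Int] = new ThreadSafe[Int] {}
  given stringIsThreadSafe: ThreadSafe[String] = new ThreadSafe[String] {}
  // An immutable list is thread-safe if its elements are.
  given listIsThreadSafe[A](using ThreadSafe[A]): ThreadSafe[List[A]] =
    new ThreadSafe[List[A]] {}

// Accepts only blocks whose environment type has a ThreadSafe instance.
def future[T](b: Block[Unit, T])(using ts: ThreadSafe[b.Env], ec: ExecutionContext): Future[T] =
  Future(b(()))

def demo(): Int =
  import scala.concurrent.Await
  import scala.concurrent.duration.*
  import ExecutionContext.Implicits.global
  val words = List("a", "bb")
  val b: Block[Unit, Int] { type Env = List[String] } =
    Block.withEnv(words)((env: List[String], _: Unit) => env.map(_.length).sum)
  Await.result(future(b), 5.seconds)
```

A block whose environment is, say, a scala.collection.mutable.ArrayBuffer[Int] would be rejected by future at compile time in this sketch, because no ThreadSafe instance is defined for that type.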
blocks is to support flexible, portable, and safe serialization based on type classes/contextual abstractions
  – Flexibility: enable integration with different serialization frameworks (uPickle, Java serialization, Kryo, Jackson, etc.)
  – Portability: support multiple backends/runtime environments
  – Safety: serializability is determined statically
• Assumptions:
  – Serialization is primarily used for communication between remote nodes
  – Every node is running the same code
  – No transmission of byte code or source code
code of a block, what's serialized is
  – a unique identifier that enables instantiating the implementation of the block; and
  – the block's environment.
• In practice:
  – Create block using a named block builder
  – Block builder identifies the block's implementation
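The idea of sending an identifier plus an environment, rather than code, can be sketched as follows. All names here are hypothetical stand-ins: the explicit registry and the "name|env" wire format are simplifying assumptions, whereas the library described in this talk uses the builder's fully-qualified name and type-class-based serialization of the environment.

```scala
// Hypothetical builder interface: creates the block's function from an environment.
trait Builder[E, T, R]:
  def build(env: E): T => R

// A named, top-level builder identifies the block's implementation.
object PrependBuilder extends Builder[Int, List[Int], List[Int]]:
  def build(env: Int): List[Int] => List[Int] = xs => env :: xs

// Assumption: every node runs the same code, so the same registry exists everywhere.
val registry: Map[String, Builder[Int, List[Int], List[Int]]] =
  Map("PrependBuilder" -> PrependBuilder)

// Sender side: only the builder's identifier and the environment go on the wire.
def pack(builderName: String, env: Int): String = s"$builderName|$env"

// Receiver side: look up the builder by name and re-create the function.
def unpack(wire: String): List[Int] => List[Int] =
  wire.split('|') match
    case Array(name, env) => registry(name).build(env.toInt)
    case _ => throw IllegalArgumentException(s"malformed data: $wire")
```

For example, unpack(pack("PrependBuilder", 1)) yields a function that prepends 1 to its argument list, without any byte code or source code having been transmitted.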
with target type PackedBlockData:
    val unpickledData = read[PackedBlockData](pickled)
  – Note: PackedBlockData abstracts from type of environment!
• Step 2: convert PackedBlockData to block:
    val unpickledBlock = unpickledData.toBlock[List[Int], List[Int]]
body is a function returning a context function [1]
• This means: EnvAsParam[E] parameter does not appear in user code!
    object Block:
      def apply[E, T, R](initEnv: E)(body: T => EnvAsParam[E] ?=> R) =
        new Block[T, R] {
          type Env = E
          def apply(x: T): R = body(x)(using initEnv)   // environment passed to context parameter
          ...
        }
E compatible with context parameter of type EnvAsParam[E]?
    opaque type EnvAsParam[T] = T   // within Block object
• Environment access:
    def env[E](using ep: EnvAsParam[E]): E = ep
• Why use an opaque type alias?
  – To permit any type for the environment, including types for which there are user-defined givens (implicits) in scope!
  – Without the opaque type alias, the env method could return a user-defined given that happens to have the same type as the environment
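The interplay of the context function and the opaque type alias can be tried out with a macro-free variant. This is a sketch only: unlike the macro-based design described in this talk, it does not check that the body captures nothing, and the demo function is a hypothetical usage example.

```scala
trait Block[-T, +R] extends (T => R) { type Env }

object Block:
  // Opaque outside this object: a given String is never a given EnvAsParam[String].
  opaque type EnvAsParam[T] = T

  // Accessor for the environment inside a block body.
  def env[E](using ep: EnvAsParam[E]): E = ep

  // Macro-free variant (an assumption): does NOT check that `body` captures nothing.
  def apply[E, T, R](initEnv: E)(body: T => EnvAsParam[E] ?=> R): Block[T, R] { type Env = E } =
    new Block[T, R] {
      type Env = E
      // Inside object Block, EnvAsParam[E] is known to equal E.
      def apply(x: T): R = body(x)(using initEnv)
    }

def demo(): Int =
  import Block.env
  // A user-defined given of the same type as the environment.
  given String = "user-defined given"
  val b = Block("abc") { (x: Int) => x + env.length }
  b(1)
```

In demo, env resolves to the block's environment "abc" rather than to the user-defined given String, because outside of object Block the type EnvAsParam[String] is unrelated to String.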
checking that the body of the block does not capture any variable
  – Environment is accessed using env (which is not captured!)
• Capture checking is done using a macro [2,3]:
    inline def apply[E, T, R](inline initEnv: E)
        (inline body: T => EnvAsParam[E] ?=> R): Block[T, R] { type Env = E } =
      ${ applyCode('initEnv)('body) }
serializable form of a block:
    val data = BlockData(PrependBuilder, Some(num))
    val pickled = write(data)
• BlockData construction uses a macro to:
  – check that argument builder is a top-level object
  – obtain fully-qualified name of builder object
• Serialization of BlockData object consists of:
  – fully-qualified name of builder object
  – serialized environment
special env member
• This is suboptimal when environment is a tuple
  – Needed when environment consists of multiple values
• Example:
    val s = "anonymous function"
    val i = 5
    Block((s, i)) { (x: Int) => x + env._1.length - env._2 }
the environment as an explicit parameter instead of a context parameter
• Thus, user code needs one more parameter
• However, it enables the use of pattern matching instead of env._1, env._2, ...:
    val s = "anonymous function"
    val i = 5
    Block((s, i)) {
      case (l, r) => (x: Int) => x + l.length - r
    }
  – Can also use parameter untupling
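A macro-free sketch of this alternative design is shown below. The curried signature E => T => R is an assumption chosen to match the usage on the slide, and, as in any macro-free variant, nothing checks that the body avoids capturing other variables.

```scala
trait Block[-T, +R] extends (T => R) { type Env }

object Block:
  // Environment is an ordinary first parameter of `body` (hypothetical signature).
  def apply[E, T, R](initEnv: E)(body: E => T => R): Block[T, R] { type Env = E } =
    new Block[T, R] {
      type Env = E
      def apply(x: T): R = body(initEnv)(x)
    }

def demo(): Int =
  val s = "anonymous function"
  val i = 5
  // The tuple environment is destructured by pattern matching.
  val b = Block((s, i)) { case (l, r) => (x: Int) => x + l.length - r }
  b(1)
```

Here demo computes 1 + 18 - 5, since "anonymous function" has 18 characters. The trade-off is visible in the user code: the names l and r replace env._1 and env._2, at the cost of one extra parameter in every block body.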
closely related work: Spores [4]
  – Implementation of Spores depends on experimental macro system of Scala 2.11
    • Scala 3 has a new, incompatible macro system
  – Portability: there was only experimental Scala.js support for Spores which excluded serialization
  – Blocks explore a unique combination of language features introduced in Scala 3 (context functions, opaque types, and unified macro/multi-stage programming system) in order to reduce complexity of macros
introduces a new type constructor Static:
  – A value of type Static t can be serialized without knowing how to serialize t (e.g., a function without free variables can be serialized as a symbolic code address; type Static (a -> b))
• A term static e has type Static t iff e has type t and all of e's free variables are top-level
  – Key idea for serialization: at closure construction time, serialize environment and look up deserializer (which must be a top-level definition)
• Blocks: (a) environment does not have to be serialized at closure construction time (since type of environment tracked) and (b) deserializer does not have to be a top-level definition
tracks captures in types
  – Introduces pure closures which don't capture any capabilities
    • Capability = variable with capability type
  – Blocks require checking a stronger property: the body closure of a block must not capture any variable; a restriction to capabilities is not sufficient
with explicit environments
  – Environment type is part of block type
  – Safety ensured using macros
  – Safe and portable serialization based on type classes
• Implemented as a library for Scala 3 (https://github.com/phaller/blocks)
  – Explores unique combination of language features
  – Designed to be portable:
    • Scala.js support from the beginning, Scala Native support planned
[1] Odersky, Blanvillain, Liu, Biboudis, Miller, Stucki. Simplicitly: foundations and applications of implicit function types. Proc. ACM Program. Lang. 2(POPL): 42:1-42:29 (2018)
[2] Stucki, Biboudis, Odersky. A practical unification of multi-stage programming and macros. GPCE 2018
[3] Stucki, Brachthäuser, Odersky. Virtual ADTs for portable metaprogramming. MPLR 2021
[4] Miller, Haller, Odersky. Spores: a type-based foundation for closures in the age of concurrency and distribution. ECOOP 2014
[5] Epstein, Black, Peyton Jones. Towards Haskell in the cloud. Haskell Symposium 2011
[6] Boruch-Gruszecki, Brachthäuser, Lee, Lhoták, Odersky. Tracking Captured Variables in Types. CoRR abs/2105.11896 (2021)