Slide 1

Slide 1 text

How to avoid safety hazards when using closures in Scala
Philipp Haller
Associate Professor, EECS School and Digital Futures
KTH Royal Institute of Technology, Stockholm, Sweden
Strange Loop 2022, September 23rd, 2022
Union Station Hotel, St. Louis, Missouri, USA

Slide 2

Slide 2 text

Philipp Haller: Background
• Since 2018 Associate Professor at KTH Royal Inst. of Tech.
  – 2014–2018 Assistant Professor at KTH
• 2005–2014 Scala language team
  – PhD 2010 EPFL, Switzerland
  – 2012–2014 Typesafe, Inc. (now Lightbend, Inc.)
• Focus on concurrent and distributed programming
  – Creator of Scala Actors, co-author of Scala’s futures and Scala Async

2019: ACM SIGPLAN Programming Languages Software Award for Scala
Core contributors: Martin Odersky, Adriaan Moors, Aleksandar Prokopec, Heather Miller, Iulian Dragos, Nada Amin, Philipp Haller, Sebastien Doeraene, Tiark Rompf

Slide 3

Slide 3 text

Outline
• Closures are essential
• The many ways in which closure-using code can go wrong
  – And how to spot unsafe code
• How to write safer closure-using code
• Spores3: safer and more flexible closures for Scala 3
• Implementing serialization in Spores3
• Summary

Slide 4

Slide 4 text

What’s a closure?
• Important: we assume lexically scoped name binding
• The anonymous function refers to local variable “threshold” in its lexical context
• Closure = anonymous function “whose open bindings (free variables) have been closed by the lexical environment” (Peter J. Landin)

    val numbers = List(6, 3, 9, 2, 4)
    val threshold = 5
    val below = numbers.filter(num => num < threshold)
    assert(below == List(3, 2, 4))

Slide 5

Slide 5 text

Closures are essential
• Context: data processing engines like Apache Spark™

    val textFile = spark.read.textFile("README.md")
    textFile
      .map(line => line.split(" ").size)
      .reduce((a, b) => Math.max(a, b))

Slide 6

Slide 6 text

Closures are essential
• Context: concurrent programming in Java

    Future<Double> averageAgeAsync(List<Customer> customers) {
      Callable<Double> task = () -> {
        var averageAge = customers.stream()
          .mapToInt(Customer::getAge)
          .average().getAsDouble();
        return averageAge;
      };
      return executor.submit(task);
    }

Slide 8

Slide 8 text

Trouble in Paradise
Using Apache Spark™:

    class SparkExample {
      val distData = sc.parallelize(Array(1, 2, 3, 4, 5))
      def transform(x: Int): Int = x+1
      def test(): Unit = {
        val transformed = distData.map(elem => transform(elem))
        transformed.collect().foreach(elem => println(elem))
      }
    }

Result:

    Exception in thread "main" org.apache.spark.SparkException: Task not serializable ...

Slide 9

Slide 9 text

Trouble in Paradise

    class SparkExample {
      val distData = sc.parallelize(Array(1, 2, 3, 4, 5))
      def transform(x: Int): Int = x+1
      def test(): Unit = {
        val transformed = distData.map(elem => transform(elem))
        transformed.collect().foreach(elem => println(elem))
      }
    }

• distData is a distributed data set
  → each remote worker has a piece of the data
• map sends its argument closure to each worker
  → argument closure must be serialized
• The argument closure uses transform from the enclosing scope

Slide 10

Slide 10 text

Trouble in Paradise

    class SparkExample {
      val distData = sc.parallelize(Array(1, 2, 3, 4, 5))
      def transform(x: Int): Int = x+1
      def test(): Unit = {
        val transformed = distData.map(elem => this.transform(elem))
        transformed.collect().foreach(elem => println(elem))
      }
    }

• distData is a distributed data set
  → each remote worker has a piece of the data
• map sends its argument closure to each worker
  → argument closure must be serialized
• Actually: the closure captures this!
  – this must be serialized when the closure is shipped to the remote workers
  – The type of this is SparkExample, which is not serializable, hence...

    Exception in thread "main" org.apache.spark.SparkException: Task not serializable ...
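
The capture of this can be reproduced without Spark. The following is a minimal sketch using plain Java serialization; CaptureDemo, Example and isSerializable are hypothetical names made up for illustration, and the sketch assumes a Scala version (2.12 or later) in which function literals compile to serializable lambdas. A common workaround is shown as well: copy the needed logic into a local value so that the closure no longer refers to the enclosing instance.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object CaptureDemo {
  /** Returns true if `obj` can be written with Java serialization. */
  def isSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  // Hypothetical stand-in for SparkExample: the class is not Serializable.
  class Example {
    def transform(x: Int): Int = x + 1

    // Calls an instance method, so the closure captures `this`.
    def capturingThis: Int => Int = elem => transform(elem)

    // The logic lives in a local function value; the closure captures
    // only `f`, not the enclosing Example instance.
    def capturingLocal: Int => Int = {
      val f: Int => Int = x => x + 1
      elem => f(elem)
    }
  }
}
```

Under these assumptions, serializing capturingThis fails because the captured Example instance is not serializable, while capturingLocal can be written out.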

Slide 11

Slide 11 text

Problematic uses of closures
• Using closures in distributed settings is a safety risk
  – Example: serializing closures can result in runtime errors (e.g., java.io.NotSerializableException on the JVM)
• What about concurrency?

Slide 12

Slide 12 text

Closures and concurrency
• Let's revisit our earlier example in Java!

    Future<Double> averageAgeAsync(List<Customer> customers) {
      Callable<Double> task = () -> {
        var averageAge = customers.stream()
          .mapToInt(Customer::getAge)
          .average().getAsDouble();
        return averageAge;
      };
      return executor.submit(task);
    }

• The closure accesses customers, which might be mutated concurrently
• Possible data race!

Slide 14

Slide 14 text

Problematic uses of closures
• Using closures in distributed settings is a safety risk
  – Example: serializing closures can result in runtime errors (e.g., java.io.NotSerializableException on the JVM)
• Using closures in concurrent settings is a safety risk
  – Example: running a closure on a concurrent thread could cause a data race if a captured variable refers to a shared mutable object
• Anything else?
  – Say, you want to send a closure from a frontend running on a JavaScript engine to a backend running on a JVM
    • requires a portable serialization scheme!

Slide 15

Slide 15 text

Observations
• Safety issues stem from unrestricted variable capture
  – Concurrency: capturing and accessing shared mutable objects
  – Distribution: capturing references to non-serializable objects
• Potential remedies:
  – Restricting types of captured variables
    • For example, permit only types known to be serializable
  – Provide more capturing modes
    • For example, deeply clone mutable objects upon capture

Slide 16

Slide 16 text

How to spot unsafe code using closures?
• Key: the environment of the closure: its captured variables
• Some closure code smells:
  – Capturing vars
    • Re-assigned within closure body?
    • In Java, captured variables must be final or effectively final
  – Potentially unsafe types of captured variables
    • Mutable types
    • Types that don’t mesh well with distribution
      – Not serializable, not accessible remotely
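
The first code smell, a captured var that is re-assigned inside the closure body, can be illustrated with a small sketch. VarCaptureDemo and countCalls are hypothetical names used only for this example.

```scala
object VarCaptureDemo {
  // A captured `var` is shared between the closure and the enclosing scope:
  // every call of the closure changes what the surrounding code observes.
  def countCalls(): (Int, Int) = {
    var count = 0
    val increment: () => Unit = () => count += 1 // re-assigns a captured var
    increment()
    val seenAfterOneCall = count
    increment()
    (seenAfterOneCall, count)
  }
}
```

If such a closure were handed to another thread, the re-assignment would be an unsynchronized write to shared state.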

Slide 18

Slide 18 text

How to write safer closure-using code?
• Basics first: need to be able to spot the captured variables
  – Closures should not be too big
  – Closures should not capture too many variables
    • 2-4 variables OK, > 7 variables probably not
• Types of captured variables must be safe
  – Prefer (deeply) immutable types
  – Required properties? Serializability? Concurrency safety?

Slide 19

Slide 19 text

How to write safer closure-using code? cont’d
• Verify creation of closure
  – What’s the logical snapshot of the memory that the closure should be initialized with?
  – Example:
    • is it sufficient to initialize the closure’s environment with a copy of a reference?
    • or should mutable objects be cloned first?
• Verify semantics of closure’s execution
  – Mutation of environment? Transactional semantics?
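
The difference between copying a reference and taking a snapshot can be sketched as follows. SnapshotDemo and its members are hypothetical names; the mutation happens on the same thread here so that the outcome is deterministic.

```scala
import scala.collection.mutable.ArrayBuffer

object SnapshotDemo {
  def run(): (Int, Int) = {
    val ages = ArrayBuffer(30, 40)

    // Environment holds a copy of the reference: later mutations are visible.
    val sumByReference: () => Int = () => ages.sum

    // Environment holds an immutable snapshot taken when the closure is created.
    val snapshot: List[Int] = ages.toList
    val sumBySnapshot: () => Int = () => snapshot.sum

    ages += 50 // mutation after both closures were created

    (sumByReference(), sumBySnapshot())
  }
}
```

The first closure observes the later mutation (sum 120), the second keeps the state it was created with (sum 70). Which behavior is intended is exactly the question to verify when the closure is created.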

Slide 21

Slide 21 text

Spores3
Goals: An abstraction that makes closures safer and more flexible
Requirements:
  – Enable constraining the environment (the captured variables) using types
  – Support serialization based on type classes
  – Enable a portable implementation, including serialization
  – Minimize the use of macros

Slide 22

Slide 22 text

Idea: Spores
• Introduce an abstraction, called “spore” [1], which can be seen as a special kind of closure
• Spores:
  – have an explicit environment
  – track the type of their environment using a type refinement, enabling type-based constraints
  – enable operations on their environment, for example, for serialization and duplication/cloning

[1] Miller, Haller, and Odersky. Spores: a type-based foundation for closures in the age of concurrency and distribution. ECOOP 2014 (Google Scholar: 47 citations)

Slide 23

Slide 23 text

New: Spores3
• A completely new implementation of spores for Scala 3
• Addresses limitations of original spores for Scala 2:
  – Macro usage by Spores3 is simple and robust
    • Essentially limited to compile-time checks
  – Spores3 is portable from the beginning
• Proposes a novel approach to serialization based on type classes
  – Flexible, portable and safe

Slide 24

Slide 24 text

Overview
• A simple spore without environment:

    val s = Spore((x: Int) => x + 2)

  The function literal is not permitted to capture anything!
• The above spore has the following type:

    Spore[Int, Int] { type Env = Nothing }

• Spore types are subtypes of corresponding function types:

    sealed trait Spore[-T, +R] extends (T => R) {
      type Env
    }

Slide 25

Slide 25 text

Spores with environments
• The environment of a spore is initialized explicitly:

    val str = "anonymous function"
    val s2 = Spore(str) {
      env => (x: Int) => x + env.length
    }

  – Environment initialized with argument str
  – Environment accessed using extra parameter
• The above spore s2 has type:

    Spore[Int, Int] { type Env = String }
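
How an explicit environment can be recorded in a type refinement is sketched below with a simplified model. MiniSpore and MiniSporeDemo are hypothetical names; this is not the Spores3 implementation, and unlike Spores3 it performs no check that the function body captures nothing else.

```scala
object MiniSporeDemo {
  // Hypothetical, simplified model of a spore; not the Spores3 implementation.
  trait MiniSpore[-T, +R] extends (T => R) {
    type Env
  }

  object MiniSpore {
    // The environment is passed explicitly and its type E is recorded
    // in the result type via the refinement { type Env = E }.
    def apply[E, T, R](env: E)(body: E => T => R): MiniSpore[T, R] { type Env = E } =
      new MiniSpore[T, R] {
        type Env = E
        def apply(x: T): R = body(env)(x)
      }
  }

  val str = "anonymous function"

  // The annotation shows the refined type; it would also be inferred.
  val s2: MiniSpore[Int, Int] { type Env = String } =
    MiniSpore(str) { env => (x: Int) => x + env.length }
}
```

Because MiniSpore extends the function type, s2 can be applied like an ordinary function, while its type still exposes the environment type.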

Slide 26

Slide 26 text

Why use an extra parameter for the environment?
• User code needs one more parameter…
• Yes, but it enables the use of pattern matching (instead of, say, env._1, env._2, …):

    val s = "anonymous function"
    val i = 5
    Spore((s, i)) {
      case (str, num) => (x: Int) => x + str.length - num
    }

Slide 27

Slide 27 text

Type-based constraints
• The Env type member of the Spore trait enables expressing type-based constraints on the spore's environment using context parameters
• Example: require a spore parameter to only capture thread-safe types:

    /* Run spore `s` concurrently, immediately returning a future
     * which is eventually completed with the result of type `T`.
     */
    def future[T](s: Spore[Unit, T])(using ThreadSafe[s.Env]): Future[T] = ...

• Thread-safe types are types for which instances of type class ThreadSafe exist
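
A self-contained sketch of such a constraint, written for Scala 3. MiniSpore is a hypothetical, simplified stand-in for the Spore trait, and the ThreadSafe instances below are assumptions chosen for illustration, not definitions taken from Spores3.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.*

object ThreadSafeDemo {
  // Hypothetical, simplified spore model; not the Spores3 API.
  trait MiniSpore[-T, +R] extends (T => R) {
    type Env
  }

  object MiniSpore {
    def apply[E, T, R](env: E)(body: E => T => R): MiniSpore[T, R] { type Env = E } =
      new MiniSpore[T, R] {
        type Env = E
        def apply(x: T): R = body(env)(x)
      }
  }

  // Type class marking environment types that are safe to share across threads.
  trait ThreadSafe[E]
  given ThreadSafe[String] = new ThreadSafe[String] {}
  given ThreadSafe[Int] = new ThreadSafe[Int] {}

  // Compiles only if a ThreadSafe instance exists for the environment type s.Env.
  def future[T](s: MiniSpore[Unit, T])(using ThreadSafe[s.Env]): Future[T] =
    Future(s(()))(using ExecutionContext.global)

  val greeting = MiniSpore("hello") { env => (_: Unit) => env.length }

  def result: Int = Await.result(future(greeting), 5.seconds)

  // A spore whose Env is, say, a mutable ArrayBuffer would be rejected at
  // compile time, because no given ThreadSafe instance exists for that type.
}
```

The check happens entirely at compile time: the call future(greeting) resolves a ThreadSafe[String] instance from the refinement on greeting's type.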

Slide 28

Slide 28 text

Serialization
• One of the design goals for Spores3 is to support serialization based on type classes/contextual abstractions
  – Flexibility: enable integration with different serialization frameworks (uPickle, Java serialization, Kryo, Jackson, etc.)
  – Portability: support multiple backends/runtime environments
  – Safety: serializability is determined statically
• Assumptions:
  – Serialization is primarily used for communication between remote nodes
  – Every node is running the same code
  – No transmission of byte code or source code

Slide 29

Slide 29 text

Serialization in Spores3: Approach
• Instead of serializing the code of a spore, what's serialized is
  – a unique identifier that enables instantiating the implementation of the spore; and
  – the spore's environment.
• In practice:
  – Create spore using a named spore builder
  – Spore builder identifies the spore's implementation
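
The idea of shipping an identifier plus an environment instead of code can be sketched without any library. Everything below (SporeIdDemo, Builder, the registry, and the "name|env" wire format) is a hypothetical simplification for illustration and not how Spores3 encodes its data.

```scala
object SporeIdDemo {
  // Hypothetical model: a builder pairs a unique name with code that turns
  // a serialized environment back into a function.
  trait Builder[T, R] {
    def name: String
    def fromEnv(serializedEnv: String): T => R
  }

  object Prepend extends Builder[List[Int], List[Int]] {
    val name = "demo.Prepend"
    def fromEnv(serializedEnv: String): List[Int] => List[Int] = {
      val env = serializedEnv.toInt // deserialize the environment
      xs => env :: xs
    }
  }

  // Every node runs the same code, so both sides know the same builders.
  val registry: Map[String, Builder[List[Int], List[Int]]] =
    Map(Prepend.name -> Prepend)

  // Wire format: identifier and serialized environment; no code is transmitted.
  def serialize(builderName: String, env: Int): String =
    s"$builderName|$env"

  def deserialize(wire: String): List[Int] => List[Int] = {
    val sep = wire.indexOf('|')
    val id = wire.substring(0, sep)
    val env = wire.substring(sep + 1)
    registry(id).fromEnv(env)
  }
}
```

The receiving side looks up the builder by name and re-creates the function from the transmitted environment, which relies on the stated assumption that every node runs the same code.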

Slide 30

Slide 30 text

Serializing spores: Example
• Step 1: define spore using spore builder:

    object Prepend extends Spore.Builder[Int, List[Int], List[Int]](
      env => (xs: List[Int]) => env :: xs
    )

  The spore prepends its environment to the list parameter.
• Step 2: create serializable representation of spore:

    val num: Int = ...
    val data = SporeData(Prepend, Some(num))

  Some(num) is the environment.

Slide 31

Slide 31 text

Serializing spores: Example (cont'd)
• Step 3: pickle SporeData (here, using uPickle):

    import upickle.default.*
    import com.phaller.spores.upickle.given

    val data = SporeData(Prepend, Some(num))
    val pickled = write(data)

• Output (JSON):

    ["com.example.Prepend",1,""]

  1 = non-empty environment

Slide 32

Slide 32 text

Enforcing safe serialization
• Example: A method sendOff that serializes a spore and sends it across the network to a remote executor
• Solution:

    def sendOff[N, S <: SporeData[T, T] { type Env = N }]
        (sporeData: S)(using ReadWriter[S]): Unit = { ... }

Slide 33

Slide 33 text

Enforcing safe serialization
• Example: A method sendOff that serializes a spore and sends it across the network to a remote executor
• Solution (using a context bound):

    def sendOff[N, S <: SporeData[T, T] { type Env = N } : ReadWriter]
        (sporeData: S): Unit = { ... }
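
The two sendOff variants differ only in notation: in Scala 3 a context bound is shorthand for a using clause. A minimal sketch of that equivalence, with a hypothetical Encoder type class standing in for ReadWriter:

```scala
object ContextBoundDemo {
  // Hypothetical type class used only for illustration.
  trait Encoder[A] {
    def encode(a: A): String
  }

  given Encoder[Int] = new Encoder[Int] {
    def encode(a: Int): String = s"int:$a"
  }

  // Explicit using clause.
  def sendWithUsing[A](value: A)(using enc: Encoder[A]): String =
    enc.encode(value)

  // Context bound: the compiler adds an equivalent context parameter,
  // which the body retrieves with summon.
  def sendWithBound[A: Encoder](value: A): String =
    summon[Encoder[A]].encode(value)
}
```

Both methods require an Encoder instance at the call site; a call with a type that has no instance is rejected at compile time.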

Slide 34

Slide 34 text

Deserializing spores
• Step 1: read pickled data with target type PackedSporeData:

    val unpickledData = read[PackedSporeData](pickled)

  – Note: PackedSporeData abstracts from type of environment!
• Step 2: convert PackedSporeData to spore:

    val unpickledSpore = unpickledData.toSpore[List[Int], List[Int]]

Slide 36

Slide 36 text

Implementing serialization
• Recall creation of spore builder:

    object Prepend extends Spore.Builder[Int, List[Int], List[Int]](
      env => (xs: List[Int]) => env :: xs
    )

• Environment serializer + deserializer obtained when builder is constructed:

    class Builder[E, T, R](body: E => T => R)
        (using ReadWriter[E]) extends TypedBuilder[E, T, R]:

      def createSpore(envOpt: Option[String]): Spore[T, R] =
        val initEnv = read[E](envOpt.get)
        apply(initEnv)

Slide 38

Slide 38 text

Implementing serialization
• Recall creation of spore builder:

    object Prepend extends Spore.Builder[Int, List[Int], List[Int]](
      env => (xs: List[Int]) => env :: xs
    )

• Environment serializer + deserializer obtained when builder is constructed:

    class Builder[E, T, R](body: E => T => R)
        (using ReadWriter[E]) extends TypedBuilder[E, T, R]:

      def createSpore(envOpt: Option[String]): Spore[T, R] =
        val initEnv = read[E](envOpt.get)
        apply(initEnv)

• The call read[E](envOpt.get) deserializes the environment

Slide 39

Slide 39 text

Implementing serialization (2)
• Recall step 2: creating serializable form of a spore:

    val data = SporeData(Prepend, Some(num))
    val pickled = write(data)

• SporeData factory uses a macro to:
  – check that argument builder is a top-level object
  – obtain fully-qualified name of builder object
• Serialization of SporeData instance consists of:
  – fully-qualified name of builder object
  – serialized environment

Slide 40

Slide 40 text

Spores3: Implementation status
• Open source implementation (Apache License 2.0)
  – GitHub repository: https://github.com/phaller/spores3
• Supports Scala/JVM and Scala.js
  – Scala Native planned
• Out-of-the-box integration with uPickle
  – “lightweight JSON and binary (MessagePack) serialization library for Scala”
• Integration with other serialization libraries planned

Slide 42

Slide 42 text

Summary
• Using closures in distributed and concurrent settings is a safety risk
• How to write safer closure-using code:
  – Check captured variables, their types, and capturing semantics
• Spores3: safer and more flexible closures for Scala 3
  – A completely new implementation of spores for Scala 3
  – Explicit environment, tracked using type refinement
  – Type-based environment constraints
  – Flexible, portable and safe serialization

Thank You!
@philippkhaller
github.com/phaller/spores3