How to avoid safety hazards when using closures in Scala

Philipp Haller
September 23, 2022

Transcript

  1. How to avoid safety hazards when using closures in Scala

    Philipp Haller
    Associate Professor, EECS School and Digital Futures
    KTH Royal Institute of Technology, Stockholm, Sweden

    Strange Loop 2022, September 23rd, 2022
    Union Station Hotel, St. Louis, Missouri, USA
  2. Philipp Haller: Background

    • Since 2018 Associate Professor at KTH Royal Inst. of Tech.
      – 2014–2018 Assistant Professor at KTH
    • 2005–2014 Scala language team
      – PhD 2010 EPFL, Switzerland
      – 2012–2014 Typesafe, Inc. (now Lightbend, Inc.)
    • Focus on concurrent and distributed programming
      – Creator of Scala Actors, co-author of Scala’s futures and Scala Async

    2019: ACM SIGPLAN Programming Languages Software Award for Scala
    Core contributors: Martin Odersky, Adriaan Moors, Aleksandar Prokopec, Heather Miller, Iulian Dragos, Nada Amin, Philipp Haller, Sebastien Doeraene, Tiark Rompf
  3. Outline

    • Closures are essential
    • The many ways in which closure-using code can go wrong
      – And how to spot unsafe code
    • How to write safer closure-using code
    • Spores3: safer and more flexible closures for Scala 3
    • Implementing serialization in Spores3
    • Summary
  4. What’s a closure?

    • Important: we assume lexically scoped name binding
    • The anonymous function refers to local variable “threshold” in its lexical context
    • Closure = anonymous function “whose open bindings (free variables) have been closed by the lexical environment” (Peter J. Landin)

        val numbers = List(6, 3, 9, 2, 4)
        val threshold = 5
        val below = numbers.filter(num => num < threshold)
        assert(below == List(3, 2, 4))
  5. Closures are essential

    • Context: data processing engines like Apache Spark™

        val textFile = spark.read.textFile("README.md")
        textFile
          .map(line => line.split(" ").size)
          .reduce((a, b) => Math.max(a, b))
  6. Closures are essential

    • Context: concurrent programming in Java

        Future<Double> averageAgeAsync(List<Customer> customers) {
            Callable<Double> task = () -> {
                var averageAge = customers.stream()
                    .mapToInt(Customer::getAge)
                    .average().getAsDouble();
                return averageAge;
            };
            return executor.submit(task);
        }
  8. Trouble in Paradise

    Using Apache Spark™:

        class SparkExample {
          val distData = sc.parallelize(Array(1, 2, 3, 4, 5))
          def transform(x: Int): Int = x+1
          def test(): Unit = {
            val transformed = distData.map(elem => transform(elem))
            transformed.collect().foreach(elem => println(elem))
          }
        }

        Exception in thread "main" org.apache.spark.SparkException: Task not serializable ...
  9. Trouble in Paradise

        class SparkExample {
          val distData = sc.parallelize(Array(1, 2, 3, 4, 5))
          def transform(x: Int): Int = x+1
          def test(): Unit = {
            val transformed = distData.map(elem => transform(elem))
            transformed.collect().foreach(elem => println(elem))
          }
        }

    • distData is a distributed data set
      → each remote worker has a piece of the data
    • map sends its argument closure to each worker
      → argument closure must be serialized

    The argument closure uses transform from the enclosing scope.
  10. Trouble in Paradise

        class SparkExample {
          val distData = sc.parallelize(Array(1, 2, 3, 4, 5))
          def transform(x: Int): Int = x+1
          def test(): Unit = {
            val transformed = distData.map(elem => this.transform(elem))
            transformed.collect().foreach(elem => println(elem))
          }
        }

    • distData is a distributed data set
      → each remote worker has a piece of the data
    • map sends its argument closure to each worker
      → argument closure must be serialized

    Actually: closure captures this!
    • this must be serialized when the closure is shipped to the remote workers
    • The type of this is SparkExample which is not serializable, hence...

        Exception in thread "main" org.apache.spark.SparkException: Task not serializable ...
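
The capture of this can be reproduced without Spark. The following is a minimal sketch under stated assumptions, not code from the talk: Pipeline and SerializationCheck are hypothetical names, and plain Java serialization stands in for whatever a framework does when it ships a task. It relies on Scala (2.12 and later) compiling function literals to serializable lambdas.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical helper: tries Java serialization and reports whether it succeeded.
object SerializationCheck {
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}

// Hypothetical stand-in for SparkExample; deliberately NOT Serializable.
class Pipeline {
  def transform(x: Int): Int = x + 1

  // Calls a method of the enclosing instance, so the closure captures `this`.
  def capturingThis: Int => Int = elem => transform(elem)

  // Captures only a local function value; `this` is not part of the environment.
  def capturingLocal: Int => Int = {
    val f: Int => Int = x => x + 1
    elem => f(elem)
  }
}
```

Calling SerializationCheck.isSerializable on new Pipeline().capturingThis is expected to fail because the non-serializable Pipeline instance is part of the closure's environment, whereas capturingLocal carries only a serializable function value. Copying what the closure needs into a local value is a common workaround, but it depends on the programmer noticing the capture in the first place.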
  11. Problematic uses of closures

    • Using closures in distributed settings is a safety risk
      – Example: serializing closures can result in runtime errors (e.g., java.io.NotSerializableException on the JVM)
    • What about concurrency?
  12. Closures and concurrency

    • Let's revisit our earlier example in Java!

        Future<Double> averageAgeAsync(List<Customer> customers) {
            Callable<Double> task = () -> {
                var averageAge = customers.stream()
                    .mapToInt(Customer::getAge)
                    .average().getAsDouble();
                return averageAge;
            };
            return executor.submit(task);
        }

    The closure accesses customers, which might be mutated concurrently. Possible data race!
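
One way to avoid this kind of race is to hand the closure an immutable snapshot instead of the shared mutable collection. The sketch below illustrates that idea in Scala under assumptions of my own: SnapshotCapture, Customer and demo are hypothetical names, and the snapshot is only safe because Customer itself is immutable.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object SnapshotCapture {
  // Hypothetical, immutable customer record.
  final case class Customer(age: Int)

  def averageAgeAsync(customers: ArrayBuffer[Customer])(implicit ec: ExecutionContext): Future[Double] = {
    // Copy into an immutable List on the caller's thread, before the closure is created.
    val snapshot: List[Customer] = customers.toList
    // The closure captures only `snapshot`, never the mutable buffer.
    // An empty snapshot yields NaN; a real implementation would handle that case.
    Future(snapshot.map(_.age).sum.toDouble / snapshot.size)
  }

  def demo(): Double = {
    import ExecutionContext.Implicits.global
    val customers = ArrayBuffer(Customer(20), Customer(40))
    val result = averageAgeAsync(customers)
    customers += Customer(90) // later mutation cannot affect the snapshot
    Await.result(result, 5.seconds)
  }
}
```

The copy costs time and memory proportional to the size of the collection, and a shallow copy is not enough when the elements themselves are mutable.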
  14. Problematic uses of closures

    • Using closures in distributed settings is a safety risk
      – Example: serializing closures can result in runtime errors (e.g., java.io.NotSerializableException on the JVM)
    • Using closures in concurrent settings is a safety risk
      – Example: running a closure on a concurrent thread could cause a data race if a captured variable refers to a shared mutable object
    • Anything else?
      – Say, you want to send a closure from a frontend running on a JavaScript engine to a backend running on a JVM
        • requires a portable serialization scheme!
  15. Observations

    • Safety issues stem from unrestricted variable capture
      – Concurrency: capturing and accessing shared mutable objects
      – Distribution: capturing references to non-serializable objects
    • Potential remedies:
      – Restricting types of captured variables
        • For example, permit only types known to be serializable
      – Provide more capturing modes
        • For example, deeply clone mutable objects upon capture
  16. How to spot unsafe code using closures?

    • Key: the environment of the closure: its captured variables
    • Some closure code smells:
      – Capturing vars
        • Re-assigned within closure body?
        • In Java, captured variables must be final (or effectively final)
      – Potentially unsafe types of captured variables
        • Mutable types
        • Types that don’t mesh well with distribution
          – Not serializable, not accessible remotely
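
To see why a captured var is a smell, consider a variant of the earlier threshold example. This is an illustrative sketch with hypothetical names (VarCapture, demo), not code from the talk: in Scala the closure captures the variable itself, so a later re-assignment changes what the closure computes.

```scala
object VarCapture {
  def demo(): (List[Int], List[Int]) = {
    val numbers = List(6, 3, 9, 2, 4)
    var threshold = 5
    // The closure captures the variable `threshold`, not the value 5.
    val below: Int => Boolean = num => num < threshold
    val before = numbers.filter(below) // evaluated with threshold == 5
    threshold = 3                      // re-assignment is visible inside the closure
    val after = numbers.filter(below)  // evaluated with threshold == 3
    (before, after)
  }
}
```

Here demo() returns (List(3, 2, 4), List(2)): the same closure gives different answers depending on when it runs, which is exactly what makes it hard to reason about once it executes on another thread or machine.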
  18. How to write safer closure-using code?

    • Basics first: need to be able to spot the captured variables
      – Closures should not be too big
      – Closures should not capture too many variables
        • 2-4 variables OK, > 7 variables probably not
    • Types of captured variables must be safe
      – Prefer (deeply) immutable types
      – Required properties? Serializability? Concurrency safety?
  19. How to write safer closure-using code? (cont’d)

    • Verify creation of closure
      – What’s the logical snapshot of the memory that the closure should be initialized with?
      – Example:
        • is it sufficient to initialize the closure’s environment with a copy of a reference?
        • or should mutable objects be cloned first?
    • Verify semantics of closure’s execution
      – Mutation of environment? Transactional semantics?
  21. Spores3

    Goals: An abstraction that makes closures safer and more flexible

    Requirements:
      – Enable constraining the environment (the captured variables) using types
      – Support serialization based on type classes
      – Enable a portable implementation, including serialization
      – Minimize the use of macros
  22. Idea: Spores

    • Introduce an abstraction, called “spore” [1], which can be seen as a special kind of closure
    • Spores:
      – have an explicit environment
      – track the type of their environment using a type refinement, enabling type-based constraints
      – enable operations on their environment, for example, for serialization and duplication/cloning

    [1] Miller, Haller, and Odersky. Spores: a type-based foundation for closures in the age of concurrency and distribution. ECOOP 2014 (Google Scholar: 47 citations)
  23. New: Spores3

    • A completely new implementation of spores for Scala 3
    • Addresses limitations of original spores for Scala 2:
      – Macro usage by Spores3 is simple and robust
        • Essentially limited to compile-time checks
      – Spores3 is portable from the beginning
    • Proposes a novel approach to serialization based on type classes
      – Flexible, portable and safe
  24. Overview

    • A simple spore without environment:

        val s = Spore((x: Int) => x + 2)

      Function literal not permitted to capture anything!
    • The above spore has the following type:

        Spore[Int, Int] { type Env = Nothing }

    • Spore types are subtypes of corresponding function types:

        sealed trait Spore[-T, +R] extends (T => R) { type Env }
  25. Spores with environments

    • The environment of a spore is initialized explicitly:

        val str = "anonymous function"
        val s2 = Spore(str) { env => (x: Int) => x + env.length }

      – Environment initialized with argument str
      – Environment accessed using extra parameter
    • The above spore s2 has type:

        Spore[Int, Int] { type Env = String }
  26. Why use an extra parameter for the environment?

    • User code needs one more parameter…
    • Yes, but it enables the use of pattern matching (instead of, say, env._1, env._2, …):

        val s = "anonymous function"
        val i = 5
        Spore((s, i)) {
          case (str, num) => (x: Int) => x + str.length - num
        }
  27. Type-based constraints

    • The Env type member of the Spore trait enables expressing type-based constraints on the spore's environment using context parameters
    • Example: require a spore parameter to only capture thread-safe types:

        /* Run spore `s` concurrently, immediately returning a future
         * which is eventually completed with the result of type `T`. */
        def future[T](s: Spore[Unit, T])(using ThreadSafe[s.Env]): Future[T] = ...

    Thread-safe types are types for which instances of type class ThreadSafe exist
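
The mechanism behind such a constraint can be sketched in a few lines of plain Scala without the library. Everything below is a hypothetical stand-in rather than the Spores3 API: MiniSpore, withEnv, runChecked and the ThreadSafe instances are names invented for illustration. The sketch uses implicit parameters so that it also compiles on Scala 2.13; in Scala 3 the same shape is normally written with given and using.

```scala
object MiniSpore {
  // Hypothetical stand-in for a spore: a function that exposes its environment type.
  trait Spore[-T, +R] extends (T => R) { type Env }

  // Builds a spore from an explicit environment; the result type records Env = E.
  def withEnv[E, T, R](env: E)(body: E => T => R): Spore[T, R] { type Env = E } =
    new Spore[T, R] {
      type Env = E
      private val f = body(env)
      def apply(x: T): R = f(x)
    }

  // Hypothetical type class marking environment types considered thread-safe.
  trait ThreadSafe[A]
  object ThreadSafe {
    implicit val stringIsThreadSafe: ThreadSafe[String] = new ThreadSafe[String] {}
    implicit val intIsThreadSafe: ThreadSafe[Int] = new ThreadSafe[Int] {}
  }

  // Accepts a spore only if an instance of ThreadSafe exists for its environment type.
  def runChecked[R](s: Spore[Unit, R])(implicit ev: ThreadSafe[s.Env]): R = s(())

  def demo(): Int = {
    val s = withEnv("anonymous function")(env => (u: Unit) => env.length)
    runChecked(s) // compiles because ThreadSafe[String] exists
  }
}
```

A spore whose environment is, say, a scala.collection.mutable.ArrayBuffer would be rejected by runChecked at compile time in this sketch, because no ThreadSafe instance is defined for that type.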
  28. Serialization

    • One of the design goals for Spores3 is to support serialization based on type classes/contextual abstractions
      – Flexibility: enable integration with different serialization frameworks (uPickle, Java serialization, Kryo, Jackson, etc.)
      – Portability: support multiple backends/runtime environments
      – Safety: serializability is determined statically
    • Assumptions:
      – Serialization is primarily used for communication between remote nodes
      – Every node is running the same code
      – No transmission of byte code or source code
  29. Serialization in Spores3: Approach

    • Instead of serializing the code of a spore, what's serialized is
      – a unique identifier that enables instantiating the implementation of the spore; and
      – the spore's environment.
    • In practice:
      – Create spore using a named spore builder
      – Spore builder identifies the spore's implementation
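
The idea of shipping an identifier plus an environment, rather than code, can be illustrated with a small library-free sketch. It is not how Spores3 is implemented: the explicit registry map, the Wire case class and the string encoding of the environment are simplifying assumptions made for this example, and all names (IdPlusEnv, Builder, pack, unpack) are hypothetical.

```scala
object IdPlusEnv {
  // A builder pairs a stable name with code that rebuilds the function from a serialized environment.
  trait Builder {
    def name: String
    def fromEnv(serializedEnv: String): List[Int] => List[Int]
  }

  // Both sender and receiver are assumed to run this same code.
  object Prepend extends Builder {
    val name: String = "example.Prepend" // hypothetical identifier
    def fromEnv(serializedEnv: String): List[Int] => List[Int] = {
      val env = serializedEnv.toInt
      xs => env :: xs
    }
  }

  // Assumption: an explicit registry stands in for looking up the builder by name.
  val registry: Map[String, Builder] = Map(Prepend.name -> Prepend)

  // What travels over the wire: no code, only a name and the environment.
  final case class Wire(builderName: String, serializedEnv: String)

  def pack(builder: Builder, env: Int): Wire = Wire(builder.name, env.toString)

  def unpack(wire: Wire): List[Int] => List[Int] =
    registry(wire.builderName).fromEnv(wire.serializedEnv)

  def demo(): List[Int] = unpack(pack(Prepend, 1))(List(2, 3))
}
```

In this sketch demo() returns List(1, 2, 3). The design only works under the assumption stated on the previous slide that every node runs the same code, since the receiver must already know the builder named in the message.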
  30. Serializing spores: Example

    • Step 1: define spore using spore builder:

        object Prepend extends Spore.Builder[Int, List[Int], List[Int]](
          env => (xs: List[Int]) => env :: xs
        )

      Prepend environment to list parameter
    • Step 2: create serializable representation of spore:

        val num: Int = ...
        val data = SporeData(Prepend, Some(num))

      Some(num) is the environment
  31. Serializing spores: Example (cont'd)

    • Step 3: pickle SporeData (here, using uPickle):

        import upickle.default.*
        import com.phaller.spores.upickle.given

        val data = SporeData(Prepend, Some(num))
        val pickled = write(data)

    • Output (JSON):

        ["com.example.Prepend",1,"<num>"]

      1 = non-empty environment
  32. Enforcing safe serialization

    • Example: A method sendOff that serializes a spore and sends it across the network to a remote executor
    • Solution:

        def sendOff[N, S <: SporeData[T, T] { type Env = N }]
            (sporeData: S)(using ReadWriter[S]): Unit = { ... }
  33. Enforcing safe serialization

    • Example: A method sendOff that serializes a spore and sends it across the network to a remote executor
    • Solution (using a context bound):

        def sendOff[N, S <: SporeData[T, T] { type Env = N } : ReadWriter]
            (sporeData: S): Unit = { ... }
  34. Deserializing spores

    • Step 1: read pickled data with target type PackedSporeData:

        val unpickledData = read[PackedSporeData](pickled)

      – Note: PackedSporeData abstracts from type of environment!
    • Step 2: convert PackedSporeData to spore:

        val unpickledSpore = unpickledData.toSpore[List[Int], List[Int]]
  38. Implementing serialization

    • Recall creation of spore builder:

        object Prepend extends Spore.Builder[Int, List[Int], List[Int]](
          env => (xs: List[Int]) => env :: xs
        )

    • Environment serializer + deserializer obtained when builder is constructed:

        class Builder[E, T, R](body: E => T => R)(using ReadWriter[E])
            extends TypedBuilder[E, T, R]:

          def createSpore(envOpt: Option[String]): Spore[T, R] =
            val initEnv = read[E](envOpt.get)  // Deserialize environment
            apply(initEnv)
  39. Implementing serialization (2)

    • Recall step 2: creating serializable form of a spore:

        val data = SporeData(Prepend, Some(num))
        val pickled = write(data)

    • SporeData factory uses a macro to:
      – check that argument builder is a top-level object
      – obtain fully-qualified name of builder object
    • Serialization of SporeData instance consists of:
      – fully-qualified name of builder object
      – serialized environment
  40. Spores3: Implementation status

    • Open source implementation (Apache License 2.0)
      – GitHub repository: https://github.com/phaller/spores3
    • Supports Scala/JVM and Scala.js
      – Scala Native planned
    • Out-of-the-box integration with uPickle
      – “lightweight JSON and binary (MessagePack) serialization library for Scala”
    • Integration with other serialization libraries planned
  42. Summary

    • Using closures in distributed and concurrent settings is a safety risk
    • How to write safer closure-using code:
      – Check captured variables, their types, and capturing semantics
    • Spores3: safer and more flexible closures for Scala 3
      – A completely new implementation of spores for Scala 3
      – Explicit environment, tracked using type refinement
      – Type-based environment constraints
      – Flexible, portable and safe serialization

    Thank You!
    @philippkhaller
    github.com/phaller/spores3