How do we Optimise Scala Build Times?

Slides from a talk I gave at Func Prog Sweden. First I discuss the incremental compilation algorithm used to compile Scala code by build tools such as sbt and Mill.

Next I explore other ways to optimise build times, such as build pipelining and outline typing. I provide benchmarks and profiling data to demonstrate the effects of these modes of compilation.

Jamie Thompson

April 18, 2024

Transcript

  1. Agenda

    Explaining the Scala Build: What happens when you build a project? How do build tools optimise: Incremental compilation, Pipelined builds. Takeaways: Which steps can you take today to improve build times? What does the future hold: Can we add more innovations to speed up builds?
  2. • Created in 2016 at EPFL • Not for profit

    • Team of 10 people: ◦ administration ◦ communication ◦ engineering • Advisory Board: ◦ 2 Community Representatives ◦ 5 Companies
  3. Example: Small Server App 50 source files 10 library dependencies

    A.scala lib.jar 1 project directory webservice
  4. 1. Fetch dependencies 2. Generate source files Example: Small Server

    App once per config update ~/workspace » scala run webservice Compiling project (Scala 3.3.1, JVM)
  5. 1. Fetch dependencies 2. Generate source files 3. Compile source

    files to runtime platform slow!!! ~/workspace » scala run webservice Compiling project (Scala 3.3.1, JVM) Compiled project (Scala 3.3.1, JVM) once per config update Example: Small Server App
  6. 1. Fetch dependencies 2. Generate source files 3. Compile source

    files to runtime platform 4. Create a launcher and execute it fast ~/workspace » scala run webservice Compiling project (Scala 3.3.1, JVM) Compiled project (Scala 3.3.1, JVM) running server on localhost:8080 once per config update slow!!! Example: Small Server App
  7. 1. Fetch dependencies 2. Generate source files 3. Compile source

    files to runtime platform 4. Create a launcher and execute it fast ~/workspace » scala run webservice Compiling project (Scala 3.3.1, JVM) Compiled project (Scala 3.3.1, JVM) running server on localhost:8080 once per config update slow!!! Example: Small Server App Compiling project (Scala 3.3.1, JVM)
  11. 1. Fetch dependencies 2. Generate source files 3. Compile source

    files to runtime platform 4. Create a launcher and execute it fast ~/workspace » scala run webservice Compiling project (Scala 3.3.1, JVM) Compiled project (Scala 3.3.1, JVM) running server on localhost:8080 once per config update slow!!! Example: Small Server App Compiling project (Scala 3.3.1, JVM) * goal *
  12. Incremental Compilation First, detect A.scala has changed, compile it. C.scala

    D.scala E.scala F.scala G.scala H.scala B.scala A.scala
  13. Incremental Compilation Next, compile the dependencies of any changed definitions

    A.scala C.scala D.scala E.scala F.scala G.scala H.scala B.scala
  14. Incremental Compilation: The two unbreakable rules

    Correct: compile everything that is necessary. Performant: don’t compile more than necessary (efficient invalidation).
  15. Introducing Zinc

    Zinc is an incremental compiler for the Scala language. It maximises correctness and performance with the name hashing algorithm.
  16. Incremental Compilation Zinc scalac A.scala A.scala A.scala lib.jar “there is

    no cache, we must compile everything!” build tool
  17. Incremental Compilation Zinc scalac A.scala A.scala A.scala lib.jar Analysis A.class

    B.class C.class … “compile succeeded” build tool
  18. Incremental Compilation Zinc scalac “compile 50 scala files” A.scala A.scala

    A.scala lib.jar Analysis A.class B.class C.class … build tool
  19. Incremental Compilation Zinc scalac lib.jar Analysis A.class B.class C.class …

    “Actually, only A.scala changed” A.scala A.scala B.scala A.scala build tool
  20. Incremental Compilation Zinc scalac lib.jar Analysis “recompile D.scala and F.scala,

    they depend on definitions of A.scala that changed.” D.scala F.scala A.scala A.scala A.scala A.class build tool
  21. Incremental Compilation Zinc scalac lib.jar Analysis No more changes detected!

    A.scala A.scala A.scala D.class F.class build tool
  22. Takeaways Tip No. 1: Use small files!

    A more granular dependency graph avoids unnecessary recompilation.
  23. Incremental Compilation: Analysis

    APIs: tree data structure, representing signatures. Dependencies: pairs of used-name, file of origin. Stamps: timestamps, file hashes.
  24. Incremental Compilation Analysis Stamps timestamps, file hashes “A.scala has different

    bytes than the last time I saw it, it should be recompiled.”
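
The stamp check on the slide above can be sketched as follows. This is a minimal illustration, not Zinc's actual implementation: the `Stamps` object, its method names, and the choice of SHA-1 are all assumptions made for the example.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical helper, not part of Zinc: compares content hashes between builds.
object Stamps:
  /** Hex-encoded SHA-1 of a file's contents (the digest choice is an assumption). */
  def stamp(contents: String): String =
    MessageDigest
      .getInstance("SHA-1")
      .digest(contents.getBytes(StandardCharsets.UTF_8))
      .map(b => f"${b & 0xff}%02x")
      .mkString

  /** Files whose current contents do not match the recorded stamp, plus files never seen before.
    * `previous` maps file name to stamp, `current` maps file name to contents. */
  def changed(previous: Map[String, String], current: Map[String, String]): Set[String] =
    current.collect {
      case (file, contents) if !previous.get(file).contains(stamp(contents)) => file
    }.toSet
```

Given the stamps recorded by the last build and the current file contents, `Stamps.changed` returns the files that should be handed to the compiler first.
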
  25. Incremental Compilation Analysis APIs APIs in A.scala: class A: def

    foo: Int def foo(x: Int): Int def foo(x: Int, y: String): Int val bar: Boolean tree data structure, representing signatures
  26. Incremental Compilation Analysis APIs - Name Hashes class A defines

    names: foo = 0x8523a23 bar = 0x4d65e65 computed by Zinc aggregate all “foo” API aggregate all “bar” API
  27. Incremental Compilation Analysis APIs - Name Hashes class A defines

    names: foo = 0xfb191c7 bar = 0x4d65e65 “Some definition A.foo has a changed API”
  28. Incremental Compilation Analysis Dependencies pairs of used-name, class of origin

    class D inherits from class A class F uses name foo class F uses a member of class A “both class D and class F depend on changed APIs of class A!”
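
The name-hash and dependency slides above can be tied together in a small model. It is a sketch under simplifying assumptions: `ClassApi`, `Dependency`, and the functions below are hypothetical names, signatures are plain strings, and any inheritor is invalidated whenever some name changes, which is coarser than what Zinc really does.

```scala
// Hypothetical model of name hashing; none of these types exist in Zinc.
object NameHashing:
  /** A class's API as (member name, signature) pairs. */
  final case class ClassApi(name: String, members: List[(String, String)])

  /** One hash per member name, aggregating every signature (overload) with that name. */
  def nameHashes(api: ClassApi): Map[String, Int] =
    api.members
      .groupBy(_._1)
      .map { case (name, sigs) => name -> sigs.map(_._2).sorted.hashCode }

  /** Names whose aggregated hash differs between two versions of the same class. */
  def changedNames(before: ClassApi, after: ClassApi): Set[String] =
    val (old, now) = (nameHashes(before), nameHashes(after))
    (old.keySet ++ now.keySet).filter(n => old.get(n) != now.get(n))

  /** `from` depends on `on`, either by inheritance or through the member names it uses. */
  final case class Dependency(from: String, on: String, inherits: Boolean, usedNames: Set[String])

  /** Classes to recompile: inheritors of the changed class and users of a changed name. */
  def invalidated(changedClass: String, changed: Set[String], deps: List[Dependency]): Set[String] =
    deps.collect {
      case d if d.on == changedClass && changed.nonEmpty &&
            (d.inherits || d.usedNames.exists(changed)) => d.from
    }.toSet
```

With class A from the slides, changing one overload of `foo` changes only the hash for `foo`. A class that uses only `bar` is then left alone, while an inheritor such as D and a user of `foo` such as F are invalidated.
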
  29. Incremental Compilation: Analysis Summary

    The combination of stamps, name hashes and dependencies is sufficient to maximise performance and correctness of the name hashing algorithm.
  30. Incremental Compilation Zinc A.scala genBCode pickler reads text into AST

    checks AST is valid, adds types writes AST to TASTy outputs class files A.class A.tasty typer parser
  31. Incremental Compilation Zinc A.scala genBCode reads text into AST checks

    AST is valid, adds types outputs class files A.class A.tasty typer parser pickler writes AST to TASTy
  32. Incremental Compilation Zinc A.scala genBCode A.class A.tasty typer parser pickler

    deps api callback callback record dependencies record API trees Analysis
  33. Multi-project builds

    Structure your project as a collection of modular libraries that cooperate to form a cohesive whole.
  34. Multi-project Builds each group can be compiled in separate stages

    A.scala B.scala C.scala D.scala E.scala F.scala G.scala H.scala
  35. Takeaways Tip No. 2: Split apps into smaller projects!

    With a more granular project graph, you can introduce parallelism.
  36. Multi-project Builds: multithreading on, saved time! Can we do even better?

    service common auth model common database
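
As a rough illustration of splitting an app into sub-projects, an sbt build might look like the sketch below. The project names are taken from the slide, but the dependency edges between them are assumptions, since the transcript does not preserve the diagram.

```scala
// build.sbt (sketch; the dependsOn edges are illustrative, not from the talk)
ThisBuild / scalaVersion := "3.3.1"

lazy val common   = project
lazy val model    = project.dependsOn(common)
lazy val auth     = project.dependsOn(common)
lazy val database = project.dependsOn(model)
lazy val service  = project.dependsOn(auth, database)
```

With a layout like this, projects that do not depend on each other (here `model` and `auth`) can be compiled on separate threads.
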
  37. Pipelined Builds - Morgan Stanley OBT (optimus platform) - Bloop

    with Zinc fork (Jorge Vicente Cantero) - Experimental support today in sbt Prior Work
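
For the experimental sbt support mentioned above, the switch is the `usePipelining` setting, available since sbt 1.4. Whether it takes effect depends on the sbt and Scala compiler versions in use, so treat this as a starting point and check the release notes for your versions.

```scala
// build.sbt (sketch): opt in to experimental build pipelining
ThisBuild / usePipelining := true
```
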
  38. Pipelined Builds Project B Project C saved time! on pipelining

    on multithreading Project A pipelined multithreaded build no macros no macros
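
The saving from pipelining can be estimated with a toy scheduling model. Everything here is hypothetical: `PipelineModel`, the split of each project into a "frontend" (until its signatures are written) and a "backend" (the rest of the compile), and the assumption of unlimited worker threads. Macros are ignored, in line with the "no macros" labels on the slide.

```scala
// Toy model, not a real build tool API.
object PipelineModel:
  /** Compile cost of one project, split at the point where its signatures become available. */
  final case class Project(name: String, frontend: Double, backend: Double, dependsOn: List[String])

  /** Finish time of the whole build, assuming unlimited worker threads.
    * `projects` must be listed in dependency order (upstream first). */
  def makespan(projects: List[Project], pipelined: Boolean): Double =
    // For each project: (time its signatures are ready, time its class files are done)
    val times = projects.foldLeft(Map.empty[String, (Double, Double)]) { (acc, p) =>
      val start = p.dependsOn.map { d =>
        val (sigsReady, done) = acc(d)
        // Pipelined: start as soon as upstream signatures exist; otherwise wait for class files.
        if pipelined then sigsReady else done
      }.maxOption.getOrElse(0.0)
      acc + (p.name -> (start + p.frontend, start + p.frontend + p.backend))
    }
    times.values.map(_._2).maxOption.getOrElse(0.0)
```

For a chain A, B, C where each project spends 4 units in the frontend and 6 in the backend, the model gives 30 units without pipelining and 18 with it, because B and C start as soon as the upstream signatures are ready.
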
  39. Zinc Pipeline Compiler A.scala A.class A.tasty typer parser pickler deps

    api saved time! rest of compile is now a background task write TASTy parallel thread TASTy files written early early.jar NEW!
  40. Testing on MacBook Pro 2019 (i9 8-core 2.3GHz 16GB RAM)

    From cold sbt start: - clean; compile 2x to warm up - then take mean time of next 7 cycles. Benchmarks - pipelining
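
The measurement procedure above (warm up twice, then average the next 7 cycles) could be scripted roughly as follows. `Bench.meanSeconds` is a hypothetical helper; the talk does not say how the timings were actually collected.

```scala
// Hypothetical timing helper: untimed warm-up runs followed by a mean over timed runs.
object Bench:
  /** Runs `task` `warmups` times untimed, then returns the mean wall-clock seconds of `runs` timed cycles. */
  def meanSeconds(warmups: Int, runs: Int)(task: () => Unit): Double =
    (1 to warmups).foreach(_ => task())
    val timings = (1 to runs).map { _ =>
      val start = System.nanoTime()
      task()
      (System.nanoTime() - start) / 1e9
    }
    timings.sum / runs
```

Calling `Bench.meanSeconds(warmups = 2, runs = 7)(task)` with a task that triggers a clean compile mirrors the setup described on the slide.
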
  41. “clean compile” time & memory: lichess-org/lila, 308,829 LOC

    3.3.0 (standard): 72s, 6GB. 3.3.2-SNAPSHOT (pipelined): 55s, 8.3GB, 5600 lines/s. The key takeaway seems to be that you gain overall time at the cost of higher peak memory. Benchmarks - pipelining
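
The figures on this slide can be cross-checked with a few lines of arithmetic, reading the slide as 72s and 6GB for 3.3.0 and 55s and 8.3GB for the pipelined snapshot. The `Throughput` helper is hypothetical and only restates the ratios.

```scala
// Hypothetical helper for sanity-checking benchmark figures.
object Throughput:
  /** Lines of code compiled per second. */
  def linesPerSecond(loc: Int, seconds: Double): Double = loc / seconds

  /** How many times faster `improved` is than `baseline`. */
  def speedup(baseline: Double, improved: Double): Double = baseline / improved

  /** Relative growth, e.g. 0.25 means 25% more. */
  def relativeIncrease(before: Double, after: Double): Double = (after - before) / before
```

With these inputs, `linesPerSecond(308829, 55.0)` is about 5615, which the slide rounds to 5600. `speedup(72.0, 55.0)` is about 1.31, and `relativeIncrease(6.0, 8.3)` is about 0.38, so roughly 38% more peak memory buys a 1.31x faster clean compile.
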
  42. Other projects: Scaladex

    Time to finish “clean compile”: 31% improved with 3.3.2-SNAPSHOT (pipelined). Your mileage may vary. Benchmarks - pipelining
  43. facia faciaPress article applications sport discussion diagnostics admin identity commercial

    onward archive rss 77 common guardian frontend project layout preview Benchmarks - pipelining
  44. 79 pipelining on multithreading facia common faciaPress … preview on

    article Benchmarks - pipelining is there another way?
  45. Outline Compile NEW! inlining second type check pipelining start (incremental

    compile) lowering, erasure backend pipelining start (full compile) outline type checking api, pickler deps
  46. Outline Build Example: lampepfl/dotty, 158,000 LOC

    3.3.1 (single pass): 30s, 5277 lines/s. 3.3.2-SNAPSHOT (2-pass, 3 workers): 19.5s, 8119 lines/s. 1.54x faster! We could still do better…