scala-ildl.org

Vlad URECHE
PhD student in the Scala Team @ EPFL
Working on program transformations in the Scala programming language, focusing on data representation.
@VladUreche

Data Representation Challenge
● case class Employee(...) with fields ID, NAME, SALARY: auto-generated, corresponds to a table row
● class Vector[T] { … }: the Vector collection in the Scala library
● Vector[Employee]: [diagram: a vector whose elements are separate Employee records, each holding ID, NAME, SALARY]
● Traversal requires dereferencing a pointer for each employee.
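
To make the pointer-chasing point concrete, here is a minimal sketch of the row-oriented layout. The field names and types of Employee are assumptions (the slide elides them as `Employee(...)`), and `RowLayout.totalSalary` is a hypothetical helper used only for illustration.

```scala
// Assumed fields: the slide only shows ID, NAME and SALARY as column names.
final case class Employee(id: Int, name: String, salary: Double)

object RowLayout {
  // Each element of the Vector is a reference to a separate Employee object,
  // so reading e.salary follows one pointer per employee.
  def totalSalary(employees: Vector[Employee]): Double = {
    var sum = 0.0
    for (e <- employees) sum += e.salary
    sum
  }
}
```

Usage: `RowLayout.totalSalary(Vector(Employee(1, "Ann", 100.0), Employee(2, "Bob", 50.0)))` sums the two salaries, touching both Employee objects along the way.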

A Better Representation
[diagram: Vector[Employee] (one ID, NAME, SALARY record per employee) → VectorOfEmployee (one column each for ID, NAME, SALARY); 5x faster]
● Individually, Vector[T] and Employee can't be optimized
● Together, Vector[Employee] can be optimized
Current challenge: no means of communicating this to the compiler
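
A rough sketch of what a column-oriented VectorOfEmployee could look like when written by hand. The slides do not show its definition, so the field types, the `fromRows` conversion and the `totalSalary` operation below are all assumptions.

```scala
// Assumed fields: the slide only shows ID, NAME and SALARY as column names.
final case class Employee(id: Int, name: String, salary: Double)

// Column-oriented layout: one array per field instead of one object per row.
final class VectorOfEmployee(
    val ids: Array[Int],
    val names: Array[String],
    val salaries: Array[Double]) {

  def length: Int = ids.length

  // Rebuilds a row on demand from the three columns.
  def apply(i: Int): Employee = Employee(ids(i), names(i), salaries(i))

  // Scans one contiguous array of primitives; no Employee objects are touched.
  def totalSalary: Double = {
    var sum = 0.0
    var i = 0
    while (i < salaries.length) { sum += salaries(i); i += 1 }
    sum
  }
}

object VectorOfEmployee {
  // Hypothetical conversion from the row-oriented representation.
  def fromRows(rows: Vector[Employee]): VectorOfEmployee =
    new VectorOfEmployee(
      rows.map(_.id).toArray,
      rows.map(_.name).toArray,
      rows.map(_.salary).toArray)
}
```

Neither Vector[T] nor Employee changes here; only the combination Vector[Employee] gets a dedicated representation, which is the point of the slide.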

Data-centric Optimizations
● Optimization rules
  – written in the host language
  – entry point: data (targeted via types)
  – changes: data representation and operations

    object VectorOfEmployeeSoA extends Transformation {
      type Target = Vector[Employee]
      type Result = VectorOfEmployee
      // conversions, operations, ...
    }
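
One way the elided "conversions, operations" could be filled in, as a self-contained sketch. The `Transformation` trait below is a hypothetical stand-in defined only for this example, not the actual library interface; `toResult`, `toTarget` and `totalSalary` are assumed names, as are the Employee fields.

```scala
// Assumed fields: the slide only shows ID, NAME and SALARY as column names.
final case class Employee(id: Int, name: String, salary: Double)

final class VectorOfEmployee(
    val ids: Array[Int],
    val names: Array[String],
    val salaries: Array[Double])

// Hypothetical stand-in for the base type named on the slide.
trait Transformation {
  type Target // the high-level type the rule is triggered by
  type Result // the optimized representation
  def toResult(t: Target): Result
  def toTarget(r: Result): Target
}

object VectorOfEmployeeSoA extends Transformation {
  type Target = Vector[Employee]
  type Result = VectorOfEmployee

  // Conversions between the two representations.
  def toResult(t: Vector[Employee]): VectorOfEmployee =
    new VectorOfEmployee(
      t.map(_.id).toArray, t.map(_.name).toArray, t.map(_.salary).toArray)

  def toTarget(r: VectorOfEmployee): Vector[Employee] =
    Vector.tabulate(r.ids.length)(i => Employee(r.ids(i), r.names(i), r.salaries(i)))

  // An operation re-expressed directly on the optimized representation.
  def totalSalary(r: VectorOfEmployee): Double = r.salaries.sum
}
```

The rule is ordinary host-language code keyed on a type pair, which matches the "entry point: data (targeted via types)" bullet.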

Transformation
Definition (programmer):
● can't be automated
● based on experience
● based on speculation
● one-time effort
Application (compiler, automated):
● repetitive and simple
● affects code readability
● is verbose
● is error-prone

Diversity
[diagram: alternative representations of Vector[Employee] (one ID, NAME, SALARY record per employee)]
● VectorOfEmployee: one column each for ID, NAME, SALARY
● VectorOfEmployeeJSON: { id: 123, name: "John Doe", salary: 100 }
● CompactVector

Challenges of Scopes
● Separate compilation
  – Storing transformation metadata
● Overriding and the object model
  – Different signatures may not override
● Passing values between scopes (composition)
  – Redundant conversions
  – Safety
● Addressed in the compiler :)
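
The overriding challenge can be illustrated by hand. In this sketch, `Payroll` and `TransformedPayroll` are hypothetical classes, VectorOfEmployee is reduced to a single assumed salary column, and the bridge method shows the problem rather than how the compiler actually resolves it.

```scala
// Assumed fields: the slides only show ID, NAME and SALARY as column names.
final case class Employee(id: Int, name: String, salary: Double)

// Reduced to one column to keep the example short.
final class VectorOfEmployee(val salaries: Array[Double])

class Payroll {
  def total(staff: Vector[Employee]): Double = staff.map(_.salary).sum
}

class TransformedPayroll extends Payroll {
  // After changing the parameter's representation this is a new overload:
  // its signature differs, so it does NOT override Payroll.total.
  def total(staff: VectorOfEmployee): Double = staff.salaries.sum

  // A bridge is needed so that calls through the Payroll interface still
  // reach the transformed code; it pays for a conversion on every call.
  override def total(staff: Vector[Employee]): Double =
    total(new VectorOfEmployee(staff.map(_.salary).toArray))
}
```

A caller holding a `Payroll` reference still passes a Vector[Employee], so the conversion in the bridge is exactly the kind of cost listed under "Redundant conversions".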

Array of Struct (Column-oriented)
[diagram: Vector[Employee] (one ID, NAME, SALARY record per employee) → VectorOfEmployee (one column each for ID, NAME, SALARY); 5x faster]

Specialization
[diagram: the tuple (3,5) as an object vs. the values 3 and 5, plus stack allocation]
● Tuples in Scala are generic, so they need to use pointers and objects
● 14x faster, reduced memory footprint
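
As an illustration of replacing a tuple object with plain values, the sketch below packs a pair of Ints into a single Long. This encoding is an assumption chosen for the example, not necessarily the representation behind the slide's measurement; `PairAsLong` and `PairAsTuple` are hypothetical names.

```scala
object PairAsTuple {
  // Generic representation: every result is a new Tuple2 object on the heap.
  def add(p: (Int, Int), q: (Int, Int)): (Int, Int) =
    (p._1 + q._1, p._2 + q._2)
}

object PairAsLong {
  // First component in the high 32 bits, second in the low 32 bits.
  def pack(a: Int, b: Int): Long = (a.toLong << 32) | (b.toLong & 0xFFFFFFFFL)
  def first(p: Long): Int = (p >>> 32).toInt
  def second(p: Long): Int = p.toInt

  // Same operation on the packed form: only primitive values, no allocation.
  def add(p: Long, q: Long): Long =
    pack(first(p) + first(q), second(p) + second(q))
}
```

Writing code against `pack`, `first` and `second` by hand is the repetitive, error-prone part that the transformation is meant to take over.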

Conclusion
● Problem: optimized representations
● Solution: data-centric meta-programming
  – Splitting the responsibility:
    ● Defining the Transformation → programmer
    ● Applying the Transformation → compiler
  – Scopes
    ● Adapt the data representation to the operation
    ● Allow speculating properties of the scope

Multi-Stage Programming
● “Abstraction without regret” - Tiark Rompf
● DSLs → small enough to be staged
  – 10000x speed improvements
● Scala → too large to obtain any benefit
  – Separate compilation/modularization
  – Dynamic dispatch
  – Aliasing
  – Reflection
● These features are not supported by staging. If we add support, we lose the ability to optimize.

Low-level Optimizers
● JIT optimizers with virtual machine support
  – Access to the low-level code
  – Can assume a (local) closed world
  – Can speculate based on profiles
  – On the critical path
    ● Limited profiles
    ● Limited inlining
    ● Limited analysis
  – Biggest opportunities are high-level: O(n²) → O(n)
    ● Incoming code is low-level
    ● Rarely possible to recover these opportunities
Typical solution: Metaprogramming

Metaprogramming
● Not your grandpa's C preprocessor
● Full-fledged program transformers
  – :) Lots of power
  – :( Lots of responsibility
    ● Compiler invariants
    ● Object-oriented model
    ● Modularity

    def optimize(tree: Tree): Tree = { ... }

Can we make metaprogramming “high-level”?
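
To give a feel for the `optimize(tree: Tree): Tree` shape, here is a toy transformer over a made-up expression tree. The `Tree` hierarchy below is hypothetical and far smaller than a real compiler's, and the rewrite rules (constant folding plus two identities) are chosen only for illustration.

```scala
// Hypothetical miniature AST; a real compiler's Tree carries types, symbols, positions.
sealed trait Tree
final case class Const(value: Int) extends Tree
final case class Var(name: String) extends Tree
final case class Add(lhs: Tree, rhs: Tree) extends Tree
final case class Mul(lhs: Tree, rhs: Tree) extends Tree

object Optimizer {
  // Bottom-up rewrite: optimize the children first, then simplify the node.
  def optimize(tree: Tree): Tree = tree match {
    case Add(l, r) =>
      (optimize(l), optimize(r)) match {
        case (Const(a), Const(b)) => Const(a + b) // constant folding
        case (Const(0), x)        => x            // 0 + x = x
        case (x, Const(0))        => x            // x + 0 = x
        case (x, y)               => Add(x, y)
      }
    case Mul(l, r) =>
      (optimize(l), optimize(r)) match {
        case (Const(a), Const(b)) => Const(a * b) // constant folding
        case (Const(1), x)        => x            // 1 * x = x
        case (x, Const(1))        => x            // x * 1 = x
        case (x, y)               => Mul(x, y)
      }
    case leaf => leaf // Const and Var are already in simplest form
  }
}
```

Nothing in the signature stops a rule from producing an ill-formed tree; keeping compiler invariants, the object model and modularity intact is entirely up to the author, which is the "lots of responsibility" side of the slide.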