scala-ildl.org
Vlad URECHE
PhD student in the Scala Team @ EPFL. Working on program transformations focusing on data representation. Miniboxing guy. Scala compiler geek.
@VladUreche

Object Composition
class Employee(...): ID NAME SALARY. Auto-generated, corresponds to a table row.
class Vector[T] { … }: the Vector collection in the Scala library.
Vector[Employee]: [ID NAME SALARY] [ID NAME SALARY]
Traversal requires dereferencing a pointer for each employee.

A Better Representation
Vector[Employee]: [ID NAME SALARY] [ID NAME SALARY]
EmployeeVector: [ID ID ...] [NAME NAME ...] [SALARY SALARY ...]
Iteration is 5x faster.
C++ would produce a better representation here, but there are still cases where the C++ representation could be improved upon.

A Better Representation
● In isolation, Vector[T] and Employee are optimal
● Together, Vector[Employee] can be optimized
Challenge: there is no means of communicating this to the compiler.
You may disagree. We'll have a related work section later.

Transformation
Definition (programmer)
● can't be automated
● based on experience
● based on speculation
● one-time effort
Application (compiler, automated)
● repetitive and simple
● affects code readability
● is verbose
● is error-prone

Data-centric Metaprogramming
object VectorOfEmployeeOpt extends Transformation {
  type Target = Vector[Employee]
  type Result = EmployeeVector
  def toResult(t: Target): Result = ...
  def toTarget(t: Result): Target = ...
  ...
}
● VectorOfEmployeeOpt is an object that describes a transformation; Transformation is a marker trait for transformations.
● What does the compiler need to know? The target of the transformation and its representation.
● The transformation is type-driven → we indicate the type of the target and of the representation.
● The improved representation is defined in the host language.
● How to transform Vector[Employee] into an EmployeeVector? With toResult and toTarget.
So far so good, but how to execute Vector[Employee] operations on EmployeeVector?
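
The slides elide the bodies of Employee, EmployeeVector and Transformation. As a minimal sketch of what the two conversions could look like, assume a three-field case class, a representation with one array per field, and a Transformation trait that declares the four members shown on the slide. All three definitions are assumptions made for illustration, not the actual ildl API.

```scala
// Hypothetical stand-ins: field types and class shapes are assumptions.
final case class Employee(id: Int, name: String, salary: Float)

// Column-oriented representation: one array per field.
final class EmployeeVector(
    val ids: Array[Int],
    val names: Array[String],
    val salaries: Array[Float]
) {
  def length: Int = ids.length
}

// Assumed shape of the marker trait from the slide.
trait Transformation {
  type Target
  type Result
  def toResult(t: Target): Result
  def toTarget(r: Result): Target
}

object VectorOfEmployeeOpt extends Transformation {
  type Target = Vector[Employee]
  type Result = EmployeeVector

  // Split the rows into three columns.
  def toResult(t: Target): Result =
    new EmployeeVector(
      t.map(_.id).toArray,
      t.map(_.name).toArray,
      t.map(_.salary).toArray
    )

  // Rebuild one Employee object per row.
  def toTarget(r: Result): Target =
    Vector.tabulate(r.length)(i => Employee(r.ids(i), r.names(i), r.salaries(i)))
}
```

Under these assumptions the two conversions are inverses of each other, which is what lets values cross the boundary of transformed code in either direction.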

Transformation
Definition → programmer
Application → compiler (automated), described in the paper

Open World Assumption
class Employee(...): ID NAME SALARY
class Vector[T] { … }
Vector[Employee]: [ID NAME SALARY] [ID NAME SALARY]
EmployeeVector: [ID ID ...] [NAME NAME ...] [SALARY SALARY ...]
class NewEmployee(...) extends Employee(...): ID NAME SALARY DEPT
Oooops...

Open World Assumption
● Globally anything can happen
● Locally the programmer has full control:
  – Knows the values that will be used
  – Can reject non-conforming values
How to use this information? Scopes

Scopes
transform(VectorOfEmployeeOpt) {
  def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] =
    for (employee <- employees)
      yield employee.copy(salary = (1 + by) * employee.salary)
}
Now the method operates on the EmployeeVector representation.
Programmers can freely choose which parts of their code to transform.
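
For comparison, here is the same computation written by hand against a column-oriented representation. This is the repetitive rewriting that the scope is meant to spare the programmer. The EmployeeVector class below is a hypothetical stand-in, and the code is a sketch of the idea only: what the ildl plugin actually generates is not shown in the slides.

```scala
// Hypothetical column-oriented representation (one array per field).
final class EmployeeVector(
    val ids: Array[Int],
    val names: Array[String],
    val salaries: Array[Float]
)

object HandWritten {
  // Same computation as indexSalary, written directly against the columns:
  // only the salary column is rebuilt; the id and name arrays are reused.
  def indexSalary(employees: EmployeeVector, by: Float): EmployeeVector =
    new EmployeeVector(
      employees.ids,
      employees.names,
      employees.salaries.map(s => (1 + by) * s)
    )
}
```

Note that the hand-written version no longer mentions Employee at all, which is why doing this manually across a code base hurts readability.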

Scopes
● Can wrap statements, methods, even entire classes
  – Inlined immediately after the parser
  – Definitions are visible outside the "scope"
● Locally closed world
  – Incoming/outgoing values go through conversions
  – Programmer can reject unexpected values
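
One way a conversion could reject unexpected values is sketched below, assuming the NewEmployee subclass from the Open World Assumption slides. The class shapes, the StrictConversion object and the choice of require are all assumptions for illustration. The slides only state that the programmer can reject non-conforming values, not how.

```scala
// Hypothetical classes mirroring the slides; field types are assumptions.
class Employee(val id: Int, val name: String, val salary: Float)
class NewEmployee(id: Int, name: String, salary: Float, val dept: String)
    extends Employee(id, name, salary)

// Column-oriented representation with room for exactly three fields.
final class EmployeeVector(
    val ids: Array[Int],
    val names: Array[String],
    val salaries: Array[Float]
)

object StrictConversion {
  // Entering the scope: accept only plain Employee values, because a
  // subclass such as NewEmployee carries a field the columns cannot hold.
  def toResult(t: Vector[Employee]): EmployeeVector = {
    require(
      t.forall(_.getClass == classOf[Employee]),
      "EmployeeVector can only represent plain Employee values"
    )
    new EmployeeVector(
      t.map(_.id).toArray,
      t.map(_.name).toArray,
      t.map(_.salary).toArray
    )
  }
}
```

With this check, passing a NewEmployee into the scope fails with an IllegalArgumentException at the boundary instead of silently dropping the DEPT field.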

Best ...?
Vector[Employee]: [ID NAME SALARY] [ID NAME SALARY]
EmployeeVector: [ID ID ...] [NAME NAME ...] [SALARY SALARY ...]
CompactVector
EmployeeJSON: { id: 123, name: "John Doe", salary: 100 }

Scope Composition
● Code can be
  – Left untransformed (using the original repr.)
  – Transformed using different representations
Calling
● Between original code and transformed code: coercions
● Between transformed code and transformed code, same transformation: no coercions, even across separate compilation
● Between transformed code and transformed code, different transformation: two coercions, Repr1 → Target → Repr2
Overriding
● The transformation has to preserve the object model.
● Handled automatically by the ildl transformation!
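
The "two coercions" case can be sketched by composing the conversions of two transformations that share a target. To keep the conversions short, the example uses an integer pair as the target rather than employees. PairAsLong, PairAsArray and Coercions are hypothetical names, written as plain objects with the same members as a Transformation. They are not part of ildl.

```scala
// Hypothetical transformation 1: an (Int, Int) pair packed into one Long.
object PairAsLong {
  type Target = (Int, Int)
  type Result = Long
  def toResult(t: Target): Result =
    (t._1.toLong << 32) | (t._2.toLong & 0xFFFFFFFFL)
  def toTarget(r: Result): Target = ((r >>> 32).toInt, r.toInt)
}

// Hypothetical transformation 2: the same pair in a two-element array.
object PairAsArray {
  type Target = (Int, Int)
  type Result = Array[Int]
  def toResult(t: Target): Result = Array(t._1, t._2)
  def toTarget(r: Result): Target = (r(0), r(1))
}

object Coercions {
  // Repr1 -> Target -> Repr2: leave the first representation through
  // its toTarget, then enter the second through its toResult.
  def longToArray(v: PairAsLong.Result): PairAsArray.Result =
    PairAsArray.toResult(PairAsLong.toTarget(v))
}
```

Going through the shared target means neither transformation needs to know about the other, at the cost of materializing the intermediate value.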

Array of Struct (Column-oriented)
Vector[Employee]: [ID NAME SALARY] [ID NAME SALARY]
EmployeeVector: [ID ID ...] [NAME NAME ...] [SALARY SALARY ...]
5x faster

Specialization and stack allocation
Example tuple: (3,5), with components 3 and 5
Tuples in Scala are generic, so they need to use pointers and objects.
+ stack allocation
14x faster, reduced memory footprint

Multi-Stage Programming
● Multi-Stage Programming
  – “Abstraction without regret” - Tiark Rompf
  – DSLs → small enough to be staged
    ● 10000x speed improvements
  – Scala → many features not supported by LMS:
    ● Separate compilation/modularization
    ● Dynamic dispatch
    ● Aliasing
    ● Reflection
If we add support, we lose the ability to optimize code :(

Low-level Optimizers
● JIT optimizers with virtual machine support
  – Access to the low-level code
  – Can assume a (local) closed world
  – Can speculate based on profiles
  – On the critical path: limited analysis
  – Biggest opportunities are high-level - O(n²) → O(n)
    ● Incoming code is low-level
    ● Rarely possible to recover opportunities
Typical solution: Metaprogramming

Metaprogramming
● Not your grandpa's C preprocessor
● Full-fledged program transformers
  – :) Lots of power
  – :( Lots of responsibility
    ● Compiler invariants
    ● Object-oriented model
    ● Modularity
def optimize(tree: AST): AST = { ... }
Can we make metaprogramming “high-level”?

Conclusion
● Object-oriented composition → inefficient representation
● Solution: data-centric metaprogramming
  – Splitting the responsibility:
    ● Defining the Transformation → programmer
    ● Applying the Transformation → compiler
  – Scopes
    ● Adapt the data representation to the operation
    ● Allow speculating properties of the scope
● We've just begun to scratch the surface
  – Many interesting research questions lie ahead