Data-centric Metaprogramming @ VMM 2015

Vlad Ureche
September 11, 2015

The data-centric metaprogramming presentation at the VMM '15 (Virtual Machine Meetup), Zurich, Switzerland. Website: http://vmmeetup.github.io/2015/

Project website: http://scala-ildl.org

Transcript

  1. scala-ildl.org

    Vlad URECHE, PhD student in the Scala Team @ EPFL. Working on program transformations focusing on data representation. Miniboxing guy. Scala compiler geek. @VladUreche [email protected]
  2. Object Composition

    class Employee(...) with fields ID, NAME, SALARY
    class Vector[T] { … }: the Vector collection in the Scala library
  3. Object Composition

    class Employee(...) with fields ID, NAME, SALARY: auto-generated, corresponds to a table row
    class Vector[T] { … }: the Vector collection in the Scala library
  4. Object Composition

    class Employee(...) with fields ID, NAME, SALARY
    class Vector[T] { … }
    [Diagram: a Vector[Employee] holding Employee objects, each with ID, NAME, SALARY]
  5. Object Composition

    class Employee(...) with fields ID, NAME, SALARY
    class Vector[T] { … }
    [Diagram: a Vector[Employee] holding Employee objects, each with ID, NAME, SALARY]
    Traversal requires dereferencing a pointer for each employee.
  6. A Better Representation

    [Diagram: Vector[Employee] (one ID, NAME, SALARY object per employee) next to EmployeeVector (separate ID, NAME and SALARY columns)]
  7. A Better Representation

    [Diagram: Vector[Employee] next to the column-based EmployeeVector]
    Iteration is 5x faster.
  8. A Better Representation

    [Diagram: Vector[Employee] next to the column-based EmployeeVector]
    Iteration is 5x faster.
    C++ would produce a better representation here, but there are still cases where the C++ representation could be improved upon.
  9. A Better Representation

    • In isolation, Vector[T] and Employee are optimal
    • Together, Vector[Employee] can be optimized
  10. A Better Representation

    • In isolation, Vector[T] and Employee are optimal
    • Together, Vector[Employee] can be optimized
    Challenge: no means of communicating this to the compiler.
  11. A Better Representation

    • In isolation, Vector[T] and Employee are optimal
    • Together, Vector[Employee] can be optimized
    Challenge: no means of communicating this to the compiler.
    You may disagree. We'll have a related work section later.
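The two layouts on slides 6 to 8 can be sketched in plain Scala. This is only an illustration: the field types (`Int`, `String`, `Float`), the helper names `RowOps`, `fromRows` and `totalSalary`, and the exact shape of `EmployeeVector` are assumptions, since the slides name only the ID, NAME and SALARY fields.

```scala
// Hypothetical field types; the slides only name ID, NAME and SALARY.
case class Employee(id: Int, name: String, salary: Float)

object RowOps {
  // Row-oriented scan: every step follows a reference to an Employee object.
  def totalSalary(employees: Vector[Employee]): Float =
    employees.foldLeft(0f)((acc, e) => acc + e.salary)
}

// Column-oriented layout: one array per field, so a salary scan walks a
// single Array[Float] without touching ids or names.
final class EmployeeVector(
    val ids: Array[Int],
    val names: Array[String],
    val salaries: Array[Float]) {
  def length: Int = ids.length
  def apply(i: Int): Employee = Employee(ids(i), names(i), salaries(i))
  def totalSalary: Float = {
    var sum = 0f
    var i = 0
    while (i < salaries.length) { sum += salaries(i); i += 1 }
    sum
  }
}

object EmployeeVector {
  // Hypothetical helper that splits rows into columns.
  def fromRows(rows: Vector[Employee]): EmployeeVector =
    new EmployeeVector(
      rows.map(_.id).toArray,
      rows.map(_.name).toArray,
      rows.map(_.salary).toArray)
}
```

Writing `EmployeeVector` by hand like this is exactly the manual transformation the talk wants to avoid repeating at every use site.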
  12. Transforming the code by hand

    – Makes maintenance difficult
    – Changes are not contained
    Can we automate this?
  13. Transformation

    Definition: • can't be automated • based on experience • based on speculation • one-time effort
    Application:
  14. Transformation

    Definition (programmer): • can't be automated • based on experience • based on speculation • one-time effort
    Application:
  15. Transformation

    Definition (programmer): • can't be automated • based on experience • based on speculation • one-time effort
    Application: • repetitive and simple • affects code readability • is verbose • is error-prone
  16. Transformation

    Definition (programmer): • can't be automated • based on experience • based on speculation • one-time effort
    Application (compiler, automated): • repetitive and simple • affects code readability • is verbose • is error-prone
  18. Data-centric Metaprogramming

    object VectorOfEmployeeOpt extends Transformation {
      ...
    }
    VectorOfEmployeeOpt: an object that describes a transformation. Transformation: a marker trait for transformations.
  19. Data-centric Metaprogramming

    object VectorOfEmployeeOpt extends Transformation {
      ...
    }
    What does the compiler need to know? The target of the transformation and its representation.
  20. Data-centric Metaprogramming

    object VectorOfEmployeeOpt extends Transformation {
      type Target = Vector[Employee]
      type Result = EmployeeVector
      ...
    }
    The transformation is type-driven → we indicate the type of the target and of the representation.
  21. Data-centric Metaprogramming

    object VectorOfEmployeeOpt extends Transformation {
      type Target = Vector[Employee]
      type Result = EmployeeVector
      ...
    }
    The transformation is type-driven → we indicate the type of the target and of the representation.
    The improved representation is defined in the host language.
  22. Data-centric Metaprogramming

    object VectorOfEmployeeOpt extends Transformation {
      type Target = Vector[Employee]
      type Result = EmployeeVector
      ...
    }
    How to transform Vector[Employee] into an EmployeeVector?
  23. Data-centric Metaprogramming

    object VectorOfEmployeeOpt extends Transformation {
      type Target = Vector[Employee]
      type Result = EmployeeVector
      def toResult(t: Target): Result = ...
      def toTarget(t: Result): Target = ...
      ...
    }
  24. Data-centric Metaprogramming

    object VectorOfEmployeeOpt extends Transformation {
      type Target = Vector[Employee]
      type Result = EmployeeVector
      def toResult(t: Target): Result = ...
      def toTarget(t: Result): Target = ...
      ...
    }
    Conversions to/from Vector[Employee] that consume/produce an EmployeeVector.
  26. Data-centric Metaprogramming

    object VectorOfEmployeeOpt extends Transformation {
      type Target = Vector[Employee]
      type Result = EmployeeVector
      def toResult(t: Target): Result = ...
      def toTarget(t: Result): Target = ...
      ...
    }
    So far so good, but how to execute Vector[Employee] operations on EmployeeVector?
  27. Data-centric Metaprogramming

    object VectorOfEmployeeOpt extends Transformation {
      type Target = Vector[Employee]
      type Result = EmployeeVector
      def toResult(t: Target): Result = ...
      def toTarget(t: Result): Target = ...
      def bypass_length: Int = ...
      def bypass_apply(i: Int): Employee = ...
      def bypass_update(i: Int, v: Employee) = ...
      def bypass_toString: String = ...
      ...
    }
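A filled-in version of the description object from slides 18 to 27 might look like the following. It is a plain-Scala sketch and not the plugin's actual API: the `Transformation` trait is a local stand-in with assumed abstract members, the field types are assumed, and the bypass methods take an explicit receiver parameter `r` that the slides elide.

```scala
// Hypothetical field types; the slides only name ID, NAME and SALARY.
case class Employee(id: Int, name: String, salary: Float)

// Column-based stand-in for the EmployeeVector named on the slides.
final class EmployeeVector(
    val ids: Array[Int],
    val names: Array[String],
    val salaries: Array[Float])

// Local stand-in: the slides call Transformation a marker trait; the
// abstract members here are an assumption so that the sketch type-checks.
trait Transformation {
  type Target
  type Result
  def toResult(t: Target): Result
  def toTarget(r: Result): Target
}

object VectorOfEmployeeOpt extends Transformation {
  type Target = Vector[Employee]
  type Result = EmployeeVector

  // Vector[Employee] -> EmployeeVector: split the rows into columns.
  def toResult(t: Target): Result =
    new EmployeeVector(
      t.map(_.id).toArray, t.map(_.name).toArray, t.map(_.salary).toArray)

  // EmployeeVector -> Vector[Employee]: rebuild one object per row.
  def toTarget(r: Result): Target =
    Vector.tabulate(r.ids.length)(i => bypass_apply(r, i))

  // Bypass methods: Vector[Employee] operations executed directly on the
  // representation. The receiver parameter `r` is an assumption.
  def bypass_length(r: Result): Int = r.ids.length
  def bypass_apply(r: Result, i: Int): Employee =
    Employee(r.ids(i), r.names(i), r.salaries(i))
  // Assumption: the update mutates the columns in place.
  def bypass_update(r: Result, i: Int, v: Employee): Unit = {
    r.ids(i) = v.id; r.names(i) = v.name; r.salaries(i) = v.salary
  }
  def bypass_toString(r: Result): String =
    toTarget(r).mkString("Vector(", ", ", ")")
}
```

The object only states what the representation is and how to move in and out of it; deciding where to apply it is left to the compiler.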
  28. Data-centric Metaprogramming

    Definition (programmer): • can't be automated • based on experience • based on speculation • one-time effort
    Application (compiler, automated): • repetitive and simple • affects code readability • is verbose • is error-prone
  30. Transformation

    Definition (programmer): • can't be automated • based on experience • based on speculation • one-time effort
    Application (compiler, automated): • repetitive and simple • affects code readability • is verbose • is error-prone
    In the paper
  31. Open World Assumption

    class Employee(...) with fields ID, NAME, SALARY
    class Vector[T] { … }
    [Diagram: a Vector[Employee] holding Employee objects, each with ID, NAME, SALARY]
  32. Open World Assumption

    class Employee(...) with fields ID, NAME, SALARY
    class Vector[T] { … }
    [Diagram: Vector[Employee] next to the column-based EmployeeVector (ID, NAME and SALARY columns)]
  33. Open World Assumption

    class Employee(...) with fields ID, NAME, SALARY
    class Vector[T] { … }
    [Diagram: Vector[Employee] next to the column-based EmployeeVector (ID, NAME and SALARY columns)]
    class NewEmployee(...) extends Employee(...) with fields ID, NAME, SALARY, DEPT
  34. Open World Assumption

    class Employee(...) with fields ID, NAME, SALARY
    class Vector[T] { … }
    [Diagram: Vector[Employee] next to the column-based EmployeeVector (ID, NAME and SALARY columns)]
    class NewEmployee(...) extends Employee(...) with fields ID, NAME, SALARY, DEPT
    Oooops...
  35. Open World Assumption

    • Globally anything can happen
    • Locally the programmer has full control:
      – Knows the values that will be used
      – Can reject non-conforming values
  36. Open World Assumption

    • Globally anything can happen
    • Locally the programmer has full control:
      – Knows the values that will be used
      – Can reject non-conforming values
    How to use this information?
  37. Open World Assumption

    • Globally anything can happen
    • Locally the programmer has full control:
      – Knows the values that will be used
      – Can reject non-conforming values
    How to use this information? Scopes
  38. Scopes

    def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] =
      for (employee ← employees)
        yield employee.copy(salary = (1 + by) * employee.salary)
  39. Scopes

    def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] =
      for (employee ← employees)
        yield employee.copy(salary = (1 + by) * employee.salary)
    Method operating on Scala collections (familiar to programmers)
  41. Scopes

    transform(VectorOfEmployeeOpt) {
      def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] =
        for (employee ← employees)
          yield employee.copy(salary = (1 + by) * employee.salary)
    }
  42. Scopes

    transform(VectorOfEmployeeOpt) {
      def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] =
        for (employee ← employees)
          yield employee.copy(salary = (1 + by) * employee.salary)
    }
    Now the method operates on the EmployeeVector representation.
  43. Scopes

    transform(VectorOfEmployeeOpt) {
      def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] =
        for (employee ← employees)
          yield employee.copy(salary = (1 + by) * employee.salary)
    }
    Now the method operates on the EmployeeVector representation.
    Programmers can freely choose which parts of their code to transform.
  44. Scopes

    • Can wrap statements, methods, even entire classes
      – Inlined immediately after the parser
      – Definitions are visible outside the "scope"
  45. Scopes

    • Can wrap statements, methods, even entire classes
      – Inlined immediately after the parser
      – Definitions are visible outside the "scope"
    • Locally closed world
      – Incoming/outgoing values go through conversions
      – Programmer can reject unexpected values
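As a rough picture of what a scope buys, the sketch below writes out by hand what `indexSalary` could look like once it works on the column representation, with conversions guarding the scope boundary. It is not the code the compiler generates: the names `Conversions` and `HandTransformed`, the field types, and the rejection check via `require` are all assumptions made for illustration.

```scala
// Hypothetical field types; the slides only name ID, NAME, SALARY and DEPT.
case class Employee(id: Int, name: String, salary: Float)

// A subclass with an extra field that the column layout cannot store.
class NewEmployee(id: Int, name: String, salary: Float, val dept: String)
    extends Employee(id, name, salary)

final class EmployeeVector(
    val ids: Array[Int], val names: Array[String], val salaries: Array[Float])

object Conversions {
  // Incoming values cross the scope boundary here; values that do not fit
  // the representation (such as NewEmployee) are rejected.
  def toResult(t: Vector[Employee]): EmployeeVector = {
    require(t.forall(_.getClass == classOf[Employee]),
      "only plain Employee values fit the column layout")
    new EmployeeVector(
      t.map(_.id).toArray, t.map(_.name).toArray, t.map(_.salary).toArray)
  }

  // Outgoing values are converted back to the original representation.
  def toTarget(r: EmployeeVector): Vector[Employee] =
    Vector.tabulate(r.ids.length)(i => Employee(r.ids(i), r.names(i), r.salaries(i)))
}

object HandTransformed {
  // Only the salary column is rewritten; the id and name arrays are shared
  // with the input, which is acceptable here because nothing mutates them.
  def indexSalary(employees: EmployeeVector, by: Float): EmployeeVector =
    new EmployeeVector(
      employees.ids, employees.names, employees.salaries.map(s => (1 + by) * s))
}
```

The guard in `toResult` is one way to read "programmer can reject unexpected values": inside the scope, the world is closed to exactly the values the representation can hold.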
  46. Best ...?

    [Diagram: Vector[Employee] next to the column-based EmployeeVector (ID, NAME and SALARY columns)]
  47. Best ...?

    [Diagram: Vector[Employee] next to the column-based EmployeeVector (ID, NAME and SALARY columns)]
    CompactVector: <compressed binary blob>
  48. Best ...?

    [Diagram: Vector[Employee] next to the column-based EmployeeVector (ID, NAME and SALARY columns)]
    CompactVector: <compressed binary blob>
    EmployeeJSON: { id: 123, name: “John Doe”, salary: 100 }
  49. Best ...?

    [Diagram: Vector[Employee] next to the column-based EmployeeVector (ID, NAME and SALARY columns)]
    CompactVector: <compressed binary blob>
    EmployeeJSON: { id: 123, name: “John Doe”, salary: 100 }
    Scopes!
  51. Scopes

    transform(VectorOfEmployeeOpt) {
      def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] =
        for (employee ← employees)
          yield employee.copy(salary = (1 + by) * employee.salary)
    }
    Method operating on column-based storage
  52. Scopes

    transform(VectorOfEmployeeBinary) {
      def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] =
        for (employee ← employees)
          yield employee.copy(salary = (1 + by) * employee.salary)
    }
    Method operating on binary data
  53. Scopes

    transform(VectorOfEmployeeJSON) {
      def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] =
        for (employee ← employees)
          yield employee.copy(salary = (1 + by) * employee.salary)
    }
    Method operating on JSON data
  54. Scope Composition

    • Code can be
      – Left untransformed (using the original repr.)
      – Transformed using different representations
  55. Scope Composition

    • Code can be
      – Left untransformed (using the original repr.)
      – Transformed using different representations
    [Table: "calling", with original code and transformed code on both axes; transformed code is split into same transformation and different transformation]
  57. Scope Composition

    • Code can be
      – Left untransformed (using the original repr.)
      – Transformed using different representations
    [Table: "calling", with original code and transformed code on both axes; transformed code is split into same transformation and different transformation]
    Coercions
  61. Scope Composition

    • Code can be
      – Left untransformed (using the original repr.)
      – Transformed using different representations
    [Table: "calling", with original code and transformed code on both axes; transformed code is split into same transformation and different transformation]
    No coercions
  62. Scope Composition

    • Code can be
      – Left untransformed (using the original repr.)
      – Transformed using different representations
    [Table: "calling", with original code and transformed code on both axes; transformed code is split into same transformation and different transformation]
    No coercions, even across separate compilation
  65. Scope Composition

    • Code can be
      – Left untransformed (using the original repr.)
      – Transformed using different representations
    [Table: "calling", with original code and transformed code on both axes; transformed code is split into same transformation and different transformation]
    Two coercions: Repr1 → Target → Repr2
  67. Scope Composition

    • Code can be
      – Left untransformed (using the original repr.)
      – Transformed using different representations
    [Table: "calling" and "overriding", with original code and transformed code on both axes; transformed code is split into same transformation and different transformation]
  68. Scope Composition

    • Code can be
      – Left untransformed (using the original repr.)
      – Transformed using different representations
    [Table: "calling" and "overriding", with original code and transformed code on both axes; transformed code is split into same transformation and different transformation]
    The transformation has to preserve the object model.
  69. Scope Composition

    • Code can be
      – Left untransformed (using the original repr.)
      – Transformed using different representations
    [Table: "calling" and "overriding", with original code and transformed code on both axes; transformed code is split into same transformation and different transformation]
    The transformation has to preserve the object model. Handled automatically by the ildl transformation!
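The "two coercions" case (Repr1 → Target → Repr2) can be illustrated with a small generic sketch. Everything here is hypothetical: the two-parameter `Transformation` trait, the `coerce` helper, and the `AsArray` and `AsCsv` representations of `Vector[Int]` are invented for the example and do not appear in the slides.

```scala
// Local stand-in for a transformation description (an assumption).
trait Transformation[Target, Repr] {
  def toResult(t: Target): Repr
  def toTarget(r: Repr): Target
}

object Coercions {
  // A value leaving a scope that uses `from` and entering a scope that uses
  // `to` is decoded to the common target, then re-encoded: two coercions.
  def coerce[T, R1, R2](from: Transformation[T, R1], to: Transformation[T, R2])(r: R1): R2 =
    to.toResult(from.toTarget(r))
}

// Two hypothetical representations of Vector[Int], for illustration only.
object AsArray extends Transformation[Vector[Int], Array[Int]] {
  def toResult(t: Vector[Int]): Array[Int] = t.toArray
  def toTarget(r: Array[Int]): Vector[Int] = r.toVector
}

object AsCsv extends Transformation[Vector[Int], String] {
  def toResult(t: Vector[Int]): String = t.mkString(",")
  def toTarget(r: String): Vector[Int] =
    if (r.isEmpty) Vector.empty else r.split(",").map(_.toInt).toVector
}
```

When caller and callee share the same transformation, neither step is needed and the representation can be passed through unchanged, which is the "no coercions" case.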
  70. Array of Struct (Column-oriented)

    [Diagram: Vector[Employee] next to the column-based EmployeeVector (ID, NAME and SALARY columns)]
  71. Array of Struct (Column-oriented)

    [Diagram: Vector[Employee] next to the column-based EmployeeVector (ID, NAME and SALARY columns)]
    5x faster
  72. Specialization and stack allocation

    [Diagram: the tuple (3,5) and its components 3 and 5]
    Tuples in Scala are generic, so they need to use pointers and objects.
  73. Specialization and stack allocation

    [Diagram: two renderings of the tuple (3,5) and its components 3 and 5]
    Tuples in Scala are generic, so they need to use pointers and objects.
    + stack allocation
  74. Specialization and stack allocation

    [Diagram: two renderings of the tuple (3,5) and its components 3 and 5]
    Tuples in Scala are generic, so they need to use pointers and objects.
    + stack allocation
    14x faster, reduced memory footprint
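One way to avoid the pointers and objects of a generic tuple is to pack both components into a single primitive. The slides do not say which representation is used for this benchmark, so the encoding below (two `Int`s in one `Long`) and the `IntPair` helper are assumptions chosen only to show the idea.

```scala
object IntPair {
  // Pack two Ints into one Long: high 32 bits hold the first component,
  // low 32 bits hold the second. No Tuple2 object is allocated.
  def pack(a: Int, b: Int): Long = (a.toLong << 32) | (b.toLong & 0xFFFFFFFFL)

  def first(p: Long): Int = (p >>> 32).toInt
  def second(p: Long): Int = p.toInt

  // Component-wise addition on the packed form.
  def add(p: Long, q: Long): Long =
    pack(first(p) + first(q), second(p) + second(q))
}
```

A `Long` is a primitive, so it can live in a local variable or register instead of on the heap, which is the effect the "stack allocation" label points at.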
  75. Deforestation

    List(1,2,3).map(_ + 1).map(_ * 2).sum
    Intermediate results: List(2,3,4), List(4,6,8), 18
    transform(ListDeforestation) {
      List(1,2,3).map(_ + 1).map(_ * 2).sum
    }
  76. Deforestation

    List(1,2,3).map(_ + 1).map(_ * 2).sum
    Intermediate results: List(2,3,4), List(4,6,8), 18
    transform(ListDeforestation) {
      List(1,2,3).map(_ + 1).map(_ * 2).sum
    }
    First map: accumulate function
  77. Deforestation

    List(1,2,3).map(_ + 1).map(_ * 2).sum
    Intermediate results: List(2,3,4), List(4,6,8), 18
    transform(ListDeforestation) {
      List(1,2,3).map(_ + 1).map(_ * 2).sum
    }
    First map: accumulate function. Second map: accumulate function.
  78. Deforestation

    List(1,2,3).map(_ + 1).map(_ * 2).sum
    Intermediate results: List(2,3,4), List(4,6,8), 18
    transform(ListDeforestation) {
      List(1,2,3).map(_ + 1).map(_ * 2).sum
    }
    First map: accumulate function. Second map: accumulate function. sum: compute: 18
  79. Deforestation

    List(1,2,3).map(_ + 1).map(_ * 2).sum
    Intermediate results: List(2,3,4), List(4,6,8), 18
    transform(ListDeforestation) {
      List(1,2,3).map(_ + 1).map(_ * 2).sum
    }
    First map: accumulate function. Second map: accumulate function. sum: compute: 18
    6x faster
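The "accumulate function, then compute" behaviour can be sketched with a small wrapper that keeps the source list and the pending function. `FusedList` is a hypothetical name for such a representation; the slides do not show what `ListDeforestation` actually uses.

```scala
// Hypothetical stand-in for a deforested list: the source list plus the
// function accumulated so far.
final class FusedList[A, B](source: List[A], f: A => B) {
  // map only composes functions; no intermediate list is built.
  def map[C](g: B => C): FusedList[A, C] = new FusedList(source, f.andThen(g))

  // A forcing operation traverses the source once, applying the composition.
  def sum(implicit num: Numeric[B]): B =
    source.foldLeft(num.zero)((acc, a) => num.plus(acc, f(a)))

  def toList: List[B] = source.map(f)
}

object FusedList {
  // Start with the identity function as the accumulated function.
  def apply[A](xs: List[A]): FusedList[A, A] = new FusedList(xs, (a: A) => a)
}
```

With this wrapper, `FusedList(List(1, 2, 3)).map(_ + 1).map(_ * 2).sum` walks the list once and never materializes `List(2,3,4)` or `List(4,6,8)`.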
  80. Multi-Stage Programming

    • Multi-Stage Programming
      – “Abstraction without regret” - Tiark Rompf
      – DSLs → small enough to be staged
        • 10000x speed improvements
  81. Multi-Stage Programming

    • Multi-Stage Programming
      – “Abstraction without regret” - Tiark Rompf
      – DSLs → small enough to be staged
        • 10000x speed improvements
      – Scala → many features not supported by LMS:
        • Separate compilation/modularization
        • Dynamic dispatch
        • Aliasing
        • Reflection
  82. Multi-Stage Programming

    • Multi-Stage Programming
      – “Abstraction without regret” - Tiark Rompf
      – DSLs → small enough to be staged
        • 10000x speed improvements
      – Scala → many features not supported by LMS:
        • Separate compilation/modularization
        • Dynamic dispatch
        • Aliasing
        • Reflection
    If we add support, we lose the ability to optimize code :(
  83. Low-level Optimizers

    • JIT optimizers with virtual machine support
      – Access to the low-level code
      – Can assume a (local) closed world
      – Can speculate based on profiles
      – On the critical path – limited analysis
  84. Low-level Optimizers

    • JIT optimizers with virtual machine support
      – Access to the low-level code
      – Can assume a (local) closed world
      – Can speculate based on profiles
      – On the critical path – limited analysis
      – Biggest opportunities are high-level: O(n²) → O(n)
        • Incoming code is low-level
        • Rarely possible to recover opportunities
  85. Low-level Optimizers

    • JIT optimizers with virtual machine support
      – Access to the low-level code
      – Can assume a (local) closed world
      – Can speculate based on profiles
      – On the critical path – limited analysis
      – Biggest opportunities are high-level: O(n²) → O(n)
        • Incoming code is low-level
        • Rarely possible to recover opportunities
    Typical solution: Metaprogramming
  86. Metaprogramming

    • Not your grandpa's C preprocessor
    • Full-fledged program transformers
      – :) Lots of power
    def optimize(tree: AST): AST = { ... }
  87. Metaprogramming

    • Not your grandpa's C preprocessor
    • Full-fledged program transformers
      – :) Lots of power
      – :( Lots of responsibility
    def optimize(tree: AST): AST = { ... }
  88. Metaprogramming

    • Not your grandpa's C preprocessor
    • Full-fledged program transformers
      – :) Lots of power
      – :( Lots of responsibility
        • Compiler invariants
        • Object-oriented model
        • Modularity
    def optimize(tree: AST): AST = { ... }
  89. Metaprogramming

    • Not your grandpa's C preprocessor
    • Full-fledged program transformers
      – :) Lots of power
      – :( Lots of responsibility
        • Compiler invariants
        • Object-oriented model
        • Modularity
    def optimize(tree: AST): AST = { ... }
    Can we make metaprogramming “high-level”?
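To make the `optimize(tree: AST): AST` signature concrete, here is a toy tree transformer. The four-node `AST` and the constant-folding rule are invented for illustration; a real compiler's trees carry types, symbols and positions, which is where the "lots of responsibility" comes from.

```scala
// A deliberately tiny, hypothetical AST.
sealed trait AST
case class Num(value: Int) extends AST
case class Var(name: String) extends AST
case class Add(lhs: AST, rhs: AST) extends AST
case class Mul(lhs: AST, rhs: AST) extends AST

object Optimizer {
  // Bottom-up constant folding: rewrite the children first, then fold the
  // node itself if both children turned into constants.
  def optimize(tree: AST): AST = tree match {
    case Add(l, r) => (optimize(l), optimize(r)) match {
      case (Num(a), Num(b)) => Num(a + b)
      case (lo, ro)         => Add(lo, ro)
    }
    case Mul(l, r) => (optimize(l), optimize(r)) match {
      case (Num(a), Num(b)) => Num(a * b)
      case (lo, ro)         => Mul(lo, ro)
    }
    case leaf => leaf
  }
}
```

Even this small rewriter has to decide on its own what is safe to change; nothing in the signature stops it from producing a tree that breaks an invariant elsewhere.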
  90. Conclusion

    • Object-oriented composition → inefficient representation
    • Solution: data-centric metaprogramming
      – Splitting the responsibility:
        • Defining the Transformation → programmer
        • Applying the Transformation → compiler
      – Scopes
        • Adapt the data representation to the operation
        • Allow speculating properties of the scope
    • We've just begun to scratch the surface
      – Many interesting research questions lie ahead