
Data-centric Metaprogramming @ EcoCloud 2015

Data-centric metaprogramming at the annual EcoCloud gathering 2015, Lausanne, Switzerland. Website: http://www.ecocloud.ch/

Project website: http://scala-ildl.org

Vlad Ureche

June 23, 2015

Transcript

  1. scala-ildl.org Language and Compiler Support for Custom Data-centric Optimizations

  2. scala-ildl.org Vlad URECHE PhD student in the Scala Team @ EPFL

    Working on program transformations in the Scala programming language, focusing on data representation. @VladUreche vlad.ureche@epfl.ch
  3. scala-ildl.org Language and Compiler Support for Custom Data-centric Optimizations

  4. scala-ildl.org • written by the programmer in the host language

    Language and Compiler Support for Custom Data-centric Optimizations
  5. scala-ildl.org • targeting the data: representation and operations

    Language and Compiler Support for Custom Data-centric Optimizations
  6. scala-ildl.org • performance • latency • memory footprint • energy footprint

    Language and Compiler Support for Custom Data-centric Optimizations
  7. scala-ildl.org across general-purpose libraries

    Language and Compiler Support for Custom Data-centric Optimizations
  8. scala-ildl.org Motivation Data-centric Optimizations Conclusion Applications

  9. scala-ildl.org Data Representation Challenge class Vector[T] { … }

  10. scala-ildl.org Data Representation Challenge class Vector[T] { … }

    The Vector collection in the Scala library
  11. scala-ildl.org Data Representation Challenge case class Employee(...) [fields: ID, NAME, SALARY]

    class Vector[T] { … } The Vector collection in the Scala library
  12. scala-ildl.org Data Representation Challenge case class Employee(...) [fields: ID, NAME, SALARY]

    Auto-generated, corresponds to a table row. class Vector[T] { … } The Vector collection in the Scala library
  13. scala-ildl.org Data Representation Challenge case class Employee(...) [fields: ID, NAME, SALARY]

    Auto-generated, corresponds to a table row. class Vector[T] { … } The Vector collection in the Scala library
  14. scala-ildl.org Data Representation Challenge class Vector[T] { … }

    case class Employee(...) [fields: ID, NAME, SALARY]
  15. scala-ildl.org Data Representation Challenge class Vector[T] { … }

    case class Employee(...) [fields: ID, NAME, SALARY]
  16. scala-ildl.org Data Representation Challenge case class Employee(...) [fields: ID, NAME, SALARY] class Vector[T] { … }

    [Diagram: Vector[Employee] holding Employee rows, each with ID, NAME, SALARY]
  17. scala-ildl.org Data Representation Challenge case class Employee(...) [fields: ID, NAME, SALARY] class Vector[T] { … }

    [Diagram: Vector[Employee] holding Employee rows, each with ID, NAME, SALARY] Traversal requires dereferencing a pointer for each employee.
  18. scala-ildl.org A Better Representation

    [Diagram: Vector[Employee] holding Employee rows, each with ID, NAME, SALARY]
  19. scala-ildl.org A Better Representation

    [Diagram: Vector[Employee] holding rows (ID, NAME, SALARY) next to VectorOfEmployee holding columns (ID ..., NAME ..., SALARY ...)] 5x faster
  20. scala-ildl.org A Better Representation • Individually, Vector[T] and Employee can't be optimized • Together, Vector[Employee] can be optimized

    [Diagram: Vector[Employee] holding rows (ID, NAME, SALARY) next to VectorOfEmployee holding columns (ID ..., NAME ..., SALARY ...)] 5x faster
  21. scala-ildl.org A Better Representation • Individually, Vector[T] and Employee can't be optimized • Together, Vector[Employee] can be optimized

    [Diagram: Vector[Employee] holding rows (ID, NAME, SALARY) next to VectorOfEmployee holding columns (ID ..., NAME ..., SALARY ...)] 5x faster. Current challenge: No means of communicating this to the compiler
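
The two layouts on these slides can be sketched by hand in plain Scala. This is only an illustration of the row-based versus column-based idea: the `Employee` field types, the members of `VectorOfEmployee`, and the `SoaLayoutSketch` wrapper are assumptions, not code from the talk or from the ildl project, and the sketch makes no claim about the measured speedup.

```scala
object SoaLayoutSketch {
  // Row-based layout: every Employee is its own heap object.
  // Field types are assumed; the slides only show ID, NAME, SALARY.
  final case class Employee(id: Int, name: String, salary: Float)

  // Hypothetical hand-written column-based layout: one array per field.
  final class VectorOfEmployee(
      val ids: Array[Int],
      val names: Array[String],
      val salaries: Array[Float]) {
    def length: Int = ids.length
    // Rebuild a single row on demand.
    def apply(i: Int): Employee = Employee(ids(i), names(i), salaries(i))
  }

  // Convert the row-based Vector[Employee] into the column-based layout.
  def fromVector(v: Vector[Employee]): VectorOfEmployee =
    new VectorOfEmployee(
      v.map(_.id).toArray,
      v.map(_.name).toArray,
      v.map(_.salary).toArray)

  // Row-based traversal: reaches each salary through an Employee reference.
  def totalSalaryRows(v: Vector[Employee]): Float =
    v.foldLeft(0f)(_ + _.salary)

  // Column-based traversal: scans one primitive Float array.
  def totalSalaryColumns(v: VectorOfEmployee): Float = {
    var sum = 0f
    var i = 0
    while (i < v.length) { sum += v.salaries(i); i += 1 }
    sum
  }
}
```

Both traversals compute the same total; only the memory access pattern differs.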
  22. scala-ildl.org

  23. scala-ildl.org • Transforming the code by hand – Loses high-level intent – Changes ripple outside

  24. scala-ildl.org • Transforming the code by hand – Loses high-level intent – Changes ripple outside

    Can we automate this?
  25. scala-ildl.org Motivation Data-centric Optimizations Conclusion Applications

  26. scala-ildl.org Data-centric Optimizations • Optimization rules – written in the host language – entry point: data (targeted via types) – changes: data representation and operations

  27. scala-ildl.org Data-centric Optimizations • Optimization rules – written in the host language – entry point: data (targeted via types) – changes: data representation and operations

    object VectorOfEmployeeSoA extends Transformation { type Target = Vector[Employee] type Result = VectorOfEmployee // conversions, operations, ... }
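
To make the `// conversions, operations, ...` placeholder more concrete, here is a minimal self-contained sketch of what such a rule could contain. The `Transformation` trait below is a hypothetical stand-in written for this example, with made-up method names `toResult` and `toTarget`; it is not the actual ildl API, and the field types of `Employee` are assumed.

```scala
object TransformationSketch {
  final case class Employee(id: Int, name: String, salary: Float)

  final class VectorOfEmployee(
      val ids: Array[Int],
      val names: Array[String],
      val salaries: Array[Float])

  // Hypothetical stand-in for the slide's `Transformation` (not the ildl API):
  // a rule names the targeted type, the optimized type, and both conversions.
  trait Transformation {
    type Target
    type Result
    def toResult(high: Target): Result
    def toTarget(repr: Result): Target
  }

  object VectorOfEmployeeSoA extends Transformation {
    type Target = Vector[Employee]
    type Result = VectorOfEmployee

    // High-level rows to column-based storage.
    def toResult(high: Vector[Employee]): VectorOfEmployee =
      new VectorOfEmployee(
        high.map(_.id).toArray,
        high.map(_.name).toArray,
        high.map(_.salary).toArray)

    // Column-based storage back to high-level rows.
    def toTarget(repr: VectorOfEmployee): Vector[Employee] =
      Vector.tabulate(repr.ids.length) { i =>
        Employee(repr.ids(i), repr.names(i), repr.salaries(i))
      }
  }
}
```

The rule is keyed on types, matching the slide's "entry point: data (targeted via types)"; how the compiler picks it up and rewrites operations is outside this sketch.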
  28. scala-ildl.org Why Does it Matter?

  29. scala-ildl.org Transformation: Definition | Application

  30. scala-ildl.org Transformation: Definition | Application

    Definition: • can't be automated • based on experience • based on speculation • one-time effort
  31. scala-ildl.org Transformation: Definition | Application

    Definition: • can't be automated • based on experience • based on speculation • one-time effort Application: • repetitive and simple • affects code readability • is verbose • is error-prone
  32. scala-ildl.org Transformation: Definition | Application

    Definition (programmer): • can't be automated • based on experience • based on speculation • one-time effort Application (compiler, automated): • repetitive and simple • affects code readability • is verbose • is error-prone
  33. scala-ildl.org

  34. scala-ildl.org Is that all?

  35. scala-ildl.org Diversity

    [Diagram: Vector[Employee] holding Employee rows, each with ID, NAME, SALARY]
  36. scala-ildl.org Diversity

    [Diagram: Vector[Employee] holding rows (ID, NAME, SALARY) next to VectorOfEmployee holding columns (ID ..., NAME ..., SALARY ...)]
  37. scala-ildl.org Diversity

    [Diagram: Vector[Employee] holding rows (ID, NAME, SALARY) next to VectorOfEmployee holding columns (ID ..., NAME ..., SALARY ...)] VectorOfEmployeeJSON { id: 123, name: “John Doe”, salary: 100 }
  38. scala-ildl.org Diversity

    [Diagram: Vector[Employee] holding rows (ID, NAME, SALARY) next to VectorOfEmployee holding columns (ID ..., NAME ..., SALARY ...)] VectorOfEmployeeJSON { id: 123, name: “John Doe”, salary: 100 } CompactVector <compressed binary blob>
  39. scala-ildl.org Scopes def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] =

    for (employee ← employees) yield employee.copy(salary = (1 + by) * employee.salary)
  40. scala-ildl.org Scopes def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] =

    for (employee ← employees) yield employee.copy(salary = (1 + by) * employee.salary) Method operating on Scala collections (familiar to programmers)
  41. scala-ildl.org Scopes def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] =

    for (employee ← employees) yield employee.copy(salary = (1 + by) * employee.salary)
  42. scala-ildl.org Scopes adrt(VectorOfEmployeeSoA) { def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] =

    for (employee ← employees) yield employee.copy(salary = (1 + by) * employee.salary) }
  43. scala-ildl.org Scopes adrt(VectorOfEmployeeSoA) { def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] =

    for (employee ← employees) yield employee.copy(salary = (1 + by) * employee.salary) } Method operating on column-based storage
  44. scala-ildl.org Scopes adrt(VectorOfEmployeeJSON) { def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] =

    for (employee ← employees) yield employee.copy(salary = (1 + by) * employee.salary) } Method operating on JSON data
  45. scala-ildl.org Scopes adrt(VectorOfEmployeeBinary) { def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] =

    for (employee ← employees) yield employee.copy(salary = (1 + by) * employee.salary) } Method operating on binary data
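
For comparison, this is roughly what a programmer would have to write by hand to get the column-based behaviour that the `adrt(VectorOfEmployeeSoA)` scope is meant to provide automatically. It is a sketch under assumptions: `indexSalarySoA` and the layout of `VectorOfEmployee` are hypothetical, and the code the compiler actually produces may look different.

```scala
object ScopeSketch {
  final case class Employee(id: Int, name: String, salary: Float)

  // Hypothetical column-based storage, one array per field.
  final class VectorOfEmployee(
      val ids: Array[Int],
      val names: Array[String],
      val salaries: Array[Float])

  // High-level version, as written on the slides (ASCII arrow used here).
  def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] =
    for (employee <- employees)
      yield employee.copy(salary = (1 + by) * employee.salary)

  // Hand-transformed counterpart: only the salary column is rewritten.
  // The id and name arrays are shared, assuming nobody mutates them.
  def indexSalarySoA(employees: VectorOfEmployee, by: Float): VectorOfEmployee =
    new VectorOfEmployee(
      employees.ids,
      employees.names,
      employees.salaries.map(s => (1 + by) * s))
}
```

The hand-written version changes both the signature and the body, which is the "changes ripple outside" problem from the motivation; the scope keeps the high-level source unchanged.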
  46. scala-ildl.org Challenges of Scopes

    • Separate compilation – Storing transformation metadata • Overriding and the object model – Different signatures may not override • Passing values between scopes (composition) – Redundant conversions – Safety
  47. scala-ildl.org Challenges of Scopes

    • Separate compilation – Storing transformation metadata • Overriding and the object model – Different signatures may not override • Passing values between scopes (composition) – Redundant conversions – Safety • Addressed in the compiler :)
  48. scala-ildl.org

  49. scala-ildl.org Motivation Data-centric Optimizations Conclusion Applications

  50. scala-ildl.org Array of Struct (Column-oriented)

    [Diagram: Vector[Employee] holding rows (ID, NAME, SALARY) next to VectorOfEmployee holding columns (ID ..., NAME ..., SALARY ...)] 5x faster
  51. scala-ildl.org Specialization

    [Diagram: the tuple (3,5) and its elements 3 and 5] Tuples in Scala are generic, so they need to use pointers and objects
  52. scala-ildl.org Specialization

    [Diagram: the tuple (3,5) and its elements 3 and 5, shown twice] Tuples in Scala are generic, so they need to use pointers and objects + stack allocation
  53. scala-ildl.org Specialization

    [Diagram: the tuple (3,5) and its elements 3 and 5, shown twice] Tuples in Scala are generic, so they need to use pointers and objects + stack allocation 14x faster, reduced memory footprint
  54. scala-ildl.org Deforestation List(1,2,3).map(_ + 1).map(_ * 2).sum

  55. scala-ildl.org Deforestation List(1,2,3).map(_ + 1).map(_ * 2).sum [intermediate: List(2,3,4)]

  56. scala-ildl.org Deforestation List(1,2,3).map(_ + 1).map(_ * 2).sum [intermediate: List(2,3,4), List(4,6,8)]

  57. scala-ildl.org Deforestation List(1,2,3).map(_ + 1).map(_ * 2).sum [intermediate: List(2,3,4), List(4,6,8); result: 18]

  58. scala-ildl.org Deforestation List(1,2,3).map(_ + 1).map(_ * 2).sum [intermediate: List(2,3,4), List(4,6,8); result: 18]

  59. scala-ildl.org Deforestation List(1,2,3).map(_ + 1).map(_ * 2).sum [intermediate: List(2,3,4), List(4,6,8); result: 18]

    adrt(ListDeforestation) { List(1,2,3).map(_ + 1).map(_ * 2).sum }
  60. scala-ildl.org Deforestation List(1,2,3).map(_ + 1).map(_ * 2).sum [intermediate: List(2,3,4), List(4,6,8); result: 18]

    adrt(ListDeforestation) { List(1,2,3).map(_ + 1).map(_ * 2).sum } [first map: accumulate function]
  61. scala-ildl.org Deforestation List(1,2,3).map(_ + 1).map(_ * 2).sum [intermediate: List(2,3,4), List(4,6,8); result: 18]

    adrt(ListDeforestation) { List(1,2,3).map(_ + 1).map(_ * 2).sum } [each map: accumulate function]
  62. scala-ildl.org Deforestation List(1,2,3).map(_ + 1).map(_ * 2).sum [intermediate: List(2,3,4), List(4,6,8); result: 18]

    adrt(ListDeforestation) { List(1,2,3).map(_ + 1).map(_ * 2).sum } [each map: accumulate function; sum: compute: 18]
  63. scala-ildl.org Deforestation List(1,2,3).map(_ + 1).map(_ * 2).sum [intermediate: List(2,3,4), List(4,6,8); result: 18]

    adrt(ListDeforestation) { List(1,2,3).map(_ + 1).map(_ * 2).sum } [each map: accumulate function; sum: compute: 18] 6x faster
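
The "accumulate function, then compute" idea can be sketched with a small wrapper that composes the mapped functions and traverses the list only once. `DeferredList` and `defer` are hypothetical names chosen for this example; this is not the `ListDeforestation` transformation from the talk, only one plausible way to picture what it does.

```scala
object DeforestationSketch {
  // Hypothetical deferred list: the source list plus the function accumulated so far.
  final class DeferredList[A, B](source: List[A], f: A => B) {
    // map only composes functions; no intermediate list is allocated.
    def map[C](g: B => C): DeferredList[A, C] =
      new DeferredList(source, f.andThen(g))

    // A single traversal applies the composed function and adds up the results.
    def sum(implicit num: Numeric[B]): B =
      source.foldLeft(num.zero)((acc, a) => num.plus(acc, f(a)))
  }

  // Wrap a list with the identity function as the starting point.
  def defer[A](source: List[A]): DeferredList[A, A] =
    new DeferredList(source, (a: A) => a)

  // Same pipeline as on the slides, without List(2,3,4) or List(4,6,8) being built.
  val fused: Int = defer(List(1, 2, 3)).map(_ + 1).map(_ * 2).sum
}
```

The direct `List(1,2,3).map(_ + 1).map(_ * 2).sum` and the fused version produce the same value; the difference is the two intermediate lists that the fused version never allocates.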
  64. scala-ildl.org Motivation Data-centric Optimizations Conclusion Applications

  65. scala-ildl.org Conclusion • Problem: optimized representations • Solution: data-centric metaprogramming

    – Splitting the responsibility: • Defining the Transformation → programmer • Applying the Transformation → compiler – Scopes • Adapt the data representation to the operation • Allow speculating properties of the scope
  66. scala-ildl.org

  67. scala-ildl.org

  68. scala-ildl.org

  69. scala-ildl.org Thank you!

  70. scala-ildl.org Multi-Stage Programming • Multi-Stage Programming – “Abstraction without regret” - Tiark Rompf

    – DSLs → small enough to be staged • 10000x speed improvements
  71. scala-ildl.org Multi-Stage Programming • Multi-Stage Programming – “Abstraction without regret” - Tiark Rompf

    – DSLs → small enough to be staged • 10000x speed improvements – Scala → too large to obtain any benefit • Separate compilation/modularization • Dynamic dispatch • Aliasing • Reflection
  72. scala-ildl.org Multi-Stage Programming • Multi-Stage Programming – “Abstraction without regret” - Tiark Rompf

    – DSLs → small enough to be staged • 10000x speed improvements – Scala → too large to obtain any benefit • Separate compilation/modularization • Dynamic dispatch • Aliasing • Reflection Not supported by staging. If we add support, we lose the ability to optimize
  73. scala-ildl.org Low-level Optimizers • JIT optimizers with virtual machine support

    – Access to the low-level code – Can assume a (local) closed world – Can speculate based on profiles – On the critical path • Limited profiles • Limited inlining • Limited analysis – Biggest opportunities are high-level - O(n²) → O(n) • Incoming code is low-level • Rarely possible to recover them
  74. scala-ildl.org Low-level Optimizers • JIT optimizers with virtual machine support

    – Access to the low-level code – Can assume a (local) closed world – Can speculate based on profiles – On the critical path • Limited profiles • Limited inlining • Limited analysis – Biggest opportunities are high-level - O(n²) → O(n) • Incoming code is low-level • Rarely possible to recover them Typical solution: Metaprogramming
  75. scala-ildl.org Metaprogramming • Not your grandpa's C preprocessor

  76. scala-ildl.org Metaprogramming • Not your grandpa's C preprocessor

    def optimize(tree: Tree): Tree = { ... }
  77. scala-ildl.org Metaprogramming • Not your grandpa's C preprocessor • Full-fledged program transformers

    – :) Lots of power def optimize(tree: Tree): Tree = { ... }
  78. scala-ildl.org Metaprogramming • Not your grandpa's C preprocessor • Full-fledged program transformers

    – :) Lots of power – :( Lots of responsibility def optimize(tree: Tree): Tree = { ... }
  79. scala-ildl.org Metaprogramming • Not your grandpa's C preprocessor • Full-fledged program transformers

    – :) Lots of power – :( Lots of responsibility • Compiler invariants • Object-oriented model • Modularity def optimize(tree: Tree): Tree = { ... }
  80. scala-ildl.org Metaprogramming • Not your grandpa's C preprocessor • Full-fledged program transformers

    – :) Lots of power – :( Lots of responsibility • Compiler invariants • Object-oriented model • Modularity def optimize(tree: Tree): Tree = { ... } Can we make metaprogramming “high-level”?
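
To give a feel for the `optimize(tree: Tree): Tree` shape on these slides, here is a toy transformer over a tiny expression tree. The `Tree` type below is a hypothetical four-node ADT invented for this example, not the Scala compiler's tree API, and constant folding is just one simple rewrite chosen for illustration.

```scala
object TreeSketch {
  // Hypothetical miniature AST; real compiler trees are far richer.
  sealed trait Tree
  final case class Lit(value: Int) extends Tree
  final case class Ref(name: String) extends Tree
  final case class Add(lhs: Tree, rhs: Tree) extends Tree
  final case class Mul(lhs: Tree, rhs: Tree) extends Tree

  // Bottom-up constant folding: rewrite children first, then fold literal operands.
  def optimize(tree: Tree): Tree = tree match {
    case Add(l, r) =>
      (optimize(l), optimize(r)) match {
        case (Lit(a), Lit(b)) => Lit(a + b)
        case (x, y)           => Add(x, y)
      }
    case Mul(l, r) =>
      (optimize(l), optimize(r)) match {
        case (Lit(a), Lit(b)) => Lit(a * b)
        case (x, y)           => Mul(x, y)
      }
    case leaf => leaf // Lit and Ref are already in normal form
  }
}
```

Even this toy has to preserve meaning on every node kind by itself, which hints at the "lots of responsibility" side of the slide once compiler invariants, the object model and modularity enter the picture.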