Data-centric Metaprogramming @ EcoCloud 2015

Data-centric metaprogramming at the annual EcoCloud gathering 2015, Lausanne, Switzerland. Website: http://www.ecocloud.ch/

Project website: http://scala-ildl.org

Vlad Ureche

June 23, 2015

Transcript

  1. scala-ildl.org
    Language and Compiler Support for
    Custom Data-centric Optimizations

  2. scala-ildl.org
    Vlad URECHE
    PhD student in the
    Scala Team @ EPFL
    Working on program transformations
    in the Scala programming language,
    focusing on data representation.
    @VladUreche
    [email protected]

  3. scala-ildl.org
    Language and Compiler Support for
    Custom Data-centric Optimizations

  4. scala-ildl.org

    written by the
    programmer in the
    host language
    Language and Compiler Support for
    Custom Data-centric Optimizations

  5. scala-ildl.org

    targeting the data:
    representation and
    operations
    Language and Compiler Support for
    Custom Data-centric Optimizations

  6. scala-ildl.org

    performance

    latency

    memory footprint

    energy footprint
    Language and Compiler Support for
    Custom Data-centric Optimizations

  7. scala-ildl.org
    across general-purpose libraries
    Language and Compiler Support for
    Custom Data-centric Optimizations

  8. scala-ildl.org
    Motivation
    Data-centric Optimizations
    Applications
    Conclusion

  9. scala-ildl.org
    Data Representation Challenge
    class Vector[T] { … }

  10. scala-ildl.org
    Data Representation Challenge
    class Vector[T] { … }
    The Vector collection in the Scala library

  11. scala-ildl.org
    Data Representation Challenge
    case class Employee(...)
    ID NAME SALARY
    class Vector[T] { … }
    The Vector collection in the Scala library

  12. scala-ildl.org
    Data Representation Challenge
    case class Employee(...)
    ID NAME SALARY
    Auto-generated, corresponds to a table row
    class Vector[T] { … }
    The Vector collection in the Scala library

  13. scala-ildl.org
    Data Representation Challenge
    case class Employee(...)
    ID NAME SALARY
    Auto-generated, corresponds to a table row
    class Vector[T] { … }
    The Vector collection in the Scala library

  14. scala-ildl.org
    Data Representation Challenge
    class Vector[T] { … }
    case class Employee(...)
    ID NAME SALARY

  15. scala-ildl.org
    Data Representation Challenge
    class Vector[T] { … }
    case class Employee(...)
    ID NAME SALARY

  16. scala-ildl.org
    Data Representation Challenge
    case class Employee(...)
    ID NAME SALARY
    Vector[Employee]
    ID NAME SALARY
    ID NAME SALARY
    class Vector[T] { … }

  17. scala-ildl.org
    Data Representation Challenge
    case class Employee(...)
    ID NAME SALARY
    Vector[Employee]
    ID NAME SALARY
    ID NAME SALARY
    class Vector[T] { … }
    Traversal requires
    dereferencing a pointer
    for each employee.

  18. scala-ildl.org
    A Better Representation
    Vector[Employee]
    ID NAME SALARY
    ID NAME SALARY

  19. scala-ildl.org
    A Better Representation
    Vector[Employee]: ID NAME SALARY | ID NAME SALARY
    VectorOfEmployee: ID ID ... | NAME NAME ... | SALARY SALARY ...
    5x faster

  20. scala-ildl.org
    A Better Representation
    Individually, Vector[T] and Employee can't be optimized
    Together, Vector[Employee] can be optimized
    Vector[Employee]: ID NAME SALARY | ID NAME SALARY
    VectorOfEmployee: ID ID ... | NAME NAME ... | SALARY SALARY ...
    5x faster

  21. scala-ildl.org
    A Better Representation
    Individually, Vector[T] and Employee can't be optimized
    Together, Vector[Employee] can be optimized
    Vector[Employee]: ID NAME SALARY | ID NAME SALARY
    VectorOfEmployee: ID ID ... | NAME NAME ... | SALARY SALARY ...
    5x faster
    Current challenge: No means of
    communicating this to the compiler
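
The two layouts on these slides can be sketched in plain Scala. This is only an illustration under stated assumptions: the field types (`id: Int`, `name: String`, `salary: Float`), the three column arrays inside `VectorOfEmployee`, and the helpers `toColumns`, `toRows` and `totalSalary` are hypothetical names chosen here, not taken from the talk or from the scala-ildl project.

```scala
object LayoutSketch {
  // Row-oriented element: Vector[Employee] holds one heap object per employee.
  final case class Employee(id: Int, name: String, salary: Float)

  // Column-oriented layout (assumed shape of VectorOfEmployee): one array per field.
  final class VectorOfEmployee(
      val ids: Array[Int],
      val names: Array[String],
      val salaries: Array[Float]) {
    def length: Int = ids.length
    // Rebuilds a row on demand from the three columns.
    def apply(i: Int): Employee = Employee(ids(i), names(i), salaries(i))
  }

  // Conversion from rows to columns.
  def toColumns(rows: Vector[Employee]): VectorOfEmployee =
    new VectorOfEmployee(
      rows.map(_.id).toArray,
      rows.map(_.name).toArray,
      rows.map(_.salary).toArray)

  // Conversion from columns back to rows.
  def toRows(cols: VectorOfEmployee): Vector[Employee] =
    Vector.tabulate(cols.length)(i => cols(i))

  // Traversal reads only the salary column, with no per-employee pointer dereference.
  def totalSalary(cols: VectorOfEmployee): Float = {
    var sum = 0f
    var i = 0
    while (i < cols.length) { sum += cols.salaries(i); i += 1 }
    sum
  }
}
```

Writing this by hand for every collection and element type is the manual rewrite that the following slides argue against.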

  22. scala-ildl.org

  23. scala-ildl.org

    Transforming the code by hand
    – Loses high-level intent
    – Changes ripple outside

  24. scala-ildl.org

    Transforming the code by hand
    – Loses high-level intent
    – Changes ripple outside
    Can we automate this?

  25. scala-ildl.org
    Motivation
    Data-centric Optimizations
    Applications
    Conclusion

  26. scala-ildl.org
    Data-centric Optimizations

    Optimization rules
    – written in the host language
    – entry point: data (targeted via types)
    – changes: data representation and operations

  27. scala-ildl.org
    Data-centric Optimizations

    Optimization rules
    – written in the host language
    – entry point: data (targeted via types)
    – changes: data representation and operations
    object VectorOfEmployeeSoA extends Transformation {
      type Target = Vector[Employee]
      type Result = VectorOfEmployee
      // conversions, operations, ...
    }
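
To make the elided "conversions, operations" part more concrete, here is a self-contained sketch. The `Transformation` trait below is a hypothetical stand-in defined only for this example, and the method names `toResult`, `toTarget` and `length` are assumptions; the actual scala-ildl plugin may define its transformation objects differently.

```scala
object TransformationSketch {
  final case class Employee(id: Int, name: String, salary: Float)

  // Assumed column-oriented result type: one array per field.
  final class VectorOfEmployee(
      val ids: Array[Int],
      val names: Array[String],
      val salaries: Array[Float])

  // Hypothetical base type: pairs a high-level type with its optimized representation.
  trait Transformation {
    type Target
    type Result
    def toResult(high: Target): Result
    def toTarget(repr: Result): Target
  }

  object VectorOfEmployeeSoA extends Transformation {
    type Target = Vector[Employee]
    type Result = VectorOfEmployee

    // Conversions between the two representations.
    def toResult(high: Target): Result =
      new VectorOfEmployee(
        high.map(_.id).toArray,
        high.map(_.name).toArray,
        high.map(_.salary).toArray)

    def toTarget(repr: Result): Target =
      Vector.tabulate(repr.ids.length) { i =>
        Employee(repr.ids(i), repr.names(i), repr.salaries(i))
      }

    // One operation expressed directly on the optimized representation.
    def length(repr: Result): Int = repr.ids.length
  }
}
```

The entry point is the pair of types: everything else in the object describes how values of `Target` are stored and operated on as `Result`.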

  28. scala-ildl.org
    Why Does it Matter?

  29. scala-ildl.org
    Transformation
    Definition Application

  30. scala-ildl.org
    Transformation
    Definition
    – can't be automated
    – based on experience
    – based on speculation
    – one-time effort
    Application

  31. scala-ildl.org
    Transformation
    Definition
    – can't be automated
    – based on experience
    – based on speculation
    – one-time effort
    Application
    – repetitive and simple
    – affects code readability
    – is verbose
    – is error-prone

  32. scala-ildl.org
    Transformation
    Definition (programmer)
    – can't be automated
    – based on experience
    – based on speculation
    – one-time effort
    Application (compiler, automated)
    – repetitive and simple
    – affects code readability
    – is verbose
    – is error-prone

  33. scala-ildl.org

  34. scala-ildl.org
    Is that all?

  35. scala-ildl.org
    Diversity
    Vector[Employee]
    ID NAME SALARY
    ID NAME SALARY

  36. scala-ildl.org
    Diversity
    Vector[Employee]: ID NAME SALARY | ID NAME SALARY
    VectorOfEmployee: ID ID ... | NAME NAME ... | SALARY SALARY ...

  37. scala-ildl.org
    Diversity
    Vector[Employee]: ID NAME SALARY | ID NAME SALARY
    VectorOfEmployee: ID ID ... | NAME NAME ... | SALARY SALARY ...
    VectorOfEmployeeJSON
    {
      id: 123,
      name: "John Doe",
      salary: 100
    }

  38. scala-ildl.org
    Diversity
    Vector[Employee]: ID NAME SALARY | ID NAME SALARY
    VectorOfEmployee: ID ID ... | NAME NAME ... | SALARY SALARY ...
    VectorOfEmployeeJSON
    {
      id: 123,
      name: "John Doe",
      salary: 100
    }
    CompactVector

  39. scala-ildl.org
    Scopes
    def indexSalary(employees: Vector[Employee],
                    by: Float): Vector[Employee] =
      for (employee <- employees)
        yield employee.copy(
          salary = (1 + by) * employee.salary
        )

  40. scala-ildl.org
    Scopes
    def indexSalary(employees: Vector[Employee],
                    by: Float): Vector[Employee] =
      for (employee <- employees)
        yield employee.copy(
          salary = (1 + by) * employee.salary
        )
    Method operating on
    Scala collections
    (familiar to programmers)

  41. scala-ildl.org
    Scopes
    def indexSalary(employees: Vector[Employee],
                    by: Float): Vector[Employee] =
      for (employee <- employees)
        yield employee.copy(
          salary = (1 + by) * employee.salary
        )

  42. scala-ildl.org
    Scopes
    adrt(VectorOfEmployeeSoA) {
      def indexSalary(employees: Vector[Employee],
                      by: Float): Vector[Employee] =
        for (employee <- employees)
          yield employee.copy(
            salary = (1 + by) * employee.salary
          )
    }

  43. scala-ildl.org
    Scopes
    adrt(VectorOfEmployeeSoA) {
      def indexSalary(employees: Vector[Employee],
                      by: Float): Vector[Employee] =
        for (employee <- employees)
          yield employee.copy(
            salary = (1 + by) * employee.salary
          )
    }
    Method operating on
    column-based storage
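
For comparison, this is roughly what the same method looks like when it is rewritten by hand against column-based storage. It is a sketch only: the `VectorOfEmployee` layout and the name `indexSalarySoA` are assumptions made for this example, and the code a compiler would actually generate inside the scope is not shown in the talk.

```scala
object ScopeSketch {
  final case class Employee(id: Int, name: String, salary: Float)

  // Assumed column-oriented layout: one array per field.
  final class VectorOfEmployee(
      val ids: Array[Int],
      val names: Array[String],
      val salaries: Array[Float])

  // High-level version, as written on the slide.
  def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] =
    for (employee <- employees)
      yield employee.copy(salary = (1 + by) * employee.salary)

  // Hand-written column-based version: only the salary column is recomputed.
  // The id and name arrays are shared with the input, which is safe only
  // as long as nobody mutates them afterwards.
  def indexSalarySoA(employees: VectorOfEmployee, by: Float): VectorOfEmployee =
    new VectorOfEmployee(
      employees.ids,
      employees.names,
      employees.salaries.map(s => (1 + by) * s))
}
```

The signature changes from `Vector[Employee]` to `VectorOfEmployee`, which is the kind of change that ripples into callers when the rewrite is done manually.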

  44. scala-ildl.org
    Scopes
    adrt(VectorOfEmployeeJSON) {
      def indexSalary(employees: Vector[Employee],
                      by: Float): Vector[Employee] =
        for (employee <- employees)
          yield employee.copy(
            salary = (1 + by) * employee.salary
          )
    }
    Method operating on
    JSON data

  45. scala-ildl.org
    Scopes
    adrt(VectorOfEmployeeBinary) {
      def indexSalary(employees: Vector[Employee],
                      by: Float): Vector[Employee] =
        for (employee <- employees)
          yield employee.copy(
            salary = (1 + by) * employee.salary
          )
    }
    Method operating on
    binary data

  46. scala-ildl.org
    Challenges of Scopes

    Separate compilation
    – Storing transformation metadata

    Overriding and the object model
    – Different signatures may not override

    Passing values between scopes (composition)
    – Redundant conversions
    – Safety

  47. scala-ildl.org
    Challenges of Scopes

    Separate compilation
    – Storing transformation metadata

    Overriding and the object model
    – Different signatures may not override

    Passing values between scopes (composition)
    – Redundant conversions
    – Safety

    Addressed in the compiler :)

  48. scala-ildl.org

  49. scala-ildl.org
    Motivation
    Data-centric Optimizations
    Applications
    Conclusion

  50. scala-ildl.org
    Array of Struct (Column-oriented)
    Vector[Employee]: ID NAME SALARY | ID NAME SALARY
    VectorOfEmployee: ID ID ... | NAME NAME ... | SALARY SALARY ...
    5x faster

  51. scala-ildl.org
    Specialization
    3 5
    (3,5)
    Tuples in Scala
    are generic so
    they need to use
    pointers and objects

  52. scala-ildl.org
    Specialization
    3 5
    3 5
    (3,5) (3,5)
    Tuples in Scala
    are generic so
    they need to use
    pointers and objects
    + stack
    allocation

  53. scala-ildl.org
    Specialization
    14x faster
    reduced memory footprint
    + stack allocation
    3 5
    3 5
    (3,5) (3,5)
    Tuples in Scala
    are generic so
    they need to use
    pointers and objects
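
One way to picture a pointer-free pair is to pack both `Int` components into a single `Long`, which can live in a local variable instead of on the heap. The encoding below (first component in the high 32 bits, second in the low 32 bits) and the names `pack`, `first`, `second` and `swap` are assumptions made for this sketch; the talk does not show which representation it uses.

```scala
object PairSketch {
  // Generic version: works on any tuple, at the cost of a heap-allocated object.
  def swapGeneric[A, B](p: (A, B)): (B, A) = (p._2, p._1)

  // Packed version: first component in the high 32 bits, second in the low 32 bits.
  def pack(first: Int, second: Int): Long =
    (first.toLong << 32) | (second.toLong & 0xFFFFFFFFL)

  // Unsigned shift brings the high half down; toInt keeps the low 32 bits.
  def first(p: Long): Int = (p >>> 32).toInt
  def second(p: Long): Int = p.toInt

  // Same operation as swapGeneric, but on the packed representation.
  def swap(p: Long): Long = pack(second(p), first(p))
}
```

The masking in `pack` matters for negative second components, whose sign extension would otherwise overwrite the high half.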

  54. scala-ildl.org
    Deforestation
    List(1,2,3).map(_ + 1).map(_ * 2).sum

  55. scala-ildl.org
    Deforestation
    List(1,2,3).map(_ + 1).map(_ * 2).sum
    List(2,3,4)

  56. scala-ildl.org
    Deforestation
    List(1,2,3).map(_ + 1).map(_ * 2).sum
    List(2,3,4) List(4,6,8)

  57. scala-ildl.org
    Deforestation
    List(1,2,3).map(_ + 1).map(_ * 2).sum
    List(2,3,4) List(4,6,8) 18

  58. scala-ildl.org
    Deforestation
    List(1,2,3).map(_ + 1).map(_ * 2).sum
    List(2,3,4) List(4,6,8) 18

  59. scala-ildl.org
    Deforestation
    List(1,2,3).map(_ + 1).map(_ * 2).sum
    List(2,3,4) List(4,6,8) 18
    adrt(ListDeforestation) {
      List(1,2,3).map(_ + 1).map(_ * 2).sum
    }

  60. scala-ildl.org
    Deforestation
    List(1,2,3).map(_ + 1).map(_ * 2).sum
    List(2,3,4) List(4,6,8) 18
    adrt(ListDeforestation) {
      List(1,2,3).map(_ + 1).map(_ * 2).sum
    }
    accumulate function

  61. scala-ildl.org
    Deforestation
    List(1,2,3).map(_ + 1).map(_ * 2).sum
    List(2,3,4) List(4,6,8) 18
    adrt(ListDeforestation) {
      List(1,2,3).map(_ + 1).map(_ * 2).sum
    }
    accumulate function
    accumulate function

  62. scala-ildl.org
    Deforestation
    List(1,2,3).map(_ + 1).map(_ * 2).sum
    List(2,3,4) List(4,6,8) 18
    adrt(ListDeforestation) {
      List(1,2,3).map(_ + 1).map(_ * 2).sum
    }
    accumulate function
    accumulate function
    compute: 18

  63. scala-ildl.org
    Deforestation
    List(1,2,3).map(_ + 1).map(_ * 2).sum
    List(2,3,4) List(4,6,8) 18
    adrt(ListDeforestation) {
      List(1,2,3).map(_ + 1).map(_ * 2).sum
    }
    accumulate function
    accumulate function
    compute: 18
    6x faster
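
The "accumulate function, accumulate function, compute" steps can be imitated with a small wrapper that composes the mapped functions and touches the list only once, when `sum` is called. This is a sketch under assumptions: `FusedList` and `fuse` are hypothetical names invented here, and this is not the `ListDeforestation` transformation from the talk.

```scala
object DeforestationSketch {
  // Remembers the source list and the composition of all functions mapped so far.
  final class FusedList[A, B](source: List[A], f: A => B) {
    // Accumulate: compose the new function, build no intermediate list.
    def map[C](g: B => C): FusedList[A, C] =
      new FusedList[A, C](source, f.andThen(g))

    // Compute: a single pass that applies the composed function while summing.
    def sum(implicit num: Numeric[B]): B =
      source.foldLeft(num.zero)((acc, a) => num.plus(acc, f(a)))

    // Materialize the mapped list if a caller really needs it.
    def force: List[B] = source.map(f)
  }

  // Entry point: wrap a list with the identity function.
  def fuse[A](source: List[A]): FusedList[A, A] =
    new FusedList[A, A](source, a => a)
}
```

With this wrapper, `DeforestationSketch.fuse(List(1,2,3)).map(_ + 1).map(_ * 2).sum` evaluates to 18 without allocating `List(2,3,4)` or `List(4,6,8)`.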

  64. scala-ildl.org
    Motivation
    Data-centric Optimizations
    Applications
    Conclusion

  65. scala-ildl.org
    Conclusion
    Problem: optimized representations
    Solution: data-centric meta-programming
    – Splitting the responsibility:
        – Defining the Transformation → programmer
        – Applying the Transformation → compiler
    – Scopes
        – Adapt the data representation to the operation
        – Allow speculating properties of the scope

  66. scala-ildl.org

  67. scala-ildl.org

  68. scala-ildl.org

  69. scala-ildl.org
    Thank you!

  70. scala-ildl.org
    Multi-Stage Programming
    – “Abstraction without regret” - Tiark Rompf
    – DSLs small enough to be staged
        – 10000x speed improvements

  71. scala-ildl.org
    Multi-Stage Programming
    – “Abstraction without regret” - Tiark Rompf
    – DSLs small enough to be staged
        – 10000x speed improvements
    – Scala too large to obtain any benefit
        – Separate compilation/modularization
        – Dynamic dispatch
        – Aliasing
        – Reflection

  72. scala-ildl.org
    Multi-Stage Programming
    – “Abstraction without regret” - Tiark Rompf
    – DSLs small enough to be staged
        – 10000x speed improvements
    – Scala too large to obtain any benefit
        – Separate compilation/modularization
        – Dynamic dispatch
        – Aliasing
        – Reflection
    These are not supported by staging. If we add support, we lose the ability to optimize.

  73. scala-ildl.org
    Low-level Optimizers
    JIT optimizers with virtual machine support
    – Access to the low-level code
    – Can assume a (local) closed world
    – Can speculate based on profiles
    – On the critical path
        – Limited profiles
        – Limited inlining
        – Limited analysis
    – Biggest opportunities are high-level - O(n²) → O(n)
        – Incoming code is low-level
        – Rarely possible to recover them

  74. scala-ildl.org
    Low-level Optimizers
    JIT optimizers with virtual machine support
    – Access to the low-level code
    – Can assume a (local) closed world
    – Can speculate based on profiles
    – On the critical path
        – Limited profiles
        – Limited inlining
        – Limited analysis
    – Biggest opportunities are high-level - O(n²) → O(n)
        – Incoming code is low-level
        – Rarely possible to recover them
    Typical solution: Metaprogramming

  75. scala-ildl.org
    Metaprogramming

    Not your grandpa's C preprocessor

  76. scala-ildl.org
    Metaprogramming

    Not your grandpa's C preprocessor
    def optimize(tree: Tree): Tree = {
    ...
    }

  77. scala-ildl.org
    Metaprogramming

    Not your grandpa's C preprocessor

    Full-fledged program transformers
    – :) Lots of power
    def optimize(tree: Tree): Tree = {
    ...
    }

  78. scala-ildl.org
    Metaprogramming

    Not your grandpa's C preprocessor

    Full-fledged program transformers
    – :) Lots of power
    – :( Lots of responsibility
    def optimize(tree: Tree): Tree = {
    ...
    }

  79. scala-ildl.org
    Metaprogramming
    Not your grandpa's C preprocessor
    Full-fledged program transformers
    – :) Lots of power
    – :( Lots of responsibility
        – Compiler invariants
        – Object-oriented model
        – Modularity
    def optimize(tree: Tree): Tree = {
      ...
    }

  80. scala-ildl.org
    Metaprogramming
    Not your grandpa's C preprocessor
    Full-fledged program transformers
    – :) Lots of power
    – :( Lots of responsibility
        – Compiler invariants
        – Object-oriented model
        – Modularity
    def optimize(tree: Tree): Tree = {
      ...
    }
    Can we make
    metaprogramming
    “high-level”?
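
To give a feel for what a tree-to-tree transformer with the `optimize(tree: Tree): Tree` shape involves, here is a toy version over a tiny arithmetic expression tree. Everything in it is an assumption made for illustration: the `Tree` type below is defined locally and is not the Scala compiler's `Tree`, and constant folding is used only because it is a small, familiar rewrite.

```scala
object TreeSketch {
  // A deliberately tiny expression language (hypothetical, for illustration only).
  sealed trait Tree
  final case class Num(value: Int) extends Tree
  final case class Var(name: String) extends Tree
  final case class Add(lhs: Tree, rhs: Tree) extends Tree
  final case class Mul(lhs: Tree, rhs: Tree) extends Tree

  // Bottom-up rewrite: optimize the children first, then simplify the node.
  def optimize(tree: Tree): Tree = tree match {
    case Add(l, r) =>
      (optimize(l), optimize(r)) match {
        case (Num(a), Num(b)) => Num(a + b) // fold constants
        case (Num(0), x)      => x          // 0 + x == x
        case (x, Num(0))      => x          // x + 0 == x
        case (x, y)           => Add(x, y)
      }
    case Mul(l, r) =>
      (optimize(l), optimize(r)) match {
        case (Num(a), Num(b)) => Num(a * b) // fold constants
        case (Num(1), x)      => x          // 1 * x == x
        case (x, Num(1))      => x          // x * 1 == x
        case (x, y)           => Mul(x, y)
      }
    case leaf => leaf // Num and Var are already in simplest form
  }
}
```

Even in this toy, the author of `optimize` is responsible for every case being semantics-preserving, which is the "lots of responsibility" side of the trade-off on these slides.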
