Data-centric Metaprogramming - Scala Days 2016

Presentation at Scala Days 2016

Vlad Ureche

June 16, 2016

Transcript

  1. Data-centric
    Metaprogramming
    Vlad Ureche

  2. Vlad Ureche
    @VladUreche

  3. Vlad Ureche
    Software Engineer at Cyberhaven.io
    scala-miniboxing.org
    Ex-Scala Team at EPFL

  4. STOP
    Please ask if things
    are not clear!

  5. Motivation
    Transformation
    Applications
    Challenges
    Conclusion
    Functions

  6. Object Composition

  7. Object Composition
    class Vector[T] { … }

  8. Object Composition
    class Vector[T] { … }
    The Vector collection
    in the Scala library

  9. Object Composition
    class Employee(...)
    ID NAME SALARY
    class Vector[T] { … }
    The Vector collection
    in the Scala library

  10. Object Composition
    class Employee(...)
    ID NAME SALARY
    class Vector[T] { … }
    The Vector collection
    in the Scala library
    Corresponds to
    a table row

  11. Object Composition
    class Employee(...)
    ID NAME SALARY
    class Vector[T] { … }

  12. Object Composition
    class Employee(...)
    ID NAME SALARY
    class Vector[T] { … }

  13. Object Composition
    class Employee(...)
    ID NAME SALARY
    Vector[Employee]
    ID NAME SALARY
    ID NAME SALARY
    class Vector[T] { … }

  14. Object Composition
    class Employee(...)
    ID NAME SALARY
    Vector[Employee]
    ID NAME SALARY
    ID NAME SALARY
    class Vector[T] { … }
    Traversal requires
    dereferencing a pointer
    for each employee.

  15. A Better Representation
    Vector[Employee]
    ID NAME SALARY
    ID NAME SALARY

  16. A Better Representation
    NAME ...
    NAME
    EmployeeVector
    ID ID ...
    ...
    SALARY SALARY
    Vector[Employee]
    ID NAME SALARY
    ID NAME SALARY

  17. A Better Representation

    more efficient heap usage

    faster iteration
    NAME ...
    NAME
    EmployeeVector
    ID ID ...
    ...
    SALARY SALARY
    Vector[Employee]
    ID NAME SALARY
    ID NAME SALARY
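
To make the two layouts concrete, here is a minimal Scala sketch. It assumes Employee has an Int id, a String name and a Float salary (the slides only name the columns). EmployeeVector keeps one array per field, so summing salaries walks a single primitive array instead of dereferencing one Employee object per element.

```scala
// Assumed field types; the slides only name the columns ID, NAME, SALARY.
final case class Employee(id: Int, name: String, salary: Float)

// Column-oriented layout: one array per field; element i is spread across the arrays.
final class EmployeeVector(val ids: Array[Int], val names: Array[String], val salaries: Array[Float]) {
  require(ids.length == names.length && names.length == salaries.length, "columns must have equal length")

  def length: Int = ids.length

  // Reassembles an Employee on demand (this allocates a new object).
  def apply(i: Int): Employee = Employee(ids(i), names(i), salaries(i))

  // Touches only the contiguous salary array: no per-employee pointer dereference.
  def totalSalary: Float = {
    var sum = 0f
    var i = 0
    while (i < salaries.length) {
      sum += salaries(i)
      i += 1
    }
    sum
  }

  def toVector: Vector[Employee] = Vector.tabulate(length)(i => apply(i))
}

object EmployeeVector {
  // Row-oriented Vector[Employee] (one heap object per employee) to columns.
  def fromVector(v: Vector[Employee]): EmployeeVector =
    new EmployeeVector(v.map(_.id).toArray, v.map(_.name).toArray, v.map(_.salary).toArray)
}
```

Note that apply still allocates; the layout pays off for operations that touch a few fields of many elements, such as totalSalary.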

  18. The Problem

    Vector[T] is unaware of Employee

  19. The Problem

    Vector[T] is unaware of Employee
    – Which makes Vector[Employee] suboptimal

  20. The Problem

    Vector[T] is unaware of Employee
    – Which makes Vector[Employee] suboptimal

    Not limited to Vector, other constructs also affected

  21. The Problem

    Vector[T] is unaware of Employee
    – Which makes Vector[Employee] suboptimal

    Not limited to Vector, other constructs also affected
    – Generics (including all collections)

  22. The Problem

    Vector[T] is unaware of Employee
    – Which makes Vector[Employee] suboptimal

    Not limited to Vector, other constructs also affected
    – Generics (including all collections) and Functions

  23. The Problem

    Vector[T] is unaware of Employee
    – Which makes Vector[Employee] suboptimal

    Not limited to Vector, other constructs also affected
    – Generics (including all collections) and Functions

    We know better representations

  24. The Problem

    Vector[T] is unaware of Employee
    – Which makes Vector[Employee] suboptimal

    Not limited to Vector, other constructs also affected
    – Generics (including all collections) and Functions

    We know better representations
    – Manual changes don't scale

  25. The Problem

    Vector[T] is unaware of Employee
    – Which makes Vector[Employee] suboptimal

    Not limited to Vector, other constructs also affected
    – Generics (including all collections) and Functions

    We know better representations
    – Manual changes don't scale
    – The compiler should do that

  26. Current Optimizers

  27. Current Optimizers
    What about the
    Scala.js optimizer?

  28. Current Optimizers
    What about the
    Scala.js optimizer?
    What about the
    Dotty Linker?

  29. Current Optimizers
    What about the
    Scala.js optimizer?
    What about the
    Dotty Linker?
    Scala Native?

  30. Current Optimizers

    They do a great job
    What about the
    Scala.js optimizer?
    What about the
    Dotty Linker?
    Scala Native?

  31. Current Optimizers

    They do a great job
    – But have to respect semantics
    What about the
    Scala.js optimizer?
    What about the
    Dotty Linker?
    Scala Native?

  32. Current Optimizers

    They do a great job
    – But have to respect semantics
    – Support every corner case
    What about the
    Scala.js optimizer?
    What about the
    Dotty Linker?
    Scala Native?

  33. Current Optimizers

    They do a great job
    – But have to respect semantics
    – Support every corner case
    – Have to be conservative :(
    What about the
    Scala.js optimizer?
    What about the
    Dotty Linker?
    Scala Native?

  34. Current Optimizers

    They do a great job
    – But have to respect semantics
    – Support every corner case
    – Have to be conservative :(

    Programmers have control
    What about the
    Scala.js optimizer?
    What about the
    Dotty Linker?
    Scala Native?

  35. Current Optimizers

    They do a great job
    – But have to respect semantics
    – Support every corner case
    – Have to be conservative :(

    Programmers have control
    – What/When/How is accessed
    What about the
    Scala.js optimizer?
    What about the
    Dotty Linker?
    Scala Native?

  36. Current Optimizers

    They do a great job
    – But have to respect semantics
    – Support every corner case
    – Have to be conservative :(

    Programmers have control
    – What/When/How is accessed
    – Can break semantics (speculate)
    What about the
    Scala.js optimizer?
    What about the
    Dotty Linker?
    Scala Native?

  37. Current Optimizers

    They do a great job
    – But have to respect semantics
    – Support every corner case
    – Have to be conservative :(

    Programmers have control
    – What/When/How is accessed
    – Can break semantics (speculate)
    What about the
    Scala.js optimizer?
    What about the
    Dotty Linker?
    Scala Native?
    Challenge: No means of
    telling the compiler
    what/when to speculate

  38. Choice: Safe or Fast

  39. Choice: Safe or Fast
    This is where my
    work comes in...

  40. Data-Centric Metaprogramming

    A compiler plug-in that allows
    tuning the data representation

    Website: scala-ildl.org

  41. Motivation
    Transformation
    Applications
    Challenges
    Conclusion
    Functions

  42. Transformation
    Definition Application

  43. Transformation
    Definition Application

    can't be automated

    based on experience

    based on speculation

    one-time effort

  44. Transformation
    programmer
    Definition Application

    can't be automated

    based on experience

    based on speculation

    one-time effort

  45. Transformation
    programmer
    Definition Application

    can't be automated

    based on experience

    based on speculation

    one-time effort

    repetitive and complex

    affects code
    readability

    is verbose

    is error-prone

  46. Transformation
    programmer
    Definition Application

    can't be automated

    based on experience

    based on speculation

    one-time effort

    repetitive and complex

    affects code
    readability

    is verbose

    is error-prone
    compiler (automated)

  47. Transformation
    programmer
    Definition Application

    can't be automated

    based on experience

    based on speculation

    one-time effort

    repetitive and complex

    affects code
    readability

    is verbose

    is error-prone
    compiler (automated)

  48. Data-Centric Metaprogramming
    object VectorOfEmployeeOpt extends Transformation {
      type Target = Vector[Employee]
      type Result = EmployeeVector
      def toResult(t: Target): Result = ...
      def toTarget(t: Result): Target = ...
      def bypass_length: Int = ...
      def bypass_apply(i: Int): Employee = ...
      def bypass_update(i: Int, v: Employee) = ...
      def bypass_toString: String = ...
      ...
    }
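
The plugin's real Transformation trait is not reproduced here. As an illustration only, the sketch below uses a hypothetical TransformationSketch trait with the same Target/Result/toResult/toTarget vocabulary, and passes the receiver explicitly to each bypass method (an assumption of this sketch; the slide's bypass methods take no receiver). Because scala.collection.immutable.Vector has no in-place update, the sketch offers bypass_updated, mirroring Vector.updated, instead of bypass_update.

```scala
// Hypothetical stand-in for the plugin's Transformation trait (NOT the real ildl API).
trait TransformationSketch {
  type Target
  type Result
  def toResult(t: Target): Result
  def toTarget(r: Result): Target
}

final case class Employee(id: Int, name: String, salary: Float)
final class EmployeeVector(val ids: Array[Int], val names: Array[String], val salaries: Array[Float])

object VectorOfEmployeeOpt extends TransformationSketch {
  type Target = Vector[Employee]
  type Result = EmployeeVector

  def toResult(t: Target): Result =
    new EmployeeVector(t.map(_.id).toArray, t.map(_.name).toArray, t.map(_.salary).toArray)

  def toTarget(r: Result): Target =
    Vector.tabulate(r.ids.length)(i => bypass_apply(r, i))

  // Bypass methods run Vector operations directly on the columnar representation.
  // Passing the receiver `r` explicitly is an assumption of this sketch.
  def bypass_length(r: Result): Int = r.ids.length

  def bypass_apply(r: Result, i: Int): Employee =
    Employee(r.ids(i), r.names(i), r.salaries(i))

  // Mirrors Vector.updated: copies the columns and leaves the input untouched.
  def bypass_updated(r: Result, i: Int, v: Employee): Result = {
    val ids = r.ids.clone()
    val names = r.names.clone()
    val salaries = r.salaries.clone()
    ids(i) = v.id
    names(i) = v.name
    salaries(i) = v.salary
    new EmployeeVector(ids, names, salaries)
  }

  // Produces the same text as Vector#toString without building the Vector first.
  def bypass_toString(r: Result): String =
    (0 until r.ids.length).map(i => bypass_apply(r, i)).mkString("Vector(", ", ", ")")
}
```

The round trip toTarget(toResult(v)) == v is the property the conversions must satisfy for the transformation to preserve the program's observable behavior.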

  49. Data-Centric Metaprogramming
    object VectorOfEmployeeOpt extends Transformation {
      type Target = Vector[Employee]
      type Result = EmployeeVector
      def toResult(t: Target): Result = ...
      def toTarget(t: Result): Target = ...
      def bypass_length: Int = ...
      def bypass_apply(i: Int): Employee = ...
      def bypass_update(i: Int, v: Employee) = ...
      def bypass_toString: String = ...
      ...
    }
    What to transform?
    What to transform to?

  50. Data-Centric Metaprogramming
    object VectorOfEmployeeOpt extends Transformation {
      type Target = Vector[Employee]
      type Result = EmployeeVector
      def toResult(t: Target): Result = ...
      def toTarget(t: Result): Target = ...
      def bypass_length: Int = ...
      def bypass_apply(i: Int): Employee = ...
      def bypass_update(i: Int, v: Employee) = ...
      def bypass_toString: String = ...
      ...
    }
    How to transform?

  51. Data-Centric Metaprogramming
    object VectorOfEmployeeOpt extends Transformation {
      type Target = Vector[Employee]
      type Result = EmployeeVector
      def toResult(t: Target): Result = ...
      def toTarget(t: Result): Target = ...
      def bypass_length: Int = ...
      def bypass_apply(i: Int): Employee = ...
      def bypass_update(i: Int, v: Employee) = ...
      def bypass_toString: String = ...
      ...
    }
    How to run methods on the updated representation?

  52. Transformation
    programmer
    Definition Application

    can't be automated

    based on experience

    based on speculation

    one-time effort

    repetitive and complex

    affects code
    readability

    is verbose

    is error-prone
    compiler (automated)

  53. Transformation
    programmer
    Definition Application

    can't be automated

    based on experience

    based on speculation

    one-time effort

    repetitive and complex

    affects code
    readability

    is verbose

    is error-prone
    compiler (automated)

  54. http://infoscience.epfl.ch/record/207050?ln=en

  55. Motivation
    Transformation
    Applications
    Challenges
    Conclusion
    Functions

  56. Motivation
    Transformation
    Applications
    Challenges
    Conclusion
    Functions
    Open World
    Best Representation?
    Composition

  57. Scenario
    class Employee(...)
    ID NAME SALARY
    class Vector[T] { … }

  58. Scenario
    class Employee(...)
    ID NAME SALARY
    Vector[Employee]
    ID NAME SALARY
    ID NAME SALARY
    class Vector[T] { … }

  59. Scenario
    class Employee(...)
    ID NAME SALARY
    Vector[Employee]
    ID NAME SALARY
    ID NAME SALARY
    class Vector[T] { … }
    NAME ...
    NAME
    EmployeeVector
    ID ID ...
    ...
    SALARY SALARY

  60. Scenario
    class Employee(...)
    ID NAME SALARY
    Vector[Employee]
    ID NAME SALARY
    ID NAME SALARY
    class Vector[T] { … }
    NAME ...
    NAME
    EmployeeVector
    ID ID ...
    ...
    SALARY SALARY
    class NewEmployee(...)
    extends Employee(...)
    ID NAME SALARY DEPT

  61. Scenario
    class Employee(...)
    ID NAME SALARY
    Vector[Employee]
    ID NAME SALARY
    ID NAME SALARY
    class Vector[T] { … }
    NAME ...
    NAME
    EmployeeVector
    ID ID ...
    ...
    SALARY SALARY
    class NewEmployee(...)
    extends Employee(...)
    ID NAME SALARY DEPT

  62. Scenario
    class Employee(...)
    ID NAME SALARY
    Vector[Employee]
    ID NAME SALARY
    ID NAME SALARY
    class Vector[T] { … }
    NAME ...
    NAME
    EmployeeVector
    ID ID ...
    ...
    SALARY SALARY
    class NewEmployee(...)
    extends Employee(...)
    ID NAME SALARY DEPT
    Oooops...

  63. Open World Assumption

    Globally anything can happen

  64. Open World Assumption

    Globally anything can happen

    Locally you have full control:
    – Make class Employee final or
    – Limit the transformation to code that uses Employee

  65. Open World Assumption

    Globally anything can happen

    Locally you have full control:
    – Make class Employee final or
    – Limit the transformation to code that uses Employee
    How?

  66. Open World Assumption

    Globally anything can happen

    Locally you have full control:
    – Make class Employee final or
    – Limit the transformation to code that uses Employee
    How?
    Using
    Scopes!

  67. Scopes
    transform(VectorOfEmployeeOpt) {
      def indexSalary(employees: Vector[Employee],
                      by: Float): Vector[Employee] =
        for (employee <- employees)
          yield employee.copy(
            salary = (1 + by) * employee.salary
          )

  68. Scopes
    transform(VectorOfEmployeeOpt) {
      def indexSalary(employees: Vector[Employee],
                      by: Float): Vector[Employee] =
        for (employee <- employees)
          yield employee.copy(
            salary = (1 + by) * employee.salary
          )
    }

  69. Scopes
    transform(VectorOfEmployeeOpt) {
      def indexSalary(employees: Vector[Employee],
                      by: Float): Vector[Employee] =
        for (employee <- employees)
          yield employee.copy(
            salary = (1 + by) * employee.salary
          )
    }
    Now the method operates
    on the EmployeeVector
    representation.
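
A hand-written approximation of what the transformed method computes, assuming Int/String/Float field types; it is not the plugin's actual output. With the columnar layout, raising salaries only rebuilds the salary array, while the id and name arrays are shared (safe only as long as EmployeeVector instances are treated as immutable).

```scala
final case class Employee(id: Int, name: String, salary: Float)
final class EmployeeVector(val ids: Array[Int], val names: Array[String], val salaries: Array[Float])

object EmployeeVector {
  def fromVector(v: Vector[Employee]): EmployeeVector =
    new EmployeeVector(v.map(_.id).toArray, v.map(_.name).toArray, v.map(_.salary).toArray)

  def toVector(r: EmployeeVector): Vector[Employee] =
    Vector.tabulate(r.ids.length)(i => Employee(r.ids(i), r.names(i), r.salaries(i)))
}

object IndexSalary {
  // Original code: allocates one new Employee per element.
  def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] =
    for (employee <- employees)
      yield employee.copy(salary = (1 + by) * employee.salary)

  // Hand-written approximation of the transformed body: only the salary column
  // is recomputed; ids and names are shared, and no Employee objects are created.
  def indexSalaryOpt(employees: EmployeeVector, by: Float): EmployeeVector =
    new EmployeeVector(employees.ids, employees.names, employees.salaries.map(s => (1 + by) * s))
}
```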

  70. Scopes

    Can wrap statements, methods, even entire classes
    – Inlined immediately after the parser
    – Definitions are visible outside the "scope"

  71. Scopes

    Can wrap statements, methods, even entire classes
    – Inlined immediately after the parser
    – Definitions are visible outside the "scope"
    No, it's not a macro. It's a
    marker for the compiler plugin.
    (You can't do this with macros)

  72. Scopes

    Can wrap statements, methods, even entire classes
    – Inlined immediately after the parser
    – Definitions are visible outside the "scope"

  73. Scopes

    Can wrap statements, methods, even entire classes
    – Inlined immediately after the parser
    – Definitions are visible outside the "scope"

    Mark locally closed parts of the code
    – Incoming/outgoing values go through conversions
    – You can reject unexpected values
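
A minimal sketch of a conversion guarding a scope's entry point, assuming Employee is left non-final so that the NewEmployee subclass from the scenario can exist. The columnar layout has no DEPT column, so converting a NewEmployee would silently drop data; this hypothetical toResult therefore rejects any element whose runtime class is not exactly Employee.

```scala
// Employee is deliberately left non-final: that is what makes the
// open-world scenario possible.
class Employee(val id: Int, val name: String, val salary: Float)
class NewEmployee(id: Int, name: String, salary: Float, val dept: String)
  extends Employee(id, name, salary)

final class EmployeeVector(val ids: Array[Int], val names: Array[String], val salaries: Array[Float])

object ScopeBoundary {
  // Hypothetical conversion applied to values entering a transformed scope.
  // The columnar layout has no DEPT column, so a NewEmployee would lose data:
  // reject anything whose runtime class is not exactly Employee.
  def toResult(v: Vector[Employee]): EmployeeVector = {
    v.foreach { e =>
      if (e.getClass != classOf[Employee])
        throw new IllegalArgumentException(s"unexpected subclass: ${e.getClass.getName}")
    }
    new EmployeeVector(v.map(_.id).toArray, v.map(_.name).toArray, v.map(_.salary).toArray)
  }
}
```

Failing fast at the boundary keeps the speculation explicit: inside the scope the code can assume every element fits the columnar layout.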

  74. Motivation
    Transformation
    Applications
    Challenges
    Conclusion
    Functions
    Open World
    Best Representation?
    Composition

  75. Best Representation?
    Vector[Employee]
    ID NAME SALARY
    ID NAME SALARY

  76. Best Representation?
    It depends.
    Vector[Employee]
    ID NAME SALARY
    ID NAME SALARY

  77. Best ...?
    NAME ...
    NAME
    EmployeeVector
    ID ID ...
    ...
    SALARY SALARY
    It depends.
    Vector[Employee]
    ID NAME SALARY
    ID NAME SALARY

  78. Best ...?
    Compact binary repr.

    NAME ...
    NAME
    EmployeeVector
    ID ID ...
    ...
    SALARY SALARY
    It depends.
    Vector[Employee]
    ID NAME SALARY
    ID NAME SALARY

  79. Best ...?
    EmployeeJSON
    {
      "id": 123,
      "name": "John Doe",
      "salary": 100
    }
    Compact binary repr.

    NAME ...
    NAME
    EmployeeVector
    ID ID ...
    ...
    SALARY SALARY
    It depends.
    Vector[Employee]
    ID NAME SALARY
    ID NAME SALARY

  80. Scopes allow mixing data representations
    transform(VectorOfEmployeeOpt) {
      def indexSalary(employees: Vector[Employee],
                      by: Float): Vector[Employee] =
        for (employee <- employees)
          yield employee.copy(
            salary = (1 + by) * employee.salary
          )
    }

  81. Scopes
    transform(VectorOfEmployeeOpt) {
      def indexSalary(employees: Vector[Employee],
                      by: Float): Vector[Employee] =
        for (employee <- employees)
          yield employee.copy(
            salary = (1 + by) * employee.salary
          )
    }
    Operating on the
    EmployeeVector
    representation.

  82. Scopes
    transform(VectorOfEmployeeCompact) {
      def indexSalary(employees: Vector[Employee],
                      by: Float): Vector[Employee] =
        for (employee <- employees)
          yield employee.copy(
            salary = (1 + by) * employee.salary
          )
    }
    Operating on the
    compact binary
    representation.

  83. Scopes
    transform(VectorOfEmployeeJSON) {
      def indexSalary(employees: Vector[Employee],
                      by: Float): Vector[Employee] =
        for (employee <- employees)
          yield employee.copy(
            salary = (1 + by) * employee.salary
          )
    }
    Operating on the
    JSON-based
    representation.

  84. Motivation
    Transformation
    Applications
    Challenges
    Conclusion
    Functions
    Open World
    Best Representation?
    Composition

  85. Composition
    def index1Percent(employees: Vector[Employee]) =
      indexSalary(employees, 0.01f)

  86. Composition
    def index1Percent(employees: Vector[Employee]) =
      indexSalary(employees, 0.01f)
    transform(VectorOfEmployeeJSON) {
      def indexSalary(employees: Vector[Employee],
                      by: Float): Vector[Employee] =
        ...
    }

  87. Composition
    def index1Percent(employees: Vector[Employee]) =
      indexSalary(employees, 0.01f)
    transform(VectorOfEmployeeJSON) {
      def indexSalary(employees: Vector[Employee],
                      by: Float): Vector[Employee] =
        ...
    }

  88. Composition
    def index1Percent(employees: Vector[Employee]) =
      indexSalary(employees, 0.01f)
    transform(VectorOfEmployeeJSON) {
      def indexSalary(employees: Vector[Employee],
                      by: Float): Vector[Employee] =
        ...
    }

    Original code (using the default representation)

  89. Composition
    def index1Percent(employees: Vector[Employee]) =
      indexSalary(employees, 0.01f)
    transform(VectorOfEmployeeJSON) {
      def indexSalary(employees: Vector[Employee],
                      by: Float): Vector[Employee] =
        ...
    }

    Original code (using the default representation)

    Transformed code (using a different representation)

  90. Composition
    def index1Percent(employees: Vector[Employee]) =
      indexSalary(employees, 0.01f)
    transform(VectorOfEmployeeJSON) {
      def indexSalary(employees: Vector[Employee],
                      by: Float): Vector[Employee] =
        ...
    }

    Original code (using the default representation)

    Transformed code (using a different representation)

    Calls between them

  91. Composition
    def index1Percent(employees: Vector[Employee]) =
      indexSalary(employees, 0.01f)
    transform(VectorOfEmployeeJSON) {
      def indexSalary(employees: Vector[Employee],
                      by: Float): Vector[Employee] =
        ...
    }

    Original code (using the default representation)

    Transformed code (using a different representation)

    Calls between them
    ???

  92. Composition
    calling

    Original code

    Transformed code

    Original code

    Transformed code

    Same transformation

    Different transformation

  93. Composition
    calling

    Original code

    Transformed code

    Original code

    Transformed code

    Same transformation

    Different transformation

  94. Composition
    calling

    Original code

    Transformed code

    Original code

    Transformed code

    Same transformation

    Different transformation
    Easy one. Do nothing

  95. Composition
    calling

    Original code

    Transformed code

    Original code

    Transformed code

    Same transformation

    Different transformation

  96. Composition
    calling

    Original code

    Transformed code

    Original code

    Transformed code

    Same transformation

    Different transformation

  97. Composition
    calling

    Original code

    Transformed code

    Original code

    Transformed code

    Same transformation

    Different transformation

  98. Composition
    calling

    Original code

    Transformed code

    Original code

    Transformed code

    Same transformation

    Different transformation
    Automatically introduce conversions
    between values in the two representations
    e.g. EmployeeVector → Vector[Employee] or back
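
To illustrate the "original code calls transformed code" case, the sketch below writes out by hand the conversions the plugin would introduce. indexSalaryT is a hypothetical name for indexSalary after its signature has been rewritten to use EmployeeVector; the untransformed index1Percent converts its argument on the way in and the result on the way out.

```scala
final case class Employee(id: Int, name: String, salary: Float)
final class EmployeeVector(val ids: Array[Int], val names: Array[String], val salaries: Array[Float])

object Conversions {
  def toResult(v: Vector[Employee]): EmployeeVector =
    new EmployeeVector(v.map(_.id).toArray, v.map(_.name).toArray, v.map(_.salary).toArray)

  def toTarget(r: EmployeeVector): Vector[Employee] =
    Vector.tabulate(r.ids.length)(i => Employee(r.ids(i), r.names(i), r.salaries(i)))
}

// Transformed code: indexSalary after its signature has been rewritten
// (indexSalaryT is a hypothetical name for that rewritten method).
object Transformed {
  def indexSalaryT(employees: EmployeeVector, by: Float): EmployeeVector =
    new EmployeeVector(employees.ids, employees.names, employees.salaries.map(s => (1 + by) * s))
}

// Original code: the two conversions below stand in for the ones the plugin
// would insert automatically at the call site.
object Original {
  def index1Percent(employees: Vector[Employee]): Vector[Employee] =
    Conversions.toTarget(Transformed.indexSalaryT(Conversions.toResult(employees), 0.01f))
}
```

Each conversion is a full pass over the data, which is why calls across representations are best kept out of hot loops.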

  99. Composition
    calling

    Original code

    Transformed code

    Original code

    Transformed code

    Same transformation

    Different transformation

  100. Composition
    calling

    Original code

    Transformed code

    Original code

    Transformed code

    Same transformation

    Different transformation

  101. Composition
    calling

    Original code

    Transformed code

    Original code

    Transformed code

    Same transformation

    Different transformation

  102. Composition
    calling

    Original code

    Transformed code

    Original code

    Transformed code

    Same transformation

    Different transformation
    Hard one. Do not introduce any conversions,
    even across separate compilation.

  103. Composition
    calling

    Original code

    Transformed code

    Original code

    Transformed code

    Same transformation

    Different transformation

  104. Composition
    calling

    Original code

    Transformed code

    Original code

    Transformed code

    Same transformation

    Different transformation
    Hard one. Automatically introduce double
    conversions (and warn the programmer)
    e.g. EmployeeVector → Vector[Employee] → CompactEmpVector
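
A sketch of the double conversion, assuming a hypothetical CompactEmpVector that serializes all records into one byte array with java.io.DataOutputStream. With no direct EmployeeVector to CompactEmpVector conversion available, the value is routed through the default Vector[Employee]: two full passes plus an intermediate collection, which is why a warning is warranted.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

final case class Employee(id: Int, name: String, salary: Float)
final class EmployeeVector(val ids: Array[Int], val names: Array[String], val salaries: Array[Float])
// Hypothetical compact binary representation: all records serialized into one byte array.
final class CompactEmpVector(val count: Int, val bytes: Array[Byte])

// Conversion of the first transformation (columnar -> default).
object ColumnarConv {
  def toTarget(r: EmployeeVector): Vector[Employee] =
    Vector.tabulate(r.ids.length)(i => Employee(r.ids(i), r.names(i), r.salaries(i)))
}

// Conversions of the second transformation (compact binary <-> default).
object CompactConv {
  def toResult(v: Vector[Employee]): CompactEmpVector = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    v.foreach { e => out.writeInt(e.id); out.writeUTF(e.name); out.writeFloat(e.salary) }
    out.flush()
    new CompactEmpVector(v.length, bytes.toByteArray)
  }

  def toTarget(c: CompactEmpVector): Vector[Employee] = {
    val in = new DataInputStream(new ByteArrayInputStream(c.bytes))
    // Fields are read back in the order they were written.
    Vector.fill(c.count)(Employee(in.readInt(), in.readUTF(), in.readFloat()))
  }
}

object Bridge {
  // No direct EmployeeVector -> CompactEmpVector conversion exists, so the value goes
  // through the default representation: two conversions and an intermediate Vector.
  def columnarToCompact(r: EmployeeVector): CompactEmpVector =
    CompactConv.toResult(ColumnarConv.toTarget(r))
}
```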

  105. Composition
    calling

    Original code

    Transformed code

    Original code

    Transformed code

    Same transformation

    Different transformation

  106. Composition
    calling
    overriding

    Original code

    Transformed code

    Original code

    Transformed code

    Same transformation

    Different transformation

  107. Scopes
    trait Printer[T] {
      def print(elements: Vector[T]): Unit
    }

  108. Scopes
    trait Printer[T] {
      def print(elements: Vector[T]): Unit
    }
    class EmployeePrinter extends Printer[Employee] {
      def print(elements: Vector[Employee]) = ...
    }

  109. Scopes
    trait Printer[T] {
      def print(elements: Vector[T]): Unit
    }
    class EmployeePrinter extends Printer[Employee] {
      def print(elements: Vector[Employee]) = ...
    }
    Method print in the class
    implements
    method print in the trait

  110. Scopes
    trait Printer[T] {
      def print(elements: Vector[T]): Unit
    }
    class EmployeePrinter extends Printer[Employee] {
      def print(elements: Vector[Employee]) = ...
    }

  111. Scopes
    trait Printer[T] {
      def print(elements: Vector[T]): Unit
    }
    transform(VectorOfEmployeeOpt) {
      class EmployeePrinter extends Printer[Employee] {
        def print(elements: Vector[Employee]) = ...
      }
    }

  112. Scopes
    trait Printer[T] {
      def print(elements: Vector[T]): Unit
    }
    transform(VectorOfEmployeeOpt) {
      class EmployeePrinter extends Printer[Employee] {
        def print(elements: Vector[Employee]) = ...
      }
    }
    The signature of method print changes
    according to the transformation
    → it no longer implements the trait

  113. Scopes
    trait Printer[T] {
      def print(elements: Vector[T]): Unit
    }
    transform(VectorOfEmployeeOpt) {
      class EmployeePrinter extends Printer[Employee] {
        def print(elements: Vector[Employee]) = ...
      }
    }
    The signature of method print changes
    according to the transformation
    → it no longer implements the trait
    Taken care of by the
    compiler for you!
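
One way to picture how the compiler can take care of this is a bridge method. The sketch writes it by hand: the transformed body (hypothetically named printTransformed) takes EmployeeVector, and a method with the original signature converts its argument and delegates, so EmployeePrinter still implements Printer[Employee]. This illustrates the idea and is not the plugin's generated code; the StringBuilder parameter exists only to make the output checkable.

```scala
final case class Employee(id: Int, name: String, salary: Float)
final class EmployeeVector(val ids: Array[Int], val names: Array[String], val salaries: Array[Float])

trait Printer[T] {
  def print(elements: Vector[T]): Unit
}

// Hand-written picture of the transformed class (not the plugin's generated code).
class EmployeePrinter(out: StringBuilder) extends Printer[Employee] {
  // Bridge with the original signature: it still implements Printer[Employee],
  // converting its argument and delegating to the transformed method.
  def print(elements: Vector[Employee]): Unit =
    printTransformed(
      new EmployeeVector(elements.map(_.id).toArray, elements.map(_.name).toArray, elements.map(_.salary).toArray))

  // Transformed method (hypothetical name): works on the columnar representation.
  def printTransformed(elements: EmployeeVector): Unit = {
    var i = 0
    while (i < elements.ids.length) {
      out.append(s"${elements.ids(i)} ${elements.names(i)} ${elements.salaries(i)}\n")
      i += 1
    }
  }
}
```

Callers that already hold an EmployeeVector could call printTransformed directly and skip the conversion; callers going through Printer[Employee] still work.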

  114. Motivation
    Transformation
    Applications
    Challenges
    Conclusion
    Functions
    Open World
    Best Representation?
    Composition

  115. Column-oriented Storage
    NAME ...
    NAME
    EmployeeVector
    ID ID ...
    ...
    SALARY SALARY
    Vector[Employee]
    ID NAME SALARY
    ID NAME SALARY

  116. Column-oriented Storage
    NAME ...
    NAME
    EmployeeVector
    ID ID ...
    ...
    SALARY SALARY
    Vector[Employee]
    ID NAME SALARY
    ID NAME SALARY
    iteration is 5x faster

  117. Retrofitting value class status
    (3,5)
    3 5
    Header
    reference

  118. Retrofitting value class status
    Tuples in Scala are specialized but
    are still objects (not value classes)
    = not as optimized as they could be
    (3,5)
    3 5
    Header
    reference

  119. Retrofitting value class status
    (3L << 32) | 5L
    (3,5)
    Tuples in Scala are specialized but
    are still objects (not value classes)
    = not as optimized as they could be
    (3,5)
    3 5
    Header
    reference

  120. Retrofitting value class status
    (3L << 32) | 5L
    (3,5)
    Tuples in Scala are specialized but
    are still objects (not value classes)
    = not as optimized as they could be
    (3,5)
    3 5
    Header
    reference
    14x faster, lower
    heap requirements
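
A minimal sketch of packing an (Int, Int) pair into a single Long, as in the (3,5) example; the object name IntPairAsLong is hypothetical. Parentheses matter here: in Scala `+` binds tighter than `<<`, and masking the second component keeps a negative value from sign-extending into the upper half.

```scala
// Hypothetical encoding of an (Int, Int) pair as a single Long, so no Tuple2 object is allocated.
object IntPairAsLong {
  // First component in the upper 32 bits, second in the lower 32 bits.
  // Masking `b` stops a negative second component from sign-extending over the upper half.
  def pack(a: Int, b: Int): Long = (a.toLong << 32) | (b.toLong & 0xFFFFFFFFL)

  def first(p: Long): Int = (p >>> 32).toInt
  def second(p: Long): Int = p.toInt

  // Allocates a tuple again; transformed code would call first/second on the Long instead.
  def unpack(p: Long): (Int, Int) = (first(p), second(p))
}
```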

  121. Deforestation
    List(1,2,3).map(_ + 1).map(_ * 2).sum

  122. Deforestation
    List(1,2,3).map(_ + 1).map(_ * 2).sum
    List(2,3,4)

  123. Deforestation
    List(1,2,3).map(_ + 1).map(_ * 2).sum
    List(2,3,4) List(4,6,8)

  124. Deforestation
    List(1,2,3).map(_ + 1).map(_ * 2).sum
    List(2,3,4) List(4,6,8) 18

  125. Deforestation
    List(1,2,3).map(_ + 1).map(_ * 2).sum
    List(2,3,4) List(4,6,8) 18

  126. Deforestation
    List(1,2,3).map(_ + 1).map(_ * 2).sum
    List(2,3,4) List(4,6,8) 18
    transform(ListDeforestation) {
    List(1,2,3).map(_ + 1).map(_ * 2).sum
    }

  127. Deforestation
    List(1,2,3).map(_ + 1).map(_ * 2).sum
    List(2,3,4) List(4,6,8) 18
    transform(ListDeforestation) {
    List(1,2,3).map(_ + 1).map(_ * 2).sum
    }
    accumulate
    function

  128. Deforestation
    List(1,2,3).map(_ + 1).map(_ * 2).sum
    List(2,3,4) List(4,6,8) 18
    transform(ListDeforestation) {
    List(1,2,3).map(_ + 1).map(_ * 2).sum
    }
    accumulate
    function
    accumulate
    function

  129. Deforestation
    List(1,2,3).map(_ + 1).map(_ * 2).sum
    List(2,3,4) List(4,6,8) 18
    transform(ListDeforestation) {
    List(1,2,3).map(_ + 1).map(_ * 2).sum
    }
    accumulate
    function
    accumulate
    function
    compute:
    18

  130. Deforestation
    List(1,2,3).map(_ + 1).map(_ * 2).sum
    List(2,3,4) List(4,6,8) 18
    transform(ListDeforestation) {
    List(1,2,3).map(_ + 1).map(_ * 2).sum
    }
    accumulate
    function
    accumulate
    function
    compute:
    18
    6x faster
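
The "accumulate function" idea can be sketched as a small wrapper (MappedList is a hypothetical name) that records the source list plus the composition of all mapped functions, and traverses the list only once, when sum is requested. This is one way to realize the fusion shown on the slide, not the ListDeforestation transformation itself.

```scala
// Hypothetical wrapper: the source list plus the composition of all mapped functions.
final class MappedList[A, B](source: List[A], f: A => B) {
  // map does not traverse anything; it only composes functions.
  def map[C](g: B => C): MappedList[A, C] = new MappedList[A, C](source, f.andThen(g))

  // A single traversal computes the result; no intermediate lists are built.
  def sum(implicit num: Numeric[B]): B =
    source.foldLeft(num.zero)((acc, a) => num.plus(acc, f(a)))

  def toList: List[B] = source.map(f)
}

object MappedList {
  def apply[A](source: List[A]): MappedList[A, A] = new MappedList[A, A](source, a => a)
}
```

For example, MappedList(List(1, 2, 3)).map(_ + 1).map(_ * 2).sum yields 18 without materializing List(2,3,4) or List(4,6,8).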

  131. Motivation
    Transformation
    Applications
    Challenges
    Conclusion
    Functions
    Open World
    Best Representation?
    Composition

  132. Research ahead*
    !
    * This may not make it into a product.
    But you can play with it nevertheless.

  133. Spark
    RDD
    (Resilient Distributed Dataset)

  134. Spark
    RDD
    (Resilient Distributed Dataset)
    Key abstraction
    in Spark

  135. Spark
    RDD
    (Resilient Distributed Dataset)

  136. Spark
    RDD
    (Resilient Distributed Dataset)

  137. Spark
    RDD
    (Resilient Distributed Dataset)
    Primary Data
    (e.g. CSV file)

  138. Spark
    RDD
    (Resilient Distributed Dataset)
    Primary Data
    (e.g. CSV file)
    Primary Data
    (e.g. CSV file)
    Derived Data
    (e.g. primary.map(f))
    Primary Data
    (e.g. CSV file)

  139. Spark
    RDD
    (Resilient Distributed Dataset)
    Primary Data
    (e.g. CSV file)
    Primary Data
    (e.g. CSV file)
    Derived Data
    (e.g. primary.map(f))
    Primary Data
    (e.g. CSV file)
    How does
    mapping work?

  140. Mapping an RDD
    X Y
    user
    function
    f

  141. Mapping an RDD
    serialized
    data
    encoded
    data
    X Y
    user
    function
    f
    decode

  142. Mapping an RDD
    serialized
    data
    encoded
    data
    X Y
    encoded
    data
    user
    function
    f
    decode encode

  143. Mapping an RDD
    serialized
    data
    encoded
    data
    X Y
    encoded
    data
    user
    function
    f
    decode encode
    Allocate object Allocate object
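
A sketch of the unoptimized path, using a hypothetical fixed-width encoding in which each record is an 8-byte (id, salary) pair written with java.nio.ByteBuffer. This is not Spark's serialization format; it only shows where the two per-record allocations come from.

```scala
import java.nio.ByteBuffer

// Hypothetical record and encoding: 8 bytes per record (id, then salary), big-endian.
// This is not Spark's serialization format.
final case class Record(id: Int, salary: Int)

object NaiveMap {
  val RecordSize = 8

  def encode(rs: Seq[Record]): Array[Byte] = {
    val buf = ByteBuffer.allocate(rs.length * RecordSize)
    rs.foreach { r => buf.putInt(r.id).putInt(r.salary) }
    buf.array()
  }

  // Per record: decode allocates the input object, the user function allocates
  // the output object, and the output is encoded again.
  def map(encoded: Array[Byte], f: Record => Record): Array[Byte] = {
    val in = ByteBuffer.wrap(encoded)
    val out = ByteBuffer.allocate(encoded.length)
    while (in.remaining() >= RecordSize) {
      val x = Record(in.getInt(), in.getInt()) // decode: allocate object
      val y = f(x)                             // user function: allocate object
      out.putInt(y.id).putInt(y.salary)        // encode
    }
    out.array()
  }
}
```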

  144. Mapping an RDD
    serialized
    data
    encoded
    data
    X Y
    encoded
    data
    user
    function
    f
    decode encode
    Allocate object Allocate object

  145. Mapping an RDD
    serialized
    data
    encoded
    data
    X Y
    encoded
    data
    user
    function
    f
    decode encode

  146. Mapping an RDD
    Diagram: serialized (encoded) data → decode → X → user function f → Y → encode → encoded data
    Modified user function
    (automatically derived by the compiler)

  147. Mapping an RDD
    Diagram: serialized (encoded) data → modified user function → encoded data
    Modified user function
    (automatically derived by the compiler)
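For comparison, a sketch of what a modified user function could look like when it works directly on the encoded data. It is written by hand here, assuming a hypothetical fixed 8-byte layout per record (4 bytes id, 4 bytes salary) and a user function that raises every salary by 10; `raiseSalaryEncoded` and `mapEncoded` are illustrative names, and a real compiler-derived version may look quite different. No record object is created per element.

```scala
import java.nio.ByteBuffer

// Assumed fixed-width layout per record: [id: 4 bytes][salary: 4 bytes].
val RecordSize: Int = 8

// Hand-written stand-in for a compiler-derived version of a user function
// that raises every salary by 10: it reads and writes the encoded fields
// in place, so no record object is allocated.
def raiseSalaryEncoded(in: ByteBuffer, out: ByteBuffer, i: Int): Unit = {
  val off = i * RecordSize
  out.putInt(off, in.getInt(off))              // id copied unchanged
  out.putInt(off + 4, in.getInt(off + 4) + 10) // salary + 10, as a primitive Int
}

// Applies an encoded-data function to each of the n records.
def mapEncoded(in: ByteBuffer, n: Int)(g: (ByteBuffer, ByteBuffer, Int) => Unit): ByteBuffer = {
  val out = ByteBuffer.allocate(n * RecordSize)
  var i = 0
  while (i < n) {
    g(in, out, i)
    i += 1
  }
  out
}

// Usage: records (1, 100) and (2, 200), written directly in encoded form.
val encodedInput: ByteBuffer = {
  val b = ByteBuffer.allocate(2 * RecordSize)
  b.putInt(0, 1).putInt(4, 100).putInt(8, 2).putInt(12, 200)
  b
}
val encodedOutput: ByteBuffer = mapEncoded(encodedInput, 2)(raiseSalaryEncoded)
```

The trade-off is that the function is now tied to one concrete layout, which is why having the compiler derive it, rather than writing it by hand, matters.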

  148. Mapping an RDD
    Diagram: serialized (encoded) data → modified user function → encoded data
    Modified user function
    (automatically derived by the compiler)
    Nowhere near as
    simple as it looks

  149. Challenge: Transformation not possible

    Example: Calling outside (untransformed) method

  150. Challenge: Transformation not possible

    Example: Calling outside (untransformed) method

    Solution: Issue compiler warnings

  151. Challenge: Transformation not possible

    Example: Calling outside (untransformed) method

    Solution: Issue compiler warnings
    – Explain why it's not possible: due to the method call

  152. Challenge: Transformation not possible

    Example: Calling outside (untransformed) method

    Solution: Issue compiler warnings
    – Explain why it's not possible: due to the method call
    – Suggest how to fix it: enclose the method in a scope

  153. Challenge: Transformation not possible

    Example: Calling outside (untransformed) method

    Solution: Issue compiler warnings
    – Explain why it's not possible: due to the method call
    – Suggest how to fix it: enclose the method in a scope

    Reuse the machinery in miniboxing
    scala-miniboxing.org
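To make the problem concrete, a small sketch. It assumes a hypothetical transformation that stores a pair of `Int`s packed into a single `Long`; `pack`, `first`, `second`, `outsideLibrary`, and `callOutside` are illustrative names. An untransformed method that expects an `(Int, Int)` tuple forces the value back into object form at the call site, which is the kind of spot where a warning explaining the conversion helps.

```scala
// Hypothetical encoded representation: a pair of Ints packed into one Long.
def pack(a: Int, b: Int): Long = (a.toLong << 32) | (b.toLong & 0xFFFFFFFFL)
def first(p: Long): Int = (p >>> 32).toInt // high 32 bits
def second(p: Long): Int = p.toInt         // low 32 bits

// An "outside" method compiled without the transformation:
// it only understands the object (tuple) representation.
def outsideLibrary(p: (Int, Int)): Int = p._1 + p._2

// Inside the transformed code the pair lives as a Long, but calling the
// outside method requires decoding it into a freshly allocated Tuple2.
def callOutside(p: Long): Int = outsideLibrary((first(p), second(p)))

val packed: Long = pack(3, 4)
val viaOutside: Int = callOutside(packed)
```

If `outsideLibrary` were enclosed in the transformed scope, as the slide suggests, it could in principle be rewritten to accept the `Long` directly and skip the conversion.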

  154. Challenge: Internal API

  155. Challenge: Internal API

    Spark internals rely on Iterator[T]
    – Requires materializing values
    – Needs to be replaced throughout the code base
    – By rather complex buffers

  156. Challenge: Internal API

    Spark internals rely on Iterator[T]
    – Requires materializing values
    – Needs to be replaced throughout the code base
    – By rather complex buffers

    Solution: Extensive refactoring/rewrite
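A sketch of why `Iterator[T]` forces materialization. Under an assumed 8-byte `[id][salary]` layout, an `Iterator[Employee]` has to build a full object on every `next()`, whereas a cursor-style buffer lets the consumer read fields in place. `Employee` (simplified) and `EmployeeCursor` are hypothetical illustrations, not Spark classes.

```scala
import java.nio.ByteBuffer

// Hypothetical, simplified record and an assumed layout [id: 4 bytes][salary: 4 bytes].
case class Employee(id: Int, salary: Int)
val RecordSize: Int = 8

// Iterator[T]-style access: each next() materializes a full Employee object.
def objectIterator(buf: ByteBuffer, n: Int): Iterator[Employee] =
  Iterator.range(0, n).map(i => Employee(buf.getInt(i * RecordSize), buf.getInt(i * RecordSize + 4)))

// Hypothetical cursor over the same buffer: the consumer reads fields in
// place, so no per-record object is allocated.
final class EmployeeCursor(buf: ByteBuffer, n: Int) {
  private var i = -1
  def advance(): Boolean = { i += 1; i < n }
  def id: Int = buf.getInt(i * RecordSize)
  def salary: Int = buf.getInt(i * RecordSize + 4)
}

// Usage: three records with salaries 100, 200, 300.
val buf: ByteBuffer = {
  val b = ByteBuffer.allocate(3 * RecordSize)
  for (k <- 0 until 3) {
    b.putInt(k * RecordSize, k + 1)             // id
    b.putInt(k * RecordSize + 4, (k + 1) * 100) // salary
  }
  b
}
val totalViaIterator: Int = objectIterator(buf, 3).map(_.salary).sum
val totalViaCursor: Int = {
  val c = new EmployeeCursor(buf, 3)
  var sum = 0
  while (c.advance()) sum += c.salary
  sum
}
```

Both compute the same total, but the cursor exposes a different, more stateful API; swapping one for the other throughout a code base is what makes the refactoring extensive.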

  157. Prototype Hack

  158. Prototype Hack

    Modified version of Spark core
    – RDD data representation is configurable

  159. Prototype Hack

    Modified version of Spark core
    – RDD data representation is configurable

    It's very limited:
    – Custom data representation only in map, filter and flatMap
    – Otherwise we revert to costly objects
    – Large parts of the automation still need to be done

  160. Prototype Hack
    sc.parallelize(/* 1 million */ records).
    map(x => ...).
    filter(x => ...).
    collect()

  161. Prototype Hack
    sc.parallelize(/* 1 million */ records).
    map(x => ...).
    filter(x => ...).
    collect()
    More details in my talk at
    Spark Summit EU 2015

  162. Motivation
    Transformation
    Applications
    Challenges
    Conclusion
    Functions
    Open World
    Best Representation?
    Composition

  163. Conclusion

    Object-oriented composition → inefficient representation

  164. Conclusion

    Object-oriented composition → inefficient representation

    Solution: data-centric metaprogramming

  165. Conclusion

    Object-oriented composition → inefficient representation

    Solution: data-centric metaprogramming
    – Use the best representation for your data!

  166. Conclusion

    Object-oriented composition → inefficient representation

    Solution: data-centric metaprogramming
    – Use the best representation for your data!
    – Is it possible? Yes.

  167. Conclusion

    Object-oriented composition → inefficient representation

    Solution: data-centric metaprogramming
    – Use the best representation for your data!
    – Is it possible? Yes.
    – Is it easy? Not really.

  168. Conclusion

    Object-oriented composition → inefficient representation

    Solution: data-centric metaprogramming
    – Use the best representation for your data!
    – Is it possible? Yes.
    – Is it easy? Not really.
    – Is it worth it? You tell me!

  169. Thank you!
    Check out scala-ildl.org.

  171. Deforestation and Language Semantics

    Notice that we changed language semantics:
    – Before: collections were eager
    – After: collections are lazy
    – This can reorder side effects
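A small Scala illustration of the reordering. Here an `Iterator` pipeline stands in for the fused (deforested) version of the eager `List` pipeline; this is an assumption for illustration, not the transformation described in the talk. Both pipelines produce the same result, but the logged side effects run in a different order.

```scala
import scala.collection.mutable.ListBuffer

// Eager List pipeline: map runs over all elements before filter starts.
val eagerLog = ListBuffer.empty[String]
val eagerResult: List[Int] =
  List(1, 2)
    .map { x => eagerLog += s"map $x"; x * 10 }
    .filter { y => eagerLog += s"filter $y"; y > 10 }

// Lazy (fused) pipeline via Iterator: each element flows through map and
// then filter before the next element is touched.
val lazyLog = ListBuffer.empty[String]
val lazyResult: List[Int] =
  List(1, 2).iterator
    .map { x => lazyLog += s"map $x"; x * 10 }
    .filter { y => lazyLog += s"filter $y"; y > 10 }
    .toList
```

The eager log reads map 1, map 2, filter 10, filter 20, while the lazy log interleaves them as map 1, filter 10, map 2, filter 20. With pure functions the difference is invisible; with effects it is observable.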

  172. Deforestation and Language Semantics

    Such transformations are only acceptable with
    programmer consent
    – JIT compilers/staged DSLs can't change semantics
    – metaprogramming (macros) can, but it should be
    documented/opt-in

  173. Code Generation

    Also known as
    – Deep Embedding
    – Multi-Stage Programming

    Awesome speedups, but restricted to small DSLs

    Spark SQL uses code generation to improve performance
    – By 2-4x over Spark
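To illustrate what deep embedding means, a tiny sketch of a DSL whose programs are data: the expression is built as a tree, simplified as a whole, and only then turned into an executable function. The `Expr` types, `simplify`, and `compile` are hypothetical, and generating closures is a simplified stand-in for the code generation that real systems perform.

```scala
// Programs in this tiny DSL are data (an AST), not directly executed code.
sealed trait Expr
case class Const(v: Int) extends Expr
case object Input extends Expr
case class Add(a: Expr, b: Expr) extends Expr
case class Mul(a: Expr, b: Expr) extends Expr

// Stage 1: optimize the whole program (constant folding, x + 0 and x * 1).
def simplify(e: Expr): Expr = e match {
  case Add(a, b) =>
    (simplify(a), simplify(b)) match {
      case (Const(x), Const(y)) => Const(x + y)
      case (Const(0), y)        => y
      case (x, Const(0))        => x
      case (x, y)               => Add(x, y)
    }
  case Mul(a, b) =>
    (simplify(a), simplify(b)) match {
      case (Const(x), Const(y)) => Const(x * y)
      case (Const(1), y)        => y
      case (x, Const(1))        => x
      case (x, y)               => Mul(x, y)
    }
  case other => other
}

// Stage 2: turn the optimized tree into a closure once, instead of
// re-interpreting the tree for every input value.
def compile(e: Expr): Int => Int = e match {
  case Const(v) => (_: Int) => v
  case Input    => (x: Int) => x
  case Add(a, b) =>
    val fa = compile(a)
    val fb = compile(b)
    (x: Int) => fa(x) + fb(x)
  case Mul(a, b) =>
    val fa = compile(a)
    val fb = compile(b)
    (x: Int) => fa(x) * fb(x)
}

// input * (1 + 0) + 2 * 3
val program: Expr = Add(Mul(Input, Add(Const(1), Const(0))), Mul(Const(2), Const(3)))
val optimized: Expr = simplify(program) // Add(Input, Const(6))
val run: Int => Int = compile(optimized)
```

The optimizer can see the whole program because it is a value, which is where the speedups come from; the price is that only constructs modeled by `Expr` can be expressed, hence the restriction to small DSLs.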

  174. Low-level Optimizers

    Java JIT Compiler
    – Access to the low-level code
    – Can assume a (local) closed world
    – Can speculate based on profiles

  175. Low-level Optimizers

    Java JIT Compiler
    – Access to the low-level code
    – Can assume a (local) closed world
    – Can speculate based on profiles

    The best optimizations break semantics
    – You can't do this in the JIT compiler!
    – Only the programmer can decide to break semantics

  176. Scala Macros

    Many optimizations can be done with macros
    – :) Lots of power
    – :( Lots of responsibility (Scala compiler invariants,
    object-oriented model, modularity)

  177. Scala Macros

    Many optimizations can be done with macros
    – :) Lots of power
    – :( Lots of responsibility (Scala compiler invariants,
    object-oriented model, modularity)

    Can we restrict macros so they're safer?
    – Data-centric metaprogramming
