
ScalaBlitz

Optimizing the parallel collection operations.

Aleksandar Prokopec

December 02, 2013
Transcript

  1. ScalaBlitz
    Efficient Collections Framework

  2. What’s a Blitz?

  3. Blitz-chess is a style of
    rapid chess play.

  5. Knights have horses.

  6. Horses run fast.

  7. def mean(xs: Array[Float]): Float =
    xs.par.reduce(_ + _) / xs.length
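A note on this slide's contract: reduce may regroup applications of the operator arbitrarily across workers, so the operator must be associative. A small sequential sketch (illustration only, not ScalaBlitz code) of why summation qualifies and subtraction would not:

```scala
// Two groupings a parallel reduce might choose when summing; because
// + is associative they agree (exactly, for these values).
val xs = Array(1.0f, 2.0f, 3.0f, 4.0f)
val groupedPairs = (xs(0) + xs(1)) + (xs(2) + xs(3))
val groupedRight = xs(0) + (xs(1) + (xs(2) + xs(3)))
// Subtraction is not associative, so (_ - _) would be unsafe here:
val leftSub  = (1 - 2) - 3   // -4
val rightSub = 1 - (2 - 3)   // 2
```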

  10. With Lists, operations
    can only be executed
    from left to right

  11. 1 2 4 8

  12. 1 2 4 8
    Not your typical list.

  13. Bon app.

  18. Apparently not enough

  20. No amount of
    documentation is
    apparently enough

  22. The reduceLeft
    guarantees operations are
    executed from left to right
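To make the guarantee concrete, here is a tiny sequential illustration; subtraction is order-sensitive, so it exposes the difference between reduceLeft and the unordered reduce:

```scala
// reduceLeft always folds from the left: ((1 - 2) - 3) == -4.
val ordered = List(1, 2, 3).reduceLeft(_ - _)
// reduce gives no such ordering guarantee, so with a non-associative
// operator a parallel implementation could legally regroup the
// applications, e.g. 1 - (2 - 3) == 2.
```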

  23. Parallel and sequential
    collections sharing operations

  24. There are several
    problems here

  25. How we see users

  26. How users
    see the docs

  27. Bending the truth.

  28. And sometimes we
    were just slow

  29. So, we have a new API now
    def findDoe(names: Array[String]): Option[String] =
    {
    names.toPar.find(_.endsWith("Doe"))
    }

  30. Wait, you renamed a
    method?
    def findDoe(names: Array[String]): Option[String] =
    {
    names.toPar.find(_.endsWith("Doe"))
    }

  31. Yeah, par already exists.
    But, toPar is different.
    def findDoe(names: Array[String]): Option[String] =
    {
    names.toPar.find(_.endsWith("Doe"))
    }

  32. def findDoe(names: Array[String]): Option[String] = {
    names.toPar.find(_.endsWith("Doe"))
    }
    implicit class ParOps[Repr](val r: Repr) extends AnyVal {
    def toPar = new Par(r)
    }

  33. def findDoe(names: Array[String]): Option[String] = {
    ParOps(names).toPar.find(_.endsWith("Doe"))
    }
    implicit class ParOps[Repr](val r: Repr) extends AnyVal {
    def toPar = new Par(r)
    }

  34. def findDoe(names: Array[String]): Option[String] = {
    ParOps(names).toPar.find(_.endsWith("Doe"))
    }
    implicit class ParOps[Repr](val r: Repr) extends AnyVal {
    def toPar = new Par(r)
    }
    class Par[Repr](r: Repr)

  35. def findDoe(names: Array[String]): Option[String] = {
    (new Par(names)).find(_.endsWith("Doe"))
    }
    implicit class ParOps[Repr](val r: Repr) extends AnyVal {
    def toPar = new Par(r)
    }
    class Par[Repr](r: Repr)

  36. def findDoe(names: Array[String]): Option[String] = {
    (new Par(names)).find(_.endsWith("Doe"))
    }
    class Par[Repr](r: Repr)
    But, Par[Repr] does not
    have the find method!

  37. True, but Par[Array[String]]
    does have a find method.
    def findDoe(names: Array[String]): Option[String] = {
    (new Par(names)).find(_.endsWith("Doe"))
    }
    class Par[Repr](r: Repr)

  38. def findDoe(names: Array[String]): Option[String] = {
    (new Par(names)).find(_.endsWith("Doe"))
    }
    class Par[Repr](r: Repr)
    implicit class ParArrayOps[T](pa: Par[Array[T]]) {
    ...
    def find(p: T => Boolean): Option[T]
    ...
    }
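Putting the pieces of the last few slides together: the sketch below is a toy, sequential stand-in (not the real ScalaBlitz code, which extends AnyVal and runs in parallel), showing how the wrapper plus per-representation implicit classes give Par[Array[T]] a find:

```scala
// Par is only a marker wrapper around a representation type.
class Par[Repr](val r: Repr)

// Adds toPar to any type (the real version extends AnyVal to avoid
// allocating the ParOps wrapper).
implicit class ParOps[Repr](r: Repr) {
  def toPar = new Par(r)
}

// Operations are attached per representation; this is a sequential
// stand-in for the parallel find on arrays.
implicit class ParArrayOps[T](pa: Par[Array[T]]) {
  def find(p: T => Boolean): Option[T] = pa.r.find(p)
}

def findDoe(names: Array[String]): Option[String] =
  names.toPar.find(_.endsWith("Doe"))
```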

  39. More flexible!

  40. More flexible!
    ● does not have to implement methods that
    make no sense in parallel

  41. More flexible!
    ● does not have to implement methods that
    make no sense in parallel
    ● slow conversions explicit

  42. No standard library collections were
    hurt doing this.

  43. More flexible!
    ● does not have to implement methods that
    make no sense in parallel
    ● slow conversions explicit
    ● non-intrusive addition to standard library

  44. More flexible!
    ● does not have to implement methods that
    make no sense in parallel
    ● slow conversions explicit
    ● non-intrusive addition to standard library
    ● easy to add new methods and collections

  45. More flexible!
    ● does not have to implement methods that
    make no sense in parallel
    ● slow conversions explicit
    ● non-intrusive addition to standard library
    ● easy to add new methods and collections
    ● import switches between implementations

  46. def findDoe(names: Seq[String]): Option[String] = {
    names.toPar.find(_.endsWith("Doe"))
    }

  48. But how do I write generic code?
    def findDoe(names: Seq[String]): Option[String] = {
    names.toPar.find(_.endsWith("Doe"))
    }

  49. def findDoe[Repr[_]](names: Par[Repr[String]]) = {
    names.find(_.endsWith("Doe"))
    }

  50. Par[Repr[String]] does not
    have a find
    def findDoe[Repr[_]](names: Par[Repr[String]]) = {
    names.find(_.endsWith("Doe"))
    }

  51. def findDoe[Repr[_]: Ops](names: Par[Repr[String]]) = {
    names.find(_.endsWith("Doe"))
    }

  52. def findDoe[Repr[_]: Ops](names: Par[Repr[String]]) = {
    names.find(_.endsWith("Doe"))
    }
    We don’t do this.

  53. Make everything as simple as
    possible, but not simpler.

  54. def findDoe(names: Reducable[String]) = {
    names.find(_.endsWith("Doe"))
    }

  55. def findDoe(names: Reducable[String]) = {
    names.find(_.endsWith("Doe"))
    }
    findDoe(Array("John", "Jane Doe").toPar)

  56. def findDoe(names: Reducable[String]) = {
    names.find(_.endsWith("Doe"))
    }
    findDoe(toReducable(Array("John", "Jane Doe").toPar))

  57. def findDoe(names: Reducable[String]) = {
    names.find(_.endsWith("Doe"))
    }
    findDoe(toReducable(Array("John", "Jane Doe").toPar))
    def arrayIsReducable[T]: IsReducable[T] = { … }

  58. So let’s write a program!

  60. import scala.collection.par._
    val pixels = new Array[Int](wdt * hgt)
    for (idx <- (0 until (wdt * hgt)).toPar) {
    }

  61. import scala.collection.par._
    val pixels = new Array[Int](wdt * hgt)
    for (idx <- (0 until (wdt * hgt)).toPar) {
    val x = idx % wdt
    val y = idx / wdt
    }

  62. import scala.collection.par._
    val pixels = new Array[Int](wdt * hgt)
    for (idx <- (0 until (wdt * hgt)).toPar) {
    val x = idx % wdt
    val y = idx / wdt
    pixels(idx) = computeColor(x, y)
    }

  63. import scala.collection.par._
    val pixels = new Array[Int](wdt * hgt)
    for (idx <- (0 until (wdt * hgt)).toPar) {
    val x = idx % wdt
    val y = idx / wdt
    pixels(idx) = computeColor(x, y)
    }
    Scheduler not found!

  64. import scala.collection.par._
    import Scheduler.Implicits.global
    val pixels = new Array[Int](wdt * hgt)
    for (idx <- (0 until (wdt * hgt)).toPar) {
    val x = idx % wdt
    val y = idx / wdt
    pixels(idx) = computeColor(x, y)
    }
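For reference, the sequential version of this loop is the plain index traversal below; toPar parallelizes exactly this shape. The wdt, hgt and computeColor definitions here are placeholders standing in for the context the slides assume:

```scala
// Placeholder dimensions and color function, for illustration only.
val (wdt, hgt) = (4, 3)
def computeColor(x: Int, y: Int): Int = x * 31 + y

val pixels = new Array[Int](wdt * hgt)
for (idx <- 0 until (wdt * hgt)) {
  val x = idx % wdt  // column
  val y = idx / wdt  // row
  pixels(idx) = computeColor(x, y)
}
```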

  66. New parallel collections
    33% faster!
    Now
    103 ms
    Previously
    148 ms

  67. Workstealing tree scheduler
    rocks!

  68. Workstealing tree scheduler
    rocks!
    But, are there other interesting
    workloads?

  69. Fine-grained uniform
    workloads are on the opposite
    side of the spectrum.

  70. def mean(xs: Array[Float]): Float = {
    val sum = xs.toPar.fold(0.0f)(_ + _)
    sum / xs.length
    }

  71. def mean(xs: Array[Float]): Float = {
    val sum = xs.toPar.fold(0.0f)(_ + _)
    sum / xs.length
    }
    Now
    15 ms
    Previously
    565 ms

  72. But how?

  73. def fold[T](a: Iterable[T])(z: T)(op: (T, T) => T) = {
    val it = a.iterator
    var acc = z
    while (it.hasNext) {
    acc = op(acc, it.next)
    }
    acc
    }

  74. def fold[T](a: Iterable[T])(z: T)(op: (T, T) => T) = {
    val it = a.iterator
    var acc = z
    while (it.hasNext) {
    acc = box(op(acc, it.next))
    }
    acc
    }

  75. def fold[T](a: Iterable[T])(z: T)(op: (T, T) => T) = {
    val it = a.iterator
    var acc = z
    while (it.hasNext) {
    acc = box(op(acc, it.next))
    }
    acc
    }
    Generic methods cause boxing of primitives
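To see the cost difference the slide describes, compare the generic fold with a hand-specialized loop; both compute the same value, but for Array[Float] the generic path allocates a boxed java.lang.Float on every step. This is an illustration, not ScalaBlitz's actual code:

```scala
// Generic fold: for primitive element types, acc and every op result
// are boxed on the way through the (T, T) => T function object.
def foldGeneric[T](a: Iterable[T])(z: T)(op: (T, T) => T): T = {
  val it = a.iterator
  var acc = z
  while (it.hasNext) acc = op(acc, it.next())
  acc
}

// Hand-specialized sum: the element type and the operation are fixed,
// so the loop runs on raw floats with no allocation.
def sumSpecialized(a: Array[Float]): Float = {
  var i = 0
  var acc = 0.0f
  while (i < a.length) { acc += a(i); i += 1 }
  acc
}
```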

  76. def mean(xs: Array[Float]): Float = {
    val sum = xs.toPar.fold(0.0f)(_ + _)
    sum / xs.length
    }

  77. def mean(xs: Array[Float]): Float = {
    val sum = xs.toPar.fold(0.0f)(_ + _)
    sum / xs.length
    }
    Generic methods hurt performance
    What can we do instead?

  78. def mean(xs: Array[Float]): Float = {
    val sum = xs.toPar.fold(0.0f)(_ + _)
    sum / xs.length
    }
    Generic methods hurt performance
    What can we do instead?
    Inline the method body!

  79. def mean(xs: Array[Float]): Float = {
    val sum = {
    val it = xs.iterator
    var acc = 0.0f
    while (it.hasNext) {
    acc = acc + it.next
    }
    acc
    }
    sum / xs.length
    }

  80. def mean(xs: Array[Float]): Float = {
    val sum = {
    val it = xs.iterator
    var acc = 0.0f
    while (it.hasNext) {
    acc = acc + it.next
    }
    acc
    }
    sum / xs.length
    }
    Specific type
    No boxing!
    No memory allocation!

  81. def mean(xs: Array[Float]): Float = {
    val sum = {
    val it = xs.iterator
    var acc = 0.0f
    while (it.hasNext) {
    acc = acc + it.next
    }
    acc
    }
    sum / xs.length
    }
    Specific type
    No boxing!
    No memory allocation!
    2x speedup
    565 ms → 281 ms

  82. def mean(xs: Array[Float]): Float = {
    val sum = {
    val it = xs.iterator
    var acc = 0.0f
    while (it.hasNext) {
    acc = acc + it.next
    }
    acc
    }
    sum / xs.length
    }

  83. def mean(xs: Array[Float]): Float = {
    val sum = {
    val it = xs.iterator
    var acc = 0.0f
    while (it.hasNext) {
    acc = acc + it.next
    }
    acc
    }
    sum / xs.length
    }
    Iterators? For Array?
    We don’t need them!

  84. def mean(xs: Array[Float]): Float = {
    val sum = {
    var i = 0
    val until = xs.size
    var acc = 0.0f
    while (i < until) {
    acc = acc + xs(i)
    i = i + 1
    }
    acc
    }
    sum / xs.length
    }
    Use index-based access!

  85. def mean(xs: Array[Float]): Float = {
    val sum = {
    var i = 0
    val until = xs.size
    var acc = 0.0f
    while (i < until) {
    acc = acc + xs(i)
    i = i + 1
    }
    acc
    }
    sum / xs.length
    }
    19x speedup
    Use index-based access!
    281 ms → 15 ms

  86. Are those optimizations parallel-collections specific?

  87. Are those optimizations parallel-collections specific?
    No

  88. Are those optimizations parallel-collections specific?
    No
    You can use them on sequential collections

  89. def mean(xs: Array[Float]): Float = {
    val sum = xs.fold(0.0f)(_ + _)
    sum / xs.length
    }

  90. import scala.collection.optimizer._
    def mean(xs: Array[Float]): Float = optimize {
    val sum = xs.fold(0.0f)(_ + _)
    sum / xs.length
    }

  91. import scala.collection.optimizer._
    def mean(xs: Array[Float]): Float = optimize {
    val sum = xs.fold(0.0f)(_ + _)
    sum / xs.length
    }
    You get a 38x speedup!

  92. Future work

  93. @specialized collections
    ● Maps
    ● Sets
    ● Lists
    ● Vectors
    Both faster &
    consuming less
    memory

  94. @specialized collections
    ● Maps
    ● Sets
    ● Lists
    ● Vectors
    Both faster &
    consuming less
    memory
    Expect to get this for free inside
    optimize{} block

  95. JDK8-style streams (parallel views)
    ● Fast
    ● Lightweight
    ● Expressive API
    ● Optimized
    Lazy data-parallel
    operations made
    easy

  96. Future-based asynchronous API
    val sum = future { xs.sum }
    val normalized = sum.map(sum => sum / xs.size)
    Boilerplate code, ugly

  97. Future-based asynchronous API
    val sum = xs.toFuture.sum
    val scaled = xs.map(_ / sum)
    ● Simple to use
    ● Lightweight
    ● Expressive API
    ● Optimized
    Asynchronous data-parallel
    operations made easy
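For comparison, the "boilerplate" variant from the previous slide written against the standard library scala.concurrent API; xs here is a stand-in dataset:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val xs = Vector(1.0f, 2.0f, 3.0f)
// Compute the sum asynchronously, then derive the mean from it.
val sum  = Future { xs.sum }
val mean = sum.map(s => s / xs.size)
```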

  98. Current research: operation fusion
    val minMaleAge = people.filter(_.isMale)
    .map(_.age).min
    val minFemaleAge = people.filter(_.isFemale)
    .map(_.age).min

  99. Current research: operation fusion
    val minMaleAge = people.filter(_.isMale)
    .map(_.age).min
    val minFemaleAge = people.filter(_.isFemale)
    .map(_.age).min
    ● Requires up to 3 times more memory than the original collection
    ● Requires 6 traversals of the collections

  100. Current research: operation fusion
    val minMaleAge = people.filter(_.isMale)
    .map(_.age).min
    val minFemaleAge = people.filter(_.isFemale)
    .map(_.age).min
    ● Requires up to 3 times more memory than the original collection
    ● Requires 6 traversals of the collections
    We aim to reduce this to a single traversal with no
    additional memory.
    Without you changing your code
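What a fused execution could compute by hand today: a single foldLeft traversal tracking both minima at once, with no intermediate collections. Person is a made-up record type standing in for the slides' data:

```scala
case class Person(isMale: Boolean, age: Int)

// One traversal, one pair of accumulators, zero intermediate collections.
def minAges(people: Seq[Person]): (Int, Int) =
  people.foldLeft((Int.MaxValue, Int.MaxValue)) {
    case ((minMale, minFemale), p) =>
      if (p.isMale) (minMale min p.age, minFemale)
      else (minMale, minFemale min p.age)
  }
```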