Rethinking Arrays in R

Davis Vaughan
April 18, 2019

Transcript

  1. Rethinking Arrays in R
    Davis Vaughan
    @dvaughan32
    Software Engineer, RStudio
    April 2019

  2. Array manipulation in R is
    inconsistent and doesn’t follow
    natural intuition.

  3. Arrays are:
    1. frustrating to work with.
    2. difficult to program around.
    3. underpowered.

  4. Subsetting
    Broadcasting
    Manipulation

  5. dimensionality:
    The number of dimensions in an
    array.
    dimensions:
    The set of lengths describing the
    shape of the array.
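
In base R the two terms map onto `dim()` and the length of its result. A minimal sketch, assuming the 2 x 3 matrix used on the slides that follow:

```r
# A 2 x 3 matrix: 2 rows (1st dim), 3 columns (2nd dim).
x <- matrix(1:6, nrow = 2)

# Dimensions: the set of lengths describing the shape.
dims <- dim(x)

# Dimensionality: how many dimensions there are.
dimensionality <- length(dims)

print(dims)            # 2 3
print(dimensionality)  # 2
```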

  8. dimensionality VS dimensions
    x:
      1 3 5
      2 4 6
    dimensions: (2, 3)
      2 = Number of 1st dim elements (rows)
      3 = Number of 2nd dim elements (columns)
    The entire set makes up the dimensions
    The dimensionality is 2
    (2D object)

  9. Enter the matrix.
    x:
      1 3 5
      2 4 6

  10. Column selection
    x:
      1 3 5
      2 4 6
    x[, 1:2]:
      1 3
      2 4

  12. One column?
    x:
      1 3 5
      2 4 6
    x[, 1]:
      1 2

  13. Oh! Let me fix that for you…
    x:
      1 3 5
      2 4 6
    x[, 1, drop = FALSE]:
      1
      2

  16. Let’s go 3D
    y (two 2 x 3 slices):
      1 3 5      7  9 11
      2 4 6      8 10 12
    y[, 1:2]:
      Error: incorrect number of dimensions
    http://gph.is/1kA5eNi

  17. Let’s go 3D
    y (two 2 x 3 slices):
      1 3 5      7  9 11
      2 4 6      8 10 12
    y[, 1:2,]:
      1 3      7  9
      2 4      8 10

  19. One column?
    y (two 2 x 3 slices):
      1 3 5      7  9 11
      2 4 6      8 10 12
    y[, 1,]:
      1 7
      2 8

  22. Oh! Let me fix that for you…
    y (two 2 x 3 slices):
      1 3 5      7  9 11
      2 4 6      8 10 12
    y[, 1, , drop = FALSE]:
      1      7
      2      8

  23. The confusion?
    Subsetting is not
    dimensionality-stable.
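
The instability can be checked directly in base R. The sketch below counts the dimensions of each subset shown on the earlier slides; `n_dims` is a small helper introduced here for illustration, and a plain vector (no `dim` attribute) is counted as 0.

```r
x <- matrix(1:6, nrow = 2)          # 2D: (2, 3)
y <- array(1:12, dim = c(2, 3, 2))  # 3D: (2, 3, 2)

# Dimensionality of a result; a plain vector has no dim attribute,
# so length(dim()) is 0 for it.
n_dims <- function(a) length(dim(a))

res <- c(
  x_two_cols      = n_dims(x[, 1:2]),               # stays 2D
  x_one_col       = n_dims(x[, 1]),                 # drops to a vector
  x_one_col_keep  = n_dims(x[, 1, drop = FALSE]),   # stays 2D
  y_two_cols      = n_dims(y[, 1:2, ]),             # stays 3D
  y_one_col       = n_dims(y[, 1, ]),               # drops to 2D
  y_one_col_keep  = n_dims(y[, 1, , drop = FALSE])  # stays 3D
)
print(res)
```

The same request ("give me the first column") yields a vector, a matrix, or an array depending on how many columns were asked for and on whether `drop = FALSE` was remembered.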

  24. Summary: column selection
    How many?  Drops?  2D                  3D                    Proposed
    1          No      x[, 1, drop = F]    x[, 1, , drop = F]    x[, 1]
    >1         No      x[, 1:2]            x[, 1:2, ]            x[, 1:2]
    1          Yes     x[, 1]              x[, 1, ]*             extract(x, , 1)
    >1         Yes     x[, 1:2, drop = T]  x[, 1:2, , drop = T]  extract(x, , 1:2)
    * Drops to 2D
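
To see why the 2D and 3D columns of the table are awkward to program around, here is a sketch of what a dimension-stable column selector looks like in base R. `select_cols` is a hypothetical helper written for this note; it is not part of base R or rray.

```r
# Hypothetical helper: select columns from an array of any dimensionality
# (at least 2) without ever dropping a dimension.
select_cols <- function(x, j) {
  n <- length(dim(x))
  stopifnot(n >= 2)
  # TRUE as an index selects everything along that dimension.
  idx <- rep(list(TRUE), n)
  idx[[2]] <- j
  # Build the call x[TRUE, j, TRUE, ..., drop = FALSE].
  do.call(`[`, c(list(x), idx, list(drop = FALSE)))
}

x <- matrix(1:6, nrow = 2)
y <- array(1:12, dim = c(2, 3, 2))

dim(select_cols(x, 1))    # 2 1
dim(select_cols(y, 1))    # 2 1 2
dim(select_cols(y, 1:2))  # 2 2 2
```

The number of commas has to be computed at run time, which is the kind of boilerplate the "Proposed" column is meant to remove.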

  29. rray is designed to provide a
    stricter array class.

  30. Create an rray
    library(rray)
    x <- matrix(1:6, nrow = 2)
    x
    #>      [,1] [,2] [,3]
    #> [1,]    1    3    5
    #> [2,]    2    4    6
    x_rray <- as_rray(x)
    x_rray
    #> [,3][6]>
    #>      [,1] [,2] [,3]
    #> [1,]    1    3    5
    #> [2,]    2    4    6

  31. Column subsetting…round two
    x_rray:
      1 3 5
      2 4 6
    x_rray[, 1]:
      1
      2
    x_rray[, 1:2]:
      1 3
      2 4
    y_rray (two 2 x 3 slices):
      1 3 5      7  9 11
      2 4 6      8 10 12
    y_rray[, 1]:
      1      7
      2      8
    y_rray[, 1:2]:
      1 3      7  9
      2 4      8 10

  32. rray_extract() always drops to 1D
    y_rray (two 2 x 3 slices):
      1 3 5      7  9 11
      2 4 6      8 10 12
    rray_extract(y_rray, , 1):
      1 2 7 8

  33. Broadcasting

  34. Broadcasting has to do with
    increasing dimensionality and
    recycling dimensions.

  36. Let’s do some math
    x: (2, 3)
      1 3 5
      2 4 6
    +
    y: (1)
      1
    =
    (2, 3)
      2 4 6
      3 5 7

  37. How is y reshaped
    so that this works?

  38. Step 1 - increase dimensionality
    x: (2, 3)    Dimensionality of 2
    y: (1)       Dimensionality of 1
    —————————
       (?, ?)
    Append 1’s to the dimensionality of y
    until it matches the dimensionality of x
    x: (2, 3)
    y: (1, 1)
    —————————
       (?, ?)

  39. Step 1 - increase dimensionality
    x: (2, 3) + y: (1)
    becomes
    x: (2, 3) + y: (1, 1)
    (x is unchanged; y still holds the single value 1)

  40. Step 2 - recycle dimensions
    x: (2, 3)
    y: (1, 1)
    —————————
       (?, ?)
    If the rows of y were recycled to length 2, it would
    match the length of the rows of x
    x: (2, 3)
    y: (2, 1)
    —————————
       (2, ?)

  41. Step 2 - recycle dimensions
    y: (1, 1)
      1
    becomes
    y: (2, 1)
      1
      1

  42. Step 2 - recycle dimensions
    x: (2, 3)
    y: (2, 1)
    —————————
       (2, ?)
    If the columns of y were recycled to length 3, it would
    match the length of the columns of x
    x: (2, 3)
    y: (2, 3)
    —————————
       (2, 3)

  44. Step 2 - recycle dimensions
    y: (2, 1)
      1
      1
    becomes
    y: (2, 3)
      1 1 1
      1 1 1
    x: (2, 3) + y: (2, 3) = (2, 3)
      1 3 5     1 1 1     2 4 6
      2 4 6  +  1 1 1  =  3 5 7

  47. What if we started here?
    x: (2, 3)
      1 3 5
      2 4 6
    +
    y: (1, 1)
      1
    =
    Error: non-conformable arrays
    https://tenor.com/view/ronswanson-throw-computer-gif-9550833

  48. R doesn’t broadcast.
    We just got lucky that it worked
    with scalars.
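
What base R does instead of broadcasting is vector recycling. A short sketch of the difference, using only base R:

```r
x <- matrix(1:6, nrow = 2)

# A length-1 vector is recycled across every element, which happens to
# look like broadcasting.
x + 1

# A longer vector is recycled element by element in column-major order,
# without consulting the shape of x at all.
x + c(10, 20)

# Once the right-hand side carries dimensions of its own, base R demands
# an exact match and refuses to recycle.
res <- try(x + matrix(1), silent = TRUE)
inherits(res, "try-error")  # TRUE
```

Recycling works on the flattened data, whereas broadcasting works dimension by dimension, which is why the `(1, 1)` matrix fails even though the bare scalar succeeds.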

  49. We know the result
    x: (2, 3) + y: (1, 1) = (2, 3)
      1 3 5               2 4 6
      2 4 6  +  1      =  3 5 7
    y: (2, 3)
      1 1 1
      1 1 1

  50. Broadcasting rules:
    1. Match dimensionality by appending 1’s
    2. Match dimensions by recycling dimensions of length 1
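
The two rules can be written down as a small function over dimension vectors. This is a sketch for illustration only: `broadcast_dims` is a hypothetical name, not an rray function, and it assumes every dimension has length 1 or more.

```r
# Hypothetical helper: compute the broadcast shape of two dimension vectors.
broadcast_dims <- function(dx, dy) {
  n <- max(length(dx), length(dy))
  # Rule 1: match dimensionality by appending 1's.
  dx <- c(dx, rep(1L, n - length(dx)))
  dy <- c(dy, rep(1L, n - length(dy)))
  # Rule 2: a dimension of length 1 can be recycled; otherwise the two
  # lengths must already agree.
  ok <- dx == dy | dx == 1L | dy == 1L
  if (!all(ok)) stop("dimensions cannot be broadcast together")
  pmax(dx, dy)
}

broadcast_dims(c(2, 3), 1)           # 2 3
broadcast_dims(c(1, 3, 2), c(3, 1))  # 3 3 2
```

Shapes such as `(2, 3)` and `(3, 2)` fail the second rule, because neither mismatched dimension has length 1.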

  51. rray broadcasts
    library(rray)
    x <- matrix(1:6, nrow = 2)
    x
    #>      [,1] [,2] [,3]
    #> [1,]    1    3    5
    #> [2,]    2    4    6
    z <- matrix(1)
    z
    #>      [,1]
    #> [1,]    1
    x + z
    #> Error in x + z : non-conformable arrays
    as_rray(x) + z
    #> [,3][6]>
    #>      [,1] [,2] [,3]
    #> [1,]    2    4    6
    #> [2,]    3    5    7

  52. How?
    All hail our C++ overlords at
    QuantStack.
    Buy them a beer for creating
    xtensor.

  56. Let’s go 3D
    x: (1, 3, 2)
      1 2 3      4 5 6
    +
    y: (3, 1)
      1
      2
      3
    =
    Can you even do that?!

  57. Step 1 - increase dimensionality
    x: (1, 3, 2)    Dimensionality of 3
    y: (3, 1)       Dimensionality of 2
    ————————————
       (?, ?, ?)
    Append 1’s to the dimensionality of y
    until it matches the dimensionality of x
    x: (1, 3, 2)
    y: (3, 1, 1)
    ————————————
       (?, ?, ?)

  58. Step 1 - increase dimensionality
    x: (1, 3, 2)
      1 2 3      4 5 6
    +
    y: (3, 1, 1)
      1
      2
      3
    =

  59. Step 2 - recycle dimensions
    x: (1, 3, 2)
    y: (3, 1, 1)
    ————————————
       (?, ?, ?)
    recycle
    x: (3, 3, 2)
    y: (3, 3, 2)
    ————————————
       (3, 3, 2)

  64. Step 2 - recycle dimensions
    x: (3, 3, 2)
      1 2 3      4 5 6
      1 2 3      4 5 6
      1 2 3      4 5 6
    +
    y: (3, 3, 2)
      1 1 1      1 1 1
      2 2 2      2 2 2
      3 3 3      3 3 3
    =
    (3, 3, 2)
      2 3 4      5 6 7
      3 4 5      6 7 8
      4 5 6      7 8 9
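
The 3D walk-through can be reproduced in base R by doing the recycling by hand. The sketch below is an illustration of the two steps, not how rray implements them (rray delegates to xtensor); `broadcast_to` is a hypothetical helper.

```r
# Hypothetical helper: recycle an array to the target dimensions `to`.
broadcast_to <- function(x, to) {
  d <- dim(x)
  if (is.null(d)) d <- length(x)
  # Step 1: append 1's until the dimensionality matches.
  d <- c(d, rep(1L, length(to) - length(d)))
  stopifnot(all(d == to | d == 1L))
  dim(x) <- d
  # Step 2: along every dimension of length 1, repeat index 1.
  idx <- lapply(seq_along(to), function(k) {
    if (d[k] == 1L) rep(1L, to[k]) else seq_len(to[k])
  })
  do.call(`[`, c(list(x), idx, list(drop = FALSE)))
}

x <- array(1:6, dim = c(1, 3, 2))  # (1, 3, 2)
y <- matrix(1:3, ncol = 1)         # (3, 1)

out <- broadcast_to(x, c(3, 3, 2)) + broadcast_to(y, c(3, 3, 2))
dim(out)     # 3 3 2
out[, , 1]
out[, , 2]
```

Both inputs are materialised at the full `(3, 3, 2)` shape before the addition, which is simple to follow but uses more memory than a library that recycles lazily.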

  65. Manipulation

  66. rray as a toolkit
    rray_bind()
    rray_duplicate_any()
    rray_expand_dims()
    rray_broadcast()
    rray_flip()
    rray_max()
    rray_sum()
    rray_mean()
    rray_reshape()
    rray_rotate()
    rray_split()
    rray_tile()
    rray_unique()

  67. The best part?
    They all work with base R.

  71. How can we bind these together?
    First input:
      1
      2
    Second input:
      3 4

  73. cbind( , )
    First input:
      1
      2
    Second input:
      3 4
    Error: number of rows of matrices must match

  75. rbind( , )
    First input:
      1
      2
    Second input:
      3 4
    Error: number of columns of matrices must match

  76. rray_bind( , , axis = 1)
    First input:
      1
      2
    Second input:
      3 4
    Result:
      1 1
      2 2
      3 4

  77. rray_bind( , , axis = 2)
    First input:
      1
      2
    Second input:
      3 4
    Result:
      1 3 4
      2 3 4

  78. rray_bind( , , axis = 3)
    First input:
      1
      2
    Second input:
      3 4
    Result (two 2 x 2 slices):
      1 1      3 4
      2 2      3 4
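
For comparison, the three bound results can be built in base R by recycling the inputs by hand first. This sketch assumes the two inputs are a 2 x 1 column and a 1 x 2 row, as the error messages and results on the slides suggest; the names `a`, `b`, `a2` and `b2` are introduced here for illustration.

```r
a <- matrix(1:2, nrow = 2)  # (2, 1) column
b <- matrix(3:4, nrow = 1)  # (1, 2) row

# Base R refuses both directions because the shapes do not line up.
cbind_fails <- inherits(try(cbind(a, b), silent = TRUE), "try-error")
rbind_fails <- inherits(try(rbind(a, b), silent = TRUE), "try-error")

# Recycle the length-1 dimension of each input by hand.
a2 <- a[, c(1, 1), drop = FALSE]  # (2, 2)
b2 <- b[c(1, 1), , drop = FALSE]  # (2, 2)

along_rows   <- rbind(a2, b)                      # (3, 2)
along_cols   <- cbind(a, b2)                      # (2, 3)
along_slices <- array(c(a2, b2), dim = c(2, 2, 2))

along_rows
along_cols
along_slices
```

Each direction needs a different manual recycling step, which is the bookkeeping that a broadcasting bind takes over.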

  80. What if we want to “normalize”
    by dividing by the max value?
    Along columns? Along rows?
    x:
      1 3 5
      2 4 6

  81. x:
      1 3 5
      2 4 6
    x / max(x)
      .167 .500 .833
      .333 .667 1
    sweep(x, 1, apply(x, 1, max), "/")
      .200 .600 1
      .333 .667 1
    sweep(x, 2, apply(x, 2, max), "/")
      .500 .750 .833
      1    1    1

  82. x:
      1 3 5
      2 4 6
    x / rray_max(x)
      .167 .500 .833
      .333 .667 1
    x / rray_max(x, axes = 2)
      .200 .600 1
      .333 .667 1
    x / rray_max(x, axes = 1)
      .500 .750 .833
      1    1    1

  83. rray_max(x, axes = 1)
    x:
      1 3 5
      2 4 6
    Result:
      2 4 6

  85. x / rray_max(x, axes = 1)
    x:
      1 3 5
      2 4 6
    /
    rray_max(x, axes = 1), recycled to (2, 3):
      2 4 6
      2 4 6
    Result:
      .500 .750 .833
      1    1    1
    rray_max(x, axes = 1):
      2 4 6
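
The column-wise normalisation can be cross-checked in base R. The sketch below keeps the column maxima as a 1 x 3 matrix, recycles that row by hand, and compares the result with the `sweep()` call from the earlier slide; `col_max`, `by_hand` and `by_sweep` are names chosen for this illustration.

```r
x <- matrix(1:6, nrow = 2)

# Column maxima, kept as a (1, 3) matrix rather than a bare vector.
col_max <- matrix(apply(x, 2, max), nrow = 1)

# Base R will not divide (2, 3) by (1, 3), so repeat the single row
# once per row of x before dividing.
by_hand <- x / col_max[rep(1, nrow(x)), , drop = FALSE]

# sweep() reaches the same result without the manual recycling.
by_sweep <- sweep(x, 2, apply(x, 2, max), "/")

all.equal(by_hand, by_sweep)  # TRUE
round(by_hand, 3)
```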

  86. Arrays are:
    1. frustrating to work with.
    2. difficult to program around.
    3. underpowered.

  87. Arrays are:
    1. intuitive to work with.
    2. predictable to program around.
    3. powerful.

  88. GitHub
    https://github.com/DavisVaughan/rray
    Website
    https://davisvaughan.github.io/rray/
