Rethinking Arrays in R

Davis Vaughan
April 18, 2019

Transcript

  1. Rethinking Arrays in R
    Davis Vaughan (@dvaughan32), Software Engineer, RStudio
    April 2019
  2. Array manipulation in R is inconsistent and doesn’t follow natural intuition.
  3. Arrays are: (1) frustrating to work with, (2) difficult to program around, and (3) underpowered.
  4. Subsetting, Broadcasting, Manipulation

  5. dimensionality: the number of dimensions in an array.
    dimensions: the set of lengths describing the shape of the array.
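In base R, these two terms correspond to `dim()` and `length(dim())`. A minimal sketch, using the 2 x 3 matrix `matrix(1:6, nrow = 2)` that the talk uses later:

```r
x <- matrix(1:6, nrow = 2)

# dimensions: the set of lengths describing the shape
dim(x)
#> [1] 2 3

# dimensionality: the number of dimensions
length(dim(x))
#> [1] 2
```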
  8. dimensionality vs. dimensions
    x is a 2 x 3 matrix with rows (1, 3, 5) and (2, 4, 6).
    Number of 1st-dimension elements (rows): 2. Number of 2nd-dimension elements (columns): 3.
    The entire set, (2, 3), makes up the dimensions. The dimensionality is 2 (a 2D object).
  9. Subsetting

  10. Enter the matrix: x, a 2 x 3 matrix with rows (1, 3, 5) and (2, 4, 6).
  11. Column selection: x[, 1:2] returns the 2 x 2 matrix with rows (1, 3) and (2, 4).
  13. One column? x[, 1] returns the plain vector 1 2: the result is no longer a matrix.
  14. Oh! Let me fix that for you… x[, 1, drop = FALSE] returns the 2 x 1 matrix holding 1 and 2.
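The same behavior in runnable form: a base R sketch (no rray needed) that checks the shape of each result with `dim()`, assuming the same `matrix(1:6, nrow = 2)`:

```r
x <- matrix(1:6, nrow = 2)

# Two columns: the result is still a matrix, now 2 x 2
dim(x[, 1:2])
#> [1] 2 2

# One column: the result silently becomes a plain vector
x[, 1]
#> [1] 1 2
dim(x[, 1])
#> NULL

# drop = FALSE keeps the column as a 2 x 1 matrix
dim(x[, 1, drop = FALSE])
#> [1] 2 1
```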
  17. Let’s go 3D: y is a 2 x 3 x 2 array holding 1:12 (slice 1 has rows (1, 3, 5) and (2, 4, 6); slice 2 has rows (7, 9, 11) and (8, 10, 12)).
    y[, 1:2] fails with Error: incorrect number of dimensions
  18. Let’s go 3D: y[, 1:2, ] returns a 2 x 2 x 2 array: slice 1 has rows (1, 3) and (2, 4); slice 2 has rows (7, 9) and (8, 10).
  20. One column? y[, 1, ] returns a 2 x 2 matrix with rows (1, 7) and (2, 8).
  23. Oh! Let me fix that for you… y[, 1, , drop = FALSE] returns a 2 x 1 x 2 array: slice 1 holds 1 and 2, slice 2 holds 7 and 8.
  24. The confusion? Subsetting is not dimensionality-stable.
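Counting dimensions after each subset makes the instability concrete. A base R sketch, assuming `y` is `array(1:12, dim = c(2, 3, 2))` (which matches the values on the 3D slides); `dimensionality()` is a hypothetical helper, just `length(dim())`:

```r
y <- array(1:12, dim = c(2, 3, 2))

# Hypothetical helper: the number of dimensions of an object
dimensionality <- function(a) length(dim(a))

dimensionality(y[, 1:2, ])              # two columns: stays 3D
#> [1] 3
dimensionality(y[, 1, ])                # one column: drops to 2D
#> [1] 2
dimensionality(y[, 1, , drop = FALSE])  # drop = FALSE: stays 3D
#> [1] 3
```

The dimensionality of the result depends on how many columns were selected, not only on the shape of `y`.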

  25. Summary: column selection
    How many? | Drops? | 2D                 | 3D                   | Proposed
    1         | No     | x[, 1, drop = F]   | x[, 1, , drop = F]   | x[, 1]
    >1        | No     | x[, 1:2]           | x[, 1:2, ]           | x[, 1:2]
    1         | Yes    | x[, 1]             | x[, 1, ]*            | extract(x, , 1)
    >1        | Yes    | x[, 1:2, drop = T] | x[, 1:2, , drop = T] | extract(x, , 1:2)
    * Drops to 2D
  30. rray

  31. rray is designed to provide a stricter array class.

  32. Create an rray
    library(rray)

    x <- matrix(1:6, nrow = 2)
    x
    #>      [,1] [,2] [,3]
    #> [1,]    1    3    5
    #> [2,]    2    4    6

    x_rray <- as_rray(x)
    x_rray
    #> <vctrs_rray<integer>[,3][6]>
    #>      [,1] [,2] [,3]
    #> [1,]    1    3    5
    #> [2,]    2    4    6
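  33. Column subsetting…round two
    x_rray[, 1] returns the 2 x 1 matrix holding 1 and 2.
    x_rray[, 1:2] returns the 2 x 2 matrix with rows (1, 3) and (2, 4).
    y_rray[, 1] returns a 2 x 1 x 2 array: slice 1 holds 1 and 2, slice 2 holds 7 and 8.
    y_rray[, 1:2] returns a 2 x 2 x 2 array: slice 1 has rows (1, 3) and (2, 4); slice 2 has rows (7, 9) and (8, 10).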
  34. rray_extract() always drops to 1D: rray_extract(y_rray, , 1) returns the four values 1, 2, 7, 8.
  35. Broadcasting

  36. Broadcasting has to do with increasing dimensionality and recycling dimensions.

  38. Let’s do some math: x: (2, 3), with rows (1, 3, 5) and (2, 4, 6), plus y: (1), the single value 1, equals the (2, 3) matrix with rows (2, 4, 6) and (3, 5, 7).
  39. How is y reshaped so that this works?

  40. Step 1 - increase dimensionality
    x: (2, 3) has a dimensionality of 2; y: (1) has a dimensionality of 1.
    Append 1s to the dimensions of y until its dimensionality matches that of x.
    Before: x: (2, 3), y: (1), result (?, ?)
    After: x: (2, 3), y: (1, 1), result (?, ?)
  41. Step 1 - increase dimensionality: x: (2, 3) + y: (1) becomes x: (2, 3) + y: (1, 1); y still holds the single value 1.
  42. Step 2 - recycle dimensions
    If the rows of y were recycled to length 2, y would have the same number of rows as x.
    Before: x: (2, 3), y: (1, 1), result (?, ?)
    After: x: (2, 3), y: (2, 1), result (2, ?)
  43. Step 2 - recycle dimensions: y: (1, 1), holding 1, becomes y: (2, 1), holding 1 and 1; x: (2, 3) is unchanged.
  44. Step 2 - recycle dimensions
    If the columns of y were recycled to length 3, y would have the same number of columns as x.
    Before: x: (2, 3), y: (2, 1), result (2, ?)
    After: x: (2, 3), y: (2, 3), result (2, 3)
  46. Step 2 - recycle dimensions: y: (2, 1) becomes y: (2, 3), six 1s. x: (2, 3) + y: (2, 3) equals the (2, 3) matrix with rows (2, 4, 6) and (3, 5, 7).
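The two steps can be written as a small shape rule. This is a sketch of the rules as described in these slides, not rray's or xtensor's actual implementation; `broadcast_dim()` is a hypothetical helper that only computes the result dimensions:

```r
# Hypothetical helper (not part of rray): compute the dimensions
# of a broadcast result from the dimensions of two inputs.
broadcast_dim <- function(dx, dy) {
  dx <- as.integer(dx)
  dy <- as.integer(dy)
  n <- max(length(dx), length(dy))

  # Step 1: append 1s until both have the same dimensionality
  dx <- c(dx, rep(1L, n - length(dx)))
  dy <- c(dy, rep(1L, n - length(dy)))

  # Step 2: a dimension of length 1 is recycled to match the other;
  # any other mismatch cannot be broadcast
  if (any(dx != dy & dx != 1L & dy != 1L)) {
    stop("dimensions are not broadcastable")
  }
  ifelse(dx == 1L, dy, dx)
}

broadcast_dim(c(2, 3), 1)
#> [1] 2 3
broadcast_dim(c(1, 3, 2), c(3, 1))
#> [1] 3 3 2
```

Under these rules `broadcast_dim(c(2, 3), c(1, 1))` is also `(2, 3)`, which is the case base R rejects in the following slides.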
  49. What if we started here? x: (2, 3) + y: (1, 1) fails with Error: non-conformable arrays
  50. R doesn’t broadcast. We just got lucky that it worked with scalars.
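Base R's actual rule can be checked directly: a dimensionless scalar is recycled, but two arrays must have identical dimensions. A base R sketch, using `tryCatch()` to capture the error message instead of stopping:

```r
x <- matrix(1:6, nrow = 2)

# A plain scalar has no dim attribute, so it is simply recycled
x + 1
#>      [,1] [,2] [,3]
#> [1,]    2    4    6
#> [2,]    3    5    7

# A 1 x 1 matrix has dimensions, and base R arithmetic requires
# the dimensions of two arrays to match exactly
msg <- tryCatch(x + matrix(1), error = function(e) conditionMessage(e))
msg
#> [1] "non-conformable arrays"
```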
  51. We know the result: x: (2, 3) + y: (1, 1) should behave like x + y: (2, 3), a matrix of six 1s, giving the (2, 3) matrix with rows (2, 4, 6) and (3, 5, 7).
  52. Broadcasting rules:
    Match dimensionality by appending 1s.
    Match dimensions by recycling dimensions of length 1.
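These two rules are enough to make the failing case work in plain base R. A minimal sketch, assuming a hypothetical helper `broadcast_to()` that materializes the recycled array with index vectors; it copies data, which a production implementation would likely avoid:

```r
# Hypothetical helper (not part of rray): recycle the length-1
# dimensions of `a` until it has the dimensions `dims`.
broadcast_to <- function(a, dims) {
  dims <- as.integer(dims)
  da <- dim(a)
  if (is.null(da)) da <- length(a)
  if (length(da) > length(dims)) stop("too many dimensions")

  # Rule 1: match dimensionality by appending 1s
  da <- c(da, rep(1L, length(dims) - length(da)))
  a <- array(a, dim = da)

  # Rule 2: only dimensions of length 1 may be recycled
  if (any(da != dims & da != 1L)) stop("dimensions are not broadcastable")

  # Index position 1 repeatedly along recycled axes, every position otherwise
  idx <- lapply(seq_along(dims), function(i) {
    if (da[i] == 1L) rep(1L, dims[i]) else seq_len(dims[i])
  })
  do.call(`[`, c(list(a), idx, list(drop = FALSE)))
}

x <- matrix(1:6, nrow = 2)
z <- matrix(1)

x + broadcast_to(z, dim(x))
#>      [,1] [,2] [,3]
#> [1,]    2    4    6
#> [2,]    3    5    7
```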
  53. rray broadcasts
    library(rray)

    x <- matrix(1:6, nrow = 2)
    x
    #>      [,1] [,2] [,3]
    #> [1,]    1    3    5
    #> [2,]    2    4    6

    z <- matrix(1)
    z
    #>      [,1]
    #> [1,]    1

    x + z
    #> Error in x + z : non-conformable arrays

    as_rray(x) + z
    #> <vctrs_rray<double>[,3][6]>
    #>      [,1] [,2] [,3]
    #> [1,]    2    4    6
    #> [2,]    3    5    7
  54. How? All hail our C++ overlords at QuantStack. Buy them a beer for creating xtensor.
  58. Let’s go 3D: x: (1, 3, 2), holding the values 1 to 6, plus y: (3, 1), holding 1, 2, 3. Can you even do that?!
  59. Step 1 - increase dimensionality
    x: (1, 3, 2) has a dimensionality of 3; y: (3, 1) has a dimensionality of 2.
    Append 1s to the dimensions of y until its dimensionality matches that of x.
    Before: x: (1, 3, 2), y: (3, 1), result (?, ?, ?)
    After: x: (1, 3, 2), y: (3, 1, 1), result (?, ?, ?)
  60. Step 1 - increase dimensionality: x: (1, 3, 2) + y: (3, 1, 1); y still holds 1, 2, 3.
  61. Step 2 - recycle dimensions
    Before: x: (1, 3, 2), y: (3, 1, 1), result (?, ?, ?)
    After recycling each length-1 dimension: x: (3, 3, 2), y: (3, 3, 2), result (3, 3, 2)
  66. Step 2 - recycle dimensions: x: (3, 3, 2) + y: (3, 3, 2) gives the (3, 3, 2) result.
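The 3D case can be checked by doing the two steps by hand in base R. The slides only show the shapes clearly, so the values here are an assumption: `x` is taken to be `array(1:6, dim = c(1, 3, 2))` and `y` to be `matrix(1:3, ncol = 1)`. Recycling uses repeated indices for `x` and the column-major fill of `array()` for `y`:

```r
# Assumed values (hypothetical); only the shapes come from the slides
x <- array(1:6, dim = c(1, 3, 2))  # x: (1, 3, 2)
y <- matrix(1:3, ncol = 1)         # y: (3, 1)

# Step 1: append a 1 to y's dimensions, (3, 1) -> (3, 1, 1)
dim(y) <- c(3, 1, 1)

# Step 2: recycle the length-1 dimensions to reach (3, 3, 2)
x_big <- x[rep(1, 3), , , drop = FALSE]  # repeat x's single row 3 times
y_big <- array(y, dim = c(3, 3, 2))      # fill repeats 1, 2, 3 down every column

res <- x_big + y_big
dim(res)
#> [1] 3 3 2
res[, , 1]
#>      [,1] [,2] [,3]
#> [1,]    2    3    4
#> [2,]    3    4    5
#> [3,]    4    5    6
```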
  67. Manipulation

  68. rray as a toolkit: rray_bind(), rray_duplicate_any(), rray_expand_dims(), rray_broadcast(), rray_flip(), rray_max(), rray_sum(), rray_mean(), rray_reshape(), rray_rotate(), rray_split(), rray_tile(), rray_unique(), …
  69. The best part? They all work with base R.

  73. How can we bind these together? A 2 x 1 matrix holding 1 and 2, and a 1 x 2 matrix holding 3 and 4.
  75. cbind( , ) fails with Error: number of rows of matrices must match
  77. rbind( , ) fails with Error: number of columns of matrices must match
  78. rray_bind( , , axis = 1) returns a 3 x 2 matrix: the 2 x 1 matrix, recycled to two columns, stacked above the 1 x 2 matrix.
  79. rray_bind( , , axis = 2) returns a 2 x 3 matrix: the 2 x 1 matrix beside the 1 x 2 matrix, recycled to two rows.
  80. rray_bind( , , axis = 3) returns a 2 x 2 x 2 array: both matrices, recycled to 2 x 2, become its two slices.
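The slides don't spell out how rray_bind() reconciles the two shapes, but the results are consistent with broadcasting both inputs along every axis except the one being bound. A base R sketch of that reading, assuming the inputs are `matrix(1:2, ncol = 1)` and `matrix(3:4, nrow = 1)` (hypothetical values chosen to match the slides' shapes):

```r
# Assumed inputs (hypothetical values)
a <- matrix(1:2, ncol = 1)  # (2, 1)
b <- matrix(3:4, nrow = 1)  # (1, 2)

# axis = 1: recycle a's single column to 2 columns, then stack the rows
bind1 <- rbind(a[, c(1, 1)], b)
dim(bind1)
#> [1] 3 2

# axis = 2: recycle b's single row to 2 rows, then place side by side
bind2 <- cbind(a, b[c(1, 1), ])
dim(bind2)
#> [1] 2 3

# axis = 3: recycle both to 2 x 2, then stack them as two slices
bind3 <- array(c(a[, c(1, 1)], b[c(1, 1), ]), dim = c(2, 2, 2))
dim(bind3)
#> [1] 2 2 2
```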
  82. What if we want to “normalize” x (rows (1, 3, 5) and (2, 4, 6)) by dividing by the max value? Along columns? Along rows?
  83. In base R:
    x / max(x) gives rows (.167, .500, .833) and (.333, .667, 1)
    sweep(x, 1, apply(x, 1, max), "/") gives rows (.200, .600, 1) and (.333, .667, 1)
    sweep(x, 2, apply(x, 2, max), "/") gives rows (.500, .750, .833) and (1, 1, 1)
  84. With rray:
    x / rray_max(x) gives rows (.167, .500, .833) and (.333, .667, 1)
    x / rray_max(x, axes = 2) gives rows (.200, .600, 1) and (.333, .667, 1)
    x / rray_max(x, axes = 1) gives rows (.500, .750, .833) and (1, 1, 1)
  85. rray_max(x, axes = 1) reduces x: (2, 3) to the (1, 3) column maxima 2, 4, 6.
  87. x / rray_max(x, axes = 1): x: (2, 3) divided by the (1, 3) maxima 2, 4, 6. Broadcasting recycles the maxima down both rows, giving rows (.500, .750, .833) and (1, 1, 1).
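The rray version works because `rray_max(x, axes = 1)` keeps a length-1 first dimension, so broadcasting can recycle it. A base R sketch of the same idea, using a hypothetical `max_keep()` helper (not an rray function) and a hand-written 2D recycle, checked against the `sweep()` version:

```r
x <- matrix(1:6, nrow = 2)

# Hypothetical helper (not an rray function): take the max over one
# axis but keep that axis, with length 1, in the result's dimensions
max_keep <- function(x, axis) {
  keep <- setdiff(seq_along(dim(x)), axis)
  m <- apply(x, keep, max)
  new_dim <- dim(x)
  new_dim[axis] <- 1L
  array(m, dim = new_dim)
}

col_max <- max_keep(x, axis = 1)
dim(col_max)
#> [1] 1 3

# Broadcast by hand (2D only): recycle the single row down every row of x
normalized <- x / col_max[rep(1, nrow(x)), , drop = FALSE]
# rows (0.5, 0.75, 0.833...) and (1, 1, 1)

all.equal(normalized, sweep(x, 2, apply(x, 2, max), "/"))
#> [1] TRUE
```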
  88. Arrays are: (1) frustrating to work with, (2) difficult to program around, and (3) underpowered.
  89. Arrays are: (1) intuitive to work with, (2) predictable to program around, and (3) powerful.
  90. GitHub: https://github.com/DavisVaughan/rray
    Website: https://davisvaughan.github.io/rray/