Rethinking Arrays in R

Davis Vaughan
April 18, 2019

Transcript

  1. Rethinking Arrays in R
    Davis Vaughan
    @dvaughan32
    Software Engineer, RStudio
    April 2019

  2. Array manipulation in R is
    inconsistent and doesn’t follow
    natural intuition.

  3. Arrays are:
    1. frustrating to work with.
    2. difficult to program around.
    3. underpowered.

  4. Subsetting
    Broadcasting
    Manipulation

  5. dimensionality:
    The number of dimensions in an
    array.
    dimensions:
    The set of lengths describing the
    shape of the array.

  8. dimensionality VS dimensions
         [,1] [,2] [,3]
    [1,]    1    3    5
    [2,]    2    4    6
    dimensions: (2, 3)
    2 = number of 1st dim elements (rows)
    3 = number of 2nd dim elements (columns)
    The entire set makes up the dimensions
    The dimensionality is 2
    (2D object)
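
In base R the two terms map onto `dim()` and the length of its result. A minimal sketch, assuming the 2 x 3 matrix pictured on the slide (the name `x` is borrowed from the later slides):

```r
# The 2 x 3 matrix from the slide, filled column-wise
x <- matrix(1:6, nrow = 2)

# dimensions: the set of lengths describing the shape
dims <- dim(x)

# dimensionality: how many dimensions there are
dimensionality <- length(dims)

print(dims)            # 2 3
print(dimensionality)  # 2
```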

  9. Subsetting

  10. Enter the matrix.
    x
         [,1] [,2] [,3]
    [1,]    1    3    5
    [2,]    2    4    6

  11. Column selection
    x[, 1:2]
         [,1] [,2]
    [1,]    1    3
    [2,]    2    4

  12. One column?
    x[, 1]
    ?

  13. One column?
    x[, 1]
    [1] 1 2

  14. Oh! Let me fix that for you…
    x[, 1, drop = FALSE]
         [,1]
    [1,]    1
    [2,]    2
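
The three results above can be checked in base R by looking at `dim()` after each subset. This is only a sketch of the behaviour the slides describe, using the same `x`; the result names are made up for illustration:

```r
x <- matrix(1:6, nrow = 2)

two_cols <- x[, 1:2]               # still a matrix
one_col  <- x[, 1]                 # silently dropped to a plain vector
kept     <- x[, 1, drop = FALSE]   # stays a 2 x 1 matrix

print(dim(two_cols))  # 2 2
print(dim(one_col))   # NULL
print(dim(kept))      # 2 1
```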

  15. Let’s go 3D
    y
    , , 1
         [,1] [,2] [,3]
    [1,]    1    3    5
    [2,]    2    4    6
    , , 2
         [,1] [,2] [,3]
    [1,]    7    9   11
    [2,]    8   10   12
    y[, 1:2]
    ?

  17. Let’s go 3D
    y[, 1:2]
    Error: incorrect number of dimensions
    http://gph.is/1kA5eNi

  18. Let’s go 3D
    y[, 1:2, ]
    , , 1
         [,1] [,2]
    [1,]    1    3
    [2,]    2    4
    , , 2
         [,1] [,2]
    [1,]    7    9
    [2,]    8   10

  19. One column?
    y[, 1, ]
    ?

  20. One column?
    y[, 1, ]
         [,1] [,2]
    [1,]    1    7
    [2,]    2    8

  23. Oh! Let me fix that for you…
    y[, 1, , drop = FALSE]
    , , 1
         [,1]
    [1,]    1
    [2,]    2
    , , 2
         [,1]
    [1,]    7
    [2,]    8

  24. The confusion?
    Subsetting is not
    dimensionality-stable.
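
One way to see why this is hard to program around: code that selects columns has to know the dimensionality up front and remember `drop = FALSE`. The sketch below is a hypothetical base R helper (`select_cols()` is not from the talk or from rray) that builds the index list to match whatever dimensionality it is given:

```r
# Hypothetical helper: select columns without changing dimensionality.
select_cols <- function(x, j) {
  n <- length(dim(x))
  stopifnot(n >= 2)
  # TRUE in a position means "keep everything along that dimension"
  idx <- rep(list(TRUE), n)
  idx[[2]] <- j
  do.call(`[`, c(list(x), idx, list(drop = FALSE)))
}

x <- matrix(1:6, nrow = 2)
y <- array(1:12, dim = c(2, 3, 2))

print(dim(select_cols(x, 1)))    # 2 1
print(dim(select_cols(y, 1)))    # 2 1 2
print(dim(select_cols(y, 1:2)))  # 2 2 2
```

The same call works for 2D and 3D input, which is the property the slides call dimensionality-stable.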

  25. Summary: column selection
    How many?  Drops?  2D                  3D                    Proposed
    1          No      x[, 1, drop = F]    x[, 1, , drop = F]    x[, 1]
    >1         No      x[, 1:2]            x[, 1:2, ]            x[, 1:2]
    1          Yes     x[, 1]              x[, 1, ]*             extract(x, , 1)
    >1         Yes     x[, 1:2, drop = T]  x[, 1:2, , drop = T]  extract(x, , 1:2)
    * Drops to 2D

  30. rray

  31. rray is designed to provide a
    stricter array class.

  32. Create an rray
    library(rray)
    x <- matrix(1:6, nrow = 2)
    x
    #>      [,1] [,2] [,3]
    #> [1,]    1    3    5
    #> [2,]    2    4    6
    x_rray <- as_rray(x)
    x_rray
    #> [,3][6]>
    #>      [,1] [,2] [,3]
    #> [1,]    1    3    5
    #> [2,]    2    4    6

  33. Column subsetting…round two
    x_rray: (2, 3), values 1 to 6
    x_rray[, 1]
         [,1]
    [1,]    1
    [2,]    2
    x_rray[, 1:2]
         [,1] [,2]
    [1,]    1    3
    [2,]    2    4
    y_rray: (2, 3, 2), values 1 to 12
    y_rray[, 1]
    , , 1
         [,1]
    [1,]    1
    [2,]    2
    , , 2
         [,1]
    [1,]    7
    [2,]    8
    y_rray[, 1:2]
    , , 1
         [,1] [,2]
    [1,]    1    3
    [2,]    2    4
    , , 2
         [,1] [,2]
    [1,]    7    9
    [2,]    8   10

  34. rray_extract() always drops to 1D
    y_rray: (2, 3, 2), values 1 to 12
    rray_extract(y_rray, , 1)
    [1] 1 2 7 8

  35. Broadcasting

  36. Broadcasting has to do with
    increasing dimensionality and
    recycling dimensions.

  37. Let’s do some math
    x: (2, 3)
         [,1] [,2] [,3]
    [1,]    1    3    5
    [2,]    2    4    6
    y: (1)
    [1] 1
    x + y = ?

  38. Let’s do some math
    x + y: (2, 3)
         [,1] [,2] [,3]
    [1,]    2    4    6
    [2,]    3    5    7

  39. How is y reshaped
    so that this works?

  40. Step 1 - increase dimensionality
    x: (2, 3)    Dimensionality of 2
    y: (1)       Dimensionality of 1
    ---------
       (?, ?)
    Append 1’s to the dimensions of y
    until its dimensionality matches the dimensionality of x
    x: (2, 3)
    y: (1, 1)
    ---------
       (?, ?)

  41. Step 1 - increase dimensionality
    y: (1)
    [1] 1
    becomes y: (1, 1)
         [,1]
    [1,]    1

  42. Step 2 - recycle dimensions
    If the rows of y were recycled to length 2, y would
    have as many rows as x
    x: (2, 3)
    y: (1, 1)
    ---------
       (?, ?)
    becomes
    x: (2, 3)
    y: (2, 1)
    ---------
       (2, ?)

  43. Step 2 - recycle dimensions
    y: (1, 1) becomes y: (2, 1)
         [,1]
    [1,]    1
    [2,]    1

  44. Step 2 - recycle dimensions
    If the columns of y were recycled to length 3, y would
    have as many columns as x
    x: (2, 3)
    y: (2, 1)
    ---------
       (2, ?)
    becomes
    x: (2, 3)
    y: (2, 3)
    ---------
       (2, 3)

  46. Step 2 - recycle dimensions
    y: (2, 1) becomes y: (2, 3)
         [,1] [,2] [,3]
    [1,]    1    1    1
    [2,]    1    1    1
    x + y: (2, 3)
         [,1] [,2] [,3]
    [1,]    2    4    6
    [2,]    3    5    7

  47. What if we started here?
    x: (2, 3)
         [,1] [,2] [,3]
    [1,]    1    3    5
    [2,]    2    4    6
    y: (1, 1)
         [,1]
    [1,]    1
    x + y = ?

  49. What if we started here?
    x + y
    Error: non-conformable arrays
    https://tenor.com/view/ronswanson-throw-computer-gif-9550833

  50. R doesn’t broadcast.
    We just got lucky that it worked
    with scalars.
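
A short base R sketch of why the scalar case is luck rather than broadcasting: R recycles the shorter operand element by element in column-major order, ignoring its shape, and refuses two arrays whose dimensions differ. The `1:3` case is an illustration added here, not something shown on the slides:

```r
x <- matrix(1:6, nrow = 2)

# A length-1 vector is recycled across every element
scalar_sum <- x + 1

# A length-3 vector is recycled down the columns in storage order,
# not matched against the 3 columns of x
recycled <- x + 1:3

# A (1, 1) matrix is rejected outright
z <- matrix(1)
msg <- tryCatch(x + z, error = function(e) conditionMessage(e))

print(recycled)
print(msg)
```

Here `recycled` has rows (2, 6, 7) and (4, 5, 9), which is not what adding one value per column would give.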

  51. We know the result
    x: (2, 3), y: (1, 1)
    y recycled to (2, 3)
         [,1] [,2] [,3]
    [1,]    1    1    1
    [2,]    1    1    1
    x + y: (2, 3)
         [,1] [,2] [,3]
    [1,]    2    4    6
    [2,]    3    5    7

  52. Broadcasting rules:
    Match dimensionality by appending 1’s
    Match dimensions by recycling dimensions of length 1
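
The two rules can be walked through by hand in base R for the running example. This is a sketch of the idea only, not how rray or xtensor implement it; the intermediate names `y_step1`, `y_rows` and `y_full` are made up for illustration:

```r
x <- matrix(1:6, nrow = 2)   # (2, 3)
y <- 1                       # (1)

# Rule 1: append 1's until y has the same dimensionality as x
y_step1 <- array(y, dim = c(1, 1))                   # (1, 1)

# Rule 2: recycle each dimension of length 1 to the matching length in x
y_rows <- y_step1[rep(1, nrow(x)), , drop = FALSE]   # (2, 1)
y_full <- y_rows[, rep(1, ncol(x)), drop = FALSE]    # (2, 3)

result <- x + y_full
print(dim(y_full))  # 2 3
print(result)
```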

  53. rray broadcasts
    library(rray)
    x <- matrix(1:6, nrow = 2)
    x
    #>      [,1] [,2] [,3]
    #> [1,]    1    3    5
    #> [2,]    2    4    6
    z <- matrix(1)
    z
    #>      [,1]
    #> [1,]    1
    x + z
    #> Error in x + z : non-conformable arrays
    as_rray(x) + z
    #> [,3][6]>
    #>      [,1] [,2] [,3]
    #> [1,]    2    4    6
    #> [2,]    3    5    7

  54. How?
    All hail our C++ overlords at
    QuantStack.
    Buy them a beer for creating
    xtensor.

  57. Let’s go 3D
    x: (1, 3, 2)
    , , 1
         [,1] [,2] [,3]
    [1,]    1    2    3
    , , 2
         [,1] [,2] [,3]
    [1,]    4    5    6
    y: (3, 1)
         [,1]
    [1,]    1
    [2,]    2
    [3,]    3
    x + y = ?

  58. Let’s go 3D
    Can you even
    do that?!

  59. Step 1 - increase dimensionality
    x: (1, 3, 2)    Dimensionality of 3
    y: (3, 1)       Dimensionality of 2
    ------------
       (?, ?, ?)
    Append 1’s to the dimensions of y
    until its dimensionality matches the dimensionality of x
    x: (1, 3, 2)
    y: (3, 1, 1)
    ------------
       (?, ?, ?)

  60. Step 1 - increase dimensionality
    y: (3, 1) becomes y: (3, 1, 1)
    , , 1
         [,1]
    [1,]    1
    [2,]    2
    [3,]    3

  61. Step 2 - recycle dimensions
    x: (1, 3, 2)
    y: (3, 1, 1)
    ------------
       (?, ?, ?)
    recycle
    x: (3, 3, 2)
    y: (3, 3, 2)
    ------------
       (3, 3, 2)

  66. Step 2 - recycle dimensions
    x: (3, 3, 2)
    , , 1
         [,1] [,2] [,3]
    [1,]    1    2    3
    [2,]    1    2    3
    [3,]    1    2    3
    , , 2
         [,1] [,2] [,3]
    [1,]    4    5    6
    [2,]    4    5    6
    [3,]    4    5    6
    y: (3, 3, 2)
    , , 1
         [,1] [,2] [,3]
    [1,]    1    1    1
    [2,]    2    2    2
    [3,]    3    3    3
    , , 2
         [,1] [,2] [,3]
    [1,]    1    1    1
    [2,]    2    2    2
    [3,]    3    3    3
    x + y: (3, 3, 2)
    , , 1
         [,1] [,2] [,3]
    [1,]    2    3    4
    [2,]    3    4    5
    [3,]    4    5    6
    , , 2
         [,1] [,2] [,3]
    [1,]    5    6    7
    [2,]    6    7    8
    [3,]    7    8    9
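
The same two steps generalise to any dimensionality. Below is a minimal base R sketch under the rules stated earlier in the talk; `broadcast_to()` and `broadcast_add()` are hypothetical helpers written for this illustration and are not rray or xtensor functions:

```r
# Hypothetical helper: reshape x to the target dimensions `dims`.
broadcast_to <- function(x, dims) {
  d <- dim(x)
  if (is.null(d)) d <- length(x)
  # Step 1: append 1's until the dimensionality matches
  d <- c(d, rep(1L, length(dims) - length(d)))
  dim(x) <- d
  # Step 2: recycle every dimension of length 1 by repeating index 1
  idx <- Map(function(from, to) {
    if (from == to) {
      seq_len(to)
    } else if (from == 1L) {
      rep(1L, to)
    } else {
      stop("dimensions are not broadcastable")
    }
  }, d, dims)
  do.call(`[`, c(list(x), idx, list(drop = FALSE)))
}

# Hypothetical helper: add two arrays after broadcasting both.
broadcast_add <- function(x, y) {
  dx <- dim(x)
  dy <- dim(y)
  n <- max(length(dx), length(dy))
  dx <- c(dx, rep(1L, n - length(dx)))
  dy <- c(dy, rep(1L, n - length(dy)))
  common <- pmax(dx, dy)
  broadcast_to(x, common) + broadcast_to(y, common)
}

x <- array(1:6, dim = c(1, 3, 2))   # (1, 3, 2)
y <- matrix(1:3, ncol = 1)          # (3, 1)
res <- broadcast_add(x, y)
print(dim(res))  # 3 3 2
print(res[, , 1])
```

Repeating index 1 along a dimension is what "recycling" amounts to here; a real implementation would avoid materialising the copies.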

  67. Manipulation

  68. rray as a toolkit
    rray_bind()
    rray_duplicate_any()
    rray_expand_dims()
    rray_broadcast()
    rray_flip()
    rray_max()
    rray_sum()
    rray_mean()
    rray_reshape()
    rray_rotate()
    rray_split()
    rray_tile()
    rray_unique()

  69. The best part?
    They all work with base R.

  73. How can we bind these together?
    (the slides show the two arrays as pictures; they are labelled a and b here)
    a: (2, 1)
         [,1]
    [1,]    1
    [2,]    2
    b: (1, 2)
         [,1] [,2]
    [1,]    3    4

  75. cbind(a, b)
    Error:
    number of rows
    of matrices must match

  77. rbind(a, b)
    Error:
    number of columns
    of matrices must match

  78. rray_bind(a, b, axis = 1)
         [,1] [,2]
    [1,]    1    1
    [2,]    2    2
    [3,]    3    4

  79. rray_bind(a, b, axis = 2)
         [,1] [,2] [,3]
    [1,]    1    3    4
    [2,]    2    3    4

  80. rray_bind(a, b, axis = 3)
    , , 1
         [,1] [,2]
    [1,]    1    1
    [2,]    2    2
    , , 2
         [,1] [,2]
    [1,]    3    4
    [2,]    3    4
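
The first two results can be reproduced in base R by recycling the length-1 dimension by hand before binding. This is a sketch of the idea, not of how `rray_bind()` works internally; the names `a` and `b` are assumptions standing in for the column (1, 2) and the row (3, 4) pictured on the slides:

```r
a <- matrix(1:2, ncol = 1)   # (2, 1) column
b <- matrix(3:4, nrow = 1)   # (1, 2) row

# Along the rows: recycle a's single column to 2 columns, then stack
bound_rows <- rbind(a[, c(1, 1)], b)

# Along the columns: recycle b's single row to 2 rows, then stack
bound_cols <- cbind(a, b[c(1, 1), ])

print(bound_rows)
print(bound_cols)
```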

  82. What if we want to “normalize”
    by dividing by the max value?
    Along columns? Along rows?
    x
         [,1] [,2] [,3]
    [1,]    1    3    5
    [2,]    2    4    6

  83. x / max(x)
          [,1]  [,2]  [,3]
    [1,]  .167  .500  .833
    [2,]  .333  .667     1
    sweep(x, 1, apply(x, 1, max), "/")
          [,1]  [,2]  [,3]
    [1,]  .200  .600     1
    [2,]  .333  .667     1
    sweep(x, 2, apply(x, 2, max), "/")
          [,1]  [,2]  [,3]
    [1,]  .500  .750  .833
    [2,]     1     1     1

  84. x / rray_max(x)
          [,1]  [,2]  [,3]
    [1,]  .167  .500  .833
    [2,]  .333  .667     1
    x / rray_max(x, axes = 2)
          [,1]  [,2]  [,3]
    [1,]  .200  .600     1
    [2,]  .333  .667     1
    x / rray_max(x, axes = 1)
          [,1]  [,2]  [,3]
    [1,]  .500  .750  .833
    [2,]     1     1     1

  85. rray_max(x, axes = 1)
         [,1] [,2] [,3]
    [1,]    2    4    6

  87. x / rray_max(x, axes = 1)
    rray_max(x, axes = 1), recycled to (2, 3)
         [,1] [,2] [,3]
    [1,]    2    4    6
    [2,]    2    4    6
    x / rray_max(x, axes = 1)
          [,1]  [,2]  [,3]
    [1,]  .500  .750  .833
    [2,]     1     1     1
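
The division above works because the reduced result keeps its dimensionality, so it can be recycled back against `x`. A base R sketch of that idea, assuming the same `x`; `col_max` and `normalized` are illustrative names, and the manual row recycling stands in for what broadcasting would do automatically:

```r
x <- matrix(1:6, nrow = 2)

# Column maxima kept as a (1, 3) matrix rather than a bare vector,
# mirroring the shape shown for rray_max(x, axes = 1) on the slides
col_max <- matrix(apply(x, 2, max), nrow = 1)

# Recycle the single row to match x, then divide element-wise
normalized <- x / col_max[rep(1, nrow(x)), , drop = FALSE]

# Same values as the sweep() call from the earlier slide
via_sweep <- sweep(x, 2, apply(x, 2, max), "/")

print(round(normalized, 3))
```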

  88. Arrays are:
    1. frustrating to work with.
    2. difficult to program around.
    3. underpowered.

  89. Arrays are:
    1. intuitive to work with.
    2. predictable to program around.
    3. powerful.

  90. GitHub
    https://github.com/DavisVaughan/rray
    Website
    https://davisvaughan.github.io/rray/
