Davis Vaughan
April 18, 2019

# Rethinking Arrays in R

## Transcript

1. Rethinking Arrays in R
Davis Vaughan
@dvaughan32
Software Engineer, RStudio
April 2019

2. Array manipulation in R is
inconsistent and doesn’t follow
natural intuition.

3. Arrays are:
1. frustrating to work with.
2. difficult to program around.
3. underpowered.

4. Subsetting
Manipulation

5. dimensionality:
The number of dimensions in an
array.
dimensions:
The set of lengths describing the
shape of the array.
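In base R both ideas can be read straight off an object: `dim()` gives the dimensions and `length(dim())` gives the dimensionality. A minimal sketch using the 2 x 3 matrix from these slides and a 2 x 3 x 2 array (the array is an assumption that matches the 3D slides later on):

```r
x <- matrix(1:6, nrow = 2)
dim(x)           # dimensions: 2 3
length(dim(x))   # dimensionality: 2

y <- array(1:12, dim = c(2, 3, 2))
dim(y)           # dimensions: 2 3 2
length(dim(y))   # dimensionality: 3
```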

8. dimensionality VS dimensions
x
1 3 5
2 4 6
Number of 1st dim elements (rows): 2
Number of 2nd dim elements (columns): 3
The entire set makes up the dimensions: (2, 3)
The dimensionality is 2 (2D object)

9. Subsetting

10. Enter the matrix.
x
1 3 5
2 4 6

11. Column selection
x
1 3 5
2 4 6
x[, 1:2]
1 3
2 4

12. One column?
x
1 3 5
2 4 6
x[, 1]
?

13. One column?
x
1 3 5
2 4 6
x[, 1]
1 2
(a 1D vector: the column dimension is dropped)

14. Oh! Let me fix that for you…
x
1 3 5
2 4 6
x[, 1, drop = FALSE]
1
2
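The 2D behaviour above can be checked in plain R. A minimal sketch using the same matrix `x`:

```r
x <- matrix(1:6, nrow = 2)

one_col <- x[, 1]                     # drop = TRUE is the default
one_col                               # 1 2
dim(one_col)                          # NULL: a plain vector, no dimensions left

one_col_kept <- x[, 1, drop = FALSE]  # ask R not to drop
dim(one_col_kept)                     # 2 1: still a matrix
```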

15. Let’s go 3D
y
[, , 1]
1 3 5
2 4 6
[, , 2]
7 9 11
8 10 12
y[, 1:2]
?

17. Let’s go 3D
y
[, , 1]
1 3 5
2 4 6
[, , 2]
7 9 11
8 10 12
y[, 1:2]
Error: incorrect number of dimensions
http://gph.is/1kA5eNi

18. Let’s go 3D
y
[, , 1]
1 3 5
2 4 6
[, , 2]
7 9 11
8 10 12
y[, 1:2, ]
[, , 1]
1 3
2 4
[, , 2]
7 9
8 10

19. One column?
y
[, , 1]
1 3 5
2 4 6
[, , 2]
7 9 11
8 10 12
y[, 1, ]
?

20. One column?
y
[, , 1]
1 3 5
2 4 6
[, , 2]
7 9 11
8 10 12
y[, 1, ]
1 7
2 8
(dropped to 2D)

23. Oh! Let me fix that for you…
y
[, , 1]
1 3 5
2 4 6
[, , 2]
7 9 11
8 10 12
y[, 1, , drop = FALSE]
[, , 1]
1
2
[, , 2]
7
8
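The same 3D behaviour can be checked in plain R. A sketch, assuming `y` is `array(1:12, dim = c(2, 3, 2))`, which holds the values shown on these slides:

```r
y <- array(1:12, dim = c(2, 3, 2))

# Two indices on a 3D array is an error, not a column selection
msg <- tryCatch(y[, 1:2], error = function(e) conditionMessage(e))
msg                             # "incorrect number of dimensions"

dim(y[, 1:2, ])                 # 2 2 2: two columns keep all three dimensions
dim(y[, 1, ])                   # 2 2: one column silently drops to 2D
y[, 1, ]                        # rows 1 7 and 2 8
dim(y[, 1, , drop = FALSE])     # 2 1 2: drop = FALSE keeps 3D
```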

24. The confusion?
Subsetting is not
dimensionality-stable.
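One way to see this in base R: with the default `drop = TRUE`, the dimensionality of the result depends on the *value* of the column index, not only on how many indices were written. The `ndim()` helper below is a hypothetical name used just for this sketch; note that base R reports 0 for a plain vector, since it has no `dim` attribute:

```r
ndim <- function(a) length(dim(a))  # plain vectors have dim NULL, so 0

x <- matrix(1:6, nrow = 2)

# Select the first k columns for k = 1, 2, 3 and record the dimensionality
sapply(1:3, function(k) ndim(x[, seq_len(k)]))
#> [1] 0 2 2
```

Code that selects a variable number of columns therefore has to remember `drop = FALSE` every time, or special-case a single column.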

25. Summary: column selection

| How many? | Drops? | 2D | 3D | Proposed |
|---|---|---|---|---|
| 1 | No | `x[, 1, drop = F]` | `x[, 1, , drop = F]` | `x[, 1]` |
| >1 | No | `x[, 1:2]` | `x[, 1:2, ]` | `x[, 1:2]` |
| 1 | Yes | `x[, 1]` | `x[, 1, ]` \* | `extract(x, , 1)` |
| >1 | Yes | `x[, 1:2, drop = T]` | `x[, 1:2, , drop = T]` | `extract(x, , 1:2)` |

\* Drops to 2D
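The "Proposed" column can be approximated in base R with two small helpers. `select_cols()` and `extract_cols()` are hypothetical names used only for this sketch (they are not rray functions): the first never drops a dimension, the second always drops to 1D, whatever the dimensionality of the input.

```r
# Hypothetical helper: select columns (2nd axis) without ever dropping dimensions
select_cols <- function(x, j) {
  d <- dim(x)
  stopifnot(length(d) >= 2)
  idx <- rep(list(TRUE), length(d))  # TRUE keeps every element along an axis
  idx[[2]] <- j                      # ...except the column axis
  do.call(`[`, c(list(x), idx, list(drop = FALSE)))
}

# Hypothetical helper: the same selection, always flattened to 1D
extract_cols <- function(x, j) {
  as.vector(select_cols(x, j))
}

x <- matrix(1:6, nrow = 2)
y <- array(1:12, dim = c(2, 3, 2))

dim(select_cols(x, 1))     # 2 1
dim(select_cols(y, 1))     # 2 1 2
dim(select_cols(y, 1:2))   # 2 2 2
extract_cols(y, 1)         # 1 2 7 8
```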

30. rray

31. rray is designed to provide a
stricter array class.

32. Create an rray
```r
library(rray)
x <- matrix(1:6, nrow = 2)
x
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
x_rray <- as_rray(x)
x_rray
#> <rray<int>[,3][6]>
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
```

33. Column subsetting…round two
x_rray
1 3 5
2 4 6
x_rray[, 1]
1
2
x_rray[, 1:2]
1 3
2 4
y_rray
[, , 1]
1 3 5
2 4 6
[, , 2]
7 9 11
8 10 12
y_rray[, 1]
[, , 1]
1
2
[, , 2]
7
8
y_rray[, 1:2]
[, , 1]
1 3
2 4
[, , 2]
7 9
8 10

34. rray_extract() always drops to 1D
y_rray
[, , 1]
1 3 5
2 4 6
[, , 2]
7 9 11
8 10 12
rray_extract(y_rray, , 1)
1 2 7 8

36. Broadcasting has to do with
increasing dimensionality and
recycling dimensions.

37. Let’s do some math
x: (2, 3)
1 3 5
2 4 6
y: (1)
1
x + y = ?

38. Let’s do some math
x: (2, 3)
1 3 5
2 4 6
y: (1)
1
x + y: (2, 3)
2 4 6
3 5 7

39. How is y reshaped
so that this works?

40. Step 1 - increase dimensionality
x: (2, 3), dimensionality of 2
y: (1), dimensionality of 1
result: (?, ?)
Append 1's to the dimensions of y until its dimensionality matches that of x:
x: (2, 3)
y: (1, 1)
result: (?, ?)

41. Step 1 - increase dimensionality
x: (2, 3)
1 3 5
2 4 6
y: (1) becomes y: (1, 1)
1

42. Step 2 - recycle dimensions
If the rows of y were recycled to length 2, they would match the rows of x.
x: (2, 3)
y: (1, 1)
result: (?, ?)
becomes
x: (2, 3)
y: (2, 1)
result: (2, ?)

43. Step 2 - recycle dimensions
x: (2, 3)
1 3 5
2 4 6
y: (1, 1) becomes y: (2, 1)
1
1

44. Step 2 - recycle dimensions
If the columns of y were recycled to length 3, they would match the columns of x.
x: (2, 3)
y: (2, 1)
result: (2, ?)
becomes
x: (2, 3)
y: (2, 3)
result: (2, 3)

45. Step 2 - recycle dimensions
x: (2, 3)
1 3 5
2 4 6
y: (2, 1) becomes y: (2, 3)
1 1 1
1 1 1

46. Step 2 - recycle dimensions
x: (2, 3)
1 3 5
2 4 6
+
y: (2, 3)
1 1 1
1 1 1
=
(2, 3)
2 4 6
3 5 7
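The two steps only ever look at the dimensions, so the shape of the result can be worked out before touching any data. A base R sketch of that rule; `broadcast_dim()` is a hypothetical helper for illustration, not an rray function:

```r
# Hypothetical helper: result dimensions under the two broadcasting steps
broadcast_dim <- function(dx, dy) {
  n <- max(length(dx), length(dy))
  # Step 1: append 1's until both have the same dimensionality
  dx <- c(dx, rep(1, n - length(dx)))
  dy <- c(dy, rep(1, n - length(dy)))
  # Step 2: on each axis the lengths must match, or one of them must be 1
  if (!all(dx == dy | dx == 1 | dy == 1)) {
    stop("dimensions are not broadcastable")
  }
  # An axis of length 1 is recycled to the other side's length
  ifelse(dx == 1, dy, dx)
}

broadcast_dim(c(2, 3), 1)            # 2 3
broadcast_dim(c(2, 3), c(1, 1))      # 2 3
broadcast_dim(c(1, 3, 2), c(3, 1))   # 3 3 2
```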

47. What if we started here?
x: (2, 3)
1 3 5
2 4 6
y: (1, 1)
1
x + y = ?

49. What if we started here?
x: (2, 3)
1 3 5
2 4 6
y: (1, 1)
1
x + y
Error: non-conformable arrays
https://tenor.com/view/ronswanson-throw-computer-gif-9550833

50. We just got lucky that it worked
with scalars.
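Why the scalar case worked in base R: a bare `1` has no `dim`, so arithmetic falls back to ordinary vector recycling rather than anything like broadcasting. Once the same value is stored as a 1 x 1 matrix, base R requires the dimensions to match exactly. A minimal sketch:

```r
x <- matrix(1:6, nrow = 2)

x + 1              # works: 1 is a plain vector, recycled over all 6 cells
z <- matrix(1)     # the same value, now a (1, 1) matrix
msg <- tryCatch(x + z, error = function(e) conditionMessage(e))
msg                # "non-conformable arrays"
```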

51. We know the result
x: (2, 3)
1 3 5
2 4 6
+
y: (1, 1), recycled to y: (2, 3)
1 1 1
1 1 1
=
(2, 3)
2 4 6
3 5 7

52. Match dimensionality by appending 1’s
Match dimensions by recycling dimensions of length 1

```r
library(rray)
x <- matrix(1:6, nrow = 2)
x
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
z <- matrix(1)
z
#>      [,1]
#> [1,]    1
x + z
#> Error in x + z : non-conformable arrays
as_rray(x) + z
#> <rray<dbl>[,3][6]>
#>      [,1] [,2] [,3]
#> [1,]    2    4    6
#> [2,]    3    5    7
```
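For readers without rray installed, the two rules on this slide can be sketched in base R. This is not how rray (or xtensor) implements broadcasting: `pad_dim()`, `broadcast_to()` and `broadcast_add()` are hypothetical helpers that recycle each length-1 axis by repeating index 1 along it.

```r
# Hypothetical helper: append 1's to dimensions d up to dimensionality n
pad_dim <- function(d, n) c(d, rep(1L, n - length(d)))

# Hypothetical helper: recycle array `a` to `new_dim` (length-1 axes repeat)
broadcast_to <- function(a, new_dim) {
  d <- dim(a)
  if (is.null(d)) d <- length(a)          # treat a plain vector as 1D
  stopifnot(length(d) <= length(new_dim))
  d <- pad_dim(d, length(new_dim))         # rule 1: match dimensionality
  stopifnot(all(d == new_dim | d == 1L))   # rule 2 only recycles length-1 axes
  a <- array(a, dim = d)
  idx <- lapply(seq_along(d), function(k) {
    if (d[k] == 1L) rep(1L, new_dim[k]) else seq_len(new_dim[k])
  })
  do.call(`[`, c(list(a), idx, list(drop = FALSE)))
}

# Hypothetical helper: element-wise addition with broadcasting
broadcast_add <- function(x, y) {
  dx <- if (is.null(dim(x))) length(x) else dim(x)
  dy <- if (is.null(dim(y))) length(y) else dim(y)
  n <- max(length(dx), length(dy))
  dx <- pad_dim(dx, n)
  dy <- pad_dim(dy, n)
  new_dim <- ifelse(dx == 1L, dy, dx)      # a length-1 axis takes the other length
  broadcast_to(x, new_dim) + broadcast_to(y, new_dim)
}

x <- matrix(1:6, nrow = 2)
broadcast_add(x, matrix(1))   # (2, 3): 2 4 6 / 3 5 7
```

This materialises the recycled copies, so it trades memory for simplicity; it is only meant to make the two rules concrete.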

54. How?
All hail our C++ overlords at
QuantStack.
Buy them a beer for creating
xtensor.

58. Let’s go 3D
x: (1, 3, 2)
[, , 1]
1 2 3
[, , 2]
4 5 6
+
y: (3, 1)
1
2
3
= ?
Can you even do that?

59. Step 1 - increase dimensionality
x: (1, 3, 2), dimensionality of 3
y: (3, 1), dimensionality of 2
result: (?, ?, ?)
Append 1's to the dimensions of y until its dimensionality matches that of x:
x: (1, 3, 2)
y: (3, 1, 1)
result: (?, ?, ?)

60. Step 1 - increase dimensionality
x: (1, 3, 2)
[, , 1]
1 2 3
[, , 2]
4 5 6
+
y: (3, 1) becomes y: (3, 1, 1)
1
2
3

61. Step 2 - recycle dimensions
Recycle each dimension of length 1 to the length of the matching dimension of the other object:
x: (1, 3, 2)
y: (3, 1, 1)
result: (?, ?, ?)
becomes
x: (3, 3, 2)
y: (3, 3, 2)
result: (3, 3, 2)

65. Step 2 - recycle dimensions
x: (3, 3, 2)
[, , 1]
1 2 3
1 2 3
1 2 3
[, , 2]
4 5 6
4 5 6
4 5 6
+
y: (3, 3, 2)
[, , 1]
1 1 1
2 2 2
3 3 3
[, , 2]
1 1 1
2 2 2
3 3 3

66. Step 2 - recycle dimensions
x: (3, 3, 2)
[, , 1]
1 2 3
1 2 3
1 2 3
[, , 2]
4 5 6
4 5 6
4 5 6
+
y: (3, 3, 2)
[, , 1]
1 1 1
2 2 2
3 3 3
[, , 2]
1 1 1
2 2 2
3 3 3
=
(3, 3, 2)
[, , 1]
2 3 4
3 4 5
4 5 6
[, , 2]
5 6 7
6 7 8
7 8 9
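In base R this particular 3D result can only be reproduced by recycling both operands to (3, 3, 2) by hand. A sketch, assuming `x` and `y` hold the values shown on these slides:

```r
x <- array(1:6, dim = c(1, 3, 2))   # (1, 3, 2)
y <- matrix(1:3, ncol = 1)          # (3, 1)

# x: repeat its single row 3 times along the 1st axis
x_big <- x[rep(1, 3), , , drop = FALSE]
# y: array() fills column-major, so 1 2 3 runs down every column of every slice
y_big <- array(y, dim = c(3, 3, 2))

res <- x_big + y_big
res[, , 1]   # 2 3 4 / 3 4 5 / 4 5 6
res[, , 2]   # 5 6 7 / 6 7 8 / 7 8 9
```

The manual version only works because we already know the target shape; working it out in general is exactly what the two broadcasting steps do.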

67. Manipulation

68. rray as a toolkit
rray_bind()
rray_duplicate_any()
rray_expand_dims()
rray_flip()
rray_max()
rray_sum()
rray_mean()
rray_reshape()
rray_rotate()
rray_split()
rray_tile()
rray_unique()

69. The best part?
They all work with base R.

73. How can we bind these together?
(2, 1) column:
1
2
(1, 2) row:
3 4

75. cbind( , )
Inputs: the (2, 1) column 1 2 and the (1, 2) row 3 4
Error: number of rows of matrices must match

77. rbind( , )
Inputs: the (2, 1) column 1 2 and the (1, 2) row 3 4
Error: number of columns of matrices must match

78. rray_bind( , , axis = 1)
Inputs: the (2, 1) column 1 2 and the (1, 2) row 3 4
Result (3, 2):
1 1
2 2
3 4

79. rray_bind( , , axis = 2)
Inputs: the (2, 1) column 1 2 and the (1, 2) row 3 4
Result (2, 3):
1 3 4
2 3 4

80. rray_bind( , , axis = 3)
Inputs: the (2, 1) column 1 2 and the (1, 2) row 3 4
Result (2, 2, 2):
[, , 1]
1 1
2 2
[, , 2]
3 4
3 4
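The three results above match what you get by broadcasting the two inputs against each other before binding. A base R sketch that reproduces them by recycling manually; the names `a`, `b`, `a2` and `b2` are ours, and this is not how `rray_bind()` is implemented:

```r
a <- matrix(1:2, ncol = 1)   # (2, 1) column: 1 2
b <- matrix(3:4, nrow = 1)   # (1, 2) row: 3 4

# Recycle the length-1 axis of each input so both become (2, 2)
a2 <- a[, c(1, 1), drop = FALSE]   # 1 1 / 2 2
b2 <- b[c(1, 1), , drop = FALSE]   # 3 4 / 3 4

# axis = 1: only the columns must agree, so recycle a's column, then stack rows
rbind(a2, b)                       # (3, 2): 1 1 / 2 2 / 3 4
# axis = 2: only the rows must agree, so recycle b's row, then join columns
cbind(a, b2)                       # (2, 3): 1 3 4 / 2 3 4
# axis = 3: both must be (2, 2), then they become the two slices
array(c(a2, b2), dim = c(2, 2, 2)) # [, , 1] 1 1 / 2 2 and [, , 2] 3 4 / 3 4
```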

82. What if we want to “normalize”
by dividing by the max value?
Along columns? Along rows?
x
1 3 5
2 4 6

83. x / max(x)
.167 .500 .833
.333 .667 1
sweep(x, 1, apply(x, 1, max), "/")
.200 .600 1
.333 .667 1
sweep(x, 2, apply(x, 2, max), "/")
.500 .750 .833
1 1 1

84. x / rray_max(x)
.167 .500 .833
.333 .667 1
x / rray_max(x, axes = 2)
.200 .600 1
.333 .667 1
x / rray_max(x, axes = 1)
.500 .750 .833
1 1 1

85. rray_max(x, axes = 1)
x: (2, 3)
1 3 5
2 4 6
rray_max(x, axes = 1): (1, 3)
2 4 6

86. x / rray_max(x, axes = 1)
x: (2, 3)
1 3 5
2 4 6
/
rray_max(x, axes = 1): (1, 3), recycled to (2, 3)
2 4 6
2 4 6

87. x / rray_max(x, axes = 1)
x: (2, 3)
1 3 5
2 4 6
/
rray_max(x, axes = 1), recycled to (2, 3)
2 4 6
2 4 6
=
.500 .750 .833
1 1 1
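Base R will not divide a (2, 3) matrix by a (1, 3) one, but the idea behind `rray_max(x, axes = 1)` can be sketched by keeping the column maxima as a one-row matrix and recycling that row by hand. The result agrees with the `sweep()` version from slide 83:

```r
x <- matrix(1:6, nrow = 2)

# Column maxima kept as a (1, 3) matrix: 2 4 6
col_max <- matrix(apply(x, 2, max), nrow = 1)

# Recycle the single row to (2, 3), then divide element-wise
res <- x / col_max[rep(1, nrow(x)), , drop = FALSE]
res   # approx. 0.500 0.750 0.833 / 1 1 1

all.equal(res, sweep(x, 2, apply(x, 2, max), "/"))   # TRUE
```

Keeping the reduced axis with length 1 is what lets the division line up without naming a margin, as `sweep()` requires.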

88. Arrays are:
1. frustrating to work with.
2. difficult to program around.
3. underpowered.

89. Arrays are:
1. intuitive to work with.
2. predictable to program around.
3. powerful.

90. GitHub
https://github.com/DavisVaughan/rray
Website
https://davisvaughan.github.io/rray/