Slide 1

Slide 1 text

Rethinking Arrays in R
Davis Vaughan (@dvaughan32)
Software Engineer, RStudio
April 2019

Slide 2

Slide 2 text

Array manipulation in R is inconsistent and doesn’t follow natural intuition.

Slide 3

Slide 3 text

Arrays are: 1. frustrating to work with. 2. difficult to program around. 3. underpowered.

Slide 4

Slide 4 text

Subsetting, Broadcasting, Manipulation

Slide 5

Slide 5 text

dimensionality: The number of dimensions in an array.
dimensions: The set of lengths describing the shape of the array.

Slide 6

Slide 6 text

dimensionality VS dimensions
x:
1 3 5
2 4 6
(2, 3)
2 = number of 1st dim elements (rows)
3 = number of 2nd dim elements (columns)

Slide 7

Slide 7 text

dimensionality VS dimensions
x:
1 3 5
2 4 6
(2, 3)
2 = number of 1st dim elements (rows)
3 = number of 2nd dim elements (columns)
The entire set makes up the dimensions

Slide 8

Slide 8 text

dimensionality VS dimensions
x:
1 3 5
2 4 6
(2, 3)
2 = number of 1st dim elements (rows)
3 = number of 2nd dim elements (columns)
The entire set makes up the dimensions
The dimensionality is 2 (2D object)
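
The two terms map directly onto base R calls. A minimal sketch, assuming the slide's matrix is matrix(1:6, nrow = 2) (the values used on the later rray slides); the variable names are only illustrative:

```r
# The 2 x 3 matrix from the slide (assumed to be 1:6 filled column-wise)
x <- matrix(1:6, nrow = 2)

# dimensions: the set of lengths describing the shape
dims <- dim(x)                  # 2 3

# dimensionality: how many dimensions there are
dimensionality <- length(dims)  # 2

# A 3D array has three lengths, so its dimensionality is 3
y <- array(1:12, dim = c(2, 3, 2))
y_dimensionality <- length(dim(y))
```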

Slide 9

Slide 9 text

Subsetting

Slide 10

Slide 10 text

Enter the matrix.
x:
1 3 5
2 4 6

Slide 11

Slide 11 text

Column selection
x[, 1:2]
1 3
2 4

Slide 12

Slide 12 text

One column?
x[, 1]
?

Slide 13

Slide 13 text

One column?
x[, 1]
1 2 (a 1D vector)

Slide 14

Slide 14 text

Oh! Let me fix that for you…
x[, 1, drop = FALSE]
1
2
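
The 2D behaviour on the last few slides can be reproduced in base R. A small sketch, again assuming x is matrix(1:6, nrow = 2); the variable names are illustrative:

```r
x <- matrix(1:6, nrow = 2)

two_cols     <- x[, 1:2]               # stays a matrix
one_col      <- x[, 1]                 # silently drops to a plain vector
one_col_kept <- x[, 1, drop = FALSE]   # stays a 2 x 1 matrix

dim(two_cols)       # 2 2
dim(one_col)        # NULL
dim(one_col_kept)   # 2 1
```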

Slide 15

Slide 15 text

Let’s go 3D
y:
, , 1
1 3 5
2 4 6
, , 2
7 9 11
8 10 12
y[, 1:2]
?

Slide 16

Slide 16 text

Let’s go 3D
y[, 1:2]
Error: incorrect number of dimensions

Slide 17

Slide 17 text

Let’s go 3D
y[, 1:2]
Error: incorrect number of dimensions
http://gph.is/1kA5eNi

Slide 18

Slide 18 text

Let’s go 3D
y[, 1:2, ]
, , 1
1 3
2 4
, , 2
7 9
8 10

Slide 19

Slide 19 text

One column?
y[, 1, ]
?

Slide 20

Slide 20 text

One column?
y[, 1, ]
1 7
2 8

Slide 23

Slide 23 text

Oh! Let me fix that for you…
y[, 1, , drop = FALSE]
, , 1
1
2
, , 2
7
8

Slide 24

Slide 24 text

The confusion? Subsetting is not dimensionality-stable.
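
The 3D case can be checked the same way in base R. This sketch assumes y is array(1:12, dim = c(2, 3, 2)), which matches the values shown on the slides:

```r
y <- array(1:12, dim = c(2, 3, 2))

# The 2D syntax no longer works once the array is 3D
res <- tryCatch(y[, 1:2], error = function(e) e)
inherits(res, "error")          # TRUE
conditionMessage(res)           # the "incorrect number of dimensions" message

# The number of commas has to follow the dimensionality
dim(y[, 1:2, ])                 # 2 2 2

# Selecting one column drops from 3D to 2D...
dim(y[, 1, ])                   # 2 2

# ...unless drop = FALSE is added
dim(y[, 1, , drop = FALSE])     # 2 1 2
```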

Slide 25

Slide 25 text

Summary: column selection
How Many?  Drops?  2D                  3D                    Proposed
1          No      x[, 1, drop = F]    x[, 1, , drop = F]    x[, 1]
>1         No      x[, 1:2]            x[, 1:2, ]            x[, 1:2]
1          Yes     x[, 1]              x[, 1, ]*             extract(x, , 1)
>1         Yes     x[, 1:2, drop = T]  x[, 1:2, , drop = T]  extract(x, , 1:2)
* Drops to 2D
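
To make the "Proposed" column concrete, here is a rough base R sketch of a dimensionality-stable column selector. The helper name select_cols() is hypothetical and is not part of rray or of the talk; it only illustrates the idea that one call should work for any dimensionality and never drop.

```r
# Hypothetical helper: select columns (the 2nd dimension) without ever
# dropping, whatever the dimensionality of x (assumes at least 2D).
select_cols <- function(x, j) {
  n <- length(dim(x))
  stopifnot(n >= 2)
  idx <- rep(list(TRUE), n)   # TRUE keeps everything along an axis
  idx[[2]] <- j               # restrict only the column axis
  do.call(`[`, c(list(x), idx, list(drop = FALSE)))
}

x <- matrix(1:6, nrow = 2)
y <- array(1:12, dim = c(2, 3, 2))

dim(select_cols(x, 1))     # 2 1
dim(select_cols(x, 1:2))   # 2 2
dim(select_cols(y, 1))     # 2 1 2
dim(select_cols(y, 1:2))   # 2 2 2
```

The same call shape works in 2D and 3D, which is the property the summary table asks for.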

Slide 30

Slide 30 text

rray

Slide 31

Slide 31 text

rray is designed to provide a stricter array class.

Slide 32

Slide 32 text

Create an rray
library(rray)
x <- matrix(1:6, nrow = 2)
x
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
x_rray <- as_rray(x)
x_rray
#> [,3][6]>
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6

Slide 33

Slide 33 text

Column subsetting…round two
x_rray[, 1]
1
2
x_rray[, 1:2]
1 3
2 4
y_rray[, 1]
, , 1
1
2
, , 2
7
8
y_rray[, 1:2]
, , 1
1 3
2 4
, , 2
7 9
8 10

Slide 34

Slide 34 text

rray_extract() always drops to 1D
rray_extract(y_rray, , 1)
1 2 7 8
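
For comparison, a base R way to get the same 1D result is to subset and then strip the dimensions explicitly. This is only an approximation of the idea using base functions, not rray's implementation:

```r
y <- array(1:12, dim = c(2, 3, 2))

# y[, 1, ] is a 2 x 2 matrix; as.vector() flattens it column by column
flat <- as.vector(y[, 1, ])
flat        # 1 2 7 8
dim(flat)   # NULL, i.e. a plain 1D vector
```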

Slide 35

Slide 35 text

Broadcasting

Slide 36

Slide 36 text

Broadcasting has to do with increasing dimensionality and recycling dimensions.

Slide 37

Slide 37 text

Let’s do some math
x: (2, 3) + y: (1) = ?
x:
1 3 5
2 4 6
y:
1

Slide 38

Slide 38 text

Let’s do some math
x: (2, 3) + y: (1) = (2, 3)
x:
1 3 5
2 4 6
y:
1
result:
2 4 6
3 5 7

Slide 39

Slide 39 text

How is y reshaped so that this works?

Slide 40

Slide 40 text

Step 1 - increase dimensionality
x: (2, 3)   Dimensionality of 2
y: (1)      Dimensionality of 1
---------
   (?, ?)
Append 1’s to the dimensions of y until its dimensionality matches the dimensionality of x
x: (2, 3)
y: (1, 1)
---------
   (?, ?)

Slide 41

Slide 41 text

Step 1 - increase dimensionality
Before: x: (2, 3) + y: (1), where y is the single value 1
After: x: (2, 3) + y: (1, 1), where y is a 1 x 1 matrix holding 1

Slide 42

Slide 42 text

Step 2 - recycle dimensions
If the rows of y were recycled to length 2, it would match the length of the rows of x
x: (2, 3)
y: (1, 1)
---------
   (?, ?)
becomes
x: (2, 3)
y: (2, 1)
---------
   (2, ?)

Slide 43

Slide 43 text

Step 2 - recycle dimensions
Before: y: (1, 1)
1
After: y: (2, 1)
1
1

Slide 44

Slide 44 text

Step 2 - recycle dimensions
If the columns of y were recycled to length 3, it would match the length of the columns of x
x: (2, 3)
y: (2, 1)
---------
   (2, ?)
becomes
x: (2, 3)
y: (2, 3)
---------
   (2, 3)

Slide 45

Slide 45 text

Step 2 - recycle dimensions
Before: y: (2, 1)
1
1
After: y: (2, 3)
1 1 1
1 1 1

Slide 46

Slide 46 text

Step 2 - recycle dimensions
x: (2, 3)
1 3 5
2 4 6
y: (2, 3)
1 1 1
1 1 1
x + y: (2, 3)
2 4 6
3 5 7

Slide 47

Slide 47 text

What if we started here?
x: (2, 3) + y: (1, 1) = ?
x:
1 3 5
2 4 6
y:
1

Slide 48

Slide 48 text

What if we started here?
x: (2, 3) + y: (1, 1)
Error: non-conformable arrays

Slide 49

Slide 49 text

What if we started here?
x: (2, 3) + y: (1, 1)
Error: non-conformable arrays
https://tenor.com/view/ronswanson-throw-computer-gif-9550833

Slide 50

Slide 50 text

R doesn’t broadcast. We just got lucky that it worked with scalars.
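
The difference is easy to reproduce in base R. A short sketch, using the same x as before:

```r
x <- matrix(1:6, nrow = 2)

# A bare scalar has no dimensions, so base R recycles it by length
scalar_sum <- x + 1
scalar_sum              # rows: 2 4 6 / 3 5 7

# A 1 x 1 matrix holds the same value but carries dimensions (1, 1),
# and base R refuses to combine it with a (2, 3) matrix
res <- tryCatch(x + matrix(1), error = function(e) e)
inherits(res, "error")  # TRUE
conditionMessage(res)   # the "non-conformable arrays" message
```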

Slide 51

Slide 51 text

We know the result
x: (2, 3) + y: (1, 1) = (2, 3)
y recycled to (2, 3):
1 1 1
1 1 1
result:
2 4 6
3 5 7

Slide 52

Slide 52 text

Broadcasting rules:
1. Match dimensionality by appending 1’s
2. Match dimensions by recycling dimensions of length 1
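
The two rules can be written down as a small base R sketch. The helper names broadcast_dims() and broadcast_to() are hypothetical, chosen for illustration; this is not how rray or xtensor implement broadcasting, only a plain-R rendering of the rules above.

```r
# Rule 1 + rule 2 applied to shapes only: returns the common dimensions.
broadcast_dims <- function(dx, dy) {
  n <- max(length(dx), length(dy))
  dx <- c(dx, rep(1L, n - length(dx)))   # rule 1: append 1's
  dy <- c(dy, rep(1L, n - length(dy)))
  ok <- dx == dy | dx == 1L | dy == 1L   # rule 2: only length-1 dims recycle
  if (!all(ok)) stop("dimensions cannot be broadcast together")
  ifelse(dx == 1L, dy, dx)
}

# Physically recycle x up to the requested dimensions.
broadcast_to <- function(x, dims) {
  dx <- dim(x)
  if (is.null(dx)) dx <- length(x)
  dx <- c(dx, rep(1L, length(dims) - length(dx)))
  stopifnot(all(dx == dims | dx == 1L))
  dim(x) <- dx
  # Along a recycled axis, repeat index 1; otherwise keep every index
  idx <- Map(function(from, to) if (from == to) seq_len(to) else rep(1L, to),
             dx, dims)
  do.call(`[`, c(list(x), unname(idx), list(drop = FALSE)))
}

x <- matrix(1:6, nrow = 2)   # (2, 3)
z <- matrix(1)               # (1, 1)

out_dims <- broadcast_dims(dim(x), dim(z))
result <- broadcast_to(x, out_dims) + broadcast_to(z, out_dims)
result   # rows: 2 4 6 / 3 5 7
```

Copying the data, as this sketch does, is the simplest way to show the rules; it says nothing about how a real implementation avoids the copies.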

Slide 53

Slide 53 text

rray broadcasts
library(rray)
x <- matrix(1:6, nrow = 2)
x
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
z <- matrix(1)
z
#>      [,1]
#> [1,]    1
x + z
#> Error in x + z : non-conformable arrays
as_rray(x) + z
#> [,3][6]>
#>      [,1] [,2] [,3]
#> [1,]    2    4    6
#> [2,]    3    5    7

Slide 54

Slide 54 text

How? All hail our C++ overlords at QuantStack. Buy them a beer for creating xtensor.

Slide 57

Slide 57 text

Let’s go 3D
x: (1, 3, 2) + y: (3, 1) = ?
x:
, , 1
1 2 3
, , 2
4 5 6
y:
1
2
3

Slide 58

Slide 58 text

Let’s go 3D
x: (1, 3, 2) + y: (3, 1) = ?
x:
, , 1
1 2 3
, , 2
4 5 6
y:
1
2
3
Can you even do that!

Slide 59

Slide 59 text

Step 1 - increase dimensionality
x: (1, 3, 2)   Dimensionality of 3
y: (3, 1)      Dimensionality of 2
------------
   (?, ?, ?)
Append 1’s to the dimensions of y until its dimensionality matches the dimensionality of x
x: (1, 3, 2)
y: (3, 1, 1)
------------
   (?, ?, ?)

Slide 60

Slide 60 text

Step 1 - increase dimensionality
x: (1, 3, 2) + y: (3, 1, 1) = ?
x:
, , 1
1 2 3
, , 2
4 5 6
y:
1
2
3

Slide 61

Slide 61 text

Step 2 - recycle dimensions
x: (1, 3, 2)
y: (3, 1, 1)
------------
   (?, ?, ?)
recycle
x: (3, 3, 2)
y: (3, 3, 2)
------------
   (3, 3, 2)

Slide 65

Slide 65 text

Step 2 - recycle dimensions
x: (3, 3, 2) + y: (3, 3, 2) = ?
x:
, , 1
1 2 3
1 2 3
1 2 3
, , 2
4 5 6
4 5 6
4 5 6
y (both slices):
1 1 1
2 2 2
3 3 3

Slide 66

Slide 66 text

Step 2 - recycle dimensions
x: (3, 3, 2) + y: (3, 3, 2) = (3, 3, 2)
result:
, , 1
2 3 4
3 4 5
4 5 6
, , 2
5 6 7
6 7 8
7 8 9
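
The 3D walkthrough can be replayed by hand in base R by doing the recycling explicitly with repeated indices. The values of x and y below are read off the slide figures and are an assumption of this sketch:

```r
x <- array(1:6, dim = c(1, 3, 2))   # (1, 3, 2)
y <- matrix(1:3, ncol = 1)          # (3, 1)

# Step 1: append a 1 so y has dimensionality 3
y3 <- array(y, dim = c(3, 1, 1))    # (3, 1, 1)

# Step 2: recycle every dimension of length 1
x_b <- x[rep(1, 3), , , drop = FALSE]             # (3, 3, 2)
y_b <- y3[, rep(1, 3), rep(1, 2), drop = FALSE]   # (3, 3, 2)

res <- x_b + y_b
dim(res)      # 3 3 2
res[, , 1]    # rows: 2 3 4 / 3 4 5 / 4 5 6
res[, , 2]    # rows: 5 6 7 / 6 7 8 / 7 8 9
```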

Slide 67

Slide 67 text

Manipulation

Slide 68

Slide 68 text

rray as a toolkit: rray_bind(), rray_duplicate_any(), rray_expand_dims(), rray_broadcast(), rray_flip(), rray_max(), rray_sum(), rray_mean(), rray_reshape(), rray_rotate(), rray_split(), rray_tile(), rray_unique(), …

Slide 69

Slide 69 text

The best part? They all work with base R.

Slide 71

Slide 71 text

rray as a toolkit: rray_bind(), rray_duplicate_any(), rray_expand_dims(), rray_broadcast(), rray_flip(), rray_max(), rray_sum(), rray_mean(), rray_reshape(), rray_rotate(), rray_split(), rray_tile(), rray_unique(), …

Slide 73

Slide 73 text

How can we bind these together?
A 2 x 1 column:
1
2
A 1 x 2 row:
3 4

Slide 74

Slide 74 text

cbind(<column 1 2>, <row 3 4>)

Slide 75

Slide 75 text

cbind(<column 1 2>, <row 3 4>)
Error: number of rows of matrices must match

Slide 76

Slide 76 text

rbind(<column 1 2>, <row 3 4>)

Slide 77

Slide 77 text

rbind(<column 1 2>, <row 3 4>)
Error: number of columns of matrices must match

Slide 78

Slide 78 text

rray_bind(<column 1 2>, <row 3 4>, axis = 1)
1 1
2 2
3 4

Slide 79

Slide 79 text

rray_bind(<column 1 2>, <row 3 4>, axis = 2)
1 3 4
2 3 4

Slide 80

Slide 80 text

rray_bind(<column 1 2>, <row 3 4>, axis = 3)
, , 1
1 1
2 2
, , 2
3 4
3 4
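
What binding with broadcasting amounts to can be imitated in base R by recycling the inputs by hand before calling rbind() or cbind(). This sketch assumes the two pictured inputs are a 2 x 1 column holding 1, 2 and a 1 x 2 row holding 3, 4, which is an inference from the slide figures; the names a and b are illustrative.

```r
a <- matrix(1:2, ncol = 1)   # 2 x 1 column
b <- matrix(3:4, nrow = 1)   # 1 x 2 row

# Recycle the length-1 dimension of each input by repeating index 1
a_wide <- a[, rep(1, ncol(b)), drop = FALSE]   # 2 x 2, rows: 1 1 / 2 2
b_tall <- b[rep(1, nrow(a)), , drop = FALSE]   # 2 x 2, rows: 3 4 / 3 4

along_1 <- rbind(a_wide, b)                              # 3 x 2
along_2 <- cbind(a, b_tall)                              # 2 x 3
along_3 <- array(c(a_wide, b_tall), dim = c(2, 2, 2))    # 2 x 2 x 2
```

Only the dimensions that are not being bound along need to agree, which is why a different input is recycled in each case.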

Slide 81

Slide 81 text

rray as a toolkit: rray_bind(), rray_duplicate_any(), rray_expand_dims(), rray_broadcast(), rray_flip(), rray_max(), rray_sum(), rray_mean(), rray_reshape(), rray_rotate(), rray_split(), rray_tile(), rray_unique(), …

Slide 82

Slide 82 text

What if we want to “normalize” by dividing by the max value? Along columns? Along rows?
x:
1 3 5
2 4 6

Slide 83

Slide 83 text

x / max(x)
.167 .500 .833
.333 .667 1
sweep(x, 1, apply(x, 1, max), "/")
.200 .600 1
.333 .667 1
sweep(x, 2, apply(x, 2, max), "/")
.500 .750 .833
1 1 1
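
The three base R calls on this slide run as written once x is defined. A short sketch, assuming x is matrix(1:6, nrow = 2); the result names are illustrative:

```r
x <- matrix(1:6, nrow = 2)

# Divide by the single overall maximum (6)
overall <- x / max(x)

# Divide each row by its own maximum (5 and 6)
by_row <- sweep(x, 1, apply(x, 1, max), "/")

# Divide each column by its own maximum (2, 4 and 6)
by_col <- sweep(x, 2, apply(x, 2, max), "/")

round(by_row, 3)   # rows: 0.2 0.6 1 / 0.333 0.667 1
round(by_col, 3)   # rows: 0.5 0.75 0.833 / 1 1 1
```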

Slide 84

Slide 84 text

x / rray_max(x)
.167 .500 .833
.333 .667 1
x / rray_max(x, axes = 2)
.200 .600 1
.333 .667 1
x / rray_max(x, axes = 1)
.500 .750 .833
1 1 1

Slide 85

Slide 85 text

rray_max(x, axes = 1)
x:
1 3 5
2 4 6
result (1 x 3):
2 4 6

Slide 86

Slide 86 text

x / rray_max(x, axes = 1)
rray_max(x, axes = 1) is the 1 x 3 result 2 4 6, which is broadcast to
2 4 6
2 4 6
so the division is
1 3 5     2 4 6
2 4 6  /  2 4 6

Slide 87

Slide 87 text

x / rray_max(x, axes = 1)
rray_max(x, axes = 1) is the 1 x 3 result 2 4 6, which is broadcast to
2 4 6
2 4 6
so the division is
1 3 5     2 4 6     .500 .750 .833
2 4 6  /  2 4 6  =  1    1    1
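
The same computation can be imitated in base R by keeping the reduced dimension and recycling it by hand. This sketch uses only base functions; col_max stands in for the 1 x 3 result shown for rray_max(x, axes = 1) and is an illustrative name:

```r
x <- matrix(1:6, nrow = 2)

# Reduce over the rows but keep a 1 x 3 shape, like the slide's result
col_max <- matrix(apply(x, 2, max), nrow = 1)              # 2 4 6

# Recycle the single row so the shapes match, then divide
col_max_b <- col_max[rep(1, nrow(x)), , drop = FALSE]      # 2 x 3
normalized <- x / col_max_b

round(normalized, 3)   # rows: 0.5 0.75 0.833 / 1 1 1
```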

Slide 88

Slide 88 text

Arrays are: 1. frustrating to work with. 2. difficult to program around. 3. underpowered.

Slide 89

Slide 89 text

Arrays are: 1. intuitive to work with. 2. predictable to program around. 3. powerful.

Slide 90

Slide 90 text

GitHub: https://github.com/DavisVaughan/rray
Website: https://davisvaughan.github.io/rray/