useR-2019-rray.pdf

Davis Vaughan
July 05, 2019

Transcript

  1. Rethinking Arrays in R
    Davis Vaughan
    @dvaughan32
    Software Engineer, RStudio
    July 2019

  2. [Diagram: a bag's contents counted by filling and color]
              green  red  blue
    peanut       15    8    12
    plain        10    6     9

  3. This is a (2, 3) matrix
    [Diagram: counts for 2 fillings (peanut, plain) by 3 colors (green, red, blue)]

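A matrix like the one on this slide can be built in base R as follows. This is a minimal sketch: the counts and names come from the slides, but the deck does not show how `bag` was created, so the `matrix()` call is an assumption.

```r
# Counts from the slides, filled column by column (green, red, blue).
# How the speaker actually built `bag` is not shown; this is one plausible way.
bag <- matrix(
  c(15, 10, 8, 6, 12, 9),
  nrow = 2,
  dimnames = list(c("peanut", "plain"), c("green", "red", "blue"))
)

dim(bag)  # number of rows (fillings), then number of columns (colors)
bag
```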

  5. [Diagram: the (2, 3) matrix of counts + one extra row (green 3, red 1, blue 2) = ...]

  6. Error: non-conformable arrays
    [Diagram: base R refuses to add the extra row (green 3, red 1, blue 2) to the (2, 3) matrix]

  7. [Diagram: the intended result, with the extra row (green 3, red 1, blue 2) added to both rows]
              green  red  blue
    peanut       18    9    14
    plain        13    7    11

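The failure and the intended result can be reproduced in base R. The sketch below assumes `bag` and `extra` are ordinary matrices built with `matrix()` (the deck does not show their construction), and the row-repeating workaround is just one manual way to get the intended sum.

```r
bag <- matrix(
  c(15, 10, 8, 6, 12, 9),
  nrow = 2,
  dimnames = list(c("peanut", "plain"), c("green", "red", "blue"))
)
extra <- matrix(
  c(3, 1, 2),
  nrow = 1,
  dimnames = list(NULL, c("green", "red", "blue"))
)

# Base R stops here because the shapes (2, 3) and (1, 3) differ.
# The error message is captured as a string instead of halting the script.
msg <- tryCatch(bag + extra, error = function(e) conditionMessage(e))
msg

# Manual workaround: repeat the single row of `extra` once per row of `bag`.
extra_rows <- extra[rep(1, nrow(bag)), , drop = FALSE]
total <- bag + extra_rows
total
```

Repeating the row by hand is exactly the step that broadcasting automates.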

  8. Subsetting
    Broadcasting
    Manipulation

  9. Subsetting

  10. bag[, 1:2]
    [Diagram: selecting the first two columns of bag returns a (2, 2) matrix]
              green  red
    peanut       15    8
    plain        10    6

  11. bag[, 1:2] and bag[, 1]
    [Diagram: bag[, 1:2] returns a (2, 2) matrix; the result of bag[, 1] is shown as "?"]

  12. bag[, 1]
    [Diagram: unlike bag[, 1:2], selecting a single column returns a named vector, not a matrix]
    peanut  plain
        15     10

  13. bag[, 1, drop = FALSE]
    [Diagram: with drop = FALSE, selecting a single column returns a (2, 1) matrix]
              green
    peanut       15
    plain        10

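The three subsetting calls from these slides can be compared directly in base R. A minimal sketch, assuming `bag` is a plain numeric matrix with the counts shown on the slides:

```r
bag <- matrix(
  c(15, 10, 8, 6, 12, 9),
  nrow = 2,
  dimnames = list(c("peanut", "plain"), c("green", "red", "blue"))
)

two_cols     <- bag[, 1:2]              # two columns: still a matrix
one_col      <- bag[, 1]                # one column: dimensions are dropped
one_col_kept <- bag[, 1, drop = FALSE]  # one column, dimensions kept

dim(two_cols)
dim(one_col)        # NULL, because the result is a named vector
names(one_col)
dim(one_col_kept)
```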

  14. bags[, 1, , drop = FALSE]
    [Diagram: bags holds two bags, bag1 and bag2; the subset keeps the green column of each]
                  green
    bag1  peanut     15
          plain      10
    bag2  peanut     13
          plain       3

  15. bags
    This is a (2, 3, 2) 3D array
    [Diagram: bag1 and bag2 stacked, each with 2 fillings by 3 colors]

  17. bags[, 1, , drop = FALSE] and bag[, 1, drop = FALSE]
    [Diagram: the 3D subset of bags shown next to the 2D subset of bag]

  18. The confusion?
    Subsetting is not
    dimensionality-stable.

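To make "not dimensionality-stable" concrete, the sketch below subsets a 3D array in several ways and records the dimensions of each result. It assumes `bags` is a base R array built with `array()`. Only the green column of bag2 (13 and 3) is readable from the slides; where its red and blue counts sit is an assumption made for this example.

```r
bag1 <- matrix(c(15, 10, 8, 6, 12, 9), nrow = 2)
# Assumption: only bag2's green column (13, 3) is certain from the slides;
# the placement of its red and blue counts is a guess.
bag2 <- matrix(c(13, 3, 6, 5, 15, 11), nrow = 2)

bags <- array(
  c(bag1, bag2),
  dim = c(2, 3, 2),
  dimnames = list(
    c("peanut", "plain"),
    c("green", "red", "blue"),
    c("bag1", "bag2")
  )
)

# The number of dimensions in the result depends on what was selected.
dims <- list(
  two_columns      = dim(bags[, 1:2, ]),               # stays 3D
  one_column       = dim(bags[, 1, ]),                 # silently becomes 2D
  one_column_kept  = dim(bags[, 1, , drop = FALSE]),   # 3D only on request
  one_cell_per_bag = dim(bags[1, 1, ])                 # becomes a vector
)
str(dims)
```

Code that consumes the result has to handle every one of these shapes unless `drop = FALSE` is passed each time.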

  19. rray

  20. rray is designed to provide a
    stricter array class.

  21. Create an rray
    library(rray)
    bag
    #>        green red blue
    #> peanut    15   8   12
    #> plain     10   6    9
    bag_rray <- as_rray(bag)
    bag_rray
    #> [,3][2]>
    #>        green red blue
    #> peanut    15   8   12
    #> plain     10   6    9

  22. bag_rray[, 1], bag_rray[, 1:2], bags_rray[, 1], bags_rray[, 1:2]
    [Diagram: subsets of the 2D bag_rray stay 2D and subsets of the 3D bags_rray stay 3D, whether one column or two is selected]

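The idea of a stricter class can be imitated in a few lines of base R. The sketch below is not how rray is implemented: `strict_array()` and its `[` method are hypothetical names invented for this illustration, and the only behaviour they copy is that subsetting never drops dimensions.

```r
# Hypothetical constructor: tags a base R array with an illustrative class.
strict_array <- function(x) {
  stopifnot(is.array(x))
  structure(x, class = "strict_array")
}

# Hypothetical `[` method: forwards the indices to base R, always with
# drop = FALSE, so the number of dimensions never changes.
`[.strict_array` <- function(x, ...) {
  out <- unclass(x)[..., drop = FALSE]
  structure(out, class = "strict_array")
}

bag <- matrix(
  c(15, 10, 8, 6, 12, 9),
  nrow = 2,
  dimnames = list(c("peanut", "plain"), c("green", "red", "blue"))
)
bag_strict <- strict_array(bag)

dim(bag_strict[, 1])    # one column: still two dimensions
dim(bag_strict[, 1:2])  # two columns: still two dimensions
```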

  23. Broadcasting

  24. Broadcasting has to do with
    1) increasing dimensionality
    2) recycling dimensions

  25. bags (2, 3, 2) + extra (1, 3) = Error: non-conformable arrays
    [Diagram: the two stacked bags plus the extra row (green 3, red 1, blue 2)]

  26. bags (2, 3, 2) + extra (2, 3, 2) = (2, 3, 2)
    [Diagram: the extra row (green 3, red 1, blue 2) repeated for both fillings and both bags, then added to bags]

  27. How is extra reshaped
    so that this works?

  28.          # row  # col  # frame
    bags           2      3        2
    extra          1      3
                   ?      ?        ?

  30.          # row  # col  # frame
    bags           2      3        2
    extra          1      3        1
                   ?      ?        ?

  31. bags (2, 3, 2) + extra (1, 3) becomes bags (2, 3, 2) + extra (1, 3, 1)
    [Diagram: extra gains a third dimension of size 1]

  33.          # row  # col  # frame
    bags           2      3        2
    extra     1 -> 2      3        1
                   2      ?        ?

  34. bags (2, 3, 2) + extra (1, 3, 1) becomes bags (2, 3, 2) + extra (2, 3, 1)
    [Diagram: the single row of extra is repeated to give 2 rows]

  36.          # row  # col  # frame
    bags           2      3        2
    extra     1 -> 2      3        1
                   2      3        ?

  37.          # row  # col  # frame
    bags           2      3        2
    extra     1 -> 2      3   1 -> 2
                   2      3        2

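The reshaping steps in the tables above can be carried out by hand in base R. This is a sketch of the idea only, not what rray does internally. As before, bag2's green column (13, 3) comes from the slides, while the placement of its red and blue counts is an assumption.

```r
bag1 <- matrix(c(15, 10, 8, 6, 12, 9), nrow = 2)
# Assumption: only bag2's green column (13, 3) is certain from the slides.
bag2 <- matrix(c(13, 3, 6, 5, 15, 11), nrow = 2)
bags <- array(
  c(bag1, bag2),
  dim = c(2, 3, 2),
  dimnames = list(
    c("peanut", "plain"),
    c("green", "red", "blue"),
    c("bag1", "bag2")
  )
)
extra <- matrix(c(3, 1, 2), nrow = 1)  # green 3, red 1, blue 2

# Step 1: match dimensionality by appending a 1: (1, 3) -> (1, 3, 1).
extra_3d <- array(extra, dim = c(dim(extra), 1))

# Step 2: recycle the rows by repeating index 1: (1, 3, 1) -> (2, 3, 1).
extra_rows <- extra_3d[rep(1, 2), , , drop = FALSE]

# Step 3: recycle the frames the same way: (2, 3, 1) -> (2, 3, 2).
extra_full <- extra_rows[, , rep(1, 2), drop = FALSE]

# The shapes now agree, so ordinary elementwise addition works.
result <- bags + extra_full
dim(result)
result[, , "bag1"]
```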

  38. bags (2, 3, 2) + extra (2, 3, 2) = (2, 3, 2)
    [Diagram: extra, now recycled to 2 rows and 2 frames, is added to bags element by element]

  40. Broadcasting rules:
    Match dimensionality by appending 1’s
    Match dimensions by recycling them

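The two rules can be written as a small function that works on dimension vectors alone. `common_dim()` is a hypothetical helper written for this illustration; it is not part of rray or xtensor, and the error message is made up.

```r
# Hypothetical helper: the shape two arrays would broadcast to.
common_dim <- function(x_dim, y_dim) {
  n <- max(length(x_dim), length(y_dim))

  # Rule 1: match dimensionality by appending 1's to the shorter shape.
  x_dim <- c(x_dim, rep(1L, n - length(x_dim)))
  y_dim <- c(y_dim, rep(1L, n - length(y_dim)))

  # Rule 2: an axis can be recycled only if the sizes agree or one is 1.
  ok <- x_dim == y_dim | x_dim == 1L | y_dim == 1L
  if (!all(ok)) {
    stop("dimensions cannot be broadcast together")
  }
  pmax(x_dim, y_dim)
}

common_dim(c(2, 3, 2), c(1, 3))  # bags and extra from the slides
```

A pair such as `c(2, 3)` and `c(3, 3)` fails the check, because a dimension of size 2 cannot be recycled to size 3.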

  41. rray broadcasts
    bag
    #>        green red blue
    #> peanut    15   8   12
    #> plain     10   6    9
    extra
    #>      green red blue
    #> [1,]     3   1    2
    bag + extra
    #> Error in bag + extra: non-conformable arrays
    bag_rray + extra
    #> [,3][2]>
    #>        green red blue
    #> peanut    18   9   14
    #> plain     13   7   11

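For comparison, the same sum can be reproduced without rray by recycling size-1 dimensions explicitly. The helpers `pad_dim()`, `broadcast_to()` and `broadcast_add()` below are hypothetical names for this base R sketch. They ignore dimension names except for copying them back from the first argument when its shape is unchanged, which is simpler than what a real implementation would need to do.

```r
# Hypothetical helper: append 1's until the shape has n dimensions.
pad_dim <- function(d, n) c(d, rep(1L, n - length(d)))

# Hypothetical helper: recycle size-1 dimensions of x up to `to_dim`.
broadcast_to <- function(x, to_dim) {
  from_dim <- pad_dim(dim(x), length(to_dim))
  x <- array(x, dim = from_dim)  # note: this discards dimnames
  idx <- Map(function(from, to) {
    if (from == to) {
      seq_len(to)        # axis already has the right size
    } else if (from == 1L) {
      rep(1L, to)        # repeat the only slice along this axis
    } else {
      stop("dimensions cannot be broadcast together")
    }
  }, from_dim, to_dim)
  do.call(`[`, c(list(x), idx, list(drop = FALSE)))
}

# Hypothetical helper: elementwise addition after broadcasting both inputs.
broadcast_add <- function(x, y) {
  n <- max(length(dim(x)), length(dim(y)))
  to_dim <- pmax(pad_dim(dim(x), n), pad_dim(dim(y), n))
  out <- broadcast_to(x, to_dim) + broadcast_to(y, to_dim)
  # Keep the names of x when its shape did not change.
  if (identical(dim(x), dim(out))) dimnames(out) <- dimnames(x)
  out
}

bag <- matrix(
  c(15, 10, 8, 6, 12, 9),
  nrow = 2,
  dimnames = list(c("peanut", "plain"), c("green", "red", "blue"))
)
extra <- matrix(c(3, 1, 2), nrow = 1)

total <- broadcast_add(bag, extra)
total
```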

  42. Manipulation

  43. rray as a toolkit
    rray_bind()
    rray_duplicate_any()
    rray_expand_dims()
    rray_broadcast()
    rray_flip()
    rray_max()
    rray_sum()
    rray_mean()
    rray_reshape()
    rray_rotate()
    rray_split()
    rray_tile()
    rray_unique()
    ...

  44. The best part?
    They all work with base R.

  48. Compute proportions
    1) Overall
    2) By filling
    3) By color
    [Diagram: the (2, 3) matrix of counts]

  49. Overall: bag / sum(bag)
                green   red  blue
      peanut     0.25  0.13  0.20
      plain      0.17  0.10  0.15
    By Filling: sweep(bag, 1, apply(bag, 1, sum), "/")
                green   red  blue
      peanut     0.43  0.23  0.34
      plain      0.40  0.24  0.36
    By Color: sweep(bag, 2, apply(bag, 2, sum), "/")
                green   red  blue
      peanut     0.60  0.57  0.57
      plain      0.40  0.43  0.43

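The three base R computations on this slide run as written once `bag` exists. The sketch below assumes `bag` is the numeric matrix of counts from earlier slides and cross-checks the `sweep()` calls against `prop.table()`, a base R shortcut that is not mentioned in the deck.

```r
bag <- matrix(
  c(15, 10, 8, 6, 12, 9),
  nrow = 2,
  dimnames = list(c("peanut", "plain"), c("green", "red", "blue"))
)

overall    <- bag / sum(bag)
by_filling <- sweep(bag, 1, apply(bag, 1, sum), "/")  # divide each row by its total
by_color   <- sweep(bag, 2, apply(bag, 2, sum), "/")  # divide each column by its total

round(overall, 2)
round(by_filling, 2)
round(by_color, 2)

# prop.table() performs the same row-wise and column-wise division.
same_rows <- all.equal(by_filling, prop.table(bag, 1))
same_cols <- all.equal(by_color, prop.table(bag, 2))
```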

  50. Overall: bag / rray_sum(bag)
    By Filling: bag / rray_sum(bag, axes = 2)
    By Color: bag / rray_sum(bag, axes = 1)
    [Diagram: the counts and the three resulting proportion matrices, with the same values as the sweep() versions]

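One reason the shorter form is attractive: in base R, dividing a matrix by a plain vector of column totals recycles the vector down the columns and silently gives the wrong answer. The sketch below shows that pitfall and a base R imitation of dividing by totals that keep a (1, 3) shape. That `rray_sum()` returns such a shape is an inference from the slide rather than something it states, and `col_totals` is a name made up for this example.

```r
bag <- matrix(
  c(15, 10, 8, 6, 12, 9),
  nrow = 2,
  dimnames = list(c("peanut", "plain"), c("green", "red", "blue"))
)

# Pitfall: colSums() returns a bare vector, which R recycles in
# column-major order, so cells are divided by the wrong totals.
wrong <- bag / colSums(bag)

# Imitation of a sum that keeps its dimensions: a (1, 3) matrix of totals,
# expanded to (2, 3) by repeating its single row before dividing.
col_totals <- matrix(colSums(bag), nrow = 1)
right <- bag / col_totals[rep(1, nrow(bag)), , drop = FALSE]

reference <- sweep(bag, 2, apply(bag, 2, sum), "/")
isTRUE(all.equal(right, reference))  # matches the sweep() version
isTRUE(all.equal(wrong, reference))  # does not match
```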

  51. In conclusion...

  52. 1) Stricter rray class
    2) Broadcasting
    3) Toolkit

  53. GitHub: https://github.com/r-lib/rray
    Website: https://rray.r-lib.org
    Powered by: xtensor (https://github.com/QuantStack/xtensor)
    Questions?
