Slide 1

Slide 1 text

BeginneR Session - Data Pipeline - #75 Tokyo.R 2019.01.19 @kilometer00

Slide 2

Slide 2 text

Who!?

Slide 3

Slide 3 text

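Who!?
Name: Mimura (@kilometer)
Occupation: Postdoc (Ph.D. in Engineering)
Specialties: behavioral neuroscience (primates), brain imaging, medical systems engineering
R experience: about 10 years
Current craze: a stove with a built-in grill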

Slide 4

Slide 4 text

BeginneR Session

Slide 5

Slide 5 text

BeginneR

Slide 6

Slide 6 text

BeginneR Advanced Hoxo_m If I have seen further it is by standing on the shoulders of Giants. -- Sir Isaac Newton, 1676

Slide 7

Slide 7 text

Before After BeginneR Session BeginneR BeginneR

Slide 8

Slide 8 text

BeginneR Session - Data Pipeline -

Slide 9

Slide 9 text

Input Output Data Pipeline

Slide 10

Slide 10 text

packages you

Slide 11

Slide 11 text

Input Output packages Data Pipeline

Slide 12

Slide 12 text

Output Input Input Data Pipeline

Slide 13

Slide 13 text

Output Input Input Data Pipeline

Slide 14

Slide 14 text

Output Input Input Data Pipeline

Slide 15

Slide 15 text

Data Pipeline

Slide 16

Slide 16 text

Data Pipeline readable coding

Slide 17

Slide 17 text

Programming Write Run Read Think

Slide 18

Slide 18 text

Run!!! https://www.amazon.co.jp/dp/B00Y0UI990/

Slide 19

Slide 19 text

Programming Write Run Read Think

Slide 20

Slide 20 text

Programming Write Run Read Think coding style

Slide 21

Slide 21 text

R coding style guides
The tidyverse style guide https://style.tidyverse.org/
"Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread."
Google's R Style Guide https://google.github.io/styleguide/Rguide.html
"The goal of the R Programming Style Guide is to make our R code easier to read, share, and verify."

Slide 22

Slide 22 text

The tidyverse style guides https://style.tidyverse.org/syntax.html#object-names

Slide 23

Slide 23 text

The tidyverse style guides https://style.tidyverse.org/syntax.html#object-names

Slide 24

Slide 24 text

> data
function(..., list = character(), package = NULL, lib.loc = NULL,
         verbose = getOption("verbose"), envir = .GlobalEnv)
{
    fileExt <- function(x) {
        db <- grepl("\\.[^.]+\\.(gz|bz2|xz)$", x)
        ans <- sub(".*\\.", "", x)
...

"Where possible, avoid re-using names of common functions and variables. This will cause confusion for the readers of your code."

# Good
df <- read.csv("hoge.csv")
dat <- read.csv("hoge.csv")

# Bad
data <- read.csv("hoge.csv")
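The naming advice above can be checked directly in a fresh R session. The following is a minimal sketch, assuming the default packages are attached so that `data` refers to `utils::data`; the variable names are only illustrative.

```r
# `data` is a function that ships with R (utils::data).
is_function_before <- is.function(data)    # TRUE in a fresh session

# Assigning to `data` creates a variable that hides the function
# whenever `data` is used as a value.
data <- data.frame(x = 1:3)
is_function_after <- is.function(data)     # FALSE: `data` is now a data.frame

# Removing the variable makes the original function visible again.
rm(data)
is_function_restored <- is.function(data)  # TRUE again
```

Nothing breaks here, which is the point of the style rule: the code still runs, but a reader can no longer tell at a glance what `data` means.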

Slide 25

Slide 25 text

# Bad
for(i in 1:10){
print(i)
}

# Good
for(i in 1:10){
  print(i)
}

copy (cut) & paste
Auto-indentation (in RStudio)
Details: RStudio > Preferences > Code > Editing

Slide 26

Slide 26 text

R coding style guides
The tidyverse style guide https://style.tidyverse.org/
"Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread."
Google's R Style Guide https://google.github.io/styleguide/Rguide.html
"The goal of the R Programming Style Guide is to make our R code easier to read, share, and verify."

Slide 27

Slide 27 text

Programming Write Run Read Think Write Run Read Think Share

Slide 28

Slide 28 text

Text Figure Information Intention Data decode encode feedback Programming

Slide 29

Slide 29 text

Boolean operators (Boolean algebra)
A == B    # equal to
A != B    # not equal to
A | B     # or
A & B     # and
A %in% B  # is A in B?
George Boole, 1815 - 1864 (image: Wikipedia)

Slide 30

Slide 30 text

Boolean operators (Boolean algebra)
> "a" != "b"     # not equal to
[1] TRUE
> 1 %in% 10:100  # is A in B?
[1] FALSE
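These operators work element by element on vectors, which is what makes them useful for selecting rows later on. A small sketch with a made-up vector `x`:

```r
x <- c(1, 5, 50, 100)

in_range <- x %in% 10:100      # is each element one of 10, 11, ..., 100?
not_five <- x != 5             # element-wise "not equal to"
both     <- x > 1 & x < 100    # element-wise AND
either   <- x == 1 | x == 100  # element-wise OR
```

Each result is a logical vector with one TRUE or FALSE per element of `x`.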

Slide 31

Slide 31 text

George Boole 1815 - 1864 Mathematician & Philosopher A Class-Room Introduction to Logic https://niyamaklogic.wordpress.com/category/laws-of-thoughts/

Slide 32

Slide 32 text

Boolean operators (Boolean algebra)
A == B    # equal to
A != B    # not equal to
A | B     # or
A & B     # and
A %in% B  # is A in B?
George Boole, 1815 - 1864 (image: Wikipedia)

Slide 33

Slide 33 text

Programming

Slide 34

Slide 34 text

Programming

Slide 35

Slide 35 text

Programming Write Run Read Think Write Run Read Think Communicate Share

Slide 36

Slide 36 text

Programming Write Run Read Think Write Run Read Think Communicate Share

Slide 37

Slide 37 text

Programming Write Run Read Think Write Run Read Think Communicate Share

Slide 38

Slide 38 text

Input Output packages Data Pipeline

Slide 39

Slide 39 text

Integrated Development Environment RStudio https://www.rstudio.com/

Slide 40

Slide 40 text

Integrated Development Environment RStudio https://www.rstudio.com/

Slide 41

Slide 41 text

RStudio

Slide 42

Slide 42 text

Projects RStudio

Slide 43

Slide 43 text

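RStudio > Project
"Among the advantages of RStudio, which by some accounts number 2147483647, the facts that it
- can show R and other source files side by side in tabs,
- keeps the order of those tabs, and
- retains the half-edited contents of a tab even if you quit RStudio without saving the file
have, probably beyond dispute, greatly improved the QOL of all 2147483647 R users nationwide."
"What is RStudio...? What is a Project...????" @wakuteka https://qiita.com/wakuteka/items/9599bb0a8985d98928d7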

Slide 44

Slide 44 text

File > New Project… > New Directory > New Project hogehoge

Slide 45

Slide 45 text

hogehoge ~/Documents/R hogehoge.Rproj .Rproj.user Project Root Directory Double click!! .RData .Rhistory Auto saved project information Open project New!!

Slide 46

Slide 46 text

~/Documents/R project1 project2 project3

Slide 47

Slide 47 text

Agenda
0. Introduction (done)
1. data.frame
2. Tidy data
3. Pipe
4. Verbs

Slide 48

Slide 48 text

vector in Excel

Slide 49

Slide 49 text

vector in R in Excel pre <- c(1, 2, 3, 4, 5) post <- pre * 5 > pre [1] 1 2 3 4 5 > post [1] 5 10 15 20 25

Slide 50

Slide 50 text

vector vec1 <- c(1, 2, 3, 4, 5) vec2 <- 1:5 vec3 <- seq(from = 1, to = 5, by = 1) > vec1 [1] 1 2 3 4 5 > vec2 [1] 1 2 3 4 5 > vec3 [1] 1 2 3 4 5

Slide 51

Slide 51 text

vector vec1 <- seq(from = 1, to = 5, by = 1) vec2 <- seq(1, 5, 1) > vec1 [1] 1 2 3 4 5 > vec2 [1] 1 2 3 4 5

Slide 52

Slide 52 text

> ?seq vector seq{base} Sequence Generation Description Generate regular sequences. seq is a standard generic with a default method. … Usage seq(...) ## Default S3 method: seq(from = 1, to = 1, by = ((to - from)/(length.out - 1)), length.out = NULL, along.with = NULL, ...)

Slide 53

Slide 53 text

vector vec1 <- rep(1:3, times = 2) vec2 <- rep(1:3, each = 2) vec3 <- rep(1:3, times = 2, each = 2) > vec1 [1] 1 2 3 1 2 3 > vec2 [1] 1 1 2 2 3 3 > vec3 [1] 1 1 2 2 3 3 1 1 2 2 3 3

Slide 54

Slide 54 text

vector vec1 <- 11:15 > vec1 [1] 11 12 13 14 15 > vec1[1] [1] 11 > vec1[3:5] [1] 13 14 15 > vec1[c(1:2, 5)] [1] 11 12 15
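Besides positions, a vector can be subset with a logical vector or with negative positions. A short sketch that reuses `vec1` from the slide (the other names are illustrative):

```r
vec1 <- 11:15

is_large      <- vec1 > 12      # logical vector, one value per element
large         <- vec1[is_large] # keep only the elements where the test is TRUE
without_first <- vec1[-1]       # a negative index drops that position
```

Logical subsetting is the base-R idea that `filter()` builds on later in the deck.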

Slide 55

Slide 55 text

list list1 <- list(1:6, 11:15, c("a", "b", "c")) > list1 [[1]] [1] 1 2 3 4 5 6 [[2]] [1] 11 12 13 14 15 [[3]] [1] "a" "b" "c"

Slide 56

Slide 56 text

list list1 <- list(1:6, 11:15, c("a", "b", "c")) > list1[[1]] [1] 1 2 3 4 5 6 > list1[[3]][2:3] [1] "b" "c" > list1[[2]] * 3 [1] 33 36 39 42 45

Slide 57

Slide 57 text

named list list2 <- list(A = 1:6, B = 11:15, C = c("a", "b", "c")) > list2 $A [1] 1 2 3 4 5 6 $B [1] 11 12 13 14 15 $C [1] "a" "b" "c"

Slide 58

Slide 58 text

> list2$A [1] 1 2 3 4 5 6 > list2$C[2:3] [1] "b" "c" > list2$B * 3 [1] 33 36 39 42 45 list2 <- list(A = 1:6, B = 11:15, C = c("a", "b", "c")) named list

Slide 59

Slide 59 text

list1 <- list(1:6, 11:15, c("a", "b", "c")) > class(list1) [1] "list" > names(list1) NULL list2 <- list(A = 1:6, B = 11:15, C = c("a", "b", "c")) > class(list2) [1] "list" > names(list2) [1] "A" "B" "C" named list list

Slide 60

Slide 60 text

list3 <- list(A = 1:3, B = 11:13) > class(list3) [1] "list" > names(list3) [1] "A" "B" df1 <- data.frame(A = 1:3, B = 11:13) > class(df1) [1] "data.frame" > names(df1) [1] "A" "B" named list & data.frame

Slide 61

Slide 61 text

> str(list3) List of 2 $ A: int [1:3] 1 2 3 $ B: int [1:3] 11 12 13 > str(df1) 'data.frame': 3 obs. of 2 variables: $ A: int 1 2 3 $ B: int 11 12 13 list3 <- list(A = 1:3, B = 11:13) df1 <- data.frame(A = 1:3, B = 11:13) named list & data.frame
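The `str()` output above suggests that a data.frame is a named list whose elements all have the same length. A minimal sketch that checks this, reusing `list3` and `df1` from the slides (`ragged` and `df_attempt` are made-up names):

```r
list3 <- list(A = 1:3, B = 11:13)
df1   <- data.frame(A = 1:3, B = 11:13)

df_is_list   <- is.list(df1)                 # a data.frame is also a list
n_columns    <- length(df1)                  # list length = number of columns
same_column  <- identical(df1$A, list3$A)    # `$` works the same way on both

# A plain list may hold elements of different lengths ...
ragged <- list(A = 1:3, B = 1:2)

# ... but data.frame() refuses columns whose lengths do not fit together.
df_attempt <- tryCatch(
  data.frame(A = 1:3, B = 1:2),
  error = function(e) "error"
)
```

The equal-length constraint is what gives a data.frame its rectangular shape of observations by variables.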

Slide 62

Slide 62 text

named list & data.frame
> list3
$A
[1] 1 2 3

$B
[1] 11 12 13

> df1
  A  B
1 1 11
2 2 12
3 3 13

Slide 63

Slide 63 text

named list & data.frame
> list3
$A
[1] 1 2 3

$B
[1] 11 12 13

> df1
  A  B
1 1 11
2 2 12
3 3 13
observation (row), variable (column)

Slide 64

Slide 64 text

data.frame vs. matrix
df1 <- data.frame(A = 1:3, B = 11:13)
mat1 <- matrix(c(1:3, 11:13), 3, 2)

  A  B
1 1 11
2 2 12
3 3 13

     [,1] [,2]
[1,]    1   11
[2,]    2   12
[3,]    3   13

> str(df1)
'data.frame': 3 obs. of 2 variables:
 $ A: int 1 2 3
 $ B: int 11 12 13
> str(mat1)
 int [1:3, 1:2] 1 2 3 11 12 13
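One practical difference behind the two `str()` outputs: a matrix stores a single type, while each data.frame column keeps its own. A small sketch with made-up objects `mat_mixed` and `df_mixed`:

```r
# Combining numbers and strings in a matrix coerces everything to character.
mat_mixed <- cbind(id = 1:3, label = c("a", "b", "c"))
mat_type  <- typeof(mat_mixed)

# A data.frame keeps one type per column.
df_mixed <- data.frame(id = 1:3, label = c("a", "b", "c"),
                       stringsAsFactors = FALSE)
col_types <- sapply(df_mixed, class)
```

Here `mat_type` is "character", whereas `col_types` reports "integer" for `id` and "character" for `label`.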

Slide 65

Slide 65 text

data.frame vs. matrix

Slide 66

Slide 66 text

data.frame vs. matrix

Slide 67

Slide 67 text

... (earlier part omitted) ...
So, "data.frame" exists only in our minds. That rectangular-looking thing everyone calls a "data.frame" is the "data.frame".

Slide 68

Slide 68 text

Agenda
0. Introduction (done)
1. data.frame (done)
2. Tidy data
3. Pipe
4. Verbs

Slide 69

Slide 69 text

http://vita.had.co.nz/papers/tidy-data.html

Slide 70

Slide 70 text

https://r4ds.had.co.nz/

Slide 71

Slide 71 text

In tidy data: 1. Each variable forms a column. 2. Each observation forms a row. 3. Each value must have its own cell.

Slide 72

Slide 72 text

In tidy data: 1. Each variable forms a column. 2. Each observation forms a row. 3. Each value must have its own cell. > df1 A B 1 1 11 2 2 12 3 3 13 observation variable df1 <- data.frame(A = 1:3, B = 11:13)

Slide 73

Slide 73 text

In tidy data: 1. Each variable forms a column. 2. Each observation forms a row. 3. Each value must have its own cell.

Slide 74

Slide 74 text

In tidy data: 1. Each variable forms a column. 2. Each observation forms a row. 3. Each value must have its own cell.

Slide 75

Slide 75 text

In tidy data: 1. Each variable forms a column. 2. Each observation forms a row. 3. Each value must have its own cell. Different observation data Value Label

Slide 76

Slide 76 text

In tidy data: 1. Each variable forms a column. 2. Each observation forms a row. 3. Each value must have its own cell. data tidying

Slide 77

Slide 77 text

"Horizontal" tidy data variables

Slide 78

Slide 78 text

"Horizontal" tidy data "Vertical" tidy data gather(df, key, value, -c(obsid, group)) {tidyr} variables

Slide 79

Slide 79 text

"Horizontal" style "Vertical" style gather(df, key, value, -c(obsid, group)) {tidyr} variables variables
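The `gather()` call on the slide can be tried on a small table. This is a sketch with a made-up `df` (the columns `height` and `weight` are assumptions, only `obsid` and `group` come from the slide). Newer tidyr releases recommend `pivot_longer()` for the same job, but `gather()` is still available.

```r
library(tidyr)

# "Horizontal" style: one row per observation, one column per measured variable.
df <- data.frame(
  obsid  = 1:4,
  group  = c("a", "a", "b", "b"),
  height = c(150, 160, 170, 180),
  weight = c(50, 55, 65, 70)
)

# "Vertical" style: every measured value gets its own row.
# `-c(obsid, group)` means "gather every column except these two".
long <- gather(df, key, value, -c(obsid, group))
```

The result has the columns `obsid`, `group`, `key`, and `value`, with one row per original cell of `height` and `weight`.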

Slide 80

Slide 80 text

"Horizontal" style "Nested" style nest(group_by(df, group)) {tidyr}

Slide 81

Slide 81 text

"Nested" style df2 <- nest(group_by(df, group)) {tidyr}

Slide 82

Slide 82 text

"Horizontal" data "Nested" data "Vertical" data nest unnest gather spread input output visualization Data style manipulation in {tidyr} Loops, Summarization, Feature extractions et al., ...
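The conversions in the diagram above can be chained into round trips. The following is a hedged sketch on a made-up table `wide`, assuming tidyr 1.0 or later and dplyr are installed; it is one way to exercise the four functions, not the only one.

```r
library(dplyr)
library(tidyr)

wide <- data.frame(
  obsid = 1:4,
  group = c("a", "a", "b", "b"),
  x     = c(1, 2, 3, 4),
  y     = c(5, 6, 7, 8)
)

# horizontal -> vertical -> horizontal
long <- gather(wide, key, value, -c(obsid, group))
back <- spread(long, key, value)

# horizontal -> nested -> horizontal
nested   <- nest(group_by(wide, group))  # one row per group, rest in a list-column
unnested <- unnest(nested, data)         # `data` is the default list-column name
```

`nested` holds one row per group with the remaining columns packed into the `data` list-column, which is convenient for per-group loops and summaries.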

Slide 83

Slide 83 text

Agenda
0. Introduction (done)
1. data.frame (done)
2. Tidy data
3. Pipe
4. Verbs

Slide 84

Slide 84 text

Pipe %>% {magrittr}
X %>% f          f(X)
X %>% f(y)       f(X, y)
X %>% f %>% g    g(f(X))
X %>% f(y, .)    f(y, X)
"Re-introduction to dplyr (Basics)" by yutannihilation https://speakerdeck.com/yutannihilation/dplyrzai-ru-men-ji-ben-bian
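Each row of the pipe table above can be confirmed with base functions standing in for `f` and `g`. A minimal sketch; the choice of `sqrt`, `round`, `sum`, and `rep` is only for illustration.

```r
library(magrittr)

x <- c(1, 4, 9)

same_f     <- identical(x %>% sqrt,         sqrt(x))       # X %>% f        is f(X)
same_f_y   <- identical(x %>% round(1),     round(x, 1))   # X %>% f(y)     is f(X, y)
same_g_f   <- identical(x %>% sqrt %>% sum, sum(sqrt(x)))  # X %>% f %>% g  is g(f(X))
same_f_dot <- identical(2 %>% rep(x, .),    rep(x, 2))     # X %>% f(y, .)  is f(y, X)
```

All four comparisons evaluate to TRUE: the pipe only changes how the call is written, not what is computed.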

Slide 85

Slide 85 text

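Pipe %>% {magrittr}
"Lately I type nothing but pipes."
"The pipe, that one is great. People who use other languages all think so too."
"I shifted over to this (the pipe) slowly, over about a year."
[Voices of addicted, devoted users]
"Rコミュニティ四方山話" (casual talk about the R community) https://rlangradio.org/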

Slide 86

Slide 86 text

① ② ③ ④ lift take pour put Bring milk from the kitchen!

Slide 87

Slide 87 text

① lift Bring milk from the kitchen! lift(Robot, glass, table) -> Robot' take ② take(Robot', fridge, milk) -> Robot''

Slide 88

Slide 88 text

Bring milk from the kitchen!

Robot' <- lift(Robot, glass, table)      # ①
Robot'' <- take(Robot', fridge, milk)    # ②
Robot''' <- pour(Robot'', milk, glass)   # ③
result <- put(Robot''', glass, table)    # ④

by using pipe,

result <- Robot %>%
  lift(glass, table) %>%     # ①
  take(fridge, milk) %>%     # ②
  pour(milk, glass) %>%      # ③
  put(glass, table)          # ④

Slide 89

Slide 89 text

The tidyverse style guides https://style.tidyverse.org/syntax.html#object-names "There are only two hard things in Computer Science: cache invalidation and naming things"

Slide 90

Slide 90 text

Bring milk from the kitchen!

Robot' <- lift(Robot, glass, table)      # ①
Robot'' <- take(Robot', fridge, milk)    # ②
Robot''' <- pour(Robot'', milk, glass)   # ③
result <- put(Robot''', glass, table)    # ④

by using pipe,

result <- Robot %>%
  lift(glass, table) %>%     # ①
  take(fridge, milk) %>%     # ②
  pour(milk, glass) %>%      # ③
  put(glass, table)          # ④

Slide 91

Slide 91 text

Bring milk from the kitchen!

Robot' <- lift(Robot, glass, table)      # ①
Robot'' <- take(Robot', fridge, milk)    # ②
Robot''' <- pour(Robot'', milk, glass)   # ③
result <- put(Robot''', glass, table)    # ④

by using pipe,

result <- Robot %>%
  lift(glass, table) %>%     # ①
  take(fridge, milk) %>%     # ②
  pour(milk, glass) %>%      # ③
  put(glass, table)          # ④

Thinking Reading

Slide 92

Slide 92 text

Programming Write Run Read Think Write Run Read Think Communicate Share

Slide 93

Slide 93 text

Agenda
0. Introduction (done)
1. data.frame (done)
2. Tidy data
3. Pipe
4. Verbs

Slide 94

Slide 94 text

① ② ③ ④ lift take pour put Bring milk from the kitchen!

Slide 95

Slide 95 text

① ② ③ ④ lift take pour put Bring milk from the kitchen! result <- Robot %>% lift(glass, table) %>% take(fridge, milk) %>% pour(milk, glass) %>% put(glass, table)

Slide 96

Slide 96 text

Define an original function

please_bring <- function(someone, milk, glass,
                         table = dining_table,
                         fridge = kitchen_fridge) {
  someone %>%
    lift(glass, table) %>%
    take(fridge, milk) %>%
    pour(milk, glass) %>%
    put(glass, table)
}

Usage

RobotA %>% please_bring(milk, my_glass)
RobotB %>% please_bring(cold_tea, her_glass)
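The robot functions are imaginary, so here is a runnable analogue of the same pattern: small verb-like steps wrapped into one function with a default argument. The names `shift_to_zero`, `scale_to_one`, and `normalize` are hypothetical and chosen only for this sketch.

```r
library(magrittr)

# Each step is a small function named after what it does.
shift_to_zero <- function(x) x - min(x)
scale_to_one  <- function(x) x / max(x)

# The wrapper reads as a recipe; `digits` has a default like `table` on the slide.
normalize <- function(x, digits = 2) {
  x %>%
    shift_to_zero() %>%
    scale_to_one() %>%
    round(digits)
}

res_default <- c(2, 4, 6) %>% normalize()
res_custom  <- c(1, 2, 4) %>% normalize(digits = 1)
```

Because the wrapper takes the data as its first argument, it can itself be used in a pipe, just like `please_bring()`.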

Slide 97

Slide 97 text

General guidance for naming things
• nouns for variables
• verbs for functions

Slide 98

Slide 98 text

General guidance for naming things
• nouns for variables
• verbs for functions
https://www.grinchcentral.com/function-names-to-verb-or-not-to-verb

Slide 99

Slide 99 text

General guidance for naming things
• nouns for variables
• verbs for functions
Conversely,
• variables are nouns
• functions are verbs

Slide 100

Slide 100 text

Functions are verbs.

Slide 101

Slide 101 text

{dplyr}
filter(df, x == "a", y == 1)       # verb-like
df[df$x == "a" & df$y == 1, ]      # noun-like
df %>% filter(x == "a", y == 1)    # verb with pipe
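The three spellings above select the same rows. A minimal check on a made-up `df`:

```r
library(dplyr)

df <- data.frame(x = c("a", "b", "a"), y = c(1, 1, 2),
                 stringsAsFactors = FALSE)

by_verb  <- filter(df, x == "a", y == 1)     # verb-like
by_index <- df[df$x == "a" & df$y == 1, ]    # noun-like (base R)
by_pipe  <- df %>% filter(x == "a", y == 1)  # verb with pipe
```

Inside `filter()`, conditions separated by commas are combined with AND, which is why the base-R version needs `&`.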

Slide 102

Slide 102 text

verbs {dplyr}: verb functions
mutate     # add column
select     # select column
filter     # select row
arrange    # arrange row
summarise  # summary of vars

Slide 103

Slide 103 text

verbs {dplyr}
"It (dplyr) provides simple "verbs", functions that correspond to the most common data manipulation tasks, to help you translate your thoughts into code."
Introduction to dplyr https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html

Slide 104

Slide 104 text

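verbs {dplyr}
dplyr provides "verbs" for translating your thoughts into code: a set of functions that carry out the very basics of data manipulation in a simple way.
Introduction to dplyr https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
* a fairly loose translation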

Slide 105

Slide 105 text

verbs {dplyr} mutate # add a column

Slide 106

Slide 106 text

verbs {dplyr}
library(dplyr)
iris %>%
  mutate(a = 1:nrow(.)) %>%
  str
'data.frame': 150 obs. of 6 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 ...
 $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 ...
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 ...
 $ Species     : Factor w/ 3 levels "setosa", ...
 $ a           : int 1 2 3 4 5 6 7 8 9 10 ...

Slide 107

Slide 107 text

verbs {dplyr}: overwrite
library(dplyr)
iris %>%
  mutate(a = 1:nrow(.),
         a = a * 5/3 %>% round) %>%
  str
'data.frame': 150 obs. of 6 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 ...
 $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 ...
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 ...
 $ Species     : Factor w/ 3 levels "setosa", ...
 $ a           : num 1.67 3.33 5 6.67 8.33 ...
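The column `a` in the output above is not rounded. A likely reason is operator precedence: `%>%` binds more tightly than `*` and `/`, so only the `3` is passed to `round`. A minimal sketch with a short vector `a`, assuming that rounding the whole product was the intent:

```r
library(magrittr)

a <- 1:3

# Parsed as a * 5 / (3 %>% round): only the 3 is rounded, so nothing changes.
unrounded <- a * 5/3 %>% round

# Parentheses send the whole product through round().
rounded <- (a * 5/3) %>% round
```

`unrounded` equals `a * 5/3` (1.67, 3.33, 5), while `rounded` is 2, 3, 5.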

Slide 108

Slide 108 text

verbs {dplyr} select # select columns

Slide 109

Slide 109 text

verbs {dplyr}
library(dplyr)
iris %>%
  select(Sepal.Length, Sepal.Width) %>%
  str
'data.frame': 150 obs. of 2 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 ...

Slide 110

Slide 110 text

verbs {dplyr}: select helper functions
library(dplyr)
iris %>%
  select(contains("Width")) %>%
  str
'data.frame': 150 obs. of 2 variables:
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 ...
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 ...

Slide 111

Slide 111 text

verbs {dplyr}
# Select helper functions
starts_with("s")
ends_with("s")
contains("se")
matches("^.e")
one_of(c("Sepal.Length", "Species"))
everything()
"Notes on usage examples of dplyr::select" by kazutan https://kazutan.github.io/blog/2017/04/dplyr-select-memo/
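To see what the helpers listed above pick, they can be applied to `iris`. A short sketch; the variable names on the left are only for illustration.

```r
library(dplyr)

starts_s   <- iris %>% select(starts_with("s")) %>% names      # case-insensitive by default
ends_width <- iris %>% select(ends_with("Width")) %>% names
second_e   <- iris %>% select(matches("^.e")) %>% names        # regex: second character is "e"
picked     <- iris %>% select(one_of(c("Sepal.Length", "Species"))) %>% names
reordered  <- iris %>% select(Species, everything()) %>% names # move Species to the front
```

`everything()` is handy for reordering: columns named earlier come first and the rest follow in their original order.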

Slide 112

Slide 112 text

verbs {dplyr}: verb functions
mutate     # add columns
select     # select columns
filter     # filter rows
arrange    # reorder rows
summarise  # aggregate values

Slide 113

Slide 113 text

verbs {dplyr} filter # filter rows

Slide 114

Slide 114 text

verbs {dplyr}
library(dplyr)
iris %>%
  filter(Species == "versicolor") %>%
  str
'data.frame': 50 obs. of 5 variables:
 $ Sepal.Length: num 7 6.4 6.9 5.5 6.5 5.7 6.3 ...
 $ Sepal.Width : num 3.2 3.2 3.1 2.3 2.8 2.8 ...
 $ Petal.Length: num 4.7 4.5 4.9 4 4.6 4.5 4.7 ...
 $ Petal.Width : num 1.4 1.5 1.5 1.3 1.5 1.3 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2 ...

Slide 115

Slide 115 text

verbs {dplyr}
library(dplyr)
iris %>%
  filter(Species == "versicolor") %>%   # NSE (Non-Standard Evaluation)
  str
'data.frame': 50 obs. of 5 variables:
 $ Sepal.Length: num 7 6.4 6.9 5.5 6.5 5.7 6.3 ...
 $ Sepal.Width : num 3.2 3.2 3.1 2.3 2.8 2.8 ...
 $ Petal.Length: num 4.7 4.5 4.9 4 4.6 4.5 4.7 ...
 $ Petal.Width : num 1.4 1.5 1.5 1.3 1.5 1.3 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2 ...

Slide 116

Slide 116 text

About NSE
filter(df, x == "a", y == 1)       # NSE (Non-Standard Evaluation)
df[df$x == "a" & df$y == 1, ]      # SE (Standard Evaluation)
Programming with dplyr http://dplyr.tidyverse.org/articles/programming.html

Slide 117

Slide 117 text

About NSE
filter(df, x == "a", y == 1)
df[df$x == "a" & df$y == 1, ]
With NSE:
- you do not have to write the name of df over and over
- you can write in an SQL-like way
Programming with dplyr http://dplyr.tidyverse.org/articles/programming.html

Slide 118

Slide 118 text

About NSE
filter(df, x == "a", y == 1)       # verb-like
df[df$x == "a" & df$y == 1, ]      # noun-like
With NSE: there is a lot to it, but clean code is justice (personal opinion).
Easy to write, easy to read. Keep the distance between thinking and implementation short.
Programming with dplyr http://dplyr.tidyverse.org/articles/programming.html

Slide 119

Slide 119 text

About NSE
df <- data.frame(x = 1:3, y = 1:3)
filter(df, x == 1)

Because of NSE...

my_var <- "x"
filter(df, my_var == 1)

This does NOT work: there is no "my_var" column in df.
Programming with dplyr http://dplyr.tidyverse.org/articles/programming.html

Slide 120

Slide 120 text

About NSE
If you reeeally want to do it:
my_var <- quo(x)
filter(df, (!! my_var) == 1)
For why it works this way, see "Re-introduction to dplyr (tidyeval)" by yutannihilation.
https://speakerdeck.com/yutannihilation/dplyrzai-ru-men-tidyevalbian

Slide 121

Slide 121 text

About NSE
If you reeeally want to do it:
my_var <- quo(x)
filter(df, (!! my_var) == 1)
For why it works this way, see "Re-introduction to dplyr (tidyeval)" by yutannihilation.
https://speakerdeck.com/yutannihilation/dplyrzai-ru-men-tidyevalbian
Does readability go up? Down? That depends on you and your reader.
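The NSE pitfall and its workaround can be run side by side. A hedged sketch using the `df` from the slides; the `.data` pronoun is an alternative that recent dplyr versions offer for column names stored as strings, and it is an addition of this example rather than something shown on the slides.

```r
library(dplyr)

df <- data.frame(x = 1:3, y = 1:3)

# Pitfall: `my_var` is not a column, so the string "x" is compared with 1.
my_var  <- "x"
no_rows <- filter(df, my_var == 1)

# Slide approach: capture the column as a quosure and unquote it with !!.
my_quo       <- quo(x)
with_unquote <- filter(df, (!!my_quo) == 1)

# Alternative (assumption: a dplyr version that provides the .data pronoun).
with_pronoun <- filter(df, .data[[my_var]] == 1)
```

The first call returns zero rows without raising an error, which makes this mistake easy to miss; the other two return the single row where `x` is 1.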

Slide 122

Slide 122 text

verbs {dplyr}: verb functions
mutate     # add columns
select     # select columns
filter     # filter rows
arrange    # reorder rows
summarise  # aggregate values
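The five verbs compose naturally in one pipeline. A minimal sketch on `iris`; the column `petal_ratio` and the object names are made up for illustration.

```r
library(dplyr)

ranked <- iris %>%
  filter(Species == "versicolor") %>%                  # keep some rows
  mutate(petal_ratio = Petal.Length / Petal.Width) %>% # add a column
  select(Species, petal_ratio) %>%                     # keep some columns
  arrange(desc(petal_ratio))                           # reorder rows, largest first

overview <- ranked %>%
  summarise(n = n(), mean_ratio = mean(petal_ratio))   # collapse to one row
```

Read top to bottom, each line states one step of the analysis, which is the readability argument the deck makes for verbs and pipes.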

Slide 123

Slide 123 text

Grammar of data manipulation
"By constraining your options, it helps you think about your data manipulation challenges."
Introduction to dplyr https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html

Slide 124

Slide 124 text

Grammar of data manipulation
By limiting your options, you can think about the steps of data analysis simply. (a very loose translation)
Introduction to dplyr https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
* truly a loose translation

Slide 125

Slide 125 text

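"The more constraints one imposes, the more one frees one's self of the chains that shackle the spirit."
Igor Stravinsky (Игорь Ф. Стравинский), 1882 - 1971
Japanese rendering on the slide: "By imposing more constraints, one becomes freer from the shackles of the soul."
* a fairly loose translation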

Slide 126

Slide 126 text

Agenda
0. Introduction (done)
1. data.frame (done)
2. Tidy data
3. Pipe
4. Verbs

Slide 127

Slide 127 text

Summary

Slide 128

Slide 128 text

> list3 $A [1] 1 2 3 $B [1] 11 12 13 > df1 A B 1 1 11 2 2 12 3 3 13 observation variable

Slide 129

Slide 129 text

In tidy data: 1. Each variable forms a column. 2. Each observation forms a row. 3. Each value must have its own cell. data tidying

Slide 130

Slide 130 text

"Horizontal" data "Nested" data "Vertical" data nest unnest gather spread input output visualization Data style manipulation in {tidyr} Loops, Summarization, Feature extractions et al., ...

Slide 131

Slide 131 text

Pipe %>% {magrittr}
X %>% f          f(X)
X %>% f(y)       f(X, y)
X %>% f %>% g    g(f(X))
X %>% f(y, .)    f(y, X)
"Re-introduction to dplyr (Basics)" by yutannihilation https://speakerdeck.com/yutannihilation/dplyrzai-ru-men-ji-ben-bian

Slide 132

Slide 132 text

① ② ③ ④ lift take pour put Functions are verbs result <- Robot %>% lift(glass, table) %>% take(fridge, milk) %>% pour(milk, glass) %>% put(glass, table)

Slide 133

Slide 133 text

{dplyr}
filter(df, x == "a", y == 1)       # verb-like
df[df$x == "a" & df$y == 1, ]      # noun-like
df %>% filter(x == "a", y == 1)

Slide 134

Slide 134 text

verbs {dplyr}: verb functions
mutate     # add column
select     # select column
filter     # select row
arrange    # arrange row
summarise  # summary of vars

Slide 135

Slide 135 text

Data Pipeline readable coding

Slide 136

Slide 136 text

https://www.tidyverse.org/

Slide 137

Slide 137 text

Programming languages are languages Write Run Read Think Write Run Read Think Communicate Share

Slide 138

Slide 138 text

“Life shrinks or expands to one’s courage.” -- Anaïs Nin, 2000 http://theamericanreader.com

Slide 139

Slide 139 text

Before After BeginneR Session BeginneR BeginneR ?

Slide 140

Slide 140 text

No content