Slide 1

Slide 1 text

googlesheets @JennyBryan @jennybc @STAT545 stat545-ubc.github.io

Slide 2

Slide 2 text

For the rOpenSci Community Call 2016 Mar 02

Slide 3

Slide 3 text

+ googlesheets = R wrapper around the Sheets & Drive APIs +

Slide 4

Slide 4 text

thanks Joanna Zhao!

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Why would a UseR use a Google Sheet? collaboration - link, don’t attach! built-in functions to import web data - importxml(), importhtml(), importfeed() magic templates - TAGS, Amazon Price Tracker web services - ifttt.com Google - Forms - Explore - Statistics for Google Sheets

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Visit the Sheet: All the #rstats pipe tweets https://ifttt.com/

Slide 9

Slide 9 text

import.io http://newsdev.github.io/driveshaft/

Slide 10

Slide 10 text

https://www.cloudstitch.com 26 Amazon Dash Button Hacks You Probably Didn't Know About

Slide 11

Slide 11 text

Why a Google Sheet? They are of the web. They are eminently hackable.

Slide 12

Slide 12 text

Why important to have an R interface to Google Sheets? Seriously?

Slide 13

Slide 13 text

The participant who loses the most weight (as measured by the final weight as a percentage of his starting weight) shall be declared the winner. Jim Craig

Slide 14

Slide 14 text

Craig won :( Read the contract on github.

Slide 15

Slide 15 text

“Examples of permitted services include, but are not limited to, cooking, gardening, handiwork, caddying, office work, serving food and drinks or running errands.”

Slide 16

Slide 16 text

~1700 consecutive days of weight data

Slide 17

Slide 17 text

iOS Android enter data from a phone enter data w/o WiFi

Slide 18

Slide 18 text

library(googlesheets) ... jc_ss <- gs_title("jim-craig-weight-loss") jc_dat <- gs_read(jc_ss) jc_dat <- jc_dat %>% mutate(date = as.Date(date, format = "%m/%d/%Y")) %>% gather("who", "weight", Jim, Craig) p <- ggplot(jc_dat, aes(x = date, y = weight, color = who)) + geom_point(alpha = 0.35) + geom_smooth(se = FALSE, lwd = 2) + scale_y_continuous(labels = lbs_formatter) print(direct.label(p))

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

I wanted live stats!

Slide 21

Slide 21 text

vn_ss <- gs_title("2015-05-23_seaRM-at-vanNH") #> Sheet successfully identifed: "2015-05-23_seaRM-at-vanNH" game_play <- gs_read(vn_ss, ws = 10, range = cell_limits(c(2, NA), c(1, 2))) #> Accessing worksheet titled "10" head(game_play) #> Source: local data frame [6 x 2] #> #> Offense Defense #> 1 NA 42P #> 2 21pu NA #> 3 NA 29pu #> 4 NA 31 #> 5 NA 29 #> 6 NA 31

Slide 22

Slide 22 text

point_info <- vn_ss %>% gs_read_cellfeed(ws = 10, range = "D1:D4") %>% gs_simplify_cellfeed(col_names = FALSE) #> Accessing worksheet titled "10" point_info #> D1 D2 D3 D4 #> "Seattle Rainmakers" "1" "3:12:00" "1:49:00"

Slide 23

Slide 23 text

install.packages(“googlesheets”) ## highly recommend you use dev version library(devtools) install_github(“jennybc/googlesheets”)

Slide 24

Slide 24 text

gs_*

Slide 25

Slide 25 text

> gs_ls("Gap") Source: local data frame [5 x 10] sheet_title author perm version updated (chr) (chr) (chr) (chr) (time) 1 Gapminder_copy gspreadr rw new 2016-02-19 07:42:37 2 Gapminder 2007 Can Write gspreadr rw new 2015-04-15 19:33:55 3 Gapminder gspreadr rw new 2015-03-23 20:59:10 4 Gapminder_old gspreadr rw new 2015-03-23 20:23:06 5 Gapminder by Continent joannaazhao rw new 2015-01-20 22:23:56 Variables not shown: sheet_key (chr), ws_feed (chr), alternate (chr), self (chr), alt_key (chr).

Slide 26

Slide 26 text

ss <- gs_title("Gapminder") ss <- gs_key("1BzfL0...") ss <- gs_url("https://docs.google.com/spreadsheets/d/1BzfL0.../" ) register a Sheet

Slide 27

Slide 27 text

> ss <- gs_title("Gapminder") > ss Spreadsheet title: Gapminder Spreadsheet author: gspreadr Date of googlesheets registration: 2016-03-02 07:02:38 GMT Date of last spreadsheet update: 2015-03-23 20:34:08 GMT visibility: private permissions: rw version: new Contains 5 worksheets: (Title): (Nominal worksheet extent as rows x columns) Africa: 625 x 6 Americas: 301 x 6 Asia: 397 x 6 Europe: 361 x 6 Oceania: 25 x 6 Key: 1HT5B8SgkKqHdqHJmn5xiuaC04Ngb7dG9Tv94004vezA Browser URL: https://docs.google.com/spreadsheets/d/ 1HT5B8SgkKqHdqHJmn5xiuaC04Ngb7dG9Tv94004vezA/

Slide 28

Slide 28 text

ss <- gs_title(“Gapminder") gs_browse(ss)

Slide 29

Slide 29 text

gs_gap() gs_gap_key() gs_gap_url() Gapminder Sheet available for examples, testing, practice

Slide 30

Slide 30 text

gs_upload() gs_rename() gs_download() gs_delete() gs_vecdel() gs_grepdel() gs_new() gs_copy()

Slide 31

Slide 31 text

gs_upload() .xls .xlsx .xlt .xltx .xltm .xlsm .csv .tsv .txt .tab .ods

Slide 32

Slide 32 text

gs_download() .csv .pdf .xlsx

Slide 33

Slide 33 text

gs_ws_ls() gs_ws_new() gs_ws_rename() gs_ws_delete()

Slide 34

Slide 34 text

gs_read()

Slide 35

Slide 35 text

gs_read(..., ws = 3, ...) gs_read(..., ws = “Europe”, ...) gs_read(..., range = "D12:F15") gs_read(..., range = "R1C12:R6C15") gs_read(..., range = cell_limits(c(3, NA), c(NA, 10))) gs_read(..., range = cell_rows(1:100)) gs_read(..., range = cell_cols(3:8)) gs_read(..., range = cell_cols("B:MZ")) gs_read(..., range = anchored("B4", dim = c(2, 10))) limit cell consumption

Slide 36

Slide 36 text

gap_ss <- gs_gap() oceania_crazy <- gs_read(gap_ss, ws = "Oceania", range = cell_rows(2:4), col_names = paste0("z", 1:6), col_types = "ccncnn", na = "1962") Source: local data frame [3 x 6] z1 z2 z3 z4 z5 z6 (chr) (chr) (dbl) (chr) (dbl) (dbl) 1 Australia Oceania 1952 69.12 8691212 10039.60 2 Australia Oceania 1957 70.33 9712569 10949.65 3 Australia Oceania NA 70.93 10794968 12217.23 readr-style data ingest

Slide 37

Slide 37 text

gs_new() gs_ws_new() gs_edit_cells() gs_add_row()

Slide 38

Slide 38 text

library(googlesheets) cdp <- "http://i.imgur.com/lrg4uy5.jpg" cell_contents <- paste0("=IMAGE(\"", cdp, "\", 2)") ss <- gs_new("cute-dog-photo-in-cell-2", input = cell_contents) gs_browse(ss) ## you'll have to change the row size in the browser Visit the Sheet Visit the Gist

Slide 39

Slide 39 text

gs_auth() gs_user()

Slide 40

Slide 40 text

Technical bits of interest flexible auth flow: - “just works” for most people on 1st try - unattended use w/ tokens stored to file - works with Shiny we are actually testing - against the Sheets and Drive APIs - including operations that require authentication - on Travis-CI - vignette about that

Slide 41

Slide 41 text

Deep thoughts My first “real” R package = inflicted on other people. Pros - It is actually useful and I wanted it to use it. - Pretty broad appeal. Cons: - I had no idea what I was getting into. None. Nada. - “What’s an API? XML? Oauth2?!?” What is/was hard about it has little to do with R, which is what I actually knew something about.

Slide 42

Slide 42 text

XML

Slide 43

Slide 43 text

OAuth2

Slide 44

Slide 44 text

Near term issues Refactoring OMG Strip numeric formatting (done-ish) Access formulas (done-ish) Navigate and manipulate Drive directory structure Get a handle on - browser actions re: sheet publishing and sharing - affect on Drive notions of permissions, “capabilities” - implications for API access via exportcsv link, list feed, cell feed Don’t require/send token when I don’t need auth! Encoding woes

Slide 45

Slide 45 text

ffs_read_csv #> Source: local data frame [5 x 5] #> #> Number Number_wFormat Character Formulas Formula_wFormat #> (int) (chr) (chr) (chr) (chr) #> 1 123456 654,321 one Google 3.18E+05 #> 2 345678 12.34% NA 1,271,591.00 52.63% #> 3 234567 1.23E+09 three NA 0.22 #> 4 NA 3 1/7 four $A$1 123,456.00 #> 5 567890 $0.36 five NA 317,898 default = “literal values”

Slide 46

Slide 46 text

#> Source: local data frame [6 x 3] #> #> literal_value input_value numeric_value #> (chr) (chr) (chr) #> 1 Number_wFormat Number_wFormat NA #> 2 654,321 654321 654321.0 #> 3 12.34% 12.34% 0.1234 #> 4 1.23E+09 1234567890 1.23456789E9 #> 5 3 1/7 3.14159265359 3.14159265359 #> 6 $0.36 0.36 0.36

Slide 47

Slide 47 text

#> Source: local data frame [6 x 3] #> #> literal_value input_value #> (chr) (chr) #> 1 Formulas Formulas #> 2 Google =HYPERLINK("http://www.google.com/","Google") #> 3 1,271,591.00 =sum(R[-1]C[-3]:R[3]C[-3]) #> 4 =IMAGE("https://www.google.com/images/srpr/logo3w.png" #> 5 $A$1 =ADDRESS(1,1) #> 6 =SPARKLINE(R[-4]C[-3]:R[0]C[-3]) #> Variables not shown: numeric_value (chr).

Slide 48

Slide 48 text

#> Source: local data frame [6 x 3] #> #> literal_value input_value numeric_value #> (chr) (chr) (chr) #> 1 Formula_wFormat Formula_wFormat NA #> 2 3.18E+05 =average(R[0]C[-4]:R[4]C[-4]) 317897.75 #> 3 52.63% =R[-1]C[-4]/R[1]C[-4] 0.5263144432081239 #> 4 0.22 =R[-2]C[-4]/R[2]C[-4] 0.21739421366813996 #> 5 123,456.00 =min(R[-3]C[-4]:R[1]C[-4]) 123456.0 #> 6 317,898 =average(R2C1:R6C1) 317897.75

Slide 49

Slide 49 text

#> Source: local data frame [5 x 5] #> #> Number Number_wFormat Character Formulas Formula_wFormat #> (int) (chr) (chr) (chr) (chr) #> 1 123456 654,321 one Google 3.18E+05 #> 2 345678 12.34% NA 1,271,591.00 52.63% #> 3 234567 1.23E+09 three NA 0.22 #> 4 NA 3 1/7 four $A$1 123,456.00 #> 5 567890 $0.36 five NA 317,898 now possible = “literal values, except when then input value is more appropriate, but sometimes the numeric value instead”

Slide 50

Slide 50 text

I wish we could automatically get the data and tidy it! And parse those formulas. A worthy project for the future….

Slide 51

Slide 51 text

No content