Upgrade to Pro — share decks privately, control downloads, hide ads and more …

googlesheets

 googlesheets

rOpenSci Community Call

Jennifer (Jenny) Bryan

March 02, 2016
Tweet

More Decks by Jennifer (Jenny) Bryan

Other Decks in Technology

Transcript

  1. Why would a UseR use a Google Sheet? collaboration -

    link, don’t attach! built-in functions to import web data - importxml(), importhtml(), importfeed() magic templates - TAGS, Amazon Price Tracker web services - ifttt.com Google - Forms - Explore - Statistics for Google Sheets
  2. The participant who loses the most weight (as measured by

    the final weight as a percentage of his starting weight) shall be declared the winner. Jim Craig
  3. “Examples of permitted services include, but are not limited to,

    cooking, gardening, handiwork, caddying, office work, serving food and drinks or running errands.”
  4. library(googlesheets) ... jc_ss <- gs_title("jim-craig-weight-loss") jc_dat <- gs_read(jc_ss) jc_dat <-

    jc_dat %>% mutate(date = as.Date(date, format = "%m/%d/%Y")) %>% gather("who", "weight", Jim, Craig) p <- ggplot(jc_dat, aes(x = date, y = weight, color = who)) + geom_point(alpha = 0.35) + geom_smooth(se = FALSE, lwd = 2) + scale_y_continuous(labels = lbs_formatter) print(direct.label(p))
  5. vn_ss <- gs_title("2015-05-23_seaRM-at-vanNH") #> Sheet successfully identifed: "2015-05-23_seaRM-at-vanNH" game_play <-

    gs_read(vn_ss, ws = 10, range = cell_limits(c(2, NA), c(1, 2))) #> Accessing worksheet titled "10" head(game_play) #> Source: local data frame [6 x 2] #> #> Offense Defense #> 1 NA 42P #> 2 21pu NA #> 3 NA 29pu #> 4 NA 31 #> 5 NA 29 #> 6 NA 31
  6. point_info <- vn_ss %>% gs_read_cellfeed(ws = 10, range = "D1:D4")

    %>% gs_simplify_cellfeed(col_names = FALSE) #> Accessing worksheet titled "10" point_info #> D1 D2 D3 D4 #> "Seattle Rainmakers" "1" "3:12:00" "1:49:00"
  7. > gs_ls("Gap") Source: local data frame [5 x 10] sheet_title

    author perm version updated (chr) (chr) (chr) (chr) (time) 1 Gapminder_copy gspreadr rw new 2016-02-19 07:42:37 2 Gapminder 2007 Can Write gspreadr rw new 2015-04-15 19:33:55 3 Gapminder gspreadr rw new 2015-03-23 20:59:10 4 Gapminder_old gspreadr rw new 2015-03-23 20:23:06 5 Gapminder by Continent joannaazhao rw new 2015-01-20 22:23:56 Variables not shown: sheet_key (chr), ws_feed (chr), alternate (chr), self (chr), alt_key (chr).
  8. > ss <- gs_title("Gapminder") > ss Spreadsheet title: Gapminder Spreadsheet

    author: gspreadr Date of googlesheets registration: 2016-03-02 07:02:38 GMT Date of last spreadsheet update: 2015-03-23 20:34:08 GMT visibility: private permissions: rw version: new Contains 5 worksheets: (Title): (Nominal worksheet extent as rows x columns) Africa: 625 x 6 Americas: 301 x 6 Asia: 397 x 6 Europe: 361 x 6 Oceania: 25 x 6 Key: 1HT5B8SgkKqHdqHJmn5xiuaC04Ngb7dG9Tv94004vezA Browser URL: https://docs.google.com/spreadsheets/d/ 1HT5B8SgkKqHdqHJmn5xiuaC04Ngb7dG9Tv94004vezA/
  9. gs_read(..., ws = 3, ...) gs_read(..., ws = “Europe”, ...)

    gs_read(..., range = "D12:F15") gs_read(..., range = "R1C12:R6C15") gs_read(..., range = cell_limits(c(3, NA), c(NA, 10))) gs_read(..., range = cell_rows(1:100)) gs_read(..., range = cell_cols(3:8)) gs_read(..., range = cell_cols("B:MZ")) gs_read(..., range = anchored("B4", dim = c(2, 10))) limit cell consumption
  10. gap_ss <- gs_gap() oceania_crazy <- gs_read(gap_ss, ws = "Oceania", range

    = cell_rows(2:4), col_names = paste0("z", 1:6), col_types = "ccncnn", na = "1962") Source: local data frame [3 x 6] z1 z2 z3 z4 z5 z6 (chr) (chr) (dbl) (chr) (dbl) (dbl) 1 Australia Oceania 1952 69.12 8691212 10039.60 2 Australia Oceania 1957 70.33 9712569 10949.65 3 Australia Oceania NA 70.93 10794968 12217.23 readr-style data ingest
  11. library(googlesheets) cdp <- "http://i.imgur.com/lrg4uy5.jpg" cell_contents <- paste0("=IMAGE(\"", cdp, "\", 2)")

    ss <- gs_new("cute-dog-photo-in-cell-2", input = cell_contents) gs_browse(ss) ## you'll have to change the row size in the browser Visit the Sheet Visit the Gist
  12. Technical bits of interest flexible auth flow: - “just works”

    for most people on 1st try - unattended use w/ tokens stored to file - works with Shiny we are actually testing - against the Sheets and Drive APIs - including operations that require authentication - on Travis-CI - vignette about that
  13. Deep thoughts My first “real” R package = inflicted on

    other people. Pros - It is actually useful and I wanted it to use it. - Pretty broad appeal. Cons: - I had no idea what I was getting into. None. Nada. - “What’s an API? XML? Oauth2?!?” What is/was hard about it has little to do with R, which is what I actually knew something about.
  14. XML

  15. Near term issues Refactoring OMG Strip numeric formatting (done-ish) Access

    formulas (done-ish) Navigate and manipulate Drive directory structure Get a handle on - browser actions re: sheet publishing and sharing - affect on Drive notions of permissions, “capabilities” - implications for API access via exportcsv link, list feed, cell feed Don’t require/send token when I don’t need auth! Encoding woes
  16. ffs_read_csv #> Source: local data frame [5 x 5] #>

    #> Number Number_wFormat Character Formulas Formula_wFormat #> (int) (chr) (chr) (chr) (chr) #> 1 123456 654,321 one Google 3.18E+05 #> 2 345678 12.34% NA 1,271,591.00 52.63% #> 3 234567 1.23E+09 three NA 0.22 #> 4 NA 3 1/7 four $A$1 123,456.00 #> 5 567890 $0.36 five NA 317,898 default = “literal values”
  17. #> Source: local data frame [6 x 3] #> #>

    literal_value input_value numeric_value #> (chr) (chr) (chr) #> 1 Number_wFormat Number_wFormat NA #> 2 654,321 654321 654321.0 #> 3 12.34% 12.34% 0.1234 #> 4 1.23E+09 1234567890 1.23456789E9 #> 5 3 1/7 3.14159265359 3.14159265359 #> 6 $0.36 0.36 0.36
  18. #> Source: local data frame [6 x 3] #> #>

    literal_value input_value #> (chr) (chr) #> 1 Formulas Formulas #> 2 Google =HYPERLINK("http://www.google.com/","Google") #> 3 1,271,591.00 =sum(R[-1]C[-3]:R[3]C[-3]) #> 4 =IMAGE("https://www.google.com/images/srpr/logo3w.png" #> 5 $A$1 =ADDRESS(1,1) #> 6 =SPARKLINE(R[-4]C[-3]:R[0]C[-3]) #> Variables not shown: numeric_value (chr).
  19. #> Source: local data frame [6 x 3] #> #>

    literal_value input_value numeric_value #> (chr) (chr) (chr) #> 1 Formula_wFormat Formula_wFormat NA #> 2 3.18E+05 =average(R[0]C[-4]:R[4]C[-4]) 317897.75 #> 3 52.63% =R[-1]C[-4]/R[1]C[-4] 0.5263144432081239 #> 4 0.22 =R[-2]C[-4]/R[2]C[-4] 0.21739421366813996 #> 5 123,456.00 =min(R[-3]C[-4]:R[1]C[-4]) 123456.0 #> 6 317,898 =average(R2C1:R6C1) 317897.75
  20. #> Source: local data frame [5 x 5] #> #>

    Number Number_wFormat Character Formulas Formula_wFormat #> (int) (chr) (chr) (chr) (chr) #> 1 123456 654,321 one Google 3.18E+05 #> 2 345678 12.34% NA 1,271,591.00 52.63% #> 3 234567 1.23E+09 three NA 0.22 #> 4 NA 3 1/7 four $A$1 123,456.00 #> 5 567890 $0.36 five NA 317,898 now possible = “literal values, except when then input value is more appropriate, but sometimes the numeric value instead”
  21. I wish we could automatically get the data and tidy

    it! And parse those formulas. A worthy project for the future….