Upgrade to Pro — share decks privately, control downloads, hide ads and more …

googlesheets

 googlesheets

rOpenSci Community Call

0a4f62e90c976eeb44d33add75cca5af?s=128

Jennifer (Jenny) Bryan

March 02, 2016
Tweet

Transcript

  1. googlesheets @JennyBryan @jennybc @STAT545 stat545-ubc.github.io

  2. For the rOpenSci Community Call 2016 Mar 02

  3. + googlesheets = R wrapper around the Sheets & Drive

    APIs +
  4. thanks Joanna Zhao!

  5. None
  6. Why would a UseR use a Google Sheet? collaboration -

    link, don’t attach! built-in functions to import web data - importxml(), importhtml(), importfeed() magic templates - TAGS, Amazon Price Tracker web services - ifttt.com Google - Forms - Explore - Statistics for Google Sheets
  7. None
  8. Visit the Sheet: All the #rstats pipe tweets https://ifttt.com/

  9. import.io http://newsdev.github.io/driveshaft/

  10. https://www.cloudstitch.com 26 Amazon Dash Button Hacks You Probably Didn't Know

    About
  11. Why a Google Sheet? They are of the web. They

    are eminently hackable.
  12. Why important to have an R interface to Google Sheets?

    Seriously?
  13. The participant who loses the most weight (as measured by

    the final weight as a percentage of his starting weight) shall be declared the winner. Jim Craig
  14. Craig won :( Read the contract on github.

  15. “Examples of permitted services include, but are not limited to,

    cooking, gardening, handiwork, caddying, office work, serving food and drinks or running errands.”
  16. ~1700 consecutive days of weight data

  17. iOS Android enter data from a phone enter data w/o

    WiFi
  18. library(googlesheets) ... jc_ss <- gs_title("jim-craig-weight-loss") jc_dat <- gs_read(jc_ss) jc_dat <-

    jc_dat %>% mutate(date = as.Date(date, format = "%m/%d/%Y")) %>% gather("who", "weight", Jim, Craig) p <- ggplot(jc_dat, aes(x = date, y = weight, color = who)) + geom_point(alpha = 0.35) + geom_smooth(se = FALSE, lwd = 2) + scale_y_continuous(labels = lbs_formatter) print(direct.label(p))
  19. None
  20. I wanted live stats!

  21. vn_ss <- gs_title("2015-05-23_seaRM-at-vanNH") #> Sheet successfully identifed: "2015-05-23_seaRM-at-vanNH" game_play <-

    gs_read(vn_ss, ws = 10, range = cell_limits(c(2, NA), c(1, 2))) #> Accessing worksheet titled "10" head(game_play) #> Source: local data frame [6 x 2] #> #> Offense Defense #> 1 NA 42P #> 2 21pu NA #> 3 NA 29pu #> 4 NA 31 #> 5 NA 29 #> 6 NA 31
  22. point_info <- vn_ss %>% gs_read_cellfeed(ws = 10, range = "D1:D4")

    %>% gs_simplify_cellfeed(col_names = FALSE) #> Accessing worksheet titled "10" point_info #> D1 D2 D3 D4 #> "Seattle Rainmakers" "1" "3:12:00" "1:49:00"
  23. install.packages(“googlesheets”) ## highly recommend you use dev version library(devtools) install_github(“jennybc/googlesheets”)

  24. gs_*

  25. > gs_ls("Gap") Source: local data frame [5 x 10] sheet_title

    author perm version updated (chr) (chr) (chr) (chr) (time) 1 Gapminder_copy gspreadr rw new 2016-02-19 07:42:37 2 Gapminder 2007 Can Write gspreadr rw new 2015-04-15 19:33:55 3 Gapminder gspreadr rw new 2015-03-23 20:59:10 4 Gapminder_old gspreadr rw new 2015-03-23 20:23:06 5 Gapminder by Continent joannaazhao rw new 2015-01-20 22:23:56 Variables not shown: sheet_key (chr), ws_feed (chr), alternate (chr), self (chr), alt_key (chr).
  26. ss <- gs_title("Gapminder") ss <- gs_key("1BzfL0...") ss <- gs_url("https://docs.google.com/spreadsheets/d/1BzfL0.../" )

    register a Sheet
  27. > ss <- gs_title("Gapminder") > ss Spreadsheet title: Gapminder Spreadsheet

    author: gspreadr Date of googlesheets registration: 2016-03-02 07:02:38 GMT Date of last spreadsheet update: 2015-03-23 20:34:08 GMT visibility: private permissions: rw version: new Contains 5 worksheets: (Title): (Nominal worksheet extent as rows x columns) Africa: 625 x 6 Americas: 301 x 6 Asia: 397 x 6 Europe: 361 x 6 Oceania: 25 x 6 Key: 1HT5B8SgkKqHdqHJmn5xiuaC04Ngb7dG9Tv94004vezA Browser URL: https://docs.google.com/spreadsheets/d/ 1HT5B8SgkKqHdqHJmn5xiuaC04Ngb7dG9Tv94004vezA/
  28. ss <- gs_title(“Gapminder") gs_browse(ss)

  29. gs_gap() gs_gap_key() gs_gap_url() Gapminder Sheet available for examples, testing, practice

  30. gs_upload() gs_rename() gs_download() gs_delete() gs_vecdel() gs_grepdel() gs_new() gs_copy()

  31. gs_upload() .xls .xlsx .xlt .xltx .xltm .xlsm .csv .tsv .txt

    .tab .ods
  32. gs_download() .csv .pdf .xlsx

  33. gs_ws_ls() gs_ws_new() gs_ws_rename() gs_ws_delete()

  34. gs_read()

  35. gs_read(..., ws = 3, ...) gs_read(..., ws = “Europe”, ...)

    gs_read(..., range = "D12:F15") gs_read(..., range = "R1C12:R6C15") gs_read(..., range = cell_limits(c(3, NA), c(NA, 10))) gs_read(..., range = cell_rows(1:100)) gs_read(..., range = cell_cols(3:8)) gs_read(..., range = cell_cols("B:MZ")) gs_read(..., range = anchored("B4", dim = c(2, 10))) limit cell consumption
  36. gap_ss <- gs_gap() oceania_crazy <- gs_read(gap_ss, ws = "Oceania", range

    = cell_rows(2:4), col_names = paste0("z", 1:6), col_types = "ccncnn", na = "1962") Source: local data frame [3 x 6] z1 z2 z3 z4 z5 z6 (chr) (chr) (dbl) (chr) (dbl) (dbl) 1 Australia Oceania 1952 69.12 8691212 10039.60 2 Australia Oceania 1957 70.33 9712569 10949.65 3 Australia Oceania NA 70.93 10794968 12217.23 readr-style data ingest
  37. gs_new() gs_ws_new() gs_edit_cells() gs_add_row()

  38. library(googlesheets) cdp <- "http://i.imgur.com/lrg4uy5.jpg" cell_contents <- paste0("=IMAGE(\"", cdp, "\", 2)")

    ss <- gs_new("cute-dog-photo-in-cell-2", input = cell_contents) gs_browse(ss) ## you'll have to change the row size in the browser Visit the Sheet Visit the Gist
  39. gs_auth() gs_user()

  40. Technical bits of interest flexible auth flow: - “just works”

    for most people on 1st try - unattended use w/ tokens stored to file - works with Shiny we are actually testing - against the Sheets and Drive APIs - including operations that require authentication - on Travis-CI - vignette about that
  41. Deep thoughts My first “real” R package = inflicted on

    other people. Pros - It is actually useful and I wanted it to use it. - Pretty broad appeal. Cons: - I had no idea what I was getting into. None. Nada. - “What’s an API? XML? Oauth2?!?” What is/was hard about it has little to do with R, which is what I actually knew something about.
  42. XML

  43. OAuth2

  44. Near term issues Refactoring OMG Strip numeric formatting (done-ish) Access

    formulas (done-ish) Navigate and manipulate Drive directory structure Get a handle on - browser actions re: sheet publishing and sharing - affect on Drive notions of permissions, “capabilities” - implications for API access via exportcsv link, list feed, cell feed Don’t require/send token when I don’t need auth! Encoding woes
  45. ffs_read_csv #> Source: local data frame [5 x 5] #>

    #> Number Number_wFormat Character Formulas Formula_wFormat #> (int) (chr) (chr) (chr) (chr) #> 1 123456 654,321 one Google 3.18E+05 #> 2 345678 12.34% NA 1,271,591.00 52.63% #> 3 234567 1.23E+09 three NA 0.22 #> 4 NA 3 1/7 four $A$1 123,456.00 #> 5 567890 $0.36 five NA 317,898 default = “literal values”
  46. #> Source: local data frame [6 x 3] #> #>

    literal_value input_value numeric_value #> (chr) (chr) (chr) #> 1 Number_wFormat Number_wFormat NA #> 2 654,321 654321 654321.0 #> 3 12.34% 12.34% 0.1234 #> 4 1.23E+09 1234567890 1.23456789E9 #> 5 3 1/7 3.14159265359 3.14159265359 #> 6 $0.36 0.36 0.36
  47. #> Source: local data frame [6 x 3] #> #>

    literal_value input_value #> (chr) (chr) #> 1 Formulas Formulas #> 2 Google =HYPERLINK("http://www.google.com/","Google") #> 3 1,271,591.00 =sum(R[-1]C[-3]:R[3]C[-3]) #> 4 =IMAGE("https://www.google.com/images/srpr/logo3w.png" #> 5 $A$1 =ADDRESS(1,1) #> 6 =SPARKLINE(R[-4]C[-3]:R[0]C[-3]) #> Variables not shown: numeric_value (chr).
  48. #> Source: local data frame [6 x 3] #> #>

    literal_value input_value numeric_value #> (chr) (chr) (chr) #> 1 Formula_wFormat Formula_wFormat NA #> 2 3.18E+05 =average(R[0]C[-4]:R[4]C[-4]) 317897.75 #> 3 52.63% =R[-1]C[-4]/R[1]C[-4] 0.5263144432081239 #> 4 0.22 =R[-2]C[-4]/R[2]C[-4] 0.21739421366813996 #> 5 123,456.00 =min(R[-3]C[-4]:R[1]C[-4]) 123456.0 #> 6 317,898 =average(R2C1:R6C1) 317897.75
  49. #> Source: local data frame [5 x 5] #> #>

    Number Number_wFormat Character Formulas Formula_wFormat #> (int) (chr) (chr) (chr) (chr) #> 1 123456 654,321 one Google 3.18E+05 #> 2 345678 12.34% NA 1,271,591.00 52.63% #> 3 234567 1.23E+09 three NA 0.22 #> 4 NA 3 1/7 four $A$1 123,456.00 #> 5 567890 $0.36 five NA 317,898 now possible = “literal values, except when then input value is more appropriate, but sometimes the numeric value instead”
  50. I wish we could automatically get the data and tidy

    it! And parse those formulas. A worthy project for the future….
  51. None