Upgrade to Pro — share decks privately, control downloads, hide ads and more …

googlesheets

 googlesheets

rOpenSci Community Call

Jennifer (Jenny) Bryan

March 02, 2016
Tweet

More Decks by Jennifer (Jenny) Bryan

Other Decks in Technology

Transcript

  1. googlesheets
    @JennyBryan
    @jennybc
    @STAT545
    stat545-ubc.github.io

    View full-size slide

  2. For the
    rOpenSci Community Call
    2016 Mar 02

    View full-size slide

  3. +
    googlesheets =
    R wrapper around the
    Sheets & Drive APIs
    +

    View full-size slide

  4. thanks
    Joanna Zhao!

    View full-size slide

  5. Why would a UseR use a Google Sheet?
    collaboration
    - link, don’t attach!
    built-in functions to import web data
    - importxml(), importhtml(), importfeed()
    magic templates
    - TAGS, Amazon Price Tracker
    web services
    - ifttt.com
    Google
    - Forms
    - Explore
    - Statistics for Google Sheets

    View full-size slide

  6. Visit the Sheet: All the #rstats pipe tweets
    https://ifttt.com/

    View full-size slide

  7. import.io
    http://newsdev.github.io/driveshaft/

    View full-size slide

  8. https://www.cloudstitch.com
    26 Amazon Dash Button Hacks You Probably Didn't Know About

    View full-size slide

  9. Why a Google Sheet?
    They are of the web.
    They are eminently hackable.

    View full-size slide

  10. Why important to have an R
    interface to Google Sheets?
    Seriously?

    View full-size slide

  11. The participant who loses the most
    weight (as measured by the final
    weight as a percentage of his starting
    weight) shall be declared the winner.
    Jim
    Craig

    View full-size slide

  12. Craig won :(
    Read the contract on github.

    View full-size slide

  13. “Examples of permitted services
    include, but are not limited to,
    cooking, gardening, handiwork,
    caddying, office work, serving food
    and drinks or running errands.”

    View full-size slide

  14. ~1700 consecutive
    days of weight data

    View full-size slide

  15. iOS Android
    enter data from a phone
    enter data w/o WiFi

    View full-size slide

  16. library(googlesheets)
    ...
    jc_ss <- gs_title("jim-craig-weight-loss")
    jc_dat <- gs_read(jc_ss)
    jc_dat <- jc_dat %>%
    mutate(date = as.Date(date, format = "%m/%d/%Y")) %>%
    gather("who", "weight", Jim, Craig)
    p <- ggplot(jc_dat, aes(x = date, y = weight, color = who)) +
    geom_point(alpha = 0.35) + geom_smooth(se = FALSE, lwd = 2) +
    scale_y_continuous(labels = lbs_formatter)
    print(direct.label(p))

    View full-size slide

  17. I wanted live stats!

    View full-size slide

  18. vn_ss <- gs_title("2015-05-23_seaRM-at-vanNH")
    #> Sheet successfully identifed: "2015-05-23_seaRM-at-vanNH"
    game_play <-
    gs_read(vn_ss, ws = 10,
    range = cell_limits(c(2, NA), c(1, 2)))
    #> Accessing worksheet titled "10"
    head(game_play)
    #> Source: local data frame [6 x 2]
    #>
    #> Offense Defense
    #> 1 NA 42P
    #> 2 21pu NA
    #> 3 NA 29pu
    #> 4 NA 31
    #> 5 NA 29
    #> 6 NA 31

    View full-size slide

  19. point_info <- vn_ss %>%
    gs_read_cellfeed(ws = 10, range = "D1:D4") %>%
    gs_simplify_cellfeed(col_names = FALSE)
    #> Accessing worksheet titled "10"
    point_info
    #> D1 D2 D3 D4
    #> "Seattle Rainmakers" "1" "3:12:00" "1:49:00"

    View full-size slide

  20. install.packages(“googlesheets”)
    ## highly recommend you use dev version
    library(devtools)
    install_github(“jennybc/googlesheets”)

    View full-size slide

  21. > gs_ls("Gap")
    Source: local data frame [5 x 10]
    sheet_title author perm version updated
    (chr) (chr) (chr) (chr) (time)
    1 Gapminder_copy gspreadr rw new 2016-02-19 07:42:37
    2 Gapminder 2007 Can Write gspreadr rw new 2015-04-15 19:33:55
    3 Gapminder gspreadr rw new 2015-03-23 20:59:10
    4 Gapminder_old gspreadr rw new 2015-03-23 20:23:06
    5 Gapminder by Continent joannaazhao rw new 2015-01-20 22:23:56
    Variables not shown: sheet_key (chr), ws_feed (chr), alternate (chr), self
    (chr),
    alt_key (chr).

    View full-size slide

  22. ss <- gs_title("Gapminder")
    ss <- gs_key("1BzfL0...")
    ss <- gs_url("https://docs.google.com/spreadsheets/d/1BzfL0.../"
    )
    register
    a Sheet

    View full-size slide

  23. > ss <- gs_title("Gapminder")
    > ss
    Spreadsheet title: Gapminder
    Spreadsheet author: gspreadr
    Date of googlesheets registration: 2016-03-02 07:02:38 GMT
    Date of last spreadsheet update: 2015-03-23 20:34:08 GMT
    visibility: private
    permissions: rw
    version: new
    Contains 5 worksheets:
    (Title): (Nominal worksheet extent as rows x columns)
    Africa: 625 x 6
    Americas: 301 x 6
    Asia: 397 x 6
    Europe: 361 x 6
    Oceania: 25 x 6
    Key: 1HT5B8SgkKqHdqHJmn5xiuaC04Ngb7dG9Tv94004vezA
    Browser URL: https://docs.google.com/spreadsheets/d/
    1HT5B8SgkKqHdqHJmn5xiuaC04Ngb7dG9Tv94004vezA/

    View full-size slide

  24. ss <- gs_title(“Gapminder")
    gs_browse(ss)

    View full-size slide

  25. gs_gap()
    gs_gap_key()
    gs_gap_url()
    Gapminder Sheet
    available for
    examples, testing,
    practice

    View full-size slide

  26. gs_upload()
    gs_rename()
    gs_download()
    gs_delete()
    gs_vecdel()
    gs_grepdel()
    gs_new()
    gs_copy()

    View full-size slide

  27. gs_upload()
    .xls .xlsx .xlt .xltx .xltm .xlsm
    .csv .tsv .txt .tab
    .ods

    View full-size slide

  28. gs_download()
    .csv .pdf .xlsx

    View full-size slide

  29. gs_ws_ls()
    gs_ws_new()
    gs_ws_rename()
    gs_ws_delete()

    View full-size slide

  30. gs_read(..., ws = 3, ...)
    gs_read(..., ws = “Europe”, ...)
    gs_read(..., range = "D12:F15")
    gs_read(..., range = "R1C12:R6C15")
    gs_read(..., range = cell_limits(c(3, NA), c(NA, 10)))
    gs_read(..., range = cell_rows(1:100))
    gs_read(..., range = cell_cols(3:8))
    gs_read(..., range = cell_cols("B:MZ"))
    gs_read(..., range = anchored("B4", dim = c(2, 10)))
    limit cell
    consumption

    View full-size slide

  31. gap_ss <- gs_gap()
    oceania_crazy <-
    gs_read(gap_ss,
    ws = "Oceania",
    range = cell_rows(2:4),
    col_names = paste0("z", 1:6),
    col_types = "ccncnn",
    na = "1962")
    Source: local data frame [3 x 6]
    z1 z2 z3 z4 z5 z6
    (chr) (chr) (dbl) (chr) (dbl) (dbl)
    1 Australia Oceania 1952 69.12 8691212 10039.60
    2 Australia Oceania 1957 70.33 9712569 10949.65
    3 Australia Oceania NA 70.93 10794968 12217.23
    readr-style
    data ingest

    View full-size slide

  32. gs_new()
    gs_ws_new()
    gs_edit_cells()
    gs_add_row()

    View full-size slide

  33. library(googlesheets)
    cdp <- "http://i.imgur.com/lrg4uy5.jpg"
    cell_contents <- paste0("=IMAGE(\"", cdp, "\", 2)")
    ss <- gs_new("cute-dog-photo-in-cell-2", input = cell_contents)
    gs_browse(ss) ## you'll have to change the row size in the browser
    Visit the Sheet
    Visit the Gist

    View full-size slide

  34. gs_auth()
    gs_user()

    View full-size slide

  35. Technical bits of interest
    flexible auth flow:
    - “just works” for most people on 1st try
    - unattended use w/ tokens stored to file
    - works with Shiny
    we are actually testing
    - against the Sheets and Drive APIs
    - including operations that require authentication
    - on Travis-CI
    - vignette about that

    View full-size slide

  36. Deep thoughts
    My first “real” R package = inflicted on other people.
    Pros
    - It is actually useful and I wanted it to use it.
    - Pretty broad appeal.
    Cons:
    - I had no idea what I was getting into. None. Nada.
    - “What’s an API? XML? Oauth2?!?”
    What is/was hard about it has little to do with R, which
    is what I actually knew something about.

    View full-size slide

  37. Near term issues
    Refactoring OMG
    Strip numeric formatting (done-ish)
    Access formulas (done-ish)
    Navigate and manipulate Drive directory structure
    Get a handle on
    - browser actions re: sheet publishing and sharing
    - affect on Drive notions of permissions, “capabilities”
    - implications for API access via exportcsv link, list feed, cell feed
    Don’t require/send token when I don’t need auth!
    Encoding woes

    View full-size slide

  38. ffs_read_csv
    #> Source: local data frame [5 x 5]
    #>
    #> Number Number_wFormat Character Formulas Formula_wFormat
    #> (int) (chr) (chr) (chr) (chr)
    #> 1 123456 654,321 one Google 3.18E+05
    #> 2 345678 12.34% NA 1,271,591.00 52.63%
    #> 3 234567 1.23E+09 three NA 0.22
    #> 4 NA 3 1/7 four $A$1 123,456.00
    #> 5 567890 $0.36 five NA 317,898
    default = “literal values”

    View full-size slide

  39. #> Source: local data frame [6 x 3]
    #>
    #> literal_value input_value numeric_value
    #> (chr) (chr) (chr)
    #> 1 Number_wFormat Number_wFormat NA
    #> 2 654,321 654321 654321.0
    #> 3 12.34% 12.34% 0.1234
    #> 4 1.23E+09 1234567890 1.23456789E9
    #> 5 3 1/7 3.14159265359 3.14159265359
    #> 6 $0.36 0.36 0.36

    View full-size slide

  40. #> Source: local data frame [6 x 3]
    #>
    #> literal_value input_value
    #> (chr) (chr)
    #> 1 Formulas Formulas
    #> 2 Google =HYPERLINK("http://www.google.com/","Google")
    #> 3 1,271,591.00 =sum(R[-1]C[-3]:R[3]C[-3])
    #> 4 =IMAGE("https://www.google.com/images/srpr/logo3w.png"
    #> 5 $A$1 =ADDRESS(1,1)
    #> 6 =SPARKLINE(R[-4]C[-3]:R[0]C[-3])
    #> Variables not shown: numeric_value (chr).

    View full-size slide

  41. #> Source: local data frame [6 x 3]
    #>
    #> literal_value input_value numeric_value
    #> (chr) (chr) (chr)
    #> 1 Formula_wFormat Formula_wFormat NA
    #> 2 3.18E+05 =average(R[0]C[-4]:R[4]C[-4]) 317897.75
    #> 3 52.63% =R[-1]C[-4]/R[1]C[-4] 0.5263144432081239
    #> 4 0.22 =R[-2]C[-4]/R[2]C[-4] 0.21739421366813996
    #> 5 123,456.00 =min(R[-3]C[-4]:R[1]C[-4]) 123456.0
    #> 6 317,898 =average(R2C1:R6C1) 317897.75

    View full-size slide

  42. #> Source: local data frame [5 x 5]
    #>
    #> Number Number_wFormat Character Formulas Formula_wFormat
    #> (int) (chr) (chr) (chr) (chr)
    #> 1 123456 654,321 one Google 3.18E+05
    #> 2 345678 12.34% NA 1,271,591.00 52.63%
    #> 3 234567 1.23E+09 three NA 0.22
    #> 4 NA 3 1/7 four $A$1 123,456.00
    #> 5 567890 $0.36 five NA 317,898
    now possible = “literal values,
    except when then input value is more appropriate,
    but sometimes the numeric value instead”

    View full-size slide

  43. I wish we could automatically get the data and tidy it!
    And parse those formulas.
    A worthy project for the future….

    View full-size slide