Upgrade to Pro — share decks privately, control downloads, hide ads and more …

googlesheets

 googlesheets

rOpenSci Community Call

Jennifer (Jenny) Bryan

March 02, 2016
Tweet

More Decks by Jennifer (Jenny) Bryan

Other Decks in Technology

Transcript

  1. googlesheets
    @JennyBryan
    @jennybc
    @STAT545
    stat545-ubc.github.io

    View Slide

  2. For the
    rOpenSci Community Call
    2016 Mar 02

    View Slide

  3. +
    googlesheets =
    R wrapper around the
    Sheets & Drive APIs
    +

    View Slide

  4. thanks
    Joanna Zhao!

    View Slide

  5. View Slide

  6. Why would a UseR use a Google Sheet?
    collaboration
    - link, don’t attach!
    built-in functions to import web data
    - importxml(), importhtml(), importfeed()
    magic templates
    - TAGS, Amazon Price Tracker
    web services
    - ifttt.com
    Google
    - Forms
    - Explore
    - Statistics for Google Sheets

    View Slide

  7. View Slide

  8. Visit the Sheet: All the #rstats pipe tweets
    https://ifttt.com/

    View Slide

  9. import.io
    http://newsdev.github.io/driveshaft/

    View Slide

  10. https://www.cloudstitch.com
    26 Amazon Dash Button Hacks You Probably Didn't Know About

    View Slide

  11. Why a Google Sheet?
    They are of the web.
    They are eminently hackable.

    View Slide

  12. Why important to have an R
    interface to Google Sheets?
    Seriously?

    View Slide

  13. The participant who loses the most
    weight (as measured by the final
    weight as a percentage of his starting
    weight) shall be declared the winner.
    Jim
    Craig

    View Slide

  14. Craig won :(
    Read the contract on github.

    View Slide

  15. “Examples of permitted services
    include, but are not limited to,
    cooking, gardening, handiwork,
    caddying, office work, serving food
    and drinks or running errands.”

    View Slide

  16. ~1700 consecutive
    days of weight data

    View Slide

  17. iOS Android
    enter data from a phone
    enter data w/o WiFi

    View Slide

  18. library(googlesheets)
    ...
    jc_ss jc_dat jc_dat %
    mutate(date = as.Date(date, format = "%m/%d/%Y")) %>%
    gather("who", "weight", Jim, Craig)
    p geom_point(alpha = 0.35) + geom_smooth(se = FALSE, lwd = 2) +
    scale_y_continuous(labels = lbs_formatter)
    print(direct.label(p))

    View Slide

  19. View Slide

  20. I wanted live stats!

    View Slide

  21. vn_ss #> Sheet successfully identifed: "2015-05-23_seaRM-at-vanNH"
    game_play gs_read(vn_ss, ws = 10,
    range = cell_limits(c(2, NA), c(1, 2)))
    #> Accessing worksheet titled "10"
    head(game_play)
    #> Source: local data frame [6 x 2]
    #>
    #> Offense Defense
    #> 1 NA 42P
    #> 2 21pu NA
    #> 3 NA 29pu
    #> 4 NA 31
    #> 5 NA 29
    #> 6 NA 31

    View Slide

  22. point_info %
    gs_read_cellfeed(ws = 10, range = "D1:D4") %>%
    gs_simplify_cellfeed(col_names = FALSE)
    #> Accessing worksheet titled "10"
    point_info
    #> D1 D2 D3 D4
    #> "Seattle Rainmakers" "1" "3:12:00" "1:49:00"

    View Slide

  23. install.packages(“googlesheets”)
    ## highly recommend you use dev version
    library(devtools)
    install_github(“jennybc/googlesheets”)

    View Slide

  24. gs_*

    View Slide

  25. > gs_ls("Gap")
    Source: local data frame [5 x 10]
    sheet_title author perm version updated
    (chr) (chr) (chr) (chr) (time)
    1 Gapminder_copy gspreadr rw new 2016-02-19 07:42:37
    2 Gapminder 2007 Can Write gspreadr rw new 2015-04-15 19:33:55
    3 Gapminder gspreadr rw new 2015-03-23 20:59:10
    4 Gapminder_old gspreadr rw new 2015-03-23 20:23:06
    5 Gapminder by Continent joannaazhao rw new 2015-01-20 22:23:56
    Variables not shown: sheet_key (chr), ws_feed (chr), alternate (chr), self
    (chr),
    alt_key (chr).

    View Slide

  26. ss ss ss )
    register
    a Sheet

    View Slide

  27. > ss > ss
    Spreadsheet title: Gapminder
    Spreadsheet author: gspreadr
    Date of googlesheets registration: 2016-03-02 07:02:38 GMT
    Date of last spreadsheet update: 2015-03-23 20:34:08 GMT
    visibility: private
    permissions: rw
    version: new
    Contains 5 worksheets:
    (Title): (Nominal worksheet extent as rows x columns)
    Africa: 625 x 6
    Americas: 301 x 6
    Asia: 397 x 6
    Europe: 361 x 6
    Oceania: 25 x 6
    Key: 1HT5B8SgkKqHdqHJmn5xiuaC04Ngb7dG9Tv94004vezA
    Browser URL: https://docs.google.com/spreadsheets/d/
    1HT5B8SgkKqHdqHJmn5xiuaC04Ngb7dG9Tv94004vezA/

    View Slide

  28. ss gs_browse(ss)

    View Slide

  29. gs_gap()
    gs_gap_key()
    gs_gap_url()
    Gapminder Sheet
    available for
    examples, testing,
    practice

    View Slide

  30. gs_upload()
    gs_rename()
    gs_download()
    gs_delete()
    gs_vecdel()
    gs_grepdel()
    gs_new()
    gs_copy()

    View Slide

  31. gs_upload()
    .xls .xlsx .xlt .xltx .xltm .xlsm
    .csv .tsv .txt .tab
    .ods

    View Slide

  32. gs_download()
    .csv .pdf .xlsx

    View Slide

  33. gs_ws_ls()
    gs_ws_new()
    gs_ws_rename()
    gs_ws_delete()

    View Slide

  34. gs_read()

    View Slide

  35. gs_read(..., ws = 3, ...)
    gs_read(..., ws = “Europe”, ...)
    gs_read(..., range = "D12:F15")
    gs_read(..., range = "R1C12:R6C15")
    gs_read(..., range = cell_limits(c(3, NA), c(NA, 10)))
    gs_read(..., range = cell_rows(1:100))
    gs_read(..., range = cell_cols(3:8))
    gs_read(..., range = cell_cols("B:MZ"))
    gs_read(..., range = anchored("B4", dim = c(2, 10)))
    limit cell
    consumption

    View Slide

  36. gap_ss oceania_crazy gs_read(gap_ss,
    ws = "Oceania",
    range = cell_rows(2:4),
    col_names = paste0("z", 1:6),
    col_types = "ccncnn",
    na = "1962")
    Source: local data frame [3 x 6]
    z1 z2 z3 z4 z5 z6
    (chr) (chr) (dbl) (chr) (dbl) (dbl)
    1 Australia Oceania 1952 69.12 8691212 10039.60
    2 Australia Oceania 1957 70.33 9712569 10949.65
    3 Australia Oceania NA 70.93 10794968 12217.23
    readr-style
    data ingest

    View Slide

  37. gs_new()
    gs_ws_new()
    gs_edit_cells()
    gs_add_row()

    View Slide

  38. library(googlesheets)
    cdp cell_contents ss gs_browse(ss) ## you'll have to change the row size in the browser
    Visit the Sheet
    Visit the Gist

    View Slide

  39. gs_auth()
    gs_user()

    View Slide

  40. Technical bits of interest
    flexible auth flow:
    - “just works” for most people on 1st try
    - unattended use w/ tokens stored to file
    - works with Shiny
    we are actually testing
    - against the Sheets and Drive APIs
    - including operations that require authentication
    - on Travis-CI
    - vignette about that

    View Slide

  41. Deep thoughts
    My first “real” R package = inflicted on other people.
    Pros
    - It is actually useful and I wanted it to use it.
    - Pretty broad appeal.
    Cons:
    - I had no idea what I was getting into. None. Nada.
    - “What’s an API? XML? Oauth2?!?”
    What is/was hard about it has little to do with R, which
    is what I actually knew something about.

    View Slide

  42. XML

    View Slide

  43. OAuth2

    View Slide

  44. Near term issues
    Refactoring OMG
    Strip numeric formatting (done-ish)
    Access formulas (done-ish)
    Navigate and manipulate Drive directory structure
    Get a handle on
    - browser actions re: sheet publishing and sharing
    - affect on Drive notions of permissions, “capabilities”
    - implications for API access via exportcsv link, list feed, cell feed
    Don’t require/send token when I don’t need auth!
    Encoding woes

    View Slide

  45. ffs_read_csv
    #> Source: local data frame [5 x 5]
    #>
    #> Number Number_wFormat Character Formulas Formula_wFormat
    #> (int) (chr) (chr) (chr) (chr)
    #> 1 123456 654,321 one Google 3.18E+05
    #> 2 345678 12.34% NA 1,271,591.00 52.63%
    #> 3 234567 1.23E+09 three NA 0.22
    #> 4 NA 3 1/7 four $A$1 123,456.00
    #> 5 567890 $0.36 five NA 317,898
    default = “literal values”

    View Slide

  46. #> Source: local data frame [6 x 3]
    #>
    #> literal_value input_value numeric_value
    #> (chr) (chr) (chr)
    #> 1 Number_wFormat Number_wFormat NA
    #> 2 654,321 654321 654321.0
    #> 3 12.34% 12.34% 0.1234
    #> 4 1.23E+09 1234567890 1.23456789E9
    #> 5 3 1/7 3.14159265359 3.14159265359
    #> 6 $0.36 0.36 0.36

    View Slide

  47. #> Source: local data frame [6 x 3]
    #>
    #> literal_value input_value
    #> (chr) (chr)
    #> 1 Formulas Formulas
    #> 2 Google =HYPERLINK("http://www.google.com/","Google")
    #> 3 1,271,591.00 =sum(R[-1]C[-3]:R[3]C[-3])
    #> 4 =IMAGE("https://www.google.com/images/srpr/logo3w.png"
    #> 5 $A$1 =ADDRESS(1,1)
    #> 6 =SPARKLINE(R[-4]C[-3]:R[0]C[-3])
    #> Variables not shown: numeric_value (chr).

    View Slide

  48. #> Source: local data frame [6 x 3]
    #>
    #> literal_value input_value numeric_value
    #> (chr) (chr) (chr)
    #> 1 Formula_wFormat Formula_wFormat NA
    #> 2 3.18E+05 =average(R[0]C[-4]:R[4]C[-4]) 317897.75
    #> 3 52.63% =R[-1]C[-4]/R[1]C[-4] 0.5263144432081239
    #> 4 0.22 =R[-2]C[-4]/R[2]C[-4] 0.21739421366813996
    #> 5 123,456.00 =min(R[-3]C[-4]:R[1]C[-4]) 123456.0
    #> 6 317,898 =average(R2C1:R6C1) 317897.75

    View Slide

  49. #> Source: local data frame [5 x 5]
    #>
    #> Number Number_wFormat Character Formulas Formula_wFormat
    #> (int) (chr) (chr) (chr) (chr)
    #> 1 123456 654,321 one Google 3.18E+05
    #> 2 345678 12.34% NA 1,271,591.00 52.63%
    #> 3 234567 1.23E+09 three NA 0.22
    #> 4 NA 3 1/7 four $A$1 123,456.00
    #> 5 567890 $0.36 five NA 317,898
    now possible = “literal values,
    except when then input value is more appropriate,
    but sometimes the numeric value instead”

    View Slide

  50. I wish we could automatically get the data and tidy it!
    And parse those formulas.
    A worthy project for the future….

    View Slide

  51. View Slide