Growing your inner data scientist

So you've recently found a passion for data science. What next? In this talk I'll try to convince you of the value and effectiveness of sharing your work, contributing to open source projects, collaborating with others, and broadcasting your accomplishments, from small to big. Beyond the why, I will also discuss how you can do each of these effectively and efficiently, including pointers to tools you can use to streamline the process of building a public portfolio.

Mine Cetinkaya-Rundel

October 06, 2020

Transcript

  1. growing your inner
    data scientist
    Mine Çetinkaya-Rundel
    University of Edinburgh + RStudio
    bit.ly/grow-ds-
    mine-cetinkaya-rundel
    @minebocek

  2. statistician
    data scientist
    book author
    minebocek
    mine-cetinkaya-rundel
    mine
    citizenstatistician
    mine

  3. what is
    data science?

  4. datascience.berkeley.edu/about/what-is-data-science
    r4ds.had.co.nz/explore-intro.html
    oreilly.com/library/view/doing-data-science/9781449363871/ch01.html

  5. data science
    is evolving…

  6. 1 always
    be
    curious

  7. keep current
    books
    articles
    blogs

  8. keep engaged
    conferences
    workshops
    meetups
    webinars

  9. 2 improve
    your
    workflow

  10. work reproducibly
    use version control

  11. r4ds.had.co.nz

  12. ropensci.github.io/reproducibility-guide/

  13. whattheyforgot.org

  14. 3 share
    your
    output

  15. David Robinson (@drob)
    rstudio::conf 2019, The Unreasonable Effectiveness of Public Work
    How I used to think of my goals:
    Less valuable to more valuable: idea, preliminary results, draft manuscript, completed manuscript, published paper
    How I should have been thinking of them:
    Less valuable: anything still on your computer (data, code, results, draft, finished paper)
    More valuable: anything out in the world (paper, preprint, product, blog post, open source, tweet)

  16. share the things you create

  17. share the things you create
    big

  18. datasciencebox.org

  19. share the things you create
    little

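  20. library(tidyverse)
    library(rtweet)
    library(glue)
    # fetch up to 3200 of the account's most recent tweets
    tml <- get_timelines("CostcoRiceBag", n = 3200)
    br <- tml %>%
      # drop replies, keeping only the account's own tweets
      filter(is.na(reply_to_screen_name)) %>%
      # keep the rows from the "IS IT JUST ME" tweet through the
      # last tweet that mentions "blows"
      slice(
        which(str_detect(text, "IS IT JUST ME")):
          max(which(str_detect(text, "[bB]lows")))
      ) %>%
      # pull the first word of each tweet
      mutate(first_word = word(text, 1))
    # join the first words into a single string
    glue_collapse(br$first_word, sep = " ")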
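
The slice-between-two-matches step on this slide can be tried without Twitter API access. The sketch below is a minimal illustration under stated assumptions: the `toy` tibble and its tweet texts are hypothetical stand-ins for the real timeline, and only the `slice()`, `str_detect()`, and `word()` calls mirror the slide.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the downloaded timeline (not real tweets)
toy <- tibble(
  text = c(
    "Nothing to see here",
    "Any IS IT JUST ME moments today?",
    "Way too early for this",
    "The queue was long",
    "Wind blows through the car park",
    "Later unrelated tweet"
  )
)

first_words <- toy %>%
  # keep rows from the first marker tweet through the last "blows" tweet
  slice(
    which(str_detect(text, "IS IT JUST ME")):
      max(which(str_detect(text, "[bB]lows")))
  ) %>%
  # first word of each remaining tweet
  mutate(first_word = word(text, 1)) %>%
  pull(first_word)

lyric <- paste(first_words, collapse = " ")
lyric
#> [1] "Any Way The Wind"
```

Rows outside the two marker tweets are dropped, so only the first words of the run in between end up in the collapsed string.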

  27. Reality Open Your Eyes Look Up To The Skies And “See I’m Just A Poor Boy: I Need: No
    Sympathy Because I’m EASY “Come Easy Go Little High Little Low Any Way The Wind
    *blows *doesn’t Really Matter To Me To Me Mama Just Killed A Man, Put A Gun Against
    His Head Pulled My Trigger Now He’s Dead Mama Life Had Just Begun But Now I’ve “Gone”
    And "Thrown It All Away Mama Ooh Didn’t Mean To *make You Cry If I’m Not Back AGAIN?!
    This Time Tomorrow Carry On Carry On As If Nothing, Really Matters Too Late My Time
    Has Come *sends *shivers Down My Spine Body’s Aching All The Time Goodbye EVERYBODY
    I’ve Got To GO Gotta Leave You All Behind “And Face The Truth Mama Ooh Any Way The
    *wind Blows I Don’t WANNA Die I Sometimes Wish I’d Never Been “Born At All I See A
    Little Shiloetto Of A. Man scaramouche: Scaramouche: Will You Do The Fandango
    Thunderbolt And “Lightning Very Very Frightening Me: Galileo Galileo Galileo Galileo
    Galileo Figaro Magnifico. I’m JUST A Poor Boy Nobody *loves Me. He’s Just A Poor Boy:
    From A Poor Family Spare Him: His Life From This Monstrosity. Easy Come Easy Go Will
    You Let Me Go BISMILLAH! No ... We Will Not Let You Go Let Him: Go Bismillah! We Will
    Not Let You GO Let Him: Go Bismillah, We Will Not Let You GO, Let Me Go Will Not Let
    You: Go Let Me Go NEVER Let You Go Never Never Never Never Let Me: Go Oh O Oh, Oh, NO
    No No No No No No Oh MAMA Mia “Mama M.I.A. Mama/ Mia: Let Me Go Beezlebub Has A Devil
    Put Aside For Me “For Me For Me: So You Think You Can Stone Me: And Spit In My Eye So
    You: Think You Can Love Me And Leave Me: To Die Oh “Baby Can’t Do This To Me: Baby
    JUST Gotta Get Out Just Gotta Get Right Outta Here oOooOoOo Oooh Yeah Ooh Yeah,
    Nothing, Really Matters Anyone Can See Nothing Really “Matters Nothing Really Matters

  28. share the things you learn

  29. Mara Averick
    @dataandme
    EARL 2017, leaRning out loud
    SOMETIMES I GO ON TWITTER,
    AND I TEND TO LEARN OUT LOUD

  30. # March 2019
    library(tidyverse)
    ggplot(mtcars, aes(x = wt, y = mpg)) %>%
    geom_point()
    #> Error: `mapping` must be created by `aes()`
    #> Did you use %>% instead of +?
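
For comparison, a minimal corrected version, assuming the intent was a plain scatterplot. ggplot2 layers are combined with `+`; with `%>%` the plot object is passed as the first argument of `geom_point()`, which is `mapping`, hence the error message on the slide.

```r
library(ggplot2)

# Layers are added with `+`, not piped with `%>%`
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

p
```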

  31. share your questions

  32. Thiago Macieira
    “The Art of Problem Solving.” In Open Advice: FOSS: What We Wish We Had Known
    When We Started, edited by Lydia Pintscher, 55–61.
    THE MOST USELESS PROBLEM
    STATEMENT THAT ONE CAN FACE IS
    “IT DOESN’T WORK”,
    YET WE SEEM TO GET IT
    FAR TOO OFTEN.

  33. TEN SIMPLE RULES FOR GETTING HELP
    FROM ONLINE SCIENTIFIC COMMUNITIES
    1. Don’t be afraid to ask a question
    2. State the question clearly
    3. Learn established customs before posting
    4. Don’t ask what has already been answered
    5. Always use a good title
    6. Do your homework before posting
    7. Proofread your post
    8. Be courteous to other forum members
    9. Remember that the archive of your question can be helpful to others
    10. Give back to the community
    Dall’Olio, Giovanni M., Jacopo Marino, Michael Schubert, Kevin L. Keys, Melanie I. Stefan, Colin S. Gillespie, Pierre Poulain, et al. 2011. “Ten Simple Rules for Getting Help from Online Scientific Communities.” PLoS Computational Biology 7 (9): 10–12. doi:10.1371/journal.pcbi.1002202.

  34. suppose…
    # Goal: "1 a" "2 b" "3 c" "4 d" "5 e"

  35. I’m trying to create the following vector in R:
    "1 a" "2 b" "3 c" "4 d" "5 e"
    So I define X to be 1:5 and
    Y to be the first 5 letters of the alphabet,
    but when I add them I get the following error.
    Error in x + y : non-numeric argument to
    binary operator

    Q

  36. I’m trying to create the following vector in R:
    "1 a" "2 b" "3 c" "4 d" "5 e"
    Below is a screenshot of what I tried.
    Why is it not working?

    Q

  37. writing good questions
    library(reprex)
    Prepare reproducible examples for posting to GitHub
    issues, StackOverflow, or Slack snippets.
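
A minimal sketch of how a reprex might be generated for the failing snippet used later in this deck. It assumes the reprex package (version 1.0.0 or later, for the `html_preview` argument) is installed along with its rmarkdown and pandoc dependencies; the `venue` choice is only an example.

```r
library(reprex)

# Render a small, self-contained snippet as GitHub-flavoured markdown.
# html_preview = FALSE skips opening the rendered preview.
out <- reprex(
  {
    x <- 1:5
    y <- letters[1:5]
    x + y
  },
  venue = "gh",
  html_preview = FALSE
)

# `out` holds the rendered lines, ready to paste into an issue or forum post
cat(out, sep = "\n")
```

Because the snippet runs in a fresh R session, anything it depends on (packages, data, objects) has to be created inside the braces, which is what makes the result reproducible for whoever reads the question.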

  38. I’m trying to create the following vector in R:
    "1 a" "2 b" "3 c" "4 d" "5 e"
    Below is what I tried.
    What does this error mean, and how can I fix it?

    Q
    x <- 1:5
    y <- letters[1:5]
    x + y
    #> Error in x + y: non-numeric argument to binary operator
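
One possible answer to this question, as a short sketch: `+` is an arithmetic operator in R and does not concatenate strings, while `paste()` converts both vectors to character and joins them element by element.

```r
x <- 1:5
y <- letters[1:5]

# paste() joins element-wise, separated by a single space by default
result <- paste(x, y)
result
#> [1] "1 a" "2 b" "3 c" "4 d" "5 e"
```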

  39. 4 contribute
    to
    community

  40. find open source projects
    you enjoy,
    and start contributing

  41. contribute to books

  42. contribute to packages

  43. contributing to oss
    get the pulse of a project
    read the code
    watch the repo
    discuss your ideas
    make a pull request
    review CoC + contributing guide

  44. readr.tidyverse.org/reference/index.html
    how to

  45. 5 collaborate
    with
    others

  46. collaborate on process

  47. collaborate in class

  48. blog post
    portfolio entry
    competition submission

  49. USRESP: Undergraduate Research Project Competition
    Friday, 18 Dec 2020
    causeweb.org/usproc/usresp
    USCLAP: Undergraduate Class Project Competition
    Friday, 18 Dec 2020
    causeweb.org/usproc/usclap
    Kaggle: Prediction competition
    kaggle.com/competitions

  50. collaborate outside class

  51. bit.ly/df-edi

  52. 6 broadcast
    your
    work

  53. make data visualizations

  54. Every Tuesday
    github.com/rfordatascience/tidytuesday
    #TidyTuesday

  55. speak at events

  56. write blog posts

  57. bookdown.org/yihui/blogdown
    alison.rbind.io/post/up-and-running-with-blogdown

  58. keeping a blog alive
    find co-authors
    keep it regular
    write themed posts
    review events

  59. 1 always be curious
    2 improve your workflow
    3 share your output
    4 contribute to community
    5 collaborate with others
    6 broadcast your work

  60. mine-cetinkaya-rundel
    @minebocek
    growing your
    inner data
    scientist
    bit.ly/grow-ds-future
    Mine Çetinkaya-Rundel
    University of Edinburgh + Duke University + RStudio
