Growing your inner data scientist

So you've recently found a passion for data science. What next? In this talk I'll try to convince you of the value and effectiveness of sharing your work, contributing to open source projects, collaborating with others, and broadcasting your accomplishments, from small to big. Beyond the why, I will also discuss how to do each of these effectively and efficiently, including pointers to tools that streamline building a public portfolio.

Mine Cetinkaya-Rundel

October 06, 2020

Transcript

  1. growing your inner data scientist
    Mine Çetinkaya-Rundel
    University of Edinburgh + RStudio
    bit.ly/grow-ds-
    mine-cetinkaya-rundel
    @minebocek

    View Slide

  2. statistician
    data scientist
    book author
    minebocek
    mine-cetinkaya-rundel
    mine
    citizenstatistician
    mine

    View Slide

  3. what is
    data science?

    View Slide

  4. datascience.berkeley.edu/about/what-is-data-science
    r4ds.had.co.nz/explore-intro.html
    oreilly.com/library/view/doing-data-science/9781449363871/ch01.html

    View Slide

  5. data science
    is evolving…

    View Slide

  6. 1 always be curious

    View Slide

  7. keep current
    books
    articles
    blogs

    View Slide

  8. View Slide

  9. keep engaged
    conferences
    workshops
    meetups
    webinars

    View Slide

  10. View Slide

  11. 2 improve your workflow

    View Slide

  12. work reproducibly
    use version control

    View Slide

  13. r4ds.had.co.nz

    View Slide

  14. ropensci.github.io/reproducibility-guide/

    View Slide

  15. whattheyforgot.org

    View Slide

  16. 3 share your output

    View Slide

  17. David Robinson
    @drob
    @rstudio::conf 2019, The Unreasonable Effectiveness of Public Work
    How I used to think of my goals (from less valuable to more valuable):
    Idea, Preliminary results, Draft manuscript, Completed manuscript, Published paper
    How I should have been thinking of them (from less valuable to more valuable):
    Anything still on your computer (Data, code, results, draft, finished paper)
    Anything out in the world (Paper, preprint, product, blog post, open source, tweet)

    View Slide

  18. share the things you create

    View Slide

  19. share the things you create
    big

    View Slide

  20. datasciencebox.org

    View Slide

  21. View Slide

  22. share the things you create
    little

    View Slide

  23. View Slide

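  24. library(tidyverse)
    library(rtweet)
    library(glue)
    # Pull up to 3200 tweets from the account's timeline
    tml <- get_timelines("CostcoRiceBag", n = 3200)
    br <- tml %>%
      # Keep only tweets that are not replies
      filter(is.na(reply_to_screen_name)) %>%
      # Keep the rows from the "IS IT JUST ME" tweet through the
      # last tweet matching "[bB]lows"
      slice(
        which(str_detect(text, "IS IT JUST ME")):
          max(which(str_detect(text, "[bB]lows")))
      ) %>%
      # Extract the first word of each remaining tweet
      mutate(first_word = word(text, 1))
    # Join the first words into a single string
    glue_collapse(br$first_word, sep = " ")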

    View Slide

  31. Reality Open Your Eyes Look Up To The Skies And “See I’m Just A Poor Boy: I Need: No
    Sympathy Because I’m EASY “Come Easy Go Little High Little Low Any Way The Wind
    *blows *doesn’t Really Matter To Me To Me Mama Just Killed A Man, Put A Gun Against
    His Head Pulled My Trigger Now He’s Dead Mama Life Had Just Begun But Now I’ve “Gone”
    And "Thrown It All Away Mama Ooh Didn’t Mean To *make You Cry If I’m Not Back AGAIN?!
    This Time Tomorrow Carry On Carry On As If Nothing, Really Matters Too Late My Time
    Has Come *sends *shivers Down My Spine Body’s Aching All The Time Goodbye EVERYBODY
    I’ve Got To GO Gotta Leave You All Behind “And Face The Truth Mama Ooh Any Way The
    *wind Blows I Don’t WANNA Die I Sometimes Wish I’d Never Been “Born At All I See A
    Little Shiloetto Of A. Man scaramouche: Scaramouche: Will You Do The Fandango
    Thunderbolt And “Lightning Very Very Frightening Me: Galileo Galileo Galileo Galileo
    Galileo Figaro Magnifico. I’m JUST A Poor Boy Nobody *loves Me. He’s Just A Poor Boy:
    From A Poor Family Spare Him: His Life From This Monstrosity. Easy Come Easy Go Will
    You Let Me Go BISMILLAH! No ... We Will Not Let You Go Let Him: Go Bismillah! We Will
    Not Let You GO Let Him: Go Bismillah, We Will Not Let You GO, Let Me Go Will Not Let
    You: Go Let Me Go NEVER Let You Go Never Never Never Never Let Me: Go Oh O Oh, Oh, NO
    No No No No No No Oh MAMA Mia “Mama M.I.A. Mama/ Mia: Let Me Go Beezlebub Has A Devil
    Put Aside For Me “For Me For Me: So You Think You Can Stone Me: And Spit In My Eye So
    You: Think You Can Love Me And Leave Me: To Die Oh “Baby Can’t Do This To Me: Baby
    JUST Gotta Get Out Just Gotta Get Right Outta Here oOooOoOo Oooh Yeah Ooh Yeah,
    Nothing, Really Matters Anyone Can See Nothing Really “Matters Nothing Really Matters

    View Slide
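
The pipeline on slide 24 needs Twitter API access, so it is hard to rerun as-is. The sketch below applies the same filter, slice, first-word, and collapse steps to a small made-up data frame. The `toy_tml` tibble, its tweet text, and the names `toy_br` and `toy_message` are hypothetical stand-ins, not data from the talk.

```r
library(dplyr)
library(stringr)
library(glue)

# Hypothetical stand-in for a timeline; not real tweets
toy_tml <- tibble(
  text = c(
    "unrelated tweet before the run",
    "IS IT JUST ME or is it cold",
    "this coffee is great",
    "@someone thanks for the tip",
    "the weather is nice",
    "real talk: I like rice",
    "life blows by quickly",
    "unrelated tweet after the run"
  ),
  reply_to_screen_name = c(NA, NA, NA, "someone", NA, NA, NA, NA)
)

toy_br <- toy_tml %>%
  # Keep only tweets that are not replies
  filter(is.na(reply_to_screen_name)) %>%
  # Keep the rows from the "IS IT JUST ME" tweet through the
  # last tweet matching "[bB]lows"
  slice(
    which(str_detect(text, "IS IT JUST ME")):
      max(which(str_detect(text, "[bB]lows")))
  ) %>%
  # Extract the first word of each remaining tweet
  mutate(first_word = word(text, 1))

# Join the first words into a single string
toy_message <- glue_collapse(toy_br$first_word, sep = " ")
toy_message
#> IS this the real life
```

The reply is dropped before `slice()` runs, so the row positions passed to `slice()` refer to the filtered data, just as in the original pipeline.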

  32. share the things you learn

    View Slide

  33. Mara Averick
    @dataandme
    EARL 2017, leaRning out loud
    SOMETIMES I GO ON TWITTER,
    AND I TEND TO LEARN OUT LOUD

    View Slide

  34. View Slide

  35. # March 2019
    library(tidyverse)
    ggplot(mtcars, aes(x = wt, y = mpg)) %>%
      geom_point()
    #> Error: `mapping` must be created by `aes()`
    #> Did you use %>% instead of +?

    View Slide
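
The error message on slide 35 already hints at the fix: ggplot2 layers are added with `+`, not piped with `%>%`. A minimal corrected version, assuming the ggplot2 package is installed (the object name `p` is only for illustration):

```r
library(ggplot2)

# Layers are combined with `+`; the pipe would instead pass the plot
# object to geom_point() as its `mapping` argument, causing the error
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

p
```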

  36. share your questions

    View Slide

  37. Thiago Macieira
    “The Art of Problem Solving.” In Open Advice: FOSS: What We Wish We Had Known
    When We Started, edited by Lydia Pintscher, 55–61.
    THE MOST USELESS PROBLEM
    STATEMENT THAT ONE CAN FACE IS
    “IT DOESN’T WORK”,
    YET WE SEEM TO GET IT
    FAR TOO OFTEN.

    View Slide

  38. TEN SIMPLE RULES FOR GETTING HELP
    FROM ONLINE SCIENTIFIC COMMUNITIES
    1. Don’t be afraid to ask a question
    2. State the question clearly
    3. Learn established customs before posting
    4. Don’t ask what has already been answered
    5. Always use a good title
    6. Do your homework before posting
    7. Proofread your post
    8. Be courteous to other forum members
    9. Remember that the archive of your question
    can be helpful to others
    10. Give back to the community
    Dall’Olio, Giovanni M., Jacopo Marino, Michael Schubert, Kevin L. Keys, Melanie I. Stefan, Colin S. Gillespie, Pierre Poulain, et al. 2011. “Ten Simple Rules for Getting Help from Online Scientific Communities.” PLoS Computational
    Biology 7 (9): 10–12. doi:10.1371/journal.pcbi.1002202.

    View Slide

  39. suppose…
    # Goal: "1 a" "2 b" "3 c" "4 d" "5 e"

    View Slide

  40. I’m trying to create the following vector in R:
    "1 a" "2 b" "3 c" "4 d" "5 e"
    So I define X to be 1:5 and
    Y to be the first 5 letters of the alphabet,
    but when I add them I get the following error.
    Error in x + y : non-numeric argument to
    binary operator

    Q

    View Slide

  41. I’m trying to create the following vector in R:
    "1 a" "2 b" "3 c" "4 d" "5 e"
    Below is a screenshot of what I tried.
    Why is it not working?

    Q

    View Slide

  42. writing good questions
    library(reprex)
    Prepare reproducible examples for posting to GitHub
    issues, StackOverflow, or Slack snippets.

    View Slide

  43. I’m trying to create the following vector in R:
    "1 a" "2 b" "3 c" "4 d" "5 e"
    Below is what I tried.
    What does this error mean, and how can I fix it?

    Q
    x <- 1:5
    y <- letters[1:5]
    x + y
    #> Error in x + y: non-numeric argument to binary operator

    View Slide
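
One possible answer to the question on slide 43, which the deck itself does not show: `+` is defined only for numbers, so combining a number with a letter needs a string function instead. A minimal sketch using base R's `paste()` (the name `result` is only for illustration):

```r
x <- 1:5
y <- letters[1:5]

# paste() converts both vectors to character and joins them
# element by element, separated by a space by default
result <- paste(x, y)
result
#> [1] "1 a" "2 b" "3 c" "4 d" "5 e"
```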

  44. 4 contribute to community

    View Slide

  45. find open source projects
    you enjoy,
    and start contributing

    View Slide

  46. contribute to books

    View Slide

  47. View Slide

  48. View Slide

  49. contribute to packages

    View Slide

  50. View Slide

  51. contributing to oss
    get the pulse of a project
    read the code
    watch the repo
    discuss your ideas
    make a pull request
    review CoC + contributing guide

    View Slide

  52. readr.tidyverse.org/reference/index.html
    how to

    View Slide

  53. View Slide

  54. 5 collaborate with others

    View Slide

  55. collaborate on process

    View Slide

  56. View Slide

  57. collaborate in class

    View Slide

  58. blog post
    portfolio entry
    competition submission

    View Slide

  59. USRESP: Undergraduate Research Project Competition
    Friday, 18 Dec 2020
    causeweb.org/usproc/usresp
    USCLAP: Undergraduate Class Project Competition
    Friday, 18 Dec 2020
    causeweb.org/usproc/usclap
    Kaggle: Prediction competition
    kaggle.com/competitions

    View Slide

  60. collaborate outside class

    View Slide

  61. bit.ly/df-edi

    View Slide

  62. 6 broadcast your work

    View Slide

  63. make data visualizations

    View Slide

  64. Every Tuesday
    github.com/rfordatascience/tidytuesday
    #TidyTuesday

    View Slide

  65. speak at events

    View Slide

  66. View Slide

  67. write blog posts

    View Slide

  68. bookdown.org/yihui/blogdown
    alison.rbind.io/post/up-and-running-with-blogdown

    View Slide

  69. keeping a blog alive
    find co-authors
    keep it regular
    write themed posts
    review events

    View Slide

  70. 1 always be curious
    2 improve your workflow
    3 share your output
    4 contribute to community
    5 collaborate with others
    6 broadcast your work

    View Slide

  71. growing your inner data scientist
    Mine Çetinkaya-Rundel
    University of Edinburgh + Duke University + RStudio
    bit.ly/grow-ds-future
    mine-cetinkaya-rundel
    @minebocek

    View Slide