
Growing your inner data scientist

Mine Cetinkaya-Rundel

September 27, 2021

Transcript

  1. growing your inner data scientist:
    tips to success in data science,
    from questions to results
    Mine Çetinkaya-Rundel
    Duke University + RStudio
    🔗 bit.ly/grow-ds-21
    mine-cetinkaya-rundel
    @minebocek

  2. professor of statistics
    data scientist
    book author
    minebocek
    mine-cetinkaya-rundel
    mine
    citizenstatistician


  3. what is data science?

  4. datascience.berkeley.edu/about/what-is-data-science
    r4ds.had.co.nz/explore-intro.html
    oreilly.com/library/view/doing-data-science/9781449363871/ch01.html

  5. data science is vague and evolving…

  6. 1 always be curious

  7. keep informed
    books
    articles
    blogs

  9. keep current


  11. keep engaged
    conferences
    workshops
    meetups
    webinars


  13. 2 improve your workflow

  14. rstats.wtf


  15. r4ds.had.co.nz


  16. 3 share your output

  17. David Robinson
    @drob
    @rstudio::conf 2019, The Unreasonable Effectiveness of Public Work
    How I used to think of my goals:
    Less valuable → More valuable
    Idea → Preliminary results → Draft manuscript → Completed manuscript → Published paper
    How I should have been thinking of them:
    Less valuable: anything still on your computer (data, code, results, draft, finished paper)
    More valuable: anything out in the world (paper, preprint, product, blog post, open source, tweet)

  18. share the things you create


  19. share the things you create
    big


  20. datasciencebox.org

  22. share the things you create
    little


  24. share the things you learn


  25. Mara Averick
    @dataandme
    EARL 2017, leaRning out loud
    SOMETIMES I GO ON TWITTER,
    AND I TEND TO LEARN OUT LOUD

  28. # March 2019
    library(tidyverse)
    ggplot(mtcars, aes(x = wt, y = mpg)) %>%
      geom_point()
    #> Error: `mapping` must be created by `aes()`
    #> Did you use %>% instead of +?
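The error message on this slide already names the fix: ggplot2 layers are added with `+`, while the pipe hands the plot object to `geom_point()` as its first argument. A minimal corrected sketch follows; loading only ggplot2 instead of the whole tidyverse is an assumption made to keep the example small.

```r
library(ggplot2)

# Layers are added to a plot with `+`. Piping instead would pass the
# plot object to geom_point() as its `mapping` argument, hence the error.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

print(p)
```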

  29. share your questions


  30. Thiago Macieira
    “The Art of Problem Solving.” In Open Advice: FOSS: What We Wish We Had Known
    When We Started, edited by Lydia Pintscher, 55–61.
    THE MOST USELESS PROBLEM STATEMENT THAT ONE CAN FACE IS
    “IT DOESN’T WORK”,
    YET WE SEEM TO GET IT FAR TOO OFTEN.

  31. TEN SIMPLE RULES FOR GETTING HELP
    FROM ONLINE SCIENTIFIC COMMUNITIES
    1. Don’t be afraid to ask a question
    2. State the question clearly
    3. Learn established customs before posting
    4. Don’t ask what has already been answered
    5. Always use a good title
    6. Do your homework before posting
    7. Proofread your post
    8. Be courteous to other forum members
    9. Remember that the archive of your question can be helpful to others
    10. Give back to the community
    Dall’Olio, Giovanni M., Jacopo Marino, Michael Schubert, Kevin L. Keys, Melanie I. Stefan, Colin S. Gillespie, Pierre Poulain, et al. 2011. “Ten Simple Rules for Getting Help from Online Scientific Communities.” PLoS Computational Biology 7 (9): 10–12. doi:10.1371/journal.pcbi.1002202.

  32. suppose…
    # Goal: "1 a" "2 b" "3 c" "4 d" "5 e"


  33. 🤷 Q
    I’m trying to create the following vector in R:
    "1 a" "2 b" "3 c" "4 d" "5 e"
    So I define X to be 1:5 and
    Y to be the first 5 letters of the alphabet,
    but when I add them I get the following error.
    Error in x + y : non-numeric argument to binary operator

  34. 🤷 Q
    I’m trying to create the following vector in R:
    "1 a" "2 b" "3 c" "4 d" "5 e"
    Below is a screenshot of what I tried.
    Why is it not working?

  35. writing good questions
    library(reprex)
    Prepare reproducible examples for posting to GitHub
    issues, StackOverflow, or Slack snippets.
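As a rough illustration of what this slide describes, the sketch below passes the failing code from the running example to `reprex::reprex()` as a character vector. The variable name `code_lines`, the GitHub venue, and the guard around the call are choices made for this example, not something shown in the talk.

```r
# Code for the question, one line per element.
code_lines <- c(
  "x <- 1:5",
  "y <- letters[1:5]",
  "x + y"
)

# Render only when reprex is installed and the session is interactive,
# since reprex() normally places its rendered result on the clipboard.
if (requireNamespace("reprex", quietly = TRUE) && interactive()) {
  reprex::reprex(input = code_lines, venue = "gh")
}
```

The rendered result contains the code together with the error it produces, ready to paste into an issue or forum post.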

  36. 🤷 Q
    I’m trying to create the following vector in R:
    "1 a" "2 b" "3 c" "4 d" "5 e"
    Below is what I tried.
    What does this error mean, and how can I fix it?
    x <- 1:5
    y <- letters[1:5]
    x + y
    #> Error in x + y: non-numeric argument to binary operator
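The slides leave the question itself unanswered. One possible answer, offered as a sketch and not as part of the talk: `+` is defined only for numeric operands, so combining numbers with letters calls for `paste()`, which coerces both vectors to character and joins them element by element.

```r
x <- 1:5
y <- letters[1:5]

# paste() is vectorised and uses a single space as its default separator.
result <- paste(x, y)
print(result)
#> [1] "1 a" "2 b" "3 c" "4 d" "5 e"
```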

  37. 4 contribute to community

  38. find open source projects you enjoy,
    and start contributing

  39. contribute to books


  42. contribute to packages


  44. contributing to oss
    get the pulse of a project
    read the code
    watch the repo
    discuss your ideas
    make a pull request
    review CoC + contributing guide

  45. 5 collaborate with others

  46. collaborate on process


  48. collaborate in class


  49. [placeholder text]
    blog post
    portfolio entry
    competition submission

  50. collaborate outside class


  51. John M. Chambers Statistical Software Award
    🔗 stat-computing.org/awards/jmc
    ASA StatComp Student Paper Competition
    🔗 stat-computing.org/awards/student
    Kaggle: Prediction competition
    🔗 kaggle.com/competitions

  52. 6 broadcast your work

  53. make data visualizations


  54. 🗓 Every Tuesday
    🔗 github.com/rfordatascience/tidytuesday
    🐦 #TidyTuesday

  55. speak at events


  57. write blog posts


  58. bookdown.org/yihui/blogdown
    apreshill.com/blog/2020-12-new-year-new-blogdown


  59. keeping a blog alive
    find co-authors
    keep it regular
    write themed posts
    review events


  60. 1 always be curious
    2 improve your workflow
    3 share your output
    4 contribute to community
    5 collaborate with others
    6 broadcast your work

  62. growing your inner data scientist
    Mine Çetinkaya-Rundel
    Duke University + RStudio
    🔗 bit.ly/grow-ds-21
    mine-cetinkaya-rundel
    @minebocek