
Growing your inner data scientist

So you've recently found a passion for data science. What next? In this talk I'll try to convince you of the value and effectiveness of sharing your work, contributing to open source projects, collaborating with others, and broadcasting your accomplishments, from small to big. Beyond the why, I will also discuss how to do these effectively and efficiently, including pointers to tools you can use to streamline building a public portfolio.

Mine Cetinkaya-Rundel

October 06, 2020

Transcript

  1. growing your inner data scientist. Mine Çetinkaya-Rundel, University of Edinburgh + RStudio. bit.ly/grow-ds- mine-cetinkaya-rundel cetinkaya.mine@gmail.com @minebocek
  2. statistician data scientist book author minebocek mine-cetinkaya-rundel mine citizenstatistician mine

  3. what is data science?

  4. datascience.berkeley.edu/about/what-is-data-science r4ds.had.co.nz/explore-intro.html oreilly.com/library/view/doing-data-science/9781449363871/ch01.html

  5. data science is evolving…

  6. 1 always be curious

  7. keep current: books, articles, blogs

  8. None
  9. keep engaged: conferences, workshops, meetups, webinars

  10. None
  11. 2 improve your workflow

  12. work reproducibly, use version control

  13. r4ds.had.co.nz

  14. ropensci.github.io/reproducibility-guide/

  15. whattheyforgot.org

  16. 3 share your output

  17. David Robinson (@drob) at rstudio::conf 2019, "The Unreasonable Effectiveness of Public Work"

    How I used to think of my goals, from less valuable to more valuable: Idea, Preliminary results, Draft manuscript, Completed manuscript, Published paper.
    How I should have been thinking of them: Less valuable: anything still on your computer (data, code, results, draft, finished paper). More valuable: anything out in the world (paper, preprint, product, blog post, open source, tweet).
  18. share the things you create

  19. share the things you create big

  20. datasciencebox.org

  21. None
  22. share the things you create little

  23. None
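  24. library(tidyverse)
    library(rtweet)
    library(glue)

    # Fetch up to 3200 tweets from the account's timeline
    tml <- get_timelines("CostcoRiceBag", n = 3200)

    br <- tml %>%
      # Keep only tweets that are not replies
      filter(is.na(reply_to_screen_name)) %>%
      # Keep rows from the tweet containing "IS IT JUST ME"
      # through the last tweet matching "[bB]lows"
      slice(
        which(str_detect(text, "IS IT JUST ME")):
          max(which(str_detect(text, "[bB]lows")))
      ) %>%
      # Extract the first word of each tweet
      mutate(first_word = word(text, 1))

    # Collapse the first words into a single string
    glue_collapse(br$first_word, sep = " ")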
  31. Reality Open Your Eyes Look Up To The Skies And

    “See I’m Just A Poor Boy: I Need: No Sympathy Because I’m EASY “Come Easy Go Little High Little Low Any Way The Wind *blows *doesn’t Really Matter To Me To Me Mama Just Killed A Man, Put A Gun Against His Head Pulled My Trigger Now He’s Dead Mama Life Had Just Begun But Now I’ve “Gone” And "Thrown It All Away Mama Ooh Didn’t Mean To *make You Cry If I’m Not Back AGAIN?! This Time Tomorrow Carry On Carry On As If Nothing, Really Matters Too Late My Time Has Come *sends *shivers Down My Spine Body’s Aching All The Time Goodbye EVERYBODY I’ve Got To GO Gotta Leave You All Behind “And Face The Truth Mama Ooh Any Way The *wind Blows I Don’t WANNA Die I Sometimes Wish I’d Never Been “Born At All I See A Little Shiloetto Of A. Man scaramouche: Scaramouche: Will You Do The Fandango Thunderbolt And “Lightning Very Very Frightening Me: Galileo Galileo Galileo Galileo Galileo Figaro Magnifico. I’m JUST A Poor Boy Nobody *loves Me. He’s Just A Poor Boy: From A Poor Family Spare Him: His Life From This Monstrosity. Easy Come Easy Go Will You Let Me Go BISMILLAH! No ... We Will Not Let You Go Let Him: Go Bismillah! We Will Not Let You GO Let Him: Go Bismillah, We Will Not Let You GO, Let Me Go Will Not Let You: Go Let Me Go NEVER Let You Go Never Never Never Never Let Me: Go Oh O Oh, Oh, NO No No No No No No Oh MAMA Mia “Mama M.I.A. Mama/ Mia: Let Me Go Beezlebub Has A Devil Put Aside For Me “For Me For Me: So You Think You Can Stone Me: And Spit In My Eye So You: Think You Can Love Me And Leave Me: To Die Oh “Baby Can’t Do This To Me: Baby JUST Gotta Get Out Just Gotta Get Right Outta Here oOooOoOo Oooh Yeah Ooh Yeah, Nothing, Really Matters Anyone Can See Nothing Really “Matters Nothing Really Matters
  32. share the things you learn

  33. Mara Averick (@dataandme), EARL 2017, "leaRning out loud": SOMETIMES I GO ON TWITTER, AND I TEND TO LEARN OUT LOUD
  34. None
  35. # March 2019
    library(tidyverse)
    ggplot(mtcars, aes(x = wt, y = mpg)) %>%
      geom_point()
    #> Error: `mapping` must be created by `aes()`
    #> Did you use %>% instead of +?
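The error message above already points to the fix: ggplot2 layers are combined with `+`, whereas `%>%` passes the plot object into `geom_point()` as its first argument (`mapping`). A minimal sketch of the corrected call follows; it is not from the slides, and it loads only ggplot2 rather than the full tidyverse as a simplifying assumption.

```r
library(ggplot2)

# Layers are added to a plot with `+`, not piped in with `%>%`.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Typing `p` at the console (or calling print(p)) draws the scatterplot.
```

Small discoveries like this one are the kind of thing the "share the things you learn" slide has in mind.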
  36. share your questions

  37. THE MOST USELESS PROBLEM STATEMENT THAT ONE CAN FACE IS “IT DOESN’T WORK”, YET WE SEEM TO GET IT FAR TOO OFTEN.

    Thiago Macieira, “The Art of Problem Solving.” In Open Advice: FOSS: What We Wish We Had Known When We Started, edited by Lydia Pintscher, 55–61.
  38. TEN SIMPLE RULES FOR GETTING HELP FROM ONLINE SCIENTIFIC COMMUNITIES

    1. Don’t be afraid to ask a question
    2. State the question clearly
    3. Learn established customs before posting
    4. Don’t ask what has already been answered
    5. Always use a good title
    6. Do your homework before posting
    7. Proofread your post
    8. Be courteous to other forum members
    9. Remember that the archive of your question can be helpful to others
    10. Give back to the community

    Dall’Olio, Giovanni M., Jacopo Marino, Michael Schubert, Kevin L. Keys, Melanie I. Stefan, Colin S. Gillespie, Pierre Poulain, et al. 2011. “Ten Simple Rules for Getting Help from Online Scientific Communities.” PLoS Computational Biology 7 (9): 10–12. doi:10.1371/journal.pcbi.1002202.
  39. suppose…
    # Goal: "1 a" "2 b" "3 c" "4 d" "5 e"
  40. Q: I’m trying to create the following vector in R: "1 a" "2 b" "3 c" "4 d" "5 e". So I define X to be 1:5 and Y to be the first 5 letters of the alphabet, but when I add them I get the following error.

    Error in x + y : non-numeric argument to binary operator
  41. Q: I’m trying to create the following vector in R: "1 a" "2 b" "3 c" "4 d" "5 e". Below is a screenshot of what I tried. Why is it not working?
  42. writing good questions: library(reprex). Prepare reproducible examples for posting to GitHub issues, StackOverflow, or Slack snippets.
  43. Q: I’m trying to create the following vector in R: "1 a" "2 b" "3 c" "4 d" "5 e". Below is what I tried. What does this error mean, and how can I fix it?

    x <- 1:5
    y <- letters[1:5]
    x + y
    #> Error in x + y: non-numeric argument to binary operator
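For completeness, here is one possible answer to the well-posed question above. It is a sketch added for illustration, not part of the original slides. In R, `+` is strictly arithmetic, so it fails on a character vector; base R's `paste()` combines vectors element by element as strings, separated by a space by default.

```r
x <- 1:5
y <- letters[1:5]

# paste() converts both vectors to character and joins them
# element-wise, using sep = " " by default.
result <- paste(x, y)

print(result)
#> [1] "1 a" "2 b" "3 c" "4 d" "5 e"
```

Because the question included the code and the exact error, an answer this short is enough, which is the point of posting a reproducible example.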
  44. 4 contribute to community

  45. find open source projects you enjoy, and start contributing

  46. contribute to books

  47. None
  48. None
  49. contribute to packages

  50. None
  51. contributing to oss: get the pulse of a project, read the code, watch the repo, discuss your ideas, make a pull request, review CoC + contributing guide
  52. readr.tidyverse.org/reference/index.html how to

  53. None
  54. 5 collaborate with others

  55. collaborate on process

  56. None
  57. collaborate in class

  58. [placeholder text standing in for a class project write-up] blog post, portfolio entry, competition submission …
  59. USRESP: Undergraduate Research Project Competition, Friday, 18 Dec 2020, causeweb.org/usproc/usresp
    USCLAP: Undergraduate Class Project Competition, Friday, 18 Dec 2020, causeweb.org/usproc/usclap
    Kaggle: Prediction competition … kaggle.com/competitions
  60. collaborate outside class

  61. bit.ly/df-edi

  62. 6 broadcast your work

  63. make data visualizations

  64. Every Tuesday github.com/rfordatascience/tidytuesday #TidyTuesday

  65. speak at events

  66. None
  67. write blog posts

  68. bookdown.org/yihui/blogdown alison.rbind.io/post/up-and-running-with-blogdown

  69. keeping a blog alive: find co-authors, keep it regular, write themed posts, review events
  70. 1 always be curious, 2 improve your workflow, 3 share your output, 4 contribute to community, 5 collaborate with others, 6 broadcast your work
  71. growing your inner data scientist. bit.ly/grow-ds-future. Mine Çetinkaya-Rundel, University of Edinburgh + Duke University + RStudio. mine-cetinkaya-rundel cetinkaya.mine@gmail.com @minebocek