Making data science accessible in the Johns Hopkins Data Science Lab

Stephanie Hicks
February 27, 2019

Transcript

  1. Making data science accessible in the Johns Hopkins Data Science Lab

    Stephanie Hicks, Assistant Professor, Biostatistics, Johns Hopkins Bloomberg School of Public Health • Faculty Member, Johns Hopkins Data Science Lab • @stephaniehicks
  2. About me (Johns Hopkins Bloomberg School of Public Health)

    • Teaching: Data Science • Research: Genomics (analyzing single-cell gene expression data) • R/Bioconductor user and developer (since 2009/2010) • Other fun things about me: • Co-founded Baltimore • Creating a children’s book featuring women statisticians and data scientists
  3. https://jhudatascience.org

  4. Who are we?

    The “OG”s: Roger, Brian, Jeff • Joined in 2018: Stephanie
  5. Education

  6. Massive Open Online Courses in Data Science

    • > 4 million enrolled • > 500K completed courses • > 200K completed the specialization
  7. Can MOOC Programs Improve Student Employment Prospects?

  8. We don’t just need practicing data scientists

  9. The E-book revolution

    • Variable pricing (including $0) • Readers get all edition updates • Author-friendly royalty split • Bound books through a third party
  11. Outreach

  15. The Data Science Lab Puppets

    • Creating children’s videos to teach young students about statistics and data science • The puppets have their own DSL YouTube channel and Twitter accounts: @LeekPuppet, @puppetpeng
  18. Research

  19. Why data science?

    Data science is rated the number one job by Glassdoor, and more than 350,000 new data science jobs are expected by 2020.
  20. What do I mean by “data science”?

  22. Here, I use the term “data science” to refer generally to Type A (“A” for analysis) data scientists, who process and interpret data to answer real-world questions.
  23. Data Science in Academia?

    • Statistics was born directly from developing solutions to practical data analysis problems (e.g., Francis Galton, Ronald Fisher) • Wild and Pfannkuch (1999) describe applied statistics as “part of the information gathering and learning process which, in an ideal world, is undertaken to inform decisions and actions. With industry, medicine and many other sectors of society increasingly relying on data for decision making, statistics should be an integral part of the emerging information era.” • A department that embraces applied statistics as defined above is a natural home for data science in academia
  24. What is missing in the current statistics curriculum?

    Wild and Pfannkuch (1999) complained that “Large parts of the investigative process, such as problem analysis and measurement, have been largely abandoned by statisticians and statistics educators to the realm of the particular, perhaps to be developed separately within other disciplines.” They add that “[t]he arid, context-free landscape on which so many examples used in statistics teaching are built ensures that large numbers of students never even see, let alone engage in, statistical thinking.”
  25. What is missing in the current statistics curriculum? Computing

    • Need more computing in the curriculum
  26. What is missing in the current statistics curriculum? Computing, Connecting

    • Need more computing in the curriculum • Need to teach how to connect the subject-matter question to an appropriate dataset and analysis tools
  27. What is missing in the current statistics curriculum? Computing, Connecting, Creating

    • Need more computing in the curriculum • Need to teach how to connect the subject-matter question to an appropriate dataset and analysis tools • Instead of being passive, teach students to be active and how to create and formulate questions to investigate hypotheses with data
  28. Bridging the gap in the statistics classroom to teach introductory data science courses
  29. Bridging the gap in the classroom to teach introductory data science courses

    • Educators need to be experienced themselves in creating, connecting, and computing • Encourage applied statisticians experienced in creating, connecting, and computing to become involved in the development of courses • Encourage statistics departments to reach out to practicing data analysts, perhaps in other departments or from other disciplines, to collaborate in developing these courses
  30. Principles of Teaching Data Science

  31. Principles of Teaching Data Science

    • Organize the course around a set of diverse case studies • Integrate computing into every aspect of the course • Teach abstraction, but minimize reliance on mathematical notation • Structure course activities to realistically mimic a data scientist’s experience • Demonstrate the importance of critical thinking / skepticism through examples
  32. [Figure: class survey results (counts), panels A-E: “What is your age?” (by gender: female, male; 18-24, 25-44); “What is your primary concentration?”; “What is your primary programming language?”; “Overall, how comfortable are you with programming?” (1 = less comfortable to 5 = more comfortable); “How long have you been programming?” (<6 mos, 6 mos-1 yr, 1-3 yrs, >3 yrs)]
  33. Public GitHub repository with course materials

  34. Private GitHub repos created for each student/assignment combination

  35. Homework assigned in R Markdown

  36. Submitted homework assignment in HTML

  37. https://jhu-advdatasci.github.io/2018/ http://cs109.github.io/2014/ http://datasciencelabs.github.io/2016/

  38. https://opencasestudies.github.io

  39. Thank you!

    Feel free to send comments/questions: Twitter: @stephaniehicks • Email: shicks19@jhu.edu • #rladies • https://opencasestudies.github.io • https://jhu-advdatasci.github.io/2018/