Slide 1

Slide 1 text

Making data science accessible in the Johns Hopkins Data Science Lab
Stephanie Hicks
Assistant Professor, Biostatistics, Johns Hopkins Bloomberg School of Public Health
Faculty Member, Johns Hopkins Data Science Lab
@stephaniehicks

Slide 2

Slide 2 text

About me
Teaching: Data Science
Research: Genomics (analyzing single-cell gene expression data)
• R/Bioconductor user and developer (since 2009/2010)
Other fun things about me:
• Co-founded Baltimore
• Creating a children’s book featuring women statisticians and data scientists

Slide 3

Slide 3 text

https://jhudatascience.org

Slide 4

Slide 4 text

Who are we?
The “OG”s: Roger, Brian, Jeff
Joined in 2018: Stephanie

Slide 5

Slide 5 text

Education

Slide 6

Slide 6 text

Massive Open Online Courses in Data Science • > 4 million enrolled • > 500K completed courses • > 200K completed specialization

Slide 7

Slide 7 text

Can MOOC Programs Improve Student Employment Prospects?

Slide 8

Slide 8 text

We don’t just need practicing data scientists

Slide 9

Slide 9 text

The E-book revolution • Variable pricing (including $0) • Readers get all edition updates • Author friendly royalty split • Bound books through 3rd party

Slide 10

Slide 10 text

The E-book revolution • Variable pricing (including $0) • Readers get all edition updates • Author friendly royalty split • Bound books through 3rd party

Slide 11

Slide 11 text

Outreach

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

The Data Science Lab Puppets • Creating children’s videos to teach young students about statistics and data science • The puppets have their own DSL YouTube channel and Twitter accounts: @LeekPuppet, @puppetpeng

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

Research

Slide 19

Slide 19 text

Why data science? Data science is the number one rated job by Glassdoor and there are more than 350,000 new data science jobs expected by 2020.

Slide 20

Slide 20 text

What do I mean by “data science”?

Slide 21

Slide 21 text

What do I mean by “data science”?

Slide 22

Slide 22 text

Here, I focus on the term data science as it refers generally to Type A data scientists who process and interpret data as it pertains to answering real-world questions.

Slide 23

Slide 23 text

Data Science in Academia?
• Statistics was born directly from developing solutions to practical data analysis problems
• Galton, Ronald Fisher
• Wild and Pfannkuch (1999) describe applied statistics as: “part of the information gathering and learning process which, in an ideal world, is undertaken to inform decisions and actions. With industry, medicine and many other sectors of society increasingly relying on data for decision making, statistics should be an integral part of the emerging information era.”
• A department that embraces applied statistics as defined above is a natural home for data science in academia

Slide 24

Slide 24 text

What is missing in the current statistics curriculum? Wild and Pfannkuch (1999) complained that: “Large parts of the investigative process, such as problem analysis and measurement, have been largely abandoned by statisticians and statistics educators to the realm of the particular, perhaps to be developed separately within other disciplines.” They add that “[t]he arid, context-free landscape on which so many examples used in statistics teaching are built ensures that large numbers of students never even see, let alone engage in, statistical thinking.”

Slide 25

Slide 25 text

What is missing in the current statistics curriculum? Computing • Need more computing in the curriculum

Slide 26

Slide 26 text

What is missing in the current statistics curriculum? Computing, Connecting • Need more computing in the curriculum • Need to teach how to connect the subject matter question to appropriate dataset and analysis tools

Slide 27

Slide 27 text

What is missing in the current statistics curriculum? Computing, Connecting, Creating • Need more computing in the curriculum • Need to teach how to connect the subject matter question to appropriate dataset and analysis tools • Instead of being passive, teach students to be active and how to create and formulate questions to investigate hypotheses with data

Slide 28

Slide 28 text

Bridging the gap in the statistics classroom to teach introductory data science courses

Slide 29

Slide 29 text

Bridging the gap in the classroom to teach introductory data science courses • Educators need to be experienced themselves in creating, connecting and computing • Encourage applied statisticians experienced in creating, connecting, and computing to become involved in the development of courses • Encourage statistics departments to reach out to practicing data analysts, perhaps in other departments or from other disciplines, to collaborate in developing these courses

Slide 30

Slide 30 text

Principles of Teaching Data Science

Slide 31

Slide 31 text

Principles of Teaching Data Science • Organize the course around a set of diverse case studies • Integrate computing into every aspect of the course • Teach abstraction, but minimize reliance on mathematical notation • Structure course activities to realistically mimic a data scientist’s experience • Demonstrate the importance of critical thinking / skepticism through examples

Slide 32

Slide 32 text

[Figure: survey responses shown as five bar-chart panels (A to E), each with count on one axis. Panels: “What is your age?” (18−24 and 25−44, split by Female/Male); “What is your primary concentration?” (clinical effectiveness, non-degree, quantitative methods, global health, social and behavioral sciences, MPH, health policy, environmental health, computational biology, biostatistics, epidemiology); “What is your primary programming language?” (VB/VBScript, Ruby, Perl, SQL, BASIC, Java, Python, C / C++, R); “Overall, how comfortable are you with programming?” (scale of 1 to 5, from less comfortable to more comfortable); “How long have you been programming?” (<6mos, 6mos − 1yr, 1−3yrs, >3yrs).]

Slide 33

Slide 33 text

Public GitHub repository with course materials

Slide 34

Slide 34 text

Private GitHub repos created for each student/ assignment combination
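As a rough illustration of what "one repo per student/assignment combination" means, the Python sketch below enumerates the combinations. The roster, the assignment names, the `repo_names` helper, and the `<prefix>-<assignment>-<student>` naming scheme are all hypothetical and not taken from the course; the step that actually creates private repositories on GitHub is left out.

```python
from itertools import product


def repo_names(students, assignments, prefix="course"):
    """Return one repository name per (assignment, student) pair.

    The naming scheme "<prefix>-<assignment>-<student>" is an illustrative
    assumption, not the convention used in the course.
    """
    return [
        f"{prefix}-{assignment}-{student}"
        for assignment, student in product(assignments, students)
    ]


# Hypothetical roster and assignment list, for illustration only.
students = ["student-a", "student-b"]
assignments = ["hw1", "hw2"]

for name in repo_names(students, assignments):
    print(name)
```

With S students and A assignments this yields S × A names, which is why generating them programmatically is more practical than creating repositories by hand.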

Slide 35

Slide 35 text

Homework assigned in R Markdown

Slide 36

Slide 36 text

Submitted homework assignment in HTML

Slide 37

Slide 37 text

https://jhu-advdatasci.github.io/2018/ http://cs109.github.io/2014/ http://datasciencelabs.github.io/2016/

Slide 38

Slide 38 text

https://opencasestudies.github.io

Slide 39

Slide 39 text

Thank you!
Feel free to send comments/questions:
Twitter: @stephaniehicks
Email: [email protected]
#rladies
https://opencasestudies.github.io
https://jhu-advdatasci.github.io/2018/