Making data science accessible in the Johns Hopkins Data Science Lab

Stephanie Hicks
February 27, 2019

Transcript

  1. Making data science accessible in the
    Johns Hopkins Data Science Lab
    Stephanie Hicks
    Assistant Professor, Biostatistics
    Johns Hopkins Bloomberg School of Public Health
    Faculty Member
    Johns Hopkins Data Science Lab
    @stephaniehicks

  2. About me
    Teaching: Data Science
    Research: Genomics (analyzing single-cell gene expression data)
    • R/Bioconductor user and developer (since 2009/2010)
    Other fun things about me:
    • Co-founded Baltimore
    • Creating a children’s book featuring women statisticians and data scientists
    Johns Hopkins Bloomberg School of Public Health

  3. https://jhudatascience.org

  4. Who are we?
    The “OG”s: Roger, Brian, Jeff
    Joined in 2018: Stephanie

  5. Education

  6. Massive Open Online Courses in Data Science
    • > 4 million enrolled
    • > 500K completed courses
    • > 200K completed specialization

  7. Can MOOC Programs Improve Student Employment Prospects?

  8. We don’t just need practicing data scientists

  10. The E-book revolution
    • Variable pricing (including $0)
    • Readers get all edition updates
    • Author friendly royalty split
    • Bound books through 3rd party

  11. Outreach

  15. The Data Science Lab Puppets
    • Creating children’s videos to teach young students about
    statistics and data science
    • Puppets have their own DSL YouTube channel and
    Twitter accounts: @LeekPuppet, @puppetpeng

  18. Research

  19. Why data science?
    Data science is the number one rated job by
    Glassdoor and there are more than 350,000
    new data science jobs expected by 2020.

  20. What do I mean by “data science”?

  22. Here, I focus on the term data science as it refers
    generally to Type A data scientists who process and
    interpret data as it pertains to answering real-world
    questions.

  23. Data Science in Academia?
    • Statistics was born directly from developing solutions to practical
    problems through data analysis
    • Galton, Ronald Fisher
    • Wild and Pfannkuch (1999) describe applied statistics as:
    “part of the information gathering and learning process which, in an
    ideal world, is undertaken to inform decisions and actions. With industry,
    medicine and many other sectors of society increasingly relying on data
    for decision making, statistics should be an integral part of the emerging
    information era.”
    • A department that embraces applied statistics as defined above is a natural
    home for data science in academia

  24. What is missing in the current statistics
    curriculum?
    Wild and Pfannkuch (1999) complained that:
    “Large parts of the investigative process, such as problem analysis and
    measurement, have been largely abandoned by statisticians and
    statistics educators to the realm of the particular, perhaps to be
    developed separately within other disciplines.”
    They add that “[t]he arid, context-free landscape on which so many
    examples used in statistics teaching are built ensures that large
    numbers of students never even see, let alone engage in, statistical
    thinking.”

  25. What is missing in the current statistics
    curriculum? Computing
    • Need more computing in the curriculum

  26. What is missing in the current statistics
    curriculum? Computing, Connecting
    • Need more computing in the curriculum
    • Need to teach how to connect the subject matter question to
    appropriate dataset and analysis tools

  27. What is missing in the current statistics
    curriculum? Computing, Connecting, Creating
    • Need more computing in the curriculum
    • Need to teach how to connect the subject matter question to
    appropriate dataset and analysis tools
    • Instead of being passive, teach students to be active and how to create
    and formulate questions to investigate hypotheses with data

  28. Bridging the gap in the statistics classroom to
    teach introductory data science courses

  29. Bridging the gap in the classroom to teach
    introductory data science courses
    • Educators need to be experienced themselves in creating, connecting
    and computing
    • Encourage applied statisticians experienced in creating, connecting,
    and computing to become involved in the development of courses
    • Encourage statistics departments to reach out to practicing data
    analysts, perhaps in other departments or from other disciplines, to
    collaborate in developing these courses

  30. Principles of Teaching Data Science

  31. Principles of Teaching Data Science
    • Organize the course around a set of diverse case studies
    • Integrate computing into every aspect of the course
    • Teach abstraction, but minimize reliance on mathematical notation
    • Structure course activities to realistically mimic a data scientist’s
    experience
    • Demonstrate the importance of critical thinking / skepticism
    through examples

  32. [Figure: five survey plots (panels A–E), each showing response counts]
    • What is your age? (Female, Male; 18−24, 25−44)
    • What is your primary concentration? (clinical effectiveness, non-degree, quantitative methods, global health, social and behavioral sciences, MPH, health policy, environmental health, computational biology, biostatistics, epidemiology)
    • What is your primary programming language? (VB/VBScript, Ruby, Perl, SQL, BASIC, Java, Python, C / C++, R)
    • Overall, how comfortable are you with programming? (scale of 1 to 5, from less comfortable to more comfortable)
    • How long have you been programming? (<6mos, 6mos − 1yr, 1−3yrs, >3yrs)

  33. Public GitHub repository with course materials

  34. Private GitHub repos created for each student/assignment combination
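
The "one private repo per student/assignment combination" idea can be illustrated with a minimal sketch. This is not the course's actual tooling: the naming scheme, the course prefix, and the student and assignment identifiers below are all hypothetical, and the code only enumerates names; it does not call GitHub.

```python
from itertools import product


def repo_names(course, assignments, students):
    """Return one repository name per (assignment, student) pair.

    The "<course>-<assignment>-<student>" pattern is an assumed naming
    scheme for illustration only; it is not taken from the slides.
    """
    return [f"{course}-{a}-{s}" for a, s in product(assignments, students)]


if __name__ == "__main__":
    # Hypothetical identifiers, not real course data.
    for name in repo_names("advdatasci", ["hw1", "hw2"], ["student-a", "student-b"]):
        print(name)
```

With two assignments and two students this yields four names, one per combination, which is the structure the slide describes.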

  35. Homework assigned in R Markdown

  36. Submitted homework assignment in HTML

  37. https://jhu-advdatasci.github.io/2018/
    http://cs109.github.io/2014/
    http://datasciencelabs.github.io/2016/

  38. https://opencasestudies.github.io

  39. Thank you!
    Feel free to send comments/questions:
    Twitter: @stephaniehicks
    Email: [email protected]
    #rladies
    https://opencasestudies.github.io
    https://jhu-advdatasci.github.io/2018/