
Useful Tools for Teaching and Outreach in Data Science

Presentation given at the Symposium on Data Science and Statistics 2019 in Bellevue, WA on May 29-Jun 1, 2019.

Stephanie Hicks

May 31, 2019

Transcript

  1. Useful Tools for
    Teaching and Outreach
    in Data Science
    Stephanie Hicks
    Assistant Professor, Biostatistics
    Johns Hopkins Bloomberg School of
    Public Health
    Faculty Member,
    Johns Hopkins Data Science Lab
    @stephaniehicks

  2. We define Data Science Workflows broadly, as:
    "The reproducible and transparent ways you do your
    data analysis"
    This encompasses:
    • literate code documents
    (e.g. Rmarkdown, Notebooks, etc.)
    • analysis scripts
    • pipelines
    • environment management tools
    • etc.
    Mike Love
    Tiffany Timbers

  3. We define Data Science Workflows broadly, as:
    "The reproducible and transparent ways you do your
    data analysis"
    This encompasses:
    • literate code documents
    (e.g. Rmarkdown, Notebooks, etc.)
    • analysis scripts
    • pipelines
    • environment management tools
    • etc.
    I’m going to focus on
    this as it relates to
    Teaching and Outreach

  4. What is a case study?
    https://ia600203.us.archive.org/11/items/cu31924018826713/cu31924018826713.pdf
    C. C. Langdell
    Dean of Harvard Law School from 1870 to 1895

  5. What is a case study?
    Before Langdell's tenure, the study of law was very technical
    and students were simply told what the law is.
    During Langdell’s tenure, he applied the principles of
    pragmatism to the teaching of law → students were
    compelled to use their own reasoning powers to understand
    how the law might apply in a given case.

  6. What is a case study?
    “Law, considered as a science,
    consists of certain principles or
    doctrines. To have such a mastery
    of these … is what constitutes a true
    lawyer; and hence to acquire that
    mastery should be the business of
    every earnest student of law.”
    – C. C. Langdell
    C. C. Langdell
    Dean of Harvard Law School from 1870 to 1895

  7. What is a case study?

  8. What is a case study?

  9. What is a case study?

  10. What is a case study?

  11. Elements of a case study
    Background: provides context for the problem to be solved
    Problem: a dilemma to be resolved or a decision to be made
    Supporting information: data, exhibits, interviews, supporting
    documentation, etc.

  12. Characteristics of a good case study
    Real
    real-world situations / real problems / based on real events / realistic, complex, and
    contextually rich situations / contemporary / recent / tells a real story
    Focused on students
    engages students / student centered / students make choices / active learning
    Link between theory and practice
    application of concepts in practice / bridges the gap between theory and practice / make
    choices about what theory to apply / highlight connections between academic topics and
    real-world situations / connects the academy and the workplace
    Ambiguous
    complex and ambiguous / present unresolved issues, situations, or questions / “art of
    managing uncertainty” / without a detailed script / coping with ambiguities

  13. Teaching

  14. Guidelines for Assessment and Instruction in Statistics
    Education (GAISE) College Report 2016
    1. Teach statistical thinking. (Teach statistics as an investigative process of
    problem-solving and decision making).
    2. Focus on conceptual understanding.
    3. Integrate real data with a context and purpose.
    4. Foster active learning.
    5. Use technology to explore concepts and analyze data.
    6. Use assessments to improve and evaluate student learning.

  15. Case studies in data science?
    The American Statistician, Vol. 53, No. 4, pp 370-376
    “The model calls for … substantial exercise[s] with nontrivial
    solutions that leave room for different analyses.”

  16. Case studies in data science?
    The American Statistician, Vol. 53, No. 4, pp 370-376
    Elements of a “case study”:
    • Introduction
    • Data
    • Background
    • Investigations
    • Theory

  17. https://opencasestudies.github.io

  18. http://cs109.github.io/2014/
    (Harvard University – CS – over 400 students – online, in person – 25 TAs – Python)
    http://datasciencelabs.github.io/2016/
    (Harvard SPH – Biostats – 150 students – online, in person – 8 TAs – R)
    https://jhu-advdatasci.github.io/2018/
    (Johns Hopkins SPH – 25 students – in person – PhD Biostats – 2 TAs – R)

  19. OCS: elements of a data science case study
    • Motivation
      • What is the question? What is the context/background for the question?
    • What is the data?
    • Data import
    • Data wrangling
    • Exploratory data analysis / data visualization
    • Data analysis
    • Summary of results
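
The elements listed above could be sketched as a tiny end-to-end script. This is only an illustrative outline, not code from the Open Case Studies project: the data are made up, and the function names (`import_data`, `wrangle`, `explore`, `analyze`, `summarize`) are hypothetical labels chosen to mirror the slide. It uses only the Python standard library.

```python
import csv
import io
import statistics

# Hypothetical toy data, made up purely for illustration.
RAW_CSV = """group,score
a,10
a,14
b,20
b,
b,26
"""

def import_data(text):
    """Data import: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def wrangle(rows):
    """Data wrangling: drop rows with a missing score and convert types."""
    return [
        {"group": r["group"], "score": float(r["score"])}
        for r in rows
        if r["score"].strip()
    ]

def explore(rows):
    """Exploratory data analysis: count observations per group."""
    counts = {}
    for r in rows:
        counts[r["group"]] = counts.get(r["group"], 0) + 1
    return counts

def analyze(rows):
    """Data analysis: compute the mean score per group."""
    by_group = {}
    for r in rows:
        by_group.setdefault(r["group"], []).append(r["score"])
    return {g: statistics.mean(v) for g, v in by_group.items()}

def summarize(means):
    """Summary of results: one readable line per group."""
    return [f"group {g}: mean score {m:.1f}" for g, m in sorted(means.items())]

rows = wrangle(import_data(RAW_CSV))
counts = explore(rows)
means = analyze(rows)
summary = summarize(means)
print("\n".join(summary))
```

Keeping each element in its own function lets students swap in a different wrangling or analysis choice without touching the other steps, which leaves room for different analyses of the same data.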

  20. Demo
    https://opencasestudies.github.io

  21. Incorporating
    Git/GitHub workflows
    into data science
    workflows in the
    classroom

  22. http://happygitwithr.com

  23. Describes how to integrate
    Git/GitHub workflows into statistics
    courses targeted towards students
    with computational backgrounds

  24. Jacob Fiksel
    @jfisel1
    https://arxiv.org/abs/1811.02021
    Describes how to integrate Git/GitHub
    workflows into statistics courses
    targeted towards students with
    non-mathematical backgrounds
    (e.g. public health, sciences)
    “Using a reproducible workflow in statistics is vital
    to a complete data analysis, yet for faculty and
    students with limited computing background,
    learning version control tools such as Git can be
    difficult and intimidating. [Here] we outline some of
    the ways that the Git workflow can be implemented
    in statistics courses at all levels”

  25. SDSS 2019
    CS01 - Teaching Statistics More Effectively
    to a New Generation of Students
    Thursday 10:30am-12:05pm
    Using GitHub with Statistics
    Undergraduates
    Jo Hardin, Pomona College
    h/t @AmeliaMN

  26. Incorporating Slack
    into the classroom

  28. Slack in the Classroom

  29. Using Slack for Communication and
    Collaboration in the Classroom
    Albert Y. Kim, Smith College
    bit.ly/slack_sdss

  30. Leonardo Collado Torres

  31. Slack for Conference Program Committees!

  32. Outreach

  41. Acknowledgements
    Rafael Irizarry
    Mike Love
    Tiffany Timbers
    Leah Jager
    Margaret Taub
    Leonardo Collado Torres

  42. Feel free to send comments/questions:
    Twitter: @stephaniehicks
    Email: [email protected]
    #rladies
    Thank you!
    Normal distribution
    Weibull distribution
    Poisson distribution
