Data Science in a Box

Data Science in a Box (datasciencebox.org) is an open-source project that aims to equip educators with concrete information on content and infrastructure for designing and painlessly running a semester-long modern introductory data science course with R. In this talk we outline five guiding pedagogical principles that underlie the choice of topics and concepts introduced in the course as well as their ordering, highlight a sample of examples and assignments that demonstrate how the pedagogy is put into action, introduce `dsbox` -- the companion R package for datasets used in the course, and share sample student work and feedback. We will also walk through a quick start guide for faculty interested in using all or some of these resources in their teaching.

Mine Cetinkaya-Rundel

July 10, 2019

Transcript

  1. rstd.io/dsbox-slides
    mine-cetinkaya-rundel
    [email protected]
    @minebocek
    MINE ÇETINKAYA-RUNDEL
    UNIVERSITY OF EDINBURGH + RSTUDIO

  2. Three questions that keep me up at night…
    1. What should my students learn?
    2. How will my students learn best?
    3. What tools will enhance my students’ learning?

  3. Three questions that keep me up at night…
    1. What should my students learn? (Content)
    2. How will my students learn best? (Pedagogy)
    3. What tools will enhance my students’ learning? (Infrastructure)

  4. Infrastructure
    Pedagogy
    Content

  5. rstd.io/dsbox-slides

  6. datasciencebox.org
    rstudio-education/datascience-box

  7. AUDIENCE
    ‣ I have been teaching with R for a while, but I want to update my teaching materials
    ‣ I’m new to teaching with R and need to build up my course materials
    ‣ This teaching slide deck I came across on Twitter is pretty cool, but I have no idea what type of course it belongs in

  8. TOPICS
    ‣ Fundamentals of data & data viz, confounding variables, Simpson’s paradox + R / RStudio, R Markdown, simple Git
    ‣ Tidy data, data frames vs. summary tables, recoding & transforming, web scraping & iteration + collaboration on GitHub
    ‣ Building & selecting models, visualizing interactions, prediction & validation, inference via simulation
    ‣ Data science ethics, interactive viz & reporting, text analysis, Bayesian inference + communication & dissemination

  9. CONTENTS
    ‣ 27 slide decks
    ‣ 10 application exercises
    ‣ 10 computing labs
    ‣ 6 homework assignments
    ‣ 2 take-home exams
    ‣ 1 open-ended project
    ‣ (10) interactive tutorials
    website: datasciencebox.org
    repository
    package: dsbox

  10. DESIGN PRINCIPLES
    ‣ cherish day one
    ‣ skip baby steps
    ‣ start with cake
    ‣ leverage the ecosystem
    ‣ hide the veggies

  11. DESIGN PRINCIPLES
    Which kitchen would you
    rather bake a cake in?

  12. DESIGN PRINCIPLES
    Which kitchen would you
    rather bake a cake in?

  13. DESIGN PRINCIPLES
    Cherish day one

  14. DESIGN PRINCIPLES
    How do you prefer your
    cake recipes? Words only,
    or words & pictures?

  15. DESIGN PRINCIPLES
    How do you prefer your
    cake recipes? Words only,
    or words & pictures?

  16. DESIGN PRINCIPLES
    Start with cake
    ‣ Open today’s demo project
    ‣ Knit the document and discuss the results with your neighbor
    ‣ Then, change Turkey to a different country, and plot again

  17. DESIGN PRINCIPLES
    Start with cake
    With great examples comes a great amount of code…
    but let’s focus on the task at hand…
    ‣ Open today’s demo project
    ‣ Knit the document and discuss the results with your neighbor
    ‣ Then, change Turkey to a different country, and plot again

  18. DESIGN PRINCIPLES
    Start with cake
    library(tidyverse) # dplyr verbs and ggplot2
    library(lubridate) # year()
    library(unvotes)   # un_votes, un_roll_calls, un_roll_call_issues
    un_votes %>%
      filter(country %in% c("UK & NI", "US", "Turkey")) %>%
      inner_join(un_roll_calls, by = "rcid") %>%
      inner_join(un_roll_call_issues, by = "rcid") %>%
      group_by(country, year = year(date), issue) %>%
      summarize(
        votes = n(),
        percent_yes = mean(vote == "yes")
      ) %>%
      filter(votes > 5) %>% # only use records where there are more than 5 votes
      ggplot(mapping = aes(x = year, y = percent_yes, color = country)) +
      geom_smooth(method = "loess", se = FALSE) +
      facet_wrap(~ issue) +
      labs(
        title = "Percentage of Yes votes in the UN General Assembly",
        subtitle = "1946 to 2015",
        y = "% Yes",
        x = "Year",
        color = "Country"
      )

  19. DESIGN PRINCIPLES
    Start with cake
    un_votes %>%
      filter(country %in% c("UK & NI", "US", "Turkey")) %>%
      inner_join(un_roll_calls, by = "rcid") %>%
      inner_join(un_roll_call_issues, by = "rcid") %>%
      group_by(country, year = year(date), issue) %>%
      summarize(
        votes = n(),
        percent_yes = mean(vote == "yes")
      ) %>%
      filter(votes > 5) %>% # only use records where there are more than 5 votes
      ggplot(mapping = aes(x = year, y = percent_yes, color = country)) +
      geom_smooth(method = "loess", se = FALSE) +
      facet_wrap(~ issue) +
      labs(
        title = "Percentage of Yes votes in the UN General Assembly",
        subtitle = "1946 to 2015",
        y = "% Yes",
        x = "Year",
        color = "Country"
      )

  20. DESIGN PRINCIPLES
    Start with cake
    un_votes %>%
      filter(country %in% c("UK & NI", "US", "Turkey")) %>%
      inner_join(un_roll_calls, by = "rcid") %>%
      inner_join(un_roll_call_issues, by = "rcid") %>%
      group_by(country, year = year(date), issue) %>%
      summarize(
        votes = n(),
        percent_yes = mean(vote == "yes")
      ) %>%
      filter(votes > 5) %>% # only use records where there are more than 5 votes
      ggplot(mapping = aes(x = year, y = percent_yes, color = country)) +
      geom_smooth(method = "loess", se = FALSE) +
      facet_wrap(~ issue) +
      labs(
        title = "Percentage of Yes votes in the UN General Assembly",
        subtitle = "1946 to 2015",
        y = "% Yes",
        x = "Year",
        color = "Country"
      )

  21. DESIGN PRINCIPLES
    Start with cake
    un_votes %>%
      filter(country %in% c("UK & NI", "US", "France")) %>%
      inner_join(un_roll_calls, by = "rcid") %>%
      inner_join(un_roll_call_issues, by = "rcid") %>%
      group_by(country, year = year(date), issue) %>%
      summarize(
        votes = n(),
        percent_yes = mean(vote == "yes")
      ) %>%
      filter(votes > 5) %>% # only use records where there are more than 5 votes
      ggplot(mapping = aes(x = year, y = percent_yes, color = country)) +
      geom_smooth(method = "loess", se = FALSE) +
      facet_wrap(~ issue) +
      labs(
        title = "Percentage of Yes votes in the UN General Assembly",
        subtitle = "1946 to 2015",
        y = "% Yes",
        x = "Year",
        color = "Country"
      )
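
The `summarize()` step in the pipeline above computes `percent_yes` as the mean of a logical comparison. As a minimal sketch of why that yields a proportion, the base R snippet below uses a small made-up data frame (`toy_votes` and its values are hypothetical, not the UN data from the slides):

```r
# Hypothetical toy data standing in for the UN voting records
toy_votes <- data.frame(
  country = c("Turkey", "Turkey", "Turkey",
              "France", "France", "France", "France"),
  vote = c("yes", "no", "yes",
           "yes", "yes", "abstain", "yes"),
  stringsAsFactors = FALSE
)

# The comparison gives TRUE/FALSE for each row; mean() counts TRUE as 1
# and FALSE as 0, so the result is the share of "yes" votes.
overall_yes <- mean(toy_votes$vote == "yes")

# The same calculation within each country
percent_yes <- tapply(toy_votes$vote == "yes", toy_votes$country, mean)

print(overall_yes)
print(percent_yes)
```

The slide's pipeline applies the same idea within each country, year, and issue group before plotting.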

  22. DESIGN PRINCIPLES
    Start with cake

  23. DESIGN PRINCIPLES
    Which motivates you
    more to learn how to
    cook: perfectly chopped
    onions or ratatouille?

  24. DESIGN PRINCIPLES
    Which motivates you
    more to learn how to
    cook: perfectly chopped
    onions or ratatouille?

  25. DESIGN PRINCIPLES
    Skip baby steps
    Re-insert

  26. DESIGN PRINCIPLES
    Which is more likely to
    appeal to someone who
    has never tried broccoli?

  27. DESIGN PRINCIPLES
    Which is more likely to
    appeal to someone who
    has never tried broccoli?

  28. DESIGN PRINCIPLES
    Hide the veggies
    ‣ Today we go from this to that
    ‣ And do so in a way that is easy to replicate for another state

  29. DESIGN PRINCIPLES
    Lesson: Web scraping essentials for
    turning a structured table into a data
    frame in R.
    Hide the veggies

  30. DESIGN PRINCIPLES
    Lesson: Web scraping essentials for
    turning a structured table into a data
    frame in R.
    Ex 1: Scrape the table off the web and
    save as a data frame.
    Hide the veggies
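
Exercise 1 asks students to turn a table on the web into a data frame. One way this might look, assuming the rvest package is installed, is sketched below. The HTML snippet, state names, and numbers are invented so the example runs offline; the transcript does not identify the page scraped in the actual lesson.

```r
library(rvest)

# A made-up HTML table standing in for a real web page (hypothetical data)
html <- '
<table>
  <tr><th>State</th><th>Population</th></tr>
  <tr><td>Alpha</td><td>1,234</td></tr>
  <tr><td>Beta</td><td>56,789</td></tr>
</table>'

page <- read_html(html)                    # parse the HTML text
table_node <- html_element(page, "table")  # select the first <table> element
states <- html_table(table_node)           # convert the table to a data frame

# Population comes back as text because of the thousands separators,
# so it still needs cleaning before it can be used as a number.
print(states)
```

With a live page, the same three calls would apply after passing the page's URL to `read_html()` instead of a string.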

  31. DESIGN PRINCIPLES
    Lesson: Web scraping essentials for
    turning a structured table into a data
    frame in R.
    Ex 1: Scrape the table off the web and
    save as a data frame.
    Ex 2: What other information do we need
    represented as variables to make this figure?
    Hide the veggies

  32. DESIGN PRINCIPLES
    Lesson: Web scraping essentials for
    turning a structured table into a data
    frame in R.
    Ex 1: Scrape the table off the web and
    save as a data frame.
    Ex 2: What other information do we need
    represented as variables to make this figure?
    Lesson: “Just enough” regex
    Hide the veggies
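
For the “just enough” regex lesson, a few base R pattern functions go a long way when cleaning scraped text. The sketch below uses invented strings (`raw_counts` and `places` are hypothetical), since the transcript does not show the actual scraped values:

```r
# Hypothetical strings of the kind a scraped table might contain
raw_counts <- c("1,234 (est.)", "56,789", "  910 ")

# Drop every character that is not a digit, then convert to numbers
# (suitable for whole numbers only; decimals would need a different pattern)
counts <- as.numeric(gsub("[^0-9]", "", raw_counts))

# Pull the two-letter code that follows the comma out of each label
places <- c("Durham, NC", "Austin, TX")
state_codes <- sub("^.*, ([A-Z]{2})$", "\\1", places)

# Check which labels end in a comma followed by a two-letter code
has_code <- grepl(", [A-Z]{2}$", places)

print(counts)
print(state_codes)
print(has_code)
```

Three patterns like these (a character class, a capture group, and an anchor) cover much of what a first scraping exercise tends to need.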

  33. DESIGN PRINCIPLES
    If you are already taking a
    baking class, which will be
    easier to venture on to?

  34. DESIGN PRINCIPLES
    If you are already taking a
    baking class, which will be
    easier to venture on to?

  35. DESIGN PRINCIPLES
    Leverage the ecosystem
    student + instructor instructor

  36. USAGE
    ‣ in full, to jumpstart / overhaul your teaching
    ‣ in bits & pieces, to supplement your teaching

  37. LICENSE

  38. FUTURE
    If you use resources from Data Science in a Box, I hope you’ll let me know / provide feedback!
    rstd.io/dsbox-feedback
    scalability
    ‣ more formative assessments
    ‣ automated feedback
    ‣ peer review
    assessment
    ‣ curriculum
    ‣ reach & impact

  39. mine-cetinkaya-rundel
    [email protected]
    @minebocek
    MINE ÇETINKAYA-RUNDEL
    UNIVERSITY OF EDINBURGH + RSTUDIO
    datasciencebox.org
    github.com/rstudio-education/dsbox
    rstd.io/dsbox-slides
    rstd.io/dsbox-feedback
