midfieldr: Data, methods, & metrics for studying student persistence

This talk introduces R users to data and tools for investigating undergraduate persistence metrics using the midfieldr and midfielddata packages. The data are the student records (registrar's data) of approximately 98,000 undergraduates at US institutions from 1987 to 2016. The goal of the talk is to introduce the packages and to share our data, methods, and metrics for intersectional research in student persistence.

Richard Layton

July 11, 2018

Transcript

  1. midfieldr
    Data, methods, & metrics for studying student persistence
    Richard Layton, Rose-Hulman Institute of Technology
    Russell Long, Purdue University
    Matthew Ohland, Purdue University
    Nichole Ramirez, Purdue University
    https://midfieldr.github.io/midfieldr
    useR! Conference, Brisbane, 2018-07-11

  2. In education, cross-sectional designs are typical
    [Figure: group 1, group 2, and group 3 are different groups, each observed at one time along a time axis]

  3. Longitudinal studies offer some advantages
    [Figure: the same groups followed over time, year 1 through year 6]

  4. MIDFIELD is a database for longitudinal studies
    1.6 M undergraduate students at 21 US institutions
    whole-population data from registrars
    1987–present

  5. MIDFIELD data are curated in four categories
    Categories: students, courses, terms, degrees
    MIDFIELD: 1.6 M students

  6. R package midfielddata provides a stratified sample
    Categories: students, courses, terms, degrees
    midfielddata: 98 000 students

  7. Each observation is a unique student
    midfielddata: 98 000 students
    Data set: midfieldstudents (98,000 observations, 19 Mb of memory)
    Variables: student ID, institution, term, major, transfer, sex, race, age, US citizen, home zip code, SAT, ACT

  8. Each observation is one term for one student
    midfielddata: 98 000 students
    Data set: midfieldterms (729,000 observations, 82 Mb of memory)
    Variables: student ID, institution, term, major, level, standing, co-op, credit hours, GPA

  9. Each observation is one course for one student
    midfielddata: 98 000 students
    Data set: midfieldcourses (3.5 M observations, 348 Mb of memory)
    Variables: student ID, institution, term, course (section, hours, type, grade, instructor)

  10. Each observation is a unique student
    midfielddata: 98 000 students
    Data set: midfielddegrees (98,000 observations, 10 Mb of memory)
    Variables: student ID, institution, term, major, degree

  11. midfielddata provides the data
    midfielddata
    midfieldstudents
    midfieldterms
    midfieldcourses
    midfielddegrees

  12. midfieldr provides the tools
    library(midfielddata)
    library(midfieldr)
    midfielddata: midfieldstudents, midfieldterms, midfieldcourses, midfielddegrees
    midfieldr: cip_filter(), ever_filter(), grad_filter(), race_sex_join(), multiway_order(), etc.

  13. Both packages are currently available on GitHub
    https://midfieldr.github.io/midfieldr

  14. Why R?
    Increase accessibility to the data
    Support a collaborative community of researchers
    Share our methods and metrics

  15. [No text recovered from this slide]

  16. Which is stickier: Engineering or Stats/Applied-Math?
    stickiness = (N graduates of a program) / (N students ever enrolled in the program)
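
The ratio on this slide can be sketched in a few lines of base R. This is only an illustration: the helper name `stickiness()` and the counts below are made up for the example and are not part of midfieldr.

```r
# Hypothetical helper (not a midfieldr function): stickiness as defined on
# the slide, i.e. graduates divided by students ever enrolled.
stickiness <- function(n_grad, n_ever) {
  stopifnot(all(n_ever > 0), all(n_grad >= 0))
  n_grad / n_ever
}

# Made-up counts for two programs, for illustration only.
example <- data.frame(
  program = c("Program A", "Program B"),
  grad    = c(45, 30),
  ever    = c(100, 120),
  stringsAsFactors = FALSE
)
example$stickiness <- stickiness(example$grad, example$ever)
print(example)
```

A value near 1 means most students who ever enrolled in the program graduated from it; a value near 0 means few did.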

  17. We start with the programs’ Classification of
    Instructional Programs (CIP) codes
    [Diagram: cip_filter() from midfieldr operates on the cip data set, also in midfieldr (1584 observations)]

  18. cip_filter() helps us find the codes we want
    27-series: Applied math and stats
    #> # A tibble: 4 x 2
    #>   cip4  cip4name
    #>
    #> 1 2701  Mathematics
    #> 2 2703  Applied Mathematics
    #> 3 2705  Statistics
    #> 4 2799  Mathematics and Statistics, Other
    14-series: Engineering
    #> # A tibble: 1 x 2
    #>   cip2  cip2name
    #>
    #> 1 14    Engineering
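
The idea of selecting codes by series can be sketched in base R on a toy lookup table. This is not how cip_filter() is implemented or called (its arguments are not shown in the talk); `cip_toy` and `filter_by_series()` are hypothetical names, and the table holds only the 27-series rows printed above plus two extra rows added for illustration.

```r
# Toy lookup table. The four 27-series rows come from the slide; the "1401"
# and "9999" rows are added here only to show what the filter keeps or drops.
cip_toy <- data.frame(
  cip4     = c("1401", "2701", "2703", "2705", "2799", "9999"),
  cip4name = c("Engineering, General", "Mathematics", "Applied Mathematics",
               "Statistics", "Mathematics and Statistics, Other",
               "Placeholder (made up)"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose code starts with any given series.
filter_by_series <- function(cip, series) {
  pattern <- paste0("^(", paste(series, collapse = "|"), ")")
  cip[grepl(pattern, cip$cip4), , drop = FALSE]
}

math_stat <- filter_by_series(cip_toy, "27")
engr      <- filter_by_series(cip_toy, "14")
print(math_stat)
print(engr)
```

Storing the codes as character strings keeps prefix matching simple and preserves any leading zeros.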

  19. ever_filter() identifies all students ever enrolled
    in these two programs
    [Diagram: ever_filter() from midfieldr operates on midfieldterms from midfielddata (730 000 observations)]

  20. ever_filter() identifies all students ever enrolled
    in these two programs...nearly 20,000
    #> # A tibble: 19,404 x 2
    #>    id          program
    #>
    #>  1 MID25783162 Engineering
    #>  2 MID25783166 Engineering
    #>  3 MID25783167 Engineering
    #>  4 MID25783178 Engineering
    #>  5 MID25783197 Engineering
    #>  6 MID25783199 Engineering
    #>  7 MID25783257 Engineering
    #>  8 MID25783259 Engineering
    #>  9 MID25783275 Engineering
    #> 10 MID25783388 Engineering
    #> # ... with 19,394 more rows
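
An "ever enrolled" table of this shape can be derived from term records roughly as follows. This base R sketch is not the ever_filter() implementation; the toy records, the term encoding, and the names `terms_toy` and `program_of` are all made up for illustration.

```r
# Toy term records (made up): one row per student per term.
terms_toy <- data.frame(
  id   = c("S1", "S1", "S2", "S2", "S3", "S4"),
  term = c("T1", "T2", "T1", "T2", "T1", "T1"),
  cip4 = c("1401", "1401", "2705", "1401", "5201", "2703"),
  stringsAsFactors = FALSE
)

# Map the 2-digit series to the two program labels used in the talk.
program_of <- c("14" = "Engineering", "27" = "Stats/Applied-Math")
terms_toy$program <- unname(program_of[substr(terms_toy$cip4, 1, 2)])

# Keep terms in either program, then one row per (student, program) pair.
in_program    <- terms_toy[!is.na(terms_toy$program), c("id", "program")]
ever_enrolled <- unique(in_program)
rownames(ever_enrolled) <- NULL
print(ever_enrolled)
```

A student who switches between the two programs (S2 here) appears once for each, so that student counts toward the denominator of both.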

  21. race_sex_join() adds student race and sex
    variables to the data frame
    [Diagram: race_sex_join() from midfieldr draws on midfieldstudents from midfielddata (98 000 observations)]
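
Conceptually this step is a left join on the student ID. The base R sketch below uses `merge()` on toy data; it is an assumption about what the step amounts to, not the race_sex_join() source, and all names and values are invented.

```r
# Toy "ever enrolled" table and toy student attributes (all values made up).
ever_toy <- data.frame(
  id      = c("S1", "S2", "S4"),
  program = c("Engineering", "Engineering", "Stats/Applied-Math"),
  stringsAsFactors = FALSE
)
students_toy <- data.frame(
  id   = c("S1", "S2", "S3", "S4"),
  race = c("Asian", "Black", "White", "Hispanic"),
  sex  = c("Female", "Male", "Female", "Male"),
  stringsAsFactors = FALSE
)

# Left join: keep every row of ever_toy and add race and sex where id matches.
joined <- merge(ever_toy, students_toy[, c("id", "race", "sex")],
                by = "id", all.x = TRUE)
print(joined)
```

Student S3 is absent from the result because only students already in the ever-enrolled table are kept.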

  22. We group and summarize these data by program,
    race, and sex
    #> Observations: 28
    #> Variables: 4
    #> $ program "Engineering", "Engineering"
    #> $ race "Asian", "Asian", "Black", "
    #> $ sex "Female", "Male", "Female",
    #> $ ever 302, 998, 734, 1273, 177, 57
    These counts are slightly high because the "possible graduation in
    6 years" criterion has not yet been applied.
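
The grouping step can be sketched with base R's `aggregate()`. The talk does not show the code it used, so this is one possible approach on made-up data; `joined_toy` and `ever_counts` are hypothetical names.

```r
# Toy joined table (made up): one row per student ever enrolled in a program.
joined_toy <- data.frame(
  program = c("Engineering", "Engineering", "Engineering",
              "Stats/Applied-Math", "Stats/Applied-Math"),
  race    = c("Asian", "Asian", "Black", "Asian", "Asian"),
  sex     = c("Female", "Female", "Male", "Male", "Male"),
  stringsAsFactors = FALSE
)

# Count students in each program/race/sex group.
joined_toy$n <- 1L
ever_counts <- aggregate(n ~ program + race + sex, data = joined_toy, FUN = sum)
names(ever_counts)[names(ever_counts) == "n"] <- "ever"
print(ever_counts)
```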

  23. grad_filter() identifies the students graduating
    from these two programs
    [Diagram: grad_filter() from midfieldr operates on midfielddegrees from midfielddata (98 000 observations)]

  24. Again, we group and summarize these data (7500
    graduates) by program, race, and sex
    #> Observations: 27
    #> Variables: 4
    #> $ program "Engineering", "Engineering"
    #> $ race "Asian", "Asian", "Black", "
    #> $ sex "Female", "Male", "Female",
    #> $ grad 129, 445, 276, 395, 55, 206,

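  25. We join the graduates to the ever-enrolled students and
    compute stickiness for groups with more than 5 ever enrolled
    library(dplyr)
    # Join on the shared grouping columns, drop small groups, compute the ratio
    stickiness <- left_join(ever_enrolled, graduated,
                            by = c("program", "race", "sex")) %>%
      filter(ever > 5) %>%
      mutate(stickiness = grad / ever)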
    Graph using conventional ggplot2 functions.
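
For readers without dplyr, the same join, filter, and ratio can be sketched in base R. The two Engineering rows reuse the first two counts printed on slides 22 and 24 (ever: 302, 998; grad: 129, 445); the "Toy program" rows are made up to show the small-group filter and an unmatched group. Because slide 22 notes that its counts are slightly high, the resulting ratios are illustrative only. Treating an unmatched group as having zero graduates is an assumption of this sketch, not something the talk states.

```r
ever_enrolled <- data.frame(
  program = c("Engineering", "Engineering", "Toy program", "Toy program"),
  race    = c("Asian", "Asian", "Asian", "Asian"),
  sex     = c("Female", "Male", "Female", "Male"),
  ever    = c(302, 998, 4, 12),
  stringsAsFactors = FALSE
)
graduated <- data.frame(
  program = c("Engineering", "Engineering", "Toy program"),
  race    = c("Asian", "Asian", "Asian"),
  sex     = c("Female", "Male", "Female"),
  grad    = c(129, 445, 2),
  stringsAsFactors = FALSE
)

# Left join on the grouping columns: every ever-enrolled group is kept.
result <- merge(ever_enrolled, graduated,
                by = c("program", "race", "sex"), all.x = TRUE)

# Assumption: a group with no matching graduate row has zero graduates.
result$grad[is.na(result$grad)] <- 0

# Drop small groups, then compute the ratio.
result <- result[result$ever > 5, ]
result$stickiness <- result$grad / result$ever
print(result)
```

Without the zero-fill step, an unmatched group would carry `NA` for both `grad` and `stickiness` after the left join.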

  26. [Figure: stickiness by race and sex, one panel per program]
    Panels: Statistics and Applied Math; Engineering
    x-axis: Stickiness, 0.1 to 0.6
    Rows in both panels, in extracted order: Black Male, Hispanic Male, Native American Female, Black Female, Native American Male, Hispanic Female, White Female, White Male, Asian Female, Asian Male

  27. To find out more...
    R packages
    https://midfieldr.github.io/midfieldr
    MIDFIELD Project
    https://engineering.purdue.edu/MIDFIELD
    Email: [address obscured in the source page]
    Support provided by the US National Science Foundation, Grant 1545667,
    "Expanding Access to and Participation in the Multiple-Institution Database
    for Investigating Engineering Longitudinal Development"