midfieldr: Data, methods, & metrics for studying student persistence

This talk introduces R users to the midfieldr and midfielddata packages, which provide data and tools for investigating undergraduate persistence metrics. The data are student records (registrars' data) for approximately 98,000 undergraduates at US institutions from 1987 to 2016. We also share our data, methods, and metrics for intersectional research in student persistence.

Richard Layton

July 11, 2018

Transcript

  1. 1.

    midfieldr: Data, methods, & metrics for studying student persistence

    Richard Layton (Rose-Hulman Institute of Technology), Russell Long (Purdue University), Matthew Ohland (Purdue University), Nichole Ramirez (Purdue University)
    https://midfieldr.github.io/midfieldr
    useR! Conference, Brisbane, 2018-07-11
  2. 2.

    In education, cross-sectional designs are typical

    [Figure: group 1, group 2, and group 3 shown as different groups at one time; axis: time]
  3. 3.

    Longitudinal studies offer some advantages

    [Figure: the same groups followed over time; axis: time, year 1 through year 6]
  4. 4.

    MIDFIELD is a database for longitudinal studies

    1.6 M undergraduate students at 21 US institutions
    Whole-population data from registrars
    1987–present
  5. 5.
  6. 6.
  7. 7.

    Each observation is a unique student

    Tables: students, courses, terms, degrees
    Variables in students: student ID, institution, term, major, transfer, sex, race, age, US citizen, home ZIP code, SAT, ACT
    midfielddata: 98,000 students
    midfieldstudents: 98,000 observations, 19 MB of memory
  8. 8.

    Each observation is one term for one student

    Tables: students, courses, terms, degrees
    Variables in terms: student ID, institution, term, major, level, standing, co-op, credit hours, GPA
    midfielddata: 98,000 students
    midfieldterms: 729,000 observations, 82 MB of memory
  9. 9.

    Each observation is one course for one student

    Tables: students, courses, terms, degrees
    Variables in courses: student ID, institution, term, course (section, hours, type, grade, instructor)
    midfielddata: 98,000 students
    midfieldcourses: 3.5 M observations, 348 MB of memory
  10. 10.

    Each observation is a unique student

    Tables: students, courses, terms, degrees
    Variables in degrees: student ID, institution, term, major, degree
    midfielddata: 98,000 students
    midfielddegrees: 98,000 observations, 10 MB of memory
  11. 12.

    midfieldr provides the tools

    midfielddata: midfieldstudents, midfieldterms, midfieldcourses, midfielddegrees
    midfieldr: cip_filter(), ever_filter(), grad_filter(), race_sex_join(), multiway_order(), etc.

        library(midfielddata)
        library(midfieldr)
  12. 14.

    Why R?

    Increase accessibility to the data
    Support a collaborative community of researchers
    Share our methods and metrics
  13. 15.
  14. 16.

    Which is stickier: Engineering or Stats/Applied-Math?

    stickiness = (N graduates of a program) / (N students ever enrolled in the program)
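
The ratio above can be checked by hand. The sketch below is only an illustration: `compute_stickiness()` is a hypothetical helper, not a midfieldr function, and the counts are the first two values of the `ever` and `grad` columns printed later in this talk (Engineering, Asian Female and Asian Male).

```r
# Hypothetical helper (not part of midfieldr): stickiness as defined on the slide,
# the number of graduates of a program divided by the number ever enrolled in it.
compute_stickiness <- function(n_grad, n_ever) {
  stopifnot(all(n_ever > 0), all(n_grad >= 0))
  n_grad / n_ever
}

# Counts taken from the summaries shown later in the talk:
# ever-enrolled 302 and 998, graduates 129 and 445.
toy_stickiness <- compute_stickiness(n_grad = c(129, 445), n_ever = c(302, 998))
round(toy_stickiness, 3)
```

Both values fall inside the 0.1 to 0.6 range of the stickiness axis in the final figure.
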
  15. 17.

    We start with the programs’ Classification of Instructional Programs (CIP) codes

    [Diagram: cip_filter() from midfieldr applied to the cip data set in midfieldr, 1584 observations]
  16. 18.

    cip_filter() helps us find the codes we want

    27-series: Applied math and stats

        #> # A tibble: 4 x 2
        #>   cip4  cip4name
        #>   <chr> <chr>
        #> 1 2701  Mathematics
        #> 2 2703  Applied Mathematics
        #> 3 2705  Statistics
        #> 4 2799  Mathematics and Statistics, Other

    14-series: Engineering

        #> # A tibble: 1 x 2
        #>   cip2  cip2name
        #>   <chr> <chr>
        #> 1 14    Engineering
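
The slide does not show the call that produced this output, so the sketch below does not use cip_filter(). It only illustrates the idea of selecting codes by series with base R, on a small made-up table (`cip_toy`, an assumption) that contains the four 27-series rows from the slide plus two engineering codes.

```r
# Toy CIP table (an assumption for illustration; the real cip data has 1584 rows).
cip_toy <- data.frame(
  cip4 = c("2701", "2703", "2705", "2799", "1408", "1419"),
  cip4name = c("Mathematics", "Applied Mathematics", "Statistics",
               "Mathematics and Statistics, Other",
               "Civil Engineering", "Mechanical Engineering"),
  stringsAsFactors = FALSE
)

# Select a series by matching the leading two digits of the 4-digit code.
series_27 <- cip_toy[startsWith(cip_toy$cip4, "27"), ]
series_14 <- cip_toy[startsWith(cip_toy$cip4, "14"), ]
series_27
```

The codes are kept as character strings so that prefix matching works and leading zeros in other series would not be lost.
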
  17. 19.

    ever_filter() identifies all students ever enrolled in these two programs

    [Diagram: ever_filter() from midfieldr applied to midfieldterms from midfielddata, 730,000 observations]
  18. 20.

    ever_filter() identifies all students ever enrolled in these two programs...nearly 20,000

        #> # A tibble: 19,404 x 2
        #>    id          program
        #>    <chr>       <chr>
        #>  1 MID25783162 Engineering
        #>  2 MID25783166 Engineering
        #>  3 MID25783167 Engineering
        #>  4 MID25783178 Engineering
        #>  5 MID25783197 Engineering
        #>  6 MID25783199 Engineering
        #>  7 MID25783257 Engineering
        #>  8 MID25783259 Engineering
        #>  9 MID25783275 Engineering
        #> 10 MID25783388 Engineering
        #> # ... with 19,394 more rows
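
As a rough picture of what "ever enrolled" means, the base R sketch below works on a made-up term table (`terms_toy`, with hypothetical IDs and a hypothetical `cip4` column). It is not the implementation of ever_filter(); it only shows the logic of keeping one row per student and program, however many terms the student spent there.

```r
# Toy term records: one row per student per term (made-up IDs and codes).
terms_toy <- data.frame(
  id   = c("S1", "S1", "S2", "S2", "S3"),
  cip4 = c("1408", "2705", "1419", "1419", "5201"),
  stringsAsFactors = FALSE
)

# Label each term with the program of interest, or NA if it is neither.
terms_toy$program <- ifelse(
  startsWith(terms_toy$cip4, "14"), "Engineering",
  ifelse(startsWith(terms_toy$cip4, "27"), "Statistics and Applied Math",
         NA_character_)
)

# Keep unique (student, program) pairs: a student counts once per program.
ever_toy <- unique(terms_toy[!is.na(terms_toy$program), c("id", "program")])
ever_toy
```

In this toy data, student S1 appears under both programs because they were enrolled in each at some point, while S3 is dropped because they were never in either.
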
  19. 21.

    race_sex_join() adds student race and sex variables to the data frame

    [Diagram: race_sex_join() from midfieldr applied to midfieldstudents from midfielddata, 98,000 observations]
  20. 22.

    We group and summarize these data by program, race, and sex

        #> Observations: 28
        #> Variables: 4
        #> $ program <chr> "Engineering", "Engineering"
        #> $ race    <chr> "Asian", "Asian", "Black", "
        #> $ sex     <chr> "Female", "Male", "Female",
        #> $ ever    <int> 302, 998, 734, 1273, 177, 57

    Numbers are a little high because "possible graduation in 6 years" is not yet accounted for.
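
The grouping step itself is not shown on the slide. One plausible way to get a count per program, race, and sex is dplyr's count(), sketched here on made-up rows (`ever_rs_toy` is an assumption, standing in for the ever-enrolled students after race and sex have been joined).

```r
library(dplyr)

# Made-up ever-enrolled students with race and sex already attached.
ever_rs_toy <- tibble(
  id      = c("S1", "S2", "S3", "S4", "S5"),
  program = c("Engineering", "Engineering", "Engineering",
              "Statistics and Applied Math", "Statistics and Applied Math"),
  race    = c("Asian", "Asian", "Black", "Asian", "Asian"),
  sex     = c("Female", "Female", "Male", "Male", "Male")
)

# One row per program/race/sex group; the count column is named `ever`
# to match the summary printed on the slide.
ever_counts <- ever_rs_toy %>%
  count(program, race, sex, name = "ever")
ever_counts
```
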
  21. 23.

    grad_filter() identifies the students graduating from these two programs

    [Diagram: grad_filter() from midfieldr applied to midfielddegrees from midfielddata, 98,000 observations]
  22. 24.

    Again, we group and summarize these data (7500 graduates) by program, race, and sex

        #> Observations: 27
        #> Variables: 4
        #> $ program <chr> "Engineering", "Engineering"
        #> $ race    <chr> "Asian", "Asian", "Black", "
        #> $ sex     <chr> "Female", "Male", "Female",
        #> $ grad    <int> 129, 445, 276, 395, 55, 206,
  23. 25.

    We join the graduates to the ever-enrolled students and compute stickiness for programs with more than 5 enrolled

        stickiness <- left_join(ever_enrolled, graduated) %>%
          filter(ever > 5) %>%
          mutate(stickiness = grad / ever)

    Graph using conventional ggplot2 functions.
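
To see what this pipeline does end to end, here is a self-contained sketch with small stand-in tables. The first two rows reuse counts printed on the earlier slides; the last two rows are hypothetical and were added only to exercise the filter and the unmatched-group case. The explicit `by` argument is an addition; the slide's call relies on dplyr matching the shared column names.

```r
library(dplyr)

# Stand-in for the ever-enrolled summary (rows 3 and 4 are hypothetical).
ever_enrolled <- tibble(
  program = c("Engineering", "Engineering",
              "Statistics and Applied Math", "Statistics and Applied Math"),
  race    = c("Asian", "Asian", "Group A", "Group B"),
  sex     = c("Female", "Male", "Female", "Female"),
  ever    = c(302L, 998L, 4L, 12L)
)

# Stand-in for the graduates summary; "Group B" has no graduates, so no row.
graduated <- tibble(
  program = c("Engineering", "Engineering", "Statistics and Applied Math"),
  race    = c("Asian", "Asian", "Group A"),
  sex     = c("Female", "Male", "Female"),
  grad    = c(129L, 445L, 2L)
)

stickiness <- left_join(ever_enrolled, graduated,
                        by = c("program", "race", "sex")) %>%
  filter(ever > 5) %>%                 # drop groups with 5 or fewer enrolled
  mutate(stickiness = grad / ever)     # NA where a group has no graduates row
stickiness
```

Because the ever-enrolled summary has 28 groups and the graduates summary has 27, a left join leaves at least one group with a missing `grad`. Whether to treat that as zero stickiness or to drop the group is an analysis decision the slide does not address.
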
  24. 26.

    [Figure: stickiness by race and sex in two panels, Statistics and Applied Math and Engineering; horizontal axis: Stickiness, 0.1 to 0.6. Rows in each panel, in the order listed on the slide: Black Male, Hispanic Male, Native American Female, Black Female, Native American Male, Hispanic Female, White Female, White Male, Asian Female, Asian Male.]
  25. 27.

    To find out more...

    R packages: https://midfieldr.github.io/midfieldr
    MIDFIELD Project: https://engineering.purdue.edu/MIDFIELD
    Contact: midfield@purdue.edu

    Support provided by the US National Science Foundation, Grant 1545667, Expanding Access to and Participation in the Multiple-Institution Database for Investigating Engineering Longitudinal Development.