Slide 1

Slide 1 text

midfieldr
Data, methods, & metrics for studying student persistence
Richard Layton, Rose-Hulman Institute of Technology
Russell Long, Purdue University
Matthew Ohland, Purdue University
Nichole Ramirez, Purdue University
https://midfieldr.github.io/midfieldr
useR! Conference, Brisbane, 2018-07-11

Slide 2

Slide 2 text

In education, cross-sectional designs are typical
Diagram: different groups (group 1, group 2, group 3) observed at one time; axis: time

Slide 3

Slide 3 text

Longitudinal studies offer some advantages
Diagram: the same groups observed over time (year 1 through year 6); axis: time

Slide 4

Slide 4 text

MIDFIELD is a database for longitudinal studies
1.6 M undergraduate students at 21 US institutions
whole-population data from registrars
1987–present

Slide 5

Slide 5 text

MIDFIELD data are curated in four categories
students, courses, terms, degrees
MIDFIELD: 1.6 M students

Slide 6

Slide 6 text

R package midfielddata provides a stratified sample
students, courses, terms, degrees
midfielddata: 98,000 students
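The slide does not say which strata or sampling fraction were used to build midfielddata. As a generic illustration of what a stratified sample is, the base R sketch below draws the same fraction from each stratum of a made-up population. The `institution` strata, the 10% fraction, and all IDs are assumptions for illustration, not the package's actual procedure.

```r
# Hypothetical population: 1,000 students at three made-up institutions.
set.seed(1)
population <- data.frame(
  id = sprintf("S%04d", 1:1000),
  institution = sample(c("A", "B", "C"), 1000, replace = TRUE,
                       prob = c(0.5, 0.3, 0.2)),
  stringsAsFactors = FALSE
)

# Assumed sampling fraction, applied to every stratum separately.
sampling_fraction <- 0.1

# Split by stratum, sample rows within each stratum, and recombine.
strata <- split(population, population$institution)
sampled <- do.call(rbind, lapply(strata, function(stratum) {
  n_take <- ceiling(sampling_fraction * nrow(stratum))
  stratum[sample(nrow(stratum), n_take), , drop = FALSE]
}))
rownames(sampled) <- NULL

# Compare stratum shares in the population and in the sample.
print(round(prop.table(table(population$institution)), 2))
print(round(prop.table(table(sampled$institution)), 2))
```

Sampling within each stratum keeps every institution represented in roughly its population share, which a simple random sample of the whole table does not guarantee.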

Slide 7

Slide 7 text

Each observation is a unique student
midfieldstudents (midfielddata): 98,000 observations, 19 Mb of memory
Variables: student ID, institution, term, major, transfer, sex, race, age, US citizen, home zip code, SAT, ACT

Slide 8

Slide 8 text

Each observation is one term for one student
midfieldterms (midfielddata): 729,000 observations, 82 Mb of memory
Variables: student ID, institution, term, major, level, standing, co-op, credit hours, GPA

Slide 9

Slide 9 text

Each observation is one course for one student
midfieldcourses (midfielddata): 3.5 M observations, 348 Mb of memory
Variables: student ID, institution, term, course (section, hours, type, grade, instructor)

Slide 10

Slide 10 text

Each observation is a unique student
midfielddegrees (midfielddata): 98,000 observations, 10 Mb of memory
Variables: student ID, institution, term, major, degree
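The four tables differ in what one row represents, and they share the student ID. A minimal base R sketch with toy stand-ins shows that difference and how a student-level attribute attaches to term records. All IDs, term codes, column names (`id`, `term`, `cip6`, `course`), and values here are made up for illustration and are not taken from midfielddata.

```r
# Toy stand-ins for the four tables (all values hypothetical).
students <- data.frame(id = c("S1", "S2"),
                       sex = c("Female", "Male"),
                       stringsAsFactors = FALSE)
terms <- data.frame(id = c("S1", "S1", "S2"),
                    term = c("T1", "T2", "T1"),
                    cip6 = c("140801", "140801", "270101"),
                    stringsAsFactors = FALSE)
courses <- data.frame(id = c("S1", "S1", "S2", "S2"),
                      term = c("T1", "T2", "T1", "T1"),
                      course = c("MATH 101", "ENGR 110", "MATH 101", "STAT 200"),
                      stringsAsFactors = FALSE)
degrees <- data.frame(id = "S1", term = "T8", cip6 = "140801",
                      stringsAsFactors = FALSE)

# Largest number of rows any one student has in each table:
# 1 for students and degrees, more than 1 for terms and courses.
rows_per_student <- sapply(
  list(students = students, terms = terms,
       courses = courses, degrees = degrees),
  function(tbl) max(table(tbl$id))
)
print(rows_per_student)

# The shared id lets student-level attributes join onto term records.
terms_with_sex <- merge(terms, students, by = "id")
print(terms_with_sex)
```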

Slide 11

Slide 11 text

midfielddata provides the data
midfielddata: midfieldstudents, midfieldterms, midfieldcourses, midfielddegrees

Slide 12

Slide 12 text

midfieldr provides the tools
    library(midfielddata)
    library(midfieldr)
midfielddata: midfieldstudents, midfieldterms, midfieldcourses, midfielddegrees
midfieldr: cip_filter(), ever_filter(), grad_filter(), race_sex_join(), multiway_order(), etc.

Slide 13

Slide 13 text

Both packages are currently available on GitHub
https://midfieldr.github.io/midfieldr

Slide 14

Slide 14 text

Why R?
Increase accessibility to the data
Support a collaborative community of researchers
Share our methods and metrics

Slide 15

Slide 15 text


Slide 16

Slide 16 text

Which is stickier: Engineering or Stats/Applied-Math?
$$\text{stickiness} = \frac{N \text{ graduates of a program}}{N \text{ students ever enrolled in the program}}$$
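To make the ratio concrete, here is a small worked example in base R. The helper name `stickiness_ratio()` and the two programs with their counts are hypothetical, chosen only to show that the metric compares programs of very different sizes on the same scale.

```r
# Hypothetical helper: graduates divided by students ever enrolled.
stickiness_ratio <- function(n_grad, n_ever) {
  # Graduates of a program are a subset of those ever enrolled in it.
  stopifnot(all(n_ever > 0), all(n_grad >= 0), all(n_grad <= n_ever))
  n_grad / n_ever
}

# Made-up counts for a large and a small program.
n_ever <- c(program_a = 400, program_b = 50)
n_grad <- c(program_a = 180, program_b = 30)

print(stickiness_ratio(n_grad, n_ever))
```

In this made-up case the smaller program is the stickier one (30 of 50) even though it graduates far fewer students than the larger one (180 of 400).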

Slide 17

Slide 17 text

We start with the programs’ Classification of Instructional Programs (CIP) codes
Function: cip_filter() (midfieldr)
Data: cip (midfieldr), 1584 observations

Slide 18

Slide 18 text

cip_filter() helps us find the codes we want
27-series: Applied math and stats
    #> # A tibble: 4 x 2
    #>   cip4  cip4name
    #>
    #> 1 2701  Mathematics
    #> 2 2703  Applied Mathematics
    #> 3 2705  Statistics
    #> 4 2799  Mathematics and Statistics, Other
14-series: Engineering
    #> # A tibble: 1 x 2
    #>   cip2  cip2name
    #>
    #> 1 14    Engineering
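The slides do not show the arguments of cip_filter(), so the sketch below does not call it. It illustrates the underlying idea, selecting codes by series prefix, in base R on a toy lookup table. The helper `filter_by_series()` is hypothetical; the 27-series rows come from the output above, and the two 4-digit engineering rows are added only for illustration.

```r
# Toy lookup table. Codes are stored as character so that prefixes can be
# matched and leading zeros in other series would not be lost.
cip_toy <- data.frame(
  cip4 = c("1408", "1419", "2701", "2703", "2705", "2799"),
  cip4name = c("Civil Engineering", "Mechanical Engineering",
               "Mathematics", "Applied Mathematics", "Statistics",
               "Mathematics and Statistics, Other"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not the midfieldr function): keep the rows whose
# code starts with any of the given series prefixes.
filter_by_series <- function(cip_table, series) {
  pattern <- paste0("^(", paste(series, collapse = "|"), ")")
  cip_table[grepl(pattern, cip_table$cip4), , drop = FALSE]
}

math_stat <- filter_by_series(cip_toy, "27")
engineering <- filter_by_series(cip_toy, "14")

print(math_stat)
print(engineering)
```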

Slide 19

Slide 19 text

ever_filter() identifies all students ever enrolled in these two programs
Function: ever_filter() (midfieldr)
Data: midfieldterms (midfielddata), 730,000 observations

Slide 20

Slide 20 text

ever_filter() identifies all students ever enrolled in these two programs: nearly 20,000
    #> # A tibble: 19,404 x 2
    #>    id          program
    #>
    #>  1 MID25783162 Engineering
    #>  2 MID25783166 Engineering
    #>  3 MID25783167 Engineering
    #>  4 MID25783178 Engineering
    #>  5 MID25783197 Engineering
    #>  6 MID25783199 Engineering
    #>  7 MID25783257 Engineering
    #>  8 MID25783259 Engineering
    #>  9 MID25783275 Engineering
    #> 10 MID25783388 Engineering
    #> # ... with 19,394 more rows
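As a rough picture of what "ever enrolled" means, the base R sketch below reduces a toy term table to one row per student and program. It is not the ever_filter() implementation; the helper `program_of()`, the column names, and all rows are assumptions for illustration.

```r
# Toy term records (hypothetical): one row per student per term.
terms_toy <- data.frame(
  id = c("S1", "S1", "S2", "S2", "S3", "S4"),
  cip4 = c("1408", "1408", "2705", "1419", "4506", "2701"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label a code with one of the two programs of
# interest, or NA when the code belongs to neither.
program_of <- function(cip4) {
  ifelse(startsWith(cip4, "14"), "Engineering",
         ifelse(startsWith(cip4, "27"), "Statistics and Applied Math",
                NA_character_))
}

terms_toy$program <- program_of(terms_toy$cip4)

# Keep terms in the two programs, then collapse repeated terms so that
# each student appears once per program they were ever enrolled in.
in_program <- !is.na(terms_toy$program)
ever <- unique(terms_toy[in_program, c("id", "program")])
rownames(ever) <- NULL

print(ever)
```

A student who switched between the two programs (S2 in the toy data) appears once under each, so such a student counts toward both denominators.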

Slide 21

Slide 21 text

race_sex_join() adds student race and sex variables to the data frame
Function: race_sex_join() (midfieldr)
Data: midfieldstudents (midfielddata), 98,000 observations
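Conceptually this step is a join on student ID. A minimal base R sketch under that assumption follows; it is not the race_sex_join() implementation, and the toy tables and column names are hypothetical.

```r
# Toy ever-enrolled table and toy student table (all values hypothetical).
ever_toy <- data.frame(
  id = c("S1", "S2", "S2", "S4"),
  program = c("Engineering", "Statistics and Applied Math",
              "Engineering", "Statistics and Applied Math"),
  stringsAsFactors = FALSE
)
students_toy <- data.frame(
  id = c("S1", "S2", "S3", "S4"),
  race = c("Asian", "Black", "White", "Hispanic"),
  sex = c("Female", "Male", "Female", "Female"),
  age = c(18, 19, 18, 20),
  stringsAsFactors = FALSE
)

# Keep every ever-enrolled row (all.x = TRUE) and bring over only the
# race and sex columns from the student table.
ever_race_sex <- merge(ever_toy,
                       students_toy[, c("id", "race", "sex")],
                       by = "id", all.x = TRUE)

print(ever_race_sex)
```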

Slide 22

Slide 22 text

We group and summarize these data by program, race, and sex
    #> Observations: 28
    #> Variables: 4
    #> $ program "Engineering", "Engineering"
    #> $ race "Asian", "Asian", "Black", "
    #> $ sex "Female", "Male", "Female",
    #> $ ever 302, 998, 734, 1273, 177, 57
Numbers are a little high because "possible graduation in 6 years" is not yet accounted for.
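The slide does not show the grouping code itself. One way to get a count per program, race, and sex in base R is `aggregate()`, sketched here on made-up rows; the column name `ever` mirrors the output above, and everything else is an assumption.

```r
# Toy ever-enrolled data with race and sex attached (hypothetical rows).
ever_toy <- data.frame(
  id = c("S1", "S2", "S3", "S4", "S5", "S6"),
  program = c("Engineering", "Engineering", "Engineering",
              "Statistics and Applied Math",
              "Statistics and Applied Math", "Engineering"),
  race = c("Asian", "Asian", "Black", "Asian", "Asian", "Asian"),
  sex = c("Female", "Female", "Female", "Male", "Male", "Male"),
  stringsAsFactors = FALSE
)

# Count students in each program/race/sex group; the count column is
# named "ever" to match the summary shown on the slide.
ever_counts <- aggregate(
  x = list(ever = ever_toy$id),
  by = ever_toy[c("program", "race", "sex")],
  FUN = length
)

print(ever_counts)
```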

Slide 23

Slide 23 text

grad_filter() identifies the students graduating from these two programs
Function: grad_filter() (midfieldr)
Data: midfielddegrees (midfielddata), 98,000 observations

Slide 24

Slide 24 text

Again, we group and summarize these data (7500 graduates) by program, race, and sex
    #> Observations: 27
    #> Variables: 4
    #> $ program "Engineering", "Engineering"
    #> $ race "Asian", "Asian", "Black", "
    #> $ sex "Female", "Male", "Female",
    #> $ grad 129, 445, 276, 395, 55, 206,

Slide 25

Slide 25 text

We join the graduates to the ever-enrolled students and compute stickiness for groups with more than 5 students ever enrolled
    stickiness <- left_join(ever_enrolled, graduated) %>%
      filter(ever > 5) %>%
      mutate(stickiness = grad / ever)
Graph using conventional ggplot2 functions.
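The same join, filter, and ratio can be sketched in base R without dplyr. The first two rows below use the counts printed on the earlier slides (302 and 998 ever enrolled, 129 and 445 graduated), on the assumption that the two printouts list groups in the same order. The other two rows are hypothetical: one stands for a group with no graduates, which a left join leaves as NA (the summaries show 28 ever-enrolled groups but only 27 graduate groups), and one for a group too small to pass the filter. Setting the missing count to zero is one possible choice, not something the slides specify.

```r
# Ever-enrolled counts: rows 1-2 from the slides, rows 3-4 hypothetical.
ever_enrolled <- data.frame(
  program = c("Engineering", "Engineering",
              "Statistics and Applied Math", "Statistics and Applied Math"),
  race = c("Asian", "Asian", "Native American", "Native American"),
  sex = c("Female", "Male", "Female", "Male"),
  ever = c(302, 998, 8, 4),
  stringsAsFactors = FALSE
)

# Graduate counts: only groups with at least one graduate appear.
graduated <- data.frame(
  program = c("Engineering", "Engineering"),
  race = c("Asian", "Asian"),
  sex = c("Female", "Male"),
  grad = c(129, 445),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every ever-enrolled group, as left_join() does.
joined <- merge(ever_enrolled, graduated,
                by = c("program", "race", "sex"), all.x = TRUE)

# Assumption: a group missing from the graduate table had zero graduates.
joined$grad[is.na(joined$grad)] <- 0

# Drop very small groups, then compute the ratio.
joined <- joined[joined$ever > 5, ]
joined$stickiness <- joined$grad / joined$ever

print(joined)
```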

Slide 26

Slide 26 text

Figure: stickiness by race and sex, in two panels (Statistics and Applied Math; Engineering). Horizontal axis: Stickiness, 0.1 to 0.6. Rows in each panel: Black Male, Hispanic Male, Native American Female, Black Female, Native American Male, Hispanic Female, White Female, White Male, Asian Female, Asian Male.

Slide 27

Slide 27 text

To find out more...
R packages: https://midfieldr.github.io/midfieldr
MIDFIELD Project: https://engineering.purdue.edu/MIDFIELD
midfi[email protected]
Support provided by the US National Science Foundation, Grant 1545667, Expanding Access to and Participation in the Multiple-Institution Database for Investigating Engineering Longitudinal Development