Richard Layton
October 03, 2018

# Making MIDFIELD More Accessible: A workshop for R beginners

This workshop introduces data and tools for investigating
undergraduate persistence metrics using R. Student record
data are from MIDFIELD, a database of registrars’ data from
US institutions. The stratified data sample includes demographic,
term, course, and degree information for 98,000 students from
1987 to 2016. The midfieldr package provides functions for
determining persistence metrics, such as graduation rates and
program stickiness, and for grouping findings by institution,
program, sex, and race/ethnicity. The goal of the workshop is to
share our data, methods, and metrics for intersectional research
in student persistence. The workshop is designed for R beginners.


## Transcript

1. ### Making MIDFIELD More Accessible: A workshop for R beginners

Richard Layton, Matthew Ohland, Russell Long, Marisa Orr. 2018-10-03, FIE Conference, San Jose, CA
2. ### R-Bar volunteers can help with software issues

| Min | Topic |
|----:|-------|
| 10 | Introductions |
| 20 | Elements of effective graphs |
| 30 | Getting started with R (tutorial) |
| 20 | Accessing the MIDFIELD data |
| 20 | Break |
| 40 | Using midfieldr (tutorial) |
| 10 | Extending your repertoire |
| 10 | Next steps |
| 20 | Conversations |

4. ### In your handout, list the slices A through E from largest to smallest

[Figure: pie chart with slices A, B, C, D, E]

Adapted from (Robbins 2013), Ch. 2.
6. ### In your handout, list the slices A through E from largest to smallest

[Figure: pie chart with slices A, B, C, D, E]

Answer: B (largest), D, A, C, E (smallest).

Adapted from (Robbins 2013), Ch. 2.
7. ### The same data arranged along a common axis

Comparing values along a common axis is a high-accuracy visual task.

[Figure: slices E, C, A, D, B plotted along a common axis from 17 to 23]
8. ### Slices are what percentage of the whole?

[Figure: pie chart with slices A, D, C, B]

Fill in the blanks for A, B, C, and D. The total should be 100%.
9. ### 3D effects distort our judgment

[Figure: 3D pie chart with slices A, D, C, B]

Answers: A. 20%, B. 20%, C. 20%, D. 40%. The total should be 100%.
10. ### Again, the same data arranged along a common axis

A high-accuracy visual task.

[Figure: slices A, B, C, D plotted along a common axis from 20 to 40 percent]
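The common-axis version of the pie exercise takes only a few lines of base R. A minimal sketch, assuming the answer values given on the 3D-pie slide (A, B, and C at 20%, D at 40%); `dotchart()` comes from R's built-in graphics package, not from midfieldr, and the variable name `pct` is illustrative.

```r
# Answer values from the 3D-pie exercise, in percent
pct <- c(A = 20, B = 20, C = 20, D = 40)

# The slices of a pie must account for the whole
stopifnot(sum(pct) == 100)

# Same data as points along one common axis: a high-accuracy visual task
dotchart(pct, xlab = "Percent of total")
```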
11. ### Write down the heights of the bars

This is a visual inspection only. Fill in the blanks for A, B, C, and D.

[Figure: 3D bar chart with bars A, B, C, D]

Adapted from (Robbins 2013), p. 22.
12. ### Again, 3D effects distort our judgment

This is a visual inspection only. Answers: A. 2, B. 4, C. 6, D. 8.
13. ### Again, the same data arranged along a common axis

A high-accuracy visual task.

[Figure: A, B, C, D plotted along a common axis from 2 to 8]
14. ### You can use bars, but the axis must include zero

[Figure: bar chart of A, B, C, D on an axis from 0 to 8]
15. ### If you mark the endpoints, you can omit the bar

[Figure: endpoints of A, B, C, D marked on an axis from 0 to 8]
16. ### Producing a “dot plot” with rows ordered per the data

[Figure: dot plot of A, B, C, D on an axis from 0 to 8, rows ordered by value]
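The progression from bars to dots can be sketched in base R. This assumes the bar heights from the 3D-bar exercise (A = 2, B = 4, C = 6, D = 8); `barplot()` and `dotchart()` are base graphics functions, and the workshop's own figures may well have been produced with a different tool.

```r
# Bar heights from the 3D-bar exercise
heights <- c(A = 2, B = 4, C = 6, D = 8)

# Bars encode values by length, so the value axis must include zero
barplot(heights, ylim = c(0, 8), ylab = "Value")

# A dot plot marks only the endpoints, so the bars can be omitted.
# sort() orders the rows by value before plotting.
heights_sorted <- sort(heights, decreasing = FALSE)
dotchart(heights_sorted, xlim = c(0, 8), xlab = "Value")
```

`dotchart()` draws the first element at the bottom, so sorting in ascending order places the largest value at the top of the plot.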
17. ### Try estimating areas of three states

Visual estimation of area is a low-accuracy task. South Carolina (SC) is about 83,000 sq. km. Estimate the areas of the other three states.

| State | Area (x 1000 sq. km) |
|-------|---------------------:|
| FL | ? |
| GA | ? |
| AL | ? |
| SC | 83 |

Adapted from (Ihaka 2007).
18. ### Again, the same data arranged along a common axis

[Figure: FL, GA, AL, and SC along a common axis in units of 1000 sq. km, with SC marked at 83]
19. ### Your estimates have probably improved

| State | Area (x 1000 sq. km) |
|-------|---------------------:|
| FL | 170 |
| GA | 154 |
| AL | 136 |
| SC | 83 |
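The state-area comparison can be sketched the same way. This assumes the rounded areas shown on the slide (in thousands of sq. km) and uses base R only; the ratio step is an illustrative addition that shows how each state compares with the SC reference.

```r
# State areas in thousands of square km, from the slide
area <- c(FL = 170, GA = 154, AL = 136, SC = 83)

# Order the rows by value; the largest state plots at the top
area_sorted <- sort(area)
dotchart(area_sorted, xlim = c(0, 200), xlab = "Area (x 1000 sq. km)")

# Each area relative to the reference state, SC
ratio_to_sc <- round(area / area[["SC"]], 2)
print(ratio_to_sc)
```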
20. ### When color represents area, what story emerges?

Color used deceptively: 2012 election results by county (Obama, Romney). http://www-personal.umich.edu/~mejn/election/2012/
21. ### When color represents voters?

Color used judiciously: each dot represents 100 votes for Obama or Romney, so color represents a quantity. http://coach.weinstein.to/lets-get-specific/election-results/
22. ### The experts tell us

Optimal design primarily depends on the message to be conveyed and the variables to be shown. (Doumont 2009)

Image from http://www.principiae.be/pdfs/Principiae-2014.pdf
23. ### The experts tell us

The task of the designer is to give visual access to the subtle and the difficult, that is, reveal the complex. (Tufte 1983)

Image from https://en.wikipedia.org/wiki/Edward_Tufte
24. ### The experts tell us

What’s your point? Seriously, that’s the most important question. (Evergreen 2017)

Image from https://tei.cgu.edu/people/stephanie-evergreen-phd/
25. ### R is designed with statistical analysis and data graphics in mind

Well-designed data graphics are accessible, even to the beginner. R makes graphical exploration of data accessible to all, and work in progress is easily disseminated via GitHub. Because R is open source, new packages appear regularly (one might solve your problem), and anyone can help us find errors and add features to our packages.

27. ### This self-paced tutorial introduces basic R

- Don’t worry about the pace of your work. Everyone works and learns new material at a different pace.
- Please ask questions of your neighbors as well as the facilitators.
- If you finish early, ask if anyone near you needs assistance.
- Save your work regularly.
28. ### https://midfieldr.github.io/workshops

Create an R project, start an R script, add code, rinse, repeat.

30. ### In education, cross-sectional designs are typical

[Figure: group 1, group 2, and group 3, different groups observed at one time]
31. ### Longitudinal studies offer some advantages

[Figure: the same groups followed over time, year 1 through year 6]
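To make the cross-sectional versus longitudinal distinction concrete, here is a toy sketch with hypothetical enrollment records. The data frame and its `id` and `year` columns are invented for illustration and are not taken from midfielddata.

```r
# Hypothetical enrollment records: one row per student per year enrolled
enrolled <- data.frame(
  id   = c("s1", "s1", "s1", "s2", "s2", "s3", "s3", "s3", "s3", "s4", "s4"),
  year = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 2, 3)
)

# Cross-sectional view: a head count of whoever is enrolled each year
headcount <- table(enrolled$year)

# Longitudinal view: follow the same group (students present in year 1)
cohort <- unique(enrolled$id[enrolled$year == 1])
cohort_by_year <- tapply(enrolled$id %in% cohort, enrolled$year, sum)

print(headcount)
print(cohort_by_year)
```

In this toy data the year-2 head count rises because a new student (s4) arrives, while the cohort count shows the original group shrinking, a pattern a single snapshot cannot reveal.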
32. ### MIDFIELD is a database for longitudinal studies

- 1.6 M undergraduate students at 21 US institutions
- Whole-population data from registrars
- 1987–present
33. ### MIDFIELD data are curated in four categories

Students, courses, terms, and degrees. MIDFIELD: 1.6 M students.
34. ### R package midfielddata provides a stratified sample

Students, courses, terms, and degrees. midfielddata: 98,000 students.
35. ### Each observation is a unique student

`midfieldstudents`: 98,000 observations, 19 Mb of memory.

Variables: student ID, institution, term, major, transfer, sex, race, age, US citizen, home zip code, SAT, ACT.
36. ### Each observation is one term for one student

`midfieldterms`: 729,000 observations, 82 Mb of memory.

Variables: student ID, institution, term, major, level, standing, co-op, credit hours, GPA.
37. ### Each observation is one course for one student

`midfieldcourses`: 3.5 M observations, 348 Mb of memory.

Variables: student ID, institution, term, course, section, hours, type, grade, instructor.
38. ### Each observation is a unique student

`midfielddegrees`: 98,000 observations, 10 Mb of memory.

Variables: student ID, institution, term, major, degree.
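Every table carries a student ID, which is what makes joining them possible. A minimal sketch of the idea with base R's `merge()`, using hypothetical miniature tables; the column names (`id`, `sex`, `race`, `degree`) are assumptions for illustration and are not the actual midfielddata variable names.

```r
# Hypothetical miniature tables keyed by student ID; the column names
# are illustrative, not the actual midfielddata variable names
students <- data.frame(
  id   = c("s1", "s2", "s3"),
  sex  = c("Female", "Male", "Female"),
  race = c("Asian", "White", "Hispanic"),
  stringsAsFactors = FALSE
)
degrees <- data.frame(
  id     = c("s1", "s3"),
  degree = c("Electrical Engineering", "Mechanical Engineering"),
  stringsAsFactors = FALSE
)

# Attach sex and race to each degree record by matching student ID.
# merge() performs an inner join by default, so s2 (no degree) drops out.
degrees_demo <- merge(degrees, students, by = "id")
print(degrees_demo)
```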

40. ### midfieldr provides the tools

Data, loaded with `library(midfielddata)`: `midfieldstudents`, `midfieldterms`, `midfieldcourses`, `midfielddegrees`.

Tools, loaded with `library(midfieldr)`: `cip_filter()`, `ever_filter()`, `grad_filter()`, `race_sex_join()`, `multiway_order()`, etc.

43. ### midfieldr provides functions for working with midfielddata

Some of the functions you will use today are:

| Function | Provides |
|----------|----------|
| `cip_filter()` | Identify programs by CIP code |
| `cip_label()` | Label your programs |
| `ever_filter()` | Find all students ever enrolled in your programs |
| `grad_filter()` | Find all graduates of your programs |
| `race_sex_join()` | Join student race/ethnicity and sex to the data |
| `multiway_order()` | Order the rows and panels of multiway data |

We’ll work with midfieldr after the break.
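As a rough illustration of what finding students "ever enrolled" and "graduates" makes possible, the sketch below computes program stickiness in base R on hypothetical data. It assumes stickiness is defined as the number of graduates of a program divided by the number of students ever enrolled in it, and it assumes 6-digit CIP codes with `141001` standing in for Electrical Engineering. It does not call midfieldr, and none of its names reflect the package's API.

```r
# Hypothetical term records: one row per student per term, 6-digit CIP code
terms <- data.frame(
  id   = c("s1", "s1", "s2", "s2", "s3", "s4", "s4", "s5"),
  cip6 = c("141001", "141001", "141001", "240102",
           "141001", "140801", "141001", "141001"),
  stringsAsFactors = FALSE
)
# Hypothetical degree records: CIP code of the program each graduate completed
degrees <- data.frame(
  id   = c("s1", "s2", "s4"),
  cip6 = c("141001", "240102", "141001"),
  stringsAsFactors = FALSE
)
# Hypothetical demographic records
students <- data.frame(
  id  = c("s1", "s2", "s3", "s4", "s5"),
  sex = c("Female", "Male", "Female", "Male", "Female"),
  stringsAsFactors = FALSE
)

program <- "141001"  # assumed here to stand for Electrical Engineering

# Students ever enrolled in the program, in any term
ever <- unique(terms$id[terms$cip6 == program])
# Students who graduated from the program
grad <- unique(degrees$id[degrees$cip6 == program])

# Count both groups by sex, using the same levels so the tables align
sexes  <- sort(unique(students$sex))
n_ever <- table(factor(students$sex[students$id %in% ever], levels = sexes))
n_grad <- table(factor(students$sex[students$id %in% grad], levels = sexes))

# Stickiness (assumed definition): graduates divided by students ever enrolled
stickiness <- round(n_grad / n_ever, 2)
print(stickiness)
```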

45. ### This self-paced tutorial illustrates midfieldr functions

- Don’t worry about the pace of your work. Everyone works and learns new material at a different pace.
- Please ask questions of your neighbors as well as the facilitators.
- If you finish early, ask if anyone near you needs assistance.
- Save your work regularly.
46. ### https://midfieldr.github.io/midfieldr

Start a new R script, add a line of code, and run it. Examine the result, repeat.


50. ### Starting and destination majors of all women ever in EE

[Figure: Sankey diagram of women ever enrolled in Electrical Engineering, showing flows from starting major (Non-ENG, Other-ENG, EE, Unknown) to destination major in year 6 (Non-ENG, Other-ENG, EE). Axis: number of students (x1000), 0 to 3. Labels: N = 2.5, N = 1.5, N = 1.0.]
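A Sankey diagram like this one is drawn from a table of flow counts, one row per starting-major and destination-major pair. A minimal base-R sketch of building that table, using hypothetical student records (the `paths` data frame is invented; the group labels follow the slide); drawing the diagram itself would need an additional plotting package.

```r
# Hypothetical records for women ever enrolled in EE: the starting major
# group and the destination group in year 6, one row per student
paths <- data.frame(
  start = c("EE", "EE", "EE", "EE", "Other-ENG", "Non-ENG", "Unknown"),
  dest  = c("EE", "EE", "Other-ENG", "Non-ENG", "EE", "EE", "Other-ENG"),
  stringsAsFactors = FALSE
)

# Count each start-to-destination flow (one band of a Sankey diagram)
flows <- as.data.frame(table(start = paths$start, dest = paths$dest))

# Keep only the pairs that actually occur
flows <- flows[flows$Freq > 0, ]
print(flows)
```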

54. ### Next steps in learning to use midfieldr

Several more vignettes (tutorials) are available on the midfieldr website.
55. ### Next steps if you want more than a MIDFIELD sample

The full MIDFIELD database (students, courses, terms, degrees) covers 1.6 M students. Talk to a member of the MIDFIELD team; names and emails are on the website.

58. ### Next steps in learning about graph design

Edward Tufte, Howard Wainer, Naomi Robbins, Charles Kostelnick, Michael Hassett

60. ### References

- Doumont, Jean-Luc. 2009. Trees, Maps, and Theorems: Effective Communication for Rational Minds. 2nd ed. Kraainem, Belgium: Principiae.
- Evergreen, Stephanie D. H. 2017. Effective Data Visualization: The Right Chart for the Right Data. Sage.
- Ihaka, Ross. 2007. “Statistics 787 Lecture Slides.”
- Kabacoff, Robert. 2015. R in Action: Data Analysis and Graphics with R. 2nd ed. Manning Publications Co.
- Kostelnick, Charles, and Michael Hassett. 2003. Shaping Information: The Rhetoric of Visual Conventions. Southern Illinois University.
- Robbins, Naomi. 2013. Creating More Effective Graphs. Chart House.
- Tufte, Edward. 1983. The Visual Display of Quantitative Information. Graphics Press.
- Wainer, Howard. 1997. Visual Revelations: Graphical Tales of Fate and Deception from Napoleon Bonaparte to Ross Perot. Copernicus.
- Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science. Sebastopol, CA: O’Reilly Media, Inc.