
Making MIDFIELD More Accessible: A workshop for R beginners


This workshop introduces data and tools for investigating
undergraduate persistence metrics using R. Student record
data are from MIDFIELD, a database of registrars’ data from
US institutions. The stratified data sample includes demographic,
term, course, and degree information for 98,000 students from
1987 to 2016. The midfieldr package provides functions for
determining persistence metrics such as graduation rates or
program stickiness and for grouping findings by institution,
program, sex, and race/ethnicity. The goal of the workshop is to
share our data, methods, and metrics for intersectional research
in student persistence. The workshop is designed for R beginners.


Richard Layton

October 03, 2018

Transcript

  1. Making MIDFIELD More Accessible: A workshop for R beginners

    Richard Layton, Matthew Ohland, Russell Long, Marisa Orr. 2018-10-03, FIE Conference, San Jose, CA
  2. R-Bar volunteers can help with software issues

    Min  Topic
    10   Introductions
    20   Elements of effective graphs
    30   Getting started with R (tutorial)
    20   Accessing the MIDFIELD data
    20   break
    40   Using midfieldr (tutorial)
    10   Extending your repertoire
    10   Next steps
    20   Conversations
  3. Elements of effective graphs

  4. In your handout, list the slices A thru E from largest to smallest

    Slices: A, B, C, D, E. Adapted from (Robbins 2013) Ch. 2
  5. In your handout, list the slices A thru E from largest to smallest

    Slices: A, B, C, D, E. • B (largest). Adapted from (Robbins 2013) Ch. 2
  6. In your handout, list the slices A thru E from largest to smallest

    Slices: A, B, C, D, E. • B (largest) • D • A • C • E (smallest). Adapted from (Robbins 2013) Ch. 2
  7. The same data arranged along a common axis

    Comparing values along a common axis is a high-accuracy visual task. Categories: E, C, A, D, B; axis ticks from 17 to 23.
  8. Slices are what percentage of the whole?

    Slices: A, D, C, B. Fill in the blanks (the total should be 100%): A. ___ B. ___ C. ___ D. ___
  9. 3D-effects distort our judgment

    Slices: A, D, C, B. Fill in the blanks (the total should be 100%): A. 20% B. 20% C. 20% D. 40%
  10. Again, the same data arranged along a common axis

    A high-accuracy visual task. Categories: A, B, C, D; axis ticks from 20 to 40.
  11. Write down the heights of the bars

    This is a visual inspection only. Fill in the blanks: A. ___ B. ___ C. ___ D. ___ Adapted from (Robbins 2013) p. 22
  12. Again, 3D-effects distort our judgment

    This is a visual inspection only. Fill in the blanks: A. 2 B. 4 C. 6 D. 8
  13. Again, the same data arranged along a common axis

    A high-accuracy visual task. Categories: A, B, C, D; axis ticks at 2, 4, 6, 8.
  14. You can use bars, but must include zero

    Categories: A, B, C, D; axis ticks at 0, 2, 4, 6, 8.
  15. If you mark the endpoints, you can omit the bar

    Categories: A, B, C, D; axis ticks at 0, 2, 4, 6, 8.
  16. Producing a “dot plot” with rows ordered per the data

    Categories: A, B, C, D; axis ticks at 0, 2, 4, 6, 8.
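The dot plot described on these slides can be sketched in base R. This is a minimal illustration, assuming the values 2, 4, 6, 8 for categories A through D as read off the earlier bar-chart slides; it uses `dotchart()` from the graphics package rather than whatever tool produced the original figures.

```r
# Values for categories A through D (taken from the bar-chart slides).
values <- c(A = 2, B = 4, C = 6, D = 8)

# Order the rows by value; dotchart() draws the first element at the
# bottom, so an ascending sort puts the largest value at the top.
ordered <- sort(values)

# One dot per row on a common horizontal axis that includes zero.
dotchart(ordered, xlim = c(0, 8), pch = 19, xlab = "Value")
```

Sorting before plotting is the design choice that matters here: rows ordered by the data make the ranking readable at a glance.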
  17. Try estimating areas of three states

    Visual estimation of area is a low-accuracy task. South Carolina (SC) ≈ 83,000 sq km. Fill in the blanks (x 1000 sq. km): FL ___, GA ___, AL ___, SC 83. Adapted from (Ihaka 2007)
  18. Again, the same data arranged along a common axis

    Fill in the blanks (x 1000 sq. km): FL ___, GA ___, AL ___, SC 83.
  19. Your estimates have probably improved

    Areas (x 1000 sq. km): FL 170, GA 154, AL 136, SC 83.
  20. When color represents area, what story emerges?

    Color used deceptively: 2012 election by county, Obama and Romney. http://www-personal.umich.edu/~mejn/election/2012/
  21. When color represents voters?

    Color used judiciously: each dot is 100 votes for Obama or Romney, so color represents a quantity. http://coach.weinstein.to/lets-get-specific/election-results/
  22. The experts tell us

    Optimal design primarily depends on • The message to be conveyed • The variables to be shown (Doumont 2009). Image from http://www.principiae.be/pdfs/Principiae-2014.pdf
  23. The experts tell us

    The task of the designer is to give visual access to the subtle and the difficult, that is, reveal the complex. (Tufte 1983). Image from https://en.wikipedia.org/wiki/Edward_Tufte
  24. The experts tell us

    What’s your point? Seriously, that’s the most important question. (Evergreen 2017). Image from https://tei.cgu.edu/people/stephanie-evergreen-phd/
  25. R is designed with statistical analysis and data graphics in mind

    Well-designed data graphics are accessible, even to the beginner
    • makes graphical exploration of data accessible to all
    • work in progress is easily disseminated via GitHub
    And because R is open-source
    • new packages appear regularly; one might solve your problem
    • anyone can help us find errors and add features to our packages
  26. Getting started with R (tutorial)

  27. This self-paced tutorial introduces basic R

    • Don’t worry about the pace of your work.
    • Everyone works and learns new material at a different pace.
    • Please ask questions of your neighbors as well as the facilitators.
    • If you finish early, ask if anyone near you needs assistance.
    • Save your work regularly.
  28. https://midfieldr.github.io/workshops

    Create an R project, start an R script, add code, rinse, repeat.
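As a rough idea of what a first script in that add-run-examine loop might look like, here is a minimal base R sketch. The variable names and numbers are hypothetical and are not taken from the workshop tutorial.

```r
# Assign a numeric vector to a name with the assignment operator.
credit_hours <- c(15, 12, 16, 14)

# Call functions on the vector and store the results.
total_hours <- sum(credit_hours)
mean_hours <- mean(credit_hours)

# Print a result, examine it, then add the next line and repeat.
print(total_hours)
print(mean_hours)
```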
  29. Accessing the MIDFIELD data

  30. In education, cross-sectional designs are typical

    Different groups at one time: group 1, group 2, group 3.
  31. Longitudinal studies offer some advantages

    Same groups over time: year 1, year 2, year 3, year 4, year 5, year 6.
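The difference between the two designs can be sketched with a toy table in base R. The records below are hypothetical (three students, up to three years); the point is only that a cross-section looks at one time slice, while a longitudinal view follows the same student IDs across years.

```r
# Hypothetical toy records: one row per student per year enrolled.
records <- data.frame(
  id   = c("s1", "s1", "s1", "s2", "s2", "s3"),
  year = c(1, 2, 3, 1, 2, 1)
)

# Cross-sectional view: whoever is observed at a single point in time.
cross_section <- records[records$year == 1, ]

# Longitudinal view: follow the year-1 cohort and count who is
# still enrolled in each later year.
cohort <- cross_section$id
followed <- records[records$id %in% cohort, ]
persistence <- table(followed$year)
print(persistence)
```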
  32. MIDFIELD is a database for longitudinal studies

    • 1.6 M undergraduate students at 21 US institutions
    • whole-population data from registrars
    • 1987–present
  33. MIDFIELD data are curated in four categories

    students, courses, terms, degrees. MIDFIELD: 1.6 M students
  34. R package midfielddata provides a stratified sample

    students, courses, terms, degrees. midfielddata: 98,000 students
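For readers new to the term, stratified sampling draws the same fraction from each subgroup so the sample keeps the population's proportions. The base R sketch below is illustrative only: the population is made up, and stratifying by institution with a 10% fraction is an assumption, since the slides do not say how the midfielddata sample was drawn.

```r
set.seed(1)

# Hypothetical population: 1,000 students at three institutions.
population <- data.frame(
  id          = seq_len(1000),
  institution = rep(c("A", "B", "C"), times = c(500, 300, 200))
)

# Sample the same fraction within each stratum (assumed: institution).
fraction <- 0.1
strata <- split(population, population$institution)
sampled <- lapply(strata, function(s) {
  s[sample(nrow(s), round(fraction * nrow(s))), ]
})
stratified <- do.call(rbind, sampled)

# Each institution keeps its share of the sample.
print(table(stratified$institution))
```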
  35. Each observation is a unique student

    midfieldstudents: 98,000 observations, 19 Mb of memory. Variables: student ID, institution, term, major, transfer, sex, race, age, US citizen, home zip code, SAT, ACT.
  36. Each observation is one term for one student

    midfieldterms: 729,000 observations, 82 Mb of memory. Variables: student ID, institution, term, major, level, standing, co-op, credit hours, GPA.
  37. Each observation is one course for one student

    midfieldcourses: 3.5 M observations, 348 Mb of memory. Variables: student ID, institution, term, course (section, hours, type, grade, instructor).
  38. Each observation is a unique student

    midfielddegrees: 98,000 observations, 10 Mb of memory. Variables: student ID, institution, term, major, degree.
  39. midfielddata provides the data: midfieldstudents, midfieldterms, midfieldcourses, midfielddegrees
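Because all four tables carry a student ID, information from one can be attached to another with a join. The sketch below uses two tiny hypothetical tables and base R `merge()`; the column names `id`, `sex`, and `degree` are placeholders, not the actual variable names in midfielddata.

```r
# Hypothetical toy tables keyed by student ID.
students <- data.frame(
  id  = c("s1", "s2", "s3"),
  sex = c("Female", "Male", "Female")
)

# One row per student; NA marks a student with no degree on record.
degrees <- data.frame(
  id     = c("s1", "s2", "s3"),
  degree = c("Electrical Engineering", NA, "Mechanical Engineering")
)

# Join on the shared key so each row holds both kinds of information.
joined <- merge(students, degrees, by = "id")
print(joined)
```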

  40. midfieldr provides the tools

    Data: midfieldstudents, midfieldterms, midfieldcourses, midfielddegrees. Load with library(midfielddata) and library(midfieldr). Functions: cip_filter(), ever_filter(), grad_filter(), race_sex_join(), multiway_order(), etc.
  41. Preparing for the workshop, you installed both packages

    https://midfieldr.github.io/midfieldr

  42. midfieldr provides functions for working with midfielddata

    Some of those functions you will use today are:

    Function          Provides
    cip_filter()      Identify programs by CIP code
    cip_label()       Label your programs
    ever_filter()     Find all students ever enrolled in your programs
    grad_filter()     Find all graduates of your programs
    race_sex_join()   Join student race/ethnicity and sex to the data
    multiway_order()  Order the rows and panels of multiway data
  43. midfieldr provides functions for working with midfielddata

    (Function table repeated from the previous slide.) • We’ll work with midfieldr after the break.
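To make the filtering steps in the function list concrete, here is a base R analog on toy data. It is not the midfieldr implementation and does not call the package; the tables and column names are hypothetical. The CIP codes are real: 6-digit codes beginning with 14 are Engineering, and 141001 is Electrical and Electronics Engineering.

```r
# Hypothetical term records: one row per student per term, with the
# 6-digit CIP code of the program enrolled in that term.
terms <- data.frame(
  id   = c("s1", "s1", "s2", "s2", "s3", "s4"),
  cip6 = c("141001", "141001", "141001", "270101", "141901", "270101")
)

# Hypothetical degree records: program in which each student graduated.
degrees <- data.frame(
  id   = c("s1", "s2", "s3"),
  cip6 = c("141001", "270101", "141901")
)

# Identify programs by CIP code: Electrical Engineering (prefix "1410").
program_codes <- unique(terms$cip6[startsWith(terms$cip6, "1410")])

# Students ever enrolled in the program, in any term.
ever <- unique(terms$id[terms$cip6 %in% program_codes])

# Students who graduated from the program.
grads <- degrees$id[degrees$cip6 %in% program_codes]

print(ever)
print(grads)
```

Student s2 appears in `ever` but not in `grads`: enrolled in the program at some point, then graduated elsewhere.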
  44. Using midfieldr (tutorial)

  45. This self-paced tutorial illustrates midfieldr functions

    • Don’t worry about the pace of your work.
    • Everyone works and learns new material at a different pace.
    • Please ask questions of your neighbors as well as the facilitators.
    • If you finish early, ask if anyone near you needs assistance.
    • Save your work regularly.
  46. https://midfieldr.github.io/midfieldr

    Start a new R script, add a line of code, run it. Examine the result, repeat.
  47. Extending your repertoire: Metrics & graphics

  48. Graduation rates of starters

    Figure 4. Graduation rates of starters
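A graduation rate of starters can be computed by hand once each student's starting program and degree program are known. The sketch below assumes the common reading of the metric (the share of students starting in a program who graduate from that same program) and uses a hypothetical five-student table; it ignores time limits such as a six-year window.

```r
# Hypothetical data: starting program and degree program per student.
# NA means no degree on record.
starters <- data.frame(
  id        = c("s1", "s2", "s3", "s4", "s5"),
  start     = c("EE", "EE", "EE", "ME", "ME"),
  graduated = c("EE", NA, "ME", "ME", "ME")
)

# A starter counts only if the degree is in the starting program.
starters$grad_in_start <- !is.na(starters$graduated) &
  starters$graduated == starters$start

# Rate per starting program: the mean of a logical is a proportion.
grad_rate <- tapply(starters$grad_in_start, starters$start, mean)
print(grad_rate)
```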
  49. Stickiness in major and in any other major
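Stickiness in a major can be sketched as the number of students graduating from a program divided by the number ever enrolled in it. That definition is an assumption stated here for illustration, and the tables below are hypothetical; the example covers stickiness in the major only, not the "any other major" variant.

```r
# Hypothetical enrollment records; a student may appear in several
# terms and in more than one program.
enrollment <- data.frame(
  id      = c("s1", "s1", "s2", "s3", "s4", "s3", "s5"),
  program = c("EE", "EE", "EE", "EE", "EE", "ME", "ME")
)

# Hypothetical degree records.
degrees <- data.frame(
  id      = c("s1", "s2", "s3", "s5"),
  program = c("EE", "EE", "ME", "ME")
)

# Count each student once per program ever enrolled in.
n_ever <- table(unique(enrollment)$program)

# Count graduates per program, keeping the same program order.
n_grad <- table(factor(degrees$program, levels = names(n_ever)))

# Stickiness: graduates of a program over students ever enrolled in it.
stickiness <- setNames(as.numeric(n_grad) / as.numeric(n_ever), names(n_ever))
print(stickiness)
```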

  50. Starting and destination majors of all women ever in EE

    [Sankey diagram: Women ever enrolled in Electrical Engineering. Axes: Starting Major and Year 6 Destination. Categories: Non-ENG, Other-ENG, EE, Unknown. Scale: Number of students (x1000). Labels: N = 2.5, N = 1.5, N = 1.0.]
  51. Migration yield

  52. Comparing graduation rates of starters and migrators

  53. Next steps

  54. Next steps in learning to use midfieldr

    Several more vignettes (tutorials) on the midfieldr website
  55. Next steps if you want more than a MIDFIELD sample

    students, courses, terms, degrees. MIDFIELD: 1.6 M students. Talk to a member of the MIDFIELD team. Names and emails on the website.
  56. Talk to a member of the MIDFIELD team

  57. Next steps in learning R

    Hadley Wickham, Garrett Grolemund, Robert Kabacoff, StackExchange.com. Or just google it. Your problem may already be solved.
  58. Next steps in learning about graph design

    Edward Tufte, Howard Wainer, Naomi Robbins, Charles Kostelnick, Michael Hassett
  59. Conversations

    An unstructured time to relax, talk, question, and share.
  60. References

    Doumont, Jean-luc. 2009. Trees, Maps, and Theorems: Effective Communication for Rational Minds. 2nd ed. Kraainem, Belgium: Principiae.
    Evergreen, Stephanie D. H. 2017. Effective Data Visualization: The Right Chart for the Right Data. Sage.
    Ihaka, Ross. 2007. “Statistics 787 Lecture Slides.”
    Kabacoff, Robert. 2015. R in Action: Data Analysis and Graphics with R. 2nd ed. Manning Publications Co.
    Kostelnick, Charles, and Michael Hassett. 2003. Shaping Information: The Rhetoric of Visual Conventions. Southern Illinois University.
    Robbins, Naomi. 2013. Creating More Effective Graphs. Chart House.
    Tufte, Edward. 1983. The Visual Display of Quantitative Information. Graphics Press.
    Wainer, Howard. 1997. Visual Revelations: Graphical Tales of Fate and Deception from Napoleon Bonaparte to Ross Perot. Copernicus.
    Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science. Sebastopol, CA: O’Reilly Media, Inc.