Using Diagnostic Models to Evaluate Student Learning Hierarchies in a Large-Scale Assessment

W. Jake Thompson, Brooke Nash, & Jeffrey C. Hoover
Accessible Teaching, Learning, and Assessment Systems (ATLAS)
University of Kansas

Diagnostic Classification Models
• DCMs (or CDMs) are confirmatory latent class models to probabilistically place students into profiles of mastered skills (called attributes)
• Facilitate fine-grained reporting of skill mastery to support instructional decision-making
• Enable the examination of skill dependencies or hierarchies

Benefits of DCMs
• Because the goal is classification, we can get reliable results with fewer items than a comparable scale-score assessment
  – Summarize mastery across skills (Thompson et al., 2019)
  – Longitudinal extensions (e.g., Madison & Bradshaw, 2018)
• DCMs allow us to both provide instructionally relevant assessment results and answer research questions about student learning

DCMs in Practice
• Despite benefits, DCMs are not widely used in applied or operational settings
  – Lack of training and tools
  – Constraints on innovation from (U.S.) regulations and guidelines (e.g., Standards; AERA et al., 2014)
• One application for today's discussion: Dynamic Learning Maps (DLM) Alternate Assessment System

Dynamic Learning Maps
• Grades 3–12 alternate assessment for students with significant cognitive disabilities
  – Assessments in English language arts, mathematics, and science
• Academic content available at multiple levels of complexity for each standard
• Results are reported as a profile of mastered skills

Example Science Results
Student’s performance in high school science Essential Elements is summarized below. This information is based on all of the DLM tests Student took during Spring 2023. Student was assessed on 9 out of 9 Essential Elements and 3 out of 3 Domains expected in high school science. Demonstrating mastery of a Level during the assessment assumes mastery of all prior Levels in the Essential Element. This table describes what skills your child demonstrated in the assessment and how those skills compare to grade level expectations.

Estimated Mastery Level, by Essential Element
• SCI.EE.HS.PS1-2
  – Level 1: Recognize a change during a chemical reaction
  – Level 2: Identify changes during a chemical reaction
  – Level 3 (Target): Use evidence to explain patterns in chemical properties
• SCI.EE.HS.PS2-3
  – Level 1: Identify safety devices that lessen force
  – Level 2: Use data to compare the effect of safety devices
  – Level 3 (Target): Evaluate safety devices and minimize force
• SCI.EE.HS.PS3-4
  – Level 1: Compare the temperatures of two liquids
  – Level 2: Compare the temperatures of liquids before and after mixing
  – Level 3 (Target): Investigate and predict the temperatures of liquids before and after mixing
• SCI.EE.HS.LS1-2
  – Level 1: Recognize that organs have different functions
  – Level 2: Identify which organs have a specific function
  – Level 3 (Target): Model the organization and interaction of organs
• SCI.EE.HS.LS2-2
  – Level 1: Identify food and shelter needs for wildlife
  – Level 2: Recognize the relationship between population size and resources
  – Level 3 (Target): Explain the dependence of an animal population on other organisms
• SCI.EE.HS.LS4-2
  – Level 1: Match species to their environments
  – Level 2: Identify factors that require special traits to survive
  – Level 3 (Target): Explain how traits allow a species to survive

Legend: Levels mastered this year; No evidence of mastery on this Essential Element; Essential Element not tested

Example DLM Hierarchy
• Investigate and predict the change in motion of objects based on the forces acting on those objects
  – Level 1: Identify ways to change motion.
  – Level 2: Investigate and identify ways to change motion.
  – Level 3: Investigate and predict changes in motion.

Skill Hierarchies in DCMs

Evaluating Skill Hierarchies With DCMs
• Thompson & Nash (2022): A diagnostic framework for the empirical evaluation of learning maps
• Three methods for evaluating skill hierarchies using DCMs
• Descriptions and examples using Dynamic Learning Maps

Method 1: Patterns of Mastery Profiles
• Estimate two models
  – LCDM (Henson et al., 2009): Saturated; all possible profiles
  – HDCM (Templin & Bradshaw, 2014): Constrained; only hypothesized profiles
• Evaluate model fit
  – Posterior predictive model checks
  – Model comparisons

Level 1   Level 2   Level 3
   0         0         0
   1         0         0
   0         1         0
   0         0         1
   1         1         0
   1         0         1
   0         1         1
   1         1         1
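
The difference between the two profile spaces can be illustrated with a minimal sketch. It assumes a strictly linear hierarchy (Level 1 before Level 2 before Level 3), which matches the DLM example but is only one of the structures an HDCM can encode. The function names are hypothetical and are not part of any DCM package.

```python
from itertools import product


def all_profiles(n_attributes):
    """Every 0/1 mastery profile: the saturated (LCDM) profile space."""
    return list(product((0, 1), repeat=n_attributes))


def linear_hierarchy_profiles(n_attributes):
    """Profiles allowed under an assumed linear hierarchy.

    A profile is kept only if mastery never increases from one level to the
    next, i.e. a higher level is never mastered without every lower level.
    """
    return [
        profile
        for profile in all_profiles(n_attributes)
        if all(lower >= higher for lower, higher in zip(profile, profile[1:]))
    ]


saturated = all_profiles(3)
hypothesized = linear_hierarchy_profiles(3)
unexpected = [p for p in saturated if p not in hypothesized]

print(len(saturated), "profiles in the saturated model")
print(len(hypothesized), "profiles in the constrained model:", hypothesized)
print("profiles that only the saturated model can represent:", unexpected)
```

Under this assumption, the constrained model keeps four of the eight profiles in the table; the remaining four are the "unexpected" classes that the model comparison is checking for.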

Method 1: Flagging Criteria
• Model fit evaluated using posterior predictive checks of the raw score distribution (Thompson, 2019)
• Sufficient fit for the HDCM (constrained model) indicates support for the hierarchy
• Flags when the LCDM shows sufficient model fit and the HDCM does not
  – Indicates the unexpected classes are needed to fully represent the data
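
The decision rule in these bullets can be written as a small sketch. It takes the two fit judgments as already-made booleans; how "sufficient fit" is determined from the posterior predictive checks is not shown here. The "inconclusive" outcome for the case where neither model fits is an assumption added for completeness, since the slide does not address it, and the function name is hypothetical.

```python
def method1_decision(lcdm_fits: bool, hdcm_fits: bool) -> str:
    """Combine the two model-fit judgments into a hierarchy decision."""
    if hdcm_fits:
        # Sufficient fit for the constrained model supports the hierarchy.
        return "supported"
    if lcdm_fits:
        # Only the saturated model fits: the unexpected classes are needed.
        return "flagged"
    # Assumption: the slide does not say what to conclude when neither fits.
    return "inconclusive"


print(method1_decision(lcdm_fits=True, hdcm_fits=False))
```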

Method 1: Example Output

Method 1: Limitations
• The number of profiles in the saturated model (LCDM) increases exponentially with the number of attributes
• Need students in all profiles to get reliable parameter estimates
• Extremely small classes can cause estimation problems (Templin & Bradshaw, 2014)
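
To make the first limitation concrete, the short sketch below compares the number of profiles in a saturated model (2 to the power of the number of attributes) with the number allowed under an assumed linear hierarchy (number of attributes plus one). The linear structure is an assumption for illustration; other hierarchies give counts between these two extremes.

```python
def profile_counts(n_attributes):
    """Return (saturated, linear-hierarchy) profile counts."""
    saturated = 2 ** n_attributes  # every combination of 0/1 mastery
    linear = n_attributes + 1      # master none, the first, the first two, ...
    return saturated, linear


for n in (3, 5, 10):
    saturated, linear = profile_counts(n)
    print(f"{n} attributes: {saturated} saturated profiles, "
          f"{linear} under a linear hierarchy")
```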

Method 2: Patterns of Attribute Mastery
• Estimate each skill as a separate 1-attribute DCM
• Make mastery determinations for each assessed skill
• Look for unexpected patterns in attribute mastery

Student   Level 1   Level 2   Level 3
   1         1         1         —
   2         1         0         0
   3         —         1         1
   4         1         0         1
   5         0         0         0
   6         1         —         —
   …         …         …         …
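
A minimal sketch of the pattern check, using the six students from the table above. It assumes a linear hierarchy and treats levels that were not assessed (the dashes in the table, `None` in the code) as uninformative; both choices are assumptions for illustration, and the function name is hypothetical.

```python
from typing import Optional, Sequence

Mastery = Optional[int]  # 1 = master, 0 = non-master, None = level not assessed


def has_unexpected_pattern(levels: Sequence[Mastery]) -> bool:
    """True if a higher level is mastered while an assessed lower level is not."""
    seen_nonmastery = False
    for status in levels:
        if status is None:
            continue  # unassessed levels carry no information
        if status == 0:
            seen_nonmastery = True
        elif seen_nonmastery:
            return True  # mastery above an unmastered lower level
    return False


students = {
    1: (1, 1, None),
    2: (1, 0, 0),
    3: (None, 1, 1),
    4: (1, 0, 1),
    5: (0, 0, 0),
    6: (1, None, None),
}

flags = {student: has_unexpected_pattern(p) for student, p in students.items()}
print(flags)
```

Only student 4, who mastered Level 3 without Level 2, shows a pattern that a linear hierarchy does not allow.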

Method 2: Flagging Criteria
• Multiple flagging methods
  – Total number of students with an unexpected pattern across assessed attributes
  – Total number of reversals for a specific pair of attributes
• Actual thresholds for flagging depend on characteristics of the assessment
  – Simulation studies to determine an expected number of reversals
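
The pairwise criterion could be sketched as follows. A "reversal" is taken here to mean non-mastery of the lower level together with mastery of the higher level, counted only among students assessed on both levels. The threshold value and the use of a strict "greater than" comparison are hypothetical placeholders; as the slide notes, real thresholds would come from simulation studies for the specific assessment.

```python
from itertools import combinations


def reversal_rates(patterns, n_levels):
    """Proportion of students with a reversal for each (lower, higher) pair.

    Each pattern is a tuple of 1 (master), 0 (non-master), or None (not
    assessed). Levels are reported with 1-based numbering.
    """
    rates = {}
    for lower, higher in combinations(range(n_levels), 2):
        both = [p for p in patterns
                if p[lower] is not None and p[higher] is not None]
        if not both:
            continue  # no student was assessed on both levels
        reversals = sum(1 for p in both if p[lower] == 0 and p[higher] == 1)
        rates[(lower + 1, higher + 1)] = reversals / len(both)
    return rates


def flag_pairs(rates, threshold):
    """Pairs whose reversal rate exceeds a (hypothetical) threshold."""
    return [pair for pair, rate in rates.items() if rate > threshold]


patterns = [
    (1, 1, None),
    (1, 0, 0),
    (None, 1, 1),
    (1, 0, 1),
    (0, 0, 0),
    (1, None, None),
]

rates = reversal_rates(patterns, n_levels=3)
print(rates)
print(flag_pairs(rates, threshold=0.24))  # 0.24 is a placeholder value
```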

Method 2: Example Output
• Overall, 10% of students had an unexpected pattern (Threshold: 14%)
• 30% of students testing on Levels 2 and 3 had an unexpected pattern (Threshold: 24%)

Method 2: Limitations
• No direct test of relationships between attributes
• Number of models to estimate increases with the number of attributes
• Additional analyses needed to set reasonable thresholds for flagging unexpected patterns

Method 3: Patterns of Attribute Difficulty
• Group students into cohorts
• Measure attribute difficulty using average p-values of items for each cohort
• Within each cohort, p-values should decrease at higher hierarchy levels
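
One way to compute the cohort-level difficulty measure is sketched below. It assumes dichotomously scored responses stored as `(cohort, level, item, score)` tuples; that record layout, the cohort label, and the item identifiers are all hypothetical. Each item's p-value (proportion correct) is computed within a cohort, and the item p-values are then averaged within each level.

```python
from collections import defaultdict
from statistics import mean


def cohort_level_p_values(responses):
    """Average item p-value for each (cohort, level) combination."""
    item_scores = defaultdict(list)
    for cohort, level, item, score in responses:
        item_scores[(cohort, level, item)].append(score)

    item_p_values = defaultdict(list)
    for (cohort, level, _item), scores in item_scores.items():
        item_p_values[(cohort, level)].append(mean(scores))  # item p-value

    return {key: mean(values) for key, values in item_p_values.items()}


# Hypothetical responses for a single cohort "A".
responses = [
    ("A", 1, "item1", 1), ("A", 1, "item1", 1),
    ("A", 1, "item1", 1), ("A", 1, "item1", 0),
    ("A", 1, "item2", 1), ("A", 1, "item2", 0),
    ("A", 2, "item3", 1), ("A", 2, "item3", 0),
    ("A", 2, "item3", 0), ("A", 2, "item3", 0),
]

p_values = cohort_level_p_values(responses)
print(p_values)
```

In this made-up cohort the average p-value drops from Level 1 to Level 2, which is the direction the hierarchy predicts.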

Method 3: Flagging Criteria
• Calculate Cohen’s h effect size for the difference in p-values
  – Subtract lower level from higher level (e.g., Level 3 − Level 2)
  – Difference should be negative (i.e., Level 3 should have a lower p-value)
• Flag instances where Cohen’s h ≥ 0.2
  – Moderate or larger effect in unexpected direction
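
Cohen's h is the difference between two arcsine-transformed proportions, h = 2·arcsin(√p₁) − 2·arcsin(√p₂). The sketch below applies it in the direction described above (higher level minus lower level) and flags values at or above 0.2. The p-values in the example are made up for illustration, and the function names are hypothetical.

```python
import math


def cohens_h(p_higher, p_lower):
    """Cohen's h for the higher-level p-value minus the lower-level p-value."""
    return (2 * math.asin(math.sqrt(p_higher))
            - 2 * math.asin(math.sqrt(p_lower)))


def flag_difficulty_reversal(p_higher, p_lower, threshold=0.2):
    """Flag when the higher level looks easier by at least the threshold."""
    return cohens_h(p_higher, p_lower) >= threshold


# Expected direction: the higher level has the lower p-value, so h is negative.
print(cohens_h(0.40, 0.60), flag_difficulty_reversal(0.40, 0.60))

# Unexpected direction: the higher level has the higher p-value.
print(cohens_h(0.65, 0.50), flag_difficulty_reversal(0.65, 0.50))
```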

Method 3: Example Output

Method 3: Limitations
• Not a DCM-based model
  – p-values as a proxy for attribute difficulty
• No direct test of attribute relationships
• Potential violations of the hierarchy must be inferred from patterns of flags across student cohorts

Revisiting Our Hierarchy
• Results indicated that students could be proficient on Level 3 without being proficient on Level 2
  – Level 1: Identify ways to change motion.
  – Level 2: Investigate and identify ways to change motion.
  – Level 3: Investigate and predict changes in motion.

Alternate Skill Hierarchy
• Multiple pathways to Level 3 proficiency
  – Identify ways to change motion.
  – Investigate and identify ways to change motion.
  – Investigate and predict changes in motion.

Summary
• The DCM framework provides multiple methods for analyzing attribute hierarchies
  – Complementary strengths and weaknesses
  – Can be adapted to non-linear relationships
• DCMs are a powerful tool for understanding student learning and providing instructionally relevant results for students

Get in Touch!
• atlas.ku.edu
• company/atlas-ku
• @atlas4learning
• https://dynamiclearningmaps.org/
• wjakethompson.com
• in/wjakethompson
• @wjakethompson

References
American Educational Research Association (AERA), American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association. https://www.testingstandards.net/open-access-files.html
Henson, R. A., Templin, J. L., & Willse, J. T. (2009). Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika, 74(2), 191–210. https://doi.org/10.1007/s11336-008-9089-5
Madison, M. J., & Bradshaw, L. P. (2018). Assessing growth in a diagnostic classification model framework. Psychometrika, 83(4), 963–990. https://doi.org/10.1007/s11336-018-9638-5
Templin, J., & Bradshaw, L. (2014). Hierarchical diagnostic classification models: A family of models for estimating and testing attribute hierarchies. Psychometrika, 79(2), 317–339. https://doi.org/10.1007/s11336-013-9362-0
Thompson, W. J. (2019). Bayesian psychometrics for diagnostic assessments: A proof of concept (Research Report No. 19-01). University of Kansas; Accessible Teaching, Learning, and Assessment Systems. https://doi.org/10.35542/osf.io/jzqs8
Thompson, W. J., Clark, A. K., & Nash, B. (2019). Measuring the reliability of diagnostic mastery classifications at multiple levels of reporting. Applied Measurement in Education, 32(4), 298–309. https://doi.org/10.1080/08957347.2019.1660345
Thompson, W. J., & Nash, B. (2022). A diagnostic framework for the empirical evaluation of learning maps. Frontiers in Education, 6, 714736. https://doi.org/10.3389/feduc.2021.714736
