Technical evidence for diagnostic assessments

Diagnostic classification models (DCMs) have grown in popularity over the past decade. However, their adoption in applied settings, especially operational assessment programs, has been slow. One potential barrier to adoption is the technical evidence recommended for all assessments in the Standards for Educational and Psychological Testing. Many of the methods widely used to provide evidence to meet these recommendations have implicit or explicit assumptions of a continuous unidimensional scale, such as those found in classical test theory and item response theory. In this paper, we describe how the use of a DCM impacts the type of technical evidence that should be provided for an assessment system, as well as methods for providing that evidence. We then present an applied example from an operational assessment program that uses a DCM for reporting, demonstrating how technical evidence can be provided for DCM-based assessments. We provide recommendations for other programs seeking to adopt a diagnostic assessment.

Jake Thompson

June 09, 2021

Transcript

  1. Technical Evidence for Diagnostic Assessments
     W. Jake Thompson, Amy K. Clark, & Brooke Nash
     Accessible Teaching, Learning, and Assessment Systems, University of Kansas
  2. DCMs for Large-Scale Assessments
     • Alternative to traditional scale-score assessments
     • Provide fine-grained results in the form of a profile of which specific skills have been mastered (an illustrative item model is sketched below)
       – Multi-dimensional
       – Categorical, usually binary
     • Fine-grained results are more actionable and can be used to inform instructional practices
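
As a rough illustration of how a DCM links binary attribute mastery to item responses, a general model such as the log-linear cognitive diagnosis model (LCDM) can be written for an item j that measures a single attribute k. This generic form is a sketch of the model family, not necessarily the specific parameterization used operationally by DLM:

$$P(X_{ij} = 1 \mid \alpha_{ik}) = \frac{\exp(\lambda_{j,0} + \lambda_{j,1}\,\alpha_{ik})}{1 + \exp(\lambda_{j,0} + \lambda_{j,1}\,\alpha_{ik})}$$

Here $\alpha_{ik} \in \{0, 1\}$ indicates whether examinee i has mastered attribute k, $\lambda_{j,0}$ is the log-odds of a correct response for non-masters, and $\lambda_{j,1} \ge 0$ is the increase in log-odds associated with mastery. Scoring returns a profile of these binary indicators rather than a location on a continuous scale.
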
  3. Potential Barriers to Adoption
     • Interplay between scoring model and assessment design
     • Policy and political context
     • Providing technical evidence
  4. Technical Evidence
     • Best practices outlined in the Standards for Educational and Psychological Testing
     • Many methods assume a continuous unidimensional construct
     • How to provide evidence for DCMs?
       – Validity
       – Reliability
       – Fairness
  5. Dynamic Learning Maps
     • ELA, mathematics, and science alternate assessments used for accountability reporting in over 20 states
     • Scored using DCMs, with 27–100 unique attributes in each grade and subject
     • Results provided as a discrete mastery profile and an aggregated summary of performance, including an overall performance level
  6. Validity
     • Validity arguments provide evidence for the intended uses of the assessment
     • Intended uses might be different in a DCM-based assessment context
       – Fine-grained results inform subsequent instruction
       – State accountability systems
     • How to structure a validity argument to encompass these intended uses?
  7. Required Evidence
     • Mastery classifications accurately represent students’ knowledge, skills, and understandings
     • Aggregation of mastery classifications is an accurate and reliable indicator of summative performance
     • Classifications are understandable and actionable
  8. Reliability
     • Many methods for estimating the reliability of attributes in a DCM-based assessment
       – See Sinharay & Johnson’s (2019) Measures of agreement: Reliability, classification accuracy and classification consistency (definitions sketched below)
     • What about reliability for more than just the attributes?
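
The attribute-level agreement measures referenced above can be stated compactly. As a sketch (notation is illustrative), classification accuracy for attribute k is the probability that the estimated mastery status matches the true status, and classification consistency is the probability that two independent administrations would classify the student the same way:

$$\text{accuracy}_k = P(\hat{\alpha}_k = \alpha_k), \qquad \text{consistency}_k = P\!\left(\hat{\alpha}_k^{(1)} = \hat{\alpha}_k^{(2)}\right)$$

Both are probabilities of agreement rather than correlations, which is why they extend naturally to the categorical results a DCM reports.
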
  9. Reliability of Aggregations
     • Simulated retests (a minimal sketch follows)
     • For each new retest, calculate all relevant results (individual and aggregated classifications)
     • Compare scores from simulated retests to observed scores
     • Summarize agreement
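
A minimal sketch of this procedure, assuming a fitted model that can generate item responses and a scoring function that returns both attribute classifications and an aggregated performance level. Every name below (model, score_fn, and so on) is a hypothetical placeholder, not the DLM implementation:

```python
import numpy as np

def simulated_retest_agreement(model, observed_profiles, observed_levels,
                               score_fn, n_retests=100, rng=None):
    """Estimate reliability of aggregated results via simulated retests.

    For each simulated retest, item responses are generated from the fitted
    model, rescored with the operational scoring rules, and the resulting
    classifications are compared with the observed results. Agreement is
    then summarized across retests.

    Placeholder arguments:
      model              -- fitted DCM able to simulate item responses
      observed_profiles  -- (n_students, n_attributes) binary mastery matrix
      observed_levels    -- (n_students,) observed overall performance levels
      score_fn           -- maps simulated responses to (profiles, levels)
    """
    rng = rng or np.random.default_rng()
    attr_agreement, level_agreement = [], []

    for _ in range(n_retests):
        # Generate a new set of item responses from the fitted model
        simulated_responses = model.simulate(rng=rng)

        # Score the simulated retest
        sim_profiles, sim_levels = score_fn(simulated_responses)

        # Proportion of attribute classifications matching the observed ones
        attr_agreement.append(np.mean(sim_profiles == observed_profiles))

        # Proportion of students whose aggregated performance level matches
        level_agreement.append(np.mean(sim_levels == observed_levels))

    return {
        "attribute_agreement": float(np.mean(attr_agreement)),
        "performance_level_agreement": float(np.mean(level_agreement)),
    }
```

The same loop can summarize agreement for any reported result, individual or aggregated, which is the point of the approach: reliability evidence is produced for every score that appears on a report, not only the attribute classifications.
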
  10. Fairness
      • Very broad area that touches almost all aspects of an assessment
      • Focus here on differential item functioning
  11. Differential Item Functioning
      • Much theoretical research, less applied work
      • Methods are similar to those for CTT- and IRT-based assessments (a logistic regression sketch follows)
        – Logistic regression
        – Mantel-Haenszel
        – Model based
      • Consideration: what is the “ability” matching variable?
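
A sketch of the logistic regression approach for a single item, using a matching variable of the kind discussed on the following slides. The column names and toy data are invented for illustration; this is not the operational DLM procedure:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical item-level data: one row per student for a single item.
#   correct  -- 0/1 item score
#   matching -- matching variable (e.g., total attributes mastered)
#   group    -- 0 = reference group, 1 = focal group
data = pd.DataFrame({
    "correct":  [1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1],
    "matching": [5, 2, 6, 4, 1, 5, 2, 6, 3, 4, 2, 4],
    "group":    [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})

# Nested logistic regression models:
#   m0: matching variable only
#   m1: adds a group main effect (uniform DIF)
#   m2: adds a group-by-matching interaction (non-uniform DIF)
m0 = smf.logit("correct ~ matching", data=data).fit(disp=False)
m1 = smf.logit("correct ~ matching + group", data=data).fit(disp=False)
m2 = smf.logit("correct ~ matching * group", data=data).fit(disp=False)

# Likelihood ratio statistics: a large improvement in fit when adding the
# group terms flags the item for potential DIF.
lr_uniform = 2 * (m1.llf - m0.llf)
lr_nonuniform = 2 * (m2.llf - m1.llf)
print(f"Uniform DIF LR statistic: {lr_uniform:.2f}")
print(f"Non-uniform DIF LR statistic: {lr_nonuniform:.2f}")
```

The open question for DCM-based assessments is what goes in the matching column, which is the subject of the next two slides.
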
  12. Matching Variable: Mastery Profile
      • Option 1: Matching based on mastery of attributes required by the item
        – Consistent with psychometric theory, BUT
        – Sparse data
        – Fewer items per attribute results in contamination
  13. Matching Variable: Aggregated Mastery
      • Option 2: Matching based on the total number of attributes mastered
        – Fewer data concerns, “continuous” measure; BUT
        – Possibly introduces construct-irrelevant variance
        – Multiple ways to master the same number of attributes
      • Used operationally for DLM DIF analyses
      • Very few differences in which items are flagged for DIF
  14. Evaluating DCM Technical Evidence
      • Methods and solutions exist for evidence requirements related to DCMs
      • Evidence for the DLM assessments has been used by state partners to meet peer review requirements
      • More areas to grow
        – Evaluating model fit and classification accuracy
        – Alternative methods for providing summative scores
  15. Lessons Learned
      • Engage with stakeholders early and often
      • Adapt existing solutions to meet the needs and unique characteristics of each assessment
      • Evidentiary requirements should not be viewed as prohibitive