
Psychometric considerations for learning maps-based assessments

Jake Thompson
February 22, 2019


Learning map models are a type of cognitive model composed of multiple interconnected learning targets and other critical knowledge and skills. The Dynamic Learning Maps (DLM) Alternate Assessment System uses learning map models as the basis of assessments for students with significant cognitive disabilities. The DLM maps and corresponding assessments provide multiple and alternate routes to achieving the learning targets, making the system more inclusive for learners with various disabilities. However, the design choices that maximize accessibility also pose unique psychometric challenges. In this presentation, we discuss: (1) the DLM assessment design; (2) the diagnostic classification model (DCM) used to evaluate student performance; (3) approaches to empirically evaluating the map structures, including future directions for data collection; and (4) results from research on teachers’ interpretations of diagnostic assessment results.


Transcript

  1. February 22nd, 2019
    Psychometric Considerations for
    Learning Maps-Based Assessments
    EPSY 896

  2. Topics
    • Overview of DLM assessment design
    • Diagnostic classification modeling
    • Map validation research
    • Score report research

  3. OVERVIEW OF DLM ASSESSMENT DESIGN

  4. DLM Background
    • Serves students with the most significant cognitive
    disabilities (SCD)
    • Provides opportunity for students to show what
    they know and can do in:
    – English language arts
    – Mathematics
    – Science
    • Consortium of 18 states and the District of
    Columbia

  5. Defining the Domain with Learning Map Models
    • DLM Alternate Assessment System uses highly
    connected learning map models
    • Nodes in the learning maps represent:
    – Knowledge
    – Skills
    – Understanding
    – Foundational Skills
    • Includes multiple and alternate pathways by which
    students may demonstrate content knowledge and
    skills


  6. Example Math Mini-Map

  7. Essential Elements
    • Alternate grade-level expectations (content standards)
    • Provides students access to the maps at five linkage
    levels:
    – Initial Precursor (IP)
    – Distal Precursor (DP)
    – Proximal Precursor (PP)
    – Target (T)
    – Successor (S)
    • Linkage levels are collections of nodes on the path
    toward the standard

  8. Testlets
    • Items are administered in short testlets
    • Testlets are collections of 3-9 items centered
    around an engagement activity
    – Story or context
    • Testlets measure a single linkage level
    • Items measure a single EE

  9. Testlets Measure Linkage Levels
    [Diagram: each linkage level (Initial Precursor, Distal Precursor,
    Proximal Precursor, Target, Successor) connects nodes in the learning
    map, through an observable behavior, to its corresponding testlet,
    linking the map to the items delivered.]
    *Science has 3 linkage levels: Initial, Precursor, and Target

  10. Goals of the Testlet Assignment Process
    • Assign first testlet content that is both rigorous and
    accessible to the student.
    • Base subsequent testlet assignments on student
    performance to provide the closest match to
    students’ knowledge and skills while covering
    blueprint requirements.

  11. Testlet Assignment Process
    The DLM spring testlet assignment process
    involves two main steps:
    1. the selection of linkage level for the first
    administered testlet is determined from
    survey information about the student,
    and
    2. the assignment of linkage level for all
    subsequent administered testlets occurs
    through adaptive routing.
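The slide does not specify the routing rule itself. As a purely illustrative sketch, here is a hypothetical routing function that moves the next testlet up one linkage level after strong performance and down one after weak performance; the thresholds and the one-level step are assumptions, not the DLM algorithm:

```python
# Hypothetical adaptive-routing sketch. The actual DLM routing rules are
# not given on this slide; the thresholds below are illustrative only.
LEVELS = ["IP", "DP", "PP", "T", "S"]

def next_linkage_level(current: str, percent_correct: float) -> str:
    """Route the next testlet one level up, down, or stay (assumed rule)."""
    i = LEVELS.index(current)
    if percent_correct >= 0.8:        # strong performance: move up a level
        i = min(i + 1, len(LEVELS) - 1)
    elif percent_correct <= 0.35:     # weak performance: move down a level
        i = max(i - 1, 0)
    return LEVELS[i]

print(next_linkage_level("PP", 0.9))  # routes up to "T"
```

The clamping at both ends reflects that a student already at the Initial Precursor or Successor level has nowhere further to route.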

  12. DIAGNOSTIC CLASSIFICATION MODELING

  13. Traditional (IRT) Scoring

  14. Moving to a More Fine-Grained Model
    Initial: Distinguish things that grow from things that don’t grow.
    Precursor: Provide evidence that plants grow.
    Target: Provide evidence that plants need air and water to grow.
    SCI.5.LS.1.1: Provide evidence that plants need air and water to grow.

  15. Diagnostic Classification Modeling
    • Diagnostic classification
    modeling (DCM) is a
    statistical method that
    provides diagnostic
    feedback about students’
    mastery of discrete skills
    • Latent class analyses are
    conducted separately for
    each linkage level for
    each EE.

  16. DLM Scoring Overview
    • DCM is used to create a profile of skill mastery
    $P(X_i = x_i) = \sum_{c=1}^{C} \nu_c \prod_{j=1}^{J} \pi_{jc}^{x_{ij}} (1 - \pi_{jc})^{1 - x_{ij}}$
    • To create the mastery profile, each student is
    classified as either a master or non-master of each
    linkage level (LL) within an Essential Element
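The mastery-profile model is a latent class model: the likelihood of a response vector sums over latent classes, weighting each class's item-response probabilities by its proportion. A minimal numeric sketch, where the class proportions `nu` and item probabilities `pi` are made-up illustrative values, not DLM estimates:

```python
# Likelihood of a response vector under a latent class model:
# P(X_i = x_i) = sum_c nu_c * prod_j pi_jc^x_ij * (1 - pi_jc)^(1 - x_ij)
def lca_likelihood(x, nu, pi):
    """x: 0/1 item responses; nu[c]: class proportion;
    pi[c][j]: P(correct on item j | class c)."""
    total = 0.0
    for c, nu_c in enumerate(nu):
        prod = nu_c
        for j, x_j in enumerate(x):
            p = pi[c][j]
            prod *= p if x_j == 1 else (1 - p)
        total += prod
    return total

nu = [0.4, 0.6]                            # illustrative: non-master, master
pi = [[0.2, 0.2, 0.2], [0.9, 0.9, 0.9]]    # illustrative item probabilities
print(lca_likelihood([1, 1, 1], nu, pi))   # 0.4 * 0.2^3 + 0.6 * 0.9^3
```

Dividing each class's term by the total gives the posterior probability of class (mastery-status) membership used in scoring.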

  17. Defining Mastery
    For DLM assessments, there are three ways to be
    considered a master of a linkage level:
    1. The student’s probability of mastery from the
    diagnostic model is estimated to be ≥0.8, OR
    2. The student answered ≥80% of items correctly for the
    linkage level, OR
    3. If neither of the first two conditions occurs, mastery
    status is assigned two levels down from the linkage
    level assessed.
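The three rules combine into a single decision; a sketch, assuming linkage levels are indexed 0 (Initial Precursor) through 4 (Successor) and using -1 for "no linkage level mastered":

```python
# Sketch of the three DLM mastery rules for a single Essential Element.
# Returns the index of the highest linkage level assigned as mastered;
# -1 means no linkage level mastered.
def assigned_mastery_level(assessed_level: int,
                           posterior_prob: float,
                           percent_correct: float) -> int:
    if posterior_prob >= 0.8:        # rule 1: model-based mastery
        return assessed_level
    if percent_correct >= 0.8:       # rule 2: percent-correct mastery
        return assessed_level
    return max(assessed_level - 2, -1)  # rule 3: two levels down

# Assessed at Target (3): 97% posterior, 80% correct -> Target mastered
print(assigned_mastery_level(3, 0.97, 0.80))  # 3
# Assessed at Target (3): 63% posterior, 75% correct -> Distal Precursor
print(assigned_mastery_level(3, 0.63, 0.75))  # 1
```

The two usage lines mirror the worked examples on the later "Example EE Mastery" slides.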

  18. Linkage Level Mastery: Probability
    Using all student responses to items for a given linkage
    level within an Essential Element, the statistical model is
    applied to determine the probability that a student is a
    master of that linkage level:
    [Scale: 0 = Definitely Not Mastered (0% chance of mastery);
    100 = Definitely Mastered (100% chance of mastery)]

  19. Linkage Level Mastery: Probability
    The statistical model tells us the probability that the
    student is a master. For DLM assessments, the student
    must have an 80% or greater chance of mastery to be
    considered a master.
    [Scale from 0 (Definitely Not Mastered) to 100 (Definitely Mastered),
    with example students at a 27%, 53%, and 86% chance of mastery]

  20. Linkage Level Mastery: Percent Correct
    • If mastery is not demonstrated based on
    probability, mastery can alternately be achieved by
    percent correct.
    • Using all student responses to items for a given
    linkage level within an Essential Element, if the
    percent correct is ≥80%, the student is classified as
    a master.

  21. Linkage Level Mastery: Two-Down Rule
    • If mastery is not demonstrated by probability or
    percent correct, mastery status is assigned two
    linkage levels down from the linkage level assessed
    [Diagram: the linkage levels Initial Precursor, Distal Precursor,
    Proximal Precursor, Target, and Successor, with arrows showing mastery
    status assigned two levels down from the level assessed, bottoming out
    at No Mastery]

  22. 2017-2018 Mastery Assignment

  23. Aggregating Linkage Level Performance
    • Linkage level results must be combined to
    determine how the student performed on the
    Essential Element
    • When mastery is demonstrated for higher linkage
    levels, students are also deemed masters of lower
    linkage levels within an Essential Element.
    • The total mastered linkage levels are summed to
    determine overall performance in the subject
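The aggregation above can be sketched as: the highest mastered level for each EE implies mastery of all lower levels, and the per-EE counts are summed. A small helper, assuming levels are indexed 0 through 4 and -1 means no level mastered:

```python
# Sketch: total linkage levels mastered across Essential Elements, given
# the highest mastered level index per EE (-1 = none). Mastering level k
# implies mastery of levels 0..k, i.e., k + 1 levels in total.
def total_levels_mastered(highest_mastered: list[int]) -> int:
    return sum(k + 1 for k in highest_mastered if k >= 0)

# e.g., three EEs: Target (index 3), Distal Precursor (1), none (-1)
print(total_levels_mastered([3, 1, -1]))  # 4 + 2 + 0 = 6
```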

  24. Example EE Mastery
    • Student tests on the Target
    – Answers 80% of items correctly
    – Posterior probability of mastery is 97%
    • Master of Target
    • Master of all below
    [Diagram: the five linkage levels, with Initial Precursor through
    Target highlighted as mastered]

  25. Example EE Mastery
    • Student tests on the Target
    – Answers 75% of items correctly
    – Posterior probability of mastery is 63%
    • Master of Distal Precursor
    • Master of all below
    [Diagram: the five linkage levels, with Initial Precursor and Distal
    Precursor highlighted as mastered]

  26. DLM Aggregated Levels of Reporting

  27. Summary of Stages of Scoring
    Items Administered → Statistical Modeling of LL Mastery →
    Highest LL Mastered by EE → Total LLs Mastered →
    Performance Level Classification

  28. MAP VALIDATION RESEARCH

  29. It All Starts With the Map
    • Key assumption: the map is correct
    • Two levels to the assumption:
    1. Hierarchical ordering of linkage levels
    2. Map structure
    • How do we validate this assumption?
    – Procedural Evidence
    – Empirical Evidence

  30. Procedural Evidence
    M.3.NF.1-3: Differentiate a fractional part from a whole.
    [Mini-map nodes: Recognize “some”; Recognize wholeness and
    separateness; Divide shapes into distinct parts; Recognize parts of
    whole/unit and know unit fractions; Recognize fraction, whole, and
    one-half]

  31. Empirical Methods
    • Current efforts focused on Phase I: Linkage Level
    Ordering
    • Three methods
    – Patterns of Mastery Profiles
    – Patterns of Mastery Assignment
    – Patterns of Attribute Difficulty

  32. Map Structure in a DCM Context

  33. Attribute Hierarchies
    Science (3 attributes): [0,0,0], [1,0,0], [1,1,0], [1,1,1]
    ELA and Mathematics (5 attributes): [0,0,0,0,0], [1,0,0,0,0],
    [1,1,0,0,0], [1,1,1,0,0], [1,1,1,1,0], [1,1,1,1,1]
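Under a linear hierarchy like these, the permitted profiles are exactly the K + 1 "staircase" patterns, versus the 2^K profiles a saturated model allows; a small sketch generating both sets:

```python
from itertools import product

# Profiles permitted by a linear attribute hierarchy (K + 1 of them)
def linear_profiles(k: int) -> list[tuple[int, ...]]:
    return [tuple([1] * m + [0] * (k - m)) for m in range(k + 1)]

# All 2^K profiles allowed by a saturated model
def saturated_profiles(k: int) -> list[tuple[int, ...]]:
    return list(product([0, 1], repeat=k))

print(len(linear_profiles(3)), len(saturated_profiles(3)))  # 4 vs 8
print(len(linear_profiles(5)), len(saturated_profiles(5)))  # 6 vs 32
```

These counts match the slide: four science profiles (3 attributes) and six ELA/mathematics profiles (5 attributes).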

  34. Patterns of Mastery Profiles
    • Estimate two models
    – Saturated model: all possible
    profiles
    – Reduced model: only
    hypothesized profiles
    • Assess model fit
    – Posterior predictive model
    checks
    – Model comparisons
    Initial  Precursor  Target
    0        0          0
    1        0          0
    0        1          0
    0        0          1
    1        1          0
    1        0          1
    0        1          1
    1        1          1

  35. Posterior Raw Score Distribution

  36. Posterior Distribution

  37. Relative Fit
    • Leave-one-out cross
    validation (LOO)
    • Predictive density
    • Balances predictive power
    with model complexity
    Model       LOO ELPD    Standard Error
    Saturated   -88,717.4   174.9
    Reduced     -89,691.3   377.5
    Comparison    -973.9    259.7
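A common rule of thumb reads the ELPD difference relative to its standard error; with the comparison values reported on the slide, a sketch of that check:

```python
# ELPD difference (reduced minus saturated) and its standard error,
# taken from the comparison row of the table above.
elpd_diff, diff_se = -973.9, 259.7

# Difference in standard-error units; a magnitude well above ~2 suggests
# the saturated model predicts new data better than the reduced model.
z = elpd_diff / diff_se
print(round(z, 2))  # -3.75
```

Here the reduced (hierarchy-constrained) model fits worse by several standard errors, which is informative evidence about the hypothesized ordering.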

  38. Limitations
    • Both models must converge
    – Requires respondents in all possible classes
    – Requires respondents to test on multiple attributes
    • What to do when these requirements are not met?

  39. Patterns of Attribute Mastery
    • Estimate each attribute as a separate 1-attribute
    DCM (equivalent to LCA)
    • Set mastery threshold (0.8)
    Posterior probabilities of mastery:
    Student  Initial  Precursor  Target
    1        .97      .85        .43
    2        .86      .52        .13
    3        .92      .89        .83
    4        .88      .65        .85
    5        .55      .70        .33
    …        …        …          …

    Dichotomized at the 0.8 threshold:
    Student  Initial  Precursor  Target
    1        1        1          0
    2        1        0          0
    3        1        1          1
    4        1        0          1
    5        0        0          0
    …        …        …          …
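The dichotomize-and-flag step can be sketched as follows, using the 0.8 threshold from the slide and treating a profile as expected only when mastery is non-increasing across the ordered levels:

```python
# Dichotomize posterior probabilities at 0.8 and flag profiles that are
# not non-increasing across the ordered levels (i.e., reversals).
def dichotomize(probs, threshold=0.8):
    return [int(p >= threshold) for p in probs]

def has_reversal(profile):
    """True if a lower level is non-mastered while a higher level is mastered."""
    return any(profile[i] < profile[i + 1] for i in range(len(profile) - 1))

students = {1: [.97, .85, .43], 2: [.86, .52, .13],
            3: [.92, .89, .83], 4: [.88, .65, .85], 5: [.55, .70, .33]}
for sid, probs in students.items():
    profile = dichotomize(probs)
    print(sid, profile, "reversal" if has_reversal(profile) else "expected")
```

Student 4's profile [1, 0, 1] is the reversal case: Target mastered while Precursor is not.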

  40. Analyzing Reversals
    • 9.4% of students had an
    unexpected attribute mastery
    profile
    – 51% flagged for reversal
    between Initial and Precursor
    levels
    – 49% flagged for reversal
    between Precursor and Target

  41. Limitations
    • Doesn’t directly account for the relationships
    between attributes
    • Different mastery thresholds will give different
    results
    • Doesn’t account for error in the mastery
    classifications

  42. Patterns of Attribute Difficulty
    • Measure difficulty of
    linkage levels using p-
    values
    • Group similar students
    • Calculate the weighted
    average p-value for a
    linkage level (attribute)
    and group
    Item  p-value  SE    Weight    Scaled
    1     0.20     0.03  874.47    0.09
    2     0.18     0.03  979.15    0.10
    3     0.23     0.04  814.34    0.08
    4     0.21     0.03  852.96    0.09
    5     0.13     0.03  1,280.77  0.13
    6     0.23     0.04  796.96    0.08
    7     0.09     0.02  1,708.21  0.17
    8     0.25     0.04  749.40    0.08
    9     0.18     0.03  949.76    0.10
    10    0.20     0.03  874.47    0.09
    Avg.  0.18                     0.13
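The Scaled column appears to be each weight expressed as a share of the total weight, and a quick sketch reproduces it from the table's Weight values:

```python
# Reproduce the Scaled column: each weight as a share of the total weight.
weights = [874.47, 979.15, 814.34, 852.96, 1280.77,
           796.96, 1708.21, 749.40, 949.76, 874.47]
total = sum(weights)
scaled = [round(w / total, 2) for w in weights]
print(scaled)  # [0.09, 0.1, 0.08, 0.09, 0.13, 0.08, 0.17, 0.08, 0.1, 0.09]
```

Note the derivation of the weights themselves (and of the 0.13 weighted average) is not shown on the slide, so only the scaling step is reproduced here.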

  43. Difficulty Patterns
    • Most groups follow the
    expected pattern
    • Band 3 reversed, but
    within the margin of error

  44. Limitations
    • Not model based
    • Single p-value obscures a property of diagnostic
    models
    – p-value for masters
    – p-value for non-masters
    • Assumes some level of consistency within groups

  45. Ongoing Research
    • Field Test
    – Assigning tests from operational pool at linkage levels
    adjacent to those tested operationally
    • I-SMART
    – Designing multi-node science testlets
    – Overlapping assignment of tests along “mini-progressions”
    • AAI-AICFE
    – Fixed form, multi-node design

  46. SCORE REPORT RESEARCH

  47. Large-Scale Assessment Context
    • Summative assessment results for specific purposes
    – Inclusion in state accountability metrics
    – Program evaluation
    – Resource allocation
    • Less emphasis on use in classrooms to inform
    learning

  48. Challenges to Instructional Use of Large-Scale Assessment Results
    • Typically created for summative purposes
    • Results are useful for reporting aggregated results,
    but less so for instructional practice
    • Score reports are often delivered after the
    conclusion of the academic year
    • Students advance a grade and are taught the new
    grade’s academic content standards

  49. System Feedback Loop
    [Diagram: feedback loop connecting Instruction, Assessment, and
    Results]

  50. Consequential Evidence
    The Standards for Educational and Psychological
    Testing state:
    “The validation process involves gathering
    evidence to evaluate the soundness of proposed
    interpretations for their intended uses.”

  51. Research Questions
    1. How do teachers use diagnostic score reports to
    inform instructional decision-making?
    2. How do teachers talk to parents about diagnostic
    score reports?
    3. Are there additional resources teachers need to
    support their use of diagnostic score reports for
    instructional decision-making?

  52. Data Collection
    • Focus groups with 17 teachers from 3 states
    • Eligible teachers indicated they:
    – currently taught one or more students who took DLM
    assessments in 2017-2018,
    – received DLM 2017 summative score reports for their
    2017-2018 students, and
    – used the DLM 2017 reports during the 2017-2018
    academic year.

  53. Receiving Reports
    • All received reports in the fall
    – Ranged from email notification to district meeting with
    discussion
    • Shared a desire for more information when
    receiving reports, including direct access to
    interpretive materials and meetings to discuss how
    to interpret and use results

  54. Instructional Use
    • Observed differences by grade level
    • For elementary and middle school teachers, whose
    students take assessments annually, reports were
    more useful for instructional decision-making
    • High school teachers reported more challenges,
    particularly for 11th grade teachers whose students
    were last assessed in 8th grade

  55. Instructional Use: Planning
    • Use fine-grained mastery to plan instruction on
    similar standards
    – Varied in prioritizing depth versus breadth
    • Use performance level descriptors (PLDs) and the
    conceptual-area percent of skills mastered to plan
    instruction more generally for collections of related
    content standards
    – Combined with results from other assessments

  56. Instructional Use: IEP Goals
    “Their IEP goals are very similar to their linkage level
    [statement]. I can say, ‘Hey, let’s look at this linkage
    level, and let’s look at this target skill and this is what
    we’re working on in your IEP.’ It’s real easy for me to
    tie all these things together so we don’t have this weird
    zigzag of skills. [It’s] more streamlined and better
    growth.”

  57. Instructional Use: IEP Goals
    “We have a district assessment in the fall, they provide a
    report and summary. I try to see if there is still a
    deficiency based on the DLM [results from] the spring in
    the new report in the fall to see if that is an area that
    there’s still a weakness. If there is then that’s definitely
    something I would spend more time on. That’s more of
    how I create my goals.”

  58. Instructional Use: Groupings
    • Using mastery to plan instruction for students
    working on the same skills across standards
    • Desire for an aggregated report that made
    instructional groupings more clear, particularly
    around standards and levels students were working
    on in common

  59. Talking With Parents
    • With a few exceptions, parents generally did not ask
    questions about the DLM assessment or score reports
    • The extent of information parents received about the
    assessment and results was dependent upon what the
    teacher offered
    – Teachers did not receive a copy of the Parent Interpretive
    Guide to distribute with reports
    – Teachers highlighted importance of understanding
    assessment and results when talking to parents

  60. Resources: Parents
    • Conferences and IEP meetings often inundate
    parents with information
    • Making resources available online
    – Brief overview, such as a short video explaining system
    and calculation of results
    – Parent Interpretive Guide
    – Cheat sheets for tying academic content to day-to-day
    interactions (e.g., shopping)

  61. Resources: Teachers
    • More training:
    – e.g., 1) complete required training; 2) receive reports
    and discuss how to interpret; 3) planning instruction
    from report, including cross-grade collaborations
    • Aggregate reporting:
    – Summary information to make instructional groupings
    more readily apparent

  62. Resources: District
    • More training at district level on assessment and
    interpretation to facilitate professional
    development
    • District aggregated reports to identify standards or
    conceptual areas that tend to be more challenging
    – Use to identify resources to facilitate instruction in
    those areas

  63. Key Takeaways
    1. Challenge identifying eligible teachers
    – Both receiving reports and using them
    2. Utility of diagnostic results
    – Instructional use, IEP goal setting, & instructional groupings
    3. Teachers desire professional development to better
    understand score report contents and ways to use them
    4. Programmatic opportunities for making instructional
    and assessment resources more widely available to
    parents, teachers, and districts
    – Finding the right balance of local control and availability of
    resources

  64. Considerations for the Field
    1. Expanding collection of consequential evidence
    2. Availability of resources and interpretive guides
    and how to ensure the appropriate groups have
    access
    3. Making reports available at a level of reporting
    that supports instructional practice
    – Balanced with technically sound measurement that
    supports making inferences at this level

  65. THANK YOU!
    www.dynamiclearningmaps.org
