4 Computer Adaptive Testing (CAT)
Computer Adaptive Testing (CAT) is a method of administering assessments that adapts to the participant's ability level. As a dynamic approach to assessment, CAT uses algorithms to select items based on the participant's previous answers, with the goal of delivering items that are best suited to the participant's ability level. For example, after a correct answer, the next item will be slightly more challenging; after an incorrect answer, the following item will be easier. This process continues throughout the test, allowing the CAT algorithm to pinpoint the participant's ability level with greater precision and efficiency than a traditional fixed test. CAT offers several advantages, including shorter testing times, reduced test anxiety (participants see fewer items that are far too easy or too difficult for their ability level), and enhanced test security, as each test is unique to the individual.

One requirement of a computer adaptive test is that responses can be scored in real time so that the next item can be selected. This is a strength of ROAR, as all items are scored immediately. Most ROAR assessments, including ROAR-Phoneme (see Chapter 7), ROAR-Letter (see Chapter 8), and ROAR-Word (see Chapter 5), are implemented as CATs. Timed assessments, including ROAR-Sentence (see Chapter 6) and ROAR-RAN (see Section 9.2.1), are not computer adaptive: for these "fluency" measures, response time is fundamental to the measurement, so it is critical that each participant sees the same items (or equated items). All computer adaptive ROAR measures use jsCAT, an open-source JavaScript CAT package developed by the ROAR team.
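The loop below is a minimal sketch of the administer-score-select cycle described above. It is not the jsCAT implementation: the item shape, the callback names (`answer`, `selectNext`, `estimateTheta`), and the fixed-length stopping rule are all illustrative assumptions.

```typescript
// Minimal sketch of a generic CAT loop (illustrative only; not the jsCAT API).
// The item pool layout, scoring callback, and stopping rule are assumptions.

interface Item {
  id: string;
  difficulty: number; // Rasch difficulty (b) on the theta scale
}

interface Response {
  item: Item;
  correct: boolean;
}

// Administer up to maxItems, re-estimating ability after every response
// and picking the next item based on the current ability estimate.
function runCat(
  pool: Item[],
  answer: (item: Item) => boolean,                       // scores the response in real time
  selectNext: (theta: number, remaining: Item[]) => Item, // item selection rule
  estimateTheta: (responses: Response[]) => number,       // ability estimator
  maxItems = 20,
): { theta: number; responses: Response[] } {
  let theta = 0;                     // start at the scale midpoint
  const remaining = [...pool];
  const responses: Response[] = [];

  while (responses.length < maxItems && remaining.length > 0) {
    const item = selectNext(theta, remaining);
    remaining.splice(remaining.indexOf(item), 1);

    const correct = answer(item);    // immediate scoring is what enables adaptation
    responses.push({ item, correct });

    theta = estimateTheta(responses); // harder items tend to follow correct answers,
                                      // easier items follow incorrect ones
  }
  return { theta, responses };
}
```

The `selectNext` slot is where the item selection rule described in Section 4.1 plugs in.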
The implementation of CAT within each measure gives ROAR three unique properties:
- Since each measure adapts to the participant's ability level, students across a very large age range can be compared on the same "vertical" measurement scale. A vertical scale in educational assessment is a single, continuous scale used to measure student achievement or ability across multiple grade levels or age groups. This type of scaling allows for the comparison of test scores over time, providing a coherent framework to track academic growth and development. For example, a 1st grader and an 8th grader can both take ROAR-Word, and the CAT algorithm will ensure that the 1st grader is presented with easier items than the 8th grader. The score returned by ROAR-Word places both students on the same measurement scale, so an individual's growth can be tracked across grades, over the course of an intervention, or compared to scores in other grades.
- Individual ROAR measures are highly efficient. Because each subtest is controlled by a CAT algorithm, it produces very reliable scores with fewer items than a traditional fixed test would require.
- Our CAT implementation allows ROAR to simultaneously operate as an efficient screener that returns risk metrics based on composite scores while also providing precise measures of specific, actionable sub-skills.
4.1 CAT parameters and item selection
Unless otherwise specified, ROAR assessments use a Rasch model, and items are selected to maximize Fisher information at the participant's current ability estimate (Linden 2000; Ma et al. 2023), as sketched below.
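The following sketch shows what maximum-information selection looks like under a Rasch model; it would plug into the `selectNext` slot of the loop sketched in the chapter introduction. The function names are illustrative and this is not the jsCAT source.

```typescript
// Sketch of maximum Fisher information item selection under a Rasch (1PL) model.
// Item information at ability theta is I(theta) = P(theta) * (1 - P(theta)),
// which peaks when the item difficulty b equals theta.

interface Item {
  id: string;
  difficulty: number; // same Item shape as in the CAT loop sketch above
}

function raschProbability(theta: number, difficulty: number): number {
  return 1 / (1 + Math.exp(-(theta - difficulty)));
}

function fisherInformation(theta: number, difficulty: number): number {
  const p = raschProbability(theta, difficulty);
  return p * (1 - p);
}

// Choose the remaining item that carries the most information
// at the current ability estimate.
function selectMaxInfoItem(theta: number, remaining: Item[]): Item {
  return remaining.reduce((best, item) =>
    fisherInformation(theta, item.difficulty) >
    fisherInformation(theta, best.difficulty)
      ? item
      : best,
  );
}
```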
4.2 IRT models and item validation
Unless otherwise specified, ROAR uses a Rasch model (one-parameter logistic) to put items on a vertical scale. Infit and outfit statistics are used to ensure that each item fits the measurement scale well; any items with infit or outfit statistics outside the range of 0.7–1.3 are removed (Wu and Adams 2013), as sketched after the list below. Ensuring that each item fits the measurement scale validates that the item taps into the same latent construct as the other items in the assessment. Finally, to ensure that the measurement scale is not biased against any demographic group, we take two approaches:
- We run validation studies to ensure that reliability and criterion validity are equivalent across race, ethnicity, socio-economic status, school district, and level of English proficiency. We take particular care to validate ROAR for English language learners (ELLs).
- We run studies of parameter invariance (see Ma et al. 2023) to ensure that the difficulty of each item is consistent across samples spanning different school districts with different demographics. Parameter invariance ensures that the assessments function equivalently across diverse groups of participants.
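The sketch below illustrates the infit/outfit screen described above, assuming a Rasch (1PL) model, dichotomous (0/1) responses, and previously estimated person abilities and item difficulties. The function names and data layout are illustrative, not ROAR's internal code.

```typescript
// Sketch of an infit/outfit item screen under a Rasch model with 0/1 scoring.
// For each response, E = P(theta, b) and W = P(1 - P); infit is the
// information-weighted mean square and outfit is the unweighted mean square
// of standardized residuals.

function pCorrect(theta: number, b: number): number {
  return 1 / (1 + Math.exp(-(theta - b)));
}

// Fit statistics for one item, given each respondent's ability estimate
// and that respondent's 0/1 response to the item.
function itemFit(
  thetas: number[],
  responses: number[], // 1 = correct, 0 = incorrect
  difficulty: number,
): { infit: number; outfit: number } {
  let sumSqResid = 0; // sum of (x - E)^2, numerator of infit
  let sumVar = 0;     // sum of model variances W = P(1 - P)
  let sumZSq = 0;     // sum of squared standardized residuals

  thetas.forEach((theta, n) => {
    const p = pCorrect(theta, difficulty);
    const w = p * (1 - p);
    const resid = responses[n] - p;
    sumSqResid += resid * resid;
    sumVar += w;
    sumZSq += (resid * resid) / w;
  });

  return {
    infit: sumSqResid / sumVar,     // information-weighted mean square
    outfit: sumZSq / thetas.length, // unweighted mean square
  };
}

// Keep an item only if both statistics fall within 0.7-1.3.
function itemFitsScale(fit: { infit: number; outfit: number }): boolean {
  const inRange = (v: number) => v >= 0.7 && v <= 1.3;
  return inRange(fit.infit) && inRange(fit.outfit);
}
```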