19  Reliability of ROAR-Letter

19.1 Reliability of calibration sample

A Rasch model was fit to the ROAR-Letter calibration sample (see Table 19.1 and Table 8.1). All ROAR-Letter items fit the model well (see Chapter 4 for fit criteria). Calibration data were obtained from 5187 students who took ROAR-Letter. Two versions of ROAR-Letter were administered: 1086 students took the full 88-item version (with unlimited time), and 3604 students took a shorter version consisting of 10 letter-name items and up to 36 letter-sound items, with testing time limited to 5 minutes. On average, the latter group completed 45 items (10 letter-name and 35 letter-sound items). Based on this IRT model, overall marginal reliability for ROAR-Letter was calculated to be 0.88 using 25 trials from each participant. Figure 19.2 shows the upper and lower bounds on reliability as a function of the number of items that a participant completes.
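
As a minimal sketch of the model family named above, the standard Rasch (one-parameter logistic) response function and its item information can be written as follows. The function names and the example ability and difficulty values are illustrative assumptions, not parameters from the ROAR-Letter calibration.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_information(theta: float, difficulty: float) -> float:
    """Fisher information of a Rasch item at ability theta: p * (1 - p)."""
    p = rasch_probability(theta, difficulty)
    return p * (1.0 - p)


# Hypothetical values: a participant at theta = 0.5 and an item of difficulty 0.0.
p = rasch_probability(0.5, 0.0)
info = item_information(0.5, 0.0)
```

Under this model an item is most informative when its difficulty equals the participant's ability, which is why the choice of items affects reliability.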

Figure 19.1: ROAR-Letter reliability (25 trials; item selection: Fisher information; marginal reliability = 0.86).

Figure 19.2: Letter-CAT simulation based on item-response data from 5187 students in kindergarten through 2nd grade. Items were sampled in three different ways, and marginal reliability was calculated as a function of the number of items that each participant completed. The simulation shows that the choice of items has a major impact on the reliability of the measure. For Optimal sampling (green), the N items with difficulty closest to the participant’s theta estimate were used. For Random sampling (orange), a random sample of N items was taken for each participant. For Worst sampling (purple), the N items with difficulty furthest from the participant’s theta estimate were used. This simulation highlights the large efficiency gain that would be possible with an optimized CAT.
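
The three sampling strategies in the caption can be sketched as below. This is an illustration under stated assumptions, not the simulation code behind the figure: `select_items`, its arguments, and the example difficulties are hypothetical.

```python
import random


def select_items(theta_hat, difficulties, n, strategy, rng=None):
    """Return the indices of n items chosen by the named sampling strategy.

    theta_hat    -- the participant's ability estimate
    difficulties -- list of item difficulties on the same scale as theta_hat
    strategy     -- "optimal", "random", or "worst"
    """
    indices = list(range(len(difficulties)))
    if strategy == "random":
        rng = rng or random.Random()
        return rng.sample(indices, n)
    # Rank items by how far their difficulty lies from the ability estimate.
    by_distance = sorted(indices, key=lambda i: abs(difficulties[i] - theta_hat))
    if strategy == "optimal":
        return by_distance[:n]   # closest difficulties
    if strategy == "worst":
        return by_distance[-n:]  # furthest difficulties
    raise ValueError(f"unknown strategy: {strategy!r}")


# Hypothetical item bank and ability estimate.
bank = [-2.0, -1.0, 0.0, 1.0, 2.5]
closest = select_items(0.2, bank, 2, "optimal")
furthest = select_items(0.2, bank, 2, "worst")
```

Reliability for each strategy would then be computed from the ability estimates and standard errors obtained using only the selected items.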

19.2 Reliability of computer-adaptive ROAR-Letter

A computer-adaptive version of ROAR-Letter is planned for release in fall 2024 and will be more efficient and reliable with fewer items. Using data from the calibration sample, we ran a CAT simulation as described in Ma et al. (2023) to determine the final item selection criteria that would maximize reliability in the fewest trials (Figure 19.2). After selecting 25 trials as the most efficient number, we simulated a 25-trial computer-adaptive test using participant responses.
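
A simulation of this kind replays each participant's recorded responses in the order an adaptive algorithm would have chosen them. The sketch below is a minimal illustration, not the procedure of Ma et al. (2023): it assumes a Rasch model, maximum-Fisher-information item selection, and an expected a posteriori (EAP) ability estimate with a standard normal prior on a fixed grid. All function names and example values are hypothetical.

```python
import math

# Ability grid for numerical integration (assumed range -4 to 4).
GRID = [-4.0 + 0.1 * k for k in range(81)]


def p_correct(theta, b):
    """Rasch probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def eap_estimate(responses, difficulties):
    """Return (posterior mean, posterior SD) of theta given scored responses.

    responses -- list of (item_index, score) pairs with score in {0, 1}
    """
    weights = []
    for theta in GRID:
        log_w = -0.5 * theta * theta  # log of unnormalised N(0, 1) prior
        for item, score in responses:
            p = p_correct(theta, difficulties[item])
            log_w += math.log(p if score == 1 else 1.0 - p)
        weights.append(math.exp(log_w))
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def replay_cat(recorded, difficulties, n_trials=25):
    """Replay recorded responses (item_index -> score) in adaptive order."""
    administered = []
    theta, se = 0.0, 1.0  # start at the prior mean and SD
    remaining = set(recorded)
    while remaining and len(administered) < n_trials:
        def info(i):
            p = p_correct(theta, difficulties[i])
            return p * (1.0 - p)
        # Pick the unused item with maximum information at the current estimate.
        best = max(sorted(remaining), key=info)
        remaining.remove(best)
        administered.append((best, recorded[best]))
        theta, se = eap_estimate(administered, difficulties)
    return theta, se, [item for item, _ in administered]


# Hypothetical five-item bank and one participant's recorded scores.
bank = [-2.0, -1.0, 0.0, 1.0, 2.0]
scores = {0: 1, 1: 1, 2: 1, 3: 0, 4: 0}
theta_hat, se_hat, order = replay_cat(scores, bank, n_trials=3)
```

Because only items the participant actually answered can be replayed, the simulated test is limited by each participant's recorded response set.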

Reliability (\(\rho_{xx^\prime}\)) is computed from the estimated variance of \(\hat{\theta}\) relative to the estimated error variance (\(\widehat{SE}(\hat{\theta})^2\)), using Equation 23.1.
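
Equation 23.1 is not reproduced in this chapter, so the sketch below assumes one common form of empirical reliability: the variance of the ability estimates divided by that variance plus the mean squared standard error. The manual's equation may differ (another common form subtracts the mean error variance from the variance of the estimates), and the function name and example values are hypothetical.

```python
from statistics import mean, variance


def empirical_reliability(theta_hats, standard_errors):
    """Assumed form: Var(theta_hat) / (Var(theta_hat) + mean(SE^2))."""
    var_theta = variance(theta_hats)  # sample variance of ability estimates
    mean_error_var = mean([se ** 2 for se in standard_errors])
    return var_theta / (var_theta + mean_error_var)


# Hypothetical estimates for three participants.
rho = empirical_reliability([-1.0, 0.0, 1.0], [0.5, 0.5, 0.5])
```

Under this form, reliability falls when ability estimates vary little within a group, even if the standard errors are unchanged.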

Table 19.2 reports marginal reliability by grade, based on a 25-item simulation of an optimal item selection algorithm using participant data. To ensure that ROAR-Letter is fair and equitable for different demographic groups, we also report reliability by gender (Table 19.3), eligibility for free and reduced-price lunch (Table 19.4), English learner status based on State of California designations (Table 19.5), primary language spoken (Table 19.6), special education status (Table 19.7), ethnicity (Table 19.8), and race (Table 19.9).

| Demographic | N | % | % Missing |
|---|---|---|---|
| Female | 1748 | 0.00 | 0.00 |
| Free or Reduced Lunch | 91 | 0.00 | 0.00 |
| Race/Ethnicity | | | |
| Hispanic Ethnicity | 996 | 19.22 | 43.64 |
| White | 1627 | 31.39 | 50.24 |
| Black or African American | 410 | 7.91 | 50.24 |
| Asian | 283 | 5.46 | 50.24 |
| American Indian or Alaska Native | 64 | 1.23 | 50.24 |
| Hawaiian or Other Pacific Islander | 42 | 0.81 | 50.24 |
| Multiracial | 44 | 0.85 | 50.24 |
| Total | 5183 | | |

Table 19.1: Demographics of ROAR-Letter calibration sample.

| Grade | Empirical Reliability | N |
|---|---|---|
| All | 0.86 | 5183 |
| Kindergarten | 0.89 | 2799 |
| 1 | 0.75 | 1296 |
| 2 | 0.61 | 1088 |

Table 19.2: Reliability of computer-adaptive ROAR-Letter by Grade (simulated 25-item CAT)

| Gender | Empirical Reliability | N |
|---|---|---|
| All | 0.86 | 3746 |
| Female | 0.83 | 1769 |
| Male | 0.83 | 1977 |

Table 19.3: Reliability of computer-adaptive ROAR-Letter by Gender (simulated 25-item CAT)

| Free/Reduced Lunch Status | Empirical Reliability | N |
|---|---|---|
| All | 0.86 | 983 |
| Free/Reduced | 0.81 | 432 |
| Paid | 0.63 | 551 |

Table 19.4: Reliability of computer-adaptive ROAR-Letter by FRL (simulated 25-item CAT)

| English Learner Status | Empirical Reliability | N |
|---|---|---|
| All | 0.86 | 1005 |
| English Learner | 0.82 | 313 |
| English Only | 0.68 | 622 |
| Initial Fluent English Proficient | 0.73 | 57 |
| Reclassified Fluent English Proficient | NA | 13 |

Table 19.5: Reliability of computer-adaptive ROAR-Letter by EL Status (simulated 25-item CAT)

| Primary Language | Empirical Reliability | N |
|---|---|---|
| All | 0.86 | 972 |
| English | 0.68 | 739 |
| Spanish | 0.83 | 233 |

Table 19.6: Reliability of computer-adaptive ROAR-Letter by Home Language (simulated 25-item CAT)

| Special Education Status | Empirical Reliability | N |
|---|---|---|
| All | 0.86 | 966 |
| No | 0.74 | 869 |
| Yes | 0.78 | 97 |

Table 19.7: Reliability of computer-adaptive ROAR-Letter by Special Education Status (simulated 25-item CAT)

| Hispanic Ethnicity | Empirical Reliability | N |
|---|---|---|
| All | 0.86 | 2938 |
| No | 0.78 | 1941 |
| Yes | 0.87 | 997 |

Table 19.8: Reliability of computer-adaptive ROAR-Letter by Hispanic Ethnicity (simulated 25-item CAT)

| Race | Empirical Reliability | N |
|---|---|---|
| All | 0.86 | 2612 |
| American Indian/Alaska Native | NA | 45 |
| Asian | 0.71 | 269 |
| Black/African American | 0.80 | 395 |
| Multiracial | NA | 44 |
| Native Hawaiian/Other Pacific Islander | NA | 31 |
| White | 0.85 | 1266 |

Table 19.9: Reliability of computer-adaptive ROAR-Letter by Race (simulated 25-item CAT; reliability not reported for N < 50)

References

Ma, Wanjing A., Adam Richie-Halford, Amy Burkhardt, Klint Kanopka, Clementine Chou, Benjamin Domingue, and Jason D. Yeatman. 2023. “ROAR-CAT: Rapid Online Assessment of Reading Ability with Computerized Adaptive Testing.”