21 Reliability of ROAR-Palabra

21.1 Background: Published studies

Bhat et al. (2024) reported a large study (N = 1,337) examining the relationship between reading skills, math skills, and phonological awareness in Spanish-speaking students in Colombia. In this sample, the reported marginal reliability was 0.92 for ROAR-Palabra, 0.82 for ROAR-Frase, and 0.85 for ROAR-Fonema.

21.2 Criteria for identifying disengaged participants and flagging unreliable scores

To account for unreliable scores and disengaged participation (as discussed in Chapter 16 and shown in Chapter 12), participants with a median response time below 450 ms are flagged in ROAR-Score reports, and their data are excluded from analyses.
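The flagging rule can be illustrated with a minimal sketch. This is not the ROAR-Score implementation: the function names, the dictionary layout, and the use of raw per-item response times in milliseconds are assumptions made for illustration. Only the 450 ms median threshold comes from the text.

```python
from statistics import median

# Threshold stated in the text: median response time below 450 ms.
RT_THRESHOLD_MS = 450


def is_disengaged(response_times_ms, threshold_ms=RT_THRESHOLD_MS):
    """Return True when the median response time falls below the threshold.

    `response_times_ms` is a hypothetical list of per-item response times
    (in milliseconds) for one participant.
    """
    if not response_times_ms:
        raise ValueError("at least one response time is required")
    return median(response_times_ms) < threshold_ms


def split_by_engagement(sessions, threshold_ms=RT_THRESHOLD_MS):
    """Split participants into (kept, flagged) dictionaries.

    `sessions` is a hypothetical mapping of participant ID to that
    participant's list of response times in milliseconds.
    """
    kept, flagged = {}, {}
    for participant_id, rts in sessions.items():
        target = flagged if is_disengaged(rts, threshold_ms) else kept
        target[participant_id] = rts
    return kept, flagged
```

Under this sketch, a participant whose median is exactly 450 ms is kept, because the rule as stated is a strict inequality.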

21.3 Reliability of fixed-length ROAR-Palabra

ROAR-Palabra runs as a fixed-length test, and scores are computed based on a Rasch model. The current version of ROAR-Palabra takes about 5 minutes (70 items). In the near future, a computer-adaptive version that leverages the full item bank will become available. At that point, more items can be administered for a more precise measure, or fewer items can be administered as a quick screener. Table 21.1 reports marginal reliability computed under the IRT model from data on 6470 students for the standard, 70-item version of ROAR-Palabra. Reliability (\(\rho_{xx^\prime}\)) is computed from the estimated variance of \(\hat{\theta}\) relative to the squared estimated standard error (\(\widehat{SE}(\hat{\theta})^2\)) using Equation 21.1:

\[ \hat{\rho}_{xx^\prime} = \frac{\widehat{VAR}(\hat{\theta})}{\widehat{VAR}(\hat{\theta}) + \widehat{SE}(\hat{\theta})^2}. \tag{21.1}\]
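The reliability formula above can be sketched in a few lines of Python. This is an illustration, not the code used to produce the tables. It makes two assumptions that the text does not spell out: the \(\widehat{SE}(\hat{\theta})^2\) term is taken as the mean of the squared standard errors across students, and \(\widehat{VAR}(\hat{\theta})\) is taken as the sample variance (n - 1 denominator) of the ability estimates. The function name and argument names are hypothetical.

```python
from statistics import fmean, variance


def empirical_reliability(theta_hat, se_hat):
    """Reliability as VAR(theta_hat) / (VAR(theta_hat) + mean(SE^2)).

    theta_hat: ability estimates, one per student.
    se_hat: standard errors of those estimates, in the same order.

    Assumptions (not confirmed by the source): the error term is the mean
    squared standard error, and the variance is the sample variance.
    """
    if len(theta_hat) != len(se_hat):
        raise ValueError("theta_hat and se_hat must have the same length")
    if len(theta_hat) < 2:
        raise ValueError("at least two students are required")

    var_theta = variance(theta_hat)                      # sample variance of the estimates
    mean_error_var = fmean([se ** 2 for se in se_hat])   # average squared standard error
    return var_theta / (var_theta + mean_error_var)
```

As a small check of the arithmetic, ability estimates of -1, 0, and 1 have a sample variance of 1.0. With a standard error of 0.5 for each student, the mean squared standard error is 0.25, so the sketch returns 1.0 / 1.25 = 0.8.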

Grade	Empirical Reliability	N
All	0.94	5752
1	0.73	561
2	0.88	793
3	0.91	515
4	0.91	510
5	0.91	580
6	0.89	603
7	0.85	531
8	0.84	427
9	0.82	412
10	0.79	421
11	0.79	398

Table 21.1: Reliability of ROAR-Palabra by Grade

To evaluate whether ROAR-Palabra is fair and equitable across demographic groups, we also report reliability by gender (Table 21.2), eligibility for free or reduced-price lunch (Table 21.3), English learner status based on State of California designations (Table 21.4), primary language spoken (Table 21.5), special education status (Table 21.6), ethnicity (Table 21.7), and race (Table 21.8).

Gender	Empirical Reliability	N
All	0.94	4640
F	0.94	2343
M	0.94	2297

Table 21.2: Reliability of ROAR-Palabra by Gender

Free/Reduced Lunch Status	Empirical Reliability	N
All	0.85	541
Free	0.84	284
Paid	0.85	169
Reduced	0.88	88

Table 21.3: Reliability of ROAR-Palabra by FRL (California Sub-sample Only)

English Learner Status	Empirical Reliability	N
All	0.85	541
English Learner	0.85	319
English Only	0.85	148
Initial Fluent English Proficient	0.86	51
Reclassified Fluent English Proficient	NULL	22

Table 21.4: Reliability of ROAR-Palabra by EL Status (California Sub-sample Only)

Primary Language	Empirical Reliability	N
All	0.86	518
English	0.86	230
Spanish	0.85	288

Table 21.5: Reliability of ROAR-Palabra by Primary Language (California Sub-sample Only)

Special Education Status	Empirical Reliability	N
All	0.85	541
No	0.86	502
Yes	0.77	39

Table 21.6: Reliability of ROAR-Palabra by Special Education Status (California Sub-sample Only)

Hispanic Ethnicity	Empirical Reliability	N
All	0.85	523
No	0.79	44
Yes	0.85	479

Table 21.7: Reliability of ROAR-Palabra by Hispanic Ethnicity (California Sub-sample Only)

Race	Empirical Reliability	N
All	0.85	523
Asian	NULL	7
Black or African American	NULL	5
Hawaiian or Other Pacific Islander	NULL	2
Hispanic	0.85	479
White	0.81	30

Table 21.8: Reliability of ROAR-Palabra by Race (California Sub-sample Only)

References

Bhat, Kruttika G., Alexa Mogan, Ana Saavedra, Mia Fuentes-Jimenez, Julian M. Siebert, Wanjing Anya Ma, Carrie Townley-Flores, et al. 2024. “Shared and Unique Influences of Phonological Processing on Reading and Math.” OSF Preprints. https://doi.org/10.31219/osf.io/em3bg.