21 Reliability of ROAR-Frase

ROAR-Frase is a timed Spanish reading measure in which the student reads sentences and decides whether each statement is true or false. The score is computed as the number of correct trials minus the number of incorrect trials within the allotted time window. Each participant completed two 90-second blocks, each of which randomly sampled items from a large item bank. We had two administrations: one in three different regions of Colombia and one in a region of California where a majority of the students speak Spanish. Colombian students were primarily monolingual Spanish speakers. Students in California were bilingual, but many entered school with Spanish as their primary language.
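The scoring rule can be illustrated with a minimal Python sketch. The function name `frase_score` and the representation of a block as a list of booleans (one per completed trial) are assumptions made for illustration; they are not taken from the ROAR codebase.

```python
def frase_score(responses):
    """Score one timed block as (number correct) - (number incorrect).

    `responses` is a hypothetical list of booleans, one per trial the
    student completed within the time window (True = correct response).
    """
    n_correct = sum(1 for r in responses if r)
    n_incorrect = len(responses) - n_correct
    return n_correct - n_incorrect


# Example: 4 correct and 2 incorrect trials give a score of 2.
block = [True, True, False, True, False, True]
print(frase_score(block))  # 2
```

Under this rule, a student who guesses at random on true/false items is expected to answer about as many items incorrectly as correctly, so the expected score from pure guessing is close to zero.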

21.1 Criteria for flagging unreliable scores

ROAR-Frase is designed to be fully automated, so that the student can complete the assessment without assistance from an educator or other adult. Instructions are delivered through headphones with an engaging storyline. Additionally, students complete practice trials with feedback to ensure the task instructions are clear. Sentences are presented on screen and read silently. Students respond using their keyboards. Items are designed so that no background knowledge is required to determine whether a sentence is true or false.

A potential concern with automated assessments is that, in the absence of a teacher to administer items individually, monitor responses, and score them, some students may disengage from the task, leading to data that do not accurately reflect their actual abilities. Because ROAR-Frase items are unambiguous and clear, it is possible to detect students who were not engaged during the assessment. Our approach to identifying and flagging disengaged participants whose scores are considered unreliable is illustrated in Figure 21.1, which plots median response time (RT) for each participant against proportion correct on the assessment, collapsed across both 90-second blocks. The distribution is clearly bimodal, indicating a group of participants who were performing at chance and responding very quickly. Participants with a median response time < 1,000 ms and proportion correct < 0.65 are flagged as having unreliable scores in the ROAR score report and are removed from the following analyses, because these scores are not believed to represent a participant’s true ability.

Figure 21.1: Criteria for identifying disengaged participants and flagging unreliable scores on ROAR-Frase. Participants displaying extremely rapid responses performed near chance on ROAR-Frase.
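The flagging rule described above can be sketched in Python as follows. The two thresholds (median RT below 1,000 ms and proportion correct below 0.65) come from the text; the function name `is_unreliable`, the trial representation as `(rt_ms, correct)` tuples, and the handling of an empty trial list are assumptions made for illustration only.

```python
from statistics import median

RT_THRESHOLD_MS = 1000      # median response time cutoff from the text
ACCURACY_THRESHOLD = 0.65   # proportion-correct cutoff from the text


def is_unreliable(trials):
    """Return True if a participant meets both flagging criteria.

    `trials` is a hypothetical list of (rt_ms, correct) tuples pooled
    across both 90-second blocks.
    """
    if not trials:
        # Assumption: a participant with no trials is not flagged by this rule.
        return False
    median_rt = median(rt for rt, _ in trials)
    prop_correct = sum(1 for _, correct in trials if correct) / len(trials)
    return median_rt < RT_THRESHOLD_MS and prop_correct < ACCURACY_THRESHOLD


fast_guesser = [(400, True), (350, False), (500, False), (420, True)]
engaged = [(2500, True), (3100, True), (2800, False), (2600, True)]
fast_but_accurate = [(800, True), (900, True), (700, True), (950, False)]

print(is_unreliable(fast_guesser))       # True
print(is_unreliable(engaged))            # False
print(is_unreliable(fast_but_accurate))  # False
```

Both conditions must hold for a score to be flagged, so a fast but accurate reader is not penalized for speed alone.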

21.2 Alternate form reliability - Colombia

Alternate form reliability for Frase is computed as the Pearson correlation, adjusted with the Spearman-Brown formula, between scores on the two 90-second blocks that were completed during the same testing session. Figure 21.2 plots student scores on the alternate test forms with all grades combined, and Figure 21.3 shows separate plots for each grade. Table 21.1 reports alternate form reliability for the full Colombian sample and separately by grade. Table 21.2 reports alternate form reliability for the full Colombian sample and separately by gender.
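As a sketch of this computation, the Python example below correlates block 1 and block 2 scores and then applies the Spearman-Brown adjustment. It assumes the standard Spearman-Brown prophecy formula with a length factor of 2, that is, r_SB = 2r / (1 + r); the text does not state the factor used. The function names and the example scores are hypothetical and are not drawn from the ROAR data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r, k=2):
    """Spearman-Brown adjustment for a test lengthened by factor k.

    k=2 (an assumption here) corresponds to combining two parallel blocks.
    """
    return k * r / (1 + (k - 1) * r)


# Made-up scores for five students on the two 90-second blocks.
block1 = [12, 18, 7, 25, 15]
block2 = [10, 20, 9, 22, 17]

r = pearson_r(block1, block2)
reliability = spearman_brown(r)
print(reliability > r)  # True: the adjustment raises any correlation between 0 and 1
```

For example, a raw between-block correlation of 0.6 would be adjusted to 2(0.6) / (1 + 0.6) = 0.75 under this formula.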

Figure 21.2: ROAR-Frase Colombia alternate form reliability across grades. Alternate form reliability is calculated as the Pearson correlation between scores on the two 90-second blocks that were completed in one sitting and adjusted by the Spearman-Brown formula.

Figure 21.3: ROAR-Frase Colombia alternate form reliability within grades. Alternate form reliability is calculated as the Pearson correlation between scores on the two 90-second blocks that were completed in one sitting and adjusted by the Spearman-Brown formula.

Table 21.1: Alternate form reliability for ROAR-Frase in Colombia split by student grade

Grade	Alternate Form Reliability	N
All	0.90	4512
2	0.69	413
3	0.74	490
4	0.79	475
5	0.80	561
6	0.79	569
7	0.77	496
8	0.82	417
9	0.82	408
10	0.82	346
11	0.84	337

Table 21.2: Alternate form reliability for ROAR-Frase in Colombia split by student gender

Gender	Alternate Form Reliability	N
All	0.90	4512
Female	0.90	1888
Male	0.91	1849

21.3 Alternate form reliability - United States

Here we show reliability between blocks 1 and 2 for all United States data (California). As with the Colombia data, alternate form reliability for Frase is computed as the Pearson correlation, adjusted with the Spearman-Brown formula, between scores on the two 90-second blocks that were completed during the same testing session. Figure 21.4 plots student scores on the alternate test forms with all grades combined, and Figure 21.5 shows separate plots for each grade.

Table 21.3 reports alternate form reliability for the California sample and separately by grade. Table 21.4 shows the breakdown of alternate form reliability by gender. Table 21.5 reports alternate form reliability for the full California sample separated by English Learner status, and Table 21.6 reports it separated by primary language. Table 21.7 shows the breakdown by special education status. Finally, Table 21.8 shows the breakdown of reliability by free and reduced lunch status.

Figure 21.4: ROAR-Frase U.S. alternate form reliability across grades. Alternate form reliability is calculated as the Pearson correlation between scores on the two 90-second blocks that were completed in one sitting and adjusted by the Spearman-Brown formula.

Figure 21.5: ROAR-Frase U.S. alternate form reliability within grades. Alternate form reliability is calculated as the Pearson correlation between scores on the two 90-second blocks that were completed in one sitting and adjusted by the Spearman-Brown formula.

Table 21.3: Alternate form reliability for ROAR-Frase in U.S. by student grade

Grade	Alternate Form Reliability	N
All	0.76	256
1	0.69	106
2	0.73	150

Table 21.4: Alternate form reliability for ROAR-Frase in U.S. by student gender

Gender	Alternate Form Reliability	N
All	0.76	256
Female	0.78	118
Male	0.73	134

Table 21.5: Alternate form reliability for ROAR-Frase in U.S. by English Learner Status

English Learner Status	Alternate Form Reliability	N
All	0.76	256
English Learner	0.71	144
English Only	0.79	73
Initial Fluent English Proficiency	0.74	22
Reclassified Fluent English Proficiency	0.75	13

Table 21.6: Alternate form reliability for ROAR-Frase in U.S. by Primary Language

Primary Language	Alternate Form Reliability	N
All	0.76	256
English	0.78	117
Spanish	0.73	125

Table 21.7: Alternate form reliability for ROAR-Frase in U.S. by Special Education Status

Special Education Status	Alternate Form Reliability	N
All	0.76	256
Yes	0.79	15
No	0.75	237

Table 21.8: Alternate form reliability for ROAR-Frase in U.S. by Free and Reduced Lunch Status

Free and Reduced Lunch	Alternate Form Reliability	N
All	0.76	256
Pays	0.81	82
Reduced	0.81	45
Free	0.65	125