18  ROAR-Written Vocabulary Study Description

In 2023-24 and 2024-25, we conducted a study to validate the measurement approach for ROAR-Written Vocabulary and examine its technical qualities. The study employed a cross-sectional design. The initial dataset included 4,752 students in kindergarten through twelfth grade who took ROAR-Written Vocabulary for the first time. We excluded 39 students who opted out of the study and an additional 75 students who were missing a time stamp for completing the assessment. This resulted in an analytic sample of 4,638 students for the main (calibration) analysis, a retention rate of 97.6%.
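The sample-exclusion bookkeeping above can be sketched as follows; the counts come from the text, and the variable names are illustrative:

```python
# Sample-exclusion arithmetic for the analytic (calibration) sample.
initial_n = 4752        # first-time ROAR-Written Vocabulary takers, K-12
opted_out = 39          # students who opted out of the study
missing_timestamp = 75  # students missing a completion time stamp

analytic_n = initial_n - opted_out - missing_timestamp
retention_rate = round(100 * analytic_n / initial_n, 1)

print(analytic_n, retention_rate)  # 4638 97.6
```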

18.1 Study Components

The study comprised the following three parts:

  1. Main (Calibration) Analysis (the overall sample, N = 4,638): This full sample was used to establish measurement properties, including reliability estimates, item analysis, and construct validity via the Wright map. Student grade levels ranged from first through twelfth grade.

  2. External Validity Analysis (a subsample, n = 810): Of the 4,638 students, we obtained data from 810 students in grades 3-5 in California on the following additional measures: ELA scores from the state standardized test (SBAC-ELA), ROAR-Word, and ROAR-Sentence. This subset was used for the external validity analysis.

  3. Fairness Analysis (a subsample, n = 3,716): Of the 4,638 students, we obtained information on gender via district-administered records for 3,716 students. This subsample (50.7% male, 49.3% female) was used to investigate differential item functioning (DIF). Student grade levels ranged from first through twelfth grade.

This multi-component approach enables comprehensive evaluation of assessment quality while addressing practical constraints inherent in large-scale educational research. The nested sampling design maximizes the use of available data while maintaining appropriate sample sizes for specialized analyses that require specific criterion measures or demographic characteristics.

Participants were recruited through established partnerships with research-oriented districts and organizations (n = 25) that had previously collaborated on literacy assessment research and demonstrated a commitment to evidence-based educational practice. Most of these organizations were public school districts (n = 14); the remainder comprised seven private schools serving students with special needs and four community organizations offering after-school tutoring.

Geographically, seven of the 25 organizations were located in California, three in Georgia, and two each in Florida, Illinois, and Michigan. The remaining nine organizations were located in nine other states across the country (CO, DC, IL, IN, MN, MS, NY, SC, VA).

Table 18.1 and Table 18.2 below show the composition of the three samples by grade level and race/ethnicity.

Grade   Main Sample (n = 4,638)   External Validity (n = 810)   DIF Sample (n = 3,716)
1       0.2%                      NA                            0.2%
2       8.1%                      NA                            10.1%
3       10.7%                     38.9%                         12.6%
4       14.9%                     47.3%                         12.0%
5       12.2%                     13.8%                         4.8%
6       9.8%                      NA                            10.3%
7       9.8%                      NA                            11.0%
8       7.5%                      NA                            8.1%
9       9.8%                      NA                            11.0%
10      7.4%                      NA                            8.6%
11      5.6%                      NA                            6.5%
12      3.9%                      NA                            4.8%
Total   100.0%                    100.0%                        100.0%
Note: Only grades 3-5 were included in the External Validity sample because state standardized test scores were available only for those grades.
Table 18.1: ROAR-Written Vocabulary: Grade-Level Composition
Race/Ethnicity           Main Sample (n = 4,638)   External Validity (n = 810)   DIF Sample (n = 3,716)
Asian                    8.0%                      25.7%                         10.0%
Black/African American   2.2%                      0.9%                          2.7%
Hispanic/Latinx          35.2%                     9.6%                          43.9%
Other/Multiracial        4.7%                      17.8%                         5.8%
White                    13.2%                     16.2%                         16.0%
NA                       36.7%                     29.9%                         21.6%
Total                    100.0%                    100.0%                        100.0%
Table 18.2: ROAR-Written Vocabulary: Race/Ethnicity

18.2 Item Sample

Eighty-nine items were retained from the 101 administered items. Twelve items were dropped based on the initial calibration results; a list of these dropped items, along with the reasons for exclusion, can be found in Table 21.1 in the Appendix. The Flesch-Kincaid grade-level values of the remaining 89 items ranged from approximately 2 to 12, with a mean of 6.5. The middle 75% of the items fell between the 5th- and 8th-grade levels.

18.3 Study Procedure

Students took ROAR-Written Vocabulary online between January 17, 2024, and June 9, 2025. The assessment was administered through the ROAR platform using standardized procedures developed for the comprehensive assessment suite. Students completed it in their regular classrooms on individual computing devices, proctored by classroom teachers or other school personnel (e.g., reading specialists) trained in ROAR procedures.

Nearly all students (99%) completed the assessment within about 11 minutes. The average completion time was 5.75 minutes, with the middle 75% of students finishing in 5.1 to 6 minutes.

For this study, items were randomly ordered in each session. The number of items students took ranged from 5 to 94, with the middle 75% of students taking between 22 and 30 items. Future versions will implement computer-adaptive testing (CAT) to further improve assessment efficiency, which is expected to reduce average completion times while maintaining or enhancing measurement precision.
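A minimal sketch of the per-session random ordering described above; the item bank size matches the 89 retained items, but the item identifiers and function name are hypothetical:

```python
import random

# Hypothetical identifiers for the 89 retained items.
ITEM_BANK = [f"item_{i:03d}" for i in range(1, 90)]

def order_items_for_session(seed=None):
    """Return the item bank in a fresh random order for one session."""
    rng = random.Random(seed)
    items = list(ITEM_BANK)
    rng.shuffle(items)  # each session sees its own permutation
    return items

session_items = order_items_for_session(seed=42)
print(len(session_items))  # 89
```

Because the order is a permutation of the full bank, every session draws from the same items while neutralizing position effects in the calibration data.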

Students answered an average of 59.64% of items correctly, with individual performance ranging from 0% to 100%.