12 ROAR-Morphology Study Description

In Spring 2024, we conducted a study to validate the measurement approach for ROAR-Morphology and examine its technical qualities. The study employed a cross-sectional design with a total of 717 students in grades 2-5 from four school districts in Northern California. Forty items were included in the analyses.

12.1 Study Components

The study consisted of the following three parts:

Main (Calibration) Analysis (the overall sample, N = 717): The full sample was used to establish measurement properties, including reliability estimates, item analysis, and construct validity through the Wright map.
External Validity Analysis (a subsample, n = 296): Of the 717 students, we obtained data from 296 students on the following additional measures: SBAC-ELA, SBAC-Math, ROAR-Word, and ROAR-Sentence. This subsample was used for the external validity analysis.
Fairness Analysis (a subsample, n = 667): Of the 717 students, we obtained information on primary language from district-administered records for 667 students. This subsample (English, 58.2%, vs. non-English, 41.8%) was used to investigate differential item functioning (DIF).

12.2 Student Sample

The study began with 745 initial participants across grades 2-5, of whom 717 formed the main calibration sample, a retention rate of 96.24%. Parental opt-out, absence, and failure to complete the assessment were the main reasons for attrition. The low attrition rate provides evidence for the feasibility of administering ROAR-Morphology at scale across diverse students.
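The retention figure follows directly from the two counts reported above. The short Python check below is only an illustration of that arithmetic; the variable names are ours and do not come from the study's analysis code.

```python
# Counts reported in the text above.
initial_participants = 745
calibration_sample = 717

# Retention rate = retained / initial; attrition is the remainder.
retention_rate = calibration_sample / initial_participants
attrition_count = initial_participants - calibration_sample

print(f"Retention rate: {retention_rate:.2%}")
print(f"Students lost to attrition: {attrition_count}")
```

Under these counts, 28 students were lost between enrollment and the calibration sample.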

12.2.1 Grade level composition

Table 12.1 shows the grade-level composition of the main sample and the two subsamples.

Grade	Main Sample (n = 717)	External Validity (n = 296)	DIF Sample (n = 667)
2	43.8%	NA	42.0%
3	19.9%	45.6%	19.3%
4	17.4%	20.9%	18.4%
5	18.8%	33.4%	20.2%
Total	100.0%	100.0%	100.0%
Note: No second graders were included in the External Validity sample as they do not participate in the state standardized tests.

Table 12.1: ROAR-Morphology: Grade-Level Compositions

12.2.2 Demographic characteristics

Table 12.2, Table 12.3, and Table 12.4 below show the three samples by race/ethnicity, EL status, and primary language, respectively. These tables show notable cultural and linguistic diversity, with ~15% English Learners and ~40% identifying a language other than English as their primary language, reflecting the Northern California regional context. Table 12.5 shows that 2% of the sample qualified for 504 plans.

Race/Ethnicity	Main Sample (n = 717)	External Validity (n = 296)	DIF Sample (n = 667)
American Indian/Alaska Native	0.8%	1.7%	0.9%
Asian	31.8%	36.5%	31.6%
Black/African American	0.4%	0.3%	0.3%
Hispanic/Latino	17.7%	14.9%	19.0%
Pacific Islander	0.1%	0.3%	0.1%
Two or more races	16.5%	11.5%	16.3%
White	26.9%	29.7%	27.0%
NA	5.7%	5.1%	4.6%
Total	100.0%	100.0%	100.0%

Table 12.2: ROAR-Morphology: Race & Ethnicity

EL-status	Main Sample (n = 717)	External Validity (n = 296)	DIF Sample (n = 667)
EL	14.5%	12.2%	15.6%
EO	54.1%	53.7%	58.2%
IFEP	17.6%	17.9%	18.9%
RFEP	6.8%	11.8%	7.3%
NA	7.0%	4.4%	NA
Total	100.0%	100.0%	100.0%
Note: EL = English Learner; EO = English Only; IFEP = Initial Fluent English Proficient; RFEP = Reclassified Fluent English Proficient.

Table 12.3: ROAR-Morphology: EL status

Primary Language	Main Sample (n = 717)	External Validity (n = 296)	DIF Sample (n = 667)
English	54.1%	53.7%	58.2%
non-English	38.9%	41.9%	41.8%
NA	7.0%	4.4%	NA
Total	100.0%	100.0%	100.0%

Table 12.4: ROAR-Morphology: Primary Language
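The DIF sample percentages in Table 12.4 can be understood as the main sample percentages re-expressed after dropping students with missing primary language. The Python sketch below illustrates that rebasing. It works from the rounded percentages in the table rather than the underlying counts (which are not reported here), so it is an approximate consistency check, not the study's actual computation.

```python
# Percentages of the main sample (N = 717) from Table 12.4.
main_n = 717
main_pct = {"English": 54.1, "non-English": 38.9}  # the remaining 7.0% is NA

# Share of the main sample with a known primary language.
known_pct = sum(main_pct.values())

# Re-express each group as a share of students with a known primary language.
dif_pct = {
    group: round(100 * pct / known_pct, 1) for group, pct in main_pct.items()
}

# Approximate DIF subsample size implied by the rounded percentages.
approx_dif_n = round(main_n * known_pct / 100)

print(dif_pct)
print(approx_dif_n)
```

With these inputs the sketch reproduces the 58.2% vs. 41.8% split and a subsample size of 667.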

Special Education Status	Main Sample (n = 717)	External Validity (n = 296)	DIF Sample (n = 667)
non-sped	91.1%	93.6%	97.9%
sped	2.0%	2.0%	2.1%
NA	7.0%	4.4%	NA
Total	100.0%	100.0%	100.0%

Table 12.5: ROAR-Morphology: Special Education Status

12.3 Generalizability and Regional Context

A comparison with national public school enrollment data reveals clear demographic differences in the main calibration sample. Specifically, Table 12.6 shows that Asian and multiracial students are overrepresented in our main calibration sample, which was drawn from Northern California, while Hispanic/Latino, White, and Black/African American students appear in lower proportions than in national public school enrollment statistics. The representation of American Indian/Alaska Native and Pacific Islander students is low, in line with national statistics.

Race/Ethnicity	Main Sample (n = 676)	National*	Difference
Hispanic/Latino	18.8%	29.0%	-10.2%
White	28.6%	44.6%	-16.0%
Asian	33.7%	5.4%	+28.3%
Two or more races	17.5%	5.0%	+12.5%
Black/African American	0.4%	14.9%	-14.5%
American Indian/Alaska Native	0.9%	0.9%	0.0%
Pacific Islander	0.1%	0.4%	-0.3%
Note: Main sample (n = 676) excludes 41 students missing race/ethnicity information.
* Source: https://nces.ed.gov/programs/coe/indicator/cge/racial-ethnic-enrollment

Table 12.6: ROAR-Morphology: Race/Ethnicity Compared to National Demographics
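The Difference column in Table 12.6 is the sample percentage minus the national percentage, in percentage points. The Python sketch below recomputes it from the values in the table; the data structure and names are illustrative only.

```python
# (group, main sample %, national %) as reported in Table 12.6.
rows = [
    ("Hispanic/Latino", 18.8, 29.0),
    ("White", 28.6, 44.6),
    ("Asian", 33.7, 5.4),
    ("Two or more races", 17.5, 5.0),
    ("Black/African American", 0.4, 14.9),
    ("American Indian/Alaska Native", 0.9, 0.9),
    ("Pacific Islander", 0.1, 0.4),
]

# The comparison base excludes students with missing race/ethnicity.
comparison_n = 717 - 41

# Difference in percentage points: positive means overrepresented in the sample.
differences = {
    group: round(sample_pct - national_pct, 1)
    for group, sample_pct, national_pct in rows
}

print(f"Comparison sample size: {comparison_n}")
for group, diff in differences.items():
    print(f"{group}: {diff:+.1f} percentage points")
```

Positive values indicate groups that make up a larger share of the sample than of national enrollment.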

The demographic composition provides both strengths and considerations for generalizability. The sample’s substantial linguistic diversity, with ~15% English Learners (compared to 10.6% nationally) and ~40% speaking a language other than English at home, offers valuable insights into assessment performance across multilingual populations. This linguistic diversity is particularly relevant for morphological assessment, as language background and exposure may influence how students develop morphological knowledge (Nagy, Carlisle, and Goodwin 2013; Ramirez et al. 2010).

12.4 Item Sample

Forty items were retained from the 45 administered items. Five items were dropped because they were inconsistent with the item design: one item had the base word as the correct answer, requiring no morphological shift, while the remaining four items had more than one syntactically and semantically correct answer. The retained 40 items showed systematic variation across the key design features, as shown in Table 12.7 below.

	N = 40
Target Word Type
derivational-common	15 (38%)
derivational-less-common	13 (33%)
inflectional	12 (30%)
Derivational distractors
0 or 1 derivational distractors	26 (65%)
2 derivational distractors	14 (35%)
Target Word Frequency
Mean (SD)	2.53 (0.86)
Min, Max	0.85, 4.24
Sentence Syntax
not simple	6 (15%)
simple	34 (85%)
Note. ‘Common’ and ‘less-common’ indicate suffix frequency as determined by lists provided by Honig et al. (2000). Target word frequency is the log10-transformed frequency norm from the SUBTLEXus corpus (Brysbaert & New, 2009). Sentence syntax was coded by a researcher with an AI tool.

Table 12.7: ROAR-Morphology items retained in analyses, by word/item features

The forty items were designed in accordance with an initial construct map as shown in Table 12.8.

Waypoint	Students successfully recognize and manipulate morphemes to transform a base word into:
3	derivational word with less common suffix
2	derivational word with common suffix
1	inflectional word with common suffix
0	Does not show morphological knowledge.

Table 12.8: ROAR-Morphology original construct map

12.5 Study Procedure

Students took the ROAR-Morphology assessment online in their classrooms between April 30 and June 5, 2024, proctored by teachers or other adults (e.g., reading specialists).

Initially, there was no time limit on the assessment. However, based on feedback from participating schools, 8- and 5-minute time limits were later implemented for some testing sessions. Despite these time constraints, the vast majority of students were still able to complete the full assessment. Students took an average of 6.8 minutes to complete the assessment, with the middle 33% of students finishing in 5.7 to 7.4 minutes. Completion times ranged from 3.9 minutes (5th percentile) to 10.6 minutes (95th percentile). Students answered an average of 75% of items correctly, with individual performance ranging from 7% to 100% correct.

For this study, items were randomly ordered in each session. Future versions will implement computer-adaptive testing (CAT) to further improve assessment efficiency, which is expected to reduce average completion times while maintaining or enhancing measurement precision.
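The per-session random item ordering mentioned above can be illustrated with a minimal Python sketch. This is not the ROAR implementation, which is not described in this chapter: the function name, the item identifiers, and the optional seed are all hypothetical and serve only to show one common way to shuffle a fixed item pool independently for each session.

```python
import random


def order_items_for_session(item_ids, seed=None):
    """Return a new, randomly ordered copy of the item pool for one session.

    The seed is optional and only there to make an ordering reproducible,
    for example when debugging; it is an assumption of this sketch.
    """
    rng = random.Random(seed)
    shuffled = list(item_ids)  # copy so the master item list stays unchanged
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical identifiers for the 40 retained items.
item_ids = [f"item_{i:02d}" for i in range(1, 41)]

session_order = order_items_for_session(item_ids, seed=2024)
print(session_order[:5])
```

Shuffling a copy for each session keeps every student on the same 40 items while varying their order, which spreads position effects (such as fatigue or time limits) across items.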