| Grade | Main Sample (n = 717) | External Validity (n = 296) | DIF Sample (n = 667) |
|---|---|---|---|
| 2 | 43.8% | NA | 42.0% |
| 3 | 19.9% | 45.6% | 19.3% |
| 4 | 17.4% | 20.9% | 18.4% |
| 5 | 18.8% | 33.4% | 20.2% |
| Total | 100.0% | 100.0% | 100.0% |
| Note: No second graders were included in the External Validity sample as they do not participate in the state standardized tests. |
12 ROAR-Morphology Study Description
In Spring 2024, we conducted a study to validate the measurement approach for ROAR-Morphology and examine its technical qualities. The study used a cross-sectional design with 717 students in grades 2-5 from four school districts in Northern California, and the assessment included 40 items.
12.1 Study Components
The study comprised the following three parts:
Main (Calibration) Analysis (the overall sample, N = 717): The full sample was used to establish measurement properties, including reliability estimates, item analysis, and construct validity through the Wright map.
External Validity Analysis (a subsample, n = 296): For 296 of the 717 students, we obtained data on the following additional measures: SBAC-ELA, SBAC-Math, ROAR-Word, and ROAR-Sentence. This subsample was used for the external validity analysis.
Fairness Analysis (a subsample, n = 667): For 667 of the 717 students, we obtained primary-language information from district-administered records. This subsample (English, 58.2%, vs. non-English, 41.8%) was used to investigate differential item functioning (DIF).
12.2 Student Sample
The study began with 745 initial participants across grades 2-5, ultimately resulting in the main calibration sample of 717 students, a retention rate of 96.24%. Parental opt-out, absence, and failure to complete the assessment were the main reasons for attrition. The low attrition rate supports the feasibility of administering ROAR-Morphology at scale across diverse student populations.
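The sample flow described above can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in this chapter; the variable names are illustrative and are not taken from the study's analysis code.

```python
# Counts reported in Sections 12.1 and 12.2 (variable names are illustrative).
initial_participants = 745
calibration_sample = 717       # main (calibration) sample
external_validity_sample = 296
dif_sample = 667

# Retention from initial enrollment to the calibration sample.
retention_rate = calibration_sample / initial_participants
attrition_count = initial_participants - calibration_sample

# Share of the calibration sample covered by each subsample.
external_share = external_validity_sample / calibration_sample
dif_share = dif_sample / calibration_sample

# Students in the calibration sample without a primary-language record.
missing_primary_language = calibration_sample - dif_sample

print(f"Retention rate: {retention_rate:.2%}")
print(f"Students lost to attrition: {attrition_count}")
print(f"External validity subsample share: {external_share:.1%}")
print(f"DIF subsample share: {dif_share:.1%}")
print(f"Missing primary language: {missing_primary_language}")
```

The 50 students without a primary-language record correspond to roughly 7.0% of the calibration sample, which matches the NA row for the main sample in the primary language table.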
12.2.1 Grade level composition
Table 12.1 shows the grade-level composition of the main sample and the two subsamples.
12.2.2 Demographic characteristics
Table 12.2, Table 12.3, and Table 12.4 below show the three samples by race/ethnicity, EL status, and primary language, respectively. These tables show notable cultural and linguistic diversity, with ~15% English Learners and ~40% identifying a language other than English as their primary language, reflecting the Northern California regional context. Table 12.5 shows that 2% of the sample qualified for 504 plans.
| Race/Ethnicity | Main Sample (n = 717) | External Validity (n = 296) | DIF Sample (n = 667) |
|---|---|---|---|
| American Indian/Alaska Native | 0.8% | 1.7% | 0.9% |
| Asian | 31.8% | 36.5% | 31.6% |
| Black/African American | 0.4% | 0.3% | 0.3% |
| Hispanic/Latino | 17.7% | 14.9% | 19.0% |
| Pacific Islander | 0.1% | 0.3% | 0.1% |
| Two or more races | 16.5% | 11.5% | 16.3% |
| White | 26.9% | 29.7% | 27.0% |
| NA | 5.7% | 5.1% | 4.6% |
| Total | 100.0% | 100.0% | 100.0% |
| EL-status | Main Sample (n = 717) | External Validity (n = 296) | DIF Sample (n = 667) |
|---|---|---|---|
| EL | 14.5% | 12.2% | 15.6% |
| EO | 54.1% | 53.7% | 58.2% |
| IFEP | 17.6% | 17.9% | 18.9% |
| RFEP | 6.8% | 11.8% | 7.3% |
| NA | 7.0% | 4.4% | NA |
| Total | 100.0% | 100.0% | 100.0% |
| Primary Language | Main Sample (n = 717) | External Validity (n = 296) | DIF Sample (n = 667) |
|---|---|---|---|
| English | 54.1% | 53.7% | 58.2% |
| non-English | 38.9% | 41.9% | 41.8% |
| NA | 7.0% | 4.4% | NA |
| Total | 100.0% | 100.0% | 100.0% |
| Special Education Status | Main Sample (n = 717) | External Validity (n = 296) | DIF Sample (n = 667) |
|---|---|---|---|
| non-sped | 91.1% | 93.6% | 97.9% |
| sped | 2.0% | 2.0% | 2.1% |
| NA | 7.0% | 4.4% | NA |
| Total | 100.0% | 100.0% | 100.0% |
12.3 Generalizability and Regional Context
Comparison with national public school enrollment data reveals distinct demographic differences in the main calibration sample. Specifically, Table 12.6 shows that Asian and multiracial students are overrepresented in our main calibration sample, which is concentrated in Northern California, while Hispanic/Latino, White, and Black/African American students appear in lower proportions than in national public school enrollment. The representation of American Indian/Alaska Native and Pacific Islander students is low, in line with national statistics.
| Race/Ethnicity | Main Sample (n = 676) | National* | Difference |
|---|---|---|---|
| Hispanic/Latino | 18.8% | 29.0% | -10.2% |
| White | 28.6% | 44.6% | -16.0% |
| Asian | 33.7% | 5.4% | +28.3% |
| Two or more races | 17.5% | 5.0% | +12.5% |
| Black/African American | 0.4% | 14.9% | -14.5% |
| American Indian/Alaska Native | 0.9% | 0.9% | 0.0% |
| Pacific Islander | 0.1% | 0.4% | -0.3% |
| Note. Main sample (n = 676) excludes 41 students missing race/ethnicity information. | | | |
| * Source: https://nces.ed.gov/programs/coe/indicator/cge/racial-ethnic-enrollment | | | |
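The Difference column in Table 12.6 is the sample percentage minus the national percentage, so its values are percentage points rather than relative changes. The following minimal sketch recomputes that column from the values printed in the table; the dictionary and variable names are illustrative only.

```python
# Percentages copied from Table 12.6 (names are illustrative).
sample_pct = {
    "Hispanic/Latino": 18.8,
    "White": 28.6,
    "Asian": 33.7,
    "Two or more races": 17.5,
    "Black/African American": 0.4,
    "American Indian/Alaska Native": 0.9,
    "Pacific Islander": 0.1,
}
national_pct = {
    "Hispanic/Latino": 29.0,
    "White": 44.6,
    "Asian": 5.4,
    "Two or more races": 5.0,
    "Black/African American": 14.9,
    "American Indian/Alaska Native": 0.9,
    "Pacific Islander": 0.4,
}

# The comparison sample drops students with missing race/ethnicity.
main_n = 717
missing_race = 41
analytic_n = main_n - missing_race

# Difference in percentage points (sample minus national), rounded to one decimal.
difference_pp = {
    group: round(sample_pct[group] - national_pct[group], 1)
    for group in sample_pct
}

print(f"Analytic n: {analytic_n}")
for group, diff in sorted(difference_pp.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{group:<32} {diff:+.1f} pp")
```

Sorting by the difference puts the most overrepresented group (Asian) first and the most underrepresented group (White) last.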
The demographic composition provides both strengths and considerations for generalizability. The sample’s substantial linguistic diversity, with ~15% English Learners compared to 10.6% nationally and ~40% speaking a language other than English at home, offers valuable insights into assessment performance across multilingual populations. This linguistic diversity is particularly relevant for morphological assessment, as language background and exposure may influence how students develop morphological knowledge (Nagy, Carlisle, and Goodwin 2013; Ramirez et al. 2010).
12.4 Item Sample
Forty items were retained from the 45 administered items. Five items were dropped from the study because they were inconsistent with the item design: one item had the base word as the correct answer, requiring no morphological shift, while the remaining four items each had more than one syntactically and semantically correct answer. The retained 40 items showed systematic variation across the key design features, as shown in Table 12.7 below.
| N = 40 | |
|---|---|
| Target Word Type | |
| derivational-common | 15 (38%) |
| derivational-less-common | 13 (33%) |
| inflectional | 12 (30%) |
| Derivational distractors | |
| 0 or 1 derivational distractors | 26 (65%) |
| 2 derivational distractors | 14 (35%) |
| Target Word Frequency | |
| Mean (SD) | 2.53 (0.86) |
| Min, Max | 0.85, 4.24 |
| Sentence Syntax | |
| not simple | 6 (15%) |
| simple | 34 (85%) |
| Note. ‘Common’ and ‘less-common’ indicate suffix frequency as determined by lists provided by Honig et al. (2000). Target word frequency is the log10-transformed frequency norm from the SUBTLEXus corpus (Brysbaert & New, 2009). Sentence syntax was coded by a researcher with an AI tool. | |
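The Target Word Type percentages in Table 12.7 sum to 101% because each is rounded independently. The sketch below reproduces them from the counts, assuming the table rounds halves up (an assumption; the source does not state the rounding rule). Python's built-in `round` uses round-half-to-even and would give 32% for 13 of 40, so the example uses `decimal` with `ROUND_HALF_UP` instead. The function name `pct_half_up` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Item counts by target word type, copied from Table 12.7.
counts = {
    "derivational-common": 15,
    "derivational-less-common": 13,
    "inflectional": 12,
}
total_items = sum(counts.values())


def pct_half_up(count: int, total: int) -> int:
    """Whole-number percentage, rounding halves up (assumed rounding rule)."""
    pct = Decimal(count) * 100 / Decimal(total)
    return int(pct.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


percentages = {name: pct_half_up(n, total_items) for name, n in counts.items()}

for name, pct in percentages.items():
    print(f"{name:<26} {counts[name]:>2} ({pct}%)")
print(f"Sum of rounded percentages: {sum(percentages.values())}%")
```

The exact shares are 37.5%, 32.5%, and 30.0%, so the extra percentage point comes entirely from rounding two half values upward.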
The forty items were designed in accordance with an initial construct map as shown in Table 12.8.
| Waypoint | Students successfully recognize and manipulate morphemes to transform a base word into: |
|---|---|
| 3 | derivational word with less common suffix |
| 2 | derivational word with common suffix |
| 1 | inflectional word with common suffix |
| 0 | Does not show morphological knowledge. |
12.5 Study Procedure
Students took the ROAR-Morphology assessment online in their classrooms between April 30 and June 5, 2024, proctored by teachers or other adults (e.g., reading specialists).
Initially, the assessment had no time limit. However, based on feedback from participating schools, 8- and 5-minute time limits were later implemented for some testing sessions. Despite these limits, the vast majority of students were still able to complete the full assessment. Students took an average of 6.8 minutes to complete the assessment, with the middle 33% of students finishing between 5.7 and 7.4 minutes. Completion times ranged from 3.9 minutes (5th percentile) to 10.6 minutes (95th percentile). For this study, items were randomly ordered in each session. Future versions will implement computer-adaptive testing (CAT) to further improve assessment efficiency, which is expected to reduce average completion times while maintaining or enhancing measurement precision. Students answered an average of 75% of items correctly, with individual performance ranging from 7% to 100% correct.