14  ROAR-Morphology External Validity Results

14.1 External Validity

For concurrent and discriminant validity, ROAR-Morphology ability estimates were correlated with ELA and Math scores from the statewide standardized assessments (SBAC, Smarter Balanced Assessment Consortium) as well as with two other ROAR measures: ROAR-Word and ROAR-Sentence. A subsample of 296 students with complete data was used for this set of analyses. Students took the SBAC assessments as well as the three ROAR measures around the same time period in the spring of 2024.

SBAC data were available only for students in grades 3-5, as these assessments are not administered in second grade; second graders were therefore excluded from this analysis. Additionally, grade 3-5 students who were missing any of the five outcome measures (i.e., ROAR-Morphology, ROAR-Sentence, ROAR-Word, SBAC-ELA, and SBAC-Math) were excluded.

We expected ROAR-Morphology to correlate moderately with SBAC-ELA, ROAR-Word, and ROAR-Sentence measures, all of which are designed to capture aspects of reading ability. For discriminant validity, a lower correlation was expected between ROAR-Morphology and SBAC-Math than with SBAC-ELA. However, some correlation with math scores was expected given that mathematical assessments rely on complex word problems and academic vocabulary that may require morphological processing skills (Schleppegrell 2007; O’Halloran 2015). Additionally, both assessments may share variance related to general academic language proficiency and test-taking skills.

14.1.1 Relationships with SBAC & Other ROAR Assessments

Figure 14.1 below shows the bivariate relationships between ROAR-Morphology, on the x-axis, and each of the other assessments, namely ROAR-Word, ROAR-Sentence, SBAC-ELA, and SBAC-Math scores, on the y-axis. As expected, ROAR-Morphology correlated moderately to strongly with SBAC-ELA (r = 0.63), ROAR-Word (r = 0.63), and ROAR-Sentence (r = 0.50). This finding provides evidence for concurrent validity, demonstrating that ROAR-Morphology measures skills that are meaningfully related to broader reading achievement as assessed by the state ELA standardized test and other reading-related measures.

As expected, the correlation with SBAC-Math scores was lower (r = 0.47), providing discriminant validity evidence. This pattern suggests that while morphological knowledge contributes to performance across academic domains, it is more specifically related to reading and language arts achievement than to mathematical reasoning. However, the moderate correlation with mathematics is consistent with research showing that mathematical assessments incorporate complex academic language and word problems that draw upon morphological knowledge (Schleppegrell 2007). This finding aligns with the broader understanding that morphological awareness supports academic language comprehension across content areas, not just traditional language arts contexts (Nagy, Carlisle, and Goodwin 2013).

Figure 14.1: Scatter plots between ROAR-Morphology logit scores and SBAC scale scores, ROAR-Word, and ROAR-Sentence (n = 306 students)

Figure 14.2 below shows a correlation matrix among the five assessments. The first column reveals that ROAR-Morphology correlated most strongly with SBAC-ELA and ROAR-Word, followed by ROAR-Sentence. The pattern of correlations supports the theoretical understanding that morphological knowledge operates as part of an integrated reading system while maintaining its unique contribution to comprehension.
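The correlation matrix itself is straightforward to compute from the five score columns; a minimal Python sketch with simulated stand-in scores (all values here are illustrative placeholders, not the study data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 296  # size of the complete-data subsample described above

# Simulated stand-ins for the five measures: a shared "reading ability"
# component plus measure-specific noise (purely illustrative).
latent = rng.normal(size=n)
scores = {
    "ROAR-Morphology": latent + rng.normal(scale=0.9, size=n),
    "ROAR-Word":       latent + rng.normal(scale=0.9, size=n),
    "ROAR-Sentence":   latent + rng.normal(scale=1.1, size=n),
    "SBAC-ELA":        latent + rng.normal(scale=0.9, size=n),
    "SBAC-Math":       latent + rng.normal(scale=1.4, size=n),
}

names = list(scores)
data = np.vstack([scores[k] for k in names])  # shape (5, n)
corr = np.corrcoef(data)                      # 5 x 5 Pearson correlation matrix

for name, row in zip(names, corr):
    print(f"{name:16s}", np.round(row, 2))
```

With real data, the five arrays would simply be replaced by the students' observed scores.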

Figure 14.2: Correlations among SBAC-ELA, SBAC-Math, and the three ROAR measures (Morphology, Word, and Sentence)

14.1.2 Predictive Power of ROAR-Morphology

To determine whether ROAR-Morphology has explanatory power beyond the other two ROAR measures in predicting SBAC-ELA scores, we conducted multiple linear regression analyses. Four models were run, all predicting SBAC-ELA scale scores with various combinations of ROAR measures, using student grade level as a control variable (with grade 3 as the reference category).

The results, shown in Table 14.1, demonstrate that all three ROAR measures have unique, statistically significant effects on SBAC-ELA scores after controlling for the other predictors in the models, as indicated by the positive coefficient estimates and associated p-values in the M3 column. The explanatory power of ROAR-Morphology in predicting SBAC-ELA scores increased by six percentage points when it was added to a model already containing the other two ROAR measures (adjusted R² increased from 0.46 in M2 to 0.52 in M3), and this increase was statistically significant (F(1, 290) = 37.03, p < 0.001). These results suggest that ROAR-Morphology captures unique aspects of reading ability that contribute to comprehension beyond what is measured by word-level and sentence-level assessments.
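The significance test for this R² increase can be reconstructed from the quantities in Table 14.1 via the standard nested-model F comparison; a minimal Python sketch (sample size and R² values taken from the table):

```python
# Nested-model F test for the R^2 increase when ROAR-Morphology is
# added to a model already containing the other predictors.
n = 296        # observations (Table 14.1)
k_full = 5     # predictors in M3: grade 4, grade 5, and three ROAR scores
q = 1          # predictors added going from M2 to M3
r2_full, r2_reduced = 0.531, 0.471  # R^2 for M3 and M2 (Table 14.1)

df_num = q
df_den = n - k_full - 1  # 290 residual degrees of freedom in the full model

f_change = ((r2_full - r2_reduced) / df_num) / ((1 - r2_full) / df_den)
print(f"F({df_num}, {df_den}) = {f_change:.2f}")
```

The result (about 37.1) matches the reported F statistic of 37.03 up to the rounding of the table's R² entries.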

                               M0                  M1                  M2                  M3
Predictors                     Est.      p         Est.      p         Est.      p         Est.      p
Intercept                      2516.47   <0.001    2514.14   <0.001    2516.11   <0.001    2520.64   <0.001
grade 4                         -10.11    0.443       2.82    0.790      -5.21    0.595      -9.12    0.326
grade 5                          19.96    0.080      18.83    0.039      17.96    0.033       6.85    0.399
ROAR-Word (z-score)                                  51.98   <0.001      31.55   <0.001      16.74    0.001
ROAR-Sentence (z-score)                                                  33.49   <0.001      28.20   <0.001
ROAR-Morphology (z-score)                                                                    28.54   <0.001
Observations                   296                 296                 296                 296
R² / R² adjusted               0.018 / 0.011       0.376 / 0.370       0.471 / 0.464       0.531 / 0.523
Table 14.1: Results from the Multiple Regression Models, Predicting SBAC-ELA Scores, with the Three ROAR Measures and Grade Level (ref = grade 3)

14.2 Fairness

Fairness is a critical aspect of test development, particularly for measures like ROAR-Morphology that may be sensitive to linguistic and cultural variation. English learners may develop morphological awareness through different pathways than native English speakers, potentially affecting their performance on specific item types or features. We conducted Differential Item Functioning (DIF) analyses comparing students whose primary language is English to those whose primary language is not English. Primary language was determined locally, typically via a home language survey, and was available through district administrative records.

Target Word    Difficulty Difference (logits)    DIF Category              Direction
freed           0.80                             C (moderate to large)     Harder for non-English
collection      0.52                             B (slight to moderate)    Harder for non-English
countless      -0.58                             B (slight to moderate)    Harder for English
editor         -0.60                             B (slight to moderate)    Harder for English

Note: n = 667 students (58.2% English primary language, 41.8% non-English primary language). Positive values indicate greater difficulty for non-English speakers; negative values indicate greater difficulty for English speakers.

Table 14.2: Differential Item Functioning Analysis Between English and Non-English Language Groups

14.2.1 Differential Item Functioning Analysis

Method and Sample. Using a sub-sample of students whose primary language was known (n = 667), we examined evidence of uniform differential item functioning between students whose primary language is English (58.2%) and those whose primary language is non-English (41.8%). The Extended Rasch Models (“eRm”) package (Mair et al. 2025) was used for this analysis.

Results. As can be seen in Table 14.2, four items showed slight (category B) to moderate (category C) DIF, according to the ETS DIF criteria (Zwick, Thayer, and Lewis 1999).

Items More Difficult for Non-English Speakers:

  • Freed (0.80 logits difference): Requires knowledge of past tense formation for verbs ending in ‘ee’.

  • Collection (0.52 logits difference): Involves derivational transformation using ‘-ion’ with orthographic-phonological complexity.

Items More Difficult for English Speakers:

  • Countless (-0.58 logits difference): Combines abstract meaning with semantic complexity.

  • Editor (-0.60 logits difference): Uses the agentive suffix “-or” rather than the more frequent “-er”.
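The ETS categories in Table 14.2 were originally defined on the Mantel-Haenszel delta scale; Rasch-based analyses commonly flag items by the absolute logit difference instead. A hypothetical sketch of that flagging rule, assuming logit thresholds of 0.43 and 0.64 (an assumption on our part, not stated in the text), reproduces the categories in the table:

```python
# Hypothetical recreation of the A/B/C DIF flags in Table 14.2,
# assuming the 0.43 / 0.64 logit thresholds often used as a Rasch
# analogue of the ETS Mantel-Haenszel DIF categories.
def ets_category(diff_logits: float) -> str:
    d = abs(diff_logits)
    if d < 0.43:
        return "A (negligible)"
    if d < 0.64:
        return "B (slight to moderate)"
    return "C (moderate to large)"

# Difficulty differences from Table 14.2 (positive = harder for the
# non-English primary-language group).
items = {"freed": 0.80, "collection": 0.52, "countless": -0.58, "editor": -0.60}
for word, diff in items.items():
    side = "non-English" if diff > 0 else "English"
    print(f"{word}: {ets_category(diff)}, harder for {side} group")
```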

14.2.1.1 Implications

The bidirectional DIF pattern suggests the assessment does not systematically disadvantage either language group, though individual items function differently. The relatively small number of DIF items (10% of total) supports overall fairness while highlighting specific patterns that inform score interpretation and future item development.

15 Discussion

15.1 Reliability & Validity Evidence

The ROAR-Morphology assessment demonstrates strong psychometric properties. With a person separation reliability of 0.85 and EAP reliability of 0.89, the measure reliably differentiates levels of morphological knowledge among students. These values exceed common thresholds for educational assessments, indicating consistent measurement of the intended construct.
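In the Rasch framework, person separation reliability is the proportion of observed variance in person ability estimates that reflects true differences rather than measurement error. A minimal illustrative sketch with simulated estimates (not the study data; the spread and standard errors are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated person ability estimates (logits) and their standard errors;
# in practice these come from the fitted Rasch model.
theta_hat = rng.normal(0.0, 1.2, size=717)
se = rng.uniform(0.35, 0.55, size=717)

obs_var = theta_hat.var(ddof=1)   # observed variance of person estimates
err_var = np.mean(se ** 2)        # mean error variance of those estimates
rel = (obs_var - err_var) / obs_var
print(round(rel, 2))              # estimated "true" share of the variance
```

Larger values indicate that the instrument spreads examinees out well relative to its measurement error.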

Validity is supported by multiple sources. The Wright map confirmed the hypothesized construct structure: items requiring derivational shifts with less common suffixes were more difficult than those with common suffixes, which in turn were harder than inflectional items. This ordering aligns with theoretical models of morphological development (Berko 1958; Carlisle 2000; Carlisle and Nomanbhoy 1993; Deacon and Kirby 2004). Additionally, item difficulty was systematically influenced by target word frequency and the number of derivational distractors, validating the construct map.

External validity evidence is also strong. ROAR-Morphology correlated more highly with SBAC-ELA and ROAR-Word (r = 0.63) than with ROAR-Sentence (r = 0.50) and SBAC-Math (r = 0.47), supporting both convergent and discriminant validity. Importantly, regression analyses showed that ROAR-Morphology uniquely explained an additional 6% of variance in SBAC-ELA scores, highlighting its distinct contribution to reading comprehension beyond word- and sentence-level skills. This finding supports theoretical models that position morphological knowledge as a critical and separable component of the reading system.

15.2 Significance of the Assessment

ROAR-Morphology fills a critical gap by offering a theoretically grounded, empirically validated measure of morphological knowledge suitable for classroom use. Unlike assessments that treat morphology as a unidimensional construct or require individual administration, ROAR-Morphology varies both word type (inflectional vs. derivational) and suffix frequency (common vs. less common) to capture developmental progression. The validation study supported this structure, and updated construct maps (Figure 13.5) provide clear descriptions of developmental levels.

The assessment’s design offers several advantages. First, its sentence-context format reflects how students encounter morphologically complex words during actual reading, unlike isolated word tasks such as morpheme segmentation, analogies, or production tasks (Carlisle 2000; Kirby et al. 2012). Second, systematic variation in item features allows educators to target specific aspects of morphological knowledge. Third, its computer-based format supports efficient, whole-class administration without requiring specialized training.

Preliminary findings suggest that ROAR-Morphology captures morphological knowledge that contributes uniquely to reading comprehension, in line with models that position morphology as a bridge between word- and text-level processes (Perfetti and Stafura 2013; Kim 2020; Levesque, Breadmore, and Deacon 2020). While further validation is needed, these results support its inclusion in broader comprehension assessment systems and instructional planning.

Moreover, the assessment’s strong correlation with reading comprehension, beyond word recognition and sentence reading skills, highlights morphology’s distinct role and reinforces its value in comprehensive literacy assessment and instruction.

ROAR-Morphology provides:

  • Whole-class, 10-minute administration

  • Sentence-context items reflecting real reading

  • Systematic variation in item features

  • Theory-based and empirically-validated developmental waypoints for meaningful interpretation

15.3 Limitations

Despite strong psychometric results, several limitations should be noted. The calibration sample (n = 717) was drawn entirely from Northern California, limiting generalizability. Compared to national demographics, the sample included more Asian students (33.7% vs. 5.4%) and English Learners (15% vs. 10.6%), and fewer White (28.6% vs. 44.6%) and Black students (0.4% vs. 14.9%). Broader validation is needed with nationally representative samples.

The current item pool, though sufficient for calibration, lacks coverage at the extreme ends of the ability distribution. Wright map analysis indicates that additional items of varying difficulty would improve precision and instructional utility across a wider range of learners.

While differential item functioning (DIF) by primary language was examined, additional fairness and measurement invariance analyses are needed. Four out of 40 items (10%) showed DIF between students whose primary language is English and students whose primary language is not English, pointing to the need for careful linguistic review in future development. Further research should examine interactions between item features and variables such as socioeconomic status, disability, dialect, and home language.

Finally, because the study was cross-sectional, it cannot speak to how morphological knowledge develops over time. Longitudinal research is needed to evaluate the assessment’s sensitivity to growth and its utility for progress monitoring.

15.3.1 Future Directions

Several initiatives are underway to address current limitations. National validation studies will improve generalizability and enable norm development. The item pool will be expanded to cover a wider range of difficulty while maintaining design principles. Comprehensive fairness analyses will assess performance across diverse subgroups.

Longitudinal studies will explore how students progress through developmental levels, assess the tool’s sensitivity to growth and interventions, and clarify the relationship between morphological knowledge and broader reading outcomes. Instructional applications will also be explored, including how best to interpret results, identify instructional targets, and monitor student progress.

15.3.2 Conclusion

ROAR-Morphology represents a significant advance in the assessment of morphological knowledge. Its strong psychometric properties, theoretical grounding, and practical design make it a valuable addition to comprehensive reading assessment systems. While continued research is needed, current findings support its use for educational assessment and research purposes. As validation continues, the assessment will evolve to better serve educators and students in understanding and supporting morphological development.

References