3  ROAR Scores and Norms

ROAR has been developed through a collaborative, co-design process with schools around the United States. The development, validation, score-reporting, and norming samples reflect the contributions of hundreds of schools that have collaborated with the ROAR team through a Research Practice Partnership (RPP) model (Wentworth et al. 2021; Wentworth et al. 2023). The goal of this model is to make sure that stakeholders (teachers, students, parents, school administrators, etc.) representing the incredible diversity in the U.S. education system have a voice in guiding the research and development process of the tools used in their schools. Section 3.2 shows the distribution of ROAR partner schools around the United States.

3.1 ROAR Scores

The ROAR assessment score reports include different types of scores that each have intended use cases. The ROAR Next Steps Guide provides detailed descriptions of how to interpret scores and use them to guide instruction and/or intervention. The ROAR Family Guide provides an explanation of scores and a guide for support at home, designed for parents and guardians.

One important consideration for interpreting scores on any assessment is participant effort, concentration, and engagement. A score is only an accurate representation of the participant’s ability level if the participant is engaged and tries their hardest even as the assessment gets difficult. For assessments that are individually administered (e.g., by a teacher), the administrator might get a qualitative impression of the participant’s effort and focus. If the same person is administering the assessment and interpreting the scores (e.g., a classroom teacher or reading specialist), this qualitative impression can be helpful. However, it can also be a source of bias, and it is hard to standardize the criteria for judging engagement. For automated, online assessments like ROAR, participant disengagement might be a particular concern and needs to be considered when interpreting scores. Each ROAR measure has defined criteria, grounded in research, for identifying disengaged participants and flagging unreliable scores (for example, see Section 16.2 and Section 17.2). These criteria are implemented in an algorithm that takes into account a) the response time distribution and b) the pattern of responses on the assessment. Any score affected by disengagement, or by other issues that might affect its interpretation, is flagged in the ROAR Score Report. Figure 3.5 shows an example ROAR Score Table indicating which students need support in Phonological Awareness, Single Word Reading, and Sentence Reading Efficiency; disengaged participants with unreliable data are flagged with grayed-out scores (a dialog box provides additional information).
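As a purely illustrative sketch of how a rule combining response times and response patterns might look, the function below flags a session when responses are both very fast and close to chance accuracy. The function name, both thresholds, and the decision rule itself are hypothetical placeholders, not ROAR's criteria, which are defined separately for each measure in the sections cited above.

```python
from statistics import median


def flag_disengaged(response_times_ms, correct,
                    min_median_rt_ms=500.0, max_accuracy=0.6):
    """Return True when a session looks like rapid guessing.

    Hypothetical rule for illustration only: both thresholds are
    placeholders, not ROAR's per-measure criteria.
    """
    if not response_times_ms or len(response_times_ms) != len(correct):
        raise ValueError("need one response time per scored response")
    accuracy = sum(correct) / len(correct)
    too_fast = median(response_times_ms) < min_median_rt_ms
    near_chance = accuracy <= max_accuracy
    return too_fast and near_chance


# Fast, near-chance responding is flagged; slower, accurate responding is not.
print(flag_disengaged([200, 250, 180, 220], [True, False, False, True]))
print(flag_disengaged([900, 1100, 1000, 950], [True, True, True, False]))
```

Requiring both conditions is a deliberate choice in this sketch: a fast but accurate participant, or a slow but struggling one, is not flagged.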

Types of Scores

  • Raw Scores and Scaled Scores: A raw score is the basic measure of a student’s performance on the test. A scaled score puts the raw score on an interval scale to make it more interpretable. Each assessment reports a raw score and the scoring rules for that raw score are detailed in the introduction to that assessment (for example, see Section 5.3 for the scoring rules for ROAR-Word). Most of the assessments use item response theory (IRT; see Section 4.2) and computer adaptive testing (CAT; see Chapter 4) for scoring, though some timed assessments like ROAR-Sentence (see Section 6.4) use other types of scoring models. All assessments that use an IRT model for scoring report Scaled Scores. Scaled Scores are comparable across grades and over time and are useful for tracking growth in reading skills (e.g., response to intervention (RTI)).

  • Percentile Scores: The percentile refers to a student’s rank within their grade level on the given skill. The percentile is the number of students out of 100 who have lower scores. Percentile scores are computed by comparing raw scores to a norming table. A norming table captures the distribution of scores for each age bin in a lookup table providing the percentiles associated with each raw score. Percentile Scores are useful for identifying students who are struggling relative to their peers (or relative to national norms). Figure 3.6 (a) shows a screenshot from ROAR Score Reports displaying Percentile Scores. The norming table for percentile scores is computed using Generalized Additive Models for Location, Scale and Shape (GAMLSS) to model raw ROAR test scores and to predict corresponding age-based percentiles. Section 3.2 shows the participating ROAR schools, Section 3.3 presents school characteristics, Table 3.3 shows student characteristics, and Section 3.4 goes into depth on the modeling methodology.

  • Standard Scores: A standard score shows how a student’s performance compares to that of other students of the same age or grade. The standard score is comparable within a grade level, but not across grade levels or over time. Figure 3.6 (b) shows a screenshot from ROAR Score Reports displaying Standard Scores. Age-standardized scores for ROAR-Word put scores for each age bin on a standard scale (normal distribution, \(\mu=100\), \(\sigma=15\), see Figure 3.3). They are computed using Generalized Additive Models for Location, Scale and Shape (GAMLSS) to output age-based percentiles, which can be converted to z-scores (see Section 3.4 for more information).
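The relationship between the percentile and standard score definitions above can be sketched in a few lines. This is a minimal illustration, not ROAR's scoring code: ROAR derives percentiles from a GAMLSS-based norming table rather than from an empirical rank, and the function names, the toy reference sample, and the clamping bounds are assumptions made for the example.

```python
from statistics import NormalDist


def percentile_rank(score, reference_scores):
    """Percentage of reference scores strictly lower than `score`."""
    if not reference_scores:
        raise ValueError("reference_scores must not be empty")
    lower = sum(1 for s in reference_scores if s < score)
    return 100.0 * lower / len(reference_scores)


def standard_score(percentile, mean=100.0, sd=15.0):
    """Map a percentile onto a normal scale with the given mean and SD."""
    # Clamp away from 0 and 100, where the inverse normal CDF is undefined.
    # The bounds 0.1 and 99.9 are an assumption of this sketch.
    p = min(max(percentile, 0.1), 99.9) / 100.0
    z = NormalDist().inv_cdf(p)
    return mean + sd * z


# Toy reference sample (made up), not ROAR norming data.
reference = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pct = percentile_rank(6, reference)
print(pct, round(standard_score(pct), 1))
```

A student at the median of the reference sample lands at a standard score of about 100, and each standard deviation above or below the median shifts the score by 15 points.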

The ROAR norming table, which comprises raw scores, scaled scores (for IRT- and CAT-based assessments), age-based standard scores, and age-based percentiles, was compared to the linking of ROAR scores to criterion measures, such as Woodcock Johnson Basic Reading Skills (WJ BRS) Standard Scores and Percentiles in the case of ROAR-Word (Section 24.1.1) and CTOPP Standard Scores in the case of ROAR-Phoneme (Chapter 26). This linking allows ROAR scores to be interpreted with direct reference to the criterion measure that is often used to define dyslexia risk, and it serves to validate the GAMLSS-based norms (Figure 3.9).

  • Support Categories: For each measure, ROAR identifies students who are in need of extra support. Support categories can also be interpreted as indicating risk of reading difficulties such as dyslexia (for more information see Chapter 29 and Chapter 10). Dyslexia refers to the lower end of a continuum of reading skills, and there is no agreed-upon cutoff.

For students in Kindergarten through 5th grade, percentiles are used to determine ROAR Support Categories. The 20th percentile based on national norms is a common cut-point for identifying students who need additional support, and it is the cut-point implemented in ROAR Support Categories. Students below the 20th percentile are recommended for additional support. Students between the 20th and 40th percentiles are flagged as still developing the skill. Students above the 40th percentile are identified as being at grade level (achieved skill).

For students in 6th through 12th grade, support categories are determined relative to the “Decoding Threshold”, or specific cut-points that represent a meaningful bottleneck for reading development (see Section 33.2.3 and Chapter 34 for more information on the Decoding Threshold). The cut-points determined by the decoding threshold also correspond to 3rd grade and 5th grade median scores (50th percentile).

The “Decoding Threshold Hypothesis” posits that students below a certain proficiency level in Foundational Reading Skills will experience stagnant comprehension growth as they struggle with basic word recognition (Figure 34.1, Figure 34.2). The 3rd grade and 5th grade median scores were determined to match the transition zones of the Decoding Threshold. Students in 6th through 12th grade who score below the 3rd grade median are recommended for additional support. Students in these grades who score between the 3rd and 5th grade medians are flagged as still developing the skill. Students in 6th through 12th grade who score above the 5th grade median are identified as having achieved the skill.
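The two sets of cut-points share the same three-way structure, which the following sketch makes explicit. The function names and category labels are chosen for this example, and the handling of scores that fall exactly on a cut-point (assigned here to the middle category) is an assumption, since the text does not specify it. The grade medians are passed in as parameters because their values are not given in this chapter.

```python
def categorize(value, low_cut, high_cut):
    """Three-way split shared by both grade bands."""
    if low_cut >= high_cut:
        raise ValueError("low_cut must be below high_cut")
    if value < low_cut:
        return "needs extra support"
    if value <= high_cut:
        # Assumption: values exactly on a cut-point count as developing.
        return "developing skill"
    return "achieved skill"


def support_category_k5(percentile):
    """Kindergarten through 5th grade: 20th and 40th percentile cut-points."""
    return categorize(percentile, 20, 40)


def support_category_6_12(score, grade3_median, grade5_median):
    """Grades 6-12: cut-points at the 3rd and 5th grade median scores."""
    return categorize(score, grade3_median, grade5_median)


print(support_category_k5(15))
print(support_category_k5(35))
# The medians below are made-up numbers used only to exercise the function.
print(support_category_6_12(520, grade3_median=400, grade5_median=500))
```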

The support categories described above can be seen in Figure 3.4, which shows the distribution of support categories from a hypothetical school district in the ROAR Score Report, and in Figure 3.5, which shows a ROAR Score Table indicating which participants need support in which skills.

(a) ROAR Score Report displaying Percentile Scores on Risk Categories
(b) ROAR Score Report displaying Standard Scores on Risk Categories
Figure 3.6: ROAR Score Reports can toggle between different display formats, overlaying Percentile Scores or Standard Scores on Risk Categories.

3.2 Map of ROAR Partners

ROAR is being used by 698 schools and community-based organizations across 36 states around the United States. Figure 3.7 maps the ROAR partners in each state. This representative sample of the U.S. education system enables both a) research that is representative of the incredible diversity of educational experiences around the U.S. and b) norming samples for each ROAR assessment.

Figure 3.7: Map of collaborating ROAR partners in the United States.

3.3 Table of school characteristics

ROAR is designed to serve schools nationwide. Through our research-practice partnership model, we seek to work with a broad scope of schools and community organizations representing the diversity of students across the country. Table 3.1 describes the schools and student populations that are represented in our norming sample. To compute nationally representative norms, we weight the contribution of each school’s data to better represent the typical school in the United States. Table 3.2 shows the weighted student population that was used in our norming models (see Section 3.4).
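This chapter does not specify the weighting scheme. One common approach is post-stratification, in which each school receives a weight equal to the population share of its stratum divided by that stratum's share of the sample. The sketch below illustrates that idea only; the function name is hypothetical, the use of locale as the stratifying variable is an assumption, and the target shares are placeholders rather than national statistics.

```python
def poststratification_weights(sample_counts, target_shares):
    """Weight per stratum = target share / sample share.

    Strata with no sampled schools are skipped: there is nothing to
    up-weight, so they remain unrepresented after weighting.
    """
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample_counts must contain at least one school")
    weights = {}
    for stratum, count in sample_counts.items():
        if count == 0:
            continue
        weights[stratum] = target_shares[stratum] / (count / total)
    return weights


# Locale counts from Table 3.1; the target shares are placeholders chosen
# for easy arithmetic, not national statistics.
sample_counts = {"City": 57, "Rural": 9, "Suburb": 28, "Town": 0}
target_shares = {"City": 0.25, "Rural": 0.25, "Suburb": 0.25, "Town": 0.25}
weights = poststratification_weights(sample_counts, target_shares)
for locale, w in weights.items():
    print(f"{locale}: {w:.2f}")
```

Strata that are over-represented in the sample receive weights below 1 and under-represented strata receive weights above 1. A stratum with no sampled schools, such as Town in Table 3.1, cannot be recovered by reweighting alone.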

Characteristic                          Overall (N=94)
School Level
  Elementary                            41 (43.6%)
  Secondary                             32 (34.0%)
  Combined                              13 (13.8%)
  Missing                               8 (8.5%)
School Type
  Public                                82 (87.2%)
  Charter                               5 (5.3%)
  Private                               7 (7.4%)
Locale
  City                                  57 (60.6%)
  Rural                                 9 (9.6%)
  Suburb                                28 (29.8%)
  Town                                  0 (0%)
% Free or Reduced Meals
  Mean (SD)                             56.4 (85.6)
  Median [Min, Max]                     49.2 [0, 731]
  Missing                               19 (20.2%)
% Hispanic
  Mean (SD)                             40.3 (30.2)
  Median [Min, Max]                     38.3 [1.13, 98.2]
% American Indian/Alaskan Native
  Mean (SD)                             0.276 (0.507)
  Median [Min, Max]                     0.0245 [0, 3.12]
% Asian and Pacific Islander
  Mean (SD)                             12.8 (17.9)
  Median [Min, Max]                     2.63 [0, 86.6]
% Black
  Mean (SD)                             16.1 (23.5)
  Median [Min, Max]                     2.99 [0, 93.5]
% White
  Mean (SD)                             23.8 (25.0)
  Median [Min, Max]                     13.9 [0, 97.4]
% Multi-Racial
  Mean (SD)                             5.87 (5.97)
  Median [Min, Max]                     3.89 [0, 28.6]
Table 3.1: School Characteristics and Demographics

Characteristic                          Overall (N=94)
School Level
  Elementary                            41 (43.6%)
  Secondary                             32 (34.0%)
  Combined                              13 (13.8%)
  Missing                               8 (8.5%)
School Type
  Public                                82 (87.2%)
  Charter                               5 (5.3%)
  Private                               7 (7.4%)
Locale
  City                                  57 (60.6%)
  Rural                                 9 (9.6%)
  Suburb                                28 (29.8%)
  Town                                  0 (0%)
% Free or Reduced Meals
  Mean (SD)                             52.7 (87.0)
  Median [Min, Max]                     48.4 [0, 731]
  Missing                               19 (20.2%)
% Hispanic
  Mean (SD)                             37.3 (32.4)
  Median [Min, Max]                     34.0 [0.128, 98.2]
% American Indian/Alaskan Native
  Mean (SD)                             0.261 (0.510)
  Median [Min, Max]                     0.00367 [0, 3.12]
% Asian and Pacific Islander
  Mean (SD)                             8.57 (15.4)
  Median [Min, Max]                     1.60 [0, 86.6]
% Black
  Mean (SD)                             13.7 (21.1)
  Median [Min, Max]                     2.85 [0, 92.9]
% White
  Mean (SD)                             19.4 (25.2)
  Median [Min, Max]                     7.81 [0, 97.4]
% Multi-Racial
  Mean (SD)                             3.84 (4.47)
  Median [Min, Max]                     2.16 [0, 28.6]
Table 3.2: Weighted School Characteristics and Demographics

For several analyses throughout the ROAR Technical Manual, we take into account individual student demographics. Table 3.3 shows descriptive statistics for the group of students for whom we currently have individual demographics. This group represents about half of our overall sample. While many of our partnering schools share demographic data with us, some do not. Others share race/ethnicity but not other important demographics such as home language, English learner status, and indicators of socioeconomic status such as free and reduced meal eligibility. Depending on the analysis, we will use a subset of this data so that we can achieve a representative sample. Where applicable, this is clarified throughout the ROAR Technical Manual, and updated sample demographic tables are provided.

Table 3.4 shows descriptive statistics for the group of students for whom we currently have individual demographics that are specifically in the norming sample. This group is a small portion of our overall sample of students with demographics and an even smaller portion of our overall students who have taken ROAR. Depending on the norms analysis, we used a subset of the larger demographic data to achieve a representative sample.

Characteristic                          N          %        % Missing
Female                                  460352     44.82    6.83
Free or Reduced Lunch                   1817       0.18     99.53
English Language Learner                140634     13.69    67.86
Special Education                       196725     19.15    80.36
Race/Ethnicity
  Hispanic Ethnicity                    34350      3.34     87.30
  White                                 43790      4.26     91.26
  Black or African American             20650      2.01     91.26
  Asian                                 8910       0.87     91.26
  American Indian or Alaska Native      2182       0.21     91.26
  Hawaiian or Other Pacific Islander    352        0.03     91.26
  Multiracial                           13820      1.35     91.26
Total                                   1027193
Table 3.3: All Individual Student Demographics

Characteristic                          N          %        % Missing
Female                                  5102       43.70    12.08
Free or Reduced Lunch                   188        1.61     89.13
English Language Learner                209        1.79     88.40
Special Education                       80         0.69     88.55
Race/Ethnicity
  Hispanic Ethnicity                    2173       18.61    10.78
  White                                 3229       27.66    19.62
  Black or African American             1409       12.07    19.62
  Asian                                 1014       8.69     19.62
  American Indian or Alaska Native      72         0.62     19.62
  Hawaiian or Other Pacific Islander    47         0.40     19.62
  Multiracial                           3613       30.95    19.62
Total                                   11674
Table 3.4: Norms Sample Individual Student Demographics

3.4 Norming model for ROAR Foundational Reading Skills Suite

To generate age-referenced norms for ROAR, we model raw test scores with Generalized Additive Models for Location, Scale and Shape (GAMLSS), implemented in the gamlss R package (Rigby and Stasinopoulos 2005). GAMLSS extends traditional regression by estimating not only the mean (μ) of the score distribution but also parameters characterizing its shape, such as its scale (σ), skewness (ν), and kurtosis (τ), as smooth functions of age, thereby capturing the heteroscedasticity and non-normal shape typically seen in developmental data. GAMLSS is the current approach recommended by the World Health Organization to estimate non-linear growth trajectories (Borghi et al. 2006) and has been growing in popularity for applications spanning brain development (Bethlehem et al. 2022), cognitive development (Timmerman, Voncken, and Albers 2021), and clinical applications (Zhang et al. 2018). We selected the Box–Cox-t (BCT) family because it accommodates positive or negative skew and heavier tails while preserving scores’ natural lower bound at zero (Rigby and Stasinopoulos 2006). Penalized B-splines (P-splines) with automatic smoothing selection are used for μ and σ, with σ allowed to vary by age so that the spread of scores can widen or narrow across development. Once the model is fitted, centile curves (e.g., 5th, 25th, 50th, 75th, 95th) are obtained by inverting the fitted BCT distribution at each age point, and individual scores are transformed to age-adjusted z-scores or percentile ranks by locating them on these curves. The result is a continuous, smoothly varying normative reference across the entire age range assessed by ROAR. Figure 3.8 shows percentile curves from the norming model fit to ROAR-Word data.

Figure 3.8: Percentile curves derived from the norming model fit to ROAR-Word data. The 2D histogram displays the count of ROAR-Word scores as a function of age. Colored curves show percentiles fitted to the data based on a GAMLSS model with a Box–Cox-t (BCT) distribution.
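The step from a fitted model to percentiles and centile curves can be written out explicitly. The following is a sketch based on the standard definition of the BCT distribution in Rigby and Stasinopoulos (2006), ignoring the small truncation correction that the full definition includes. Here \(y\) is a raw score and \(a\) an age; the symbols \(F_{t_\tau}\) (the CDF of a t distribution with \(\tau\) degrees of freedom), \(t_{\tau,p}\) (its \(p\)-th quantile), and \(\Phi\) (the standard normal CDF) are notation introduced for this sketch. The parameters \(\nu\) and \(\tau\) are written without an age argument for brevity, although the model allows them to vary with age as well.

```latex
% Standardized value of raw score y > 0 at age a
z(y, a) =
\begin{cases}
\dfrac{1}{\sigma(a)\,\nu}\left[\left(\dfrac{y}{\mu(a)}\right)^{\nu} - 1\right], & \nu \neq 0,\\[2ex]
\dfrac{1}{\sigma(a)}\,\log\!\left(\dfrac{y}{\mu(a)}\right), & \nu = 0,
\end{cases}
\qquad z \sim t_{\tau} \ \text{(approximately)}

% Percentile rank of y, and the p-th centile curve as a function of age
p(y, a) \approx F_{t_{\tau}}\bigl(z(y, a)\bigr),
\qquad
y_p(a) \approx \mu(a)\,\bigl[1 + \sigma(a)\,\nu\,t_{\tau,p}\bigr]^{1/\nu}
\quad (\nu \neq 0)

% Normal-equivalent z-score and standard score (mean 100, SD 15)
z_{\text{norm}} = \Phi^{-1}(p), \qquad \text{standard score} = 100 + 15\,z_{\text{norm}}
```

Read this way, \(\mu(a)\) acts as the approximate median score at each age, \(\sigma(a)\) controls how quickly percentiles fan out around it, and \(\nu\) and \(\tau\) bend the curves to account for skew and heavy tails.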

We want to validate that:

  • our norming sample is nationally representative and similar to the samples used for other widely accepted standardized tests
  • the continuous norming model fit to ROAR-Word data, a GAMLSS with a Box–Cox-t distribution, performs similarly to conventional approaches

To do so, we compare the percentiles computed under the GAMLSS model to those derived by linking ROAR-Word scores to Woodcock Johnson percentiles. Figure 3.9 shows the similarity of the two norming approaches.

Figure 3.9: Percentiles computed based on ROAR norms are compared to percentiles computed based on linking ROAR-Word to Woodcock Johnson Basic Reading Skills percentiles.

The norms and additional scoring details are provided in the introduction to each measure within ROAR Foundational Reading Skills:

  • For ROAR-Word, see Chapter 5
  • For ROAR-Sentence, see Chapter 6
  • For ROAR-Letter, see Chapter 8
  • For ROAR-Phoneme, see Chapter 7

References

Bethlehem, R A I, J Seidlitz, S R White, J W Vogel, K M Anderson, C Adamson, S Adler, et al. 2022. “Brain Charts for the Human Lifespan.” Nature 604 (7906): 525–33.
Borghi, Elaine, Mercedes de Onis, Cutberto Garza, Jan Van den Broeck, Edward A Frongillo, Laurence Grummer-Strawn, S Van Buuren, et al. 2006. “Construction of the World Health Organization Child Growth Standards: Selection of Methods for Attained Growth Curves.” Statistics in Medicine 25 (2): 247–65.
Rigby, Robert A, and D Mikis Stasinopoulos. 2005. “Generalized Additive Models for Location, Scale and Shape.” Journal of the Royal Statistical Society Series C: Applied Statistics 54 (3): 507–54.
———. 2006. “Using the Box-Cox t Distribution in GAMLSS to Model Skewness and Kurtosis.” Statistical Modelling 6 (3): 209–29.
Timmerman, Marieke E, Lieke Voncken, and Casper J Albers. 2021. “A Tutorial on Regression-Based Norming of Psychological Tests with GAMLSS.” Psychological Methods 26 (3): 357–73.
Wentworth, Laura, Paula Arce-Trigatti, Carrie Conaway, and Samantha Shewchuk. 2023. Brokering in Education Research-Practice Partnerships: A Guide for Education Professionals and Researchers. Taylor & Francis.
Wentworth, L, R Khanna, M Nayfack, and D Schwartz. 2021. “Closing the Research-Practice Gap in Education.” Stanford Social Innovation Review 19 (2): 57–58.
Zhang, Jingzhou, Xiao Hu, Xinlun Tian, and Kai-Feng Xu. 2018. “Global Lung Function Initiative 2012 Reference Values for Spirometry in Asian Americans.” BMC Pulmonary Medicine 18 (1): 95.