1  ROAR Vision and Mission

ROAR emerged out of more than a decade of research in the Brain Development & Education Lab on the neurobiological foundations of literacy. Our goal was to leverage the extensive literature on the cognitive neuroscience of reading development to develop a completely automated, lightly gamified online assessment platform that could replace the resource-intensive and time-consuming conventional approach of individually administering assessments scored on verbal responses. In other words, we endeavored to create a platform that could assess an entire school district in the time typically required to administer an assessment to a single student. We envisioned a new approach to research and development, rooted in the principles of open science, in which each ROAR measure would be grounded in the extensive interdisciplinary literature on reading development, validated according to the highest standards of rigor in each discipline, and published in open-access journals to support scientific transparency.

1.1 Open-source ideology in educational assessment

The last decade has seen a revolution in scientific transparency. The open-science movement began as a grassroots effort to make science more transparent, accessible and reproducible through the open sharing of code and data to accompany publications in open-access journals. The success of the open-science movement can be appreciated in new public mandates for data sharing from many of the major scientific funders in the United States and Europe, as well as in the proliferation of organizations like the Center for Open Science, and preprint servers like bioRxiv, that make it easier to document, share and reproduce scientific research. In fields like cognitive neuroscience, it is now standard practice for software and algorithms to be open-source, and many journals even require various open-science practices. In education, however, most widely used assessments are proprietary products, with many of the technical details hidden behind paywalls or made purposefully opaque to maintain a competitive edge in the market. There are, of course, counterexamples like DIBELS, which has always maintained open-access printed materials, and projects like the Item Response Warehouse, which provides open access to many educational datasets, signal a clear desire among educational researchers for a move toward open science.

We launched ROAR with the mission of bringing this open-source ideology to educational assessment. Our lab has a long track record of developing and supporting open-source software for the analysis and sharing of brain imaging data, and for modeling the interplay between brain development and learning. ROAR represents the next phase of this open-science mission: to build tools that meet the needs of educators to assess reading development while, simultaneously, opening the door to research at an unprecedented scale. Not every aspect of ROAR is completely open, but we consciously prioritize open science at every stage of development, including this technical manual, which is written as an open-source Quarto book.

1.2 Approach to validation

Each ROAR measure is rigorously validated both in an academic research setting (i.e., “in the lab”) and in a typical school setting (i.e., “in the classroom”). We take both of these approaches to validation to ensure that ROAR meets the highest standards of rigor across applications in research and practice. Lab validation studies recruit research participants through the typical recruitment avenues of the Brain Development & Education Lab and validate new ROAR measures against “gold standard” individually administered diagnostic assessments that are widely accepted by reading and dyslexia researchers. School validation studies are conducted through a Research-Practice Partnership model in collaboration with school districts to ensure that ROAR is valid for the desired use cases in the school. Since the question for a school is often “how does ROAR relate to our standard of practice?”, we report both a) validation of ROAR measures against the assessments currently used in standard practice in our collaborating schools and b) validation of ROAR measures against criterion measures administered to students in the district by the ROAR research team. Together, these two approaches to validation have allowed us to extensively examine the accuracy and precision of ROAR relative to a) the constructs it was designed to measure and b) other related measures that are widely used across the United States.
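To make the shape of such a validation analysis concrete, the sketch below estimates concurrent validity as the correlation between scores on a ROAR measure and a criterion measure for the same students. It is purely illustrative: the simulated data, the variable names, and the choice of a Pearson correlation with a bootstrap confidence interval are our assumptions for this example, not the specific analysis pipeline used in any ROAR validation study.

```python
# Hypothetical sketch of a concurrent-validity analysis: correlate scores
# on a ROAR measure with a "gold standard" criterion measure for the same
# students. All data and variable names here are simulated stand-ins.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n = 200
criterion = rng.normal(100, 15, n)                        # simulated criterion scores
roar = 100 + 0.8 * (criterion - 100) + rng.normal(0, 9, n)  # correlated ROAR scores

# Concurrent validity: Pearson correlation between the two measures.
r, p = stats.pearsonr(roar, criterion)

# Bootstrap a 95% confidence interval on r by resampling students.
boot = [stats.pearsonr(roar[idx], criterion[idx])[0]
        for idx in (rng.integers(0, n, n) for _ in range(2000))]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"r = {r:.2f} (95% CI [{lo:.2f}, {hi:.2f}]), p = {p:.1e}")
```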

1.3 The ROAR Assessment Suite

ROAR consists of a collection of measures, each designed to tap a critical aspect of reading. Each individual measure can be run independently and returns raw scores, standard scores, and percentiles relative to national norms. Measures are also grouped into suites that comprehensively evaluate different constructs in reading development and produce composite scores and risk indices.
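To make the relationship among these score types concrete, the sketch below shows one conventional way a raw score maps to a normed standard score and then to a percentile. The norm values, the function names, and the familiar mean-100, SD-15 standard-score scale are illustrative assumptions for this example, not ROAR’s actual norming parameters.

```python
# Illustrative relationship between raw scores, standard scores, and
# percentiles. The norm table and the mean-100/SD-15 scale are assumptions
# for this sketch, not ROAR's published norms.
from scipy import stats

# Hypothetical norms for one grade: mean and SD of raw scores.
NORMS = {"grade_2": {"mean": 42.0, "sd": 8.0}}

def standard_score(raw: float, grade: str) -> float:
    """Convert a raw score to a standard score (mean 100, SD 15)."""
    norm = NORMS[grade]
    z = (raw - norm["mean"]) / norm["sd"]
    return 100 + 15 * z

def percentile(raw: float, grade: str) -> float:
    """Percentile rank implied by the standard score, assuming normality."""
    z = (standard_score(raw, grade) - 100) / 15
    return 100 * stats.norm.cdf(z)

score = standard_score(50, "grade_2")  # raw 50 -> z = 1.0 -> standard score 115
pct = percentile(50, "grade_2")        # -> roughly the 84th percentile
print(f"standard score {score:.0f}, percentile {pct:.0f}")
```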

1.4 Silent and Automated ROAR: Theoretical and Practical Considerations

Historically, foundational reading skill assessment has comprised one-on-one evaluations that rely heavily on verbal responses and oral reading. ROAR, by contrast, is an automated assessment delivered in a group setting, providing efficient screening and eliminating barriers to assessment. Despite the name “ROAR,” all the measures in the ROAR Foundational Reading Skills Suite, and most ROAR assessments beyond the suite, are designed and validated as silent assessments with no verbal responses. This allows the assessment to be automated, scored in real time, and administered in a group setting. Two exceptions to this convention are ROAR-RAN (Section 9.2.1) and ROAR-Phonics, which are not intended to be administered in a group setting. This approach is grounded in both practice and theory.

Even though reading is conventionally assessed through reading out loud, most reading is done silently. Even in the case of single-word reading or decoding, what we really care about is whether the student can read the word accurately in their mind, not whether they can properly pronounce it. For example, students with speech impediments might struggle to correctly articulate certain phonemes (e.g., /s/ vs. /th/) even though they can accurately read the words. Differences in articulation pose challenges for scoring verbal measures. Moreover, students around the United States speak many different language varieties (Washington and Seidenberg 2021; Brown et al. 2015). The scoring manuals for widely used reading assessments often require students to pronounce words in a specific variety of spoken English (e.g., “General American English”), which creates concerning racial biases in the measures. Even in cases where scoring manuals allow for language variation, it can be a real challenge for a teacher to decide whether a different pronunciation of a word reflects a language variety they may be less familiar with or a reading error. Washington and Seidenberg (2021) highlight how these assessment issues continue to stigmatize students who speak African American English and other varieties of spoken English. We believe that carefully designed and rigorously validated silent reading assessments have the potential to overcome these historic biases in assessment. Each ROAR measure has been carefully validated against conventional assessments that are scored based on verbal responses, and this technical manual reports the compendium of validation studies against a variety of different outcome measures.

Occasionally we are asked: isn’t it important for teachers to listen to their students read aloud? Our vision is not to reduce the amount of time teachers spend listening to their students read; there is great value in a teacher gaining a deep, qualitative perspective on their students’ reading. Rather, ROAR is intended to lift the burden of assessment so that teachers can spend more time working individually with students on reading skills, providing the instruction and intervention students need, and less time administering and scoring assessments. With assessments completed silently in a group-administration format, teachers have more time to focus on teaching rather than assessing, and professional development can focus on instruction rather than compliance with assessment.

Another common question we receive is: why doesn’t ROAR use speech-recognition software for group-administered oral reading assessments? The promise of speech recognition is exciting and will have many implications for educational technology. At this time, however, cutting-edge speech-recognition technology remains highly biased against several language groups (Koenecke et al. 2020), and it is appropriate only for very specific use cases in foundational reading skills assessment across a linguistically diverse population (Wu et al. 2019; Hannah, Kim, and Jang 2022). These algorithms will improve quickly over time, but ROAR currently integrates this technology for only one specific use case: Rapid Automatized Naming (Section 9.2.1). Importantly, even though improved automated speech recognition could, in theory, lift the burden of scoring, there are major threats to validity for group-administered assessments that involve reading out loud. In a group setting, students would hear the verbal responses of their peers, causing a) distractions, b) confusion, and c) input that may influence responses (e.g., hearing another student’s answer). Thus, even as speech recognition algorithms improve, concerns about validity will remain whenever verbal assessments are administered in a group setting.

Overall, from a practical standpoint, silent and automated assessments allow for greater scale and efficiency and place much lower demands on resources. Careful design and validation can mitigate worries about bias across different varieties of spoken English. Further, automated assessment and scoring eliminates issues with inter-rater reliability, where different test administrators may score a student differently based on their varying levels of training, experience, and other factors. From a theoretical standpoint, by remaining a silent-reading assessment, ROAR captures the information that informs screening of dyslexia and foundational reading skills (Chapter 9) in a way that is accurate, reliable, and valid, while also having the benefit of group administration.

References

Brown, Megan C., Daragh E. Sibley, Julie A. Washington, Timothy T. Rogers, Jan R. Edwards, Maryellen C. MacDonald, and Mark S. Seidenberg. 2015. “Impact of Dialect Use on a Basic Component of Learning to Read.” Frontiers in Psychology 6 (March): 196.
Hannah, L., H. Kim, and E. E. Jang. 2022. “Investigating the Effects of Task Type and Linguistic Background on Accuracy in Automated Speech Recognition Systems: Implications for Use in Language Assessment of Young Learners.” Language Assessment Quarterly 19 (3): 289–313.
Koenecke, Allison, Andrew Nam, Emily Lake, Joe Nudell, Minnie Quartey, Zion Mengesha, Connor Toups, John R. Rickford, Dan Jurafsky, and Sharad Goel. 2020. “Racial Disparities in Automated Speech Recognition.” Proceedings of the National Academy of Sciences 117 (14): 7684–89.
Washington, J. A., and M. S. Seidenberg. 2021. “Teaching Reading to African American Children: When Home and School Language Differ.” American Educator.
Wu, Fei, Leibny Paola García-Perera, Daniel Povey, and Sanjeev Khudanpur. 2019. “Advances in Automatic Speech Recognition for Child Speech Using Factored Time Delay Neural Network.” In Interspeech 2019. ISCA.