1  ROAR Vision and Mission

ROAR emerged out of more than a decade of research in the Brain Development & Education Lab on the neurobiological foundations of literacy. Our goal was to leverage the extensive literature on the cognitive neuroscience of reading development (e.g., Yeatman and White 2021; Yeatman 2022) to develop a completely automated, lightly gamified online assessment platform that could replace the resource-intensive and time-consuming conventional approach of individually administering assessments that are manually scored based on verbal responses. In other words, we endeavored to create a platform that could assess an entire school district in the time typically required to administer an assessment to a single student. We envisioned a new approach to research and development, grounded in the principles of open science, in which each ROAR measure would draw from the extensive interdisciplinary literature on reading development, be validated to the highest standards of rigor in each discipline, and be published in open-access journals to support scientific transparency.

1.1 Open-source ideology in educational assessment

The last decade has seen a revolution in scientific transparency. The open-science movement began as a grassroots effort to make science more transparent, accessible, and reproducible through the open sharing of code and data to accompany publications in open-access journals. The success of the open-science movement can be appreciated in new public mandates for data sharing by many of the major scientific funders in the United States and Europe, as well as the proliferation of organizations like the Center for Open Science and preprint servers like bioRxiv that make it easier to distribute and reproduce scientific research. In fields like cognitive neuroscience, it is now standard practice for software and algorithms to be open-source, and many journals even require various open-science practices. However, in education, most widely used assessments are grounded in proprietary products, with many of the technical details guarded by paywalls or made purposefully opaque to maintain a competitive edge in the market. There are, of course, counterexamples like DIBELS, which has always maintained open-access printed materials, and projects like the Item Response Warehouse, which provides open access to many educational datasets, reflect a clear desire among educational researchers for a move toward open science.

We launched ROAR with the mission to bring the open-source ideology to educational assessment. Our lab has a long track record of developing and supporting open-source software for analysis and sharing of brain imaging data, and for modeling the interplay between brain development and learning (Yeatman et al. 2018, 2012; Kruper et al. 2021; Keshavan, Yeatman, and Rokem 2019; Mezer et al. 2013; Yeatman, Wandell, and Mezer 2014; Richie-Halford et al. 2022; Richie-Halford, Yeatman, et al. 2021; Richie-Halford, Narayan, et al. 2021). ROAR represents the next phase of this open-science mission: to build tools that meet educators' needs for assessing reading development while simultaneously opening the door to research at an unprecedented scale. Not every aspect of ROAR is completely open, but we consciously prioritize open science at every stage of development, including this technical manual, which is written as an open-source Quarto book.

1.2 Mission Statement

ROAR bridges the lab, community, and classroom, aligning academic research to practical challenges in education. Our mission is to inspire a virtuous cycle between research and practice, supporting equity in education through the open dissemination of evidence-based tools that support students and teachers while advancing the frontiers of knowledge through inclusive research at an unprecedented scale. ROAR empowers educators, families, clinicians, and researchers with research-backed assessments to advance learning, accelerate research on learning differences, and foster equitable access to high-quality, data-driven decision-making for all.

1.3 Vision Statement

We envision a virtuous cycle between research and practice in education, built on deeper, systemic relationships that support the diversity of learners in the United States and, eventually, around the world.

1.4 Values

  • Integrity: We uphold rigorous standards of scientific integrity, transparency, and ethics in measurement, data handling, and reporting to support reliable and meaningful insights grounded in data.
  • Equity and Access: We believe that all learners deserve high-quality tools to develop strong academic skills regardless of background or resources. We focus our efforts on historically under-served communities.
  • Collaboration: We succeed by fostering shared growth and innovation among researchers, educators, families, clinicians, and students.
  • Innovation: We continuously innovate to develop tools that adapt to the evolving needs of educators and reflect the forefront of academic research.
  • Respect for Privacy and Security: We are committed to protecting privacy and uphold the highest security standards in both research and practice.
  • Openness and Transparency: Whenever possible, we seek to freely and openly disseminate knowledge to benefit the broader community.
  • Inclusivity: ROAR is for everyone. We are committed to creating an environment where everyone feels welcome. We strive to support every organization working to enhance educational outcomes for children, and we invite those who share our mission to join us.

2 ROAR’s Approach

2.1 Approach to validation

Each ROAR measure is rigorously validated both in an academic research setting (i.e., “in the lab”) and in a typical school setting (i.e., “in the classroom”). We take both approaches to validation to ensure that ROAR meets the highest standards of rigor across applications in research and practice. Lab validation studies recruit research participants through the typical recruitment avenues of the Brain Development & Education Lab and validate new ROAR measures against “gold standard” individually administered diagnostic assessments that are widely accepted by reading and dyslexia researchers. School validation studies are conducted through a Research Practice Partnership model in collaboration with school districts to ensure that ROAR is valid for the desired use cases in the school. Since the question for a school is often “how does ROAR relate to our standard of practice?”, we report both a) validation of ROAR measures against the assessments currently used in standard practice in our collaborating schools and b) validation of ROAR measures against reference measures administered by the ROAR research team to students in the district. Together, these two approaches have allowed us to extensively examine the accuracy and precision of ROAR relative to a) the constructs it was designed to measure and b) other related measures that are widely used across the United States.
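
As a concrete illustration of what a concurrent-validity check might look like, the minimal sketch below correlates ROAR scores with a reference assessment for students who completed both. The file name, column names, and function are hypothetical placeholders for illustration, not part of the ROAR codebase.

```python
# Minimal sketch of a concurrent-validity analysis (hypothetical data and column names).
import pandas as pd
from scipy import stats

def concurrent_validity(df: pd.DataFrame, roar_col: str, reference_col: str) -> dict:
    """Correlate ROAR scores with a reference ("gold standard") assessment."""
    # Keep only students with scores on both measures.
    paired = df[[roar_col, reference_col]].dropna()
    r, p = stats.pearsonr(paired[roar_col], paired[reference_col])
    return {"n": len(paired), "pearson_r": r, "p_value": p}

# Example usage with a hypothetical file of matched student scores.
scores = pd.read_csv("matched_scores.csv")  # e.g., columns: roar_swr, reference_word_id
print(concurrent_validity(scores, "roar_swr", "reference_word_id"))
```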

2.2 The ROAR Assessment Suite

ROAR consists of a collection of measures, each designed to tap into a critical aspect of reading. Each measure can be run independently and returns raw scores, standard scores, and percentiles relative to national norms. Measures are also grouped into measurement suites that comprehensively evaluate different constructs in reading development and produce composite scores and risk indices.
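
To make the shape of these outputs concrete, here is a minimal sketch of how a single measure's scores and a suite-level composite might be represented. The field names, the way the composite is formed, and the risk cut point are illustrative assumptions, not ROAR's actual data model or scoring rules.

```python
# Illustrative sketch of measure- and suite-level score records (hypothetical field names).
from dataclasses import dataclass
from statistics import mean

@dataclass
class MeasureScore:
    measure: str           # e.g., "swr" for a single-word recognition measure
    raw_score: int         # number of correct responses
    standard_score: float  # normed score relative to national norms
    percentile: float      # percentile relative to national norms

@dataclass
class SuiteResult:
    composite: float       # here, the mean of the standard scores in the suite
    risk_flag: bool        # here, composite below an assumed cut point

def summarize_suite(scores: list[MeasureScore], cut_point: float = 85.0) -> SuiteResult:
    composite = mean(s.standard_score for s in scores)
    return SuiteResult(composite=composite, risk_flag=composite < cut_point)

# Example usage with made-up scores for two measures.
suite = summarize_suite([
    MeasureScore("swr", raw_score=52, standard_score=92.0, percentile=30.0),
    MeasureScore("sre", raw_score=41, standard_score=88.0, percentile=21.0),
])
print(suite)
```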

2.3 Silent and Automated ROAR—Theoretical and Practical Considerations

Historically, foundational reading skill assessment has comprised one-on-one evaluations that rely heavily on verbal responses and oral reading. ROAR, on the other hand, is an automated assessment delivered in a group setting, providing efficient screening and eliminating barriers to assessment. Despite the name (“ROAR”), all the measures in the ROAR Foundational Reading Skills Suite, and most ROAR assessments beyond the Suite, are designed and validated as silent assessments with no verbal responses. This allows the assessment to be automated, scored in real time, and administered in a group setting. Two exceptions to this convention are ROAR-RAN (Section 9.2.1) and ROAR-Phonics, which are not intended to be administered in a group setting. This approach is grounded in both practice and theory.

Even though reading is conventionally assessed through reading out loud, most reading is done silently. Even in the case of single-word reading or decoding, what we really care about is whether the student can read the word accurately in their mind, not whether they can properly pronounce the word. For example, students with speech impediments might struggle to correctly articulate certain phonemes (e.g., /s/ vs. /th/) even though they can accurately read the words. Differences in articulation pose challenges for scoring verbal measures. Moreover, students around the United States speak many different language varieties (Washington and Seidenberg 2021; Brown et al. 2015). The scoring manuals for widely used reading assessments often require students to pronounce words with a specific variety of spoken English (e.g., “General American English”), which creates concerning racial biases in the measures. Even in cases where scoring manuals allow for language variation, it can be a real challenge for a teacher to decide whether a different pronunciation of a word represents a language variety they may be less familiar with or a reading error. Washington and Seidenberg (2021) highlight how these assessment issues continue to stigmatize students who speak African American English and other varieties of spoken English. We believe that carefully designed and rigorously validated silent reading assessments have the potential to overcome these historic biases in assessment. Each ROAR measure has been carefully validated against conventional assessments that are scored based on verbal responses, and this technical manual reports the compendium of validation studies against a variety of different outcome measures.

Occasionally we are asked, isn’t it important for teachers to listen to their students read aloud? Our vision is not to reduce the amount of time teachers spend listening to their students read: there is great value in a teacher gaining a deep and qualitative perspective on their students’ reading. Rather, ROAR intends to lift the burden of assessment so that teachers can spend more time working individually with students on reading skills, providing the instruction and intervention students need, and less time administering and scoring assessments. With assessments done silently in a group administration format, teachers are given more time to focus on teaching rather than assessing. Professional development can be focused on instruction rather than compliance with assessment.

Another common question we receive is: why doesn’t ROAR use speech-recognition software for group-administered oral reading assessments? The promise of speech recognition is exciting and will have many implications for educational technology. At this time, the cutting edge of speech-recognition technology is still highly biased against several language groups (Koenecke et al. 2020), and it is only appropriate for very specific use cases in foundational reading skills assessment across a linguistically diverse population (Wu et al. 2019; Hannah, Kim, and Jang 2022). These algorithms will improve quickly over time, but ROAR currently integrates this technology for only one specific use case: Rapid Automatized Naming (Section 9.2.1). Importantly, even though improved automated speech recognition algorithms could, in theory, lift the burden of scoring, there are major threats to validity for group-administered assessments that involve reading out loud. In a group setting, students would hear the verbal responses of their peers, which would cause a) distractions, b) confusion, and c) input that may influence responses (e.g., hearing another student’s answer). Thus, even as speech recognition algorithms improve, concerns about validity will always remain if verbal assessments are administered in a group setting.

Overall, from a practical standpoint, silent and automated assessments allow for greater scale and efficiency and place much lower demands on resources. Careful design and validation can mitigate worries about bias across different varieties of spoken English. Further, automated assessment and scoring eliminate issues with inter-rater reliability, where different test administrators may score a student differently based on their varying levels of training, experience, and other factors. From a theoretical standpoint, by remaining a silent-reading assessment, ROAR captures the information that informs the screening of dyslexia and foundational reading skills (Chapter 9) in a way that is accurate, reliable, and valid, while also having the benefit of being administered in a group setting.

References

Brown, Megan C, Daragh E Sibley, Julie A Washington, Timothy T Rogers, Jan R Edwards, Maryellen C MacDonald, and Mark S Seidenberg. 2015. “Impact of Dialect Use on a Basic Component of Learning to Read.” Frontiers in Psychology 6 (March): 196.
Hannah, L, H Kim, and E E Jang. 2022. “Investigating the Effects of Task Type and Linguistic Background on Accuracy in Automated Speech Recognition Systems: Implications for Use in Language Assessment of Young Learners.” Language Assessment Quarterly 19 (3): 289–313.
Keshavan, Anisha, Jason D Yeatman, and Ariel Rokem. 2019. “Combining Citizen Science and Deep Learning to Amplify Expertise in Neuroimaging.” Frontiers in Neuroinformatics 13 (May): 29.
Koenecke, Allison, Andrew Nam, Emily Lake, Joe Nudell, Minnie Quartey, Zion Mengesha, Connor Toups, John R Rickford, Dan Jurafsky, and Sharad Goel. 2020. “Racial Disparities in Automated Speech Recognition.” Proceedings of the National Academy of Sciences of the United States of America 117 (14): 7684–89.
Kruper, John, Jason D Yeatman, Adam Richie-Halford, David Bloom, Mareike Grotheer, Sendy Caffarra, Gregory Kiar, et al. 2021. “Evaluating the Reliability of Human Brain White Matter Tractometry.” Aperture Neuro 1 (1): 1–25.
Mezer, A, J D Yeatman, N Stikov, K N Kay, N J Cho, R F Dougherty, M L Perry, et al. 2013. “Quantifying the Local Tissue Volume and Composition in Individual Brains with MRI.” Nature Medicine 19 (12): 1667–72.
Richie-Halford, Adam, Matthew Cieslak, Lei Ai, Sendy Caffarra, Sydney Covitz, Alexandre R Franco, Iliana I Karipidis, et al. 2022. “An Analysis-Ready and Quality Controlled Resource for Pediatric Brain White-Matter Research.” Scientific Data 9 (1): 1–27.
Richie-Halford, Adam, Manjari Narayan, Noah Simon, Jason Yeatman, and Ariel Rokem. 2021. “Groupyr: Sparse Group Lasso in Python.” Journal of Open Source Software 6 (58): 3024.
Richie-Halford, Adam, Jason Yeatman, Noah Simon, and Ariel Rokem. 2021. “Multidimensional Analysis and Detection of Informative Features in Human Brain White Matter.” PLoS Computational Biology 17 (6): e1009136.
Washington, J A, and M S Seidenberg. 2021. “Teaching Reading to African American Children: When Home and School Language Differ.” American Educator.
Wu, Fei, Leibny Paola García-Perera, Daniel Povey, and Sanjeev Khudanpur. 2019. “Advances in Automatic Speech Recognition for Child Speech Using Factored Time Delay Neural Network.” In Interspeech 2019. ISCA.
Yeatman, Jason D. 2022. “The Neurobiology of Literacy.” The Science of Reading: A Handbook, 533–55.
Yeatman, Jason D, Robert F Dougherty, Nathaniel J Myall, Brian A Wandell, and Heidi M Feldman. 2012. “Tract Profiles of White Matter Properties: Automating Fiber-Tract Quantification.” PLoS One 7 (11): e49790.
Yeatman, Jason D, Adam Richie-Halford, Josh K Smith, Anisha Keshavan, and Ariel Rokem. 2018. “A Browser-Based Tool for Visualization and Analysis of Diffusion MRI Data.” Nature Communications 9 (1): 940.
Yeatman, Jason D, Brian A Wandell, and Aviv A Mezer. 2014. “Lifespan Maturation and Degeneration of Human Brain White Matter.” Nature Communications 5 (September): 4932.
Yeatman, Jason D, and Alex L White. 2021. “Reading: The Confluence of Vision and Language.” Annual Review of Vision Science 7 (1): 487–517.