24  ROAR-Inference Assessment Design

24.1 How the Assessment Operationalizes the Construct

ROAR-Inference operationalizes the construct of coherence evaluation through systematic item design using an ordered multiple-choice format (Briggs et al. 2006; Morell et al. 2025). The foundation is the three types of meaning-making that students must engage in when reading: logical relations (understanding causality and character motivation—why events happen and how they connect), informational relations (identifying and tracking informational event elements—who is involved, what objects or places matter, when events occur), and evaluative relations (understanding significance, themes, and character values—what deeper meaning lies beneath the surface).

Within each type, response options represent three distinct levels of explanatory coherence, from minimal coherence (responses that lack textual grounding) to partial coherence (responses that engage with the text but miss full integration) to full coherence (responses that integrate multiple pieces of evidence and are grounded in textual support). See the construct map (Wilson 2023) in Figure XX. This design allows educators to see not just whether students answer correctly, but how they approach the task of constructing meaning. This rationale is consistent with Carlson, Broek, and McMaster (2022), whose evidence “suggests that less skilled comprehenders may have overly relied on their background knowledge (beyond text information) during their recalls, which may have in turn caused their recall to be less connected with the text in general and less connected to the causal structures of the text” (p. 492).

Insert figure XX: Construct Map Here “Explanatory Coherence Construct Map”

24.2 Three Types of Meaning-Making: The Core Organization

24.2.1 Logical Relations

Logical relation items measure students’ ability to understand causal and motivational relationships—why events happen and how they connect. These items employ “why” and “how” question formats and address questions like: Why did an actor act this way? How did the character succeed? These items can be literal or inferential.

Dolphin Item Example - Logical Relation:

Passage: Dolphins like being around other dolphins and animals. When Tory, a dolphin photographer, goes into the ocean with a group of photographers, she gets great photos even when there is only one dolphin. She loves how much the dolphins engage with her group.

Question: Why is Tory able to get great photos of dolphins?

Response Options and Coherence Levels:

  • Target (Full Coherence): “Because dolphins are social animals” — This response integrates multiple pieces of information (dolphins like being around others, dolphins engage with the photographer group) with the logical structure of the explanation (social animals engage more, enabling better photos). The student has connected the evidence to explain the outcome.

  • Distractor 1 (Partial Coherence): “Because dolphins enjoy being around other dolphins” — This response shows engagement with textual information but lacks the full integration needed. The student has identified a detail from the passage but hasn’t connected it to the specific question about Tory’s photo success, especially given that she gets good photos even with one dolphin.

  • Distractor 2 (Minimal Coherence): “Because Tory enjoys being around dolphins” — This response may seem plausible from the question alone but fails to integrate actual narrative information or connect to the logical structure. The passage doesn’t mention Tory’s enjoyment being relevant to photo quality.

This item measures whether students can integrate multiple textual clues to construct a causally coherent explanation. The distractor patterns reveal whether students are accessing textual information (Distractor 1) or relying on background knowledge without textual grounding (Distractor 2).

24.2.2 Informational Relations

Informational relation items measure students’ ability to identify and integrate informational event elements—who is involved, what objects or places matter, when events occur. These items employ “who,” “what,” “where,” and “when” question formats. They may address referential relationships (connecting pronouns to antecedents) or spatiotemporal relationships (tracking actors, objects, and locations across the text and time). These can also be literal or inferential.

Beetle Item Example - Informational Relation:

Passage: Peri beetles can change into different colors, such as brown and yellow. Peri beetles turn green to protect themselves from threats. Peri beetles live on leaves. When they see other Peri beetles, they turn green. 

Question: What happens when a Peri beetle sees another Peri beetle?

Response Options and Coherence Levels:

  • Target (Full Coherence): “They turn green” — This response demonstrates direct text-based retrieval and correct understanding of which detail answers the specific question.

  • Distractor 1 (Partial Coherence): “They turn brown” — This response shows engagement with textual information about beetle color changes but answers the wrong question. The student has identified a relevant detail from the passage but hasn’t accurately matched it to the specific scenario asked about.

  • Distractor 2 (Minimal Coherence): “They turn orange” — This response introduces information not present in the text, suggesting the student is relying on general knowledge or guessing rather than using textual evidence.

In informational items, the coherence progression focuses on accurate retrieval and integration of event elements rather than causal explanation. Students selecting Distractor 1 demonstrate they can access textual information but may struggle with careful reading or tracking multiple similar details. Students selecting Distractor 2 may lack engagement with the text.

24.2.3 Evaluative Relations

Evaluative relation items measure students’ ability to understand significance, themes, and what events reveal about actors—the deeper meaning beneath the surface. These items ask students to integrate textual information with broader conceptual understanding. All evaluative items in ROAR-Inference are inferential, as students must construct an understanding of a theme, lesson, or set of values that is not explicitly stated with grammatical cues.

Example Structure:

Evaluative items follow a similar response option structure but focus on meaning-making rather than causality or informational details. A target answer might identify an actor’s values based on their actions across the passage. Distractor 1 might identify a partial pattern—noticing one instance of the actor acting a certain way without recognizing the broader pattern. Distractor 2 might be a plausible interpretation based on general knowledge about actor behavior that isn’t actually supported by the passage.

Passage: Ade harvested the most yams in the village. He wanted to share them with his village and invited everyone to his home. James brought his prize-winning tomatoes and made a great yam stew. Stila organized a dance to celebrate Ade’s harvest. Ade’s home was filled with laughter that evening. 

Question: What did Ade learn?

Response Options and Coherence Levels:

  • Target (Full Coherence): “Sharing what you have with others brings joy” — This response integrates multiple pieces of information (Ade wants to share, people bring different dishes to his home, they dance and celebrate, the home fills with laughter) with the logical structure of the explanation (by sharing, Ade created an occasion of joy).

  • Distractor 1 (Partial Coherence): “How to harvest the most yams in the village” — This response shows engagement with textual information (Ade harvested the most yams, so he presumably already knows how to harvest them), but the student has identified a relevant detail from the passage without accurately matching it to the larger thematic structure.

  • Distractor 2 (Minimal Coherence): “How to raise animals in the village” — This response introduces information not present in the text (beyond the village setting, which a reader might associate with animals), suggesting the student is relying on general knowledge or guessing rather than using textual evidence.

24.3 Frameworks Underlying Item Design

While ROAR-Inference measures coherence evaluation across event-chain relations (logical, informational, evaluative) as described in the theoretical background, the assessment design leverages several complementary frameworks that guide how we construct items to measure these abilities at different levels of cognitive demand. 

Several key frameworks inform the design:

Cognitive Demand Variation

Inference vs. Literal: From a general inference theory perspective, what makes an item an inference item is not its surface form or the modality of the passage, but whether it requires the reader to construct meaning that goes beyond what is explicitly stated in the text (Blum et al. 2020; Biancarosa 2026; Kendeou 2015; Medeiros et al. 2025; Trabasso 1980). Literal items, by contrast, have answers with direct grammatical links to the text. Empirical item-difficulty analyses have found that inferential and literal items can overlap considerably in difficulty, and inference items are not categorically harder than literal items (Alonzo, Liu, and Tindal 2007; Basaraba et al. 2013; Santos et al. 2016). Nevertheless, inference items tend, on average, to be more demanding, as they require integration of information beyond what is explicitly provided.

QARs: In terms of Question-Answer relations (Blum et al. 2020; Pearson and Johnson 1978; Raphael and Au 2005), items vary in the cognitive distance between questions and answers. Some answers are directly stated in text with grammatical links to questions (text-explicit), others require connecting textual information (text-implicit), and others require integration with background knowledge (script-implicit: knowledge structures about events or situations stored in long-term memory) (Anderson, Spiro, and Anderson 1978; Anderson and Pearson 1984; Bower, Black, and Turner 1979; Mandler 2014; Schank and Abelson 2013). This variation ensures students must construct and evaluate meaning across different levels of processing challenge. These QARs have also been positioned from a lens of coherence, ranging from local (text-implicit) to global (script-implicit) (Blum et al. 2020, 2026).

Knowledge-Base Inference Types: Graesser, Singer, and Trabasso’s (1994) framework highlights that students use different types of inferences when constructing meaning, including causal-antecedent inferences (understanding causes as prior events), superordinate goal inferences (understanding character intentions), state inferences (understanding conceptual and dispositional states, such as traits), and referential inferences (identifying referents, such as this, that, here, there, him, her). By varying these inference types across items, the assessment captures the different cognitive processes students employ when constructing meaning. Other scholars have used these inference types in their investigations (Malle and Holbrook 2012; Van Overwalle et al. 2012), particularly goal, state, and causal-antecedent inferences, which situate different explanatory stances: mechanistic (backward-looking and state-oriented) and teleological (forward-looking and goal-directed) (Keil 2006; Lombrozo 2006, 2016).

24.4 How Cognitive Demand Varies: Question-Answer Relations

Beyond ensuring coverage of all three types of meaning-making, items also vary in cognitive demand through the relationship between questions and answers (Blum et al. 2020, 2026; Pearson and Johnson 1978). This variation ensures students must construct and evaluate meaning across different levels of processing challenge.

Text-Explicit Items: The answer is directly stated in the text, and there is a grammatical link between the question and the answer. These items still require students to understand and retrieve the relevant information, but the pathway from question to answer is explicitly (syntactically) marked. For example, in the beetle informational relation item, the passage states, “When they see other Peri beetles, they turn green.” When students are asked, “What happens when a Peri beetle sees another Peri beetle?” and answer “They turn green,” the grammatical link is coordinated by the conjunction “when.”

Text-Implicit Items: The answer comes from information in the text, but there is no grammatical link between the question and the answer. The student must connect pieces of information within the text—a more cognitively demanding task than retrieval alone. If a passage states, “The team won the championship. The coach then shared her trophy with the team,” a text-implicit item might ask, “Why did the coach share her prize?” with the answer “Because the team won.” That answer is a detail in the story, but no explicit link states the causal connection between the propositional units of winning the championship and the coach sharing the trophy.

Script-Implicit Items: These answers require integrating textual information with background knowledge and world schemas (generic and specific knowledge structures, such as memories and socially agreed-upon norms and practices). The student must go beyond the text to their knowledge of how the world works. For example, in the dolphin item, understanding that “social animals are more likely to engage with humans” draws on background knowledge integrated with textual information.

24.5 How Inference Types Vary: Knowledge-Base Inferences

Items also specify the types of inferences students must generate (A. C. Graesser, Singer, and Trabasso 1994). That is, the target answer itself represents different inference types:

  • Causal-antecedent inferences: Understanding what prior event caused an action or event

  • Superordinate goal inferences: Understanding an actor’s overarching intentions or desired outcomes

  • State inferences: Understanding an actor’s psychological or dispositional state

  • Referential inferences: Identifying what pronouns or references refer to

By varying inference types across items, the assessment captures the different cognitive processes students use when constructing meaning. A student might handle referential inferences well but struggle with state inferences, revealing a specific area for instructional focus.

24.6 Passage and Item Design Principles

24.6.1 Passage Length and Content

Passages are standardized to 2–6 sentences, with a mean of 3.6 sentences per passage (1/34 stories had 6 sentences, 5/34 had 5 sentences, 8/34 had 4 sentences, 17/34 had 3 sentences, and 2/34 had 2 sentences), a deliberate design choice that shapes what ROAR-Inference measures and how results should be interpreted. This length reduces working memory demands that can confound inference assessment (Yuill, Oakhill, and Parkin 1989) and minimizes reliance on text memory, allowing for a clearer measure of inference-making ability in controlled conditions. By presenting students with passages on varied topics, ROAR-Inference reduces confounding effects from topic-specific or domain knowledge (Garth-McCullough 2008; Johnston and Pearson 1982). This ensures that performance reflects inference-making ability more than familiarity with particular topics. Some other inference assessments also employ short passages with immediate inference measurement as an attempt to isolate this comprehension process (Barth, Vaughn, and McCulley 2015; Rice and Wijekumar 2024; Rochat, Lima, and Bressoux 2025).
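As a quick consistency check, the reported mean passage length can be recomputed from the distribution above. Note that the listed counts sum to 33 of the 34 stories; the sketch below assumes the one unlisted story is simply omitted from the breakdown.

```python
# Distribution of passage lengths as reported above:
# {sentences_per_story: number_of_stories}
length_counts = {6: 1, 5: 5, 4: 8, 3: 17, 2: 2}

total_stories = sum(length_counts.values())                    # 33 stories listed (of 34)
total_sentences = sum(n * k for n, k in length_counts.items())  # 118 sentences in all
mean_length = total_sentences / total_stories

print(total_stories)          # 33
print(round(mean_length, 1))  # 3.6, matching the reported mean
```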

This design choice prioritizes construct precision—measuring inference-making ability in controlled conditions—over capturing how students handle inferences in longer, naturalistic texts where working memory demands are greater. ROAR-Inference provides a focused measure of students’ ability to construct and evaluate coherent meaning within bounded, manageable text segments. Educators should interpret scores as reflecting students’ inference-making ability under these specific conditions, recognizing that performance on longer, more complex texts may involve additional cognitive demands not assessed by this measure. Passage sentence length is relatively consistent across items to control this variable’s impact on item difficulty.

Content Features

All stories were analyzed using T.E.R.A. by Coh-Metrix (Arthur C. Graesser, McNamara, and Kulikowich 2011); each score ranges from 0 to 1. The five textual components it evaluates are:

  1. “Narrativity measures how much a text is story-like; the greater the degree of narrativity, the easier the text is to read. In contrast, less narrativity can indicate that the text contains more complex information.

  2. Syntactic Simplicity measures how syntax is structured, which is determined by the number of words and clauses in a sentence, or the number of words before the main verb in the sentence. When the syntax is more complex, readers can have more difficulty creating a coherent understanding of the sentence’s meaning. 

  3. Word Concreteness measures the number of concrete words in comparison to abstract words. In contrast to abstract words, concrete words offer clear mental images, which allow a text to be easier to comprehend.

  4. Referential Cohesion measures how much words, word stems, or concepts overlap within a text; low referential cohesion can cause a reader to have difficulty connecting ideas between sentences.

  5. Deep Cohesion measures how events and ideas are related throughout the entire text; with greater overlap suggesting greater overall cohesion.”

The mean scores were: syntactic simplicity, 0.48; narrativity, 0.74; word concreteness, 0.92; referential cohesion, 0.83; and deep cohesion, 0.59. The mean Flesch–Kincaid grade level was 5.1, and the mean passage length was 44.0 words.


24.7 Response Option Architecture

As illustrated in the dolphin example above, each item includes three response options representing systematically different levels of explanatory coherence. These three levels correspond to ROAR-Inference’s developmental waypoints (i.e., benchmarks that describe qualitatively distinct stages in students’ meaning-making ability along a developmental progression):

Target Answer (Full Coherence - Waypoint 2 - Strategic): The target demonstrates complete integration of textual information with the question demand and the underlying meaning-making process being assessed. The answer is grounded in passage information and accounts for the relevant evidence. A student demonstrating strategic coherence evaluation consistently selects target responses across items, showing sophisticated ability to construct meaning that integrates multiple pieces of evidence and evaluates them against alternatives.

Distractor 1 (Partial Coherence - Waypoint 1 - Developing): This option shows engagement with content—the student has accessed text-based information from the passage—but lacks the full integration or accuracy needed. The student may have identified a text-based detail but has not connected it properly, or may have missed some important contextual information. This represents developing coherence evaluation, where students show emerging ability to use textual information but have not yet achieved consistent full integration.

Distractor 2 (Minimal Coherence - Waypoint 0 - Emerging): This option might seem plausible from general knowledge or from the question alone but fails to integrate actual text-based information or connect to the meaning-making structure being assessed. This represents minimal coherence evaluation, often reflecting over-reliance on background knowledge without textual grounding or limited engagement with the text. A student at this waypoint may understand individual sentences but struggles to construct coherent understanding of passages.
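Because the three response options are ordered by coherence level, each selection maps directly onto a waypoint score. A minimal sketch of such a polytomous scoring scheme follows; the class, function, and option names are illustrative assumptions, not the production ROAR scoring API.

```python
from dataclasses import dataclass

# Hypothetical mapping from the three-option architecture to ordered
# waypoint scores, mirroring the levels described above.
WAYPOINTS = {
    "target": 2,        # full coherence    -> Waypoint 2 (Strategic)
    "distractor_1": 1,  # partial coherence -> Waypoint 1 (Developing)
    "distractor_2": 0,  # minimal coherence -> Waypoint 0 (Emerging)
}

@dataclass
class Response:
    item_id: str
    selected: str  # "target", "distractor_1", or "distractor_2"

def waypoint_score(response: Response) -> int:
    """Return the ordered coherence score (0-2) for a selected option."""
    return WAYPOINTS[response.selected]

scores = [waypoint_score(Response("dolphin_01", "target")),
          waypoint_score(Response("beetle_02", "distractor_1"))]
print(scores)  # [2, 1]
```

Ordered scoring of this kind is what lets response patterns, not just right/wrong counts, locate a student along the developmental progression.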

24.8 Summary

Each ROAR-Inference item is designed to measure one type of meaning-making (logical, informational, or evaluative) with varying cognitive demand (text-explicit, text-implicit, or script-implicit), potentially requiring a specific type of inference (causal-antecedent, goal, state, or referential). The response options reveal whether students demonstrate full coherence (target), partial coherence (Distractor 1), or minimal coherence (Distractor 2) in their meaning-making.
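The design dimensions just summarized can be made concrete as a small item-metadata schema. This is a sketch under the assumption that each item is tagged on all three facets; the names are illustrative, and the tagging of the dolphin item (script-implicit, causal-antecedent) is one plausible reading of the examples above, not an official classification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Relation(Enum):          # type of meaning-making
    LOGICAL = "logical"
    INFORMATIONAL = "informational"
    EVALUATIVE = "evaluative"

class QAR(Enum):               # question-answer relation (cognitive demand)
    TEXT_EXPLICIT = "text-explicit"
    TEXT_IMPLICIT = "text-implicit"
    SCRIPT_IMPLICIT = "script-implicit"

class InferenceType(Enum):     # knowledge-base inference targeted by the answer
    CAUSAL_ANTECEDENT = "causal-antecedent"
    SUPERORDINATE_GOAL = "goal"
    STATE = "state"
    REFERENTIAL = "referential"

@dataclass
class ItemSpec:
    item_id: str
    relation: Relation
    qar: QAR
    inference: Optional[InferenceType]  # None for purely literal items

# The dolphin item from Section 24.2.1, tagged on all three facets:
dolphin = ItemSpec("dolphin_01", Relation.LOGICAL, QAR.SCRIPT_IMPLICIT,
                   InferenceType.CAUSAL_ANTECEDENT)
print(dolphin.relation.value, dolphin.qar.value)  # logical script-implicit
```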

This systematic design ensures that:

  • Students’ abilities across all three types of meaning-making are measured

  • The assessment captures variation in how students approach meaning-making

  • Response patterns reveal specific strengths and instructional needs

  • Educators can see not just whether students answer correctly, but how they construct and evaluate meaning

Together, these elements allow ROAR-Inference to provide detailed information about how students engage in the fundamental reading comprehension process of constructing and evaluating coherent meaning from diverse texts.

References