| Grade | N | Empirical Reliability | 95% CI |
|---|---|---|---|
| All | 11103 | 0.91 | 0.91 to 0.91 |
| 1 | 4238 | 0.88 | 0.88 to 0.89 |
| 2 | 3233 | 0.87 | 0.86 to 0.88 |
| Kindergarten | 3632 | 0.89 | 0.89 to 0.90 |
20 Reliability of ROAR-Composite
20.1 Reliability
Score reliability for the IRT-based composite can be computed using the standard formula for marginal reliability, whereas reliability for the overall weighted composite is computed using a special case of the Spearman-Brown formula (citation). Under classical test theory (CTT), score reliability is defined as the ratio of true-score variance to observed-score variance:
\[ \rho_{XX} = \frac{\mathrm{Var}(T_X)}{\mathrm{Var}(X)} \]
where \(T_X\) is the true score underlying the observed score \(X\). Under this definition, the IRT marginal reliability based on expected a posteriori (EAP) theta estimates can be computed as
\[ \rho_{\text{marginal}} = \frac{\mathrm{Var}(\theta)} {\mathrm{Var}(\theta) + \mathrm{Var}(e)} \]
where the error variance, \(\mathrm{Var}(e)\), is defined as the expected posterior variance of the latent trait across examinees:
\[ \mathrm{Var}(e) = \mathrm{E}\!\left[\mathrm{Var}(\theta \mid \mathbf{u})\right] = \frac{1}{N} \sum_{i=1}^{N} \mathrm{Var}(\theta \mid \mathbf{u}_i) \]
\(\theta\) = Latent trait being measured.
\(\mathbf{u}_i\) = Response vector for examinee \(i\).
\(\mathrm{Var}(\theta)\) = Population variance of the latent trait.
\(\mathrm{Var}(\theta \mid \mathbf{u}_i)\) = Posterior variance of \(\theta\) for examinee \(i\).
\(N\) = Number of examinees.
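The marginal reliability formula above can be sketched in Python as follows. This is a minimal illustration, not the report's code: the function name `marginal_reliability`, the default `theta_variance=1.0` (which assumes the latent trait is scaled to unit population variance), and the example posterior variances are all hypothetical.

```python
from statistics import fmean
from typing import Sequence


def marginal_reliability(posterior_variances: Sequence[float], theta_variance: float = 1.0) -> float:
    """Marginal reliability Var(theta) / (Var(theta) + Var(e)).

    Var(e) is estimated as the mean posterior variance across examinees,
    i.e. (1/N) * sum_i Var(theta | u_i). The default theta_variance=1.0 is
    an ASSUMPTION (latent trait scaled to unit population variance).
    """
    if len(posterior_variances) == 0:
        raise ValueError("need at least one examinee")
    error_variance = fmean(posterior_variances)  # E[Var(theta | u)]
    return theta_variance / (theta_variance + error_variance)


# Hypothetical posterior variances for three examinees.
print(marginal_reliability([0.05, 0.10, 0.15]))
```

Here the mean posterior variance is 0.10, so the sketch returns 1 / 1.10.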
The CTT definition of reliability can be readily extended to composite scores:
\[ \rho_{CC} = \frac{\mathrm{Var}(T_C)}{\mathrm{Var}(C)} \]
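where \(\mathbf{w}\) is the vector of component weights, \(\mathbf{X}\) is the vector of observed component scores, the composite observed score is \(C = \mathbf{w}^\top \mathbf{X}\), and the composite true score is
\[ T_C = \mathbf{w}^\top \mathbf{T}, \]
with \(\mathbf{T}\) the vector of component true scores.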
The general formula for composite reliability (in matrix form) becomes (Mosier 1943):
\[ \rho_{CC} = \frac{\mathbf{w}^\top \boldsymbol{\Sigma}_T \mathbf{w}} {\mathbf{w}^\top \boldsymbol{\Sigma}_X \mathbf{w}} \]
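Here \(\boldsymbol{\Sigma}_T\) and \(\boldsymbol{\Sigma}_X\) denote the covariance matrices of the component true scores and observed scores, respectively. This formulation allows for correlated errors among the component scores. If the errors are assumed to be uncorrelated, each off-diagonal element of \(\boldsymbol{\Sigma}_T\) equals the corresponding observed covariance \(\sigma_{ij}\) and each diagonal element equals \(\rho_{ii}\sigma_i^2\), so the formula reduces to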
\[ \rho_{CC} = \frac{ \sum_{i=1}^{k} w_i^2 \, \rho_{ii} \, \sigma_i^2 \;+\; 2 \sum_{i<j} w_i w_j \, \sigma_{ij} }{ \sum_{i=1}^{k} w_i^2 \, \sigma_i^2 \;+\; 2 \sum_{i<j} w_i w_j \, \sigma_{ij} } \]
\(k\) = Number of component scores in the composite.
\(X_i\) = Observed score on component \(i\).
\(w_i\) = Weight assigned to component \(i\).
\(w_j\) = Weight assigned to component \(j\).
\(\rho_{ii}\) = Reliability of component \(i\).
\(\sigma_i^2 = \mathrm{Var}(X_i)\) = Variance of component \(i\).
\(\sigma_{ij} = \mathrm{Cov}(X_i, X_j)\) = Covariance between components \(i\) and \(j\).
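As a sketch, assuming uncorrelated errors, the matrix form and the reduced scalar form can be checked against each other numerically. The helper names (`quad_form`, `true_score_cov`, and the two `composite_reliability_*` functions) and the three-component weights, reliabilities, and covariances below are hypothetical illustrations, not ROAR values.

```python
from itertools import combinations
from typing import List, Sequence


def quad_form(w: Sequence[float], m: Sequence[Sequence[float]]) -> float:
    """w^T M w for a square matrix M given as nested lists."""
    k = len(w)
    return sum(w[i] * m[i][j] * w[j] for i in range(k) for j in range(k))


def composite_reliability_matrix(w, sigma_t, sigma_x) -> float:
    """Matrix form: (w^T Sigma_T w) / (w^T Sigma_X w)."""
    return quad_form(w, sigma_t) / quad_form(w, sigma_x)


def composite_reliability_uncorrelated(w, rel, sigma_x) -> float:
    """Scalar form assuming uncorrelated errors.

    rel[i] is rho_ii; sigma_x is the observed covariance matrix, so
    sigma_x[i][i] = sigma_i^2 and sigma_x[i][j] = sigma_ij.
    """
    k = len(w)
    cross = 2 * sum(w[i] * w[j] * sigma_x[i][j] for i, j in combinations(range(k), 2))
    true_part = sum(w[i] ** 2 * rel[i] * sigma_x[i][i] for i in range(k))
    obs_part = sum(w[i] ** 2 * sigma_x[i][i] for i in range(k))
    return (true_part + cross) / (obs_part + cross)


def true_score_cov(rel, sigma_x) -> List[List[float]]:
    """Sigma_T implied by uncorrelated errors: diagonal rho_ii * sigma_i^2,
    off-diagonal equal to the observed covariances."""
    k = len(rel)
    return [[rel[i] * sigma_x[i][i] if i == j else sigma_x[i][j] for j in range(k)]
            for i in range(k)]


# Hypothetical three-component example (illustrative numbers only).
w = [0.5, 0.3, 0.2]
rel = [0.90, 0.85, 0.80]
sigma_x = [
    [1.00, 0.60, 0.50],
    [0.60, 1.00, 0.55],
    [0.50, 0.55, 1.00],
]
r_scalar = composite_reliability_uncorrelated(w, rel, sigma_x)
r_matrix = composite_reliability_matrix(w, true_score_cov(rel, sigma_x), sigma_x)
print(round(r_scalar, 4), round(r_matrix, 4))
```

Both functions give the same value because, under the uncorrelated-error assumption, \(\boldsymbol{\Sigma}_T\) differs from \(\boldsymbol{\Sigma}_X\) only on its diagonal.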
20.2 Composite score reliability estimates
The empirical reliability of the foundational skills composite in the calibration sample was 0.93 (95% CI: 0.93 to 0.93), and the marginal reliability was 0.96. Students who answered every administered item correctly were excluded from the reliability calculation (1223 of 34472 excluded; 33249 retained).
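A minimal sketch of the exclusion step described above, followed by a reliability computation on the retained students. The `Examinee` record is hypothetical, and the empirical reliability formula used here (variance of the EAP estimates divided by that variance plus the mean squared posterior SD) is an assumption borrowed from common IRT practice; the report does not state the exact formula it used.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import List, Sequence, Tuple


@dataclass
class Examinee:
    """Hypothetical per-student record (not a ROAR data structure)."""
    responses: Sequence[int]  # scored administered items: 1 = correct, 0 = incorrect
    eap: float                # EAP theta estimate
    psd: float                # posterior standard deviation of theta


def drop_perfect_scores(examinees: List[Examinee]) -> Tuple[List[Examinee], int]:
    """Return (retained students, number excluded for answering every item correctly)."""
    kept = [e for e in examinees if not all(r == 1 for r in e.responses)]
    return kept, len(examinees) - len(kept)


def empirical_reliability(examinees: List[Examinee]) -> float:
    """ASSUMED definition: Var(EAP) / (Var(EAP) + mean(PSD^2)), population variance."""
    eaps = [e.eap for e in examinees]
    mean_eap = fmean(eaps)
    var_eap = fmean([(x - mean_eap) ** 2 for x in eaps])
    mean_err = fmean([e.psd ** 2 for e in examinees])
    return var_eap / (var_eap + mean_err)


sample = [
    Examinee([1, 1, 1], 2.5, 0.6),  # all correct -> excluded
    Examinee([1, 0, 1], 0.5, 0.4),
    Examinee([0, 0, 1], -0.5, 0.4),
    Examinee([1, 1, 0], 1.0, 0.3),
    Examinee([0, 0, 0], -1.0, 0.5),
]
kept, n_excluded = drop_perfect_scores(sample)
print(f"{n_excluded} of {len(sample)} excluded; {len(kept)} retained")
print(round(empirical_reliability(kept), 3))
```

Applied to the calibration sample, the same filter corresponds to the reported split: 34472 - 1223 = 33249 retained.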
| Gender | N | Empirical Reliability | 95% CI |
|---|---|---|---|
| All | 19457 | 0.94 | 0.94 to 0.94 |
| Female | 9518 | 0.94 | 0.94 to 0.94 |
| Male | 9939 | 0.94 | 0.94 to 0.94 |
| Free or Reduced Lunch | N | Empirical Reliability | 95% CI |
|---|---|---|---|
| All | 4852 | 0.94 | 0.94 to 0.94 |
| Free/Reduced | 1301 | 0.93 | 0.92 to 0.93 |
| Paid | 3551 | 0.94 | 0.93 to 0.94 |
| English learner status | N | Empirical Reliability | 95% CI |
|---|---|---|---|
| All | 6238 | 0.94 | 0.94 to 0.94 |
| English Learner | 1796 | 0.94 | 0.94 to 0.94 |
| English Only | 3567 | 0.94 | 0.93 to 0.94 |
| Initial Fluent English Proficient | 643 | 0.93 | 0.92 to 0.94 |
| Reclassified Fluent English Proficient | 232 | 0.91 | 0.88 to 0.92 |
| Primary language | N | Empirical Reliability | 95% CI |
|---|---|---|---|
| All | 4584 | 0.94 | 0.94 to 0.94 |
| English | 2968 | 0.94 | 0.93 to 0.94 |
| Other | 774 | 0.92 | 0.92 to 0.93 |
| Spanish | 842 | 0.93 | 0.92 to 0.94 |
| IEP / Special Education | N | Empirical Reliability | 95% CI |
|---|---|---|---|
| All | 5503 | 0.94 | 0.94 to 0.94 |
| No | 5123 | 0.94 | 0.94 to 0.94 |
| Yes | 380 | 0.94 | 0.92 to 0.94 |
| Hispanic ethnicity | N | Empirical Reliability | 95% CI |
|---|---|---|---|
| All | 19754 | 0.94 | 0.94 to 0.94 |
| No | 14368 | 0.94 | 0.93 to 0.94 |
| Yes | 5386 | 0.95 | 0.95 to 0.95 |
| Race | N | Empirical Reliability | 95% CI |
|---|---|---|---|
| All | 15645 | 0.94 | 0.94 to 0.95 |
| American Indian/Alaska Native | 81 | 0.96 | 0.94 to 0.97 |
| Asian | 1533 | 0.94 | 0.93 to 0.94 |
| Black/African American | 2014 | 0.92 | 0.92 to 0.93 |
| Hispanic/Latinx | 3054 | 0.94 | 0.93 to 0.94 |
| Multiracial | 4647 | 0.92 | 0.92 to 0.93 |
| Native Hawaiian/Other Pacific Islander | 51 | 0.93 | 0.89 to 0.95 |
| White | 4265 | 0.95 | 0.95 to 0.95 |
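The tables report 95% confidence intervals but do not state how they were computed. One generic option is a nonparametric percentile bootstrap over examinees, sketched below under an assumed empirical reliability definition (variance of the EAP estimates over that variance plus the mean squared posterior SD). The function names, `n_boot`, `seed`, and the simulated data are hypothetical, and this is not necessarily the method used for the values above.

```python
import random
from statistics import fmean
from typing import List, Sequence, Tuple


def reliability_from_estimates(eap: Sequence[float], psd: Sequence[float]) -> float:
    """ASSUMED definition: Var(EAP) / (Var(EAP) + mean(PSD^2)), population variance."""
    m = fmean(eap)
    var_eap = fmean([(x - m) ** 2 for x in eap])
    return var_eap / (var_eap + fmean([s * s for s in psd]))


def bootstrap_ci(eap: Sequence[float], psd: Sequence[float], n_boot: int = 1000,
                 level: float = 0.95, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval: resample examinees with replacement."""
    rng = random.Random(seed)  # seeded so the interval is reproducible
    n = len(eap)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(reliability_from_estimates([eap[i] for i in idx], [psd[i] for i in idx]))
    stats.sort()
    alpha = (1 - level) / 2
    lower = stats[int(alpha * (n_boot - 1))]
    upper = stats[int((1 - alpha) * (n_boot - 1))]
    return lower, upper


# Simulated subgroup data (illustrative only, not ROAR results).
sim = random.Random(42)
eap = [sim.gauss(0.0, 1.0) for _ in range(500)]
psd = [sim.uniform(0.2, 0.4) for _ in range(500)]
point = reliability_from_estimates(eap, psd)
lower, upper = bootstrap_ci(eap, psd)
print(f"{point:.2f} ({lower:.2f} to {upper:.2f})")
```

Percentile intervals are simple to compute; for very small subgroups (for example, N = 51 or N = 81 in the race table), comparing against a bias-corrected bootstrap interval may be worthwhile.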