Joshua B. Gilbert
This simulation study examines the statistical properties of the Explanatory Item Response Model (EIRM) for estimating treatment effects, compared with classical test theory (CTT) sum and mean scores and item response theory (IRT)-based theta scores. Results show that the EIRM and IRT theta scores yield bias and false positive rates generally equivalent to those of CTT scores, along with better-calibrated standard errors under model misspecification. Analysis of statistical power reveals that, when parametric assumptions are met, the EIRM and IRT theta scores provide a marginal power benefit and are more robust to missing data than the other methods. They provide a substantial power benefit under heteroskedasticity, but their performance is mixed under other conditions. The methods are illustrated with an empirical application examining the causal effect of an elementary school literacy intervention on reading comprehension test scores; this application demonstrates that the EIRM provides a more precise estimate of the average treatment effect than the CTT or IRT theta score approaches. Tradeoffs in model selection and interpretation are discussed.
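Comparisons like these rest on simulated item responses. The following is a minimal sketch of one such data-generating process. It assumes a Rasch (one-parameter logistic) model in which treatment shifts each student's latent ability by a constant `beta` logits. The function names, item difficulties, and sample sizes are hypothetical and are not taken from the study. The sketch estimates the effect only with the CTT sum-score approach. An EIRM would instead model the item responses directly, commonly as a mixed-effects logistic regression with the treatment indicator as a person-level predictor.

```python
import math
import random
import statistics

# Hypothetical data-generating process (an assumption, not the study's code):
# logit P(y_ij = 1) = theta_j + beta * T_j - b_i, with theta_j ~ N(0, 1).


def rasch_prob(theta: float, b: float) -> float:
    """Probability of a correct response under a Rasch model: logistic(theta - b)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def simulate(n_persons, item_difficulties, beta, seed=0):
    """Return a list of (treated, responses) pairs.

    Treatment is assigned to every other person, so half are treated,
    and it shifts the latent ability of treated persons by beta logits.
    """
    rng = random.Random(seed)
    data = []
    for j in range(n_persons):
        treated = j % 2
        theta = rng.gauss(0.0, 1.0) + beta * treated
        responses = [int(rng.random() < rasch_prob(theta, b)) for b in item_difficulties]
        data.append((treated, responses))
    return data


def sum_score_effect(data):
    """CTT approach: treated-minus-control difference in mean sum scores.

    Returns the raw difference and the difference standardized by the
    pooled within-group standard deviation.
    """
    treated = [sum(r) for t, r in data if t == 1]
    control = [sum(r) for t, r in data if t == 0]
    diff = statistics.mean(treated) - statistics.mean(control)
    pooled_sd = math.sqrt((statistics.variance(treated) + statistics.variance(control)) / 2)
    return diff, diff / pooled_sd
```

For example, `sum_score_effect(simulate(2000, [-1.0, -0.5, 0.0, 0.5, 1.0], beta=0.4))` returns a positive raw and standardized effect. Because the sum score contains measurement error, its standardized effect is typically attenuated relative to the latent-scale `beta`. This is one reason the choice of scoring method can matter.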
The current study aimed to explore the impact of COVID-19 on the reading achievement growth of Grade 3-5 students in a large urban school district in the U.S., and whether that impact differed by students’ demographic characteristics and instructional modality. Specifically, using administrative data from the school district, we investigated the extent to which students made reading gains during the 2020-2021 school year relative to the typical pre-COVID-19 school year of 2018-2019. We further examined whether the effects of students’ instructional modality on reading growth varied by demographic characteristics. Overall, students had lower average reading achievement gains over the 9-month 2020-2021 school year than over the 2018-2019 school year, with learning loss effect sizes of 0.54, 0.27, and 0.28 standard deviation units for Grades 3, 4, and 5, respectively. Substantially reduced reading gains were observed among Grade 3 students, students from high-poverty backgrounds, English learners, and students with reading disabilities. Additionally, findings indicate that among students with similar demographic characteristics, higher-achieving students tended to choose the fully remote instruction option, while lower-achieving students appeared to opt for in-person instruction at the beginning of the 2020-2021 school year. However, students who received in-person instruction tended to show continuous growth in reading over the school year, whereas initially higher-achieving students who received remote instruction showed stagnation or decline, particularly in the spring 2021 semester. Our findings support the notion that in-person schooling during the pandemic may serve as an equalizer for lower-achieving students, particularly those from historically marginalized or vulnerable student populations.
Analyses that reveal how treatment effects vary allow researchers, practitioners, and policymakers to better understand the efficacy of educational interventions. In practice, however, standard statistical methods for analyzing Heterogeneous Treatment Effects (HTE) do not account for the HTE that may exist within outcome measures. In this study, we present a novel application of the Explanatory Item Response Model (EIRM) for assessing what we term “item-level” HTE (IL-HTE), in which a unique treatment effect is estimated for each item in an assessment. Results from a data simulation reveal that when IL-HTE are present but ignored in the model, standard errors can be underestimated and false positive rates can increase. We then apply the EIRM to assess the impact of a literacy intervention focused on promoting transfer in reading comprehension, as measured by a digital formative assessment delivered online to approximately 8,000 third-grade students. We demonstrate that allowing for IL-HTE can reveal item-level treatment effects masked by a null average treatment effect, and that the EIRM can thus provide fine-grained information for researchers and policymakers on the potentially heterogeneous causal effects of educational interventions.
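To make the idea of IL-HTE concrete, the following sketch again assumes a Rasch-type model, but gives each item its own treatment effect in logits. The effects are hypothetical and chosen to cancel out, and all item difficulties are set to zero. As a result, the sum-score difference sits near zero even though individual items move in opposite directions, which mirrors a null average effect masking item-level effects. The names `simulate_il_hte` and `item_level_differences` are illustrative only. An EIRM could estimate such item-specific effects jointly within one model, for example as item-by-treatment deviations, rather than through the raw proportion differences used here.

```python
import math
import random
import statistics

# Hypothetical data-generating process (an assumption, not the study's code):
# logit P(y_ij = 1) = theta_j + delta_i * T_j - b_i,
# where delta_i is a treatment effect specific to item i.


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def simulate_il_hte(n_persons, difficulties, item_effects, seed=1):
    """Return (treated, responses) pairs with item-specific treatment effects."""
    rng = random.Random(seed)
    data = []
    for j in range(n_persons):
        treated = j % 2  # every other person is treated
        theta = rng.gauss(0.0, 1.0)
        responses = [
            int(rng.random() < logistic(theta + delta * treated - b))
            for b, delta in zip(difficulties, item_effects)
        ]
        data.append((treated, responses))
    return data


def item_level_differences(data, n_items):
    """Treated-minus-control difference in proportion correct, for each item."""
    treated = [r for t, r in data if t == 1]
    control = [r for t, r in data if t == 0]
    return [
        statistics.mean(r[i] for r in treated) - statistics.mean(r[i] for r in control)
        for i in range(n_items)
    ]


def sum_score_difference(data):
    """Treated-minus-control difference in mean sum scores (average effect)."""
    treated = [sum(r) for t, r in data if t == 1]
    control = [sum(r) for t, r in data if t == 0]
    return statistics.mean(treated) - statistics.mean(control)


difficulties = [0.0] * 5
item_effects = [0.8, -0.8, 0.5, -0.5, 0.0]  # hypothetical; they cancel out
data = simulate_il_hte(4000, difficulties, item_effects)
per_item = item_level_differences(data, len(item_effects))
overall = sum_score_difference(data)
```

Here `overall` is close to zero, while `per_item` is clearly positive for the first and third items and clearly negative for the second and fourth. A model that estimates only one average effect would miss this pattern.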