- Benjamin W. Domingue
Analyzing heterogeneous treatment effects plays a crucial role in understanding the impacts of educational interventions. A standard practice for heterogeneity analysis is to examine interactions between treatment status and pre-intervention participant characteristics, such as pretest scores, to identify how different groups respond to treatment. This study demonstrates that identical observed patterns of heterogeneity on test score outcomes can emerge from entirely distinct data-generating processes. Specifically, we describe scenarios in which treatment effect heterogeneity arises either from variation in treatment effects along a pre-intervention participant characteristic or from correlations between treatment effects and item easiness parameters. We demonstrate analytically and through simulation that these two scenarios cannot be distinguished when analysis is based on summary scores alone, as such outcomes are insufficient to identify the relevant generating process. We then describe a novel approach that identifies the relevant data-generating process by leveraging item-level data. We apply our approach to a randomized trial of a reading intervention in second grade, and show that any apparent heterogeneity by pretest ability is driven by the correlation between treatment effect size and item easiness. Our results highlight the potential of employing measurement principles in causal analysis, beyond their common use in test construction.
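The confound described above can be illustrated with a minimal simulation (this is a hedged sketch, not the paper's actual model or parameters): item responses follow a Rasch model, and the item-level treatment effect `delta` is correlated with item easiness `b` while being identical for every student. A split on ability (standing in for a pretest) nonetheless shows "heterogeneity" in the sum-score treatment effect.

```python
import numpy as np

# Illustrative sketch (hypothetical parameters, not the paper's model):
# Rasch-type responses where the treatment boost varies by ITEM, not by student.
rng = np.random.default_rng(0)
n_students, n_items = 5000, 20

theta = rng.normal(size=n_students)          # latent ability (pretest proxy)
b = np.linspace(-2.0, 2.0, n_items)          # item easiness (higher = easier)
treat = rng.integers(0, 2, size=n_students)  # random assignment

# Item-level treatment effects: larger on easier items, no dependence on theta.
delta = 0.3 + 0.3 * b

logits = theta[:, None] + b[None, :] + treat[:, None] * delta[None, :]
prob = 1.0 / (1.0 + np.exp(-logits))
resp = (rng.random((n_students, n_items)) < prob).astype(int)
score = resp.sum(axis=1)                     # summary (sum) score

# Apparent "heterogeneity by pretest" on the sum score, even though the
# generating process contains no student-level moderator:
low, high = theta < 0, theta >= 0
gain_low = score[low & (treat == 1)].mean() - score[low & (treat == 0)].mean()
gain_high = score[high & (treat == 1)].mean() - score[high & (treat == 0)].mean()
print(gain_low, gain_high)
```

Because easier items carry larger effects and high-ability students are near ceiling on those items, the sum-score gain shrinks with ability; an analyst looking only at summary scores would (wrongly) infer ability-moderated treatment effects.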
Researchers use test outcomes to evaluate the effectiveness of education interventions across numerous randomized controlled trials (RCTs). Aggregate test data—for example, simple measures like the sum of correct responses—are compared across treatment and control groups to determine whether an intervention has had a positive impact on student achievement. We show that item-level data and psychometric analyses can provide information about treatment heterogeneity and improve the design of future experiments. We apply techniques typically used in the study of Differential Item Functioning (DIF) to examine variation in the degree to which items show treatment effects. That is, are observed treatment effects due to generalized gains on the aggregate achievement measures, or are they due to targeted gains on specific items? Based on our analysis of 7,244,566 item responses (265,732 students responding to 2,119 items) taken from 15 RCTs in low- and middle-income countries, we find clear evidence for variation in gains across items. DIF analyses identify items that are highly sensitive to the interventions—in one extreme case, a single item drives nearly 40% of the observed treatment effect—as well as items that are insensitive. We also show that the variation of item-level sensitivity can have implications for the precision of effect estimates. Of the RCTs that have significant effect estimates, 41% have patterns of item-level sensitivity to treatment that allow for the possibility of a null effect when this source of uncertainty is considered. Our findings demonstrate how researchers can gain more insight regarding the effects of interventions via additional analysis of item-level test data.
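The DIF-style idea of item-varying treatment sensitivity can be sketched as follows (a simplified illustration with made-up parameters, not the paper's estimation procedure): compare each item's log-odds of a correct response between treatment and control arms, and inspect the spread of those per-item contrasts. In this toy setup a single item carries essentially all of the effect.

```python
import numpy as np

# Hypothetical generating process: one highly treatment-sensitive item,
# all other item-level effects zero. Parameters are illustrative only.
rng = np.random.default_rng(1)
n_students, n_items = 4000, 12

theta = rng.normal(size=n_students)             # latent ability
b = rng.uniform(-1.5, 1.5, size=n_items)        # item easiness
treat = rng.integers(0, 2, size=n_students)     # random assignment

delta = np.zeros(n_items)
delta[0] = 1.0                                  # the one treatment-sensitive item

logits = theta[:, None] + b[None, :] + treat[:, None] * delta[None, :]
resp = (rng.random((n_students, n_items)) < 1 / (1 + np.exp(-logits))).astype(int)

def item_logit(y):
    """Marginal log-odds of a correct response, per item."""
    p = y.mean(axis=0).clip(1e-6, 1 - 1e-6)
    return np.log(p / (1 - p))

# Naive per-item treatment contrast on the log-odds scale; randomization
# makes the unadjusted arm comparison fair in expectation.
contrast = item_logit(resp[treat == 1]) - item_logit(resp[treat == 0])
print(contrast.round(2))
```

Here the contrast for item 0 stands far above the rest, mirroring the paper's finding that sensitivity is concentrated in particular items; averaging over items would mask this structure.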
Education has faced unprecedented disruption during the COVID-19 pandemic; evidence about the subsequent effect on children is of crucial importance. We use data from an oral reading fluency (ORF) assessment—a rapid assessment taking only a few minutes that measures a fundamental reading skill—to examine COVID's effects on children's reading ability during the pandemic in more than 100 U.S. school districts. Effects were pronounced, especially for Grades 2–3, but distinct across spring and fall 2020. While many students were not assessed in spring 2020, those who were seemed to have experienced relatively limited or no growth in ORF relative to gains observed in other years. In fall 2020, a far more representative set of students was observed. For those students, growth was more pronounced and seemed to approach levels observed in previous years. Worryingly, there were also signs of stratification such that students in lower-achieving districts may be falling further behind. However, at the level of individual students, those who were struggling with reading prior to the pandemic were not disproportionately impacted in terms of ORF growth. These data offer an important window onto how a foundational skill is being affected by COVID-19, and this approach can be used in the future to examine how student abilities recover as education enters a post-COVID paradigm.