Clustered observational studies (COSs) are a critical analytic tool for educational effectiveness research. We present a design framework for the development and critique of COSs. The framework is built on the counterfactual model for causal inference and promotes the concept of designing COSs that emulate the targeted randomized trial that would have been conducted were it feasible. We emphasize the key role of understanding the assignment mechanism to study design. We review methods for statistical adjustment and highlight a recently developed form of matching designed specifically for COSs. We review how regression models can be profitably combined with matching and note best practice for estimates of statistical uncertainty. Finally, we review how sensitivity analyses can determine whether conclusions are sensitive to bias from potential unobserved confounders. We demonstrate concepts with an evaluation of a summer school reading intervention in Wake County, North Carolina.
Many interventions in education occur in settings where treatments are applied to groups. For example, a reading intervention may be implemented for all students in some schools and withheld from students in other schools. When such treatments are non-randomly allocated, outcomes across the treated and control groups may differ due to the treatment or due to baseline differences between groups. When this is the case, researchers can use statistical adjustment to make treated and control groups similar in terms of observed characteristics. Recent work in statistics has developed matching methods designed for contexts where treatments are clustered. This form of matching, known as multilevel matching, may be well suited to many education applications where treatments are assigned to schools. In this article, we provide an extensive evaluation of multilevel matching and compare it to multilevel regression modeling. We evaluate multilevel matching methods in two ways. First, we use these matching methods to recover treatment effect estimates from three clustered randomized trials using a within-study comparison design. Second, we conduct a simulation study. We find evidence that generally favors an analytic approach to statistical adjustment that combines multilevel matching with regression adjustment. We conclude with an empirical application.