EdWorkingPapers

K-12 Education

Joshua B. Gilbert, James S. Kim, Luke W. Miratrix.

Analyses that reveal how treatment effects vary allow researchers, practitioners, and policymakers to better understand the efficacy of educational interventions. In practice, however, standard statistical methods for analyzing Heterogeneous Treatment Effects (HTE) do not account for the HTE that may exist within outcome measures. In this study, we present a novel application of the Explanatory Item Response Model (EIRM) for assessing what we term “item-level” HTE (IL-HTE), in which a unique treatment effect is estimated for each item in an assessment. Simulation results reveal that when IL-HTE are present but ignored in the model, standard errors can be underestimated and false positive rates can increase. We then apply the EIRM to assess the impact of a literacy intervention focused on promoting transfer in reading comprehension, using a digital formative assessment delivered online to approximately 8,000 third-grade students. We demonstrate that allowing for IL-HTE can reveal item-level treatment effects that are masked by a null average treatment effect, and that the EIRM can thus provide researchers and policymakers with fine-grained information on the potentially heterogeneous causal effects of educational interventions.


Dan Goldhaber, Zeyu Jin, Richard Startz.

We present new estimates of the importance of early-grade teachers for later-grade outcomes. Unlike the existing literature that examines teacher “fade-out,” we directly compare the contribution of early-grade teachers to later-year outcomes against the contribution of later-year teachers to those same outcomes. Whereas the prior literature finds that much of the contribution of early-grade teachers fades away, we find that their contributions remain important in later grades. The difference in contributions to eighth-grade outcomes between an effective and an ineffective fourth-grade teacher is about half the difference among eighth-grade teachers. The effect on eighth-grade outcomes of replacing a fourth-grade teacher below the 5th percentile with a median teacher is about half the underrepresented minority (URM)/non-URM achievement gap. Our results reinforce earlier conclusions in the literature that teachers in all grades are important for student achievement.


Dan Goldhaber, John Krieg, Stephanie Liddle, Roddy Theobald.

Prior work on teacher candidates in Washington State has shown that about two-thirds of individuals who trained to become teachers between 2005 and 2015 and received a teaching credential did not enter the state’s public teaching workforce immediately after graduation, while about one-third never entered a public teaching job in the state at all. In this analysis, we link data on these teacher candidates to the state’s unemployment insurance data to provide a descriptive portrait of their future earnings and wages inside and outside of public schools. Candidates who initially became public school teachers earned considerably more, on average, than candidates who were initially employed either in other education positions or in other sectors of the state’s workforce. These differences persisted at least 10 years into the average career and across transitions into and out of teaching. There is therefore little evidence that teacher candidates who did not become teachers were lured into other professions by higher compensation. Instead, the patterns are consistent with demand-side constraints on teacher hiring during this period, which led individuals who wanted to become teachers to take positions that offered lower wages but could lead to future teaching positions.


Stephen B. Holt, Katie Vinopal, Heasun Choi, Lucy C. Sorensen.

While a growing body of literature has documented the negative impacts of exclusionary punishments, such as suspensions, on academic outcomes, less is known about how teachers vary in their disciplinary behaviors and the attendant impacts on students. We use administrative data from North Carolina elementary schools to examine the extent to which teachers vary in their use of referrals and to investigate the impact of more punitive teachers on student attendance and achievement. We also estimate the effect of teachers' racial bias in the use of referrals on student outcomes. We find that more punitive teachers increase student absenteeism and reduce student achievement. Moreover, more punitive teachers negatively affect the achievement of students who do not receive disciplinary sanctions from the teacher. Similarly, while teachers with racial bias in the use of referrals do not negatively affect academic outcomes for White students, they significantly increase absenteeism and reduce achievement for Black students. We find that the negative effects of both more punitive and more biased teachers persist into middle school and beyond. The results suggest that punitive disciplinary measures do not help teachers manage classrooms productively; rather, teachers who take more punitive stances may undermine student engagement and learning in both the short and long run. Furthermore, bias in teachers' referral usage contributes to inequities in student outcomes.


Zeyu Xu, Ben Backes.

In this descriptive study, we use longitudinal student-level administrative records from four cohorts of high school graduates in Kentucky to examine the extent to which students persist and attain postsecondary credentials in the career and technical education (CTE) fields of concentration they chose in high school. To our knowledge, this is the first paper to use student-level administrative data to examine how different fields of concentration in high school CTE are related to later postsecondary outcomes. We find that concentrating in a particular CTE field in high school is associated with both continuing in that same field in college and obtaining a postsecondary credential in that field; this relationship is especially strong in health fields, particularly for women. The secondary-postsecondary connection is weakest among students concentrating in occupational fields in high school, who are also the most socioeconomically and academically disadvantaged before high school. Despite these secondary-postsecondary pipelines of career interests, most students enroll and obtain credentials in fields that differ from their field of concentration in high school. In addition, relative to students with similar pre-high-school achievement as measured by grades and test scores, we find that CTE concentration in high school is strongly associated with a higher likelihood of enrolling in a two-year college and a lower likelihood of enrolling in a four-year college.


Peter M. Steiner, Patrick Sheehan, Vivian C. Wong.

Given recent evidence challenging the replicability of results in the social and behavioral sciences, critical questions have been raised about appropriate measures for determining replication success when comparing effect estimates across studies. At issue is the fact that conclusions about replication success often depend on the measure used to evaluate correspondence in results. Despite the importance of choosing an appropriate measure, there is still no widespread agreement about which measures should be used. This paper addresses these questions by formally describing the most commonly used measures for assessing replication success and by comparing their performance in different contexts according to their replication probabilities, that is, the probability of obtaining replication success given study-specific settings. The measures may be characterized broadly as conclusion-based approaches, which assess the congruence of two independent studies’ conclusions about the presence of an effect, and distance-based approaches, which test for a significant difference or equivalence of two effect estimates. We also introduce a new measure for assessing replication success, called the correspondence test, which combines a difference test and an equivalence test in the same framework. To help researchers plan prospective replication efforts, we provide closed-form formulas for power calculations that can be used to determine the minimum detectable effect size (and thus the sample size) for each study so that a predetermined minimum replication probability can be achieved. Finally, we use a replication dataset from the Open Science Collaboration (2015) to demonstrate the extent to which conclusions about replication success depend on the correspondence measure selected.


Mark J. Chin.

In this paper I study how school desegregation by race following Brown v. Board of Education affected White individuals’ racial attitudes and politics in adulthood. I use geocoded nationwide data from the General Social Survey and a difference-in-differences design to identify causal impacts. Integration significantly reduced White individuals’ political conservatism as adults in the U.S. South but not elsewhere. I observe similar geographic heterogeneity in impacts on individuals’ attitudes toward Blacks and toward policies promoting racial equity, but positive effects emerge less consistently across specifications. Results suggest that this heterogeneity may depend on the effectiveness of integration policies: in the South, Black-White exposure was greater following desegregation, and White disenrollment was lower. My study provides the first causal evidence on how competing theories of intergroup contact and racial attitudes (i.e., the contact and racial threat hypotheses) may have applied to school contexts following historic court mandates to desegregate.


Kathleen Lynch, Lily An, Zid Mancenido.

We present results from a meta-analysis of 37 contemporary experimental and quasi-experimental studies of summer mathematics programs for children in pre-K through Grade 12, examining which resources and characteristics predict stronger student achievement. Children who participated in summer programs that included mathematics activities experienced significantly better mathematics achievement outcomes than their control group counterparts. We find an average weighted impact estimate of +0.10 standard deviations on mathematics achievement outcomes, with similar effects for programs conducted in higher- and lower-poverty settings. In a secondary analysis exploring the effect of summer programs on non-cognitive outcomes, we found positive mean impacts. The results indicate that summer programs are a promising tool for strengthening children’s mathematical proficiency outside of school time.


Sarah R. Cohodes, Helen Ho, Silvia C. Robles.

The federal government and many individual organizations have invested in programs to support diversity in the STEM pipeline, including STEM summer programs for high school students, but there is little rigorous evidence of their efficacy. We fielded a randomized controlled trial to study a suite of such programs targeted to underrepresented high school students at an elite technical institution. The STEM summer programs differ in their length (one week, six weeks, or six months) and modality (on-site or online). Students offered seats in the STEM summer programs are more likely to enroll in, persist through, and graduate from college, with gains in institutional quality coming from both the host institution and other elite universities. The programs also increase the likelihood that students graduate with a degree in a STEM field, with the most intensive program increasing four-year graduation with a STEM degree by 33 percent. The shift to STEM degrees increases potential earnings by 2 to 6 percent. Program-induced gains in college quality fully account for the gains in graduation, but gains in STEM degree attainment are larger than would be predicted from institutional differences alone.


Jackie Eunjung Relyea, James S. Kim, Patrick Rich.

The current study replicated and extended previous findings on a content-integrated literacy intervention, focusing on its effectiveness for first- and second-grade English learners’ (N = 1,314) reading comprehension, writing, vocabulary knowledge, and oral proficiency. Statistically significant findings were replicated for science and social studies vocabulary knowledge (ES = .51 and .53, respectively) and argumentative writing (ES = .27 and .41, respectively). Furthermore, the treatment group outperformed the control group on reading comprehension (ES = .08) and listening comprehension (ES = .14). Vocabulary knowledge and oral proficiency mediated treatment effects on reading comprehension, whereas only oral proficiency mediated effects on writing. The findings replicate main effects on vocabulary knowledge and writing while extending previous research by highlighting the mechanisms underlying improved reading comprehension and writing.
