Standards, accountability, assessment, and curriculum

Aaron Phipps, Alexander Amaya.

Given the simultaneous rise in time-to-graduation and college GPA, it may be that students reduce their course load to improve their performance. Yet evidence to date shows only that increased course loads increase GPA. We provide a mathematical model showing that many unobservable factors -- beyond student ability -- can generate a positive relationship between course load and GPA unless researchers control student schedules. West Point regularly implements the ideal experiment by randomly modifying student schedules with additional training courses. Using 19 years of administrative data, we provide the first causal evidence that taking more courses reduces GPA and increases course failure rates, sometimes substantially.


Bobby W. Chung, Jian Zou.

States increasingly require prospective teachers to pass exams for program completion and initial licensure, including the recent controversial roll-out of the educative Teacher Performance Assessment (edTPA). We leverage the quasi-experimental setting of different adoption timing by states and analyze multiple data sources containing a national sample of prospective teachers and students of new teachers in the US. With extensive controls for concurrent policies, we find that the edTPA reduced the number of prospective teachers in traditional-route programs and at less-selective and minority-concentrated universities. Contrary to the policy intention, we do not find evidence that edTPA increased student test scores.


Jonathan E. Collins.

The George Floyd protests of the summer of 2020 initiated public conversations around the need for antiracist teaching. Yet, over time, the discussion evolved into policy debates around the use of Critical Race Theory in civics courses. The rapid transition masked the fact that we know little about Americans' policy preferences. Do Americans support antiracist teaching? What factors best explain support or opposition? How does critical race theory factor in? Using a series of original survey experiments, this study shows that Americans maintain strong support for antiracist teaching, but that support is drastically weakened when the curriculum features the term "critical race theory."


Ann Mantil, John Papay, Preeya Pandya Mbekeani, Richard J. Murnane.

Preparing K-12 students for careers in science, technology, engineering and mathematics (STEM) fields is an ongoing challenge confronting state policymakers. We examine the implementation of a science graduation testing requirement for high-school students in Massachusetts, beginning with the graduating class of 2010. We find that the design of the new requirement was quite complicated, reflecting the state’s previous experiences with test-based accountability, a broad consensus on policy goals among key stakeholders, and the desire to afford flexibility to local schools and districts. The consequences for both students and schools, while largely consistent with the goals of increasing students’ skills and interest in STEM fields, were in many cases unexpected. We find large differences by demographic subgroup in the probabilities of passing the first science exam and of succeeding on retest, even when conditioning on previous test-score performance. Our results also show impacts of science exit-exam performance for students scoring near the passing threshold, particularly on the high-school graduation rates of females and on college outcomes for higher-income students. These findings demonstrate the importance of equity considerations in designing and evaluating ambitious new policy initiatives.


Ishtiaque Fazlul, Cory Koedel, Eric Parsons.

Measures of student disadvantage—or risk—are critical components of equity-focused education policies. However, the risk measures used in contemporary policies have significant limitations, and despite continued advances in data infrastructure and analytic capacity, there has been little innovation in these measures for decades. We develop a new measure of student risk for use in education policies, which we call Predicted Academic Performance (PAP). PAP is a flexible, data-rich indicator that identifies students at risk of poor academic outcomes. It blends concepts from emerging “early warning” systems with principles of incentive design to balance the competing priorities of accurate risk measurement and suitability for policy use. PAP is more effective than common alternatives at identifying students who are at risk of poor academic outcomes and can be used to target resources toward these students—and students who belong to several other associated risk categories—more efficiently.


Kate Antonovics, Sandra E. Black, Julie Berry Cullen, Akiva Yonah Meiselman.

Schools often track students to classes based on ability. Proponents of tracking argue it is a low-cost tool to improve learning since instruction is more effective when students are more homogeneous, while opponents argue it exacerbates initial differences in opportunities without strong evidence of efficacy. In fact, little is known about the pervasiveness or determinants of ability tracking in the US. To fill this gap, we use detailed administrative data from Texas to estimate the extent of tracking within schools for grades 4 through 8 over the years 2011-2019. We find substantial tracking; tracking within schools overwhelms any sorting by ability that takes place across schools. The most important determinant of tracking is heterogeneity in student ability, and schools operationalize tracking through the classification of students into categories such as gifted and disabled, and through curricular differentiation. When we examine how tracking changes in response to educational policies, we see that schools decrease tracking in response to accountability pressures. Finally, when we explore how exposure to tracking correlates with student mobility in the achievement distribution, we find positive effects on high-achieving students with no negative effects on low-achieving students, suggesting that tracking may increase inequality by raising the ceiling.


Emily Morton, Paul Thompson, Megan Kuhfeld.

Four-day school weeks are becoming increasingly common in the United States, but their effect on students’ achievement is not well understood. The small body of existing research suggests the four-day schedule has relatively small, negative average effects (roughly -0.02 to -0.09 SD) on annual, standardized state test scores in math and reading, but these studies include only a single state or are limited by using district-level data. We conduct the first multi-state, student-level analysis that estimates the effect of four-day school weeks on student achievement and a more proximal measure of within-year growth using NWEA MAP Growth assessment data. We conduct difference-in-differences analyses to estimate the effect of attending a four-day week school relative to attending a five-day week school. We estimate significant negative effects of the schedule on spring reading achievement (-0.07 SD) and fall-to-spring achievement gains in math and reading (-0.06 SD in both). The negative effects of the schedule are larger in non-rural schools than in rural schools and larger for female students, and they may grow over time. Policymakers and practitioners will need to weigh the policy’s demonstrated negative average effects on achievement in their decisions regarding whether and how to implement a four-day week.


John Papay, Ann Mantil, Richard J. Murnane.

Many states use high-school exit examinations to assess students’ career and college readiness in core subjects. We find meaningful consequences of barely passing the mathematics examination in Massachusetts, as opposed to just failing it. However, these impacts operate at different educational attainment margins for low-income and higher-income students. As in previous work, we find that barely passing increases the probability of graduating from high school for low-income (particularly urban low-income) students, but not for higher-income students. However, this pattern is reversed for 4-year college graduation. For higher-income students only, just passing the examination increases the probability of completing a 4-year college degree by 2.1 percentage points, a sizable effect given that only 13% of these students near the cutoff graduate from a 4-year college.


Walter Herring.

Because high-stakes testing for school accountability does not begin until third grade, accountability ratings for elementary schools do not directly measure students’ academic progress in grades K through 2. While it is possible that children’s test scores in grades 3 and above are highly correlated with children’s outcomes in the untested grades, research provides reasons to believe that this might not be the case in all schools. This study explores whether measures of school quality based on test scores in grades 3 through 5 serve as a strong proxy for children’s academic outcomes in grades K through 2. The results show that directly accounting for children’s test scores in the early grades could lead to meaningful changes in schools’ test-based performance ratings. The findings have important implications for accountability policy.


Morgan S. Polikoff, Laura M. Desimone, Andrew C. Porter, Michael S. Garet, Amy Stornaiuolo, Katie Pak, Toni M. Smith, Mengli Song, Nelson Flores, Lynn S. Fuchs, Douglas Fuchs, T. Philip Nichols.

Standards have been at the heart of state and federal efforts to improve education for several decades. Most recently, standards-based reforms have evolved with a focus on more ambitious "college- and career-ready" (CCR) standards. This paper synthesizes the results of a seven-year national research center focused on the implementation and effects of CCR standards. The paper draws on evidence from a quasi-experimental longitudinal study using NAEP data, a cluster-randomized trial of an alignment feedback intervention, and detailed implementation data from state-representative surveys and case studies of five districts. Situating our work in a "policy attributes theory," we find important gaps in the theory of change underlying current standards-based reform efforts. We conclude that the CCR standards movement is not succeeding in achieving its desired outcomes. We make specific suggestions for improving instructional policy, including a) providing more specific instructional guidance, b) reconceptualizing professional learning, c) building buy-in through the involvement of trusted leaders, d) providing better supports for differentiation, e) devoting attention and guidance to the intersection of content and pedagogy, and f) addressing persistent deficit thinking among educators.
