- Peter M. Steiner
Search EdWorkingPapers by author, title, or keywords.
Peter M. Steiner
Given recent evidence challenging the replicability of results in the social and behavioral sciences, critical questions have been raised about appropriate measures for determining replication success in comparing effect estimates across studies. At issue is the fact that conclusions about replication success often depend on the measure used for evaluating correspondence in results. Despite the importance of choosing an appropriate measure, there is still no wide-spread agreement about which measures should be used. This paper addresses these questions by describing formally the most commonly used measures for assessing replication success, and by comparing their performance in different contexts according to their replication probabilities – that is, the probability of obtaining replication success given study-specific settings. The measures may be characterized broadly as conclusion-based approaches, which assess the congruence of two independent studies’ conclusions about the presence of an effect, and distance-based approaches, which test for a significant difference or equivalence of two effect estimates. We also introduce a new measure for assessing replication success called the correspondence test, which combines a difference and equivalence test in the same framework. To help researchers plan prospective replication efforts, we provide closed formulas for power calculations that can be used to determine the minimum detectable effect size (and thus, sample sizes) for each study so that a predetermined minimum replication probability can be achieved. Finally, we use a replication dataset from the Open Science Collaboration (2015) to demonstrate the extent to which conclusions about replication success depend on the correspondence measure selected.
Recent interest to promote and support replication efforts assume that there is well-established methodological guidance for designing and implementing these studies. However, no such consensus exists in the methodology literature. This article addresses these challenges by describing design-based approaches for planning systematic replication studies. Our general approach is derived from the Causal Replication Framework (CRF), which formalizes the assumptions under which replication success can be expected. The assumptions may be understood broadly as replication design requirements and individual study design requirements. Replication failure occurs when one or more CRF assumptions are violated. In design-based approaches to replication, CRF assumptions are systematically tested to evaluate the replicability of effects, as well as to identify sources of effect variation when replication failure is observed. In direct replication designs, replication failure is evidence of bias or incorrect reporting in individual study estimates, while in conceptual replication designs, replication failure occurs because of effect variation due to differences in treatments, outcomes, settings, and participant characteristics. The paper demonstrates how multiple research designs may be combined in systematic replication studies, as well as how diagnostic measures may be used to assess the extent to which CRF assumptions are met in field settings.