Bayes’ Rule and the Paradox of Pre-Registration of RCTs

By Donald P. Green (Political Science, Columbia)

Not long ago, I attended a talk at which the presenter described the results of a large, well-crafted experiment.  His results indicated that the average treatment effect was close to zero, with a small standard error.  Later in the talk, however, the speaker revealed that when he partitioned the data into subgroups (men and women), the findings became “more interesting.”  Evidently, the treatment interacted significantly with gender: positive effects on men and negative effects on women.

A bit skeptical, I raised my hand to ask whether this treatment-by-covariate interaction had been anticipated by a planning document prior to the launch of the experiment.  The author said that it had.  The reported interaction now seemed quite convincing.  Impressed both by the results and the prescient planning document, I exclaimed “Really?”  The author replied, “No, not really.”  The audience chuckled, and the speaker moved on.  The reported interaction again struck me as rather unconvincing.

Why did the credibility of this experimental finding hinge on pre-registration?  Let’s take a step back and use Bayes’ Rule to analyze the process by which prior beliefs were updated in light of new evidence.  In order to keep the algebra to a bare minimum, consider a stylized example that makes use of Bayes’ Rule in its simplest form.

Let’s start by supposing that the presenter was in fact following a planning document that spelled out the interaction effect in advance. My hypothesis (H) is that this interaction effect is substantively small (i.e., close to zero).  Before attending the talk, my prior belief was that there is a 50% chance that this hypothesis is true.  Formally, my prior may be expressed as Pr(H) = 0.5.  Next, I encounter evidence (E) that the presenter’s experiment revealed a statistically significant interaction.  Suppose there is a 5% probability of obtaining a statistically significant result given that H is true, which is to say that Pr(E|H) = 0.05.  In order to apply Bayes’ Rule, we need one more quantity: the probability of observing a significant result given that H is false (denoted ~H). For a well-powered study such as this one, we may suppose that Pr(E|~H) = 1.  In other words, if there were truly a substantively large effect, this study would find it.

Plugging these inputs into Bayes’ Rule allows us to calculate the posterior probability, Pr(H|E), which indicates my degree of belief in H after seeing evidence of a statistically significant finding:

Pr(H|E) = [Pr(E|H) × Pr(H)] / [Pr(E|H) × Pr(H) + Pr(E|~H) × Pr(~H)]
        = (0.05 × 0.5) / [(0.05 × 0.5) + (1 × 0.5)] ≈ 0.048

Before seeing the experimental evidence, I thought there was a 0.50 probability of H; now, I accord H a probability of just 0.048. Having seen the presenter’s evidence of a statistically significant effect, my beliefs have changed considerably.
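To make the arithmetic concrete, here is a minimal sketch in Python (the posterior helper and its argument names are my own illustration, not anything from the talk or the post) that plugs these inputs into Bayes’ Rule:

```python
def posterior(prior_h, pr_e_given_h, pr_e_given_not_h):
    """Posterior Pr(H|E) via Bayes' Rule for a binary hypothesis H."""
    numerator = pr_e_given_h * prior_h
    denominator = numerator + pr_e_given_not_h * (1 - prior_h)
    return numerator / denominator

# Scenario 1: the analysis follows a planning document, so Pr(E|H) = 0.05
print(round(posterior(prior_h=0.5, pr_e_given_h=0.05, pr_e_given_not_h=1.0), 3))  # 0.048
```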

What if the presenter obtained this result by fishing for a statistically significant estimate? I don’t know whether the presenter fished, but I do know that fishing is possible because the analysis was not guided by a planning document.  Given the possibility of fishing, I re-evaluate the probability of observing a significant result even when there is a negligible effect.  Above, we supposed that Pr(E|H) = 0.05; now, let’s assume that there is a 75% chance of obtaining a significant result via fishing: Pr(E|H) = 0.75. In that case,

Pr(H|E) = (0.75 × 0.5) / [(0.75 × 0.5) + (1 × 0.5)] ≈ 0.429

Having seen the experimental evidence, I take the probability of H to be 0.429, which is not very different from my prior belief.  In other words, when my evaluation of the evidence takes fishing into account, my priors are less influenced by the presenter’s evidence.
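The same back-of-the-envelope calculation can be run with the fishing-inflated value of Pr(E|H); this snippet (again just an illustrative sketch, not part of the original post) reproduces the second posterior:

```python
# Scenario 2: fishing is possible, so Pr(E|H) rises from 0.05 to 0.75
prior_h, pr_e_given_h, pr_e_given_not_h = 0.5, 0.75, 1.0
posterior = (pr_e_given_h * prior_h) / (
    pr_e_given_h * prior_h + pr_e_given_not_h * (1 - prior_h)
)
print(round(posterior, 3))  # 0.429
```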

The broader point is that the mere possibility of fishing can undercut the persuasiveness of experimental results.  When I’m confident that the researcher’s procedures are sound, Pr(E|H) is quite different from Pr(E|~H), and the experimental finding really tells me something.  When I suspect fishing, Pr(E|H) moves closer to 1, and the experimental findings become less persuasive.  (In an extreme case where Pr(E|H) = Pr(E|~H) = 1, the experimental findings would not change my priors about H at all.)
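One way to see this numerically is to sweep Pr(E|H) from the nominal 0.05 toward 1 while holding Pr(E|~H) at 1. This short, self-contained sketch (my own illustration, under the same stylized assumptions) shows the posterior drifting back toward the 0.5 prior as fishing becomes easier:

```python
prior_h = 0.5           # prior belief that the interaction is negligible
pr_e_given_not_h = 1.0  # a well-powered study detects a truly large effect

for pr_e_given_h in (0.05, 0.25, 0.50, 0.75, 1.00):
    post = (pr_e_given_h * prior_h) / (
        pr_e_given_h * prior_h + pr_e_given_not_h * (1 - prior_h)
    )
    print(f"Pr(E|H) = {pr_e_given_h:.2f} -> Pr(H|E) = {post:.3f}")

# Pr(E|H) = 0.05 -> Pr(H|E) = 0.048
# Pr(E|H) = 0.25 -> Pr(H|E) = 0.200
# Pr(E|H) = 0.50 -> Pr(H|E) = 0.333
# Pr(E|H) = 0.75 -> Pr(H|E) = 0.429
# Pr(E|H) = 1.00 -> Pr(H|E) = 0.500
```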

This application of Bayes’ Rule suggests that planned comparisons may substantially increase the credibility of experimental results.  The paradox is that journal reviewers and editors do not seem to accord much weight to planning documents.  On the contrary, they often ask for precisely the sort of post hoc subgroup analyses that create uncertainty about fishing.

The bottom line: to make the case for pre-registration, proponents of experimental research must point out that the nominal results of an experiment are robbed of their persuasive value if readers suspect that the findings were obtained through fishing.  In the short run, that means finding fault with existing practice – the lack of planning documents – so that we can improve the credibility of experimental results in the long run.



About the author:
Donald P. Green is Professor of Political Science at Columbia University. The author of four books and more than one hundred essays, Green’s research interests span a wide array of topics: voting behavior, partisanship, campaign finance, hate crime, and research methods. Much of his current work uses field experimentation to study the ways in which political campaigns mobilize and persuade voters. He recently co-authored Field Experiments: Design, Analysis, and Interpretation (W.W. Norton, 2012).

This post is one of a ten-part series in which we ask researchers and experts to discuss transparency in empirical social science research across disciplines. The next post in the series is “Monkey Business” by Macartan Humphreys. You can find the complete list of posts here.


