Chapter 23A. Established results that fade: system-wide bias in favor of ‘new and exciting’

Some important results appear in well-established journals and are initially replicated by independent scientists, but then slowly get overturned. Their initial success seems to challenge the validity of the scientific method, yet it may stem from little more than a combination of luck and a widespread bias in the global scientific enterprise.

An article in The New Yorker described several cases of well-known and highly cited scientific results that, after multiple studies, appeared firmly established but then fell apart ("The Truth Wears Off," 13 December 2010). Their initial support was statistically strong, so their demise cannot be explained by simple bad luck (randomness). Instead, something more sinister seems to be at work. Lack of replication is common in medical research (Ioannidis JP. Why most published research findings are false. PLoS Med. 2005 Aug;2(8):e124; Ioannidis JP. Contradicted and initially stronger effects in highly cited clinical research. JAMA. 2005 Jul 13;294(2):218-28).

Bias in unexpected corners

It is not surprising that researchers with a financial interest in the outcome of a study find ways to guarantee the desired results. This deliberate bias is the subject of Chapters 25 and 26. Yet perhaps all scientists, even those whose research has no financial consequences (so-called 'basic' research), are subject to equally powerful forces. So are the journals that publish scientific research. No dishonesty is required.

The reputation of a scientist and of a journal is strongly tied to the importance of the results they publish. We've all heard of Einstein for the theory of relativity and of Watson and Crick for the structure of DNA. There are uncountable degrees of fame and success below these levels. In this day of electronic media, the measure of a scientist and of a journal is tied to citations: the number of times a paper is referenced in the massive collection of scientific studies. Citations can only accumulate over time, but important papers receive hundreds to thousands of them, whereas inconsequential papers receive only a handful. In turn, funding and salary for a scientist (and revenues for a journal) are tied to reputation.

There is an excess of scientific work: far more studies are conducted than there is space for in 'good' journals. The most famous journals can therefore pick and choose what to publish, and it is no surprise that they pick the most exciting studies. Likewise, scientists are most eager to publish their most exciting work, and they virtually always want to publish it in the best journals.

The rest follows naturally, and we can describe it in three phases:

Phase 1. With countless studies being conducted around the world, some will, by chance, hit upon an exciting result that isn't robust: one repeatable only under very special circumstances, the details of which are not known. The result holds up statistically, perhaps by chance or perhaps because it is real under the happenstance conditions of the study. Because the result is exciting and appears sound, it gets published in a high-profile journal.

Phase 2. The excitement of the new result leads others to do similar studies. Maybe only a few of them replicate the finding. The new finding should die at this point, but the enterprise of science can work to sustain the initial result. Negative findings are hard to publish, and many researchers may assume they did something wrong when replicating the study and thus not attempt to publish the negative result. Even if a few negative studies get published, there will still be an over-representation of supporting studies, and the result will continue to appear supported. So even at this stage, there is a system-wide bias against overturning the result.

Phase 3. As more studies get done, negative findings begin to emerge in proportion to their outcomes. Furthermore, as the initial result gains momentum and becomes part of the foundation of knowledge, it becomes a target. Scientists can now make a name for themselves by overturning the result. The bias can now shift against the result. Additionally, newer studies may test the robustness of the original result by changing the conditions. If the original result was legitimate only for a narrow set of conditions, the community will lose interest in the result because it is hard to repeat.

An easy example

One can appreciate how the process works by considering a coin-flip 'study.' Suppose that a million people each flipped a coin 20 times to find out whether they had special powers (or were exceptionally lucky). On average, two of those people would get 20/20 heads or 20/20 tails. No one getting a boring result such as 12/20 heads would think twice about it, but a person getting 20/20 of the same outcome might be inclined to draw attention to themselves; the result would even be newsworthy if we did not know that a million people had taken the test.

Phase 2 might involve a million more people being inspired to try their luck. Again, on average, two of them would be extremely lucky (and newsworthy). But in the end, others would come forward and expose the nonsense.
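The arithmetic behind this example is easy to verify. Below is a minimal Python sketch (the language and variable names are our own illustration, not part of the chapter):

```python
from math import comb

# Probability that one person flips 20/20 heads or 20/20 tails:
# two ways to get all 20 flips the same, each with probability (1/2)^20
p_all_same = 2 * 0.5 ** 20

flippers = 1_000_000
expected_lucky = flippers * p_all_same
print(f"P(20/20 same face)  = {p_all_same:.2e}")       # about 1.9 in a million
print(f"Expected 'special' people per million: {expected_lucky:.2f}")

# Compare with a boring outcome such as exactly 12/20 heads
p_12_heads = comb(20, 12) * 0.5 ** 20
print(f"P(exactly 12/20 heads) = {p_12_heads:.3f}")     # about 0.12
```

The expected count of roughly two 'special' people per million matches the chapter's figure, and the comparison shows why 12/20 heads draws no attention: it happens to about one person in eight.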

Multiple comparisons

The problem underlying this type of bias is that we, the scientific community, do not know the total number of studies that have examined the same phenomenon. If only 1 out of 10 (or 1 out of 50) studies gets published, and only the exciting studies get published, then the literature is biased toward novelty. This problem is well appreciated in statistics, and in fact it can arise in the work of a single scientist. If a scientist runs 20 different statistical tests on his/her data, on average one of those 20 tests will prove statistically significant at the magic 5% level. In other words, when we do many tests, we are virtually assured of 'unlikely' results. We know how to correct for that bias within a study, but not across the scientific community.
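The 1-in-20 figure can be made concrete with a short simulation. This Python sketch (our own illustration; the function name and parameters are hypothetical) draws p-values under the null hypothesis, where every 'significant' result is a false positive, and also shows the standard within-study correction (Bonferroni) that the last sentence alludes to:

```python
import random

random.seed(1)

def run_null_tests(n_tests=20, alpha=0.05, trials=10_000):
    """Simulate a scientist running n_tests independent tests on pure noise.

    Under the null hypothesis, each test's p-value is uniform on [0, 1].
    Returns the fraction of simulated studies reporting at least one
    'significant' result."""
    hits = 0
    for _ in range(trials):
        p_values = [random.random() for _ in range(n_tests)]
        if min(p_values) < alpha:      # at least one false positive
            hits += 1
    return hits / trials

# Exact chance of at least one false positive in 20 tests at the 5% level
exact = 1 - 0.95 ** 20
print(f"Exact:           {exact:.3f}")                 # about 0.64
print(f"Simulated:       {run_null_tests():.3f}")

# Bonferroni correction: demand p < alpha / n_tests within the study
print(f"With Bonferroni: {run_null_tests(alpha=0.05 / 20):.3f}")  # near 0.05
```

The point of the sketch is the contrast: a within-study correction like Bonferroni restores the 5% error rate, but no analogous correction exists across the community, because no one knows how many unpublished studies sit in the denominator.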


Copyright 2011 Craig M. Pease & James J. Bull