Chapter 23A. Established results that fade:
system-wide bias in favor of 'new and exciting'
Some important results
appear in well-established journals, are initially replicated by independent
scientists, but then slowly get overturned. Their initial success seems to
challenge the validity of the scientific method, but it may stem from little
more than a combination of luck and a widespread bias in the global scientific
enterprise.
An
article in The New Yorker described several cases of well-known
and highly cited scientific results that, after multiple studies, appeared
firmly established but then fell apart (The Truth Wears Off, 13 December
2010). Their initial support was statistically strong, so their demise
could not be explained by simple bad luck (randomness). Instead, it seemed that
something more sinister was at work. Lack of replication is common in medical
research (J.P. Ioannidis, Why most published research findings are false. PLoS
Med. 2005 Aug;2(8):e124; J.P. Ioannidis, Contradicted and initially stronger
effects in highly cited clinical research. JAMA. 2005 Jul 13;294(2):218-28).
Bias in unexpected corners
It is
not surprising that researchers with a financial interest in the outcome of a
study find ways to guarantee the desired results. That deliberate bias is the
subject of chapters 25 and 26. Yet perhaps all scientists who do financially
inconsequential research (known as 'basic' research) are subject to equally
powerful forces, as are the journals that publish scientific research. There
is no dishonesty involved.
The
reputation of a scientist and of a journal is strongly tied to the importance
of the results they publish. We've all heard of Einstein for the theory of
relativity and of Watson and Crick for the structure of DNA. There are
countless degrees of fame and success below these levels. In this era of
electronic media, the measure of a scientist and of a journal is tied to
citations: the number of times a paper is referenced in the massive collection
of scientific studies. Citations to a paper necessarily accumulate over time
(they cannot go down), but important papers receive hundreds to thousands of
citations, while inconsequential papers receive only a handful. In turn,
funding and salary for a scientist (and revenues for a journal) are tied to
reputation.
There
is an excess of scientific work: far more studies are done than there is
space in 'good' journals. So the most famous journals can pick and choose what
to publish, and it is no surprise that they pick the most exciting studies.
Likewise, scientists are most eager to publish the most exciting work
they do, and they virtually always want to publish it in the best journals.
The
rest follows naturally, and we can describe it in three phases:
Phase
1. With countless studies being conducted around the world, some will, by
chance, hit upon an exciting result that isn't robust: it is repeatable only
under very special circumstances, the details of which are not known. The
result holds up statistically, perhaps by chance or perhaps because the result
is real under the happenstance conditions of the study. Because the result is
exciting and appears sound, it gets published in a high-profile journal.
Phase
2. The excitement of the new result leads others to do similar studies. Maybe
only a few of them replicate the finding. The new finding should die at this
point, but the enterprise of science can work to sustain the initial result.
Negative findings are hard to publish, and many researchers may assume they
did something wrong in replicating the study and thus never attempt to publish
the negative result. Even if a few negative studies get published, there will
still be an over-representation of supporting studies, and the result will
continue to appear supported. So even at this stage, there is a system-wide
bias against overturning the result.
Phase
3. As more studies get done, negative findings begin to emerge in proportion
to their actual frequency. Furthermore, as the initial result gains momentum
and becomes part of the foundation of knowledge, it becomes a target:
scientists can now make a name for themselves by overturning it, and the bias
can shift against the result. Additionally, newer studies may test the
robustness of the original result by changing the conditions. If the original
result was legitimate only for a narrow set of conditions, the community will
lose interest in it because it is hard to repeat.
An easy example
One can
appreciate how the process works by considering a coin-flip 'study.' Suppose
that a million people each flipped a coin 20 times to find out whether they
had special powers (or were exceptionally lucky). On average, about two of
those people would get 20/20 heads or 20/20 tails. No one getting a boring
result such as 12/20 heads would think twice about it, but a person getting 20
out of 20 of the same face might be inclined to draw attention to themselves;
the result would even be newsworthy if we did not know that a million people
had done the test.
Phase 2
might involve a million more people being inspired to try their luck. And
again, we would on average get about two extremely lucky (and newsworthy)
people. But in the end, others would come forward and expose the nonsense.
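The arithmetic behind the "about two people" claim can be checked directly: the chance that one person flips 20/20 of the same face is 2 x (1/2)^20, or roughly 1.9 in a million. The sketch below (the simulation setup and variable names are illustrative, not from the chapter) computes that expectation and runs the million-person experiment once:

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

N_PEOPLE = 1_000_000
N_FLIPS = 20

# Chance one person gets all heads OR all tails: 2 * (1/2)^20
p_extreme = 2 * 0.5 ** N_FLIPS
expected = N_PEOPLE * p_extreme  # about 1.9 'special' people per million

# Simulate: encode each person's 20 flips as a 20-bit integer,
# where 0 means all tails and all-ones means all heads.
all_heads = (1 << N_FLIPS) - 1
count = 0
for _ in range(N_PEOPLE):
    flips = random.getrandbits(N_FLIPS)
    if flips == 0 or flips == all_heads:
        count += 1

print(f"expected about {expected:.1f}; this run produced {count}")
```

Because the expected count is so small, individual runs vary: some runs of a million people yield zero 'special' flippers, others four or five, which is exactly the kind of lottery that Phase 1 rewards.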
Multiple comparisons
The
problem underlying this type of bias is that we, the scientific community,
do not know the total number of studies that look at the same
phenomenon. If 1 out of 10 (or 1 out of 50) studies gets published, and only
the exciting studies get published, then the literature is biased toward
novelty. This problem is well appreciated in statistics, and it can in fact
arise in the studies of a single scientist. If a scientist runs 20 different
statistical tests on his or her data, then on average one of those 20 tests
will prove statistically significant at the magic 5% level. In other words,
when we do many tests, we are assured of 'unlikely' results. We know how to
correct for that bias within a study, but not across the scientific community.
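The one-in-twenty claim follows from the definition of a p-value: under the null hypothesis, each test's p-value is uniform on [0, 1], so each has a 5% chance of falling below 0.05 even when nothing real is going on. The short simulation below (an illustrative sketch, not from the chapter) draws many batches of 20 null p-values and tallies the false 'findings':

```python
import random

random.seed(0)  # fixed seed for a reproducible run

N_TESTS = 20      # tests per study
ALPHA = 0.05      # the 'magic' 5% significance level
TRIALS = 50_000   # number of simulated 20-test studies

at_least_one = 0
total_hits = 0
for _ in range(TRIALS):
    # Under the null, a p-value is just a uniform random number.
    hits = sum(random.random() < ALPHA for _ in range(N_TESTS))
    total_hits += hits
    at_least_one += hits > 0

avg_hits = total_hits / TRIALS   # about 1.0 false positive per 20 tests
frac_any = at_least_one / TRIALS  # about 0.64 = 1 - 0.95**20

print(f"average 'significant' results per 20 tests: {avg_hits:.2f}")
print(f"fraction of studies with at least one 'finding': {frac_any:.3f}")
```

The standard within-study fix alluded to above is a multiple-comparisons correction such as Bonferroni, which divides the significance threshold by the number of tests; no comparable correction exists across the unknown number of studies done worldwide.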
Copyright 2011 Craig M. Pease & James J. Bull