Is Science Broken? False Positives and the Need for Repeatability

By Peter Lebrocquy Cox Last updated Jul 9, 2018

Science is among the most transformative of human endeavours and has propelled us forward for more than 200 years. It heals, prolongs, preserves and kills, so it is important that we get it as right as possible.

Today there is a tendency to view science as a separate entity, expressed in statements which start with “Science Says” or “Science Proves”. But science does not exist in a vacuum. It is a tool, highly interconnected with all that we do, observe and learn, and the way we use it is formative. Misuse of the scientific method can lead to misconceptions and outright falsehoods.

In 2011 the Journal of Personality and Social Psychology published a study called “Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect”, carried out by Dr Bem of Cornell University. In plain language, this is a study of extrasensory perception (ESP), as in predicting the future. Surprisingly, the study found evidence for ESP. The results were counterintuitive: you cannot “feel” and thereby predict future events. However, no fault could be found with the study; it had been exceedingly well run. This raised two possibilities. Either ESP is real, or there is something wrong with the way we practice science.

It was this study which drove Brian Nosek to grapple with apparent issues in how science is practiced. He created a collaborative project of 270 researchers to reproduce 100 psychology studies that had been published in prominent journals. In 2015 the results were in, and they were not good.

To put this into context, for a result to be considered statistically significant it needs to pass a significance test. If the p-value resulting from this test is less than 0.05, it means that, if there were no real effect at all, results at least as striking as those observed would arise by random chance less than 5% of the time. This is often read as meaning that the explanation offered by the researchers is more than 95% likely to be correct. That is not quite what the p-value says, but it is still a demanding threshold: if the studies had all been carried out correctly and were reporting real effects, the great majority of them should have been repeatable.
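To make the 0.05 threshold concrete, here is a minimal simulation sketch. It is not drawn from any of the studies mentioned; the group size, the number of simulated studies and the simple z-test with a known standard deviation are all assumptions chosen for illustration. Every simulated study compares two groups drawn from the same distribution, so there is never a real effect to find.

```python
import random
from statistics import NormalDist, mean


def two_sided_p_value(sample_a, sample_b, sigma=1.0):
    """Two-sample z-test p-value, assuming both groups share a known
    standard deviation `sigma` (an illustrative simplification)."""
    n_a, n_b = len(sample_a), len(sample_b)
    standard_error = sigma * (1 / n_a + 1 / n_b) ** 0.5
    z = (mean(sample_a) - mean(sample_b)) / standard_error
    # Probability of a difference at least this large if there is no effect.
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_studies=10_000, n_per_group=30, alpha=0.05, seed=0):
    """Fraction of no-effect studies that still come out 'significant'."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_studies):
        # Both groups come from the same distribution: no true effect exists.
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if two_sided_p_value(group_a, group_b) < alpha:
            significant += 1
    return significant / n_studies


if __name__ == "__main__":
    print(f"Share of no-effect studies with p < 0.05: {false_positive_rate():.3f}")
```

Under these assumptions roughly one simulated study in twenty passes the test even though nothing is there to be found. The threshold limits how often chance alone produces a "significant" result; on its own it does not say how likely any particular published finding is to be true.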

However, the results showed that of the 100 studies only 39 could be repeated. And this problem persists throughout many scientific disciplines. In a study run by Dr Tim Errington, only two out of five landmark cancer studies were repeatable. Amgen, a pharmaceutical company, attempted to reproduce 53 landmark cancer studies but succeeded in reproducing only six. In addition, a survey in Nature showed that over 70% of scientists have tried and failed to reproduce another scientist’s research.

So, why does this happen? Repeatability is what makes science, science. If something cannot be repeated it cannot be assumed to be true. There are a number of reasons why so many studies fail this test. The first is that journals publish positive results far more often than negative results. In fact, positive results account for about seventy to ninety percent of published studies. This leads to something called the File Drawer Effect, where researchers don’t even attempt to publish negative results. This is not just a huge waste of time; it is dangerous, as it has been shown that positive results are statistically far less likely to be accurate than negative results. This tendency comes from an implicit bias within all of us that searches for the positive: the new knowledge that changes things, not the knowledge that reiterates what is already known. But the truth is that negative results are just as important as positive ones.

This is made abundantly clear in the case of the memorably named TGN1412. In this instance six volunteers were given the drug, and within a day they were all in a serious condition as their bodies began to collapse, with failing breathing and circulation and severe fevers. In time their extremities began to rot and they all needed to be put on dialysis. They survived, but not without lasting damage. During the inquiry that followed it was discovered that research had been carried out some ten years earlier that would have warned the doctors of the risks, but it had never been published as it was a negative result. This was the consequence of the File Drawer Effect.

The second issue is that journals are biased towards publishing exceptional discoveries, so researchers tend to ask the questions that are more likely to produce them. The problem is that these exceptional discoveries are considerably more likely to be false; think back to Dr Bem’s study on ESP. The tendency to search for unlikely results, and then to cherry-pick the studies that come out positive, compounds the likelihood that positive results are less reliable than negative ones.
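The arithmetic behind this point can be sketched in a few lines. The numbers are assumptions for illustration, not figures from any study mentioned here: a significance threshold of 0.05, an 80% chance that a study detects an effect that really exists (its power), and a "prior" standing for the share of tested hypotheses that are actually true.

```python
def positive_predictive_value(prior, power=0.8, alpha=0.05):
    """Share of positive (significant) results that reflect a real effect.

    prior: assumed fraction of tested hypotheses that are true
    power: assumed chance a study detects a real effect
    alpha: significance threshold, i.e. the false positive rate
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)


def negative_predictive_value(prior, power=0.8, alpha=0.05):
    """Share of negative (non-significant) results that are correct."""
    true_negatives = (1 - alpha) * (1 - prior)
    false_negatives = (1 - power) * prior
    return true_negatives / (true_negatives + false_negatives)


if __name__ == "__main__":
    # From an even-odds hypothesis down to a long shot.
    for prior in (0.5, 0.1, 0.01):
        ppv = positive_predictive_value(prior)
        npv = negative_predictive_value(prior)
        print(f"prior={prior:<5} positives correct: {ppv:.2f}  negatives correct: {npv:.2f}")
```

With these assumed values, a positive result for an even-odds hypothesis is very likely to be right, but as the hypothesis becomes more of a long shot the share of correct positives falls sharply while negatives stay dependable. Selective publication of positive results is not modelled here and would only make matters worse.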

Open-access journals are an attempt to separate research from the big business of journal publishing. Online journals like PLoS ONE insist only that research be methodologically sound in order to be published. This is a response not just to the need to improve the quality of research but also to the need to improve access to it, as the business practices of many paid journals have made it almost impossible to access the original research without paying huge sums of money.

Is Science broken? No. Science is a tool; it allows us to drop our biases at the threshold and enter into a room of truth and understanding. But, as with all tools, it is how you use it that determines the results, and how those results are communicated in turn influences how it is used. It is important to understand our own part in this cycle, because these issues would be less serious if we, as consumers of scientific information, were a little more discerning. Just remember that a single study means very little in most circumstances, certainly until it has been repeated, and if a study gels with your world view, be even more critical. After all, we could all do with leaving more of our biases at the threshold.