
Half of social-science studies fail replication test in years-long project

Why This Matters

The SCORE project reveals a significant reproducibility crisis in social science research, with only half of the studies successfully replicated. This highlights ongoing challenges in scientific reliability and emphasizes the need for better transparency and data sharing in research practices. For consumers and the tech industry, these findings underscore the importance of critical evaluation of scientific claims and the potential impact on evidence-based decision-making.


Hundreds of scientists have scrutinized thousands of research papers as part of a huge project exploring whether experiments and data analyses can be replicated. Credit: ridvan celik/Getty

A massive seven-year project exploring 3,900 social-science papers has ended with a disturbing finding: researchers could replicate the results of only half of the studies that they tested1.

The conclusions of the initiative, called the Systematizing Confidence in Open Research and Evidence (SCORE) project, have been “eagerly awaited by many”, says John Ioannidis, a metascientist at Stanford University in California who was not involved with the programme. The scale and breadth of the project is impressive, he says, but the results are “not surprising”, because they are in line with those from smaller, earlier studies.


Researchers have been investigating a ‘crisis’ in the reliability of scientific results for more than a decade. They’ve found that many scientific experiments can’t be repeated — not just in the social sciences, but also in the biomedical field.

The SCORE findings — derived from the work of 865 researchers poring over papers published in 62 journals and spanning fields including economics, education, psychology and sociology — don’t necessarily mean that science is being done poorly, says Tim Errington, head of research at the Center for Open Science, an institute that co-ordinated part of the project. Of course, some results are not replicable because of either honest mistakes or the rare case of misconduct, he says, but SCORE found that, in many cases, papers simply did not provide enough data or details for experiments to be repeated accurately. Fresh methods or analyses can legitimately lead to distinct results. This means that, rather than take papers at face value, researchers should treat any single study as “a piece of the puzzle”, Errington says.

The project’s findings are reported today in three Nature papers1,2,3. The SCORE team tested whether previously published results held up by evaluating three characteristics: reproducibility, robustness and replicability. It was funded by the US Defense Advanced Research Projects Agency, with the aim of eventually building automated tools that assign confidence scores to social-science findings.

The SCORE project “gives hope that the wide scientific community is taking now very seriously these important matters”, Ioannidis says.

Test, retest, repeat

One test of a paper’s credibility is whether its results can be reproduced — that is, whether the exact same analysis of the same data yields the same finding. When some of SCORE’s team members attempted to reproduce the data analyses of 600 papers, they found that only 145 contained enough detail to do so. And of those, only 53% could be reproduced so that the results matched precisely2. However, many of the failures might have arisen because the SCORE researchers had to guess at procedures or recreate raw data, Errington says. Sharing data more openly, and being more transparent about the methodologies used, should help to solve this problem.
