A package of papers looking at the social and behavioural sciences shows the value of researchers collaborating to further the cause of reproducible, replicable and robust findings.
Studies of published papers in the social sciences suggest what factors could make research findings more likely to endure. Credit: Roni Bintang/Getty
The durability of research findings can be cast in terms of three Rs. Findings should be reproducible (the same type of analysis using the same data should produce the same result); replicable (redoing an experiment to collect fresh data should produce the same result); and robust (alternative analyses using the same data should draw the same conclusion).
Over the past two decades, studies in fields from psychology to medicine have highlighted that these criteria are often not met, leading to talk of a crisis in replication and reproducibility. Four papers1–4 published this week in Nature look at the reproducibility, replicability and robustness of research in the social and behavioural sciences. They provide a snapshot of the analysed fields, and suggest factors that could make research findings more likely to endure. Researchers, funders, journals and institutions should take note — for the betterment of all science.
Three of the papers1–3 are an outcome of nearly US$8 million in funding provided in 2019 by the US Defense Advanced Research Projects Agency to the Systematizing Confidence in Open Research and Evidence (SCORE) programme. The project is run by the Center for Open Science, a non-profit organization in Washington DC. More than 850 researchers contributed to hundreds of duplication efforts, establishing a database of reliability markers for 3,900 papers published between 2009 and 2018 (see go.nature.com/4campyc). The fourth paper4 is the result of a series of one-day ‘replication games’ workshops organized around the world since 2022 by the Institute for Replication, a virtual, non-profit network.
Some of the results are sobering. For example, Tyner et al.1 find that statistically significant effects could be replicated for only about half of the 164 papers they studied. Moreover, the replicated effect sizes were on average less than half of what was originally reported. This ‘decline effect’ has been reported before5, but it is unclear how much is due to authors’ cognitive biases, questionable research practices, the preference of journals for eye-catching results, flukes or true effects that are specific to a particular population and time.
This is a reminder to treat research results with a degree of scepticism, particularly if they are surprising. That applies to their robustness, too. Aczel et al.2 report that only 74% of statistically significant conclusions from a sample of 100 papers remained significant when the same data were analysed in an alternative way. Brodeur et al.4 reached a comparable conclusion.
Some of the work analysed in these papers was done before concerns over research reliability became widespread and terms such as ‘P hacking’ (describing the tweaking of analyses until they yield significant results) became commonly used. Awareness of the importance of transparency has only grown since then, as have mechanisms and norms for scientists to implement practices such as data sharing.
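To see why P hacking is so corrosive, consider an illustrative simulation (not drawn from any of the papers discussed here): if a researcher tests 20 different analyses of pure-noise data and reports whichever comes out significant, the chance of a spurious ‘finding’ at the conventional 0.05 threshold rises to roughly 1 − 0.95²⁰ ≈ 64%. A minimal sketch, using a large-sample z-test and hypothetical parameter choices:

```python
import math
import random

random.seed(0)

def two_sample_p(a, b):
    """Two-sided p-value for a difference in means (large-sample z approximation)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    # Normal CDF via the error function
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def false_positive_rate(n_tests, n_sims=1000, n=50, alpha=0.05):
    """Fraction of simulations in which at least one of n_tests analyses of
    pure-noise data (no true effect) is 'significant' at level alpha."""
    hits = 0
    for _ in range(n_sims):
        p_values = []
        for _ in range(n_tests):
            a = [random.gauss(0, 1) for _ in range(n)]
            b = [random.gauss(0, 1) for _ in range(n)]
            p_values.append(two_sample_p(a, b))
        if min(p_values) < alpha:  # report only the 'best' analysis
            hits += 1
    return hits / n_sims

print(f"1 honest analysis: {false_positive_rate(1):.2f}")    # near 0.05 by construction
print(f"pick best of 20:   {false_positive_rate(20):.2f}")   # near 1 - 0.95**20, about 0.64
```

The multiple analyses here are independent, which is a simplification; in practice a researcher's alternative specifications are correlated, so the inflation is smaller but still substantial. This is also why preregistration and robustness checks of the kind Aczel et al.2 perform matter: they constrain the analyst's freedom to search for significance after the fact.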