Can an army of babies and dogs rescue psychology from its reproducibility crisis?

Baby Zoe sits on her mother’s lap and watches a puppet show featuring three shapes with googly eyes. A red circle struggles to climb a steep hill until a blue square helps it with a push. A yellow triangle blocks the way and shoves the red circle down the hill. When the show is over, Zoe is offered a choice of puppets. She doesn’t hesitate: she ignores the unkind yellow triangle and makes a grab for the helpful blue square.

The scene, from a Netflix documentary series released in 2020, recreates a highly cited 2007 study (ref. 1), which found that babies as young as six months old overwhelmingly prefer characters who help, rather than hinder, others. On the basis of these findings, developmental psychologist Kiley Hamlin, now at the University of British Columbia in Vancouver, Canada, concluded that the ability to evaluate others’ behaviour develops before speech, and could be a biological adaptation.

Over the next decade or so, researchers performed dozens of versions of Hamlin’s experiment. But many of them failed to find the same preference for helpers — or suggested that other factors could explain the choice. Hamlin became frustrated at the resulting confused picture, so, in 2017, she assembled a collaboration of 37 research groups in 18 countries to repeat the experiment with more than 1,000 babies. That, she thought, should settle the matter once and for all.

Hamlin wasn’t the only cognitive scientist looking for ways to validate findings in their field at this time. Throughout the 2010s, many researchers attempted, and often failed, to replicate seminal studies in psychology and beyond, leading to what became known as the reproducibility or replication crisis. “There was huge press coverage about how psychology was garbage,” says Michael Frank, a developmental psychologist at Stanford University in California.

Many scientists saw small sample sizes as a major cause of the crisis — these distorted results or produced conclusions that applied only to limited groups. One obvious solution was to go big: perhaps boosting the number of trial participants could help to add weight to the results.

So, psychologists began building large-scale, international projects, often involving hundreds of collaborators, to plan and run the same experiment and see whether they got the same answer. Hamlin and Frank have used this approach to test hypotheses about infant cognition. Other researchers are investigating different aspects of cognition in humans and a wide range of species — from dogs to fish to flamingoes.

The results of these projects have been trickling in over the past few years. And they have sometimes failed to support the hypotheses that they were designed to replicate. Many scientists think that the challenging logistics of such mammoth studies are worth it for the extra rigour that they provide. “Joining forces by combining groups of subjects across labs solves this problem by giving us the statistical power to test important research questions,” says Kelsey Lucca, a developmental psychologist at Arizona State University in Tempe and a co-lead on the collaborative study of social evaluation in babies, alongside Hamlin.

Big teamwork

Plenty of questions in science — in subjects from particle physics to cell atlases — require large teams. Although smaller ones are still the norm in a lot of fields, researchers in an increasing number of disciplines are coming to see the benefits of pooling data and expertise.
