If you build a data set and nobody can find it, is it useful? Not as much as it could be. With trust in science under siege from partisan actors and impartial pathogens, the accessibility and transparency of — and trust in — scientific information must be improved.
Enter the FAIR Data Principles. In 2014, scientists realized that data management and stewardship could benefit from a set of shared guidelines, and dozens of international researchers gathered to draft new recommendations. The resulting principles, which established that data should be findable, accessible, interoperable and reusable (FAIR), were published ten years ago¹. The original publication has around 16,000 citations, and governments, funders and publishers around the world now ask that data be hosted and shared in FAIR-compliant ways.
A decade on, however, even the founders acknowledge that the FAIR principles are an imperfect tool. Barend Mons, a molecular biologist at Leiden University in the Netherlands who conceived the initiative, says that FAIR was always meant to be a set of general principles, “and so, by definition, cannot address the specifics of every application”. Fortunately, other researchers have taken the framework and extended it to cover the broader data ecosystem², including the algorithms, tools and workflows that drive contemporary research.
Making every discipline FAIR
At its core, FAIR is meant to ensure that data are produced, analysed, stored and shared in ways that promote transparency and reproducibility. “The more the data are understandable by people other than the creators, the more we are able to determine not only the trustworthiness of the data set itself, but also its alleged creators,” says Mons.
The ideal data set should be properly documented and simple for both computers and people to find and use. It should also be easy to integrate with other data. To accomplish this, scientists must design their workflows before any data are collected, and they must create and maintain a detailed metadata file: an often-overlooked component that contains contextual information about the data set, such as where and when it was created. The initiative also prioritizes data-management plans, including the choice of appropriate licences and persistent identifiers (the unique labels assigned to different resources), so that any information created during a project remains findable and usable long after the research is over.
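To make the idea of a metadata file more concrete, here is a small, hypothetical sketch in Python. It is not part of the FAIR principles or of any official schema: the `DatasetMetadata` class, its field names and the placeholder identifier are all assumptions made for illustration. The record captures where and when a data set was created, its licence and its persistent identifier, refuses to export a record with missing required fields, and writes the result as JSON that both people and machines can read.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import date

# Hypothetical field names chosen for illustration only; real projects should
# follow the community metadata standard used in their discipline.
REQUIRED_FIELDS = ("identifier", "title", "creators", "licence", "created_on", "location")


@dataclass
class DatasetMetadata:
    identifier: str      # persistent identifier, e.g. a DOI (placeholder below)
    title: str
    creators: list[str]
    licence: str         # licence code, e.g. "CC-BY-4.0"
    created_on: str      # ISO 8601 date: when the data were created
    location: str        # where the data were created
    keywords: list[str] = field(default_factory=list)


def missing_fields(meta: DatasetMetadata) -> list[str]:
    """Return the names of required fields that are empty."""
    record = asdict(meta)
    return [name for name in REQUIRED_FIELDS if not record[name]]


def to_json(meta: DatasetMetadata) -> str:
    """Serialize the record as machine-readable JSON, rejecting incomplete records."""
    problems = missing_fields(meta)
    if problems:
        raise ValueError(f"missing required metadata: {', '.join(problems)}")
    return json.dumps(asdict(meta), indent=2, sort_keys=True)


if __name__ == "__main__":
    meta = DatasetMetadata(
        identifier="doi:10.0000/placeholder",  # placeholder, not a real DOI
        title="Example field measurements",     # hypothetical data set
        creators=["A. Researcher"],
        licence="CC-BY-4.0",
        created_on=date(2024, 5, 1).isoformat(),
        location="Leiden, the Netherlands",
    )
    print(to_json(meta))
```

The validation step reflects the point made above: a metadata record is only useful if the contextual fields are actually filled in. In practice, the invented fields here would be replaced by those of an established, discipline-specific standard.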
“It’s a lot to think about, and I can see why it might seem really daunting for some scientists to consider,” says Amelia Jiménez-Sánchez, a data-integrity researcher at the University of Barcelona in Spain. But FAIR is like cooking, she says: once you have the right ingredients (in this case, familiarity with FAIR practices), it becomes easier to make a meal. “Eventually, it just becomes a part of how you do your work.”
Users have tailored those practices to their disciplines. Carnegie Mellon University in Pittsburgh, Pennsylvania, has released FAIR guides for chemistry, mathematics, neuroscience and psychology. Other initiatives have focused on astronomy, materials science, genetics and single-cell genomics data. For fields without dedicated FAIR resources, researchers in the Netherlands have published ‘ten simple rules’ for kick-starting conversations about FAIR practices³.