When Rene Aquarius began a review of potential stroke treatments last year, he was shocked. His initial search of the literature on drug candidates for treating hemorrhagic stroke found some 600 studies when he expected only 50 to 60—a first red flag. On further investigation, his team discovered that many of those papers were riddled with fishy data and questionable images, such as duplicated lanes within Western blots. His plan for a systematic review—considered the gold standard method to synthesize multiple studies on a topic and extract a broader conclusion—was in jeopardy.
Aquarius, a neuroscientist who specializes in reviewing preclinical animal research at Radboud University Medical Center, is one of a growing number of systematic review authors who have lost faith in the evidence base they depend on. His group put its project on hold to quantify the problem. “There is a real danger of systematic reviews losing the power they have,” he says.
The junk papers are likely the products of paper mills—businesses that produce fake science to order. The size of the problem is not clear, but a manuscript posted to the Center for Open Science’s OSF preprint server in September suggests up to one in seven published papers is fabricated or falsified. Aquarius’s group plans to sum up the problems in the papers it analyzed by the end of this year; the results are “grim,” he says. In the past 4 weeks alone he has flagged 130 suspect papers on the postpublication peer-review site PubPeer, bringing his total over the past 10 months to more than 690, many of them related to the stroke project and others to different topics.
Other researchers echo Aquarius’s experience. Andreas Voldstad, a psychiatry Ph.D. student at the University of Oxford, was working on a systematic review about mindfulness practices and relationship satisfaction when he noticed strange language in some studies he was considering including. The language appeared to come from automated translation software, which fraudsters can use to evade plagiarism detectors. As he looked closer, he found six of the 28 studies he planned to include in his analysis were suspect and described effect sizes some 10 to 20 times larger than those documented in the other papers. He also found missing information and dubious statistics.
“It caused a bit of stress because I am not sure how to talk or write about this,” Voldstad says. He eventually found Aquarius, who gave him some ideas. In the end, Voldstad included all the studies in his review while also documenting his concerns about some of the papers and conducting a separate analysis that excluded those studies to show readers how they affected the results. “It added some months of work and delay,” he says.
Medical researchers who draw up clinical guidelines by synthesizing clinical trial findings have been plagued by similar problems in recent years, and they have developed ways to cope. In 2023, Cochrane, an international network promoting evidence-based medicine, issued draft guidelines to help these researchers filter out junk science. The REAPPRAISED checklist, an effort by another group of research integrity specialists published in 2020, also helps researchers assess papers’ soundness.
But these tools, useful for medical reviews that often include just a handful of studies or for in-depth assessments of specific papers, are not well suited to preclinical systematic reviews, says Torsten Rackoll of the Berlin Institute of Health QUEST Center for Responsible Research. “They are all quite complicated,” says Rackoll, who is also part of Camarades, a global network of researchers working to improve the quality of preclinical evidence from animal studies. “They ask you to make sense checks or write to co-authors to ask if the manuscript is true.” That’s not feasible for reviews that might assess several hundred papers, he adds.
Rackoll is hoping to develop a tool for these larger reviews: a flow chart with a series of checkpoints that guide researchers as they consider each paper for possible inclusion. The results could suggest excluding the paper from the review, for example, or performing further quality checks or analysis on the data. But Camarades has no funding for the project, and it is expected to take years.
As well as grappling with how to ensure fake papers don’t compromise future reviews, researchers wonder how to cope with tainted reviews that have already been published. In January, stress researcher Otto Kalliokoski of the University of Copenhagen published a systematic review about the effectiveness of an experimental technique used to model depression in rats. In a subsequent analysis of almost 600 papers he had considered, he found that 19% of them had images with hallmarks of fakery, he reported in a preprint posted to bioRxiv in March. Some of these studies also reported larger than average effect sizes. “These papers can significantly skew the literature,” he says. “They are poisoning the well.”
His group has presented its findings to the integrity office of the journal where it published the original review and is awaiting next steps. “Should we have removed those studies? Or some of them? It’s really complicated,” Kalliokoski says.
The challenges raise unsettling questions for some researchers who conduct systematic reviews. Ananda Zeas-Sigüenza, a Ph.D. student at the Public University of Navarre who encountered problematic papers when conducting a systematic review about interventions to prevent loneliness, says the experience has left her questioning a career in academic research. “I really thought science was transparent,” she says. “I was ready to give it my best so I can really help people who feel lonely. Is it worth it? That’s what I’m thinking right now.”
