An unpublished analysis shared with Nature suggests that over the past two decades, more than 400,000 research articles have been published that show strong textual similarities to known studies produced by paper mills. Around 70,000 of these were published last year alone. The analysis estimates that 1.5–2% of all scientific papers published in 2022 closely resemble paper-mill works. Among biology and medicine papers, the rate rises to 3%.

Without individual investigations, it is impossible to know whether all of these papers are in fact products of paper mills. But the proportion – a few per cent – is a reasonable conservative estimate, says Adam Day, director of scholarly data-services company Clear Skies in London, who conducted the analysis using machine-learning software he developed called the Papermill Alarm.

In the past few years, publishers have stepped up their efforts to combat paper mills, says Joris Van Rossum, director of research integrity at STM, who led development of the STM Integrity Hub, with a focus on tools (including Day’s software) to help detect fraudulent submitted manuscripts. Publishers now have multiple ways to screen for such manuscripts.

Whatever the scale of the problem, it seems clear that it has overwhelmed publishers’ systems. The world’s largest database of retractions, compiled by the website Retraction Watch, records fewer than 3,000 retractions related to paper-mill activity, out of a total of 44,000. That is an undercount, says the site’s co-founder Ivan Oransky, because database maintainers are still entering thousands of retractions, and some publishers avoid the term ‘paper mill’ in retraction notices.