When and how should text-generating artificial intelligence (AI) programs such as ChatGPT help write research papers? In the coming months, 4000 researchers from a variety of disciplines and countries will weigh in on guidelines that could be adopted widely across academic publishing, which has been grappling with chatbots and other AI issues for the past year and a half. The group behind the effort wants to replace the piecemeal landscape of current guidelines with a single set of standards that represents a consensus of the research community.
Known as CANGARU, the initiative is a partnership among researchers and publishers, including Elsevier, Springer Nature, and Wiley; representatives of the journals eLife, Cell, and The BMJ; and the industry body the Committee on Publication Ethics. The group hopes to release a final set of guidelines by August; the guidelines will be updated every year because of the “fast evolving nature of this technology,” says Giovanni Cacciamani, a urologist at the University of Southern California who leads CANGARU. The guidelines will include a list of ways authors should not use the large language models (LLMs) that power chatbots and how they should disclose other uses.
Since generative AI tools such as ChatGPT became public in late 2022, publishers and researchers have debated these issues. Some say the tools can help draft manuscripts if used responsibly—by authors who do not have English as their first language, for example. Others fear scientific fraudsters will use them to publish convincing but fake work quickly. LLMs’ propensity to make things up, combined with their relative fluency in writing and an overburdened peer-review system, “poses a grave threat to scientific research and publishing,” says Tanya De Villiers-Botha, a philosopher at Stellenbosch University.
Some journals, including Science and Nature, and other bodies have already released rules about how scientists can use generative AI tools in their work. (Science’s News department is editorially independent.) Those policies often state that AI tools cannot be authors because they cannot be accountable for the work. They also require authors to declare where the tools have been used.
But the level of guidance varies. In a December 2023 policy, the STM Association, a publishing trade body, spelled out allowed uses for generative AI and listed other areas on which journal editors should decide case by case. Last month’s announcement from the European Commission is less prescriptive, stating only that researchers using these tools should do so transparently and remain responsible for their scientific output.
The variety of guidelines could be confusing for researchers. “Ideally there should be a big effort to bring all these rules together into one big set that everyone can follow,” says Jean-Christophe Bélisle-Pipon, a health ethicist at Simon Fraser University. “A standardized guideline is both necessary and urgent,” De Villiers-Botha adds.
Cacciamani is leading CANGARU in a systematic review of the relevant literature, which will inform the AI guidelines. A panel of researchers, clinicians, computer scientists, engineers, methodologists, and editors will then evaluate the guidelines.
But some researchers fear the initiative is not moving fast enough. “Already the world has changed significantly in the last 10 months,” says Daniel Hook, head of data analytics firm Digital Science. “The speed of progress of generative AI will only increase.”
The number of researchers using the tools in their writing appears to be soaring. Some cases of undisclosed, illicit ChatGPT use are obvious: the chatbot sometimes adds telltale phrases to text, such as “knowledge cutoff in September 2021.” “These are real smoking guns,” says University of Toulouse computer scientist Guillaume Cabanac, who compiled a list of more than 70 articles that bear the hallmarks of undeclared ChatGPT for the blog Retraction Watch.
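Screening for such telltale phrases is straightforward in principle; a minimal sketch of the idea follows. The phrase list here is illustrative only, not the criteria Cabanac or Retraction Watch actually use.

```python
# Minimal sketch: flag text containing boilerplate phrases that chatbots
# sometimes leave behind. The phrase list is illustrative, not an
# exhaustive or authoritative screening tool.
TELLTALE_PHRASES = [
    "as an ai language model",
    "knowledge cutoff in september 2021",
    "regenerate response",
]

def flag_telltale_phrases(text: str) -> list[str]:
    """Return the telltale phrases found in the given text."""
    lowered = text.lower()
    return [p for p in TELLTALE_PHRASES if p in lowered]

sample = ("The results are robust. As an AI language model, I cannot "
          "verify clinical data beyond my knowledge cutoff in September 2021.")
print(flag_telltale_phrases(sample))
```

In practice such literal string matching catches only the clumsiest cases; anyone who edits the phrase out evades it, which is why researchers have also turned to statistical clues.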
Others have looked for subtler clues to LLM text. In a preprint posted on arXiv on 25 March, Andrew Gray, who works in bibliometric support at University College London, estimates that just over 1% of articles published in 2023, about 60,000 in total, contained a disproportionate occurrence of unusual words known to be correlated with LLM-generated text. Another analysis, posted on bioRxiv by Yingfeng Zheng at Sun Yat-sen University and colleagues on 26 March, investigated 45,000 preprints before and after ChatGPT became available and estimated that 5% of the latter included AI-generated text.
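The word-frequency approach can be sketched simply: count how often “marker” words that analyses have found over-represented in LLM output appear per 1,000 words, and compare against a pre-LLM baseline. The marker list and baseline rate below are illustrative assumptions, not the figures from the studies above.

```python
# Minimal sketch of marker-word frequency analysis. MARKER_WORDS and
# BASELINE_PER_1000 are illustrative assumptions for this example, not
# values taken from Gray's or Zheng's analyses.
import re

MARKER_WORDS = {"delve", "intricate", "meticulously", "commendable"}
BASELINE_PER_1000 = 0.2  # assumed pre-ChatGPT rate, for illustration

def marker_rate(text: str) -> float:
    """Occurrences of marker words per 1,000 words of text."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in MARKER_WORDS)
    return 1000 * hits / len(words)

def looks_inflated(text: str, factor: float = 5.0) -> bool:
    """True if the marker rate far exceeds the assumed baseline."""
    return marker_rate(text) > factor * BASELINE_PER_1000
```

A single document exceeding the baseline proves nothing on its own; the published estimates come from aggregating such rates over tens of thousands of papers.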
Philip Shapira, who studies the governance of emerging technologies at the University of Manchester, says the numbers could be underestimates. “It is now easy to go online to find recommendations and tools to ‘weed out’ common ChatGPT-generated terms and phrases,” he says. And AI tools will likely improve their writing style in the future, making it more challenging to spot.
Once AI guidelines are drawn up, the next step will be to ensure that authors stick to them, says Sabine Kleinert, deputy editor of medical journal The Lancet, which is involved in CANGARU. This can be done by asking authors to declare the use of AI when they submit papers. Reining in AI use will also require “the expertise of editors … as well as robust peer review and additional research integrity and ethical policies,” Kleinert adds.
Ultimately, Bélisle-Pipon says, the success of the guidelines will also depend on institutions and granting agencies to incentivize adherence to policies—and to penalize researchers who don’t. “We all know that the crux of the matter is how funders and recruitment, tenure, and promotion committees evaluate researchers.”
