With a few keystrokes, anyone can ask an artificial intelligence (AI) program such as ChatGPT to write them a term paper, a rap song, or a play. But don’t expect William Shakespeare’s originality. A new study finds such output remains derivative—at least for now.
To reach that conclusion, researchers designed their own program, one capable of measuring one aspect of the creativity of AI. Measuring creativity is “a hard and very interesting problem,” says Mirco Musolesi, a computer scientist at University College London who studies AI creativity but was not involved in the work. The new approach, he says, tackles at least one piece of creativity—linguistic novelty—“very well.”
Scientists have been skeptical of the power of programs such as ChatGPT since their inception. Though the generative AI and large language models (LLMs) behind such programs can instantaneously produce writing that looks very much like the stuff of humans, some researchers argue LLMs produce nothing new: They’re simply “stochastic parrots,” critics say, blindly remixing the words they were trained on.
But objectively testing this creativity has been tricky. Scientists have generally taken two tacks. One is to use another computer program to search for signs of plagiarism—though a lack of plagiarism does not necessarily equal creativity. The other approach is to have humans judge the AI output themselves, rating factors such as fluency and originality. But that’s subjective and time intensive.
So Ximing Lu, a computer scientist at the University of Washington, and colleagues created a program featuring both objectivity and a bit of nuance. Called DJ Search, it collects pieces of text of a minimum length from whatever the AI outputs and searches for them in large online databases. DJ Search doesn’t just look for identical matches; it also scans for strings whose words have similar meanings. To evaluate the meaning of a word or phrase, the program itself relies on a separate AI algorithm that produces a set of numbers called an “embedding,” which roughly represents the contexts in which words are typically found. Synonymous words have numerically close embeddings. For example, phrases that swap “anticipation” and “excitement” are considered matches.
After removing all matches, the program calculates the ratio of the remaining words to the original document length, which should give an estimate of how much of the AI’s output is novel. The program conducts this process for various string lengths (the study uses a minimum of five words) and combines the ratios into one index of linguistic novelty. (The team calls it a “creativity index,” but creativity requires both novelty and quality—random gibberish is novel but not creative.)
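The matching-and-ratio procedure described above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: it checks only exact n-gram matches against a toy in-memory corpus, whereas the real DJ Search also matches semantically similar strings via embeddings and searches web-scale databases, and the exact way the paper combines ratios across span lengths may differ from the simple average used here. All function names are illustrative, not from the paper.

```python
# Sketch of a DJ Search-style linguistic novelty index.
# Assumptions (not from the paper): exact n-gram matching only,
# a small in-memory corpus, and a plain average over span lengths.

def ngrams(words, n):
    """All contiguous n-word spans with their start positions."""
    return [(i, tuple(words[i:i + n])) for i in range(len(words) - n + 1)]

def novelty_ratio(doc_words, corpus_ngrams, n):
    """Fraction of the document's words NOT covered by any matched n-gram."""
    covered = set()
    for start, gram in ngrams(doc_words, n):
        if gram in corpus_ngrams:
            covered.update(range(start, start + n))
    return (len(doc_words) - len(covered)) / len(doc_words)

def creativity_index(doc, corpus_docs, min_n=5, max_n=8):
    """Combine novelty ratios over several span lengths into one score
    (the study uses a minimum span of five words)."""
    doc_words = doc.lower().split()
    ratios = []
    for n in range(min_n, max_n + 1):
        corpus_ngrams = set()
        for text in corpus_docs:
            corpus_ngrams.update(g for _, g in ngrams(text.lower().split(), n))
        ratios.append(novelty_ratio(doc_words, corpus_ngrams, n))
    return sum(ratios) / len(ratios)
```

On this toy version, a document copied verbatim from the corpus scores 0 (nothing novel), while a document sharing no five-word span with the corpus scores 1; real scores fall in between.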
The researchers compared the linguistic novelty of published novels, poetry, and speeches with works written by recent LLMs. Humans outscored AIs by about 80% in poetry, 100% in novels, and 150% in speeches, the researchers report in a preprint posted on OpenReview and currently under peer review.
Although DJ Search was designed for comparing people and machines, it can also be used to compare two or more humanmade works. For example, Suzanne Collins’s 2008 novel The Hunger Games scored 35% higher in linguistic originality than Stephenie Meyer’s 2005 hit Twilight. (You can try the tool online.)
So, are LLMs merely parrots? Lu says they’re more like DJs. “They copy, paste, chop, and put together pieces from existing writing to make something amazing,” she says. “It’s like a DJ remixing existing music. This is definitely valuable, but it’s different from a composer.”
Next, the researchers should look at the novelty not just of short strings of words, but of overall narrative—a story’s structure—says Nanyun “Violet” Peng, a computer scientist at the University of California, Los Angeles. In her own work, Peng has manually judged AI’s narratives to be inferior; she would like to see such judgment automated, but it’s hard. Even if Lu’s tool stays at the linguistic level, Peng says it’s valuable. “It can be a writing assistant to encourage humans to stay away from cliché expression.”
More: https://www.science.org/content/article/ai-writing-improving-it-still-can-t-match-human-creativity
