In 2022, Guillaume Cabanac noticed something unusual: a study had attracted more than 100 citations within two months of publication. The paper has since been retracted for containing so-called tortured phrases, strange twists on established terms that were probably introduced by translation software or by humans looking to circumvent plagiarism checkers.

The numbers did not add up: the study had been cited 107 times according to the ‘Altmetrics donut,’ an indicator of an article’s potential impact, yet it had been downloaded just 62 times. What’s more, according to Google Scholar, the paper had been cited only once.

After a little probing, Cabanac and his sleuthing colleagues figured out where the extra citations were coming from: the metadata files submitted to Crossref, a registry of digital object identifiers and metadata for scholarly works, as the group reports in a preprint posted to the arXiv server on October 4. Google Scholar doesn’t use metadata files submitted to Crossref; instead, it text-mines PDF versions of studies, which is why the extra citations never appeared in its count.

According to Cabanac, the references are sneaked into the metadata files that are submitted to Crossref and ingested automatically. Because metadata files can be resubmitted as many times as one likes, the extra references can be added at any time, including after an article is published.
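The mismatch described here suggests a simple check: compare the references registered in an article's metadata with the references that actually appear in the article itself. The following Python sketch illustrates that idea and is not the method used in the preprint. It assumes every reference carries a DOI, which is often not the case in practice. The function names and the DOIs are hypothetical (10.5555 is a prefix reserved for examples).

```python
def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip common URL or 'doi:' prefixes."""
    doi = doi.strip().lower()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def find_sneaked_references(metadata_dois, article_dois):
    """Return DOIs listed in the metadata but absent from the article text.

    Order follows the metadata list; duplicates are reported once.
    """
    in_article = {normalize_doi(d) for d in article_dois}
    seen = set()
    sneaked = []
    for doi in metadata_dois:
        normalized = normalize_doi(doi)
        if normalized not in in_article and normalized not in seen:
            seen.add(normalized)
            sneaked.append(normalized)
    return sneaked


# Hypothetical data: references as registered vs. as printed in the PDF.
metadata_refs = [
    "10.5555/example.001",
    "https://doi.org/10.5555/example.002",
    "10.5555/EXAMPLE.900",  # appears only in the metadata
]
pdf_refs = ["doi:10.5555/example.001", "10.5555/example.002"]

print(find_sneaked_references(metadata_refs, pdf_refs))
# prints ['10.5555/example.900']
```

A real audit would also need fuzzy matching on titles and authors for references without DOIs, and a reliable way to extract the reference list from the PDF.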

The study analyzed the content of three journals published by Technoscience Academy, each of which has minted more than 1,000 digital object identifiers at Crossref. It found that around 9% of the references included in the metadata files of papers published by these three journals (5,978 references out of a total of 65,836) benefited just two researchers who had co-authored the studies being cited.
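As a quick check on the "around 9%" figure, the share can be recomputed from the two counts reported above. The helper name below is hypothetical and only illustrates the arithmetic.

```python
def reference_share(benefiting: int, total: int) -> float:
    """Return the percentage of references that benefit the named researchers."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * benefiting / total


# Counts as reported in the study: 5,978 of 65,836 references.
share = reference_share(5978, 65836)
print(f"{share:.2f}%")  # prints 9.08%
```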