Retired engineer Gerald Piosenka created the dataset in 2019 by downloading photos of children from “websites devoted to the subject of autism,” according to a description of the dataset’s methods, and uploaded it to Kaggle, a site owned by Google that hosts public datasets for machine-learning practitioners.

Without identifying each child in the dataset, there is no way to confirm that any of them do or do not have autism, says Dorothy Bishop, emeritus professor of developmental neuropsychology at the University of Oxford. The dataset first came to Springer Nature’s attention last month through separate investigations into two papers.

Springer Nature's research integrity team was about to start investigating one "article of concern" when Guillaume Cabanac, professor of computer science at the University of Toulouse, alerted the team to the other one. That paper contained tortured phrases (strange phrases used in place of established ones), a possible sign that the text was generated by artificial intelligence. Both papers used the photo dataset that Piosenka assembled.

More: https://www.thetransmitter.org/retraction/exclusive-springer-nature-retracts-removes-nearly-40-publications-that-trained-neural-networks-on-bonkers-dataset/