In a recent study at the University of Reading in the UK, exam submissions generated by artificial intelligence (AI) went largely undetected and often received higher grades than those of real students. The findings, published by Peter Scarfe and colleagues in the open-access journal PLOS ONE on June 26, highlight significant concerns about the use of AI tools such as ChatGPT in academic settings.

As AI tools have become more advanced, there is growing apprehension about their potential misuse by students to submit AI-generated work. This issue has been exacerbated by the shift from supervised in-person exams to unsupervised take-home exams during the COVID-19 pandemic, a model many institutions continue to use. Current tools for detecting AI-generated text have proven largely ineffective.

To investigate, Scarfe and his team used the AI chatbot GPT-4 to produce answers written entirely by AI. They submitted these answers on behalf of 33 fake students to exams in the School of Psychology and Clinical Language Sciences at the University of Reading. The exam graders were unaware of the study.

The results were striking: 94% of the AI-generated submissions went undetected, and they generally received higher grades than those of real students. Across modules, there was an 83.4% chance that the AI submissions for a given module would outscore a random selection of the same number of real student submissions.

These findings suggest that students could use AI to cheat without being detected and might even earn better grades than peers who do not. The researchers also note that some real students may have successfully submitted AI-generated work during the study.

From an academic integrity standpoint, these results are alarming. The researchers propose a return to supervised, in-person exams as one solution. However, they also acknowledge that AI tools will keep advancing and becoming integrated into professional environments. Universities should therefore consider how to adapt to this "new normal" and use it to enhance education.

The authors conclude: “A rigorous blind test of a real-life university examinations system shows that exam submissions generated by artificial intelligence were virtually undetectable and robustly gained higher grades than real students. The results of the ‘Examinations Turing Test’ invite the global education sector to accept a new normal, and this is exactly what we are doing at the University of Reading. New policies and advice to our staff and students acknowledge both the risks and the opportunities afforded by tools that employ artificial intelligence.”

More: https://www.eurekalert.org/news-releases/1048877