As reliance on large language models (LLMs) like ChatGPT grows, researchers are delving into the nuances of how prompt variations affect model responses. Led by Abel Salinas from USC Information Sciences Institute (ISI) and Fred Morstatter from USC's Viterbi School of Engineering and ISI, a recent study sheds light on the significant influence of subtle prompt alterations on LLM predictions.
The study, posted to the preprint server arXiv, examined the effects of four categories of prompt variations across 11 benchmark text classification tasks commonly used in natural language processing (NLP) research. These tasks encompassed areas such as toxicity classification, grammar evaluation, humor detection, and mathematical proficiency.
Researchers explored variations including requesting responses in specific output formats, minor prompt perturbations like adding polite phrases, employing "jailbreaks" to bypass content filters, and offering monetary incentives for optimal responses.
Findings revealed that even minor changes in prompt structure and presentation could lead to substantial shifts in LLM predictions. While specific output formats induced a minimum 10% change in predictions, minor perturbations such as adding spaces or greetings also significantly impacted model behavior.
Interestingly, the study highlighted nuanced relationships between prompt design and model accuracy. Although certain strategies like offering incentives showed marginal improvements, no single formatting or perturbation method proved universally effective across all tasks.
Salinas emphasized the importance of maintaining high accuracy, noting that certain formats or variations could lead to performance declines. For instance, formatting in XML resulted in lower accuracy, highlighting the critical role of prompt design in optimizing model outcomes for specific applications.
Despite minimal observed changes in response accuracy with tipping incentives, the study underscored the potential pitfalls of seemingly innocuous jailbreaks, which could result in significant accuracy loss.
While the exact reasons behind these effects remain unclear, researchers theorize that confusing instances may play a role in driving prediction changes. Additionally, conversational elements inherent in prompt variations could influence model learning processes, shaping responses based on the sources from which models glean information.
Looking ahead, the research community aims to develop LLMs resilient to prompt variations, ensuring consistent responses across formatting changes and perturbations. Future endeavors will focus on unraveling the underlying mechanisms driving response changes, advancing our understanding of LLM behavior in diverse contexts.
More: https://techxplore.com/news/2024-04-words-youre-engaging-chatgpt.html
