David Bau, a computer scientist at Northeastern University in Boston, Massachusetts, knows first-hand how complex modern computer systems can be. After two decades as a software engineer, Bau notes that problems in conventional software can usually be traced by someone with inside knowledge of the code. However, the latest wave of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT, presents a different challenge: their inner workings are largely inscrutable, even to their creators.

LLMs rely on machine learning to identify patterns in data without following pre-set rules. The most advanced models are built on neural networks, which loosely mimic the brain's architecture and transform information through successive layers of artificial neurons. This layered processing often turns these systems into 'black boxes' whose internal operations are effectively opaque.
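To make the "layers of neurons" idea concrete, here is a minimal sketch of a toy two-layer network in Python. Everything in it is illustrative rather than taken from any real model; actual LLMs stack hundreds of such layers with billions of learned weights, which is why no one can simply read the rules out of them.

```python
import numpy as np

# A toy two-layer network: each layer multiplies its input by a weight
# matrix and applies a nonlinearity. The weights are learned from data,
# not written by hand, so the resulting rule has no human-readable form.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))   # layer-1 weights (stand-in for learned values)
W2 = rng.normal(size=(1, 8))   # layer-2 weights

def forward(x):
    h = np.tanh(W1 @ x)        # hidden "neurons": a transformed view of the input
    return W2 @ h              # output: a score produced by opaque arithmetic

print(forward(np.array([0.2, -1.0, 0.5, 0.3])))
```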

Researchers are now focusing on explainable AI (XAI) to demystify these black boxes. Techniques such as highlighting the regions of an image that most influenced a classification, or approximating a model's behavior with a simple, readable decision tree, help explain AI decisions such as medical diagnoses or parole recommendations. Despite progress, XAI remains a developing field.
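A minimal sketch of the second idea, the surrogate decision tree: train an opaque model, then fit a shallow tree to imitate its predictions so the decision logic becomes readable. The dataset and model choices below are illustrative assumptions, not what any particular XAI team uses.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Train an opaque "black box" model, then fit a shallow decision tree to
# imitate its predictions; the tree is a readable, if approximate,
# explanation of what the black box is doing.
data = load_breast_cancer()
X, y = data.data, data.target

black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))  # learn from the black box's outputs, not the true labels

print("fidelity to black box:", surrogate.score(X, black_box.predict(X)))
print(export_text(surrogate, feature_names=list(data.feature_names)))
```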

The challenge is particularly acute with LLMs, which power chatbots like ChatGPT. These models have hundreds of billions of parameters and can perform various tasks, from writing code to giving medical advice. However, they are known to generate misinformation, perpetuate stereotypes, and leak private information, necessitating robust XAI tools for safety and accuracy.

LLMs generate text probabilistically, which has earned them the nickname 'stochastic parrots', yet they sometimes display reasoning and other strikingly human-like abilities. Researchers at Anthropic and Harvard University have shown that LLMs can role-play and construct models of the world from their training data, indicating that they do more than merely parrot their inputs.

Some researchers adopt psychological approaches, treating LLMs like human subjects in order to study their behavior. Techniques such as chain-of-thought prompting ask a model to lay out its reasoning step by step before answering; the written-out chain can offer clues about the model's process, but it is itself generated text and may not reflect the computation that actually produced the answer, so users are advised to approach chatbot explanations with healthy skepticism.
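As an illustration, a chain-of-thought prompt simply asks the model to spell out intermediate steps before giving its answer. The `ask_model` function below is a hypothetical placeholder for whatever chat API you use; only the prompt construction is the point.

```python
# `ask_model` is a hypothetical placeholder for whatever chat API you use;
# only the prompt construction matters here.
def ask_model(prompt: str) -> str:
    raise NotImplementedError("connect this to your chat API of choice")

question = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than "
    "the ball. How much does the ball cost?"
)

plain_prompt = question
cot_prompt = question + "\nLet's think step by step, then state the final answer."

# The second prompt nudges the model to write out intermediate steps.
# That written-out 'reasoning' is itself generated text, so it may not
# reflect the computation that actually produced the answer.
print(cot_prompt)
```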

Neuroscience-inspired methods also offer insights. Researchers use techniques such as causal tracing to pinpoint which neural activations inside an LLM store a particular piece of knowledge, and can then edit those components to make precise adjustments to what the model 'knows' and how it behaves. These methods not only deepen our understanding of AI but also attract interest from neuroscientists looking for parallels in biological brains.
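Causal tracing rests on activation patching: cache an activation from a run on a clean input, substitute it into a run on a corrupted input, and see how much of the original output it restores. The toy PyTorch model below is a stand-in to show the mechanics only; applying this to a real LLM is considerably more involved.

```python
import torch
import torch.nn as nn

# Minimal sketch of activation patching: save one layer's activation from a
# "clean" run, then inject it into a "corrupted" run. If the output largely
# recovers, that layer's activation carried the causally relevant information.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 2))

cached = {}

def save_hook(module, inputs, output):
    cached["act"] = output.detach().clone()

def patch_hook(module, inputs, output):
    return cached["act"]          # overwrite this layer's output with the clean activation

clean, corrupted = torch.randn(4), torch.randn(4)
layer = model[1]                  # the layer whose causal role we probe

handle = layer.register_forward_hook(save_hook)
clean_out = model(clean)
handle.remove()

handle = layer.register_forward_hook(patch_hook)
patched_out = model(corrupted)    # corrupted input, but clean activation at this layer
handle.remove()

print("clean:", clean_out, "\npatched:", patched_out)
```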

Understanding and explaining LLMs is crucial for creating safer, more reliable AI systems. As these models become more integrated into everyday tasks, the need for transparency and trust in their operations grows ever more important.

More: https://www.nature.com/articles/d41586-024-01314-y