Despite being labeled 'open source', many AI models from major tech companies such as Meta and Microsoft come with little disclosure of critical details about their underlying technologies. That is the finding of researchers who evaluated several popular chatbot models, and it highlights the ambiguity in the definition of open-source AI.
The Need for Clear Definitions
The concept of open source in AI is still under debate, but advocates argue that complete transparency is essential for advancing science and ensuring AI accountability. This distinction will gain importance with the implementation of the European Union’s Artificial Intelligence Act, which will impose less stringent regulations on models classified as open source.
Mark Dingemanse, a language scientist at Radboud University, and his colleague Andreas Liesenfeld, a computational linguist, found that some big firms benefit from claiming their models are open source while disclosing minimal information, a practice the researchers term "open-washing". Their study, published on June 5 in the proceedings of the 2024 ACM Conference on Fairness, Accountability and Transparency, ranks AI models by their degree of openness.
Ranking AI Models on Openness
Dingemanse and Liesenfeld assessed 40 large language models on 14 parameters, such as the availability of code, training data, and documentation. Their findings revealed that many models claiming to be open, such as Meta's Llama and Google DeepMind's Gemma, are merely 'open weight': outsiders can download and use them, but cannot inspect how they were built or meaningfully customize them.
This approach, where openness is judged on a sliding scale, is considered practical by experts like Amanda Brock, CEO of OpenUK. However, a significant concern is the lack of transparency about the data used to train these models. Approximately half of the models analyzed do not provide detailed information about their datasets.
Responses from Tech Giants
A Google spokesperson emphasized their precise language in describing models, noting that existing open-source concepts do not always apply to AI systems. Similarly, a Microsoft spokesperson highlighted their efforts to be clear about what is available and the importance of community involvement in AI advancement. Meta did not respond to Nature's request for comment.
Smaller Firms Lead in Openness
The study found that smaller firms and academic groups tend to be more transparent. The BLOOM model, developed by an international academic collaboration, is cited as a truly open-source AI example.
Decline in Peer Review
The researchers also noted a decline in peer-reviewed publications detailing these models, which are increasingly replaced by blog posts and corporate preprints that lack comprehensive details. This trend raises concerns about the rigor and transparency of AI research disclosures.
Implications for the Future
The EU’s AI Act will define open source in a way that could exempt models from extensive transparency requirements, a definition likely to be influenced by corporate lobbying. The ongoing debate about what constitutes open-source AI underscores the need for clear standards to ensure accountability and scientific integrity in AI development.
