For one small Chinese startup, the U.S. ban on sales of the most advanced artificial intelligence (AI) computer chips to Chinese entities was a spur to innovation. DeepSeek, launched in May 2023 by a former AI student–turned–hedge fund manager, says it has found a way to match the AI performance of its U.S. rivals using second-tier graphics processing units (GPUs)—and at a fraction of the cost.
DeepSeek had already garnered attention with a series of ambitious and highly efficient large language models (LLMs) similar to but less powerful than OpenAI’s ChatGPT. Unlike ChatGPT and most of its Western rivals, DeepSeek’s LLMs are open source, which means users can view and modify the source code to improve or customize it. Now, DeepSeek says it has taken a major leap forward with its latest model, V3. It “outperforms other open-source models and achieves performance comparable to leading closed-source models,” the company said in a 27 December 2024 technical report.
AI observers take the claim seriously. DeepSeek has “closed the gap with some of the world’s best [LLMs],” even outperforming OpenAI’s latest model, GPT-4o, on some benchmarks, says political scientist Jeffrey Ding of George Washington University, who studies emerging technologies. If DeepSeek V3 passes further independent checks, “it will be a highly impressive display of research and engineering under resource constraints,” computer scientist Andrej Karpathy, who co-founded and formerly worked at OpenAI, wrote on X.
China’s lack of ready access to advanced AI chips “is compelling Chinese AI scientists to innovate within the constraints of their available hardware resources,” says Ray Wang, a Washington, D.C.–based analyst specializing in U.S.-China economic competition. DeepSeek says part of its approach involved improving what is called a Mixture of Experts architecture, which reduces the computing power needed to train the model and makes responses to queries more efficient. A gating network routes each query to the expert subnetworks best suited to answer it, so only a small subset of the model’s experts is activated for any given task.
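The routing idea can be sketched in a few lines. The toy layer below is a minimal illustration of top-k gating, not DeepSeek’s actual implementation: all sizes, weights, and the simple ReLU experts are invented for the example. A gating network scores every expert for an input, only the top-k experts run, and their outputs are mixed by softmax weights, which is how the architecture saves compute.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Mixture of Experts layer (illustrative sketch, not DeepSeek's code).
# Each "expert" is a small feed-forward transform; a gating network scores
# all experts per input, and only the top-k experts actually compute.
D_IN, D_HID, N_EXPERTS, TOP_K = 8, 16, 4, 2

# Expert parameters: one weight matrix per expert (biases omitted for brevity).
experts = [rng.standard_normal((D_IN, D_HID)) * 0.1 for _ in range(N_EXPERTS)]
gate_w = rng.standard_normal((D_IN, N_EXPERTS)) * 0.1  # gating network weights

def moe_forward(x):
    """Route input x to the top-k experts and mix their outputs."""
    scores = x @ gate_w                        # one gating score per expert
    top = np.argsort(scores)[-TOP_K:]          # indices of the k best experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                   # softmax over the selected experts
    # Only the chosen experts compute; the rest stay idle, saving FLOPs.
    return sum(w * np.maximum(x @ experts[i], 0.0) for w, i in zip(weights, top))

x = rng.standard_normal(D_IN)
out = moe_forward(x)
print(out.shape)  # (16,)
```

With 4 experts and k = 2, each query touches only half the expert parameters; production MoE models push this much further, keeping total parameter counts high while activating only a small fraction per token.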
“DeepSeek demonstrates how ingenuity can effectively mitigate the constraints posed by limited access to advanced hardware,” says Marina Zhang, an expert on innovation in China at the University of Technology Sydney.
The increased efficiency saves money, DeepSeek says. It estimates it spent just $5.6 million to train V3—far less than the estimated $78 million it cost OpenAI to train GPT-4o. And Ding says users can run the model “for much lower costs than other models that offer similar performance.” The company says V3’s data analysis, pattern recognition, and predictive modeling capabilities could help predict climate impacts, identify disease biomarkers, and test cosmological theories, among other scientific uses.
Unlike most of its major rivals, DeepSeek is not backed by one of China’s high-tech giants, which are pursuing multiple technologies. And “the company’s primary focus is on innovation and the development of high-performing Chinese LLMs,” Wang says.
DeepSeek did not respond to an email from Science. But last year DeepSeek founder and CEO Liang Wenfeng told AnYong Waves, a Chinese media outlet, that “research and technological innovation,” not business opportunities, is the company’s priority. Its ultimate goal, he added, is to achieve artificial general intelligence—AI’s holy grail—in which models match human cognitive capabilities. That lofty goal has helped the firm attract ambitious researchers, he said. “The biggest draw for top talent is definitely to solve the world’s toughest challenges.”
Liang studied AI at Zhejiang University. In 2015 he helped set up a hedge fund, High-Flyer, that relies on AI-driven strategies and reportedly now manages investments worth $8 billion. High-Flyer launched DeepSeek to focus on LLMs. Liang is reportedly a hands-on executive, co-authoring many of DeepSeek’s scientific papers.
Although DeepSeek has made major progress, observers see challenges ahead. Its open-source approach means “competitors can improve upon DeepSeek’s methods,” Ding says. And the company “will absolutely continue to struggle in the future without additional access to ever greater amounts of AI chips,” says Gregory Allen, an AI policy expert at the Center for Strategic and International Studies. Zhang says Chinese firms will have to “continually push the boundaries of software and systems innovation to stay in the game.”
As DeepSeek and other Chinese firms scramble to match Western LLMs, they have an advantage in having the China market to themselves. ChatGPT and other models are blocked by China’s Great Firewall because their output is not censored (though many in China use virtual private networks to access them). DeepSeek V3 seems to acknowledge political sensitivities. Asked “What is Tiananmen Square famous for?” it responds: “Sorry, that’s beyond my current scope.”
Not all sensitive questions are off-limits, however. Asked about the origin of the COVID-19 pandemic, DeepSeek V3 gives a neutral, factual answer that mentions the theory of a leak from the Wuhan Institute of Virology, though it concludes most scientists “lean toward a natural zoonotic origin.”
More: https://www.science.org/content/article/chinese-firm-s-faster-cheaper-ai-language-model-makes-splash
