A Chinese artificial intelligence lab has done more than just build a cheaper AI model—it’s exposed the inefficiency of the entire industry’s approach.
DeepSeek’s breakthrough showed how a small team, in an effort to save money, was able to rethink how AI models are built. While tech giants like OpenAI and Anthropic spend billions of dollars on compute power alone, DeepSeek purportedly achieved similar results for just over $5 million.
The company’s model matches or beats GPT-4o and o1, OpenAI’s best available LLM and reasoning model respectively, as well as Anthropic’s Claude 3.5 Sonnet on many benchmark tests, using roughly 2.788 million H800 GPU hours for its full training. That’s a very small fraction of the hardware traditionally thought necessary.
The model is so good and so efficient that its app climbed to the top of Apple’s iOS productivity category in a matter of days, challenging OpenAI’s dominance.
Necessity is the mother of innovation. The team achieved this using techniques that American developers never even needed to consider, and still don’t dominate today. Perhaps the most important: instead of using full precision for calculations, DeepSeek implemented 8-bit training, cutting memory requirements by 75%.
“They figured out floating-point 8-bit training, at least for some of the numerics,” Perplexity CEO Aravind Srinivas told CNBC. “To my knowledge, I think floating-point 8 training is not that well understood. Most of the training in America is still running in FP16.”
FP8 uses half the memory bandwidth and storage of FP16. For large AI models with billions of parameters, that reduction is substantial. DeepSeek had to master the technique because its hardware was weaker; OpenAI has never faced that constraint.
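To make the savings concrete, here is a minimal sketch of per-tensor FP8 quantization, assuming PyTorch 2.1 or later, which ships a `torch.float8_e4m3fn` dtype. It only illustrates the storage side; real FP8 training also requires scaled matrix multiplications and higher-precision accumulation, and this is not DeepSeek’s actual pipeline.

```python
# Minimal sketch of per-tensor FP8 (E4M3) quantization, assuming PyTorch >= 2.1.
# FP8 stores 1 byte per value vs. 2 for FP16 and 4 for FP32, which is where the
# "75% less memory than full precision" figure comes from.
import torch

weights_fp16 = torch.randn(4096, 4096, dtype=torch.float16)

# Scale so the tensor fits FP8's narrow dynamic range (E4M3 max is about 448).
scale = weights_fp16.abs().max().float() / 448.0
weights_fp8 = (weights_fp16.float() / scale).to(torch.float8_e4m3fn)

print(weights_fp16.element_size())  # 2 bytes per value
print(weights_fp8.element_size())   # 1 byte per value: half of FP16, a quarter of FP32

# Dequantize for use in a higher-precision accumulation step.
restored = weights_fp8.float() * scale
print((weights_fp16.float() - restored).abs().mean())  # average quantization error
```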
DeepSeek also developed a “multi-token” prediction system that generates several tokens at a time rather than one word after another, roughly doubling speed while maintaining about 90% accuracy.
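A toy illustration of the general idea follows; it is not DeepSeek’s actual architecture. Instead of a single head predicting only the next token, several heads predict the next few tokens from one forward pass, so decoding can emit multiple tokens per step.

```python
# Toy multi-token prediction sketch (hypothetical module, not DeepSeek's design):
# K heads predict the next K tokens from a single hidden state.
import torch
import torch.nn as nn

class MultiTokenHead(nn.Module):
    def __init__(self, hidden_size=512, vocab_size=32000, k=4):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_size, vocab_size) for _ in range(k)]
        )

    def forward(self, hidden_state):
        # hidden_state: (batch, hidden_size) from the shared transformer trunk.
        # Returns k sets of logits, one per future position.
        return [head(hidden_state) for head in self.heads]

trunk_output = torch.randn(1, 512)            # stand-in for the model's last hidden state
logits_per_position = MultiTokenHead()(trunk_output)
next_tokens = [l.argmax(dim=-1) for l in logits_per_position]  # 4 tokens from one step
print(next_tokens)
```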
Another technique it used was “distillation”: training a small model to replicate the outputs of a larger one without feeding it the same massive training data. This made it possible to release smaller models that are extremely efficient, accurate, and competitive.
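Here is a minimal sketch of classic soft-label distillation in PyTorch, where a student model is penalized for diverging from the teacher’s output distribution. DeepSeek’s R1 distillation, per the paper quote below, instead fine-tunes small models on samples generated by the teacher, but the underlying “replicate the larger model’s outputs” idea is the same.

```python
# Classic knowledge-distillation loss: the student learns to match the teacher's
# softened output distribution. Shapes and vocab size are illustrative.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions, then penalize divergence from the teacher.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature**2

# Stand-in logits over a 32K-token vocabulary for a batch of 8 positions.
teacher_logits = torch.randn(8, 32000)                        # from the large "teacher"
student_logits = torch.randn(8, 32000, requires_grad=True)    # from the small "student"
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow only into the student
print(loss.item())
```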
The firm also used a technique called “mixture of experts,” which added to the model’s efficiency. While traditional dense models keep all of their parameters active for every token, DeepSeek’s system has 671 billion total parameters but activates only 37 billion at a time. It’s like having a large team of specialists but only calling in the experts needed for a given task.
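A minimal mixture-of-experts sketch, not DeepSeek’s actual router design: a small gating network scores the experts for each token, and only the top-scoring few actually run, so most parameters sit idle on any given token. That is the 671-billion-total, 37-billion-active idea in miniature.

```python
# Tiny top-k mixture-of-experts layer: each token runs through only 2 of 8 experts.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, hidden=256, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(hidden, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden, hidden), nn.GELU(), nn.Linear(hidden, hidden))
             for _ in range(num_experts)]
        )
        self.top_k = top_k

    def forward(self, x):                       # x: (tokens, hidden)
        scores = self.router(x)                 # (tokens, num_experts)
        weights, picks = scores.softmax(dim=-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):          # only the selected experts run
            for e, expert in enumerate(self.experts):
                mask = picks[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(4, 256)
print(TinyMoE()(tokens).shape)  # torch.Size([4, 256]); each token touched 2 of 8 experts
```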
“We use DeepSeek-R1 as the teacher model to generate 800K training samples, and fine-tune several small dense models. The results are promising: DeepSeek-R1-Distill-Qwen-1.5B outperforms GPT-4o and Claude-3.5-Sonnet on math benchmarks with 28.9% on AIME and 83.9% on MATH,” DeepSeek wrote in its paper.
For context, 1.5 billion is such a small number of parameters that the model isn’t considered an LLM (large language model) but rather an SLM (small language model). SLMs require so little computation and vRAM that users can run them on modest hardware like smartphones.
The cost implications are staggering. Beyond the 95% reduction in training costs, DeepSeek’s API charges just 10 cents per million tokens, compared to $4.40 for similar services. One developer reported processing 200,000 API requests for about 50 cents, with no rate limiting.
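A quick back-of-the-envelope check of those numbers, using the prices as reported here; actual rates vary by model and by input versus output tokens, and the per-request figure is only an implied average.

```python
# Rough cost comparison based on the figures quoted in this article.
deepseek_price_per_m = 0.10      # USD per million tokens (as reported)
competitor_price_per_m = 4.40    # USD per million tokens (as reported)

tokens = 5_000_000               # ~50 cents at $0.10/M implies about 5M tokens total
print(tokens / 1e6 * deepseek_price_per_m)    # 0.5  -> about 50 cents
print(tokens / 1e6 * competitor_price_per_m)  # 22.0 -> same workload at $4.40/M
print(tokens / 200_000)                       # ~25 tokens per request on average
```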
The “DeepSeek Effect” is already noticeable. “Let me say the quiet part out loud: AI model building is a money trap,” said investor Chamath Palihapitiya. And despite the punches thrown at DeepSeek, OpenAI CEO Sam Altman quickly pumped the brakes on his quest to squeeze users for money after social media filled with raves about people doing for free with DeepSeek what OpenAI charges $200 a month for.
Meanwhile, the DeepSeek app has topped the download charts, and three of the top six trending repositories on GitHub are related to DeepSeek.
Most AI stocks are down as investors question whether the hype has reached bubble levels. Both AI hardware stocks (Nvidia, AMD) and software stocks (Microsoft, Meta, and Google) are suffering the consequences of the apparent paradigm shift triggered by DeepSeek’s announcement and the results shared by users and developers.
Even AI crypto tokens took a hit, with scads of DeepSeek AI token imposters popping up in an attempt to scam degens.
Aside from the financial wreckage, the takeaway is that DeepSeek’s breakthrough suggests AI development might not require massive data centers and specialized hardware. This could fundamentally alter the competitive landscape, transforming what many considered permanent advantages of major tech companies into temporary leads.
The timing is almost comical. Just days before DeepSeek’s announcement, President Trump, OpenAI’s Sam Altman, and Oracle’s founder unveiled Project Stargate, a $500 billion investment in U.S. AI infrastructure. Meanwhile, Mark Zuckerberg doubled down on Meta’s commitment to pour billions into AI development, and Microsoft’s $13 billion investment in OpenAI suddenly looks less like strategic genius and more like expensive FOMO fueled by wasted resources.
“Whatever you did to not let them catch up didn’t even matter,” Srinivas told CNBC. “They ended up catching up anyway.”
Edited by Andrew Hayward