A Chinese artificial intelligence lab has done more than just build a cheaper AI model—it’s exposed the inefficiency of the entire industry’s approach.
DeepSeek’s breakthrough showed how a small team, in an effort to save money, was able to rethink how AI models are built. While tech giants like OpenAI and Anthropic spend billions of dollars on compute power alone, DeepSeek purportedly achieved similar results for just over $5 million.
The company’s model matches or beats GPT-4o and o1, OpenAI’s best available LLM and reasoning model respectively, as well as Anthropic’s Claude 3.5 Sonnet on many benchmark tests, using roughly 2.788 million H800 GPU hours for its full training. That’s a very small fraction of the hardware traditionally thought necessary.
The model is so good and so efficient that its app climbed to the top of Apple’s iOS productivity category in a matter of days, challenging OpenAI’s dominance.
Necessity is the mother of innovation. The team achieved this using techniques that American developers never even needed to consider, and still don’t dominate today. Perhaps the most important: instead of using full precision for calculations, DeepSeek implemented 8-bit training, cutting memory requirements by 75%.
“They figured out floating-point 8-bit training, at least for some of the numerics,” Perplexity CEO Aravind Srinivas told CNBC. “To my knowledge, I think floating-point 8 training is not that well understood. Most of the training in America is still running in FP16.”
FP8 uses half the memory bandwidth and storage of FP16. For large AI models with billions of parameters, that reduction is substantial. DeepSeek had to master the technique because its hardware was weaker; OpenAI has never faced that constraint.
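To make the savings concrete, here is a minimal sketch of per-tensor FP8 quantization, assuming PyTorch 2.1 or later, which ships a `torch.float8_e4m3fn` dtype. It only illustrates the storage side; real FP8 training also requires scaled matrix multiplications and higher-precision accumulation, and this is not DeepSeek’s actual pipeline.

```python
# Minimal sketch of per-tensor FP8 (E4M3) quantization, assuming PyTorch >= 2.1.
# FP8 stores 1 byte per value vs. 2 for FP16 and 4 for FP32, which is where the
# "75% less memory than full precision" figure comes from.
import torch

weights_fp16 = torch.randn(4096, 4096, dtype=torch.float16)

# Scale so the tensor fits FP8's narrow dynamic range (E4M3 max is about 448).
scale = weights_fp16.abs().max().float() / 448.0
weights_fp8 = (weights_fp16.float() / scale).to(torch.float8_e4m3fn)

print(weights_fp16.element_size())  # 2 bytes per value
print(weights_fp8.element_size())   # 1 byte per value: half of FP16, a quarter of FP32

# Dequantize for use in a higher-precision accumulation step.
restored = weights_fp8.float() * scale
print((weights_fp16.float() - restored).abs().mean())  # average quantization error
```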
DeepSeek also developed a “multi-token” prediction system that generates several tokens at a time rather than one word after another, roughly doubling speed while maintaining about 90% accuracy.
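A toy illustration of the general idea follows; it is not DeepSeek’s actual architecture. Instead of a single head predicting only the next token, several heads predict the next few tokens from one forward pass, so decoding can emit multiple tokens per step.

```python
# Toy multi-token prediction sketch (hypothetical module, not DeepSeek's design):
# K heads predict the next K tokens from a single hidden state.
import torch
import torch.nn as nn

class MultiTokenHead(nn.Module):
    def __init__(self, hidden_size=512, vocab_size=32000, k=4):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_size, vocab_size) for _ in range(k)]
        )

    def forward(self, hidden_state):
        # hidden_state: (batch, hidden_size) from the shared transformer trunk.
        # Returns k sets of logits, one per future position.
        return [head(hidden_state) for head in self.heads]

trunk_output = torch.randn(1, 512)            # stand-in for the model's last hidden state
logits_per_position = MultiTokenHead()(trunk_output)
next_tokens = [l.argmax(dim=-1) for l in logits_per_position]  # 4 tokens from one step
print(next_tokens)
```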
Another technique it used was “distillation”: training a small model to replicate the outputs of a larger one without feeding it the same massive training data. This made it possible to release smaller models that are extremely efficient, accurate, and competitive.
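Here is a minimal sketch of classic soft-label distillation in PyTorch, where a student model is penalized for diverging from the teacher’s output distribution. DeepSeek’s R1 distillation, per the paper quote below, instead fine-tunes small models on samples generated by the teacher, but the underlying “replicate the larger model’s outputs” idea is the same.

```python
# Classic knowledge-distillation loss: the student learns to match the teacher's
# softened output distribution. Shapes and vocab size are illustrative.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions, then penalize divergence from the teacher.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature**2

# Stand-in logits over a 32K-token vocabulary for a batch of 8 positions.
teacher_logits = torch.randn(8, 32000)                        # from the large "teacher"
student_logits = torch.randn(8, 32000, requires_grad=True)    # from the small "student"
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow only into the student
print(loss.item())
```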
The firm also used a technique called “mixture of experts,” which added to the model’s efficiency. While traditional dense models keep all of their parameters active for every token, DeepSeek’s system has 671 billion total parameters but activates only 37 billion at a time. It’s like having a large team of specialists but only calling in the experts needed for a given task.
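A minimal mixture-of-experts sketch, not DeepSeek’s actual router design: a small gating network scores the experts for each token, and only the top-scoring few actually run, so most parameters sit idle on any given token. That is the 671-billion-total, 37-billion-active idea in miniature.

```python
# Tiny top-k mixture-of-experts layer: each token runs through only 2 of 8 experts.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, hidden=256, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(hidden, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden, hidden), nn.GELU(), nn.Linear(hidden, hidden))
             for _ in range(num_experts)]
        )
        self.top_k = top_k

    def forward(self, x):                       # x: (tokens, hidden)
        scores = self.router(x)                 # (tokens, num_experts)
        weights, picks = scores.softmax(dim=-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):          # only the selected experts run
            for e, expert in enumerate(self.experts):
                mask = picks[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(4, 256)
print(TinyMoE()(tokens).shape)  # torch.Size([4, 256]); each token touched 2 of 8 experts
```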
“We use DeepSeek-R1 as the teacher model to generate 800K training samples, and fine-tune several small dense models. The results are promising: DeepSeek-R1-Distill-Qwen-1.5B outperforms GPT-4o and Claude-3.5-Sonnet on math benchmarks with 28.9% on AIME and 83.9% on MATH,” DeepSeek wrote in its paper.
For context, 1.5 billion is such a small number of parameters that the model isn’t considered an LLM (large language model) but rather an SLM (small language model). SLMs require so little computation and vRAM that users can run them on modest hardware like smartphones.
The cost implications are staggering. Beyond the 95% reduction in training costs, DeepSeek’s API charges just 10 cents per million tokens, compared to $4.40 for similar services. One developer reported processing 200,000 API requests for about 50 cents, with no rate limiting.
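A quick back-of-the-envelope check of those numbers, using the prices as reported here; actual rates vary by model and by input versus output tokens, and the per-request figure is only an implied average.

```python
# Rough cost comparison based on the figures quoted in this article.
deepseek_price_per_m = 0.10      # USD per million tokens (as reported)
competitor_price_per_m = 4.40    # USD per million tokens (as reported)

tokens = 5_000_000               # ~50 cents at $0.10/M implies about 5M tokens total
print(tokens / 1e6 * deepseek_price_per_m)    # 0.5  -> about 50 cents
print(tokens / 1e6 * competitor_price_per_m)  # 22.0 -> same workload at $4.40/M
print(tokens / 200_000)                       # ~25 tokens per request on average
```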
The “DeepSeek Effect” is already noticeable. “Let me say the quiet part out loud: AI model building is a money trap,” said investor Chamath Palihapitiya. And despite the punches thrown at DeepSeek, OpenAI CEO Sam Altman quickly pumped the brakes on his quest to squeeze users for money after social media filled with raves about people doing for free with DeepSeek what OpenAI charges $200 a month for.
Meanwhile, the DeepSeek app has topped the download charts, and three of the top six trending repositories on GitHub are related to DeepSeek.
Most AI stocks are down as investors question whether the hype has reached bubble levels. Both AI hardware stocks (Nvidia, AMD) and software stocks (Microsoft, Meta, and Google) are suffering the consequences of the apparent paradigm shift triggered by DeepSeek’s announcement and the results shared by users and developers.
Even AI crypto tokens took a hit, with scads of DeepSeek AI token imposters popping up in an attempt to scam degens.
Aside from the financial wreckage, the takeaway is that DeepSeek’s breakthrough suggests AI development might not require massive data centers and specialized hardware. This could fundamentally alter the competitive landscape, transforming what many considered permanent advantages of major tech companies into temporary leads.
The timing is almost comical. Just days before DeepSeek’s announcement, President Trump, OpenAI’s Sam Altman, and Oracle’s founder unveiled Project Stargate, a $500 billion investment in U.S. AI infrastructure. Meanwhile, Mark Zuckerberg doubled down on Meta’s commitment to pour billions into AI development, and Microsoft’s $13 billion investment in OpenAI suddenly looks less like strategic genius and more like expensive FOMO fueled by wasted resources.
“Whatever you did to not let them catch up didn’t even matter,” Srinivas told CNBC. “They ended up catching up anyway.”
Edited by Andrew Hayward