DeepSeek launched a free, open-source large language model in late December, claiming it was developed in just two months at a cost of under $6 million.
While this is great, the training is where the compute is spent. The news is also about R1 being able to be trained, still on an Nvidia cluster but for 6M USD instead of 500
Maybe? Depends on what costs dominate operations. I imagine Chinese electricity is cheap but building new data centres is likely much cheaper % wise than countries like the US.
True, but training is one-off. And as you say, a factor 100x less costs with this new model. Therefore NVidia just saw 99% of their expected future demand for AI chips evaporate
Even if they are lying and used more compute, it’s obvious they managed to train it without access to the large amounts of the highest end chips due to export controls.
Conservatively, I think NVidia is definitely going to have to scale down by 50% and they will have to reduce prices by a lot, too, since VC and government billions will no longer be available to their customers.
True, but training is one-off. And as you say, a factor 100x less costs with this new model. Therefore NVidia just saw 99% of their expected future demand for AI chips evaporate
It might also lead to 100x more power to train new models.
SFT (supervised fine-tuning), a standard step in AI development, involves training models on curated datasets to teach step-by-step reasoning, often referred to as chain-of-thought (CoT). It is considered essential for improving reasoning capabilities. DeepSeek challenged this assumption by skipping SFT entirely, opting instead to rely on reinforcement learning (RL) to train the model.
This bold move forced DeepSeek-R1 to develop independent reasoning abilities, avoiding the brittleness often introduced by prescriptive datasets.
This totally changes the way we think about AI training, which is why while OpenAI spent $100m on training GPT-4, running an expected 500,000 GPUs, DeepSeek used about 50,000, and likely spent that same roughly 10% of the cost.
So while operation, and even training, is now cheaper, it’s also substantially less compute intensive to train models.
And not only is there less data than ever to train models on that won’t cause them to get worse by regurgitating other worse quality AI-generated content, but even if additional datasets were scrapped entirely in favor of this new RL method, there’s a point at which an LLM is simply good enough.
If you need to auto generate a corpo-speak email, you can already do that without many issues. Reformat notes or user input? Already possible. Classify tickets by type? Done. Write a silly poem? That’s been possible since pre-ChatGPT. Summarize a webpage? The newest version of ChatGPT will probably do just as well as the last at that.
At a certain point, spending millions of dollars for a 1% performance improvement doesn’t make sense when the existing model just already does what you need it to do.
I’m sure we’ll see development, but I doubt we’ll see a massive increase in training just because the cost to run and train the model has gone down.
It still rely on nvidia hardware why would it trigger a sell-off? Also why all media are picking up this news? I smell something fishy here…
Here’s someone doing 200 tokens/s (for context, OpenAI doesn’t usually get above 100) on… A Raspberry Pi.
Yes, the “$75-$120 micro computer the size of a credit card” Raspberry Pi.
If all these AI models can be run directly on users devices, or on extremely low end hardware, who needs large quantities of top of the line GPUs?
While this is great, the training is where the compute is spent. The news is also about R1 being able to be trained, still on an Nvidia cluster but for 6M USD instead of 500
That’s becoming less true. The cost of inference has been rising with bigger models, and even more so with “reasoning models”.
Regardless, at the scale of 100M users, big one-off costs start looking small.
But I’d imagine any Chinese operator will handle scale much better? Or?
Maybe? Depends on what costs dominate operations. I imagine Chinese electricity is cheap but building new data centres is likely much cheaper % wise than countries like the US.
True, but training is one-off. And as you say, a factor 100x less costs with this new model. Therefore NVidia just saw 99% of their expected future demand for AI chips evaporate
Even if they are lying and used more compute, it’s obvious they managed to train it without access to the large amounts of the highest end chips due to export controls.
Conservatively, I think NVidia is definitely going to have to scale down by 50% and they will have to reduce prices by a lot, too, since VC and government billions will no longer be available to their customers.
It might also lead to 100x more power to train new models.
I doubt that will be the case, and I’ll explain why.
As mentioned in this article,
This totally changes the way we think about AI training, which is why while OpenAI spent $100m on training GPT-4, running an expected 500,000 GPUs, DeepSeek used about 50,000, and likely spent that same roughly 10% of the cost.
So while operation, and even training, is now cheaper, it’s also substantially less compute intensive to train models.
And not only is there less data than ever to train models on that won’t cause them to get worse by regurgitating other worse quality AI-generated content, but even if additional datasets were scrapped entirely in favor of this new RL method, there’s a point at which an LLM is simply good enough.
If you need to auto generate a corpo-speak email, you can already do that without many issues. Reformat notes or user input? Already possible. Classify tickets by type? Done. Write a silly poem? That’s been possible since pre-ChatGPT. Summarize a webpage? The newest version of ChatGPT will probably do just as well as the last at that.
At a certain point, spending millions of dollars for a 1% performance improvement doesn’t make sense when the existing model just already does what you need it to do.
I’m sure we’ll see development, but I doubt we’ll see a massive increase in training just because the cost to run and train the model has gone down.
Thank you. Sounds like good news.