DeepSeek ditches Nvidia for Huawei chips in V4 launch

inari@piefed.zip · 2 days ago

DeepSeek ditches Nvidia for Huawei chips in V4 launch

humanspiral@lemmy.ca · 12 hours ago

Huawei outperforms NVIDIA at the “cluster” level. Which are mostly turnkey systems for datacenter units. And promises truck container level cluster for next generation that is 30x the zetaflops as NVIDIA rubin cluster. China currently operates at 50% electric production capacity, and energy extremely abundant and low price, which make the per level card performance deficit irrelevant.

KingRandomGuy@lemmy.world · 11 hours ago

To be fair, the raw FLOPs count doesn’t tell the whole story. On a lot of workloads (including token generation during LLM inference), you’re bound by the memory bandwidth rather than throughput/FLOPs. On H100/H200, keeping the tensor cores fully occupied is surprisingly difficult, and that’s with 3+ TB/s of memory bandwidth. And I believe those cards have much higher throughput (at least at FP8, Ascend wins at FP4 since H100/200 don’t support it) compared to Ascend.

The Ascend 950PR units have far lower memory bandwidth, reportedly at 1.4 TB/s. Compare that to Blackwell, which has something like 8TB/s of bandwidth. I believe they’re manufacturing their own kind of HBM, so that’s still really impressive considering this is a fairly recent push into manufacturing accelerators. But I’m a bit skeptical it actually outperforms NVIDIA at scale.

billwashere@lemmy.world · 13 hours ago

Sorry if this is a dumb question, but is this just for training or does DeepSeek v4 now require these chips to run?

qaz@lemmy.world · 13 hours ago

I don’t think they only run on these chips. There are some companies in the US that provide Deepseek V4 presumably running on standard Nvidia chips.

billwashere@lemmy.world · 12 hours ago

Well I’ve got three 512gb Mac Studios in an EXO cluster I’m gonna see how it works.

humanspiral@lemmy.ca · 12 hours ago

It is for inference, which is generally good at open backends. Most models done in pytorch, with backend library.

jaykrown@lemmy.world · 22 hours ago

It almost feels like the Trump administration is trying to help Chinese companies at this point.

M137@lemmy.today · 9 hours ago

So many things they’re doing feel like this. A few things I could see being a conscious and planned deal but most things completely go against any interest they have so it’s clear that it’s just an unbelievable level of incompetence, stupidity and ignorance.

jaykrown@lemmy.world · 9 hours ago

I have a hard time thinking it’s not deliberate. They’re trying to protect US car companies by putting tariffs on Chinese car imports, but what that’s doing is leading to complacency and a lack of competition. Ultimately harming the US and setting us back while allowing the wealthy shareholders to hoard wealth while not needing to worry about staying competitive.

Buffalox@lemmy.world · edit-2 2 days ago

Chinese companies are heavily incentivized to use Chinese chips instead of American since Trump blocked trade with China.
China used to parallel import the chips they needed, and even repackage them with more onboard RAM, making more powerful Nvidia solutions available in China than in the rest of the world.
But Trumps behavior towards China made the Chinese government decide to limit the use of American technologies for AI.
There was a point where Nvidia exports to China was basically at a standstill, because China forbade the purchase of a new cut down Nvidia chip made for the Chinese market to circumvent American trade restrictions.

China is building their own complete stack now, replacing everything with Chinese technologies, right from the AI chips to the entire AI software framework.

So not only does Nvidia and other American companies lose hardware sales, the entire stack will be threatened with a Chinese alternative, that will likely compete with American options on the international market in the future. If Cuda loses its current dominance, it will be easier for competitors to take marketshare from Nvidia.

Hopefully this will be good for consumers worldwide.

Diurnambule@jlai.lu · 22 hours ago

Please do i can’t be so wet. European are eager to ditch Americans. In the long run Chinese seem to a be a more reliable partner

Buffalox@lemmy.world · edit-2 21 hours ago

I would prefer it wasn’t like this, PAX Americana seemed to work quite well for several decades, of course USA served their own interests, but they they also provided a somewhat stable world order with a decent degree of freedom.
Now they have abandoned the ideals of freedom and democracy and international law, to serve their own interests exclusively at immense cost to others, without regard for either law or decency. and of course that is not sustainable to be an ally of.
I think USA will soon find that without allies, their power isn’t so great after all.

Diurnambule@jlai.lu · 20 hours ago

They were infiltrated by Epstein dlasses assets and a t fucked the system. At this time look at the tax rate of these deep fucks.

Squizzy@lemmy.world · 14 hours ago

I think its chicken or the egg with the us government and dirty money scubags

Diurnambule@jlai.lu · 4 hours ago

For a short période rich fucker were taxed enough. They are still the fact that the grow of capitalist country where sustained by colonized country ressources.

msage@programming.dev · 20 hours ago

You are talking about freedom and peace, but only for the global north.

It wasn’t like that for rest of the world.

eestileib@lemmy.blahaj.zone · 2 days ago

Qwen is already the standard for actual pros as far as I can tell.

ArchAengelus@lemmy.dbzer0.com · 1 day ago

It’s only the standard for people who self host their llms and don’t have $500k to throw at hardware for GLM-5.1 or similar models.

I have qwen3.6:27b on my local hardware and it’s way better than I expected. I’m excited for the rest of the 3.6 line as it comes out, if they can keep up that quality.

This story is also a nothing burger. Generally, yes, Nvidia will suffer once chinas stack catches up (soon). By then whatever bubble we are in will have normalized one way or the other.

In terms of actually deploying this model, it doesn’t matter what hardware you’re using. VLLM supports almost everything with SIMD-type hardware instructions.

More competition will make everyone happy except Nvidia shareholders.

percent@infosec.pub · 11 hours ago

Gemma4:26b is also worth trying. I find it runs much faster on my hardware

Avid Amoeba@lemmy.ca · 1 day ago

Been using Qwen 3.x for a while now for local LLM with search capability. The 3.5 and 3.6 ones are great and run very fast.

humanspiral@lemmy.ca · 12 hours ago

3.6 27b is probably most powerful/efficient (to size) model out there. Qwen has a history of leveraging deepseek power as well. (deepseek creating small models with Qwen as the base), and Alibaba is main hosting service for deepseek. Alibaba/Qwen in talks to invest in Deepseek, atm.

Avid Amoeba@lemmy.ca · 11 hours ago

Yeah. The 80b Coder-Next runs at about the same speed on my hw too. I don’t know if it’s any better than 3.6 27b.

sp3ctr4l@lemmy.dbzer0.com · 21 hours ago

I got Qwen 3.5 running on a Steam Deck.

It ain’t exactly blazing fast, but it does actually work.

(Reasonably fast if you go down to the 2B param model, I can get the 9B param variant working, though this makes Steam Decky very hot and bothered.)

Yeah, you absolutely do not need Nvidia hardware to run an LLM, but we get blasted with their propoganda suggesting otherwise just all the time in the English speaking West.

Because if you don’t need Nvidia, well, then, this whole AI bubble looks a lot more bubbly.

Avid Amoeba@lemmy.ca · 17 hours ago

Take good care of your hw! It’s not like 2 years ago when you could buy stuff off the shelf for reasonable prices. :D

sp3ctr4l@lemmy.dbzer0.com · 17 hours ago

My Steam Deck is my child.

Maybe if I can get it to run a ‘good enough’ LLM, and also a robotics kinematics suite…

I can just start building DOG, with a Steam Deck for a face, instead of a Combine scanner bot.

los0220@lemmy.world · 12 hours ago

Gemma 4 seems nice for local usage, way faster than Qwen models.

I was able to run 27B Gemma on my PC, where 14B Qwen was to slow due to CPU offload

percent@infosec.pub · edit-2 11 hours ago

+1, exactly the same experience. Except Gemma4:26B really sucks with OpenCode. Works great with Pi though

Diurnambule@jlai.lu · edit-2 17 hours ago

Amd have the best consumer grafic card to run llm on the market.

sp3ctr4l@lemmy.dbzer0.com · 20 hours ago

Sorry, I’m not entirely sure what you mean.

Did you mean to say:

“And need to have the best consumer GPU on the market, to run an LLM.”

… likely alluding to an RTX 5090?

So you would be saying that basically it is bullshit, the idea that everyone needs extremely expensive hardware, to run an LLM?

Diurnambule@jlai.lu · 17 hours ago

Hello, no sorry auto correction and going fast do it to my posts. I wanted to say that NVIDIA is already the worst option for consumer graphic card since AMD made a card with 20go ram which is able to run most open weight models.

sp3ctr4l@lemmy.dbzer0.com · 17 hours ago

Aha! Ok, that makes sense as well.

Nikelui@lemmy.world · 1 day ago

Qwen 3.6 is already out? Damn, I swear I switched to 3.5 not even a month ago.

mannycalavera@feddit.uk · 2 days ago

Hopefully this will be good for consumers worldwide.

Until America decides to tariff anyone using Chinese technology.

Buffalox@lemmy.world · 1 day ago

USA is already losing the tariff war as it undermines the American economy, and hasn’t helped their trade deficits much.
When EU finally decides to put tariffs on American services, because USA continue with their shenanigans, then USA will have a trade deficit for real.
Because the trade deficit on goods is vastly outweighed by the surplus on services.
Even if they have a deficit, it is basically free, because they can pay for it with dollars they print themselves, because the USD is the global reserve currency.
But Trump is ruining that too, since “liberation day” where Trump introduced his tariffs, the use of the USD as a global reserve currency has dropped, some claim by up to 30%

All USA is doing is undermine the power they used to have. Everybody threatened by USA are in talks with each other to increase cooperation.

EU, Canada, Australia, UK, Japan, are making deals to cooperate around USA including on military.
The Gulf countries are now negotiating with China on economy, which will potentially be the end of the petro dollar. And they are looking to Europe especially Ukraine for defense equipment to replace American equipment.
South American countries have been working with China for years, and USA subsidizing Argentina will not change that.

USA is making themselves irrelevant, the Iran war has shown their military is a paper tiger, that cannot protect their allies, and they are pulling key defense equipment out of Japan and South Korea to aid in the Iran war. Making all allies unsure of the value of cooperating with USA. Japan participating in the EU SAFE program is an extremely clear indicator of that.

So whatever USA decides, will have very little bearing on the rest of the world. Because for USA, the train has already left the station, the ship has sailed. The world has lost patience with USA, and are now only idling in their relations with USA, while they all seek to strengthen other relations, for both financial stability and military safety.

FaceDeer@fedia.io · 2 days ago

It’ll be good for consumers worldwide. America is not the whole world.

I, for example, am in Canada. We’ve established a bunch of very nice trade deals with China recently, we’re going to end up with access to a bunch of Chinese products that Americans can’t get due to their self-imposed trade war with China.

BrinkBreaker@lemmy.dbzer0.com · 1 day ago

I think they mean that the US would put trade pressure on countries doing any tech trade with China, not specifically preventing it punishing American companies from using Chinese chips.

Unfortunately the United States is still a big economy regardless of their politics and Manny is right that the US would throw their weight behind anti China policies to the detriment of other nations.

How successful such a move would be is up to debate.

FaceDeer@fedia.io · 1 day ago

The US is already trying to throw its economic weight around bullying Canada, and we’ve already settled in to an effective economic defensive posture. Those trade deals with China are actually part of it, previously we were supporting various American initiatives to tariff China but the Americans tore up a bunch of agreements with us so we responded in kind. It’s unfortunate but they started it and we’re prepared to hold our own.

Tollana1234567@lemmy.today · 1 day ago

its what putin wants to isolate and weaken the US,.

HuudaHarkiten@piefed.social · 1 day ago

Luckily it seems to be making Europe and Canada stronger as well. Putin can weaken the US all he wants, but as long as Europe and Canada are not just standing around with their hands in their pockets, doing nothing, it wont benefit russia too much.

Rekall Incorporated@piefed.social · 23 hours ago

The Americans are doing it all on their own. The russians are just trying to capitalize where they can.

Sad thing is that it is extremely unlikely that the US will be able to implement any kind of reforms around crime, corruption, judicial independence, restrictions on suffrage.

Tollana1234567@lemmy.today · 1 day ago

funny how the news stop talking about the tariffs when it was still going in the us, im guessing continually reporting about trumps tariffs is actually go to hurt the republican plebs and companies. and i notice alot of products are more expensive or stopped being offered online shopping.

Otter@lemmy.ca · 1 day ago

America decides to tariff anyone using Chinese technology

America already decides to tariff countries regardless of what they’re doing

Tollana1234567@lemmy.today · 1 day ago

specific on tech, remember huwei phones it was all the rage as alternative to google and iphones, and samsungs. and then the compares got scared and lobbied for thier bans.

unexposedhazard@discuss.tchncs.de · edit-2 1 day ago

US tariffs hold no power anymore these days.

RabbitBBQ@lemmy.world · 13 hours ago

China is so far ahead of the United States in manufacturing and technology that the United States will never be able to catch up.

Nalivai@lemmy.world · 10 hours ago

United States was so ahead of China even 20 years ago, China was obviously never able to keep up, until it did, and then it was obvious that China was always going to overcome US, and now it’s obvious that this configuration is forever.
Something new will be always has been obvious in 20 years time.

Melvin_Ferd@lemmy.world · 13 hours ago

Any new technology instantly gets turned into some new boogeyman instead of being embraced. Works out great for China when western people become luddite at a very critical window of opportunity from emerging technology. You’d almost think China has a strong ability to flood western sites with propaganda.

super_user_do@feddit.it · edit-2 10 hours ago

Those are two completely different contexts. Every new technology in the west is usually meant to restrict existing freedoms while in China they never historically had said freedoms, so to them it’s just an improvement in productivity while to us is not as much of an improvement at the cost of completely usurping the existing rights that made the west great

electricyarn@lemmy.world · 10 hours ago

Your premise makes no sense. Vhina abdolutely uses these technologies to restrict freedoms (so does the us).

super_user_do@feddit.it · 10 hours ago

No bro the issue is that you can’t fucking read

Melvin_Ferd@lemmy.world · edit-2 11 hours ago

Is your argument that Chinese government is not using technology to restrict freedoms?

How is that being upvoted

I would say that decades of western Hollywood has primed the population to mistrust any new technology and only see potential dystopian ends in exchange for entertainment. Combined with China spreading propaganda to encourage those views in western nations. Sites like lemmy and Facebook are heavily used as generators to spread mistrust.

super_user_do@feddit.it · 10 hours ago

No this is not what I’m saying but apparently both of you can’t read. I said that Chinese people historically have never had our same amount of freedoms so to them it’s not that big of a deal and its just nothing more than a tool to increase productivity. In the west it’s different because we do indeed have freedoms that are going to be restricted

brucethemoose@lemmy.world · 2 days ago

What’s left unsaid is the software architecture is extremely interesting, and efficient.

Ironically, the Nvidia embargo was the best thing to ever happen to the Chinese labs (which Nvidia tried to tell the US govt). It forced them to get thrifty, unlike US labs which (allegedly) fill some GPU farms with busywork for the appearance of high utilization.

Tollana1234567@lemmy.today · 1 day ago

is that why JESEN was telling people to not leave him.

ag10n@lemmy.world · 2 days ago

You can run it on CPU alone. Not surprising they’re building their own AI ecosystem

Eager Eagle@lemmy.world · 2 days ago

It’s still matrix multiplication. Running it on a general purpose CPU is inefficient.

theunknownmuncher@lemmy.world · 1 day ago

I mean, sure. You could also run it by drawing marks in sand. It doesn’t make any sense to do either, though.

brucethemoose@lemmy.world · 2 days ago

Not at scale. Even on the new architecture, one really needs some kind of accelerator to make it economical for servers.

Bitnet-like models might change the calculus, but no major trainer had tried that yet.

[object Object]@lemmy.ca · 2 days ago

Even with a bitnet, it’s almost definitely better to train on a high precision float then refine down to bits.

I would expect bitnet to require more layers for equivalent quality too.

brucethemoose@lemmy.world · 2 days ago

I just meant for mass inference serving.

Yeah, I haven’t seen much in the way of bitnet training savings yet, like regular old QAT. It does appear that Deepseek is finetuning their MoEs in a 4-bit format now, though.

ag10n@lemmy.world · 2 days ago

Yes, you can run it at scale. Which is why it uses Huawei hardware.

You can run it on anything, scaled or not

brucethemoose@lemmy.world · 2 days ago

Just not power/cost efficiently on CPU only, is what I meant. CPUs don’t have the compute for batching (running generation requests in parallel). You need an accelerator, like Huawei’s, to be economical.

It’s fine for local inference, of course.

ag10n@lemmy.world · 1 day ago

A whole ecosystem that can run on any hardware, efficiently or not, is a whole ecosystem developed for the Chinese market

brucethemoose@lemmy.world · 19 hours ago

…I mean, yeah? It’s obviously developed for the Chinese market.

But that’s theoretical, for now. No CPU backend I can find supports DSV4, and DeepSeek hasn’t contributed anything yet.

KingRandomGuy@lemmy.world · 11 hours ago

Yeah, I’d expect KTransformers to add support eventually, especially considering their existing support for previous DeepSeek models. One of the tricky parts is that backends need both FP8 and MXFP4 support. As far as I’m aware no inference engine supports both on CPU at the moment (llama.cpp added fp4 support recently, but doesn’t have fp8, while kt-kernel doesn’t support fp4 yet).

brucethemoose@lemmy.world · 11 hours ago

Not to speak of the new attention scheme and the (IIRC) MLP changes.

I’m very much looking forward to ik_llama.cpp implementing it. I don’t think I can quite fit Flash on my rig (hence no Ktransformers for me) but a little quantization of the sparse layers, and it’d be perfect.

gens@programming.dev · 2 days ago

LLMs are limited by memory bandwidth much more then calculating power. You need HBM. Dedicated accelerators only lower power usage.

brucethemoose@lemmy.world · 13 hours ago

This is commonly cited, but not strictly true.

Prompt processing is completely compute limited. And at high batch sizes, where the weights are read once for many tokens generated in parallel, token generation is also quite compute limited. Obviously you want enough bandwidth to match the compute, but its very compute heavy.

You can see this for yourself. Try ~10 prompts in parallel on a CPU in llama.cpp, and it will slow to a crawl, while a GPU with a narrow bus won’t slow down much.

Training is a bit more complicated, but that’s not doable on CPUs anyway.

Now, local inference (aka a batch size of 1), past prompt processing, is heavily bandwidth limited. This is why hybrid inference works alright on CPUs. But this doesn’t really apply to servers, which process many users in parallel with each “pass”.

theunknownmuncher@lemmy.world · edit-2 1 day ago

Nope! You don’t know what you’re talking about. At all. But you can have fun running a 1.6 trillion parameter model on CPU at basically 0 tokens per second at scale, MoE or not.

KingRandomGuy@lemmy.world · 7 hours ago

You can actually get kind of acceptable performance on CPU alone, but you need rather specific CPUs, like SPR or newer Intel Xeons. These support AMX, which is almost like a mini tensor core, so you can actually get decent throughput in TFLOPs out of GNR Xeons. Memory bandwidth with max channels is also acceptable, something like ~800 GB/s per socket with maxed out MRDIMMs, which is not too far behind consumer GPUs like 3090 and 4090.

Not anywhere near the performance of real GPUs of course, and not something acceptable for scale or production workloads, but good enough for local inference.

ag10n@lemmy.world · 1 day ago

https://github.com/DeepSeek-V4/deepseek-V4

theunknownmuncher@lemmy.world · edit-2 1 day ago

You’ve proved my point that you don’t know what you’re talking about by blindly linking to the git repo. Couldn’t find any source that supports your claim? I wonder why.

Sure you can serve one request at a time to one patient user at a slow token per second rate, which makes running locally viable, but there is no RAM that has the bandwidth to run this model at scale. Even flash would be incredibly slow on CPU with multiple requests. You’d need the high bandwidth of VRAM and to run across multiple GPUs in a scalable way, it requires extremely high bandwidth interconnects between GPUs.

ag10n@lemmy.world · 1 day ago

Thank you for proving my point. It can be run on a cpu

“It’s slow, it’s inefficient” it still runs

It’s a foundational model just like R1 was.

theunknownmuncher@lemmy.world · 1 day ago

Yes, you can run it at scale.

at scale

Shift those goalposts! We went from “at scale” to “it still runs”

Ŝan • 𐑖ƨɤ@piefed.zip · 2 days ago

Sad trombone.

BETYU@moist.catsweat.com · 2 days ago

the only thing deepseek is really good for is to fuck over NVIDIA and chat gpt or American ai in general because deep seek is bast on chat gpt after all anything that brakes this monopoly is good and anything associated with it like ram.

Tollana1234567@lemmy.today · 1 day ago

its probably costing as much and polluting as much as a datacenter from the us is doing.

brucethemoose@lemmy.world · edit-2 15 hours ago

No. Not even close. Non-US models are trained (and run) on peanuts compared to big US models, because they don’t have mega GPU farms and have no other option. Deepseek in particular went all-in on software architecture efficiency.

…Ironically, the Nvidia GPU embargo was the best thing that ever happened to the Chinese devs. It made them thrifty.

Many tried to warn US regulators of this, but they had AI Bros whispering in their ears. The US tech system is just too screwed up, I guess.

DeepSeek ditches Nvidia for Huawei chips in V4 launch

DeepSeek ditches Nvidia for Huawei chips in V4 launch

DeepSeek launched V4 on Huawei chips one day after White House accused China of AI theft