• humanspiral@lemmy.ca · +8 · 12 hours ago

    Huawei outperforms NVIDIA at the “cluster” level, i.e. mostly turnkey systems for datacenter deployment, and it promises a truck-container-scale cluster for its next generation with 30x the zettaflops of NVIDIA’s Rubin cluster. China currently operates at about 50% of its electricity production capacity, so energy is extremely abundant and cheap, which makes the per-card performance deficit largely irrelevant.

    • KingRandomGuy@lemmy.world · +8 · 11 hours ago

      To be fair, the raw FLOPs count doesn’t tell the whole story. On a lot of workloads (including token generation during LLM inference), you’re bound by memory bandwidth rather than throughput/FLOPs. On H100/H200, keeping the tensor cores fully occupied is surprisingly difficult, and that’s with 3+ TB/s of memory bandwidth. And I believe those cards have much higher throughput than Ascend (at least at FP8; Ascend wins at FP4, since H100/H200 don’t support it).

      The Ascend 950PR units have far lower memory bandwidth, reportedly 1.4 TB/s. Compare that to Blackwell, which has something like 8 TB/s. I believe they’re manufacturing their own kind of HBM, so that’s still really impressive considering this is a fairly recent push into manufacturing accelerators. But I’m a bit skeptical it actually outperforms NVIDIA at scale.
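
      A rough roofline-style sanity check of why single-stream token generation is bandwidth-bound rather than FLOPs-bound (a minimal sketch; the model size, precision, and hardware numbers are illustrative assumptions, not measurements):

      ```python
      # Back-of-the-envelope check: is single-stream LLM decoding limited by
      # memory bandwidth or by compute? Every token must read all weights once.
      def decode_limits(params_b, bytes_per_param, bandwidth_tbs, tflops):
          weight_bytes = params_b * 1e9 * bytes_per_param    # bytes per token
          flops_per_token = 2 * params_b * 1e9               # ~2 FLOPs/weight
          t_mem = weight_bytes / (bandwidth_tbs * 1e12)      # bandwidth-bound
          t_cmp = flops_per_token / (tflops * 1e12)          # compute-bound
          print(f"bandwidth cap: {1 / t_mem:8.1f} tok/s | "
                f"compute cap: {1 / t_cmp:10.1f} tok/s")

      # Hypothetical 70B dense model at FP8 on an H200-class card
      # (~3.35 TB/s HBM, ~2000 TFLOPS FP8): bandwidth binds by ~300x.
      decode_limits(70, 1, bandwidth_tbs=3.35, tflops=2000)
      # The same model on a 1.4 TB/s part: the ceiling drops proportionally.
      decode_limits(70, 1, bandwidth_tbs=1.4, tflops=1000)
      ```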

  • billwashere@lemmy.world · +3 · 13 hours ago

    Sorry if this is a dumb question, but is this just for training or does DeepSeek v4 now require these chips to run?

    • qaz@lemmy.world · +4 · 13 hours ago

      I don’t think it only runs on these chips. There are some companies in the US that serve DeepSeek V4, presumably running on standard Nvidia chips.

    • humanspiral@lemmy.ca · +2 · 12 hours ago

      It is for inference, which generally works well on open backends. Most models are written in PyTorch, which dispatches to a hardware backend library.

  • jaykrown@lemmy.world · +27 −1 · 22 hours ago

    It almost feels like the Trump administration is trying to help Chinese companies at this point.

    • M137@lemmy.today · +2 · 9 hours ago

      So many things they’re doing feel like this. A few I could see being conscious, planned deals, but most completely go against any interest they have, so it’s clear that it’s just an unbelievable level of incompetence, stupidity, and ignorance.

      • jaykrown@lemmy.world · +1 · 9 hours ago

        I have a hard time thinking it’s not deliberate. They’re trying to protect US car companies by putting tariffs on Chinese car imports, but what that’s really doing is breeding complacency and a lack of competition, ultimately harming the US and setting us back, while wealthy shareholders hoard wealth without needing to worry about staying competitive.

  • Buffalox@lemmy.world · +155 −1 · edited · 2 days ago

    Chinese companies are heavily incentivized to use Chinese chips instead of American ones, since Trump blocked trade with China.
    China used to parallel-import the chips it needed, and even repackage them with more onboard RAM, making more powerful Nvidia solutions available in China than in the rest of the world.
    But Trump’s behavior towards China made the Chinese government decide to limit the use of American technologies for AI.
    There was a point where Nvidia exports to China were basically at a standstill, because China forbade the purchase of a new cut-down Nvidia chip made for the Chinese market to circumvent American trade restrictions.

    China is building their own complete stack now, replacing everything with Chinese technologies, right from the AI chips to the entire AI software framework.

    So not only do Nvidia and other American companies lose hardware sales, the entire stack will be threatened by a Chinese alternative that will likely compete with American options on the international market in the future. If CUDA loses its current dominance, it will be easier for competitors to take market share from Nvidia.

    Hopefully this will be good for consumers worldwide.

    • Diurnambule@jlai.lu · +11 · 22 hours ago

      Please do, I can’t wait. Europeans are eager to ditch the Americans. In the long run, the Chinese seem to be a more reliable partner.

      • Buffalox@lemmy.world · +13 −3 · edited · 21 hours ago

        I would prefer it wasn’t like this. Pax Americana seemed to work quite well for several decades; of course USA served their own interests, but they also provided a somewhat stable world order with a decent degree of freedom.
        Now they have abandoned the ideals of freedom, democracy, and international law, to serve their own interests exclusively, at immense cost to others, without regard for either law or decency. That is, of course, not a sustainable thing to be an ally of.
        I think USA will soon find that without allies, their power isn’t so great after all.

        • Diurnambule@jlai.lu · +7 −1 · 20 hours ago

          They were infiltrated by Epstein-class assets, and it fucked the system. At this point, just look at the tax rates of these rich fucks.

            • Diurnambule@jlai.lu · +1 · 4 hours ago

              For a short period the rich fuckers were taxed enough. There is still the fact that the growth of the capitalist countries was sustained by the resources of colonized countries.

        • msage@programming.dev · +4 · 20 hours ago

          You are talking about freedom and peace, but only for the global north.

          It wasn’t like that for the rest of the world.

      • ArchAengelus@lemmy.dbzer0.com · +29 · 1 day ago

        It’s only the standard for people who self-host their LLMs and don’t have $500k to throw at hardware for GLM-5.1 or similar models.

        I have qwen3.6:27b on my local hardware and it’s way better than I expected. I’m excited for the rest of the 3.6 line as it comes out, if they can keep up that quality.

        This story is also a nothing burger. Generally, yes, Nvidia will suffer once China’s stack catches up (soon). By then, whatever bubble we are in will have normalized one way or the other.

        In terms of actually deploying this model, it doesn’t matter what hardware you’re using. vLLM supports almost everything with SIMD-type hardware instructions.

        More competition will make everyone happy except Nvidia shareholders.
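
        For what it’s worth, a minimal vLLM sketch of that kind of hardware-agnostic deployment (the model ID is a stand-in; vLLM selects a CUDA, ROCm, or CPU backend based on what it detects):

        ```python
        # Minimal vLLM offline-inference sketch; runs on whichever backend
        # vLLM detects. The model ID below is just a stand-in example.
        from vllm import LLM, SamplingParams

        llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # any HF-format model
        params = SamplingParams(temperature=0.7, max_tokens=128)

        outputs = llm.generate(["Explain MoE inference briefly."], params)
        for out in outputs:
            print(out.outputs[0].text)
        ```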

    • Avid Amoeba@lemmy.ca · +16 · 1 day ago

      Been using Qwen 3.x for a while now for local LLM with search capability. The 3.5 and 3.6 ones are great and run very fast.

      • humanspiral@lemmy.ca · +2 · 12 hours ago

        3.6 27B is probably the most powerful/efficient (for its size) model out there. Qwen has a history of leveraging DeepSeek as well (DeepSeek creates small models with Qwen as the base), and Alibaba is the main hosting service for DeepSeek. Alibaba/Qwen are in talks to invest in DeepSeek at the moment.

        • Avid Amoeba@lemmy.ca · +1 · 11 hours ago

          Yeah. The 80b Coder-Next runs at about the same speed on my hw too. I don’t know if it’s any better than 3.6 27b.

      • sp3ctr4l@lemmy.dbzer0.com · +8 · 21 hours ago

        I got Qwen 3.5 running on a Steam Deck.

        It ain’t exactly blazing fast, but it does actually work.

        (Reasonably fast if you go down to the 2B param model, I can get the 9B param variant working, though this makes Steam Decky very hot and bothered.)

        Yeah, you absolutely do not need Nvidia hardware to run an LLM, but we get blasted with their propaganda suggesting otherwise all the time in the English-speaking West.

        Because if you don’t need Nvidia, well, then, this whole AI bubble looks a lot more bubbly.
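
        To make the no-Nvidia point concrete, here’s a minimal llama-cpp-python sketch running a quantized GGUF model entirely on CPU (the file name is a placeholder for whatever quant you downloaded; Steam Deck-class hardware handles small models fine, just not quickly):

        ```python
        # CPU-only inference via llama.cpp's Python bindings; no CUDA needed.
        from llama_cpp import Llama

        llm = Llama(
            model_path="./qwen-2b-q4_k_m.gguf",  # placeholder file name
            n_ctx=2048,    # context window
            n_threads=4,   # Steam Deck's APU has 4 cores / 8 threads
        )

        out = llm("Q: Do you need an Nvidia GPU to run an LLM?\nA:",
                  max_tokens=96, stop=["Q:"])
        print(out["choices"][0]["text"])
        ```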

        • Avid Amoeba@lemmy.ca · +5 · 17 hours ago

          Take good care of your hw! It’s not like 2 years ago when you could buy stuff off the shelf for reasonable prices. :D

          • sp3ctr4l@lemmy.dbzer0.com · +2 · 17 hours ago

            My Steam Deck is my child.

            Maybe if I can get it to run a ‘good enough’ LLM, and also a robotics kinematics suite…

            I can just start building DOG, with a Steam Deck for a face, instead of a Combine scanner bot.

            • los0220@lemmy.world · +2 · 12 hours ago

              Gemma 4 seems nice for local usage, way faster than Qwen models.

              I was able to run 27B Gemma on my PC, where 14B Qwen was too slow due to CPU offload.

              • percent@infosec.pub · +1 · edited · 11 hours ago

                +1, exactly the same experience. Except Gemma4:26B really sucks with OpenCode. Works great with Pi though

          • sp3ctr4l@lemmy.dbzer0.com · +1 · 20 hours ago

            Sorry, I’m not entirely sure what you mean.

            Did you mean to say:

            “And need to have the best consumer GPU on the market, to run an LLM.”

            … likely alluding to an RTX 5090?

            So you would be saying that basically it is bullshit, the idea that everyone needs extremely expensive hardware, to run an LLM?

            • Diurnambule@jlai.lu · +2 · 17 hours ago

              Hello, no, sorry, autocorrect and going fast do that to my posts. I wanted to say that NVIDIA is already the worst option for a consumer graphics card, since AMD made a card with 20 GB of RAM that is able to run most open-weight models.

    • mannycalavera@feddit.uk · +7 −1 · 2 days ago

      Hopefully this will be good for consumers worldwide.

      Until America decides to tariff anyone using Chinese technology.

      • Buffalox@lemmy.world · +11 · 1 day ago

        USA is already losing the tariff war, as it undermines the American economy and hasn’t helped their trade deficits much.
        When the EU finally decides to put tariffs on American services, because USA continues with their shenanigans, then USA will have a trade deficit for real,
        because the trade deficit on goods is vastly outweighed by the surplus on services.
        Even if they have a deficit, it is basically free, because they can pay for it with dollars they print themselves, since the USD is the global reserve currency.
        But Trump is ruining that too: since “liberation day”, when Trump introduced his tariffs, the use of the USD as a global reserve currency has dropped, some claim by up to 30%.

        All USA is doing is undermining the power they used to have. Everybody threatened by USA is in talks with each other to increase cooperation.

        EU, Canada, Australia, UK, and Japan are making deals to cooperate around USA, including on military.
        The Gulf countries are now negotiating with China on the economy, which will potentially be the end of the petrodollar. And they are looking to Europe, especially Ukraine, for defense equipment to replace American equipment.
        South American countries have been working with China for years, and USA subsidizing Argentina will not change that.

        USA is making themselves irrelevant. The Iran war has shown their military is a paper tiger that cannot protect their allies, and they are pulling key defense equipment out of Japan and South Korea to aid in the Iran war, making all allies unsure of the value of cooperating with USA. Japan participating in the EU SAFE program is an extremely clear indicator of that.

        So whatever USA decides will have very little bearing on the rest of the world. Because for USA, the train has already left the station; the ship has sailed. The world has lost patience with USA, and is now only idling in its relations with USA, while seeking to strengthen other relations, for both financial stability and military safety.

      • FaceDeer@fedia.io · +30 −1 · 2 days ago

        It’ll be good for consumers worldwide. America is not the whole world.

        I, for example, am in Canada. We’ve established a bunch of very nice trade deals with China recently, we’re going to end up with access to a bunch of Chinese products that Americans can’t get due to their self-imposed trade war with China.

        • BrinkBreaker@lemmy.dbzer0.com · +8 −1 · 1 day ago

          I think they mean that the US would put trade pressure on countries doing any tech trade with China, not just punishing American companies for using Chinese chips.

          Unfortunately, the United States is still a big economy regardless of its politics, and Manny is right that the US would throw its weight behind anti-China policies to the detriment of other nations.

          How successful such a move would be is up for debate.

          • FaceDeer@fedia.io · +15 −1 · 1 day ago

            The US is already trying to throw its economic weight around bullying Canada, and we’ve already settled into an effective economic defensive posture. Those trade deals with China are actually part of it; previously we were supporting various American initiatives to tariff China, but the Americans tore up a bunch of agreements with us, so we responded in kind. It’s unfortunate, but they started it, and we’re prepared to hold our own.

              • HuudaHarkiten@piefed.social · +6 · 1 day ago

                Luckily it seems to be making Europe and Canada stronger as well. Putin can weaken the US all he wants, but as long as Europe and Canada are not just standing around with their hands in their pockets doing nothing, it won’t benefit Russia too much.

              • Rekall Incorporated@piefed.social · +3 · 23 hours ago

                The Americans are doing it all on their own. The Russians are just trying to capitalize where they can.

                The sad thing is that it is extremely unlikely that the US will be able to implement any kind of reforms around crime, corruption, judicial independence, or restrictions on suffrage.

        • Tollana1234567@lemmy.today · +1 · 1 day ago

          Funny how the news stopped talking about the tariffs while they were still going on in the US. I’m guessing continually reporting on Trump’s tariffs would actually hurt the Republican plebs and companies. And I notice a lot of products are more expensive, or are no longer offered in online shopping.

      • Otter@lemmy.ca · +14 · 1 day ago

        America decides to tariff anyone using Chinese technology

        America already decides to tariff countries regardless of what they’re doing

        • Tollana1234567@lemmy.today · +3 · 1 day ago

          Specifically on tech: remember Huawei phones? They were all the rage as an alternative to Google, iPhone, and Samsung, and then the companies got scared and lobbied for their ban.

  • RabbitBBQ@lemmy.world · +5 −5 · 13 hours ago

    China is so far ahead of the United States in manufacturing and technology that the United States will never be able to catch up.

    • Nalivai@lemmy.world · +3 · 10 hours ago

      The United States was so far ahead of China even 20 years ago that China was obviously never going to keep up, until it did, and then it was obvious that China was always going to overtake the US, and now it’s obvious that this configuration is forever.
      In 20 years’ time, something new will “always have been obvious.”

    • Melvin_Ferd@lemmy.world · +2 −8 · 13 hours ago

      Any new technology instantly gets turned into some new boogeyman instead of being embraced. It works out great for China when Western people become Luddites at a very critical window of opportunity for emerging technology. You’d almost think China has a strong ability to flood Western sites with propaganda.

      • super_user_do@feddit.it · +7 −1 · edited · 10 hours ago

        Those are two completely different contexts. Every new technology in the West is usually used to restrict existing freedoms, while in China they historically never had said freedoms, so to them it’s just an improvement in productivity, while to us it’s not as much of an improvement, and it comes at the cost of completely usurping the existing rights that made the West great.

        • electricyarn@lemmy.world · +1 −7 · 10 hours ago

          Your premise makes no sense. China absolutely uses these technologies to restrict freedoms (so does the US).

        • Melvin_Ferd@lemmy.world · +2 −4 · edited · 11 hours ago

          Is your argument that the Chinese government is not using technology to restrict freedoms?

          How is that being upvoted?

          I would say that decades of Western Hollywood have primed the population to mistrust any new technology and to see only potential dystopian ends, in exchange for entertainment. Combine that with China spreading propaganda to encourage those views in Western nations. Sites like Lemmy and Facebook are heavily used as generators to spread mistrust.

          • super_user_do@feddit.it · +4 · 10 hours ago

            No, this is not what I’m saying, but apparently both of you can’t read. I said that Chinese people historically have never had the same amount of freedoms we have, so to them it’s not that big of a deal; it’s nothing more than a tool to increase productivity. In the West it’s different, because we do indeed have freedoms that are going to be restricted.

  • brucethemoose@lemmy.world · +42 · 2 days ago

    What’s left unsaid is that the software architecture is extremely interesting, and efficient.

    Ironically, the Nvidia embargo was the best thing to ever happen to the Chinese labs (which Nvidia tried to tell the US govt). It forced them to get thrifty, unlike US labs, which (allegedly) fill some GPU farms with busywork for the appearance of high utilization.

  • ag10n@lemmy.world · +24 · 2 days ago

    You can run it on CPU alone. It’s not surprising they’re building their own AI ecosystem.

    • brucethemoose@lemmy.world · +11 −1 · 2 days ago

      Not at scale. Even on the new architecture, one really needs some kind of accelerator to make it economical for servers.

      Bitnet-like models might change the calculus, but no major trainer has tried that yet.

      • [object Object]@lemmy.ca · +8 · 2 days ago

        Even with a bitnet, it’s almost definitely better to train at high float precision and then refine down to bits.

        I would expect bitnet to require more layers for equivalent quality too.
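
        To make the “train in float, then refine down to bits” idea concrete, here’s a toy PyTorch sketch of BitNet-style ternary quantization with a straight-through estimator (my own illustration of the general technique, not anyone’s actual training code):

        ```python
        # Toy BitNet-b1.58-style layer: the forward pass snaps weights to
        # {-1, 0, +1} * scale, while the straight-through estimator lets
        # gradients flow into the full-precision master weights.
        import torch
        import torch.nn.functional as F

        class TernaryLinear(torch.nn.Linear):
            def forward(self, x):
                w = self.weight
                scale = w.abs().mean()  # absmean scale, per tensor
                w_q = torch.clamp(torch.round(w / (scale + 1e-8)), -1, 1) * scale
                w_ste = w + (w_q - w).detach()  # identity gradient w.r.t. w
                return F.linear(x, w_ste, self.bias)

        layer = TernaryLinear(64, 32)
        loss = layer(torch.randn(8, 64)).pow(2).mean()
        loss.backward()  # gradients reach the float master weights
        ```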

        • brucethemoose@lemmy.world · +5 −1 · 2 days ago

          I just meant for mass inference serving.

          Yeah, I haven’t seen much in the way of bitnet training savings yet, like regular old QAT. It does appear that DeepSeek is fine-tuning their MoEs in a 4-bit format now, though.

      • ag10n@lemmy.world · +2 −1 · 2 days ago

        Yes, you can run it at scale. Which is why it uses Huawei hardware.

        You can run it on anything, scaled or not

        • brucethemoose@lemmy.world · +7 −1 · 2 days ago

          Just not power/cost efficiently on CPU only, is what I meant. CPUs don’t have the compute for batching (running generation requests in parallel). You need an accelerator, like Huawei’s, to be economical.

          It’s fine for local inference, of course.

          • ag10n@lemmy.world · +1 · 1 day ago

            A whole ecosystem that can run on any hardware, efficiently or not, is a whole ecosystem developed for the Chinese market

            • brucethemoose@lemmy.world · +1 · 19 hours ago

              …I mean, yeah? It’s obviously developed for the Chinese market.

              But that’s theoretical, for now. No CPU backend I can find supports DSV4, and DeepSeek hasn’t contributed anything yet.

              • KingRandomGuy@lemmy.world · +2 · 11 hours ago

                Yeah, I’d expect KTransformers to add support eventually, especially considering their existing support for previous DeepSeek models. One of the tricky parts is that backends need both FP8 and MXFP4 support. As far as I’m aware, no inference engine supports both on CPU at the moment (llama.cpp added FP4 support recently but doesn’t have FP8, while kt-kernel doesn’t support FP4 yet).
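
                For intuition about why MXFP4 is its own kernel problem, a small NumPy sketch of MX-style block quantization (32-element blocks, one shared power-of-two scale per block, FP4/E2M1 element grid); a simplified illustration of the OCP MX idea, not any engine’s actual kernel:

                ```python
                # Simplified MXFP4-style quantizer: each 32-value block
                # shares one power-of-two scale; elements snap to the
                # FP4 (E2M1) magnitude grid.
                import numpy as np

                FP4 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

                def quantize_mxfp4(x, block=32):
                    out = np.empty_like(x, dtype=np.float32)
                    for i in range(0, len(x), block):
                        blk = x[i:i + block]
                        # shared scale maps the block max near FP4's max (6.0)
                        amax = np.abs(blk).max()
                        scale = 2.0 ** np.floor(np.log2(amax / 6.0 + 1e-30))
                        s = blk / scale
                        idx = np.abs(np.abs(s)[:, None] - FP4).argmin(axis=1)
                        out[i:i + block] = np.sign(s) * FP4[idx] * scale
                    return out

                w = np.random.randn(64).astype(np.float32)
                print("max abs error:", np.abs(w - quantize_mxfp4(w)).max())
                ```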

                • brucethemoose@lemmy.world · +2 · 11 hours ago

                  Not to speak of the new attention scheme and the (IIRC) MLP changes.

                  I’m very much looking forward to ik_llama.cpp implementing it. I don’t think I can quite fit Flash on my rig (hence no Ktransformers for me) but a little quantization of the sparse layers, and it’d be perfect.

          • gens@programming.dev · +1 · 2 days ago

            LLMs are limited by memory bandwidth much more than by compute. You need HBM. Dedicated accelerators only lower power usage.

            • brucethemoose@lemmy.world · +1 · 13 hours ago

              This is commonly cited, but not strictly true.

              Prompt processing is completely compute-limited. And at high batch sizes, where the weights are read once for many tokens generated in parallel, token generation is also quite compute-limited. Obviously you want enough bandwidth to match the compute, but it’s very compute-heavy.

              You can see this for yourself. Try ~10 prompts in parallel on a CPU in llama.cpp, and it will slow to a crawl, while a GPU with a narrow bus won’t slow down much.

              Training is a bit more complicated, but that’s not doable on CPUs anyway.

              Now, local inference (aka a batch size of 1), past prompt processing, is heavily bandwidth limited. This is why hybrid inference works alright on CPUs. But this doesn’t really apply to servers, which process many users in parallel with each “pass”.
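
              A toy model of that batching effect (a sketch with illustrative hardware numbers, not benchmarks): weight traffic per step is constant while FLOPs grow with batch size, so throughput flattens once compute saturates, and a CPU hits that roof at a much smaller batch than a GPU:

              ```python
              # Toy roofline for batched decoding: per step, weight reads are
              # constant but FLOPs scale with batch, so each device flattens
              # out once it becomes compute-bound. Numbers are illustrative.
              def tokens_per_s(batch, params_b=30, bytes_per_param=1,
                               bw_gbs=100, tflops=2):
                  t_mem = params_b * 1e9 * bytes_per_param / (bw_gbs * 1e9)
                  t_cmp = 2 * params_b * 1e9 * batch / (tflops * 1e12)
                  return batch / max(t_mem, t_cmp)

              for batch in (1, 4, 16, 64):
                  cpu = tokens_per_s(batch, bw_gbs=100, tflops=2)    # CPU-ish
                  gpu = tokens_per_s(batch, bw_gbs=1000, tflops=300) # GPU-ish
                  print(f"batch {batch:3d}: "
                        f"CPU {cpu:7.1f} tok/s | GPU {gpu:8.1f} tok/s")
              ```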

        • theunknownmuncher@lemmy.world · +4 −2 · edited · 1 day ago

          Nope! You don’t know what you’re talking about. At all. But you can have fun running a 1.6 trillion parameter model on CPU at basically 0 tokens per second at scale, MoE or not.

          • KingRandomGuy@lemmy.world · +2 · 7 hours ago

            You can actually get kind of acceptable performance on CPU alone, but you need rather specific CPUs, like SPR or newer Intel Xeons. These support AMX, which is almost like a mini tensor core, so you can actually get decent throughput in TFLOPs out of GNR Xeons. Memory bandwidth with max channels is also acceptable, something like ~800 GB/s per socket with maxed out MRDIMMs, which is not too far behind consumer GPUs like 3090 and 4090.

            Not anywhere near the performance of real GPUs of course, and not something acceptable for scale or production workloads, but good enough for local inference.

            • theunknownmuncher@lemmy.world · +3 −1 · edited · 1 day ago

              You’ve proved my point that you don’t know what you’re talking about by blindly linking to the git repo. Couldn’t find any source that supports your claim? I wonder why.

              Sure you can serve one request at a time to one patient user at a slow token per second rate, which makes running locally viable, but there is no RAM that has the bandwidth to run this model at scale. Even flash would be incredibly slow on CPU with multiple requests. You’d need the high bandwidth of VRAM and to run across multiple GPUs in a scalable way, it requires extremely high bandwidth interconnects between GPUs.

              • ag10n@lemmy.world · +2 −4 · 1 day ago

                Thank you for proving my point. It can be run on a CPU.

                “It’s slow, it’s inefficient”, but it still runs.

                It’s a foundational model, just like R1 was.

                • theunknownmuncher@lemmy.world · +4 −1 · 1 day ago

                  Yes, you can run it at scale.

                  at scale

                  Shift those goalposts! We went from “at scale” to “it still runs”

  • BETYU@moist.catsweat.com · +7 −14 · 2 days ago

    The only thing DeepSeek is really good for is fucking over NVIDIA and ChatGPT, or American AI in general, because DeepSeek is based on ChatGPT after all. Anything that breaks this monopoly is good, as is anything associated with it, like RAM.

      • brucethemoose@lemmy.world · +3 · edited · 15 hours ago

        No. Not even close. Non-US models are trained (and run) on peanuts compared to big US models, because they don’t have mega GPU farms and have no other option. Deepseek in particular went all-in on software architecture efficiency.

        …Ironically, the Nvidia GPU embargo was the best thing that ever happened to the Chinese devs. It made them thrifty.

        Many tried to warn US regulators of this, but they had AI Bros whispering in their ears. The US tech system is just too screwed up, I guess.