• 1 Post
  • 468 Comments
Joined 1 year ago
Cake day: March 22nd, 2024

  • Yeah. But it also messes stuff up from the llama.cpp baseline, and hides or doesn’t support some features/optimizations, and definitely doesn’t support the more efficient iq_k quants of ik_llama.cpp and its specialized MoE offloading.

    And that’s not even getting into the various controversies around ollama (like broken GGUFs or indications they’re going closed source in some form).

    …It just depends on how much performance you want to squeeze out, and how much time you want to spend on the endeavor. Small LLMs are kinda marginal though, so IMO the tinkering is only worth it if you really want to try; otherwise you’re probably better off spending a few bucks on an API that doesn’t log requests.




  • At risk of getting more technical, ik_llama.cpp has a good built-in webui:

    https://github.com/ikawrakow/ik_llama.cpp/

    Getting more technical, it’s also way better than ollama. You can run way smarter models than ollama can on the same hardware.

    For reference, I’m running GLM-4 (667 GB of raw weights) on a single RTX 3090/Ryzen gaming rig, at reading speed, with pretty low quantization distortion.

    And if you want a ‘look this up on the internet for me’ assistant (which you need for them to be truly useful), you need another docker project as well.

    …That’s just how LLM self-hosting is now. It’s simply too hardware-intensive and ad hoc to be easy, smart, and cheap. You can indeed host a small ‘default’ LLM without much tinkering, but it’s going to be pretty dumb, and pretty slow on ollama defaults.
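    For the curious, a launch along these lines is roughly what that kind of setup looks like. This is only a sketch: the flag names follow llama.cpp/ik_llama.cpp conventions, and the model path and quant filename are hypothetical, so check ik_llama.cpp’s own README for the exact options on your build:

    ```shell
    # Sketch: serve a big MoE model with expert tensors kept in system RAM
    # while the shared layers run on the GPU (hypothetical model path).
    ./build/bin/llama-server \
        -m ~/models/GLM-4-IQ4_K.gguf \
        -ngl 99 \
        -ot "exps=CPU" \
        --host 127.0.0.1 --port 8080
    # -ngl 99        : offload all repeating layers to the GPU
    # -ot "exps=CPU" : override-tensor regex pinning MoE expert weights to RAM
    # The built-in webui is then reachable at http://127.0.0.1:8080
    ```

    The `-ot` regex trick is what makes a single 24 GB card viable for huge MoE models: only the small, always-active shared weights need VRAM, while the sparse experts stream from CPU RAM.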





  • Relatively minor efficiencies don’t make up for the compile times lol.

    It’s sometimes even a regression. For instance, self-compiled pytorch is way slower than the official releases, and Firefox generally is too unless you are extremely careful about it. Stuff like Python doesn’t get a benefit without patches.

    I think the point of Gentoo is supposed to be ‘truly from source’ builds and utility for embedded stuff, not benchmark performance. Especially since there are distros that offer ‘-march’ optimized packages now.
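    For context, the per-machine optimization Gentoo does offer boils down to a few lines of standard make.conf convention (this is the stock Gentoo idiom, not something from the post; tune MAKEOPTS to your core count and RAM):

    ```shell
    # /etc/portage/make.conf (sketch)
    COMMON_FLAGS="-march=native -O2 -pipe"   # build for this exact CPU
    CFLAGS="${COMMON_FLAGS}"
    CXXFLAGS="${COMMON_FLAGS}"
    MAKEOPTS="-j8"                           # parallel build jobs
    ```

    Which is exactly the kind of thing `-march`-optimized binary repos now give you without the compile times.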


  • What Toot said.

    Things I would emphasize are:

    • The community “critical mass” is amazing, with the wiki, online posts and such. You get a lot of support that isn’t ancient, janky, and ad hoc like Ubuntu’s.

    • Arch emphasizes paying attention. It’s not a hands-free OS: you have to know what graphics drivers you run and what your desktop environment is, and when you update, you have to watch the log for emergency messages, including official notices from the Arch repos themselves. You can’t operate it without knowing anything about it, but the reward is that shit gets fixed quick, officially, without having to stray from defaults and break your system, or accumulate a bunch of hacks you have to maintain yourself.

    • Much of Arch’s bad reputation comes from the AUR. Don’t use anything from the AUR (instead of an official repo package) unless you absolutely have to; that’s when stuff starts breaking. Installing standalone apps that aren’t in the repos via the AUR is fine, but to be clear, avoid things that integrate with the system if you can.

    • It doesn’t have to be hardcore barebones like Gentoo; there are all sorts of preconfigurations like Garuda and EndeavourOS. I recommend CachyOS (which I’ve kept for two years now, and plan to stick with).
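    The “pay attention when you update” point above is mostly just a short routine with stock pacman tooling (standard Arch practice, not anything exotic):

    ```shell
    # 1. Skim https://archlinux.org/news/ for manual-intervention notices.
    # 2. Always do a full sync + upgrade; a partial upgrade (-Sy alone,
    #    then installing packages) is how Arch systems actually break.
    sudo pacman -Syu
    # 3. Read the transaction output for repo notices and .pacnew warnings.
    # 4. Merge updated config files, e.g. with pacdiff from pacman-contrib:
    sudo pacdiff
    ```

    A couple of minutes of this per update is the whole “maintenance burden” people warn about.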





  • Honestly, iPhone performance is overrated now.

    I just came from an Android 9 Razer Phone 2 (with an ancient SD845) to a brand new iPhone 16 plus…

    And the iPhone feels slower.

    The UI is slower. Scrolling is more stuttery. Heavy webpages that ran fine on my Android phone crawl on the iPhone. It literally has the same amount of RAM (8GB), so it can’t run anything more complex either. And it’s more unintuitive too, with all these slow and weird gestures just to do basic things, while other features are convoluted.


    And I used to be a massive iOS fanboy. I just want my jailbroken iPhone 5 back :(




  • I don’t get this at all.

    • I’m horrifically lonely. I’m basically a shut in. I need a hug.

    • I’ve got serious rejection anxiety, among other problems. But I used to have a lot of friends in the past, which makes it even worse.

    • I… get attached to characters, like some in TV or wholesome NPCs in games. I wanna hug them.

    • I’m a local ML enthusiast. I’ve been tinkering with chatbots before ChatGPT was even a thing, including “RP” finetunes.

    I’m the target market.

    …And I don’t see the appeal of this?

    It’s the same reason I don’t get why guys pay to gawk at (excuse me if this is insensitive) people twerking on OnlyFans or Instagram or whatever. There’s no human connection. Do guys really think thumbs-upping some performer behind a screen is emotional attention?

    Same with chatbots. They can be cerebrally interesting story generators, sometimes, sure. Even an interesting ‘mirror’ to bounce private thoughts off of.

    But a girlfriend?

    I really don’t get it. I can’t empathize with that.

    I don’t get the world at all, and how billions of people seem to think they’re in some kind of two-way human relationship with influencers or (apparently) chatbots. Where’s the connection?


    And yeah, I saw this is an ad, but the idea itself is still weirdly interesting.





  • Oh wow, that’s awesome! I didn’t know folks ran TDP tests like this; I just knew my old 3090 seems to have a minimum sweet spot around that same ~200W based on my own testing, but I figured the 4000 or 5000 series might go lower. Apparently not, at least for the big die.

    I also figured the 395 would draw more than 55W! That’s also awesome! I suspect newer, smaller GPUs like the 9000 or 5000 series still make the value proposition questionable, but you still make an excellent point.

    And for reference, I just checked, and my dGPU hovers around 30W idle with no display connected.
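    For anyone who wants to try the same kind of power-limit sweep on an NVIDIA card, stock nvidia-smi is all it takes (the 200 W figure here is just this thread’s example, not a recommendation; the allowed range depends on the card):

    ```shell
    # Query current draw and the card's min/max power-limit range
    nvidia-smi -q -d POWER

    # Enable persistence mode so the setting sticks between processes
    sudo nvidia-smi -pm 1

    # Cap the board at 200 W (resets on reboot)
    sudo nvidia-smi -pl 200
    ```

    Then run the same benchmark at a few different caps and watch where performance-per-watt stops improving.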