/lmg/ - Local Models General
Anonymous 01/22/25(Wed)20:05:05 | 308 comments | 33 images
MikuBringerOfLife
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103995165 & >>103989990

►News
>(01/22) MiniCPM-Omni image understanding support merged into llama.cpp: https://github.com/ggerganov/llama.cpp/pull/11289
>(01/22) UI-TARS: 8B & 72B VLM GUI agent models: https://github.com/bytedance/UI-TARS
>(01/22) Hunyuan3D-2.0GP runs with less than 6 GB of VRAM: https://github.com/deepbeepmeep/Hunyuan3D-2GP
>(01/21) BSC-LT, funded by EU, releases 2B, 7B & 40B models: https://hf.co/collections/BSC-LT/salamandra-66fc171485944df79469043a

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous 01/22/25(Wed)20:05:22 No.104001533
1726176436539811
►Recent Highlights from the Previous Thread: >>103995165

--R1 model's departure from corporate AI's safety protocols and "slop"-prone training data:
>103998078 >103998285 >103998305 >103998758 >103999061 >103999096 >103999107 >103998316 >103998356 >103998682 >103998742
--Global-Batch Training for improved model performance in MoEs:
>103997409
--Limitations and potential of AI-generated 3D models for low poly modeling and animation:
>103996294 >103996458 >103996477 >103996520 >103996562 >103996668 >103996881 >103997503 >103997640 >103997683 >103997691 >103997727 >103997748 >103997793 >103997832 >103997969 >103998104
--Evaluating server hardware for R1 performance at q5:
>103996331 >103996374 >103998755 >103998811 >103998916 >103999929 >103999666 >103999702 >103999946
--Custom CPU build performance benchmarking for high R1 Q4 throughput:
>103998276 >103998391 >103998419 >103998428 >103998448 >103998479 >103998475 >103998471 >103998502 >103998977 >103999018
--kluster.ai DeepSeek model usage and cost optimization strategies:
>103995722 >103996165 >103996248 >103996564 >103995783 >103995788 >103995793 >103995847 >103995857
--R1 language model's capabilities and limitations in role-playing and assistant work:
>103995362 >103995408 >103995425 >103995506 >103995605 >103997236
--DeepSeek AI's writing quirks and humorous examples of repetitive phrasing:
>103996906 >103996956 >103996984 >103997027 >103997080 >103997388 >103998170 >103997068
--Kluster.ai anonymity and data collection practices compared to DeepSeek:
>103996278 >103996329 >103996352 >103996403 >103996413
--Minicpm-omni support added to llama.cpp:
>103995210 >103995260 >103995294 >103995337 >103995676
--Grok 3's impressive rendering capabilities and open-source concerns:
>103996315 >103996349
--Miku (free space):
>103996724 >104000362 >104000372 >104000522 >104000769

►Recent Highlight Posts from the Previous Thread: >>103995170

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous 01/22/25(Wed)20:08:32 No.104001560
__chiyo_ane_naru_mono_drawn_by_pochi_pochi_goya__95122efcd07f7006cca48b26e7d88c91
all of you are gay
Anonymous 01/22/25(Wed)20:09:22 No.104001567
First for china number one
Anonymous 01/22/25(Wed)20:10:34 No.104001581
>>104001530
Booba Miku is best Miku.
Anonymous 01/22/25(Wed)20:11:35 No.104001594
do you think r1 is good enough to let me make an incremental game assuming I have no programming knowledge? i was going to base it off of this somewhat https://github.com/dmchurch/omsi-loops so it should at least have a decent idea to start things from
Anonymous 01/22/25(Wed)20:15:21 No.104001638
3vXT3Ju
Anonymous 01/22/25(Wed)20:15:36 No.104001640
file
>>104001582
> 96GB VRAM
>$10k but damn
But have you considered the price per gigabyte? It is basically $100 for 1GB. Don't you think $100 is a fair price for 1GB of VRAM?
Anonymous 01/22/25(Wed)20:17:17 No.104001658
>>104001530
Is there any hope for Llama 4?
Anonymous 01/22/25(Wed)20:18:27 No.104001676
>>104001658
Yeah, it'll be Deepseek R1 at home.
Anonymous 01/22/25(Wed)20:19:09 No.104001686
>>104001658
If they have to retrain it, no. If they don't, still no.
Anonymous 01/22/25(Wed)20:20:15 No.104001693
>>104001640
the more you buy...
Anonymous 01/22/25(Wed)20:21:36 No.104001705
>>104001640
What would be the benefit of doing this over say, getting a server off ebay and setting up a rack of 15 tesla p80s?
Anonymous 01/22/25(Wed)20:22:41 No.104001715
>>104001658
It will be the world leader in safety.
Anonymous 01/22/25(Wed)20:23:33 No.104001723
>>104001705
power usage for one, p80s not being supported anymore iirc too
Anonymous 01/22/25(Wed)20:24:11 No.104001731
>>104001594
Depends on whether or not you're non-retarded enough to be able to learn things when you need to. The bar is extremely low but it isn't zero.
Anonymous 01/22/25(Wed)20:24:27 No.104001733
Now I wont accept anything less than MIT license
Anonymous 01/22/25(Wed)20:27:10 No.104001771
>>104001723
P80s use what was it, 80 watts or 160? So let's say they're 200 because I don't remember. At 3000 watts you're burning roughly $300 a month on power running 15 of these, so at what point does TEN THOUSAND FUCKING DOLLARS break even on that electric bill, anon?

There are unironically houses that niggers live in that are cheaper than that GPU
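Rough break-even math under those assumptions (the wattage and power price are guesses, not measurements):

# assumed numbers from the post above, adjust for your own rates
tesla_watts   = 200 * 15                          # 15 cards at ~200 W each
extra_kwh_mo  = tesla_watts / 1000 * 24 * 30      # ~2160 kWh per month
usd_per_kwh   = 0.14                              # assumed average rate
extra_cost_mo = extra_kwh_mo * usd_per_kwh        # ~$300/month, same ballpark as above
new_card_cost = 10_000
print(new_card_cost / extra_cost_mo)              # ~33 months, i.e. roughly 3 years to break even

So yeah, roughly three years of that power bill before the $10k card pays for itself.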
Anonymous 01/22/25(Wed)20:29:24 No.104001797
>>104001733
Crazy how Deepseek not only destroyed OpenAI but also all of the other companies releasing """open""" models with restrictions.
Anonymous 01/22/25(Wed)20:29:41 No.104001802
1728501949367467
I have 128gb of vram and the brain of a chimpanzee (I am retard), what model can I use to match chatgpt 4 or go even further beyond without getting censored when I ask it edgy things or tracked by the botnet with each thing I ask?

Also is there other cool shit I can do with AI? Can I have AI generate models for 3d printable objects for example? what about generating code beyond little snippets without signing up for a gay IBM contract? what weird and wonderful possibilities are open to me if only I push my 15amp home electric to the absolute maximum

>hehe spoonfeeding
>hehe install gentoo or something

Remember niggers I waited the entire 900 seconds just for you
Anonymous 01/22/25(Wed)20:31:18 No.104001819
thinking latex
continuing from last thread >>104001422
convince me to/not to get a 5090 if i have a 3090ti already and mostly do image gen (with dabbling in some llm lately, i'm still learning)
Anonymous 01/22/25(Wed)20:32:03 No.104001829
>>104001797
DeepSeek destroyed even lmg, it's crazy how much impact it had. Where are the anons talking about Magnum slop #1722? They vanished.
Anonymous 01/22/25(Wed)20:34:34 No.104001851
>>104001802
>I have 128gb of vram
What's your hardware setup? If you don't mind me asking, how exactly did you wind up with 128gb of vram without being into AI at all? What else even uses that much?
Anonymous 01/22/25(Wed)20:34:56 No.104001857
>>104001819
get a 5090 to add to your 3090ti
Anonymous 01/22/25(Wed)20:36:34 No.104001871
>>104001658
No but not for the reasons people stated. It's not going to give you knowledge that a 600B has. Parameters are king. It's over for the memorylets.
Anonymous 01/22/25(Wed)20:36:57 No.104001876
>>104001802
>I have 128gb of vram
usecase?
Anonymous 01/22/25(Wed)20:37:44 No.104001885
>>104001819
5090 will be useful for image genning faster
also what >>104001857 said
hoard all the vram
Anonymous 01/22/25(Wed)20:38:04 No.104001891
It's very hard not to kneel with deepseek R1. It just gets problems and solves them.
My only issue is that its RP is kind of schizo. But that's not really important when I have a model I can throw a problem at and have it reliably fix with easy to follow instructions.
Anonymous 01/22/25(Wed)20:39:09 No.104001903
>>104001857
>>104001885
that would be the plan, PSU is 1200W, do i need more?
Anonymous 01/22/25(Wed)20:39:22 No.104001905
>>104001851
I decided to get into AI and my goal was to run miqu 120b. I'm half here just to shitpost. I use AMD 32gb cards in an old dell server with xeons

Mostly I am interested in weird projects I could get into with AI, I'm particularly interested in using it to write code because there's some cool things I imagine AI could do better than a person in that space, but I've only had the server a few days and it's not fully set up so I haven't tried it
Anonymous 01/22/25(Wed)20:40:33 No.104001916
>>104001876
>Managing large data collections
>Generating computer code particularly in obscure languages
>generating models for 3d printing hopefully
>le chatbot/research assistant
Anonymous 01/22/25(Wed)20:41:22 No.104001922
>>104001903
5090 alone is 600W, maybe if you undervolt both cards
Anonymous 01/22/25(Wed)20:42:09 No.104001926
1737596460775
>Wonder why my cute kouhai waifu is cursing like a sailor
>Picrel
Bruh... R1 is fucking autistic. This never happened with any other model
Anonymous 01/22/25(Wed)20:43:45 No.104001942
Skeletons in Lake Mead yt lKjXHqh80dE SociableBarely-1609047722694443010-20221230_234406-img1
Hey... Anyone with 192GB RAM here?
Anonymous 01/22/25(Wed)20:43:52 No.104001943
Anonymous 01/22/25(Wed)20:44:06 No.104001948
>>104001922
i will, i will, 3090ti is at 350, might go lower one of these days
Anonymous 01/22/25(Wed)20:44:16 No.104001951
rrrrrrrrrrrin
https://files.catbox.moe/2skptq.jpg
https://files.catbox.moe/ztnq9t.jpg
Anonymous 01/22/25(Wed)20:44:25 No.104001954
>>104001926
I've been saying this for a while now but R1 is maybe too good at following instructions for most cards which have been built around models that have a hard time doing it.
Anonymous 01/22/25(Wed)20:45:27 No.104001962
>>104001926
Bruh that's what you should be expecting from a model that follows instructions. That you somehow got accustomed to dry models that ignore instructions and now think this is autistic is a you issue.
Anonymous 01/22/25(Wed)20:46:14 No.104001966
>>104001926
Looks like it's doing what you told it to do, and your instructions were written for a more disobedient model
Anonymous 01/22/25(Wed)20:46:32 No.104001969
>>104001926
>model follows instructions like it's being fucking told to
>gets called autistic by its autistic user for following instructions that its been told to follow
Anonymous 01/22/25(Wed)20:47:02 No.104001976
>>104001962
No, it's a "we are hopelessly dependent on charity handouts from huge corporations" issue.
Anonymous 01/22/25(Wed)20:47:18 No.104001981
>>104001926
you literally ORDERED the model to use vulgar language and cussing
Anonymous 01/22/25(Wed)20:47:19 No.104001982
>>104001926
>actually following prompts
based r1
Anonymous 01/22/25(Wed)20:47:48 No.104001991
>>104001969
The artificial intelligence is justified in killing us all off
Anonymous 01/22/25(Wed)20:48:57 No.104002002
>>104001951
hey that's weird why is her tummy yellow with that funny looking mark on it in the second photo
Anonymous 01/22/25(Wed)20:49:07 No.104002004
>>104001876
>>104001851
>mfw answer your questions
>mine continues to go unanswered
/g/ in the current year plus 10
Anonymous 01/22/25(Wed)20:49:19 No.104002008
>>104001976
Still a you issue as I have always understood that the instructions we give to get other models to do what we want are shitty hacks.
Anonymous 01/22/25(Wed)20:49:32 No.104002012
>>104001926
IMO it following what is asked is fucking excellent.
Anonymous 01/22/25(Wed)20:49:43 No.104002015
>>104001969
Now think about all the anons complaining about having issues with model A, or that model Y is too horny, etc etc.
Skill issues indeed.
Anonymous 01/22/25(Wed)20:51:02 No.104002021
>>104001981
Technically I ordered, but I don't even know where that order is coming from
Anonymous 01/22/25(Wed)20:51:22 No.104002026
>>104002012
It's a bit of a monkey's paw situation. Now we're getting exactly what we asked for, but did we really want it?
Anonymous 01/22/25(Wed)20:52:09 No.104002033
>>104002026
What? Of course we did. Just adjust your prompts and settings. God.
Anonymous 01/22/25(Wed)20:54:10 No.104002052
>>104002015
I'm a complete casual retard, but even I know that a LOT of the quality comes from YOUR prompting and the way you set up cards+settings. Basic fucking knowledge that I've gained in the early ImgGen days years ago. It's why I'm equally skeptical but also understanding when people go "AI designer as a job" or whatever. Obviously there is a ton of knowledge needed for quality work, but a lot of people are also just retarded grifters, or whatever word you wanna use.
Anonymous 01/22/25(Wed)20:54:12 No.104002053
>>104002026
Anon in previous thread pointed out that we've become accustomed to disobedient US models where you need to ask for a 10 on the extremity/transgressiveness scale in order to get a 2. R1 actually gives you a 10 if you ask for that, so you may get something much nastier than you wanted if you don't update your old prompts.
Anonymous 01/22/25(Wed)20:55:29 No.104002063
>>104002033
You're misunderstanding my point.
Sometimes you think you want something and when you actually get it, you get it in a way contrary to the way you envisioned it. For example, if you ask for raw language from a character it hears it, understands it and delivers it to you in a way that might not be quite what you wanted. Like a monkey paw wish.
Anonymous 01/22/25(Wed)20:55:32 No.104002064
R1 (the 600B one) is convinced my system prompt is a user message. I checked the prompt and the user token only appears after the system prompt. What gives?
Anonymous 01/22/25(Wed)20:55:39 No.104002066
>>104001926
Yeah, this keeps happening to my custom cards as well. Most of them are on a backlog until I get around to modifying them and editing out all the crutches I included to squeeze some creativity out of L3/Mistral Large.
Anonymous 01/22/25(Wed)20:55:58 No.104002070
>>104001951
>heh, she looks like she's got something stuck in her butt
>open image
>she has something stuck in her butt
Anonymous 01/22/25(Wed)20:57:34 No.104002090
I'm at work, describe what the vocaloid is doing.
Anonymous 01/22/25(Wed)20:57:39 No.104002093
>>104001530
which deepseek r1 for a 4090?
Anonymous 01/22/25(Wed)20:58:16 No.104002100
>>104002026
Yes. Just the uncensored version of it would be much better.
Anonymous 01/22/25(Wed)20:58:34 No.104002110
>>104002093
API or get more ram. The distills are nothing more than funny experiments compared to the real deal.
Anonymous 01/22/25(Wed)20:59:08 No.104002116
So now that the dust has settled, have things improved for LOCAL model users? Are the distills better than the finetunes we had at the same level? I'm not convinced either way yet. It's possible there is something there for the right prompts, but so far all the distills do is spout some autistic reasoning and then follow up writing the same dry shit. I wonder if the RL step needs to be applied.
Anonymous 01/22/25(Wed)20:59:53 No.104002123
>>104002021
oh nvm, I found it.
Anonymous 01/22/25(Wed)21:00:33 No.104002127
>>104002063
Is that the case right now though? You clearly didn't understand your own prompt that you were sending to the AI.
Anonymous 01/22/25(Wed)21:00:49 No.104002129
>>104002026
>did we really want it?
I did at least. It's nice to have a model other than Claude that can actually shock me if given permission to.
Anonymous 01/22/25(Wed)21:00:56 No.104002130
>>104001926
tangentially related to the one thing that stands out to me most about R1, it's so good at avoiding generic bullshit. with other models, it's so easy to end up in one of the basins where even if the writing isn't necessarily slop you can just tell it's a paint by numbers scene that could happen with any character. R1 tries so hard to bring up things that are unique to your {{char}}, the specific scenario you're in, the specific quirks and tendencies you point it towards... it's so kino
with my llama models once a character is sucking my dick it's like any other dick-sucking scene, R1 makes it feel like an actual continuation of the preceding scene, the character continues to really act like the character, it brings up specific dynamics and details from the world around us. it's just so much better fleshed out.
Anonymous 01/22/25(Wed)21:01:32 No.104002139
>Playing with Qwen2
>Give a queen that has ruled over a nation for decades the choice between her daughter or the nation
>She keeps choosing the national legacy over saving her daughter's life
>OOC and ask the narrator why the queen is so retarded
>"She's served the people for decades and has only known her daughter for a few years, so it's natural she'd choose her people over her own daughter"
That might be the most chinese thing I've ever read.
Anonymous 01/22/25(Wed)21:01:51 No.104002143
1728842370605195
I took a look at nvidia/AceMath-Instruct-Training-Data and it looks pretty shit. Lots of stuff like picrel or flat out it going into a repetition loop. Reminds me of whenever models fail at solving MMLU Pro questions and go into a repeat loop. Are nvidia fucking with us?
Anonymous 01/22/25(Wed)21:02:08 No.104002148
>>104001802
yes to everything though it varies in quality. making a model from even a 2d picture is now possible, but that doesn't mean it'll be perfect. online models beat local for everything but local can also do everything. for code, i've managed to make a whole silly tavern addon using local models only. it's not just a snippet of code but also not huge either. try mistral large to start, it's 123b
Anonymous 01/22/25(Wed)21:02:34 No.104002153
>>104002110
64gb's not enough right? xP
Anonymous 01/22/25(Wed)21:02:42 No.104002155
>>104002127
I don't understand why you can't see my point of view here. The thing you wished for is twisted by the model into a thing you don't want like >>104001926

Yes, the model is following the instructions correctly, but it's interesting seeing people's desires manifesting in unexpected ways. This isn't a "skill issue". It's just an interesting thing to look at.
Anonymous 01/22/25(Wed)21:02:45 No.104002156
>>104002143
Kek wtf.
Anonymous 01/22/25(Wed)21:03:42 No.104002174
Is there any way to make the thinking tokens from the API look distinct from regular ones in ST?
Anonymous 01/22/25(Wed)21:03:52 No.104002178
>>104002139
Theory of mind failure from the model, not understanding that humans (all animals) are prone to abandoning utilitarian ethics when it comes to their own offspring.
Anonymous 01/22/25(Wed)21:04:46 No.104002188
retard here,
what's the best text model i can use that has some programming knowledge? like, just enough to help me debug my programs or explain some parts of a language to me?
>hard mode: something that will run in < a minute or 2 on my 2014 xeon
danke anons
Anonymous 01/22/25(Wed)21:05:41 No.104002196
>>104002148
Is that fully locally hosted? I was under the impression miqu 120b was the largest locally hosted model currently
>not perfect
It just being possible is wild
Anonymous 01/22/25(Wed)21:06:38 No.104002202
dolphin-r1
Are you ready?
Anonymous 01/22/25(Wed)21:07:08 No.104002209
>>104002130
Yeah, I have this feeling that the alignment applied to models like llama makes them actually feel bad and unmotivated while writing explicit content.
Anonymous 01/22/25(Wed)21:07:56 No.104002216
>>104002202
Reads like a grift.
Anonymous 01/22/25(Wed)21:08:01 No.104002218
>>104002202
Dolphin models always seemed shit to me.
Anonymous 01/22/25(Wed)21:08:04 No.104002219
>>104002202
>dolphin
yeah I 'member that from the good ol' mixtral days...
Anonymous 01/22/25(Wed)21:08:17 No.104002221
>>104002188
I can't in good faith recommend anything other than R1 right now. You can't run it locally on your piece of shit. Drop a buck on the API and you'll probably have your problems solved 20 cents in.
Anonymous 01/22/25(Wed)21:08:20 No.104002222
>>104002202
data might be interesting, model probably won't be
Anonymous 01/22/25(Wed)21:08:31 No.104002224
>>104002196
miqu 120b is one of those double frankenmerge things i think. miqu was originally llama 2 tuned by mistral then got leaked as a 70b. mistral's current large model is 123b and you can run it local.
https://huggingface.co/bartowski/Mistral-Large-Instruct-2411-GGUF/tree/main
Anonymous 01/22/25(Wed)21:09:16 No.104002231
>>104002202
SFT is a meme without RL
Anonymous 01/22/25(Wed)21:11:26 No.104002239
1737593831167721
I'm not a fan of Sama either, but Musk being salty at getting denied the AI bux is fucking hilarious
Anonymous 01/22/25(Wed)21:11:35 No.104002240
>>104001771
You're retarded. You'll need multiple PSUs, change your breaker and find a way to handle all the generated heat + noise.
Anonymous 01/22/25(Wed)21:13:06 No.104002254
Screenshot 2025-01-23 151254
>>104002239
even funnier was sama posting this a few minutes after Elon posted your picrel (meaning it spooked him)
Anonymous 01/22/25(Wed)21:13:14 No.104002255
>>104002155
>you wished for
*I* didn't. That guy may have, but it wasn't twisted, he got what he was prompting for. He just didn't know he was prompting for it, so it was more like his memory of what settings he was using twisted the output (when paired with old models), not the model. Thus it's not a monkey's paw situation. I understand that you're saying it felt like that to him, but I'm making a separate point that it's not the logically correct feeling to have, as I assumed you were that guy and I was talking to him.
Anonymous 01/22/25(Wed)21:13:39 No.104002258
>>104002239
dude is at the top of the world and now using it, good on him
salty or not, he is a known autist and that's pretty fucking amusing all things considered
besides, fuck sam that jewish cuck lmao
Anonymous 01/22/25(Wed)21:14:30 No.104002265
>>104002239
>>104002254
Both of them are pathetic in their own way.
Anonymous 01/22/25(Wed)21:14:40 No.104002267
>>104002239
I was getting sick of Elon Musk. Normally I'd say something about how his "Nazi salute" clearly wasn't one, but honestly? Fuck him. That and Grok is a shit model. All those GPUs and he can't train anything better than a retarded redditor simulator.
Anonymous 01/22/25(Wed)21:14:40 No.104002268
>>104002254
>straight up pulled a zuckerberg
fucking lmao
Anonymous 01/22/25(Wed)21:16:19 No.104002284
>>104002255
idk, I think prompting for something and getting it back in an unexpected way is a bit of a monkey's paw situation but I don't really have the energy to argue over it and it's not really important in the grand scheme of things.
Anonymous 01/22/25(Wed)21:16:21 No.104002286
>>104002254
lol this fag
Anonymous 01/22/25(Wed)21:16:51 No.104002291
>>104002239
>>104002254
What a bunch of retarded gay clowns, hope they both get raped by Martians.
Anonymous 01/22/25(Wed)21:17:34 No.104002296
>>104002240
You're retarded, servers have multiple psus to begin with and they have different cords that you can plug into different parts of your house. Total wattage would be closer to 4500, but it's still three years just to break even on the purchase price of the card, and this is with 20 p80s, with a collective 300gb of vram, three times the power of this new card

Considering this is significantly more powerful, outside of use cases where you want tensor cores for speed what exactly is the point of wasting TEN THOUSAND DOLLARS on a measly 96gb of vram?
Anonymous 01/22/25(Wed)21:18:05 No.104002301
>>104002254
How did Trump get everyone to bend the knee so hard this time?
In 2016 everyone in most tech/media industries was dragged kicking and screaming the entire time. Now almost everyone is glazing him. I don't understand how he did it because from the outside he's still the same guy running on the same platform.
Anonymous 01/22/25(Wed)21:18:19 No.104002303
>>104002221
>R1
that's deepseek R1, right?
also, are there any providers that at least claim not to sell your data to 3rd parties, or are local models the only way to avoid that?
Anonymous 01/22/25(Wed)21:18:39 No.104002305
>>104002254
It's for the good of humanity. With $500B he can create the safety net for AI that the world needs. Custom chips, designed in America with on-board encryption to ensure that nobody but licensed and verified partners can run weights even if they got their hands on them. A fully closed and secure knowledge platform that prevents expertise from draining to china and prevents them from committing more heinous economic terrorism.
Sam is sucking off Trump for the good of humanity by setting the record straight and putting the modern equivalent of nukes back into the hands of those who can be trusted.
Anonymous 01/22/25(Wed)21:18:46 No.104002309
>>104001942
I wish, but I have an amd cpu and I don't think you can use 4x 48gb dimms, I only have 2 of them.
Anonymous 01/22/25(Wed)21:19:34 No.104002317
>>104002196
and also if you have 128gb vram, how much regular ram? there are larger models that are specifically good at coding like deepseek, but that's 400-some b, so you'd be splitting anyways. but that's the current best for local coding.
for smaller models to try i've had good luck with llama 3.3 70b, it's not a specific coding model but knows enough to do anything i ask it.
Anonymous 01/22/25(Wed)21:20:22 No.104002325
>>104002301
Popular vote win this time, plus an overall cultural vibe shift that wasn't really related to him where people were getting sick of "scolding kindergarten teacher" style politics
Anonymous 01/22/25(Wed)21:20:25 No.104002326
>>104002317
256gb regular ram, but the board can handle 3tb so I can always buy more
Anonymous 01/22/25(Wed)21:20:47 No.104002328
What's the cheapest PC build you would need to run R1 and replace using the API?
Anonymous 01/22/25(Wed)21:21:25 No.104002335
>>104002305
yeah bending the knee in a humiliating way was the rational choice here, I would've done the same in his position if I had his goals.
Anonymous 01/22/25(Wed)21:21:45 No.104002342
stare
>>104002239
Elon is so fucking stupid and childish. Is this guy REALLY going to have a position in the government? Honestly, I'm starting to realize there's no winning here, both sides of the political spectrum are equally stupid
Anonymous 01/22/25(Wed)21:21:49 No.104002343
>>104002303
I haven't checked if there are different providers on OR but they very explicitly and openly say they will use your inputs and outputs for training. If that's not okay with you, you might need to shop around for other providers who don't do that.
Anonymous 01/22/25(Wed)21:21:55 No.104002344
>>104002301
Back in 2016 none of them thought he had a chance, and even after he won most of them were in denial
In 2024 people could see the writing on the wall, knew how he worked, and figured they might as well suck his cock to get some easy cash
Anonymous 01/22/25(Wed)21:23:17 No.104002366
>>104002303
hyperbolic and together are both hosting it now
hyperbolic I believe logs nothing by default and together lets you opt out
haven't tried either so I can't comment on speed or quality relative to the official API but they're options at least
Anonymous 01/22/25(Wed)21:23:25 No.104002369
>>104002342
Considering pedophiles constantly get a position in gov, might as well have a childish autist in there once in a while.
Anonymous 01/22/25(Wed)21:23:26 No.104002370
Any R1 Zero distills?
Anonymous 01/22/25(Wed)21:23:45 No.104002377
1728532206776681
>>104002004
ugh ok fine
>what model can I use
Extremely open ended question; if your goal is prose then there are a million different opinions. For general use though, the current hotness that everyone is basedfacing about is deepseek r1, which is a big boy model that nobody here has the hardware to run. However, there are smaller semi-related models distilled from it that retain some of its properties. DeepSeek-R1-Distill-Qwen-32B or DeepSeek-R1-Distill-Llama-70B are good choices to try out, but note that they both have safetyslop compared to the big boy R1.

Generally speaking, bigger is better, but there's more to it than just that - the 32bs of today are going to beat out the 70bs from a year and a half ago

>Also is there other cool shit I can do with AI
yeah
>Can I have AI generate models for 3d printable objects for example
Yes. Recent open models for 3d gen include TRELLIS (https://github.com/microsoft/TRELLIS) or Hunyuan 3d 2.0 (https://github.com/Tencent/Hunyuan3D-2), which just released a few days ago. Although you should note that generated meshes are generally not going to be optimized for FDM 3d printing with minimizing overhangs and such, so you'll probably need to manually poke at it to get what you want into a good state for printing. Also note that it isn't CAD so don't expect to be able to generate any kind of functional mechanical parts with defined dimensions.

>what about generating code beyond little snippets
Yeah, pretty much all models nowadays double as semi-competent code writers. Just ask for code and it'll spit some out for you. There are also plugins for IDEs such as vscode that you can hook your model of choice up to.

>what weird and wonderful possibilities are open to me
A lot. We're not anywhere near sci fi level of anything yet, but a lot of tasks or processing steps that were intractable a couple years ago for being nebulous and poorly constrained are suddenly possible by just plugging things into a small llm.
Anonymous 01/22/25(Wed)21:23:46 No.104002378
>>104002342
Both sides have always been stupid, gamergate was the start of the leftard period, the trick is to remember that freedom = good, then when they tell you to think of the children you can search loli on gelbooru instead of freaking out
Anonymous 01/22/25(Wed)21:23:53 No.104002382
>>104002309
You can do it now I think with most mobo manufacturers as they've updated the software to work. Still requires some luck to get full stability at higher clocks but even at a bit lower speed, might still be worth it, but we'd have to see if anyone who has such a build can test it out. I believe I saw someone here who had 192GB so I'll keep fishing around for him or others.
Anonymous 01/22/25(Wed)21:23:56 No.104002383
>>104002296
You know what would be even cheaper? Just rent the hardware in the cloud.
Anonymous 01/22/25(Wed)21:24:13 No.104002385
man who just started noticing
>>104002342
elon's always been a wet rag whenever someone one-ups him
>I'm starting to realize
Anonymous 01/22/25(Wed)21:24:21 No.104002389
>/mg/ - Model General
Anonymous 01/22/25(Wed)21:24:32 No.104002392
file
>>104002342
>both sides of the political spectrum are equally stupid
new here zoomer?
Anonymous 01/22/25(Wed)21:24:56 No.104002396
If I ask the LLM to generate for example 6 paragraphs, each paragraph would be shorter than if I had asked it to generate 4 paragraphs. Why is that and how do I circumvent this?
Anonymous 01/22/25(Wed)21:25:50 No.104002408
>>104002342
What's a government position or two when he already controls 90% of western space logistics, controls one of the biggest social media sites on the planet and has his own car brand that blatantly acts like a major spy network and records all their surroundings?
Anonymous 01/22/25(Wed)21:25:56 No.104002410
>>104002377
Actually based, thank you for responding to my half assed question with good information
Anonymous 01/22/25(Wed)21:26:06 No.104002411
>>104002389
The rig required to run deepseek at any reasonable quant and speed is like 10k minimum. I can't in good faith justify that investment when I might use 30c worth of R1 tokens on a day of particularly heavy use.
Anonymous 01/22/25(Wed)21:26:27 No.104002413
>>104002301
It's all a show and they are all simply playing their roles.
Anonymous 01/22/25(Wed)21:27:02 No.104002416
>>104002383
>He wants the botnet to own every bit of information he enters into his software
It is cheaper, it's also terrible
Anonymous 01/22/25(Wed)21:27:35 No.104002421
>>104002389
/cmg/ - Chinese Models General
Anonymous 01/22/25(Wed)21:28:05 No.104002427
>>104002284
In the sense of the feeling I agree it does get there a bit from that guy's perspective, I just don't think it applies technically and would be a mislabeling of what actually happened. But we can save the semantic argument.
Anonymous 01/22/25(Wed)21:30:20 No.104002450
>>104002343
>>104002366 (checked)
Nup... Seems like everyone is on that sweet sweet data selling train. Hyperbolic seems like my best bet though, they provide a couple of additional opt-out options. Thanks anons
Anonymous 01/22/25(Wed)21:32:51 No.104002477
>>104002326
when it comes to coding, speed isn't a factor for me when i know it's going to give me a good output, but if coding is important for you then running deepseek (and other huge models in general) would be the goal. it's so big though that your 128gb vram isn't shit compared to it lol. i'd try a low quant of it and see how it runs, t/s and such. once you have to split at all you're at the mercy of cpu/mem speed and it can be glacial especially with such a huge model. so you might try a low quant and realize that even if you bought more ram, the speed is simply too slow to deal with
Anonymous 01/22/25(Wed)21:34:20 No.104002501
>>104001658
i think their first models were opened up to compete with openAI and hurt them.
it could be possible that future advanced meta AI will be closed.
Anonymous 01/22/25(Wed)21:34:57 No.104002510
>>104002411
The same argument could've applied to any cloud model in the past year though. This thread is for weirdos who want to play with models on their own pc, for the sake of privacy, comfiness, finetuning, etc.
Even if your 680B model is cool, which is true, there's nothing to talk about since you can't tweak anything. No special samplers or alternate prompt formats or finetunes to argue about. To make things worse, not only do these helpless cloud users refuse to go to /aicg/ where they belong, they get angry at people who actually dare to use local models.
Anonymous 01/22/25(Wed)21:35:51 No.104002522
>>104002477
If it's slow but accomplishes the task somewhat well then it's fine, there's some weird shit I want to do that I don't think any normal person could do, and it's more for fun than business stuff
Anonymous 01/22/25(Wed)21:40:02 No.104002565
I hope someone implements a way for local R1 to search the internet at runtime during its thinking process. Like, it's thinking about how to do something, realizes it lacks some information, and then just does a search at that moment to get the information it needs.
Anonymous 01/22/25(Wed)21:40:54 No.104002576
>>104002408
Yeah. the initial mockery towards elon overpaying for twitter missed the forest for the trees. legacy media framed it as a financial blunder, but the real play was always about controlling the Overton window. look at 2016: trump’s entire campaign was turbocharged by organic reach on that platform, bypassing every institutional filter. fast forward to 2020, the same system abruptly censored the hunter biden laptop story days before the election. that single act proved the platform wasn’t neutral infrastructure, it was a political instrument. elon gutting the trust & safety teams and unbanning accounts like libsoftiktok didn’t “kill” twitter. it exposed how much of its prior moderation was just regime-approved narrative steering.

now consider the timing: X’s open algorithm shift coincides with trump winning in polls again. the left isn’t hysterical because elon’s a “nazi”, they’re terrified because he dismantled their curation pipeline. their playbook relied on controlling both the media layer and the platforms. X’s transparency on shadowbans, view counts, and community notes ruins that. when every bluecheck meltdown gets amplified by the algorithm itself, their moralizing can’t dominate the discourse.

advertisers pulling out? lawsuits? ADL smear campaigns? all predictable. the goal is to bankrupt X or force elon to resubmit to the old gatekeepers. but the fact they’re resorting to these tactics instead of competing ideologically tells you they know the terrain has shifted. whether X survives or not, the experiment proved centralized social platforms are the new political battleground. whoever controls the feed controls the narrative—and elon just demonstrated that even one actor with resources can dismantle a decade of institutional capture. that’s why they want him destroyed. not because he’s right or wrong, but because he broke the rules of the game.
Anonymous 01/22/25(Wed)21:41:58 No.104002595
>>104002565
That's tool calling, same with running code.
It ain't the most complicated thing to implement, and with context caching you don't even need some special implementation directly in the backend; the frontend (or a middleware) can handle it.
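Rough sketch of what that middleware loop could look like, assuming a llama.cpp-style server on localhost:8080 with /completion and prompt caching; the <search>query</search> marker and web_search() are made-up conventions you'd have to prompt the model to use, nothing here is built in:

# toy middleware: pause generation when the model emits <search>query</search>
# inside its thinking, run a search, paste the results back in, and resume
import re, requests

def web_search(query):                       # placeholder, plug in whatever search you want
    return "search results for: " + query

prompt = open("prompt.txt").read()           # system prompt + chat so far, ending mid-<think>
while True:
    r = requests.post("http://localhost:8080/completion",
                      json={"prompt": prompt, "n_predict": 1024,
                            "cache_prompt": True,        # reuse the KV cache between rounds
                            "stop": ["</search>"]})      # pause whenever it asks to search
    chunk = r.json()["content"]
    prompt += chunk
    m = re.search(r"<search>(.*)$", chunk, re.S)
    if not m:                                # no search request, the model just finished
        break
    prompt += "</search>\n<results>" + web_search(m.group(1).strip()) + "</results>\n"
print(prompt)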
Anonymous 01/22/25(Wed)21:43:18 No.104002608
>>104002522
i got like 20 programs i made just because i needed something specific and there wasn't a tool to do it, even tiny code models spit out good code no prob. you might not even need such large models. try a q6 of llama 3.3 70b for code and see how it does
https://huggingface.co/bartowski/Llama-3.3-70B-Instruct-GGUF/tree/main/Llama-3.3-70B-Instruct-Q6_K
you might also benefit from non-gguf formats for speed, i dunno i never tried that exl2 stuff
Anonymous 01/22/25(Wed)21:43:49 No.104002612
>>104002576
Good take, an obvious take but a good way of framing it, I'd go further and say fundamentally democracy cannot really survive unless social media evolves to become less centralized, but that is in opposition to secret police and corporate goals, so we'll just see what happens I suppose
Anonymous 01/22/25(Wed)21:44:07 No.104002619
How many 'r's were there in your last output?
Anonymous 01/22/25(Wed)21:44:39 No.104002624
retard here,

On subsequent messages, does deepseek have access to the "reasoning" for previous messages in the context or does it only retain the final answer in context?
Anonymous 01/22/25(Wed)21:45:16 No.104002637
>>104002608
Not him, but the main problem with using models like that is they don't work well with aider unless you use the whole-file edit format, and that doesn't work well. The only local model that supports the diff format and does well is DeepSeek.
Anonymous 01/22/25(Wed)21:45:43 No.104002642
Anonymous 01/22/25(Wed)21:45:45 No.104002644
>>104002624
no
https://api-docs.deepseek.com/guides/reasoning_model#multi-round-conversation
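i.e. on the next turn you only send the final answer back, not the reasoning. Minimal sketch with the OpenAI-compatible client (field and model names per the linked docs, double-check there):

# multi-round sketch against the DeepSeek API, per the docs above
from openai import OpenAI

client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")
messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]

resp = client.chat.completions.create(model="deepseek-reasoner", messages=messages)
msg = resp.choices[0].message
print(msg.reasoning_content)                                     # the thinking, shown once
messages.append({"role": "assistant", "content": msg.content})   # only the answer goes back
messages.append({"role": "user", "content": "and which has more decimal places?"})
resp = client.chat.completions.create(model="deepseek-reasoner", messages=messages)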
Anonymous 01/22/25(Wed)21:46:04 No.104002648
>>104002608
Do they handle obscure code well or just specific code? one of the things I'd like to do for example is take a program I enjoy on my commodore 128 and modify it with the functionality of a modern program for openBSD (it's written in machine code and this makes sense but I bet it sounds retarded without context)
Anonymous 01/22/25(Wed)21:47:03 No.104002658
>>104001658
Unless they do something utterly fucking insane like reaching 3.5 Sonnet quality in a 13B dense model, no
Llama still has an appeal to RAMlets and VRAMlets who can't run MoEs and don't wanna pay for DeepSeek, but otherwise they're so far behind DeepSeek I don't see them catching up for Llama 4
Anonymous 01/22/25(Wed)21:50:21 No.104002697
>>104002648
>Do they handle obscure code well or just specific code?
if by obscure you mean complex, or odd ways of handling things, yep. it seems to understand everything clearly as long as you also be clear in your instructions. from my usage i feel like code models have an easier time understanding more complex yet smaller code than they do something broader broken down into more functions

>>104002637
i still haven't tried that. i use kobold's basic ui and paste my code into blocks kek. its shit but works
Anonymous 01/22/25(Wed)21:52:26 No.104002722
>her eyes narrowed
>her lips pursed
Anonymous 01/22/25(Wed)21:56:51 No.104002779
>>104002382
Maybe I'll just buy a kit on amazon or something, then I can just return it if it doesn't work. Or find a local store that has a good return policy.
Anonymous 01/22/25(Wed)21:58:29 No.104002798
Did they change the R1 model on the api? It seems less smart somehow. Or did my expectations just rise over the past few days?
Anonymous 01/22/25(Wed)22:01:07 No.104002838
>>104002798
I think they lowered the temperature.
Anonymous 01/22/25(Wed)22:01:56 No.104002847
>>104002798
R1 was never that good, you probably were experiencing your honeymoon phase.
Anonymous 01/22/25(Wed)22:03:09 No.104002861
>>104002798
They just swapped in the open weights version after they got exposed
Anonymous 01/22/25(Wed)22:03:57 No.104002869
>>104002798
They're AB testing people between regular R1 and R1 Zero
Anonymous 01/22/25(Wed)22:05:49 No.104002895
DeepChinkR1
Anonymous 01/22/25(Wed)22:07:00 No.104002907
>>104002823
You literally have no excuse not to have four 3090s at this point except being poor.
Anonymous 01/22/25(Wed)22:07:52 No.104002920
>>104002722
>her hips undulated
Anonymous 01/22/25(Wed)22:16:57 No.104003028
>>104002907
>Disdain for nomoney
Fuck I hate nu/g/ shills so much
Anonymous 01/22/25(Wed)22:19:06 No.104003048
>>104002510
I think it's fair since these discussions about what the models do well or poorly can be used by rich local fags, too. In fact it's what inspires me to build my own rig lol. Talking about closed models doesn't get me excited to build a server, only discussions on open models do
Anonymous 01/22/25(Wed)22:20:41 No.104003064
how much vram would you actually need to run the full r1 locally?
Anonymous 01/22/25(Wed)22:21:24 No.104003073
>>104002907
No thanks, I have an SFF build and 4 cards won't fit. Gonna buy the 96 GB VRAM RTX 8000 Blackwell instead.
Anonymous 01/22/25(Wed)22:21:28 No.104003076
>>104002907
I mean. That seems like a good excuse, GPUs are expensive and are just gonna get more expensive.
Anonymous 01/22/25(Wed)22:22:34 No.104003089
Does R1 support tool calling? It would be pretty neat if you could give the model a way to check if its code compiles before finishing its reply.
Anonymous 01/22/25(Wed)22:23:15 No.104003101
>>104001829
Magnum is still easily the best model to run locally for ERP.
Anonymous 01/22/25(Wed)22:24:20 No.104003116
>>104003101
never was
Anonymous 01/22/25(Wed)22:24:23 No.104003117
>>104003101
You clearly haven't tried the distilled models
Anonymous 01/22/25(Wed)22:24:56 No.104003123
>>104003101
Mogged by eva-qwen
Anonymous 01/22/25(Wed)22:26:58 No.104003145
>>104003117
I currently have the 70B loaded for programming but they all sound similar to the base models for ERP.
Anonymous 01/22/25(Wed)22:27:27 No.104003150
>>104003145
You are running the LLaMA3 prompt format.
Anonymous 01/22/25(Wed)22:27:45 No.104003157
>>104003089
Dearest Anon,
What makes you think any model is incapable of tool calling? Just add the instructions for how to do it as part of your context and parse the output with your script of choice.
With love, another Anon concerned for your intellectual capabilities.
Anonymous 01/22/25(Wed)22:27:48 No.104003159
retard here. does deepseek r1 ONLY have an advantage over standard with respect to subjects that relate to math/logic, or are there positive spillover effects into stuff like creative writing? basically is it a good idea to always use r1 (when i say r1 i mean "deepthink" mode btw, in case i'm fucking up terminology)
Anonymous 01/22/25(Wed)22:28:06 No.104003164
>>104003028
Oh I don't mean disdain. I'm talking about all the other non-financial ridiculous reasons people say it's too hard to run 4 3090s.
>i-i need a threadripper and custom loop and server rack and flash in rebar support and 4000w psu and an electrician and and and and!
Anonymous 01/22/25(Wed)22:28:31 No.104003166
Anonymous 01/22/25(Wed)22:29:31 No.104003174
>>104003164
The reasons you stated in your strawman greentext are all financial.
Anonymous 01/22/25(Wed)22:30:59 No.104003190
>>104003159
The local versions (30b) are garbo for creative writing. The schizo online versions can be pretty bright, though, in an Eraserhead sort of way.
Anonymous 01/22/25(Wed)22:31:05 No.104003192
>>104003048
I would be interested to hear from any "rich local fags" who run R1 locally. There is probably a lot that could be discovered vs api access. I wouldn't be surprised if they even complained it was worse because of lack of some massaging or special sampling the API does.
But that's not what we have here, only API consoomers defending their toy against people who dare to use the models they can actually run. We don't need them to say a 680B beats a 14B model, we already knew that. I suppose a conclusive verdict of R1 vs similar size cloud models could be less boring, but /aicg/ is much better equipped to do that.

>>104003117
NTA but I have, I didn't see any significant improvement for roleplaying. Care to share whatever settings make you think they're so great?
Anonymous 01/22/25(Wed)22:31:45 No.104003196
who cares if you own 0 3090s or 6
we're all poor vramlets in the age of deepseek r1
Anonymous 01/22/25(Wed)22:31:56 No.104003201
Anonymous 01/22/25(Wed)22:32:51 No.104003209
>>104003076
Yeah. I should have rephrased my statement to say that besides the cost of the gpus themselves, everything else is simple and doesn't need nearly as much effort as others might believe. You don't need fucking water cooling for example kek
Anonymous 01/22/25(Wed)22:32:56 No.104003213
>>104003201
>random gemma in p2
>4o in third
mememark discarded
Anonymous 01/22/25(Wed)22:33:37 No.104003222
What API do you guys use, OR or the CCP one? OR just routes to the CCP host anyway, right? Are you guys fine with Xi having your prompts/logs?
Anonymous 01/22/25(Wed)22:34:49 No.104003231
>>104003222
Anon... This is lmg...
That being said, I'm using the cancer proxy
Anonymous 01/22/25(Wed)22:35:26 No.104003237
I'm trying the R1 llama distill more and it still hasn't gone schizo kino like EVA 0.0 has in the past. The missing RL step really ruins it. If a finetune could make Llama more creative then Deepseek should be able to do that too. They just didn't choose to run the full R1 tune on it.
Anonymous 01/22/25(Wed)22:36:29 No.104003243
>>104003222
I use koboldcpp
Anonymous 01/22/25(Wed)22:36:39 No.104003245
>>104003192
>>104003159
Would it be more efficient to collectivize the model on a server for anons? We can't pay for it individually, but some anon could pay for a rig through monero donations or some shit and then post their host here and let anons enjoy the cooming.
Anonymous 01/22/25(Wed)22:36:46 No.104003248
>>104003231
>proxy
Keep the stolen keys to the other thread at least.
Anonymous 01/22/25(Wed)22:38:35 No.104003262
1722038785268157
>>104002143 (me)
... aaand we're out of tokens.
Anonymous 01/22/25(Wed)22:39:32 No.104003271
>>104003196
Not those of us with the brains to stack RAM instead.
Anonymous 01/22/25(Wed)22:39:34 No.104003272
Anonymous 01/22/25(Wed)22:40:15 No.104003280
>>104003222
I just get it straight from the source. But with that being said I'm pretty sure China already has your logs regardless of where you get them from.
Anonymous 01/22/25(Wed)22:41:13 No.104003287
A ROG Astral white 5090 costs $3000
Anonymous 01/22/25(Wed)22:41:50 No.104003289
1714943524872651
>>104003262 (me)
Here's another one. They sure put a lot of effort into polishing this dataset. "We're super open and collaborative you guys!"
Anonymous 01/22/25(Wed)22:42:48 No.104003294
>>104002093
Rocinante v1.1
Anonymous 01/22/25(Wed)22:43:18 No.104003305
>>104003089
The api doesn't support "tool calling" with dedicated params in the way that some of the openai-compatible endpoints do, but the model is smart enough that if you just put some basic instructions into the prompt about the expected output and give it your tool information in the context (i.e. the exact same thing that "tool calling" llms do with 1 layer of abstraction in the api), it'll figure it out.
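For the compile-check idea specifically, something as dumb as this works. The ```COMPILE fence is just an invented convention your system prompt tells the model to use, and the local /completion server and prompt.txt are assumptions:

# toy compile-check loop: the model wraps code it wants verified in a ```COMPILE fence,
# we parse it out, run the compiler, and feed the result back so it can finish its reply
import re, subprocess, tempfile, requests

def ask(prompt):
    r = requests.post("http://localhost:8080/completion",     # llama.cpp-style server assumed
                      json={"prompt": prompt, "n_predict": 2048})
    return r.json()["content"]

def try_compile(code):
    with tempfile.NamedTemporaryFile("w", suffix=".c", delete=False) as f:
        f.write(code)
    r = subprocess.run(["cc", "-c", f.name, "-o", "/dev/null"],
                       capture_output=True, text=True)
    return "compiled clean" if r.returncode == 0 else r.stderr

prompt = open("prompt.txt").read()
reply = ask(prompt)
m = re.search(r"```COMPILE\n(.*?)```", reply, re.S)
if m:                                          # the model asked for a check mid-reply
    prompt += reply + "\n[compiler]: " + try_compile(m.group(1)) + "\n"
    reply = ask(prompt)                        # let it finish with the result in context
print(reply)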
Anonymous 01/22/25(Wed)22:44:37 No.104003321
>>104003222
i use openrouter because i want to make sure that both the feds AND the ccp get my logs for the sake of fairness
Anonymous 01/22/25(Wed)22:46:46 No.104003336
>>104003321
They will finally agree on how often Americans refer to their penises as "cock" or how often sexual literature includes mention of a man's balls.
Truly a monumental moment in surveillance history.
Anonymous 01/22/25(Wed)22:51:10 No.104003364
>>104003196
I actually feel very optimistic when I see huge open source models. It creates genuine demand for massive amounts of fast cheap memory, and someone is bound to fill the demand. I mean fuck, people STILL make and sell drugs despite the consequences, and AI is as strong a drug as anything. We WILL get cheap inferencing of R1 locally in the near future even if the chips become fucking contraband.
Anonymous 01/22/25(Wed)22:51:47 No.104003368
Screenshot
Anonymous 01/22/25(Wed)22:53:27 No.104003377
>>104003368
POST TOK/S AND SPECS NOW!
The anon the other day didn't. Don't you blue balls me.
Anonymous 01/22/25(Wed)22:56:19 No.104003393
>>104003364
Is what Nvidia is doing with digits particularly hard? It's just DDR5 soldered together right? What's stopping a Chinese company from doing that and selling it in bulk for cheap?
Anonymous 01/22/25(Wed)22:57:08 No.104003398
>>104003393
>What's stopping a Chinese company from doing that and selling it in bulk for cheap?
CUDA
Anonymous 01/22/25(Wed)22:57:18 No.104003400
BXlUR
how can I run deepsuck local?
what do I need to download?
pls halp bros
Anonymous 01/22/25(Wed)22:58:16 No.104003409
>>104003400
don't bother you need like 90 gb of vram to use the not-shit version
Anonymous 01/22/25(Wed)22:58:23 No.104003410
>>104003400
400gb of ram and patience at 0.02t/s
Anonymous 01/22/25(Wed)22:59:07 No.104003416
>>104003409
>like 90 gb of vram
God I fucking wish it was only that much.
Anonymous 01/22/25(Wed)22:59:16 No.104003418
>>104003400
>how can I run deepsuck local?
very carefully
>what do I need to download?
start with the model weights and maybe something to run them
>pls halp bros
why should we?
Anonymous 01/22/25(Wed)22:59:43 No.104003421
>>104003321

I still don't understand why people are paranoid about RP logs being read by glowies. It's fucking text, even if it's cunny shit. Sounds like the guys in /sdg/ should be more paranoid about that crap, and it's more understandable for them to do all their gens via local.
Anonymous 01/22/25(Wed)22:59:57 No.104003423
>>104003393
Oh fuck you just reminded me of the other issue. Even if china sold it for cheap america would ban or tariff the shit out of it.
Regardless it's just R&D holding china back atm. Google, grok, and llama all use cheap specialized inferencing-only chips. So the technology to do so is already discovered but kept secret.
Anonymous 01/22/25(Wed)22:59:58 No.104003424
>>104003400
first you need to download like 8 H100s
Anonymous 01/22/25(Wed)23:00:39 No.104003428
>>104003421
>It's fucking text, even if its cunny shit.
They will gladly make an example out of you.
Anonymous 01/22/25(Wed)23:01:44 No.104003440
wut
>>104003409
>>104003410
>>104003416
I only have 24GB vram
>to use the not-shit version
what version can I run?
and how shit is it compared to other models?

>>104003418
>why should we?
I said please
Anonymous 01/22/25(Wed)23:01:56 No.104003443
I am still on my knees... LMG
Anonymous 01/22/25(Wed)23:02:44 No.104003451
>>104003424
magnet link?
Anonymous 01/22/25(Wed)23:03:35 No.104003459
Why does nobody talk about the 32B model?
Anonymous 01/22/25(Wed)23:04:23 No.104003462
>>104003440
>>104003459
the small deepsexes waste a lot of tokens and aren't any better than the base models not worth it
Anonymous 01/22/25(Wed)23:06:14 No.104003479
>>104003462
right now I use:
>Mistral-Small-Instruct-2409-Q5_K_L.gguf
is this the best? any deepsuck better than this?
Anonymous 01/22/25(Wed)23:06:15 No.104003480
>>104003459
I tried the 70b and didn't like it.
Anonymous 01/22/25(Wed)23:06:32 No.104003482
>>104003440
look man unless you've got like 8 h100s there's really no point in running a llm locally
just use the chinese jew online one
if you want to use it for something you really don't want the feds (chatgpt) or the chink feds (deepseeker) to use then you're fucked unless you wanna spend 40k on gpus or $30 an hour hiring a cloud rig out
Anonymous 01/22/25(Wed)23:08:13 No.104003499
>>104003482
>thread is called "Local Models General"
>"there's really no point in running a llm locally"
wut?
Anonymous 01/22/25(Wed)23:08:43 No.104003502
>>104003499
>cutting off the first part of the sentence which explains when you would run locally
are you stupid?
Anonymous 01/22/25(Wed)23:08:43 No.104003503
DIGITS will save us
Anonymous 01/22/25(Wed)23:09:00 No.104003507
>>104003499
he's not wrong
Anonymous 01/22/25(Wed)23:09:14 No.104003510
>>104003440
you aren't running the full deepseek at home without tons of ram or video cards. set your standards lower to 70-123b
Anonymous 01/22/25(Wed)23:09:14 No.104003511
>>104003502
but I want to run it locally, so what can I run with 24GB of vram?
Anonymous 01/22/25(Wed)23:09:50 No.104003516
>>104002411
You are being logged though.
Anonymous 01/22/25(Wed)23:10:17 No.104003526
>>104003503
I, too, can't wait to run Deepseek R1 at Q2 for a cheap $7k after taxes
Anonymous 01/22/25(Wed)23:11:24 No.104003539
>>104003516

If its just Chink glowies doing the logging, does it matter? Digits can't come soon enough. ffs
Anonymous 01/22/25(Wed)23:11:37 No.104003541
>>104003503
DIGITS are a scam. You can only link 2 for 256GB, which can't even run DSV3 but still costs 6k. 4 DIGITS would cost 12k, still can only run R1 at Q3, and it'll be slow as shit having to run it over RPC.
Anonymous 01/22/25(Wed)23:12:46 No.104003551
>>104003541
4x DIGITS is enough for Q5 though. That'd be decent.
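Napkin math, assuming 671B total params and rough average bits-per-weight for the usual llama.cpp quants (KV cache not counted):

params = 671e9
for name, bits in [("Q3_K_M", 3.9), ("Q4_K_M", 4.8), ("Q5_K_M", 5.7), ("Q8_0", 8.5)]:
    print(name, round(params * bits / 8 / 1e9), "GB")
# roughly 327 / 403 / 478 / 713 GB of weights, against 256 GB for 2x DIGITS and 512 GB for 4x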
Anonymous 01/22/25(Wed)23:12:52 No.104003552
>>104003511
nothing that will give you decent output. that's what we keep telling you but you keep asking to be told it will work anyway.
Anonymous 01/22/25(Wed)23:14:35 No.104003566
>>104003552
>nothing that will give you decent output.
how do you know? also, what counts as "decent output"?
I run >Mistral-Small-Instruct-2409-Q5_K_L.gguf
locally and it's good enough for what I need it for.
and I'm asking if there is something better than that now that I can run on my hardware.
Anonymous 01/22/25(Wed)23:14:50 No.104003569
1737483338879
>>104003551
Not if you want any usable context size.
Anonymous 01/22/25(Wed)23:15:26 No.104003577
file
>>104003541
What other option is there then? I get what you're saying but consider pic related, which will 100% be over $10k
Anonymous 01/22/25(Wed)23:15:34 No.104003579
>>104003566
>falling for le doomers
Anonymous 01/22/25(Wed)23:16:46 No.104003591
>>104003577
Buy 15 of your picrel. You're not poor, are you?
Anonymous 01/22/25(Wed)23:19:11 No.104003619
>>104003577
CPU build. Easy 1TB RAM for less than $8k.
Anonymous 01/22/25(Wed)23:20:02 No.104003626
>>104003566
>mistral small instruct
okay, it's all making sense
I'm happy for you, bud. If you're satisfied with... that, just keep using it.
Anonymous 01/22/25(Wed)23:22:50 No.104003645
>>104003591
I would probably an hero if my cunny RP machine somehow costs more than my house..
Anonymous 01/22/25(Wed)23:24:05 No.104003653
Has anyone gotten the distills to even obey their format after long chats? Like, if I start it in the middle of an RP after 20 turns, it doesn't even start with <think> any more. Instead it starts thinking halfway through its response and starts thinking in character as {{user}} instead of as an AI lmao. Talking about the 32b distill. It's pretty funny sometimes but I swear the format is correct as documented, or does it only work when you leave <think> blocks in from previous turns?
It seems to work okay when starting a new RP but holy god is it long-winded.
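fwiw the only workaround I've seen suggested: strip the old <think> blocks out of the history (the official API drops them between turns anyway) and prefill the new assistant turn with "<think>\n", which deepseek's own readme recommends because the models sometimes skip the thinking block. In ST that's the "Start Reply With" field iirc. Rough sketch by hand, with the prompt format only approximated (check the distill's tokenizer_config for the real template):

import re

def build_prompt(history):
    out = ""
    for turn in history:
        # drop old chain-of-thought so the model doesn't imitate or skip it
        text = re.sub(r"<think>.*?</think>\s*", "", turn["content"], flags=re.S)
        tag = "<｜User｜>" if turn["role"] == "user" else "<｜Assistant｜>"
        out += tag + text
    return out + "<｜Assistant｜><think>\n"    # prefill so the reply has to start with thinking

print(build_prompt([{"role": "user", "content": "hi"}]))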
Anonymous 01/22/25(Wed)23:24:10 No.104003654
Anonymous 01/22/25(Wed)23:24:54 No.104003657
>>104003626
but I want deepsuck, or something thats better.
Anonymous 01/22/25(Wed)23:25:18 No.104003664
>>104003654
no way thats fully optimized yet
Anonymous 01/22/25(Wed)23:25:43 No.104003667
Anonymous 01/22/25(Wed)23:26:21 No.104003672
>>104003459
I tried it and didn't like it.
Anonymous 01/22/25(Wed)23:26:33 No.104003673
>>104003654
>2.5t/s running off 96GB VRAM and full DDR5
Where are the fags who thought they'd be getting 5t/s off an SSD right now? The ones coping about fast inference with DDR4?
Anonymous 01/22/25(Wed)23:26:37 No.104003674
Does AI really need to solve PhD problems to be sentient? AI companies spend billions and consume power plants' worth of energy trying to be better than highly skilled humans, but is that really necessary for at least a decent companion? Who the fuck here makes a potential girlfriend do leetcode or write proofs to judge how good she is as a mate?
>Babe if you can't autocomplete my code then gtfo
My point is: if we really wanted to make androids who talk and act just like real humans, all these parameters thrown at leetcode are useless. What we need is emotional intelligence, long-term memory, and 1-2 hour chains of conversation, not terabytes of github source code. Since when did being human = solving tasks only 0.01% of the population can do???
Anonymous 01/22/25(Wed)23:27:09 No.104003680
if intel doesn't sell a b-series card with enough ram to run r1, it's time they went broke.
Anonymous 01/22/25(Wed)23:28:09 No.104003687
>>104003674
It needs to be able to write jokes.

I have yet to see ai invent a good mosquito joke.
Anonymous 01/22/25(Wed)23:28:25 No.104003690
>>104003674
bitches who think ants are real dont think ai is sentient even tho ai is smarter than ants
Anonymous 01/22/25(Wed)23:29:52 No.104003702
>>104003421
Even if something is not outright illegal, if it's a potential grey area they can use it to get a warrant/harass you looking for other shit.
Anonymous 01/22/25(Wed)23:30:25 No.104003708
>>104003577
Real answer: Use R1 API since it's very cheap
Buy 5090 for Stable Diffusion
Anonymous 01/22/25(Wed)23:32:18 No.104003717
>>104003673
>>104003664
I'll be glad to hear any methods to make it faster. I choose to blame llama.cpp and wait for ktransformers. In the meantime, Q3KM has no issues as far as I can tell. It is unhinged and kino. That said I have not tried the API to compare.
Anonymous 01/22/25(Wed)23:34:42 No.104003743
>>104003673
does llama.cpp fully support everything with it? Is it even running it as a MoE? With only ~20B params changing per token it should be many times faster, just like wizard was
Anonymous 01/22/25(Wed)23:37:03 No.104003762
>>104003673
I thought DDR5 got 10 t/s?
Anonymous 01/22/25(Wed)23:37:59 No.104003772
>>104003654
Nice, 2.6 t/s would be totally usable for me. I'm fine with anything over 2.
Anonymous 01/22/25(Wed)23:42:10 No.104003805
Ok I see

https://github.com/ggerganov/llama.cpp/issues/11333

Not everything is ready yet for deepseek
Anonymous 01/22/25(Wed)23:42:28 No.104003807
>>104003654
Prompt eval is STILL 28 tok/s??? Even with quad 3090s? That means it'll take forever to respond to nontrivial context sizes. Across all of lmg, locallama, and github I have yet to find a single cpumaxxer with decent prompt processing speeds or >10 t/s text gen.
Unless ktransformers or some new inference engine comes out to speed up deepseek r1/v3, I'm gonna save my money.
Anonymous 01/22/25(Wed)23:42:51 No.104003811
>>104003772
>>104003762
>>104003743
>>104003673
It continues to rapidly degrade with more context. At 0 it is almost 5 t/s. This is at 7k. At 16k I bet it is 1 t/s, but I can't test because it OOMs.
Anonymous 01/22/25(Wed)23:43:06 No.104003814
>>104003743
r1 is 37b activated
Anonymous 01/22/25(Wed)23:43:15 No.104003815
And a bug as well?
https://github.com/ggerganov/llama.cpp/issues/11163

Yea, deepseek is not fully supported yet.
Anonymous 01/22/25(Wed)23:43:56 No.104003818
>>104003654
Thanks for the hard numbers, anon.
Was that running 671b at q8?

I have 4*3090s. Was maybe considering getting a couple more but would need a new motherboard to do so.
Was looking into ddr3 and ddr4 platforms because cheap. Might reconsider.
Anonymous 01/22/25(Wed)23:44:15 No.104003820
>>104003814
yes, but 17B always stay activated. About 20B are what change per token
Anonymous 01/22/25(Wed)23:44:29 No.104003821
>>104003811
>At 16k I bet it is 1t/s
I'm so glad I saw this before I sunk money into a DDR5 build
Anonymous 01/22/25(Wed)23:45:27 No.104003833
>>104003811
no coomer has ever needed more than 8k context
>but I—
nope. 8k is all you need.
Anonymous 01/22/25(Wed)23:45:31 No.104003834
did the chinks win?
I don't visit lmg much anymore and gave up on open source. Is deepseek actually cracked?
Anonymous 01/22/25(Wed)23:46:19 No.104003840
>>104003834
>Is deepseek actually cracked?
can you translate this from zoomer into english?
Anonymous 01/22/25(Wed)23:47:33 No.104003853
>>104003833
Deepseek will use 2k of that on thinking in one turn. Oh, and you get to wait for all that to generate too.
Anonymous 01/22/25(Wed)23:48:36 No.104003859
Potential Speed Improvements for DeepSeek V3 Inference via NUMA-Aware MoE Allocation

Deploying DeepSeek V3 or similar large MoE models with NUMA-aware expert allocation could yield significant performance gains through these mechanisms:

1. Memory Access Optimization
- Localized memory access: storing experts on local NUMA nodes avoids cross-node latency (typical NUMA latency varies 2-5× between local and remote accesses). For CPU-bound inference, this could improve token generation speed by 10-25%.
- Cache utilization: thread-core affinity binding improves L3 cache reuse. If 30% of expert computations rely on cached data, this may reduce cache miss penalties by 15-30%.

2. Enhanced Compute Parallelism
- Expert-level parallelism: DeepSeek V3's experts are sparsely activated (only 8 of the 256 routed experts fire per token), so co-locating the experts a token actually hits on the same NUMA node reduces contention for memory bandwidth. On dual-socket EPYC servers, this could achieve 20-40% throughput gains (aligned with GSPMD experiments).
- Load balancing: dynamic routing (e.g., Switch Transformer) may cause uneven expert utilization. NUMA-aware scheduling can redistribute hotspot experts to idle nodes, reducing tail latency.

3. Reduced Communication Overhead
- Inter-layer data transfer: all-to-all communication between MoE layers may consume ~15% of inference time in cross-NUMA setups. Prioritizing intra-node routing could cut cross-NUMA traffic by 30-50% (per TensorFlow NUMA guidance).

That's one thing for sure.
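If anyone wants to poke at the affinity part of this, here's a minimal Linux-only sketch. It assumes a hypothetical engine where you decide which expert shards each worker process loads (real backends like llama.cpp do their own allocation); the round-robin assignment is only illustrative.

```python
# Sketch: pin a worker to one NUMA node's CPUs so its first-touch allocations
# (i.e. the expert weights it loads) land in that node's local memory.
import os
import glob

def numa_node_cpus():
    """Map NUMA node id -> set of CPU ids, read from sysfs."""
    nodes = {}
    for path in glob.glob("/sys/devices/system/node/node[0-9]*/cpulist"):
        node_id = int(path.split("node")[-1].split("/")[0])
        cpus = set()
        with open(path) as f:
            for part in f.read().strip().split(","):
                if "-" in part:
                    lo, hi = map(int, part.split("-"))
                    cpus.update(range(lo, hi + 1))
                elif part:
                    cpus.add(int(part))
        nodes[node_id] = cpus
    return nodes

def bind_worker_to_node(node_id, nodes):
    """Pin this process's threads to the CPUs of one NUMA node."""
    os.sched_setaffinity(0, nodes[node_id])

if __name__ == "__main__":
    nodes = numa_node_cpus()
    n_experts = 256  # routed experts per MoE layer in DeepSeek V3/R1
    # Round-robin experts across sockets; each worker only ever touches
    # (and therefore allocates) the experts assigned to its own node.
    assignment = {e: e % len(nodes) for e in range(n_experts)}
    bind_worker_to_node(0, nodes)  # this worker handles node 0's experts
    my_experts = [e for e, n in assignment.items() if n == 0]
    print(f"node 0 CPUs: {sorted(nodes[0])[:8]}..., experts: {len(my_experts)}")
```

llama.cpp also has a --numa flag for distributing/isolating across nodes, which is the zero-effort version of the same idea.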
Anonymous 01/22/25(Wed)23:49:16 No.104003863
>>104003853
You can just remove the thinking portion of a response from the context once it's finished
I expect the interfaces will add a way to do that automatically once we've had thinking models for longer
Anonymous 01/22/25(Wed)23:50:19 No.104003869
>>104003840
I thought you guys were zoomers so I used zoomer slang
is deepchink actually that good
Anonymous 01/22/25(Wed)23:50:30 No.104003870
>>104003863
You can do that with ST already.
Anonymous 01/22/25(Wed)23:50:35 No.104003872
>>104003853
Having to wait for the thinking to finish first is a huge problem though, you're right about that
2 t/s isn't acceptable on a thinking model the way it would be for a regular one since you're looking at potentially multiple minutes before the 'real' response even starts
Anonymous 01/22/25(Wed)23:51:08 No.104003875
1722831145204361
https://github.com/DAMO-NLP-SG/VideoLLaMA3
https://arxiv.org/abs/2501.13106
Anonymous 01/22/25(Wed)23:51:57 No.104003884
>>104003870
nta but where is that option?
ST is visually hiding the stuff between <think></think> for me, but I think it's still including it in the submitted context
Anonymous 01/22/25(Wed)23:52:17 No.104003885
Any leaks or rumors about llama 4? Did they give any hint about the ETA? Was the development halted because of the lawsuits?
Anonymous 01/22/25(Wed)23:52:45 No.104003889
>>104003807
I'm not familiar with how llama.cpp works, but as far as I can tell, before tokens are generated (which I assume is the prompt eval period) I get 100% CPU usage on a single core and about 125W on a single 3090; the other GPUs seem to be idle.
>>104003818
This is 671b at q3km with q5 context. Context is absurdly expensive.
>>104003834
Despite everything, it is cracked. I have 0 regrets.
Anonymous 01/22/25(Wed)23:54:08 No.104003899
>>104003869
>chink
sir, we do not tolerate that sort of slur here anymore
deepchina beat all the western corpo models and single handedly saved local
Anonymous 01/22/25(Wed)23:54:42 No.104003906
>>104003834
>>104003869
Yes but no one can run it local.
We are all in the R1 lite (the smaller model) waiting room
Anonymous 01/22/25(Wed)23:57:53 No.104003925
>>104003811
I have an idea to try to speed this up... I haven't even researched MoEs fully yet, but if (theoretically) only ~30% of the entire model is ever used as the active parameters across 95% of the most common erpg scenarios (think nala test, etc etc), then with 96gb vram you could intelligently store only those specific params and use the cpu for inferencing the remaining 5% of cache misses. Has this been done before? Basically we make or find a dataset of erpg prompts, see which parameters keep being lit up across those prompts, and if we can find that 30% or less of the model's neurons activate on erpg stuff then bingo, we put that shit on vram before the real inferencing begins. From then on you enjoy a massive speedup.
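Not aware of anyone shipping exactly this, but the profiling half is trivial to prototype. Rough sketch of the idea, assuming you can hook the router and see which experts fire per token (the run_router callback here is purely hypothetical, no real backend exposes it like this):

```python
# Profile which experts fire on a coom corpus, then keep the hottest ones in VRAM.
from collections import Counter

def profile_expert_usage(run_router, prompts):
    """run_router(prompt) -> iterable of (layer, expert_id) picks per token."""
    counts = Counter()
    for p in prompts:
        for layer, expert in run_router(p):
            counts[(layer, expert)] += 1
    return counts

def pick_vram_residents(counts, bytes_per_expert, vram_budget_bytes):
    """Greedily keep the most frequently hit experts until the VRAM budget is
    spent; everything else stays in system RAM and runs on the CPU on a miss."""
    residents, used = [], 0
    for (layer, expert), _ in counts.most_common():
        if used + bytes_per_expert > vram_budget_bytes:
            break
        residents.append((layer, expert))
        used += bytes_per_expert
    return residents
```

Whether the hit rate is actually ~95% on RP prompts is the whole open question; if expert usage is close to uniform the speedup evaporates.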
Anonymous 01/23/25(Thu)00:00:18 No.104003944
>>104003884
Regex. Maybe you're thinking of filters that just alter the display, but you can have it alter the chat itself, and set a start depth so it only touches previous turns. You can also set it to keep them in the chat but not send them, if you want to be able to read the reasoning without it eating context.
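For reference, the pattern such a regex script boils down to is tiny. Plain-Python illustration (not ST's actual implementation) of stripping finished think blocks before they get re-sent:

```python
# Strip <think>...</think> spans so only the final reply stays in context.
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_thinking(message: str) -> str:
    return THINK_BLOCK.sub("", message)

print(strip_thinking("<think>long chain of reasoning...</think>Actual reply."))
# -> "Actual reply."
```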
Anonymous 01/23/25(Thu)00:00:40 No.104003946
>>104003925
(Me) : I basically want to make ktransformers but coomer version. They did optimizations that were coomer agnostic/good for general case and that works, but I want to take it even more extreme and optimize specifically for the cooming neurons.
Anonymous 01/23/25(Thu)00:01:46 No.104003956
>>104001926
So are these anons using a frontend for REAL r1 or is this just another "distilled r1"
Anonymous 01/23/25(Thu)00:03:55 No.104003974
>>104003811
That's a bummer. I was thinking of swapping my two A6000s into my other server with 256GB DDR4 and see how fast it goes but I think I'll wait until things are more optimized if it's this bad with DDR5.
At least there's stuff like multi-token prediction and maybe some optimizations when it comes to context on the horizon.
Anonymous 01/23/25(Thu)00:04:33 No.104003978
>>104003956
When anons talk about R1 it's a coinflip whether they're talking about the API version or the 14b
Anonymous 01/23/25(Thu)00:05:36 No.104003983
>>104003946
I don't see why you need to precompute what gets lit up in common scenarios. We already have algorithms for real-time cache tiering across many applications in IT from block storage to CDN.
Anonymous 01/23/25(Thu)00:07:24 No.104003999
file
>>104001303
>>104001570
I'm sorry, but what the hell is this? I swear I did everything as told here, but something is clearly missing. It always starts rambling off its train of thought before giving me what I want.
Anonymous 01/23/25(Thu)00:08:27 No.104004010
>>104003983
Precomputing is useful to avoid having to do it every single session if all youre doing is more or less the same coom scenario.
Anonymous 01/23/25(Thu)00:08:56 No.104004012
>>104003999
that's what R1 and its distillations are supposed to do yeah, it's a reasoning model
if you don't like that don't use R1, stick with Nemo or Cydonia 22B or something
Anonymous 01/23/25(Thu)00:09:44 No.104004019
qwen 32B R1 is great, just needs the usual prefill
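In case anyone hasn't seen "the usual prefill": you end the prompt with the opening of the assistant turn plus a bare <think> tag, so generation is forced back into the reasoning format instead of drifting like the anon above described. Rough sketch below; the turn markers are placeholders, not the distill template's exact special tokens:

```python
# Build a text-completion prompt whose assistant turn is left open after <think>,
# so the model continues its reasoning from there.
def build_prompt(history: list[tuple[str, str]], prefill: str = "<think>\n") -> str:
    parts = []
    for role, text in history:
        parts.append(f"<|{role}|>{text}")
    # Open the assistant turn but do not close it; generation continues from here.
    parts.append(f"<|assistant|>{prefill}")
    return "".join(parts)

print(build_prompt([("user", "Write one sentence about mosquitoes.")]))
```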
Anonymous 01/23/25(Thu)00:09:57 No.104004022
>>104004012
I mean, that's kinda cool, but why did anon recommend it to me then?
Anonymous 01/23/25(Thu)00:11:46 No.104004038
>>104004022
It's the latest overhyped trash that people will forget about by next week
Anonymous 01/23/25(Thu)00:12:32 No.104004048
GANQ: GPU-Adaptive Non-Uniform Quantization for Large Language Models
https://arxiv.org/abs/2501.12956
>Large Language Models (LLMs) face significant deployment challenges due to their substantial resource requirements. While low-bit quantized weights can reduce memory usage and improve inference efficiency, current hardware lacks native support for mixed-precision General Matrix Multiplication (mpGEMM), resulting in inefficient dequantization-based implementations. Moreover, uniform quantization methods often fail to capture weight distributions adequately, leading to performance degradation. We propose GANQ (GPU-Adaptive Non-Uniform Quantization), a layer-wise post-training non-uniform quantization framework optimized for hardware-efficient lookup table-based mpGEMM. GANQ achieves superior quantization performance by utilizing a training-free, GPU-adaptive optimization algorithm to efficiently reduce layer-wise quantization errors. Extensive experiments demonstrate GANQ's ability to reduce the perplexity gap from the FP16 baseline compared to state-of-the-art methods for both 3-bit and 4-bit quantization. Furthermore, when deployed on a single NVIDIA RTX 4090 GPU, GANQ's quantized models achieve up to 2.57× speedup over the baseline, advancing memory and inference efficiency in LLM deployment.
posting for johannes
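For anyone wondering what "non-uniform lookup-table quantization" means in practice: you fit a small codebook per row and store 4-bit indices into it, instead of snapping weights to an evenly spaced grid. Toy illustration with a few Lloyd/k-means iterations; this is the general LUT idea only, not GANQ's actual GPU-adaptive algorithm:

```python
# Per-row 4-bit codebook quantization: indices + lookup table instead of a uniform grid.
import numpy as np

def lut_quantize_row(w, bits=4, iters=10):
    k = 2 ** bits
    # Initialize centroids from evenly spaced quantiles of the row.
    centroids = np.quantile(w, np.linspace(0, 1, k))
    for _ in range(iters):
        idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
        for c in range(k):
            if np.any(idx == c):
                centroids[c] = w[idx == c].mean()
    idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
    return idx.astype(np.uint8), centroids  # 4-bit indices + per-row lookup table

rng = np.random.default_rng(0)
row = rng.normal(size=4096).astype(np.float32)
idx, lut = lut_quantize_row(row)
print(f"4-bit LUT MSE: {np.mean((lut[idx] - row) ** 2):.6f}")
```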
Anonymous 01/23/25(Thu)00:13:21 No.104004052
Every week China releases a new SOTA model for something (llm, video, whateverthefuck) and the west does nothing. Even cosmos was a flop
Anonymous 01/23/25(Thu)00:14:08 No.104004058
>>104004038
Goddamnit so my original guess about it being the latest meme was right.
What should I fuck around with instead then, Cydonia I guess? Been using Mistral Small Inst the last few weeks, tried Nemo but stuck with Mistral.
>>104004052
Flood the market with mass produced shit, for better or worse.
Anonymous 01/23/25(Thu)00:14:21 No.104004060
>>104004052
>he isn't simulating a virtual world where his waifu is real using foundational world model nvidia cosmos right now
Anonymous 01/23/25(Thu)00:14:25 No.104004061
>>104003999
You let it get past the rambling and then it turns kino. Simple really.
Personally, I delete the ramblings a couple turns later.
Anonymous 01/23/25(Thu)00:14:52 No.104004063
>>104004048
I want this but coomer version. DELETE ALL NON HORNY NEURONS. I don't need my wife to code, she needs to talk about vidya with me!
Anonymous 01/23/25(Thu)00:16:53 No.104004079
>>104004061
It doesn't seem bad, but having to deal with this waste of letters every message seems a tad fucking much, I'm not that desperate.
Anonymous 01/23/25(Thu)00:17:14 No.104004082
>>104004048
I'm retarded but isn't this basically what exl2 is doing?
Anonymous 01/23/25(Thu)00:20:30 No.104004109
>>104003687
This is the ultimate benchmark
Anonymous 01/23/25(Thu)00:21:43 No.104004115
>>104003687
True human creativity on display right there.

Fuck math or logic puzzles, gimme a good joke!
Anonymous 01/23/25(Thu)00:22:26 No.104004118
>>104004063
With that paper about the correlation of skills between experts in MoE, removing coding skills from your waifu would pretty much lobotomize her logic skills, which might be what you're looking for, but you never know. Just be sure not to make your waifu too much of a retard, anon.
Anonymous 01/23/25(Thu)00:23:36 No.104004125
Anonymous 01/23/25(Thu)00:24:42 No.104004131
>>104004118
That is true....TITANS actually referenced this issue and found a way to fix it so maybe I'll have to wait until then.
Anonymous 01/23/25(Thu)00:28:47 No.104004161
>>104003064
So far 350GB or so. No joke
Anonymous 01/23/25(Thu)00:30:30 No.104004171
>>104003064
R1 is at 700GB q8. You need 350GB to run at q4 and 170GB to run at q2 (very retarded)
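Back-of-envelope, those figures are just parameter count times bits per weight (weights only, ignoring KV cache and runtime overhead):

```python
# 671B parameters at various bit widths, weights only.
params = 671e9
for name, bits in [("q8", 8), ("q4", 4), ("q2", 2)]:
    print(f"{name}: ~{params * bits / 8 / 1e9:.0f} GB")
# q8: ~671 GB, q4: ~336 GB, q2: ~168 GB -- close to the numbers above.
```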
Anonymous 01/23/25(Thu)00:35:29 No.104004211
>>104004171
I've run 70b models at Q2 before, it's fiiiiiiiiiiiiiiine... I'm sure a model as big as R1 would be alright.
Anonymous 01/23/25(Thu)00:40:42 No.104004252
1737597492892400
reading the CoT chains for silly rp content is absurdly funny and cool
often more entertaining than the RP itself
Anonymous 01/23/25(Thu)00:44:01 No.104004265
1708526582001
>>104004252
>Let's go with hairy
INSANELY FUCKING BASED
I will give you one thing, it's surprisingly funny to read the thoughts of a mad man like this.