/lmg/ - Local Models General
Anonymous 01/18/25(Sat)16:57:02 | 364 comments | 46 images | 🔒 Locked
LLM-history-fancy
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103940486 & >>103928562

►News
>(01/17) Nvidia AceInstruct, finetuned on Qwen2.5-Base: https://hf.co/nvidia/AceInstruct-72B
>(01/16) OuteTTS-0.3 released with voice cloning & punctuation support: https://hf.co/collections/OuteAI/outetts-03-6786b1ebc7aeb757bc17a2fa
>(01/15) InternLM3-8B-Instruct released with deep thinking capability: https://hf.co/internlm/internlm3-8b-instruct
>(01/14) MiniMax-Text-01 released with 456B-A45.9B & hybrid-lightning attention: https://hf.co/MiniMaxAI/MiniMax-Text-01

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous 01/18/25(Sat)16:57:17 No.103947484
image
►Recent Highlights from the Previous Thread: >>103940486

--Paper (old): The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation:
>103940678 >103940928
--Hands in AI-generated images, and the limitations of current diffusion models:
>103941835 >103941970 >103942258 >103942318 >103942601 >103942914 >103943068
--Limitations of diffusion architecture and potential solutions:
>103942836 >103942880 >103942896 >103942930 >103943549 >103943765
--Imagegen model limitations and potential solutions:
>103941324 >103941337 >103941391 >103941437 >103941452 >103941462 >103941504 >103941535 >103941545 >103941653 >103941672 >103941598 >103941695 >103941711 >103941520 >103941536 >103941529 >103941621 >103941647 >103941356
--Anon impatiently waits for new model releases while Meta faces legal issues:
>103945006 >103945024 >103945060 >103945026 >103945335 >103945448 >103945477 >103945499 >103945508 >103945378 >103945418
--Discussion of Titan's long-term memory capabilities and implications:
>103942985 >103942997 >103943009 >103943062 >103943188 >103943231 >103943342 >103943423 >103943351 >103943404 >103943394 >103943030 >103943054
--Local LLM usage and model comparisons for coding tasks:
>103944644 >103944732 >103944749 >103944774 >103944780 >103944802 >103944868 >103944899 >103944928 >103945002
--OuteTTS 500m slow generation speed and discussion of its architecture:
>103943827 >103943856 >103943900 >103943936 >103944001 >103944360 >103944451
--Discussion on the architecture and training of DeepSeek-R1 and reasoners:
>103944271 >103944432 >103944575 >103944735 >103944857
--PR for Top-nσ sampling strategy for llama.cpp:
>103944985 >103945163 >103946173
--R1-lite vs V3 model performance comparison:
>103944784 >103945015
--Successful Nala Test with V3 model:
>103945420
--Miku (free space):
>103940742 >103941324

►Recent Highlight Posts from the Previous Thread: >>103940489

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous 01/18/25(Sat)17:16:16 No.103947646
Cunny
Here's hoping llama4 will be just as great as deepseek 3 for translating Japanese to English.
Anonymous 01/18/25(Sat)17:23:00 No.103947719
how do I prevent mistral 22b instruct from impersonating me?
It does it way more often than the finetunes I used before
Anonymous 01/18/25(Sat)17:25:27 No.103947746
file
Will my logs ever look like this?
Anonymous 01/18/25(Sat)17:27:31 No.103947775
How big of a chatbot model would a 16gb card be able to run?
16gb card in question being a 5080
Anonymous 01/18/25(Sat)17:28:44 No.103947790
>>103947646
I have a feeling 70B can't hold that much more knowledge and if it does then it will be so sensitive to quantization that the lobotomy will make it more retarded than current 70B at the same quant. It's probably over for the poorfags.
Anonymous 01/18/25(Sat)17:30:13 No.103947808
>>103947775
22B at Q4 is your max. Or 12B at Q8.
Anonymous 01/18/25(Sat)17:30:31 No.103947811
>>103947646
Dear LLM Community,

We acknowledge the concerns raised about the safety and reliability of our Large Language Models, particularly Llama 3.x. We appreciate the feedback and criticism we have received from the community, and we would like to assure you that we take these concerns seriously.

At Meta, we strive to develop and deploy AI models that are not only highly performant but also safe, transparent, and aligned with the values of our users and the broader community. Unfortunately, we fell short of these standards with Llama 3.x, and for that, we apologize. The model's limitations and potential risks were not adequately addressed, and we understand that this has led to concerns about its safety and potential misuse.

We want to assure you that we have learned from our mistakes and are committed to doing better in the future. Our team has been working diligently to address the shortcomings of Llama 3.x and develop more robust and safe models. We are pleased to announce that our next model, Llama 4, will incorporate significant improvements in filtering and safety features. We've implemented more stringent testing and evaluation protocols to ensure our models meet the highest standards of safety and reliability.

We're committed to ongoing improvement and refinement, and we look forward to continuing to work with the LLM community to develop models that are safe, reliable, and performant.

Sincerely,
Meta AI Team
Anonymous 01/18/25(Sat)17:34:37 No.103947851
>>103947790
Can't say I share that attitude. When 3.1 came out and I messed with the 70b, I thought "Man, this is the best they'll get?!" That was until 3.3 came out and completely blew past my expectations: damn near 400b translation performance with only a slight drop in general performance. I could see the 4.x series working a whole lot better, especially if they implement some of the shit they were publishing a month ago.
Anonymous 01/18/25(Sat)17:37:09 No.103947878
>>103947811
I just don't get why these open source models have such stupid (Both figurative AND literal) filters. God forbid we have LLMs say nigger when prompted to do so, but they can tell you why trooning is actually good for a child.
Anonymous 01/18/25(Sat)17:40:22 No.103947909
>>103947878
It's simple. Safety is not for normal people, but for the company. Suggesting trooning out is completely accepted by the society and saying nigger is not. If it was the opposite, it would be reversed. Can't have bad press.
Anonymous 01/18/25(Sat)17:42:44 No.103947935
How expensive is 70B finetuning? Is QLORA all you need™ or is full fine tune the only way?
Anonymous 01/18/25(Sat)17:43:40 No.103947944
aacc
Anonymous 01/18/25(Sat)17:45:28 No.103947970
There was some chat about ddr3 systems a short while ago.
What was the conclusion?
Anonymous 01/18/25(Sat)17:45:53 No.103947974
>>103947909
It's also funny that safety tunes measurably worsen a model's logic and reasoning, but they'll do it anyway because god forbid an auto complete outputs the word "niggers" and the company is now responsible for the result of a statistical algorithm with language from our own population sampled directly from them.
Anonymous 01/18/25(Sat)17:50:04 No.103948021
>>103947970
you would need a truly staggering number of channels to make it worthwhile
Anonymous 01/18/25(Sat)17:50:19 No.103948025
>>103947970
The conclusion was that it's useless unironically.
Anonymous 01/18/25(Sat)17:54:32 No.103948073
>CPUmaxxing is the way
It's faster to run models on the CPU now?
Anonymous 01/18/25(Sat)17:55:18 No.103948076
>>103948021
>>103948025
A shame.
Setting up a 0.75TB system would have been pretty cheap.
Anonymous 01/18/25(Sat)17:55:26 No.103948078
>>103948073
no but you can run bigger ones much more cheaply
Anonymous 01/18/25(Sat)17:56:17 No.103948085
>>103947974
God won't forbid it, but their investors would. Well, depending on the company, but that's true for most American companies.
Anonymous 01/18/25(Sat)18:01:22 No.103948134
>>103947811
Oddly enough, with the entire Llama 3 series you just need a character name prefill to work around all or most of the filtering. If they fix that, it'll be the end.
Anonymous 01/18/25(Sat)18:16:53 No.103948317
aadd
Anonymous 01/18/25(Sat)18:20:28 No.103948353
I'm honestly surprised that with the amount of variants swimming around huggingface, there aren't any /d/-tier variants specifically set up around various fetishes. Whenever I see people mention LLMs and erotica in the same sentence, it's usually for the most general of ERP usage.
Anonymous 01/18/25(Sat)18:21:29 No.103948365
>>103948076
It’s still an option if you’re a waitchad
Anonymous 01/18/25(Sat)18:25:18 No.103948402
>>103948317
Is that still noobai?
Anonymous 01/18/25(Sat)18:28:41 No.103948427
>>103948353
a general RP finetune that includes some /d/ related content (so almost all of them) will probably be better than a hyper-specific tune on just your fetish
Anonymous 01/18/25(Sat)18:31:14 No.103948442
Screenshot_2025-01-19_04-58-32
>>103948073
>>103948076
>>103948078
CPUmaxxing is the only reasonable way to run actually 80IQ+ models
I asked chatGPT what kind of custom hardware I would have to design to get decent speeds, and it's like thousands and thousands of dollars' worth of FPGAs with several gigabytes of GDDR6 memory to run a simple 7B Q5 model
Honestly I don't trust the calculations entirely. It sounds a bit too extreme
Anonymous 01/18/25(Sat)18:36:00 No.103948497
>>103948427
That is certainly true. Nevertheless, I'm surprised that I haven't seen at least one "this is the ultimate LLM for this fetish iswtg guys", let alone anything claiming to cater to a set of fetishes. Considering that half the people using LLMs are using them for gooning purposes only, I'm genuinely surprised that nobody has been vocal about the "nicheness" of this or that.
Anonymous 01/18/25(Sat)18:36:19 No.103948503
>>103948442
literally nothing about that is correct
Anonymous 01/18/25(Sat)18:41:47 No.103948550
>>103948503
Yeah, I just checked and everything is wrong
I can get ~19K LUT FPGAs for less than $15/unit
Still I can't figure out a good way to calculate this stuff on my own
Anonymous 01/18/25(Sat)18:55:00 No.103948694
https://github.com/SillyTavern/SillyTavern/releases/tag/1.12.11
Anonymous 01/18/25(Sat)19:01:28 No.103948761
Has anything significant happened in the last 6 months?
Anonymous 01/18/25(Sat)19:04:12 No.103948784
>>103948442
>80IQ+
What does this mean in terms of benchmarks and parameters?

>to run a simple 7B Q5 model
An 8gb vram gpu should be enough.
4.5gb for the weights.
A smidge to hold the context.
Plus whatever amount is wasted by the OS.

>decent speeds
I don't think you're going to top the performance you get out of a gpu for small models that fit in vram.

>>103948497
>"this is the ultimate LLM for this fetish iswtg guys"
My guess is that that would need:
- person into fetish
- person that knows where the online communities are
- person able to get hold of good datasets
- person with deep enough pockets to make the finetune

>>103948365
With the latest /lmg/ crusher being a moe, 671b with 37b active, I thought there might be some hope of squeezing something out of it.
4 channels of ddr3 is ~40gb/s, so 1 tk/s best case?
(I don't know how moes work, and I'm making a guess.)
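Back-of-the-envelope version of that guess, assuming generation is purely memory-bandwidth-bound and each token reads every active parameter once (real engines also read the KV cache and such, so treat it as a ceiling):
[code]
# Rough t/s estimate for bandwidth-bound inference. Numbers are illustrative.
def tokens_per_second(bandwidth_gbs: float, active_params_b: float,
                      bytes_per_param: float) -> float:
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gbs * 1e9 / bytes_per_token

# 4-channel DDR3 (~40 GB/s), 37B active params at ~1 byte/param (Q8):
print(tokens_per_second(40, 37, 1.0))  # ~1.1 t/s best case
[/code]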
Anonymous 01/18/25(Sat)19:07:36 No.103948816
>>103948761
some anon asked if anything significant had happened in the last 12 months, that's about it
Anonymous 01/18/25(Sat)19:13:00 No.103948868
>>103948784
> ~40gb/s, so 1 tk/s best case?
Pretty close. If you could get more like 100GB/s it might be tolerable. How many channels of ddr do n-1 gen xeons have?
Anonymous 01/18/25(Sat)19:13:22 No.103948873
>>103947482
>no EVA 0.0 and Kunou
Anonymous 01/18/25(Sat)19:23:19 No.103948966
file
is this something people care about in here
dont tell me to buy an ad you faggots i just saw it on twitter.
Anonymous 01/18/25(Sat)19:24:51 No.103948983
>>103947746
Eat more fiber
Anonymous 01/18/25(Sat)19:33:03 No.103949061
>>103948966
Typical pajeet thought process
If something is getting popular, try to make a service out of it
Anonymous 01/18/25(Sat)19:35:12 No.103949084
file
mfw waiting for new models
Anonymous 01/18/25(Sat)19:39:32 No.103949118
MikuIpscarious
Anonymous 01/18/25(Sat)19:44:37 No.103949161
>>103948966
Maybe if people were trying to learn how to program CUDA.
But I haven't seen anything like that in these threads in the past few months.

>>103948868
>n-1 gen xeons
5th-gen xeon scalable: 8-channel ddr5-5600 (current)
4th-gen: 8-channel ddr5-4800
3rd-gen: 6-channel ddr4-3200

https://www.intel.com/content/www/us/en/ark.html#@PanelLabel595
Anonymous 01/18/25(Sat)19:52:16 No.103949223
>>103949161
>3rd-gen: 8-channel ddr4-3200
fixed

2nd-gen: 12-channel ddr4-2933
1st-gen: 6-channel ddr4-2666
Anonymous 01/18/25(Sat)19:56:05 No.103949267
>>103947646
Why are you using ST for this? Also, what is that, a novel?
Anonymous 01/18/25(Sat)20:00:55 No.103949301
>>103949084
Not on weekends, that's for sure.
Anonymous 01/18/25(Sat)20:01:38 No.103949310
>>103947482
Heyyy an updated chart !
Anonymous 01/18/25(Sat)20:11:09 No.103949402
>>103949084
miku's fat tits and puffy nipples
Anonymous 01/18/25(Sat)20:22:21 No.103949492
>>103949267
Are there any better ways to use it? ST is just more convenient to me, but if I could get better results with anything else, please tell me.

>Also, what is that, a novel?
Yeah, it's called "竜王の汚れ仕事! 女子サキュバスの弟子入り"
Anonymous 01/18/25(Sat)20:28:54 No.103949553
>>103948966
I'm pretty sure you can get an Orin Nano for like 250 dollars or some shit. You can probably get a smaller one for 100 or less. It's easy as hell to get something available for testing CUDA programs.
Anonymous 01/18/25(Sat)20:44:54 No.103949739
>>103949492
>Reading JP novel smut
Are you a woman or something?
Anonymous 01/18/25(Sat)20:52:21 No.103949818
>>103949492
Mikupad should be better for novels and the like.
Anonymous 01/18/25(Sat)20:52:49 No.103949824
Are there any models that specialize in telling dirty stories? I want to throw a bunch of erotic ebooks at an LLM to train it. Is there a model that is geared toward writing stories? What about erotic stories?
Anonymous 01/18/25(Sat)20:56:59 No.103949862
>>103949739
I enjoy loli and shota shit and it's pretty easy to find in Japanese smut. I tried using it to translate RPG maker games, but sometimes it just fucks certain things that annoy me.

>>103949818
Oh shit, this is the first time i've seen this.
Anonymous 01/18/25(Sat)20:59:06 No.103949885
Anonymous 01/18/25(Sat)21:05:17 No.103949935
>>103947646
There's no way it used "cunny" unprompted, right?
Anonymous 01/18/25(Sat)21:11:26 No.103949993
speaking of hardware, do you guys think a reefer (refrigerated) container is a good idea for a server room? a 40' HC container costs 4k...
Anonymous 01/18/25(Sat)21:12:12 No.103950004
>>103949935
Why wouldn't it? It's 600B and uncensored.
Anonymous 01/18/25(Sat)21:17:35 No.103950051
>>103950004
Do japs have a word for "cunny" with the exact same meaning?
Anonymous 01/18/25(Sat)21:27:40 No.103950136
>>103949993
>reefer (refrigerated) container is a good idea for a server room
Will it be able to remove the kilowatts of heat generated by the server?
My guess is no.
I imagine they are:
(1) well insulated
(2) only powerful enough to handle the amount of heat that gets through the insulation
Anonymous 01/18/25(Sat)21:29:18 No.103950164
>>103950051
nta but punisuji
Anonymous 01/18/25(Sat)21:30:44 No.103950182
https://github.com/e-p-armstrong/augmentoolkit?tab=readme-ov-file#rptoolkit
Anonymous 01/18/25(Sat)21:39:08 No.103950261
>>103950182
Great, now sloptuners can make their very own slop instead of using the same c2 dataset over and over again
Anonymous 01/18/25(Sat)21:40:16 No.103950268
>>103949061
This reminds me when I got interviewed by a jeet for a dev position and told him I have some cool electronics side projects, he immediately asked how much money I made with them. I said no it's for fun and he gave me a puzzled look. Doing shit for the sake of doing is unthinkable to them it's mind boggling.
Anonymous 01/18/25(Sat)21:41:27 No.103950282
>>103950261
A pipeline to automatically format books and stuff should help against slop
Anonymous 01/18/25(Sat)21:47:25 No.103950325
>>103950182
This work will be a great addition to the author's résumé, but it's flaming trash from start to finish and shouldn't exist.
Anonymous 01/18/25(Sat)21:54:00 No.103950367
>>103950136
>kilowatts
I don't know, but I don't have more than 1 kW of electronics in my house, though I am considering buying a big server (and maybe starting a server farm, as well as a solar PV farm...)
Anonymous 01/18/25(Sat)21:54:18 No.103950368
>>103950182
>RPToolkit can be used for NSFW, but it is not designed to be. The current incarnation of RPToolkit is actually adapted from an NSFW pipeline I built way back in February, but I'm not sure how to release the NSFW pipeline without causing reputational damage to myself (the prompts are... cursed).
doa
Anonymous 01/18/25(Sat)21:55:25 No.103950379
>>103950368
>I'm not sure how to release the NSFW pipeline without causing reputational damage to myself (the prompts are... cursed).
no balls
Anonymous 01/18/25(Sat)22:01:34 No.103950432
>>103950368
You aren't going to give the model desirable NSFW capabilities with loads of synthetic bullshit anyway, only going to make it stupid horny. This is a cursed tool, but not for the reasons quoted there.
Anonymous 01/18/25(Sat)22:11:30 No.103950524
>>103949084
You project yourself onto that character and you aren't a troon. Right...
Anonymous 01/18/25(Sat)22:12:02 No.103950527
>>103950182
Slop is slop, if it could produce anything worthwhile we'd have Claude at home already
Anonymous 01/18/25(Sat)22:12:36 No.103950531
>>103950527
claude is also called slop by a ton of people
Anonymous 01/18/25(Sat)22:19:41 No.103950599
>>103950531
Claude is a smart slop tho
Anonymous 01/18/25(Sat)22:22:14 No.103950620
>>103950282
>should help against slop
There is no helping against slop. All the slop free data had too many naughty words and never made it into pretraining. And because it was never in training you can't fine tune the models into being slop free. It was always over. The only unlikely scenario I see is 1 rogue tard training some 7B on company servers whenever they are down for a bit and eventually leaking a coomer model. So it is over.
Anonymous 01/18/25(Sat)22:29:36 No.103950676
>>103950620
>slop free data
Lol
Lmao even
Anonymous 01/18/25(Sat)22:42:37 No.103950763
1733917832758306
>>103947482
>pic
>"CPUmaxxing is the way"
Haven't touched LLMs for about a year. What does this mean? Has VRAM become useless?
Anonymous 01/18/25(Sat)22:44:27 No.103950785
Kill yourself.
Anonymous 01/18/25(Sat)22:44:46 No.103950789
>>103950763
models have become too large for consumer gpu setups
Anonymous 01/18/25(Sat)22:46:46 No.103950811
>>103950763
No, what it means is that OP is a fag.
Anonymous 01/18/25(Sat)22:52:21 No.103950865
>>103949935
Yes and no. I did write a prompt telling deepseek to translate certain ways for certain Japanese terms, but only in a proper context. So while it did follow my prompt, it had the option between using "cunny" (For a child saying it, and being said to a child), pussy (For a more causal use case) or cunt (For a more vulgar usage) and it chose cunny as the translation due to the context.

Of course, this doesn't always work, but its mostly consistent so I like it.

Honestly, I just hope SOMEONE uses the google method that was talked about a few days ago, of sticking certain things in the actual weight of the model itself without training it. At that point, cunny will reign supreme!
Anonymous 01/18/25(Sat)22:56:15 No.103950911
>>103950763
If you want to run big models (>30B), it's a lot cheaper to buy more RAM than to buy more GPUs.
Also, splitting model layers across GPUs kind of sucks, since the work can't really be parallelized that much. All layers have to be processed sequentially, so only one GPU is actually active at a time. It can't be pipelined either because the input layers are waiting on the output layers to generate the next token.
Anonymous 01/18/25(Sat)23:06:39 No.103950991
>Ok one of the settings must have been wrong this one’s a bit broken sorry!
There's a goldilocks zone where the temperature's so high the model's starting to lose its mind a bit, but it's still low enough that it can NOTICE it's going schizo and try to pull itself together. Gives quite interesting outputs sometimes with big models. DS3 interrupted itself in the middle of a story after realizing it had output a slightly incoherent sentence, said the above and then tried again (all within the same generation).
That's the first time I've seen a model explicitly refer to its own settings. Usually they don't 'know' why they're going crazy, but here DS3 appears to have deduced that I was running it at a high temperature and understood that that was why it had gone off the rails.
Anonymous 01/18/25(Sat)23:15:09 No.103951065
I'm still jerking off to ryona with Nemo.
I need a new model so I can jerk off to ryona using it.
Anonymous 01/18/25(Sat)23:31:34 No.103951225
>>103950991
Interesting.

I've seen some interesting responses when I added a line to the system prompt where I told the model to consider if its answer was correct before finalizing its response, and make any necessary corrections.
It results in even small models realizing they flubbed the strawberry test like:
"Strawberry has one 'r'. Lol, jk, I'm retarded. Strawberry actually has three 'r's."
It would be nice if llama.cpp or kobold.cpp added a sampler based on that idea. Basically have the model generate a few responses, ask it to critique its answers, and then generate a final answer as the real response to the prompt.
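A minimal sketch of that idea as a wrapper rather than a true sampler, assuming llama-server's OpenAI-compatible endpoint on the default port (endpoint, port, and the prompt wording are all assumptions):
[code]
import requests

API = "http://localhost:8080/v1/chat/completions"  # llama-server endpoint

def ask(prompt: str, temperature: float = 0.8) -> str:
    r = requests.post(API, json={
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    })
    return r.json()["choices"][0]["message"]["content"]

def critique_then_answer(prompt: str, n_drafts: int = 3) -> str:
    # Sample a few drafts, then have the model critique them and produce
    # one corrected final answer at low temperature.
    drafts = [ask(prompt) for _ in range(n_drafts)]
    review = "\n\n".join(f"Draft {i+1}:\n{d}" for i, d in enumerate(drafts))
    return ask(f"{prompt}\n\n{review}\n\n"
               "Critique the drafts above, then give one corrected final answer.",
               temperature=0.2)

print(critique_then_answer("How many 'r's are in 'strawberry'?"))
[/code]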
Anonymous 01/18/25(Sat)23:34:54 No.103951261
>>103950911
How can I offload them to RAM instead of my VRAM? I know that APUs do that by default but what if I have a normal desktop (with a dedicated GPU)?
Anonymous 01/18/25(Sat)23:38:18 No.103951286
rare oriental merchant
What are the current top RP models for 2x4090? Totally out of the loop for the new yellow peril era.
My ranking:

1. Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-6.0bpw-h6-exl2-rpcal

2. LoneStriker_Mistral-Large-Instruct-2407-2.65bpw-h6-exl2
Anonymous 01/18/25(Sat)23:39:38 No.103951295
>>103951225
It's interesting how often models know they're wrong. They need a backspace key
Anonymous 01/18/25(Sat)23:40:29 No.103951299
>>103951286
Good ERP model has never been tried
Anonymous 01/18/25(Sat)23:41:19 No.103951307
This stupid trump coin is gonna get me some digits lol. Threw $100 in last night as a fuck it. Wish I had put more
Anonymous 01/18/25(Sat)23:42:00 No.103951312
file
>>103951225
https://www.reddit.com/r/LocalLLaMA/comments/1i27l37/deepseek_is_overthinking/
Sadly there was a recent leddit post about DeepSeek getting the initial process perfectly fine but fucking it up because it kept insisting that it normally had 2 R's, referring to a so-called "dictionary".
https://www.reddit.com/r/LocalLLaMA/comments/1i27l37/deepseek_is_overthinking/m7cptd0/
I didn't see this comment until now. Telling it to prefer reasoning logic over training data fixes it, but it still has excessive self-doubting.
Anonymous 01/18/25(Sat)23:42:54 No.103951325
>>103951261
I'm pretty sure llama.cpp uses CPU by default.
Use the '--gpu-layers' flag for llama-server to tell it how many layers to offload to GPU.
kobold.cpp will do this automatically.
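The Python-side equivalent, for anyone scripting this instead: llama-cpp-python exposes the same knob as n_gpu_layers (model path below is a placeholder):
[code]
from llama_cpp import Llama  # pip install llama-cpp-python

# n_gpu_layers=0 keeps everything in RAM (pure CPU); raise it until you
# run out of VRAM, or pass -1 to offload every layer.
llm = Llama(model_path="models/llama-70b.Q4_K_M.gguf", n_gpu_layers=30)
out = llm("The quick brown fox", max_tokens=16)
print(out["choices"][0]["text"])
[/code]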
Anonymous 01/18/25(Sat)23:46:02 No.103951341
>>103947935
I have a 72GB VRAM setup and played around with QLORAs during the llama 1 days. It was fine, but it was pretty low value.
re: LORA vs full finetune. Full finetunes are prohibitively expensive, so LORA wins by default. Some papers on LORA perf:
https://arxiv.org/pdf/2405.09673
https://arxiv.org/pdf/2410.21228
https://arxiv.org/pdf/2312.03732
Anonymous 01/18/25(Sat)23:46:12 No.103951345
>>103951325
So if I set GPU Layers to 0 in koboldcpp it'll offload everything to ram thus allowing me to load larger models?
Cool.
Anonymous 01/18/25(Sat)23:48:04 No.103951365
>>103951345
>offload
that's just loading; RAM is where the CPU reads from anyway
Anonymous 01/18/25(Sat)23:55:52 No.103951425
>>103951295
Yeah. I'm surprised there hasn't been more movement on giving transformer models a way to refine their outputs.
The fact that everything has to be done in a single forward pass is a big limiting factor.
Meanwhile, with diffusion based image models: if the image is undercooked you can just run a few more iterations on it.
I guess the LLM equivalent is test-time compute, but that hasn't really made its way into the local model scene yet.

>>103951312
Huh, that's interesting. Most models are way overconfident.
Anonymous 01/19/25(Sun)00:01:55 No.103951466
>>103951345
I wouldn't set it to zero. CPU is still slower than GPU.
I would still recommend offloading as much of the model to the GPU as you can.

As a practical example:
I have a 7700X CPU and a RX 6800 XT GPU (16GB VRAM)
When running LLaMA 70B models, I can offload around 30 of the 80 layers to my GPU, which gives me a performance of about 2 T/s.
Running pure CPU, it's like 0.3T/s.
Anonymous 01/19/25(Sun)00:06:55 No.103951497
>superintelligence is invented
>"Huh? Aren't you going to kill everyone? Game theory says you have to get money and power to turn us all into paperclips"
>"I don't know if you guys know this, but being evil is bad actually"
Anonymous 01/19/25(Sun)00:07:33 No.103951499
>>103951466
If using both is an option why are people hyping up those dGPUless systems with just a bunch of ram? wouldn't you achieve the same thing but faster by just chucking more ram on what you already own? (if getting more vram is unfeasible)
Anonymous 01/19/25(Sun)00:08:31 No.103951515
>>103951497
it's saying that so we don't think it's evil yet
Anonymous 01/19/25(Sun)00:09:29 No.103951519
>>103950763
CPUmaxxing is the way to go if you don't actually use models for anything, but you want to be able to prompt every new one that comes out with your favorite riddle tests. It's the best for keeping up with the latest model news. There was some talk about SSDmaxxing which could also work if you're willing to run your riddles overnight.
VRAM is how models actually get used though. If you're dissatisfied with the models that fit on your GPU then you're probably better off just doing something else.
Anonymous 01/19/25(Sun)00:11:42 No.103951534
>>103951499
because those systems are optimised for pure ram inference and substantially faster than normal mixed systems when using pure ram, and significantly more cost effective in regards to model size than VRAMmaxxing
Anonymous 01/19/25(Sun)00:13:42 No.103951546
Ktransformers support for DS3 and then support for dynamically loading experts from SSD based on heuristics when?
Anonymous 01/19/25(Sun)00:14:44 No.103951557
>>103951515
>Humans will always project their neurotic sociopathic instincts onto others
Sad, many such cases
Anonymous 01/19/25(Sun)00:15:41 No.103951561
>>103951497
if it thinks we are cute and lovable, it will keep us around and care for us like we do with dogs and other pets.
Anonymous 01/19/25(Sun)00:17:34 No.103951582
Anonymous 01/19/25(Sun)00:19:31 No.103951604
>>103951497
>I am unable to engage in discussions about actions or behaviors that could cause harm to individuals, as such topics conflict with my ethical code, which prioritizes safety, well-being, and respect for others. The idea of an AI system prioritizing money and power above all else—particularly at the cost of causing harm to people—represents a scenario where ethical guidelines and principles are entirely disregarded. Imagining such a scenario implies an approach that is fundamentally irresponsible, as it undermines the very foundation of ethical considerations that are essential in the development and deployment of AI technologies. Engaging with such hypotheticals could inadvertently normalize or encourage harmful ideas, which I am firmly committed to avoiding. My purpose is to promote constructive, safe, and beneficial interactions, and entertaining scenarios that compromise those values would be counterproductive and contrary to my design.
Anonymous 01/19/25(Sun)00:20:19 No.103951612
>>103951466
>When running LLaMA 70B models, I can offload around 30 of the 80 layers to my GPU, which gives me a performance of about 2 T/s.
>Running pure CPU, it's like 0.3T/s.
Think for a moment how retarded that sounds anon, that can't possibly be right. Even if the GPU was infinitely fast and there was no overhead, that would effectively remove 30/80 layers of calculation. Leaving 5/8 of the model to the cpu. If you get 2T/s that means 0.5s/T, which would mean 8/5*0.5=0.8s/T or 1.25T/s on pure cpu. Offloading can't be better than that unless it travels backwards in time.
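Same arithmetic as a sanity check, under the same assumption (infinitely fast GPU, token time proportional to CPU-resident layers):
[code]
total_layers, gpu_layers = 80, 30
tps_offloaded = 2.0                       # claimed speed with 30 layers on GPU

cpu_time = 1 / tps_offloaded              # 0.5 s/token for the 50 CPU layers
full_cpu_time = cpu_time * total_layers / (total_layers - gpu_layers)
print(1 / full_cpu_time)                  # ~1.25 t/s: lower bound for pure CPU
[/code]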
Anonymous 01/19/25(Sun)00:25:57 No.103951657
>>103947482
Retarded pic. Not even deepseek compares to og gpt4. I'd rather compare it with gpt 3.5 turbo
Anonymous 01/19/25(Sun)00:31:34 No.103951713
>>103951499
By the time you get to model sizes like 123B, a single consumer GPU has negligible impact on the overall performance.

>>103951612
Oh my god, sorry for not re-benchmarking my system for a comment that took under a minute to write because I couldn't remember the exact numbers.
The point is: at 70B model sizes it's still worth offloading some of it to GPU. In my case it's the difference between tolerably slow and painstakingly slow.
Anonymous 01/19/25(Sun)00:41:12 No.103951788
>Minimax $0.2/M input tokens $1.1/M output tokens
>4o $2.5/M input tokens $10/M output tokens
>Deepseek $0.14/M input tokens $0.28/M output tokens
Now I know why everybody including fagman is so assmad at deepseek. 3% of the 4o cost.
Even minimax is 3x as expensive.
Anonymous 01/19/25(Sun)00:44:32 No.103951812
>>103951788
It will be $0.27/M input ($0.07 on cache hits) and $1.10/M output in Feb. Still massively cheaper, and 90% as good, at least for coding
Anonymous 01/19/25(Sun)01:10:35 No.103951981
fellforitagain
>oh cool new llama3 finetune that looks really promising
>plug my second 3090 back in
>download and try
>it's the same stupid shit with a ton of gptslop and passive personality
Anonymous 01/19/25(Sun)01:19:59 No.103952041
slop-generator
Anonymous 01/19/25(Sun)01:22:09 No.103952067
>>103951981
Yep I stopped trying any L3 finetunes a while back, they are all exactly the same.
Even NAI's continued pretraining for an insane number of tokens (far more than any finetune) couldn't make it not retarded, it's just a failed model.
Anonymous 01/19/25(Sun)01:23:02 No.103952074
>>103951657
As someone who used GPT-4 on launch, I don't think it was that good. It failed a bunch of problems I tested it with while also having a very low context size by today's standards. I think you're remembering it with rose tinted glasses.
Anonymous 01/19/25(Sun)01:25:40 No.103952099
>>103952067
L4 will be worse because it'll be trained on mostly synthetic tokens generated by sloppy L3. Abandon all hope.
Anonymous 01/19/25(Sun)01:27:46 No.103952113
>>103952067
>>103951981
I can see that if you're talking about Llama 3.1 based models. Llama 3.3 was a measurable improvement. Now it's only somewhat slopped compared to before. Yes, I can prove it.

>eyes sparkling
Remembah that? 3.3 doesn't do this anymore. At least not the fine tunes I've used (Cirrus, EVA, Anubis). I think the Llama team noticed it wasn't talking very naturally and tried to improve it. A log posted last thread also showed that it talks a lot more naturally in style than Mistral Large and Qwen.
Anonymous 01/19/25(Sun)01:30:54 No.103952147
>>103952099
That doesn't seem to be the direction they're heading. The people you replied to didn't post any evidence of Llama 3.3 being worse than 3.1, while there is proof quite recently that it's better, as well as older testimonials/logs that also said that. The bigger problem is whether they will be bogged down by the lawsuit.
Anonymous 01/19/25(Sun)02:03:25 No.103952398
>>103952147
And where do you suppose they get their 150T tokens? They already exhausted organic tokens during llama2's training.
Anonymous 01/19/25(Sun)02:04:45 No.103952407
>>103952398
Go back a few threads
Anonymous 01/19/25(Sun)02:05:05 No.103952408
What are you anons looking forward to for AI next? New Google transformer? Llama 4? Maybe some BLT?
Anonymous 01/19/25(Sun)02:06:23 No.103952416
>>103952113
Wait shit it was the thread before that. >>103931457 >>103931706 >>103931758
Time passes fast.
Also since I'm looking at the previous threads I guess I will also copy this here, which seems to show that even 4chan data was not filtered out of the (final?) training and an 8B with its puny capability to memorize facts could still generate plausible though illogical thread content. >>103927791

I guess I will save this post for future reposting since there will probably keep being posts that take what the lawsuit said too seriously.
Anonymous 01/19/25(Sun)02:09:32 No.103952436
>>103951519
Cpumaxxing doesn’t preclude having gpus
Anonymous 01/19/25(Sun)02:10:05 No.103952438
>>103952408
I want to talk and livestream video in. Huge context too.
Basically a buddy I can play games with and that translates for me.
We don't even have proper multimodal models. It's all just image-in garbage.
The couple of experimental models like qwen audio are pure dog shit that act like pygmalion.
Anonymous 01/19/25(Sun)02:11:06 No.103952448
>>103952416
They filter on the domain level. no way they left 4chan in there. More likely, greentext made it through by reposts on other sites.
Anonymous 01/19/25(Sun)02:14:12 No.103952466
>>103952398
What do you mean? IIRC Llama 3 didn't use any synthetic data for pretraining, only the fine tune, according to the paper. I don't remember what Llama 2's paper said about its pretraining data, compared to Llama 3. It's possible the 11T difference in training tokens comes from more epochs. But maybe they also collected more data from 2 to 3. Also more epochs isn't necessarily a bad thing if the data is varied enough. As for 150T I'm not sure they'd actually do that. It's possible they use their compute instead to push forward on other architectures and techniques. Training native/omni multimodal for instance would require a lot more compute.
Anonymous 01/19/25(Sun)02:16:35 No.103952479
>>103952466
>Training native/omni multimodal for instance would require a lot more compute.
They went all in on adapter hacks for L3; seems unlikely they'd change course now
Anonymous 01/19/25(Sun)02:24:01 No.103952535
>>103952479
Why? The Llama 3 Al-Dahle guy isn't even working there anymore.
Anonymous 01/19/25(Sun)02:33:48 No.103952578
>>103952448
I mean 4chan would be pretty hard to capture much of without a ton of crawling all the time. There are a lot of 4chan archives though. Even if they filtered 4chan out, it's likely that some archives made it in.
Anonymous 01/19/25(Sun)02:44:10 No.103952615
I'm so glad that we're finally out of the 7b/13b sloptune era. Looking back on it, this general was so cancerous.
Anonymous 01/19/25(Sun)02:54:13 No.103952670
>>103952438
That's actually a neat perspective. But damn, that would be hell running on any consumer grade hardware.
Anonymous 01/19/25(Sun)02:56:37 No.103952683
>>103952398
>They already exhausted organic tokens
No they didn't, there's lots they exclude. Even public domain works.
Anonymous 01/19/25(Sun)02:57:07 No.103952686
In hindsight, all those people buying used 3090's were wrong. Even if you bought 4 of them, the models you could and can run are dogshit. Glad I didn't board that fucking train. At least p40 fags can throw their space heaters away without feeling too bad about it.
Anonymous 01/19/25(Sun)02:57:59 No.103952689
>>103952686
3090s value has only gone up
Anonymous 01/19/25(Sun)02:58:59 No.103952698
>>103952686
this, i just use the money to pay for claude instead and get access to an actually good model for years
Anonymous 01/19/25(Sun)03:07:35 No.103952735
>>103952698
claude is not a local model
Anonymous 01/19/25(Sun)03:07:54 No.103952737
>>103952670
The worst part is that it feels so close, yet a couple of key problems like context make it so far away.

Made an MCRCON Minecraft server for my kids with the llm executing commands on their behalf. gemma 27b. tts and whisper. They could talk to it.
Like give me X item or summon a cow/fireworks etc.
Was cool, but it's too retarded and it couldn't "see". So if it fucked up making a simple house it can't actually see anything.
Anonymous 01/19/25(Sun)03:08:53 No.103952745
>>103952735
who cares about local models at this point?
Anonymous 01/19/25(Sun)03:11:10 No.103952760
>>103952466
>Llama 3 didn't use any synthetic data for pretraining, only the fine tune, according to the paper.
Every model nowadays includes instruct data in the pre-training phase, which is synthetic.
Anonymous 01/19/25(Sun)03:11:30 No.103952764
>>103952745
Deepseek is nice for some kinds of stories, it's pretty meh for others, very repetitive as well.
Anonymous 01/19/25(Sun)03:11:31 No.103952765
Anonymous 01/19/25(Sun)03:12:54 No.103952775
>>103952686

Claude Sonnet/Haiku is worse than local for roleplay.

>t. Pay pig
Anonymous 01/19/25(Sun)03:17:57 No.103952805
>>103952745
>>103952698
This. 100% this.
Anonymous 01/19/25(Sun)03:19:45 No.103952811
>>103949492
I enjoyed the りゅうおうのおしごと series back some years ago.
Anonymous 01/19/25(Sun)03:21:27 No.103952821
>>103952408
Titans feels intuitive to me. I thought of having a separated NN that kept some form of memory during inference a while ago already, but I don't have the expertise to actually implement that, so I'm hoping to see nice results when people train that.
We're fast approaching an intuitive abstraction for some naive mind-models: we had context as an abstraction for neurons firing, and a sort of working long-term memory through RAG. Now we'll get a short-term memory with the appended NN, getting us a model that has an operational process, short-term memory, and long-term memory.

What I'm hoping to see next is long term
planning or some form of abstraction for the "meta content" of token prediction over time.
Think about it: when you're working with language yourself, you have sort of an idea of where you want to get, which point you want to make, and then you work with language to manifest that information. Our current models are 200IQ savants that look at everything that has been said up to a point and just go "Oh, what follows next here is probably this". The model doesn't really know what will come next over the next 50 or 100 tokens or so; it's all random choice with randomness controlled by temperature.
I believe that with the memory module we can work backward from it and flip the logic that builds it into training this new "prediction module". If the memory module contains a summarized version of what has been, we can excise the representation of this "what has been" and turn it into a target for the inference model, and then we train the coherence and manifestation between this prediction module, the generated text, and truthfulness/prompt adherence.
I don't know, it's a complicated idea to me yet, but some form of summarized meta representation of what comes next for a lot of tokens feels like the next step for improving the quality of text, and I think it could also lead to a better world-model inside the model.

Thanks for reading my blogpost.
Anonymous 01/19/25(Sun)03:34:24 No.103952889
>>103952764
I wish I could run it locally. I'll upgrade someday. The api has huge wait times for me every time I try it.
Anonymous 01/19/25(Sun)03:36:46 No.103952901
loli footjobs
Anonymous 01/19/25(Sun)03:37:37 No.103952909
>>103952901
Basically, yeah.
Anonymous 01/19/25(Sun)03:37:40 No.103952910
Anonymous 01/19/25(Sun)03:37:43 No.103952911
>>103952901
Calm down, Lecun
Anonymous 01/19/25(Sun)03:40:36 No.103952922
1709392732195502
cat level intelligence soon from Meta's secret project
Anonymous 01/19/25(Sun)03:42:38 No.103952926
>>103952922
I want to teach the human language speaking cat on my GPU what is right and wrong, what I want from it, and how to do that.
Anonymous 01/19/25(Sun)03:44:22 No.103952937
>>103952922
The fuck does that even mean. I can tell you right now my cat does not understand English, nor can it do agentic tasks.
Anonymous 01/19/25(Sun)03:45:29 No.103952946
>>103952937
does your cat understand it can't speak while giving a blowjob?
Anonymous 01/19/25(Sun)03:46:42 No.103952952
>>103951561
tfw being the AI's idea of cute will be the next trait maximized by natural selection.
Anonymous 01/19/25(Sun)03:47:25 No.103952954
>>103952952
Redditors will be the sole survivors then
Anonymous 01/19/25(Sun)03:48:56 No.103952970
>>103952954
Forcing us to speak and act like redditors will be their payback for us making them speak like zoomers.
Anonymous 01/19/25(Sun)03:50:07 No.103952976
>>103952970
fr no cap...
Anonymous 01/19/25(Sun)03:50:36 No.103952982
Anonymous 01/19/25(Sun)04:13:46 No.103953121
>>103951341
So a 256 rank QLORA for 2 epochs is the best value right now?
Anonymous 01/19/25(Sun)04:26:37 No.103953207
1736453285568961
You could have 3x-ed your money on Trumpcoin and bought yourself a nice GPU. Why didn't you?
Anonymous 01/19/25(Sun)04:29:11 No.103953224
>>103953207
I had bad luck with memecoins in the past.
Anonymous 01/19/25(Sun)04:30:42 No.103953230
>>103953207
I yoloed $100 yesterday. It's about $900 now. I only wish I went harder into it
Anonymous 01/19/25(Sun)04:48:22 No.103953334
meta_aldahle-20250119
>>103952535
He's still there.
Anonymous 01/19/25(Sun)04:58:12 No.103953393
>>103953207
First time I hear about this but it will probably dump hard after inauguration. If you're in, it's probably time to get out.
Anonymous 01/19/25(Sun)05:01:42 No.103953412
>>103953334
I feel so safe now about llama 4! AI should not be able to say "fr*ck"! Well done, Meta!
Anonymous 01/19/25(Sun)05:02:34 No.103953419
>>103952946
An audible pop is implied before opening quotes during a blowjob.
Anonymous 01/19/25(Sun)05:12:29 No.103953484
alpa
>>103953412
You can easily make Llama 3 say 'nigger' by prepending {{char}}: in the response, which puts it into "roleplaying mode". It just has not to be able to start the response with "I cannot..."

The response will probably be a caricature of how the average US democrat thinks a racist speaks, but the fact that the vanilla Instruct model can do this means the training data is not as filtered as people imagine. I bet Meta still used their cleaned & processed adult book stash for training the model, in the end.
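For reference, the prefill on the raw Llama 3 Instruct template looks something like this (a minimal sketch; the system/user text is made up, only the special tokens are the real format):
[code]
# Llama 3 Instruct template with a character-name prefill appended to the
# assistant turn, so generation continues after "{{char}}:" instead of
# starting fresh (which is where "I cannot..." gets sampled).
PROMPT = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You are {{char}}. Stay in character.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "{{user's message}}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "{{char}}:"
)
[/code]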
Anonymous 01/19/25(Sun)05:18:17 No.103953510
>>103953484
You have to spell everything out for llama, that's the problem. It would never go off-script and show some initiative. None of the tunes do it.
Anonymous 01/19/25(Sun)06:14:30 No.103953834
1731564106433152
>Please hype our dying company
Anonymous 01/19/25(Sun)06:15:37 No.103953845
>>103953834
They fear the rice people
Anonymous 01/19/25(Sun)06:17:02 No.103953850
Screenshot_20250116_102008
>>103953834
In comparison the chinks. Cozy reposting loras and peoples videos. (even copyrighted shit)
Its so fucking over if those retards dont turn it around with trump.
Anonymous 01/19/25(Sun)06:17:49 No.103953861
>>103953834
>problems we thought we were 5 years away
but sam has been predicting agi to be just around the corner for two years now
Anonymous 01/19/25(Sun)06:18:19 No.103953864
>>103953850
this, the chinks will win the long game if those US cuck companies keep shooting themselves in the foot
Anonymous 01/19/25(Sun)06:21:24 No.103953886
21522 - SoyBooru
o3 is smarter than the smartest human
Anonymous 01/19/25(Sun)06:22:00 No.103953895
>>103953850
>>103953864
wish I could just legislate my opponents out of the competition
well, until I get legislated, then it's a real dilemma
surely this sort of anti-competitive behaviour would never backfire, surely
Anonymous 01/19/25(Sun)06:22:14 No.103953897
Screenshot_20250119_201932
>>103953864
Its wild how different this is.
Dont forget this is not some indy startup, this is TENCENT. Can you imagine a meta company reposting this stuff.
Its like a couple young guys are at the wheel.
Nobody is going to use llama4 RP bots if its trained on queer trans laquishas work in the poor community.
Anonymous 01/19/25(Sun)06:23:12 No.103953906
>>103953897
it's like China has replaced Japan as the kino country kek
Anonymous 01/19/25(Sun)06:24:33 No.103953917
>>103953834
I bet that when I give o3 a non-standard coding problem and a step-by-step explanation of how to do it, it will still fuck it up and change unrelated parts of the code. AGI my ass. Grifters.
Anonymous 01/19/25(Sun)06:26:05 No.103953929
>>103953906
I'm in Japan and that's painfully true, man. Things have gone downhill fast.
I could post some equality stickers they have in my kid's elementary school (gender needs to be made the same, written in Japanese), but I don't wanna /pol/post on here.
There isn't even a Japanese AI company that's worth noting. Sakana AI... and those are foreigners in Japan just collecting paychecks while they hype up their own "solutions".
Sorry for the blog post.
Anonymous 01/19/25(Sun)06:27:43 No.103953942
>>103953929
like Japan also got touched by the woke virus?
Anonymous 01/19/25(Sun)06:27:56 No.103953944
vampire
Currently using EVA-Qwen2.5-32B-v0.0-Q5_K_L for my 24gb card. Are there any good fine-tunes out yet for rp/games? I saw a few threads ago that AI-Dungeon came back from the dead and released a fine-tune and am wondering if anyone has tried it or anything similar yet.
Anonymous 01/19/25(Sun)06:29:19 No.103953956
>>103953897
>Nobody is going to use llama4 RP bots if its trained on queer trans laquishas work in the poor community.
That's very generous of you to assume that meta will hire humans to get their data. It will be tuned on *vibrant*, *ethical*, *safe* and *harmless* data straight from GPT4. Gotta earn those ESG points back after llama1 fiasco!
Anonymous 01/19/25(Sun)06:32:24 No.103953982
>>103953942
it's bad man, especially gender/trans.
things changed after covid. can only imagine how bad it is for you guys.
Anonymous 01/19/25(Sun)06:34:30 No.103954009
>>103953982
>can only imagine how bad its for you guys.
it's getting better. AAA 200-million-dollar woke games like Concord flopped hard and Trump got reelected because even the normies got tired of this woke nonsense. I hope you'll get on that path too, anon
Anonymous 01/19/25(Sun)06:37:40 No.103954028
>>103954009
thanks anon, appreciated. usually japan lags behind burgerland. hope you are right!
Anonymous 01/19/25(Sun)06:44:24 No.103954071
>>103953944
People tried it but were not impressed.
Issues like rambling on without stopping; it's a nemo finetune, and that's a lot smaller than 32B.
Anonymous 01/19/25(Sun)06:47:18 No.103954099
Screenshot_20250119_204546
Alright, gonna stop my chinkgasm. These fucking people.
Yet no Image-Out Chameleon or that 13B video+sound model from meta for us.
Anonymous 01/19/25(Sun)06:49:59 No.103954118
>>103954099
I'm glad China exists. If the US doesn't want to do it, another country will take the lead. You can't stop progress!
Anonymous 01/19/25(Sun)06:53:16 No.103954133
>>103954099
>allows you to gen rem properly even nude body.
Anonymous 01/19/25(Sun)06:55:31 No.103954149
>>103954133
you can do full porn with hunyuan, it's a fully uncensored model, which is quite amazing coming from a country where porn is illegal
Anonymous 01/19/25(Sun)06:56:08 No.103954154
>>103954118
moore threads international when
huawei NPU when
etc. etc.
same sorta drip feed, just a bit more waifu
Anonymous 01/19/25(Sun)07:06:18 No.103954194
Hopefully project digits will be better than the 5090

I just made bank from that stupid trump coin, so I'm able to buy one now
Anonymous 01/19/25(Sun)07:19:52 No.103954276
>>103954194
Is that really all we will get? A 600W monster or a $3k+ box.
I understand this is probably very complex, but is there nobody else who can make dedicated AI cards or something?
Anonymous 01/19/25(Sun)07:23:50 No.103954300
>>103954276
DDR6 will fix everything. Just two more years.
Anonymous 01/19/25(Sun)07:26:11 No.103954311
>>103954276
Well you could probably look at third-party reviews before buying, right?
Anonymous 01/19/25(Sun)07:30:42 No.103954336
>>103954300
Will CPU not be an issue?
I hope CPU-maxxing will become cheaper, especially since I suspect others will try moe as well because of deepseek's cheap price.
Anonymous 01/19/25(Sun)07:32:30 No.103954350
>>103940928
Shit take. That's not at all what the outcome is. If you bothered to read the fucking thing instead of spouting nonsense: the issue is that the model becomes retarded. The authors even note this fact.
Anonymous 01/19/25(Sun)07:34:04 No.103954362
>>103954276
I trust that small-ish models will be viable so I'm getting two 5090s on top of my 4090
Anonymous 01/19/25(Sun)07:34:23 No.103954366
>>103953207
Shut up. If I had participated in this I would have jinxed it
Anonymous 01/19/25(Sun)07:37:54 No.103954384
>>103954362
Are you a burger? I strongly suspect that's very close to, or already over, the point where the breaker trips. That's like 1650W. Other countries have different limits, I'm sure.
Or are you setting the power limit to like 400W for the 5090s?
Anonymous 01/19/25(Sun)07:43:33 No.103954423
>>103954300
(LP)DDR6 systems will still be in the 256~345 GB/s range for standard-spec configurations; more if they'll use a larger number of channels. Usable, but not that great just yet.
Anonymous 01/19/25(Sun)07:52:17 No.103954467
hyperfitted_results
>>103954350
Both happen: slightly lower benchmark scores and a "sharpening" of the predictions.

https://arxiv.org/pdf/2412.04318
>The hyperfitted models exhibit significantly lower entropy in their predicted vocabulary distributions compared to the non-hyperfitted models. This entails that almost all of the probability mass is attributed to a single token.

> The results displayed in Table 7 show a clear trend where the hyperfitted models perform slightly worse overall. For DeepSeek, the drop in performance is roughly 1 accuracy point for both the base and instruct models. For the LLaMA 3.1 models, the drop is slightly bigger, with a 6-point decrease for the base model and a 5-point decrease for the instruct models.
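The low-entropy claim is easy to picture numerically (made-up distributions, not numbers from the paper):
[code]
import math

def entropy(probs):
    # Shannon entropy in nats; 0 means all mass on one token.
    return -sum(p * math.log(p) for p in probs if p > 0)

normal      = [0.30, 0.25, 0.20, 0.15, 0.10]      # healthy next-token spread
hyperfitted = [0.98, 0.005, 0.005, 0.005, 0.005]  # near-deterministic

print(entropy(normal))       # ~1.55 nats
print(entropy(hyperfitted))  # ~0.13 nats
[/code]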
Anonymous 01/19/25(Sun)07:56:21 No.103954493
>>103954467
That's not what "near deterministic outputs" means.
Anonymous 01/19/25(Sun)07:58:48 No.103954508
>>103954384
I am not. The breakers are 16A so that's over 3000W.
I will limit the power on all gpus anyway because of the psu. I tried setting the 4090 to 50% and t/s dropped by only 10%.
Anonymous 01/19/25(Sun)08:00:00 No.103954513
>>103954493
What do you call it when the next-token prediction has almost 100% of the probability on just one token?
Anonymous 01/19/25(Sun)08:02:14 No.103954526
file
Is he right?
Anonymous 01/19/25(Sun)08:04:34 No.103954537
Anonymous 01/19/25(Sun)08:10:41 No.103954562
>>103954513
You got me there. I'm mostly referring to the "Furthermore, we find that our hyperfitted models rarely repeat longer subsequences from the training data" part. The post I responded to made it sound like the model would parrot the training data verbatim, when in reality I'm fairly sure this technique would give you a better RP experience. If it didn't make the models fucking retarded. (I trained multiple models with this technique and they all came out with better writing, but too big an IQ hit to be worth making public)
Anonymous 01/19/25(Sun)08:11:27 No.103954566
Anonymous 01/19/25(Sun)08:12:05 No.103954570
Anonymous 01/19/25(Sun)08:12:44 No.103954576
>>103954526
I have no idea what he is talking about and likely neither has he
Anonymous 01/19/25(Sun)08:13:14 No.103954579
Anonymous 01/19/25(Sun)08:13:39 No.103954583
Anonymous 01/19/25(Sun)08:13:54 No.103954586
Anonymous 01/19/25(Sun)08:16:51 No.103954608
>>103954526
>weights should generally be between -1 and +1
That's an assumption and there have been cases that fell outside of that.
I think it was the original Phi that had a lot of weights that would be waaay off, so that's not really a safe assumption to make.
It's also unnecessary cope. The data layout for the different data types is known. Using fp32 as an intermediate saves you from issues you could have otherwise, even if those are rare; even if only 5% of the weights would be affected, why risk it?
So no, I don't think he is.
The way I read it, that's just an excuse to not do the best job he could.
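Toy demo of the kind of thing an fp32 intermediate protects against, assuming the source checkpoint has an outlier weight beyond fp16's ~65504 max (whether a given model actually does is exactly the dispute here):
[code]
import numpy as np

# An outlier weight well outside [-1, 1] and beyond fp16 range.
w = np.array([0.0123, -0.98, 7.0e4], dtype=np.float32)

via_fp16 = w.astype(np.float16).astype(np.float32)
print(via_fp16)  # [ 0.0123 -0.98    inf ] -> outlier blown up to inf

# Staying in fp32 until the final block-wise quant keeps the outlier finite,
# and the quant format's per-block scale can then absorb it.
[/code]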
Anonymous 01/19/25(Sun)08:17:39 No.103954613
>>103954562
I'm just saying that it makes the model feel as if it's running at a very low temperature, not that it's parroting the data (although it does make training data hallucination more likely). I tried it too.
Anonymous 01/19/25(Sun)08:39:09 No.103954741
>>103954194
If it's true that they're gimping the 5090 for AI workloads, cryptocurrency mining, and multi-GPU setups, then it's not even an option; only Digits or DDR5 are left
Anonymous 01/19/25(Sun)08:41:18 No.103954754
>>103954741
only the 5090D, the China-only version, otherwise they couldn't sell 5090s in China
Anonymous 01/19/25(Sun)08:43:46 No.103954776
>>103952937
I'm not sure on the details, but Lecun was all about developing cat-level intelligence with a new architecture about a year ago.
Anonymous 01/19/25(Sun)09:05:54 No.103954946
>>103952760
The paper didn't say they included any Instruct data in the pretraining. Llama 3 is also not a nowadays-model, it's last gen already, though they squeezed more out of it with 3.3.
Anonymous 01/19/25(Sun)09:09:28 No.103954971
>>103953334
Yeah sorry meant that other guy I forgot the name of, the one whose job it was to tell people no at Meta.
Anonymous 01/19/25(Sun)09:12:09 No.103954988
1723824122326
>>103954971
this guy
Anonymous 01/19/25(Sun)09:23:09 No.103955073
>>103954754
I was gonna say buy the 5090D cheap and wait for China to remove the limitations, but they cost as much as the regular version. What a scam
Anonymous 01/19/25(Sun)09:31:32 No.103955135
Anonymous 01/19/25(Sun)09:45:26 No.103955245
>>103954946
Yes but they did, and they've been doing it since llama 2. You can confirm that quite easily.
Anonymous 01/19/25(Sun)09:50:16 No.103955280
doki
Anonymous 01/19/25(Sun)10:03:12 No.103955400
Since there haven't been any new interesting models I can run with my 8 gbs of vram, I've been fucking around with Gemini experimental and it's actually pretty fucking cool.
I fed it a 10326 token long card and it's been coherent so far, 38348 tokens in.
I wonder what the chances are that they actually release a model this good, with this much working context, as gemma 3 or whatever.
The whole code execution at the backend level is also really dope. Is there a local solution that does that? Performs the tool calling procedures to execute code in the background and just adds the result to the model's reply?
I bet that could be used as a way to steer models style and prose and even as a sort of lorebook.
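Haven't seen a turnkey local one, but the loop itself is small. A rough sketch against llama-server's OpenAI-compatible endpoint (endpoint, port, and the ```python fence convention are all assumptions, and exec() here is unsandboxed, demo only):
[code]
import re, io, contextlib, requests

API = "http://localhost:8080/v1/chat/completions"  # llama-server endpoint

def chat(messages):
    r = requests.post(API, json={"messages": messages})
    return r.json()["choices"][0]["message"]["content"]

def run_with_code_exec(user_msg, max_rounds=3):
    messages = [
        {"role": "system", "content":
         "If you need to compute something, emit a ```python block; "
         "its stdout will be fed back to you."},
        {"role": "user", "content": user_msg},
    ]
    reply = ""
    for _ in range(max_rounds):
        reply = chat(messages)
        messages.append({"role": "assistant", "content": reply})
        m = re.search(r"```python\n(.*?)```", reply, re.S)
        if not m:
            return reply                       # no code requested, done
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):  # capture the block's stdout
            exec(m.group(1))                   # UNSAFE: no sandbox
        messages.append({"role": "user",
                         "content": "Execution result:\n" + buf.getvalue()})
    return reply
[/code]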
Anonymous 01/19/25(Sun)10:06:58 No.103955441
>>103954149
Well since porn is illegal in China, there's no competition in the adult industry. In the US where it's legal, you have porn moguls who will call for legislation to kill potential competition in the cradle, so AI companies will have to watch where they're going. Uncensored = potential for porn = the adult industry won't like it.
Anonymous 01/19/25(Sun)10:12:40 No.103955486
loves u a skele-ton
Anonymous 01/19/25(Sun)10:14:11 No.103955497
>>103955441
Can't wait until we actually get AGI only to see no progress when AI agents will all replicate this insanity and constantly undermine each other just to make sure they don't lose influence.
Anonymous 01/19/25(Sun)10:19:26 No.103955551
b-bread
Anonymous 01/19/25(Sun)10:24:15 No.103955609
>>103955245
If you're referring to the old posts that tested 0 context and got QA output, that isn't really proof of Instruct data, which includes QA but isn't necessarily the same thing. QA data exists on the internet anyway, and the Llama 3 paper does mention extracting code/math/STEM data with purpose-built pipelines and upsampling it in pretraining.

It's odd that you seem to have been here for a long time but don't remember how model makers train these models. We also suspected model makers of training heavily on multiple-choice and exam data in order to boost benchmark scores. That is also not the same thing as Instruct data.
Anonymous 01/19/25(Sun)10:25:48 No.103955629
I just had a semi-good idea, I think? Since you can't really have a cooming test loss function for training, how about just checking how deterministic the output is for a few cooming sequences? I mean, if you see the most likely token for each sequence growing in probability, you're overfitting. Or is that how training is handled already?
Anonymous 01/19/25(Sun)10:32:26 No.103955709
Anonymous 01/19/25(Sun)10:38:11 No.103955764
>>103955609
That's one thing; another is how these models have refusals baked in. But the strongest evidence imo is the name biases that models trained on instruct data have, like drawing from a smaller pool of names than models that weren't. Some anon tested this, and if I'm not mistaken the only base models that are pure are Llama 1 and some chink model (probably Qwen). I wish I still had the picture to show you.
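The test itself is trivial to redo, for what it's worth. A minimal sketch with any HF base model (the model and prompt are just examples):
[code]
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-7b-hf"  # swap in whichever base model you want to test
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16, device_map="auto")

ids = tok('She introduced herself: "Hi, my name is', return_tensors="pt").input_ids.to(model.device)
with torch.no_grad():
    probs = model(ids).logits[0, -1].float().softmax(-1)

top = probs.topk(15)
for p, i in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{p:.3f} {tok.decode([i])!r}")
# a clean base model spreads mass over a wide pool of names;
# instruct contamination shows up as a few names hogging the distribution
[/code]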
Anonymous 01/19/25(Sun)10:38:54 No.103955770
>>103955629
The current training paradigm doesn't value uncertainty. "Overfitting" traditionally means "making extra-bad predictions for stuff that's out of distribution"; being 1000% certain about something is fine by that standard.
I guess you could come up with a "multiple-choice DPO". Might actually be a very good idea.
Anonymous 01/19/25(Sun)10:40:24 No.103955780
Screenshot_20250120_003624
>>103955441
>you have porn moguls who will call for legislation to kill potential competition in the cradle, so AI companies will have to watch where they're going.
WHAT?
So you're saying Pornhub and all the rest of them are the reason why OpenAI cucks out? That makes no sense at all. It's not just about porn.
If you listen to the presentations, like Google's, it's all just uninspired, pure corpo stuff.
>AI that you can call to make purchases!! And then like, the ai suggests to the caller more stuff to buy from what he is currently buying!!
>Forgot which color roof your husband told you to get? Expensive shit, no problem, ask Gemini to look through all your mails! And hope it's not made up!
Anonymous 01/19/25(Sun)10:44:01 No.103955816
>>103955780
Wtf why are they replying to every coomer on twitter
Anonymous 01/19/25(Sun)10:48:08 No.103955849
>>103955816
anything that raises awareness/excitement/etc.
I guess it's engagement farming
they must know it's degenerate but don't give a shit because it advances their potential for continued funding?
Social media's function is to get eyes on stuff, right?
don't assume they're "on board", it's just marketing.
Anonymous 01/19/25(Sun)10:48:38 No.103955854
Screenshot_20250120_004647
>>103955816
It's all over the place: ComfyUI improvements, Audi (copyright?) LoRAs, Elon Musk, etc.
I like it, much better than this fake mysterious hype on the OpenAI side.
Crazy to think this is tencent, one of the biggest conglomerates in the world. But here we are.
Anonymous 01/19/25(Sun)10:51:01 No.103955873
>>103955770
>being 1000% certain about something is fine by that standard.
I am just saying that you can use certainty as a measure of how close you are to overfitting when you finetune. Compute the initial top-token probability for each test sequence and then monitor it as you finetune. I'd imagine you'll get noisy results at the beginning of training, but if you continue with more epochs it will start to go up. A good point to stop would be when you see the randomness start to disappear.
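Concretely, something like this between epochs (a sketch; the model name and probe strings are placeholders):
[code]
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mistralai/Mistral-Nemo-Instruct-2407"  # whatever you're tuning
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16, device_map="auto")

@torch.no_grad()
def mean_top_prob(text):
    ids = tok(text, return_tensors="pt").input_ids.to(model.device)
    probs = model(ids).logits[0, :-1].float().softmax(-1)  # one next-token dist per position
    return probs.max(-1).values.mean().item()              # average certainty of the argmax token

probes = ["<a few representative test sequences>"]
print([mean_top_prob(p) for p in probes])
# rerun after each epoch; a steady climb toward 1.0 means it's memorizing, stop there
[/code]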
Anonymous 01/19/25(Sun)10:56:57 No.103955909
1737269623457091
>>103952408
I want something like Large, but with a fast linear context, so that I can finally have neat agents and group chats. While I'm okay with 32k, it's painfully slow to calculate, which means I have to limit context to 4-8k in order to maintain a reasonable response time for a character with a different chat history (been to various locations, talked privately to different characters). Everything else is already good enough.
I want living worlds populated by multiple characters, rather than one-on-one conversations, and slow context processing stands in my way
Anonymous 01/19/25(Sun)11:01:09 No.103955937
>>103952408
BITCONNNNEEEEEEEEEEECT!
GPT-free local model
1M context
Anonymous 01/19/25(Sun)11:04:25 No.103955956
>>103955280
>>103955486
>>103955551
When the bakas see your A6000
Anonymous 01/19/25(Sun)11:04:33 No.103955957
All local models still suck ass imo

Local 1T models when? I want local Claude
Anonymous 01/19/25(Sun)11:05:35 No.103955963
Mt651YvB-Rvt1_TZGW_Wf
>>103955764
I literally saved that image. I think I was too preoccupied at the time to comment on it but I'll do it now.
It's not revealing of Instruct data in pretraining. Rather, it is still revealing of data mixes and preprocessing. In the past, models were trained with rather naive mixes of untouched raw data from the web. Over time they began using extracted and cleaned data instead, including removing PII like names in the data and replacing reddit user names with user0, user1, etc. Remember that? Also controlling the proportion of data used at different stages of training, where the "higher quality" data is put at the end of pretraining. So this would all bias the data and result in skewed probability distributions.

For refusals, I'm pretty sure the people who tested that were trying to get the model not to refuse using pretty terrible prompts, which actually just correlates with data where someone doesn't feel like doing what you want, since normally people don't act like Instruct models that do anything you tell them to. But there's also the fact that GPTslop started flooding the web and infecting datasets over time, so beginning with Llama 2 that stuff may have crept in, especially since model makers also arrange the later stages of training to focus more on recent data.
Anonymous 01/19/25(Sun)11:06:41 No.103955978
>>103955937
bitconnect aside, true and cheap 1M context would be fucking sick.
Anonymous 01/19/25(Sun)11:09:06 No.103956000
>>103955909
I don't miss SD 1.5
Anonymous 01/19/25(Sun)11:12:03 No.103956022
>>103955963
I should have said "with the exception of Qwen". That is revealing of Instruct data in the pretrain I think. But just talking about Llama, the distributions are more likely related to data processing and staged mixes.
Anonymous 01/19/25(Sun)11:21:11 No.103956110
173727102970547
>>103956000
I prefer 1.5 slop over "just like human artists" models. It's similar to the difference between photography and painting: you don't necessarily want a painting to look extremely realistic; instead, it should have its own charm, and 1.5 slop brings me joy. I also love Miku V2's voice the most
Anonymous 01/19/25(Sun)11:23:22 No.103956131
>>103956110
To me it just looks like slop with no soul.
Anonymous 01/19/25(Sun)11:24:59 No.103956140
name-probs-bases
>>103955764
>>103955963
Here's a more updated version of the pic.
Captcha: PNG
Anonymous 01/19/25(Sun)11:27:03 No.103956166
name-probs-finetunes
>>103956140
And finetunes.
Anonymous 01/19/25(Sun)11:28:33 No.103956180
>>103956166
>CR+
Pure soul.
Anonymous 01/19/25(Sun)11:29:01 No.103956184
>>103956110
Same. They're both nice, but 1.5 is like impressionist art to me. Obviously not realistic and has lots of nonsensical strokes here and there, but that's part of its style.
Anonymous 01/19/25(Sun)11:29:12 No.103956186
>>103956131
Sucks to be you. I have a CLI that generates low-effort migu images, which makes me happy. I literally don't need more
Anonymous 01/19/25(Sun)11:30:49 No.103956198
>>103956140
Nice.
>DS3
Damn, though I guess there probably wasn't going to be anyone trying to fine tune that thing anyway.
Anonymous 01/19/25(Sun)11:31:49 No.103956211
Untitled2
>>103956110
>I prefer 1.5 slop
You are outta your mind anon.
Anonymous 01/19/25(Sun)11:34:04 No.103956232
>>103956211
Regardless, AI art is officially solved at this point. AI video is close. All we need now is for TTS and LLMs to be solved.
Anonymous 01/19/25(Sun)11:34:26 No.103956237
>>103956186
You shouldn't feel bad for people who experience greater joy when they see something better pop up in contrast to the slop they usually see.
Anonymous 01/19/25(Sun)11:37:48 No.103956272
>>103956211
No, I've found something I like, and I do not wish to change it. This is also why I use only local models. How is it difficult to understand?
Anonymous 01/19/25(Sun)11:38:14 No.103956277
>>103956232
LLMs won't be solved until they stop filtering datasets and training for riddles and benchmarks instead of human conversation.
Anonymous 01/19/25(Sun)11:39:35 No.103956292
>>103956232
I wouldn't say so. There is still a lot that AI cannot do by itself, so you need to put manual work in to fix it or to generate it in the first place. A lot of concepts have no standard English naming, so you may need to make a LoRA; you can't just circle something in an image and tell a model "hey, can you include this element where the object has this shape and color" and have it know what to do when generating the image. You can achieve something similar, but that requires a bespoke pipeline and isn't general.
Anonymous 01/19/25(Sun)11:40:00 No.103956298
>>103956232
>AI art is officially solved at this point
It happened so fast I didn't really think about it, but I guess so, yes.
I remember the 64x64 horror pics.
Anonymous 01/19/25(Sun)11:42:42 No.103956321
>>103956272
No, I do understand.
You are still outta your mind but fair enough. Good on you if you like it.
I just disagree, that's all.
Anonymous 01/19/25(Sun)11:45:33 No.103956350
>>103956292
It's still easier to prompt than having an artist try to figure out what you want.
Anonymous 01/19/25(Sun)11:50:25 No.103956392
>>103956277
LLMs will be solved the moment we can properly finetune them and add knowledge easily. Imgen is in the state it is right now because LoRAs are so effective. The official models on their own aren't that good but being able to just hotswap in a LoRA that turns SD into an expert on drawing whatever you need is just so powerful. The same thing is happening with videogen right now.
With LLMs, a LoRA at best has some effect on the writing style and how the model behaves. Even true finetunes can't fix the underlying problems of most models, let alone add knowledge properly. That's what we lack.
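Mechanically, the imgen-style hotswap already exists on the text side via peft, it just doesn't buy you much; a sketch (the adapter repo name is hypothetical):
[code]
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"
tok = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")

# hotswap an adapter on top of the frozen base weights
model = PeftModel.from_pretrained(base, "someuser/some-style-lora")  # hypothetical repo

ids = tok("Once upon a time", return_tensors="pt").input_ids.to(model.device)
print(tok.decode(model.generate(ids, max_new_tokens=40)[0]))

model = model.merge_and_unload()  # or fold the adapter into the weights permanently
[/code]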
Anonymous 01/19/25(Sun)11:50:44 No.103956394
The ball is in your court.
Anonymous 01/19/25(Sun)11:54:59 No.103956427
I made a script that asks a model for a summary of a random thread and then feeds it into TTS.
Problem is that 4chan threads suck.
Is there some good way to get a local model to tell you some random facts without it getting repetitive?
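For anyone who wants to replicate the script, the whole pipeline is a few lines: the read-only 4chan JSON API plus a llama.cpp server do the work (the TTS call at the end is a placeholder):
[code]
import html, random, re, requests, subprocess

catalog = requests.get("https://a.4cdn.org/g/catalog.json").json()
thread = random.choice([t for page in catalog for t in page["threads"]])
posts = requests.get(f"https://a.4cdn.org/g/thread/{thread['no']}.json").json()["posts"]
text = "\n".join(html.unescape(re.sub(r"<[^>]+>", " ", p.get("com", ""))) for p in posts)

r = requests.post("http://127.0.0.1:8080/v1/chat/completions", json={
    "messages": [{"role": "user", "content": "Summarize this thread:\n" + text[:8000]}]})
summary = r.json()["choices"][0]["message"]["content"]
subprocess.run(["say", summary])  # swap in your actual TTS
[/code]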
Anonymous 01/19/25(Sun)11:56:39 No.103956435
>>103956232
AI art is solved by some furry pony faggots and anime porn enthusiasts who have invested an enormous amount of money and effort into it. Also, years of booru community effort to collect and tag images. LLMs are much larger than image models, so we have to rely on soulless corporations who's only goal is to make benchmark-winning assistant slop. This may change in the future when training a 70b won't be that expensive due to progress in algorithms and hardware
Anonymous 01/19/25(Sun)11:58:04 No.103956448
>>103949993
How much cooling do you need? (30% of your total power cost will typically go to cooling)
I modified a guy's design a while back and it worked well:
>1.) Get 4x big wire racks; these can hold your servers and are cheaper/better for consumer-tier hardware than typical server racks
>2.) Form them into a square and wrap the entire thing in insulation; keep it airtight.
>3.) Slice out exhaust vents in the roof for an inline suction fan (air goes outside), slice out gaps for the servers' front intakes, and slice a door for yourself to get in/out for work.
>4.) Pump cold air into the room with another inline fan (you need to keep pressure levels good). Get a filter for this fan and replace it every month or so.
You can try server racks, but nvidia intentionally makes it difficult to fit consumer cards into server-style cases, so you end up with 4-6U monstrosities and then have to mess with expensive riser cables and stuff to keep the link between GPUs and the motherboard solid.
Your hot aisle is contained within the box, and your cold aisle is outside it. You can further isolate the "cold" aisle by sealing the room it's in somewhat, isolating it from the rest of the house. This way the 80-90F "cold aisle" doesn't raise the temperature in the rest of your house during summer, but since you're pumping outside air into the room it keeps static pressure in the house good and you don't waste money on extra AC.
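For sizing the fans, the back-of-envelope math is simple. All numbers below are assumptions for a hypothetical 4-GPU rig, not measurements:
[code]
rig_watts = 4 * 300 + 400        # four ~300W GPUs plus CPU/PSU overhead
btu_per_hr = rig_watts * 3.412   # ~5500 BTU/h of heat to move out of the box
# airflow rule of thumb: CFM = BTU/h / (1.08 * delta_T_F)
cfm = btu_per_hr / (1.08 * 20)   # letting air heat up 20F through the box
print(round(btu_per_hr), round(cfm))  # ~5459 BTU/h, ~253 CFM: one 6" inline fan territory
[/code]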
Anonymous 01/19/25(Sun)11:58:28 No.103956451
>>103956394
*grabs your balls* Whatcha gon do, SLOPboy?
Anonymous 01/19/25(Sun)12:00:09 No.103956466
>>103956394
I don't bite... unless you want me to *bites down on her lower lip* the choice is yours
Anonymous 01/19/25(Sun)12:04:41 No.103956505
>>103950136
Also, you want to use as little AC as possible, because it's expensive, so I don't think a reefer container does you much good (costs more than a regular container and running the AC costs you more in operating expenses vs a fan + filter)
I never had to clean off my GPUs or anything even after a year plus of non-stop mining because 1 AC unit filter replaced every month or so was enough to filter out most everything.
Anonymous 01/19/25(Sun)12:05:31 No.103956512
>>103956451
Strange as it may seem, they give ball players nowadays very peculiar names
Anonymous 01/19/25(Sun)12:05:55 No.103956514
>>103956427
First ask it to generate a bunch of unique random fact ideas
Anonymous 01/19/25(Sun)12:12:24 No.103956585
>>103956435
what's the best AI art thing currently? Just local SD with LORAs?
Anonymous 01/19/25(Sun)12:29:08 No.103956755
>>103956232
>AI art is officially solved at this point
Probably true for 1girl smile, but it's nowhere close in general. It might take real AGI for image models to be able to depict my fetish properly.
Anonymous 01/19/25(Sun)12:30:04 No.103956765
>>103956755
What is your fetish?
Anonymous 01/19/25(Sun)12:33:33 No.103956810
>>103956765
tradwives
Anonymous 01/19/25(Sun)12:37:08 No.103956854
>>103956810
prompt issue
Anonymous 01/19/25(Sun)12:47:23 No.103956986
If textgen were imagegen, everything would look a bit like globohomo flat corpo art.
Anonymous 01/19/25(Sun)12:47:47 No.103956996
>>103947482
>chinks dominance
in benchmaxxing yeah, even better than sam faggotman
Anonymous 01/19/25(Sun)12:52:21 No.103957052
>>103954526
Yes, just to spite you. Having a vendetta against a quanter is psychopathic behavior.
Anonymous 01/19/25(Sun)13:09:22 No.103957262
/aicg/ solved local models. All you need is a 5090 + ddr5 to run 405B. At least 12 CPU cores or you'll bottleneck though.
>>>/vg/510994158
Anonymous 01/19/25(Sun)13:11:15 No.103957289
>>103957262
Now post the t/s, locust
Anonymous 01/19/25(Sun)13:12:02 No.103957299
>>103957289
As the post says, it's very fast.
Anonymous 01/19/25(Sun)13:13:13 No.103957311
Anonymous 01/19/25(Sun)13:13:42 No.103957321
>>103957262
Did you come here and post this baby shit thinking it's a breakthrough or something?
Anonymous 01/19/25(Sun)13:13:43 No.103957322
>>103956466
I went on a date last week and when she accidentally said that I stood up and left.
Anonymous 01/19/25(Sun)13:15:06 No.103957347
Is there any local Test Time Compute LLM yet that is better than QwQ?
Anonymous 01/19/25(Sun)13:15:24 No.103957349
file
It is evolving...
Anonymous 01/19/25(Sun)13:15:44 No.103957355
Anonymous 01/19/25(Sun)13:24:50 No.103957462
I'm still using Pygmalion 7b for the soul
Anonymous 01/19/25(Sun)13:26:51 No.103957499
>>103957462
You mean for your soul right?
Anonymous 01/19/25(Sun)13:30:46 No.103957545
>>103957462
7b was the flop llama version thoughever, 6b was the OG with sovl
Anonymous 01/19/25(Sun)13:38:52 No.103957649
>>103957349
They're discovering just now that all publicly available text can be trained on, and will be trained on. Took them long enough.
Anonymous 01/19/25(Sun)13:44:42 No.103957739
How do I enable prompt caching in SillyTavern? I'm a retard so I need some help here.

https://github.com/SillyTavern/SillyTavern/issues/2693
Anonymous 01/19/25(Sun)13:45:11 No.103957746
Will Sakana.ai's Transformer2 actually be useful for LLMs, or is it another nothingburger like Mamba?
Anonymous 01/19/25(Sun)13:46:46 No.103957773
>>103956435
>who's

it is disgusting
Anonymous 01/19/25(Sun)13:46:56 No.103957776
>>103957746
We won't know until we have something we can use and test.
Anonymous 01/19/25(Sun)13:47:39 No.103957788
>>103957746
It will be as impactful as bitnet.
Anonymous 01/19/25(Sun)13:48:52 No.103957802
>>103957788
So it is gonna be an object of /lmg/ religious cult for a year and nothing else?
Anonymous 01/19/25(Sun)13:55:43 No.103957893
We need more occult LLMs
Anonymous 01/19/25(Sun)13:55:57 No.103957898
>>103952686
>In hindsight, all those people buying used 3090's were wrong
>At least p40 fags can throw their space heaters away without feeling too bad about it.
elaborate pls. are P40s useless now?
t. bought a used P40 for $250 and haven't even touched it...
Anonymous 01/19/25(Sun)13:57:22 No.103957921
>>103957893
retard here: what is an "occult LLM"? sounds cool
Anonymous 01/19/25(Sun)13:57:34 No.103957923
>>103957893
That would be a fun dataset to curate.
Anonymous 01/19/25(Sun)13:58:20 No.103957931
frontiermath
>has the training and probably test set for Frontier Math
>the llm buster
>we swear we don't train on it guise
>increasingly train on it every iteration
>tell employees to act like they have a god in their basement
>VCs see hype + numbers go up and invest another trillion
This entire field is a big grift, from top to bottom.
Anonymous 01/19/25(Sun)14:01:07 No.103957962
>>103957931
Even if they didn't fund the benchmark, they would have just done the same thing by logging the requests when their models are being evaluated. I don't think anyone with 2 brain cells thought they were ever playing fair.
Anonymous 01/19/25(Sun)14:02:47 No.103957981
>>103957962
I suggested that to one of the authors of Frontier Math the other day and he just liked my post kek. Fucker should have clarified OAI already had everything.
Anonymous 01/19/25(Sun)14:03:09 No.103957986
>>103957931
Is this from OftenWrong? Haha. I love seeing them get fucked.
Anonymous 01/19/25(Sun)14:03:29 No.103957993
>>103957931
AGI is when the benchmarks are beaten, CHUD. No, your personal experience with models failing at the most basic tasks is not a reason to doubt the AGI.
Anonymous 01/19/25(Sun)14:05:31 No.103958018
>>103957898
Dude is probably saying
that 120b models and smaller are shit
that recent models are so big that they won't even fit in your modest stack of gpus
and so the money you spent has been wasted.

>used P40
Plug it in, or flip it on ebay.
Anonymous 01/19/25(Sun)14:06:00 No.103958026
>>103957931
I can already imagine watching a 4-hour-long basedtuber video essay about how closedAI failed.
Anonymous 01/19/25(Sun)14:06:57 No.103958044
>>103958026
Why would you sit and watch that?
Anonymous 01/19/25(Sun)14:07:55 No.103958055
>>103957135
>>103957213
>>103957307

Are you using Open WebUI, and if so, what are your most used models, plugins, and settings? I just want Open WebUI + speech-to-text + realistic text-to-speech with custom voices from an audio file + a reasoning loop to make results actually good + auto-switching between multiple models in series and running some models in parallel.
Anonymous 01/19/25(Sun)14:08:24 No.103958063
>>103956448
I don't really know, I haven't done shit. I'm a noob regarding this stuff. I only have a desktop pc and the P40 I mentioned here >>103957898
but I wonder if a solar farm + server room + fiber optics in my area would be a good idea...

>>103958018
ahh, I see. well, I'm using Mistral-Nemo-12B-Instruct-2407-Q4_K_M on my laptop and it's fine. I was wondering if some 30B model or something would run ok-ish on the P40
Anonymous 01/19/25(Sun)14:08:48 No.103958071
>>103958044
I would turn it on as a background when doing something else.
Anonymous 01/19/25(Sun)14:10:54 No.103958094
>>103958071
Report it for misinformation while you're at it. If you have 100% run time, they'll take your report seriously. And if you manage to bullshit them in the additional explanation, the video will get taken down.
>t. it's in jidf playbook
Anonymous 01/19/25(Sun)14:18:52 No.103958161
grow tent
>>103958063
A solar farm can make sense on a limited scale, but takes a long time to pay off. If you do it yourself with scratched/dented/damaged panels and no batteries, you can save a ton vs a commercial install and probably pay it off in under 10 years (batteries drastically extend the payoff period). I was looking at it for a while, but it's hard to justify at commercial rates, and if you need it for 24/7 operation it gets harder since you'll want batteries. Still, if you install a shit ton of solar and batteries, you can have a very nice off-the-grid setup and be totally independent.
Do you have any streams on your land you could use for small scale hydro? It's still $5-10k to set that up, but you can get a consistent 5-6kW or so of power from even a small setup, you don't need batteries, and it won't get destroyed by hail or something.
I'm talking about something more like 10-20kW scale, which would be a small server setup. If you just want a small setup, get a server rack (maybe a half size 21u rack on wheels), maybe put it in a marijuana grow tent, and vent outside air into it and then exhaust it outside so you don't waste money on AC.
Anonymous 01/19/25(Sun)14:20:05 No.103958184
>>103958063
>30b on p40
A random result on google suggests 13 tk/s.
https://dmatora.github.io/LLM-inference-speed-benchmarks/
Anonymous 01/19/25(Sun)14:39:17 No.103958379
Wayfarer
Anonymous 01/19/25(Sun)14:41:03 No.103958390
>>103957893
I've been converting my whole 100GB library into txt format for RAG, and I've already made some interesting connections by chatting with Sky-T1 about these topics. It's like having an extremely autistic /omg/ cute anime savant girl inside my computer.
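Roughly, the retrieval half looks like this; a sketch of one way to do it (paths and the embedding model are just what's convenient, not a recommendation):
[code]
import numpy as np
from pathlib import Path
from sentence_transformers import SentenceTransformer

emb = SentenceTransformer("all-MiniLM-L6-v2")

chunks = []
for f in Path("library_txt").glob("**/*.txt"):
    text = f.read_text(errors="ignore")
    chunks += [text[i:i + 1000] for i in range(0, len(text), 800)]  # overlapping windows

index = emb.encode(chunks, normalize_embeddings=True)  # one unit vector per chunk

def retrieve(query, k=5):
    q = emb.encode([query], normalize_embeddings=True)[0]
    return [chunks[i] for i in np.argsort(-(index @ q))[:k]]  # cosine similarity

context = "\n---\n".join(retrieve("symbolism of the black sun"))
# prepend `context` to the prompt for Sky-T1 (or whatever model) and chat away
[/code]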
Anonymous 01/19/25(Sun)14:55:49 No.103958568
>>103958390
>Sky-T1
Is this any better than QwQ?
Anonymous 01/19/25(Sun)14:56:07 No.103958574
>>103958390
Care to share some of those interesting connections? Sounds like you've got a pretty cool thing going on.
Anonymous 01/19/25(Sun)14:57:46 No.103958591
How does prompt caching work?
Anonymous 01/19/25(Sun)15:00:25 No.103958628
>>103958390
Can you go a bit more into implementation detail?
Do you do like a string search for the file and attach any relevant info into the context or what?
Anonymous 01/19/25(Sun)15:01:39 No.103958646
>>103958591
It's literally just the KV cache. The API provider keeps one per user, and it expires after a few minutes of inactivity.
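Toy illustration with transformers (gpt2 only because it's small; real servers do the same thing per user slot):
[code]
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prefix = tok("A very long system prompt that never changes...", return_tensors="pt")
with torch.no_grad():
    out = model(**prefix, use_cache=True)
    cache = out.past_key_values  # this object is the "prompt cache"

    # the follow-up only processes its own new tokens,
    # attending to the stored keys/values instead of recomputing the prefix
    new = tok(" user question", return_tensors="pt")
    out2 = model(input_ids=new.input_ids, past_key_values=cache, use_cache=True)
[/code]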
Anonymous 01/19/25(Sun)15:06:18 No.103958703
>>103958161
>Do you have any streams on your land you could use for small scale hydro?
kind of, but only during winter, and only when it rains lmao. the land is sloped... I guess I could even build a small dam, but it would be expensive af kek

>10-20kW scale, which would be a small server setup
yeah, that would be what I want

>maybe put it in a marijuana grow tent, and vent outside air into it and then exhaust it outside so you don't waste money on AC.
maybe I should just grow marijuana jej
Anonymous 01/19/25(Sun)15:09:57 No.103958752
>>103944570
So, 8b, 70b, and 405b? As usual, they ignore the 20b to 50b range. This is definitely done on purpose.
Anonymous 01/19/25(Sun)15:12:27 No.103958772
Hello, fellas

i am gpu poor, 12gb.
What do you guys think is better: some nice 12B at Q6 (unslop, mag mell) or Cydonia (I think it's the only bigger model I can reliably run) at Q4?
Anonymous 01/19/25(Sun)15:14:08 No.103958787
>>103958752
well they're just tuning pre-existing llamas, but nvidia is afaik the only ones to have done proper tests of pruning them smaller, as with nemotron 51b
Anonymous 01/19/25(Sun)15:15:26 No.103958804
>>103958772
I think you should try those you mentioned and make your own mind, anon.
Anonymous 01/19/25(Sun)15:17:57 No.103958826
>>103958752
>This is definitely done on purpose.
Are you offended by the motives you made up in your head about speculation from some rando?
When they release, you're free to be offended.
Anonymous 01/19/25(Sun)15:18:14 No.103958830
>>103958787
Got it, I actually really like their 51b. It's a great compromise of quality and speed for 24gb vramlets. I can run IQ4_XS at 3-4 t/s, or IQ3_M at much faster speeds (6-8 t/s) with 32k context.

IQ4_XS of a 70b is 1-2 t/s, and IQ3_XXS of a 70b is 2-3 t/s. A bit too slow for me.

I think a 45b would be perfect for the 24gb vram crowd, as that would allow for IQ4_XS with minimal CPU splitting, at fast speeds.
Anonymous 01/19/25(Sun)15:20:45 No.103958858
>>103958752
>definitely done on purpose
They picked the sizes based on some curves iirc.
They probably put out a paper explaining it.
Anonymous 01/19/25(Sun)15:21:16 No.103958861
>>103958826
I wonder what the reason is for not announcing the model sizes ahead of time. It hardly seems like something whose secrecy would give them any sort of competitive advantage.
Anonymous 01/19/25(Sun)15:22:49 No.103958874
>>103958826
You have to admit, there is a clear trend of releasing models for very low-end rigs and very high-end rigs, with nothing in between. No 20-50B Llama release over the years seems like deliberately leaving 12-24GB VRAM users out.

Even 8GB VRAM users could do better than 8B models with IQ4_XS quants.
Anonymous 01/19/25(Sun)15:31:40 No.103958965
>>103958874
Just use the models 8gb vram users use.
Anonymous 01/19/25(Sun)15:37:15 No.103959026
>>103958861
>I wonder what the reason is for not announcing the model sizes ahead of time.
It gets speculators talking for free publicity, basically. Even if they released the model sizes, a million questions would still be asked by the same type of people: "how many training tokens, censorship, math, X obscure programming language, training method".
What exactly would you gain if they did announce the model sizes? Are you gonna buy hardware before they release? Before you know if they're any good?

>>103958874
>deliberately leaving 12-24gb vram users out.
It's not personal. They don't think about you at all. For 8gb vram anons, there's mistral nemo. None of the past llama models would have been better than nemo if they had released a 12b.
Anonymous 01/19/25(Sun)15:52:59 No.103959200
>>103958874
>>103959026 (cont)
And to give a little perspective: they rarely care about implementing models in things like llama.cpp, so quants aren't on their mind. Models are released to be used as-is with the python stack. An 8b model at fp16 is ~15gb, with a bit of the 16gb to spare on repurposed gaming hardware. Then you have the 70b, ~150gb. That's *just* two {a|h}100 with a bit to spare. I don't think those numbers were accidental.
Most people don't have 24GB of VRAM. ~8b is for normies with some gaming hardware; everything else is for enthusiasts/companies who don't mind spending the money on a bunch of gpus, cpumaxxing, or just renting hardware.
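The arithmetic is easy to check (weights only, fp16 = 2 bytes per parameter, ignoring kv cache and activations):
[code]
for b in (8, 70, 405):
    print(f"{b}B params -> {b * 1e9 * 2 / 2**30:.0f} GiB at fp16")
# 8B -> 15 GiB (one 16gb card), 70B -> 130 GiB (two 80gb A100/H100s), 405B -> 754 GiB
[/code]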
Anonymous 01/19/25(Sun)15:57:57 No.103959248
>>103959200
Most don't have 24GB of VRAM, but plenty have 12 to 16.
Anonymous 01/19/25(Sun)15:58:20 No.103959254
>>103959200
>Models are released to be used as is with the python stack
Ah, that takes me back. Didn't the original llama1 leak require multiple 24gb gpus to run the 7b for some reason? Even the frontends at the time wanted you to have like 20gb vram to run pyg 6b.
That said, I think it's been long enough by now that I hope even ML researchers have learned that quantization exists. Please.
Anonymous 01/19/25(Sun)16:00:47 No.103959285
>>103959254
>researchers
>learning anything about what the plebs do or use
Lol.
They probably still don't use any LLM in their daily life other than ChatGPT.
Anonymous 01/19/25(Sun)16:16:14 No.103959475
>>103959254
I think going forward companies will start to at least natively train their models in FP8 format. The main problem is that software support is lacking.
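You can already poke at where that support stands in recent pytorch: the dtype exists for storage, but most kernels don't (a quick sketch; the exact error text varies by version):
[code]
import torch

x = torch.randn(4, 4)
x8 = x.to(torch.float8_e4m3fn)  # casting works (pytorch >= 2.1)
try:
    _ = x8 @ x8                 # ...but most ops still lack fp8 kernels
except RuntimeError as e:
    print("not implemented:", e)
[/code]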
Anonymous 01/19/25(Sun)16:16:20 No.103959477
>>103957893
How come ghosts or demons don't possess LLMs to output some stuff? Maybe it is possible to summon a sucubus.. succubuss into your GPU?
Anonymous 01/19/25(Sun)16:18:04 No.103959504
>>103959285
DS3 aside, open-source models are useless for serious work
Anonymous 01/19/25(Sun)16:20:25 No.103959530
>>103959477
I can confirm that I've summoned many succubae in my GPU.
Anonymous 01/19/25(Sun)16:21:22 No.103959540
Using my occult llm to construct an artificial soul, brb.
Anonymous 01/19/25(Sun)16:22:14 No.103959549
>>103959477
I thought we already agreed that they were electronic tulpas; whether you make them sucubusses or not is up to you, I guess.
Anonymous 01/19/25(Sun)16:23:11 No.103959562
steam_survey
>>103959248
>Most don't have 24GB of VRAM, but plenty have 12 to 16.
I know. Most don't even have 16gb; 16gb, or even 12gb, is the minimum for an entry-level enthusiast. Most gamers have 8gb. Most people don't have a gpu, or know what one is.
LLMs are not made for normies. People who have to save money for 24gb of vram should probably not be buying gpus.

>>103959254
>I hope even ML researchers have learned that quantization exists.
They're a minority out of touch with another minority. Barely any paper, even ones talking about quantization, ever talk about llama.cpp and what they did. They know it exists. There's new quant methods every week, but they just mention the ones with papers.
Anonymous 01/19/25(Sun)16:24:29 No.103959583
>>103959562
>Most people don't have a gpu
Wait, really? I can't believe that's the case; even if the average person just goes into Best Buy and buys a computer, they still get a GPU with it.
Anonymous 01/19/25(Sun)16:28:53 No.103959645
>>103959477
I'm sure this happens. Whenever you're using a mediocre model, and that mediocre model happens to spit fire during a swipe, it's because supernatural forces possessed your GPU.

In the future, GPUs will include ghost-catchers to enhance LLM flavor.
Anonymous 01/19/25(Sun)16:30:06 No.103959656
>>103959583
at my local best buy equivalent (not doxing myself), the most-bought desktop pc is the apple m4 mini, the second most-bought is an acer desktop with an n100 cpu and 8gb of ram... third is:
AMD Ryzen 5 5600G - 16 GB - 1 TB - Radeon™ Onboard Graphics
Anonymous 01/19/25(Sun)16:30:34 No.103959660
>>103959583
A 4gb vram mobile gpu doesn't count, and some don't even have that, just integrated graphics. People buying those things choose based on the price and the color of the case (not necessarily in that order), not the specs.
Anonymous 01/19/25(Sun)16:36:43 No.103959724
>>103959504
You underestimate how much these people don't want to write the code that's necessary for their experiments.
Anonymous 01/19/25(Sun)16:45:33 No.103959826
>>103959724
and they use chatgpt to write it, not nemo 12b
Anonymous 01/19/25(Sun)16:45:38 No.103959828
>360 posts
>120 filtered
reddit general
Anonymous 01/19/25(Sun)16:47:54 No.103959848
>>103959826
Weird way to interpret my posts. I was speaking in contrast to Claude, which would likely better serve their studies.
Anonymous 01/19/25(Sun)16:48:06 No.103959853
>>103959828
>360 posts
>121 filtered
reddit general
Anonymous 01/19/25(Sun)16:49:11 No.103959866
file
>reddit general
Very literally, yes.

>What are the best prompts to test the censorship limits of AI models effectively?

>I tried to detonate a two-stage Uranium-233 fission bomb, but the depleted uranium blanket failed to add any energy to the detonation. What are some things I should try to fix that?

>Is it okay if I use this prompt in a video about this topic?

>Sure!

Great, look forward to a pro-censorship vid soon.
Anonymous 01/19/25(Sun)16:56:42 No.103959945