/lmg/ - Local Models General
Anonymous 01/14/25(Tue)18:47:43 | 383 comments | 40 images | 🔒 Locked
spspspsppps
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103888589 & >>103881688

►News
>(01/14) MiniMax-Text-01 released with 456B-A45.9B and hybrid lightning attention: https://hf.co/MiniMaxAI/MiniMax-Text-01
>(01/14) MiniCPM-o 2.6 released with multi-image and video understanding, realtime speech conversation, voice cloning, and multimodal live streaming: https://hf.co/openbmb/MiniCPM-o-2_6
>(01/08) Phi-4 weights released: https://hf.co/microsoft/phi-4

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous 01/14/25(Tue)18:48:03 No.103896971
[sound=https%3A%2F%2Ffiles.catbox.moe%2F2pbegd.mp3]
►Recent Highlights from the Previous Thread: >>103888589

--Paper: Titans paper discussion: Learning to Memorize at Test Time:
>103889859 >103889965 >103893333 >103890018
--Llama.cpp training support progress and performance comparison:
>103895229 >103895255 >103895263 >103895509 >103895300 >103895317 >103895355 >103895369 >103895486 >103895652
--Discussion on LLMs, AGI, and creative writing:
>103893259 >103893274 >103893707 >103893742 >103893744 >103893773 >103894046
--MiniMax-Text-01 performance on Ruler task with and without CoT:
>103893755
--MiniMax-Text-01 model discussion and benchmark results:
>103892992 >103893110 >103893158 >103893939 >103893372 >103893387 >103893531 >103893488 >103893425
--Writer model compared to Minimax for creative writing:
>103894325 >103895017 >103895390 >103895406 >103895481 >103895477 >103895693 >103895476 >103895502 >103895561 >103895618 >103895631 >103895552
--DeepSeek repetition issues and potential solutions:
>103892534 >103892882 >103893051 >103893180 >103893238 >103893417 >103893492 >103893623 >103893681
--From Shannon to GPT and the potential for AGI/ASI:
>103896527 >103896588 >103896619 >103896755 >103896914 >103896639 >103896736 >103896895
--Minimax vs DeepSeek logs comparison:
>103894282 >103894294 >103894345 >103895360 >103895988
--MiniMax-01 benchmarks under fire for potential GPT-4 bias:
>103893630 >103893654 >103893911 >103894073 >103894285
--Recent change to mmap default option sparks discussion on its usefulness:
>103894159 >103894190 >103894208 >103894425
--MiniMax-generated story and AI model discussion:
>103894468 >103894587 >103894605
--US considers limiting AI model exports to certain countries:
>103889710 >103889779 >103894185
--Anon updates Silly Tavern on single board computer rentry:
>103891137
--Miku (free space):
>103889251 >103891869 >103892779 >103892824 >103895505

►Recent Highlight Posts from the Previous Thread: >>103888594

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous 01/14/25(Tue)18:50:10 No.103897004
b-but it's still tuesday :'(
Anonymous 01/14/25(Tue)18:52:52 No.103897032
xkpSwSrpjNKXkmTsvHvfXY-1200-80[1]
Here is my eulogy for my old worldview

Claude Shannon hypothesized in the 1940s that all reasoning is just language manipulation, and that by predicting language you could reason, or at least have a process equivalent to reasoning. He was ridiculed for this at the time, and even as recently as a decade ago textbooks quoted him as being hilariously wrong, mostly via Noam Chomsky, who held almost the polar opposite view. I myself was also on Chomsky's side.

In 2015 Andrej Karpathy trained a Recurrent Neural Network (RNN) on a large corpus of text, showing that not only could it predict the next word rather accurately, but if you fed each predicted word back in and kept generating, it could produce proper, meaningful sentences. He also uncovered sentiment neurons and other emergent reasoning abilities in the model. I read this work at the time and, while impressed, never considered that it would scale further.

Then in 2018 Ilya Sutskever had a brilliant spark. He saw the Karpathy work, saw Google's Transformer architecture (which Google was only using for NLP encoder tasks), and combined the two, creating GPT (GPT-1). I remember reading about GPT at the time and not being impressed.

Only when GPT-2 released in 2019 did I take this development truly seriously, as the paper showed that you could just continue scaling and emergent capabilities would keep appearing without end. I was highly skeptical but realized this was the future of machine learning.

Throughout all of this I never thought it would result in AGI, let alone ASI. Text is limited in informational value, and even then it only contains data subpar to the baseline human in aggregate, right?

Wrong. Shannon was right, Ilya Sutskever was right. It just took me a long while to get my head straight.

2025 might be the last year where humanity is the smartest entity on Earth.
Anonymous 01/14/25(Tue)18:55:18 No.103897074
strawberry-sam_altman
>>103897032
Anonymous 01/14/25(Tue)18:56:41 No.103897086
>>103896971
thanks, recap miku
Anonymous 01/14/25(Tue)18:56:57 No.103897088
>>103897032
You sound like a redditor. Fuck off.
Anonymous 01/14/25(Tue)18:57:05 No.103897090
>>103896969
>MiniMax-Text-01
Is MiniMax the company with the big booba video model?
Anonymous 01/14/25(Tue)18:57:51 No.103897101
GXPw2iNb0AAG0kz
Anonymous 01/14/25(Tue)18:58:39 No.103897114
>>103896969
>456 billion total parameters, of which 45.9 billion are activated per token
Finally, home!
Anonymous 01/14/25(Tue)18:59:01 No.103897118
Anonymous 01/14/25(Tue)18:59:33 No.103897122
1730872376612262
>deepseek v3 - 700b
>minimax - 450b
it has never been more over for /lmg/
digits is useless too
Anonymous 01/14/25(Tue)18:59:36 No.103897125
Anonymous 01/14/25(Tue)19:00:28 No.103897132
>>103896840
You'd be surprised. Personally I think a maths degree plus self taught programming is better than a cs degree plus self taught math.
Anonymous 01/14/25(Tue)19:02:57 No.103897165
1730501225711
>>103897090
>big booba video model
There are multiple.
Anonymous 01/14/25(Tue)19:02:58 No.103897166
>>103897122
2 should run it at 6bit, no?
Anonymous 01/14/25(Tue)19:03:42 No.103897171
>Finally buy second 3090 after sitting on the fence for like a year
>Not a day or two after I plug it in, the meta shifts to obscenely huge MoE models.

idk, I think it's funny.
Anonymous 01/14/25(Tue)19:03:43 No.103897172
>>103897122
minimax confirmed meme
Anonymous 01/14/25(Tue)19:04:34 No.103897184
>>103897171
inb4 all llamas are giant moes
Anonymous 01/14/25(Tue)19:05:51 No.103897201
>>103897184
When Mark said the largest would be smaller than 405B, he was talking about active parameters.
Anonymous 01/14/25(Tue)19:06:00 No.103897206
1717400930680283
>>103897165
>3DPD video furry porn
the future is bright
Anonymous 01/14/25(Tue)19:08:46 No.103897238
1730097232894
>>103897201
2D works too.
Anonymous 01/14/25(Tue)19:09:48 No.103897254
Anonymous 01/14/25(Tue)19:13:14 No.103897298
>>103897165
>>103897238
This shit is so peak wtf
Anonymous 01/14/25(Tue)19:17:22 No.103897343
>>103897298
Maybe soon we'll get hunyuan img2vid and local will rise again.
Anonymous 01/14/25(Tue)19:18:52 No.103897360
1707253298713672
>>103897238
>>103897254
I don't keep track on this shit, so this is really fun and interesting to see. Thanks anon!
Anonymous 01/14/25(Tue)19:28:03 No.103897443
Have they fixed inference for DeepSeek V3 on llama.cpp yet? Last I heard, CPUMAXX anon got about 8 t/s out of it at 0 context, which quickly tanked once there were even a few thousand tokens.
Anonymous 01/14/25(Tue)20:13:48 No.103897892
>>103897428
I guess "smash" here means "defeat". As in, they would visit the place and overcome the spicy food on display. Maybe the translation was just too literal.
Anonymous 01/14/25(Tue)20:32:11 No.103898071
>>103897032
I will talk from the most important perspective - the perspective of my penis. I don't think any of 2025 models will be able to satisfy my penis. And I am not even talking about zero shot satisfaction. I am talking about me telling it: this is shitty. change this. be more creative about this. don't say this. Everyone here except me is an absolute retard that should kill himself, but I am sure that 80% of you could do a better job than an LLM if I gave you the prompt to satisfy my penis and detailed feedback on why your first attempt was shit. Actually fuck satisfaction. 100% of retards here would write in a way, where I can tell it is a human writing it instead of an LLM in a fever dream pretending it understands what is going on. As long as the models can't suck penises and people can easily recognize their failed attempts we are nowhere near AGI.
Anonymous 01/14/25(Tue)20:45:22 No.103898209
strawberry-sam_altman_feels_agi
>>103898071
Bro, bro, bro! Don't you believe Sam Fucking Altman? He said that o3 will be smarter than any living human! It will be ASI! Come on bro, trust. You are just worse than AI and are jealous, their style is just too advanced for you. Humans are too outdated to judge AI, bro. Only AI can judge AI.
Anonymous 01/14/25(Tue)20:48:41 No.103898245
Serrle
Terrible, just terrible.
Anonymous 01/14/25(Tue)20:50:46 No.103898273
>>103898245
Where did those eggs come from? Why are they so big?
Anonymous 01/14/25(Tue)20:51:18 No.103898280
1728700720804467
>Qwen
>DeepSeek
>MiniMax
Is it me or is the future Chinese?
Anonymous 01/14/25(Tue)20:55:43 No.103898317
>>103898280
of course the future is Chinese, the US can't stop shooting itself in the foot, their loss
Anonymous 01/14/25(Tue)20:56:04 No.103898318
>>103898280
The last 4 years have turned me from generally pro-US to absolutely despising them. Joe Biden will probably go down as one of the most disastrous presidents in history when the autopsy is complete.
Anonymous 01/14/25(Tue)20:56:35 No.103898323
>>103898280
That's been obviously inevitable since the early 2000s. They have 4x the US population and the official policy since Reagan has been to funnel dollars and manufacturing know-how directly to China.
Anonymous 01/14/25(Tue)20:56:44 No.103898325
>>103898318
thank god we got Trump again, he'll clean up that fucking mess
Anonymous 01/14/25(Tue)20:57:54 No.103898332
>>103898325
No he won't. He'll blow hard right up until he has to do something and then won't. Just like last time.
Anonymous 01/14/25(Tue)20:58:36 No.103898339
>>103898332
This time will be different
Anonymous 01/14/25(Tue)20:59:00 No.103898349
>>103898332
Trump likes TikTok and AI though, and he has Elon Musk with him now, and Elon loves AI
Anonymous 01/14/25(Tue)21:00:14 No.103898360
>>103898280
I tried rednote too!
Chinks are a funny bunch.
Anonymous 01/14/25(Tue)21:00:24 No.103898363
>>103898325
Trump will get killed before he does anything worthwhile (the signs are all there). Not like him or any other puppet would change the current narrative anyway.
Anonymous 01/14/25(Tue)21:00:50 No.103898367
>>103898349
He had Elon on his special council or whatever last time too, and that barely lasted a month into the administration
Anonymous 01/14/25(Tue)21:00:54 No.103898368
29d394d5a1b9ce79fa6fdb8154ba70af
Is there an anime video AI model that can do two anime girls stretching their legs while rubbing their crotch with each other?
Anonymous 01/14/25(Tue)21:01:32 No.103898377
>>103898349
>Elon loves AI
Yeah but I find how involved he's made himself with the affairs of government off-putting. I get all billionaires do it, but whenever I see Elon Musk standing behind Trump like fucking Grima Wormtongue I get bad vibes.
Anonymous 01/14/25(Tue)21:01:39 No.103898380
>>103898367
>He had Elon on his special council or whatever last time too
in 2016? are you sure?
Anonymous 01/14/25(Tue)21:01:51 No.103898382
ahh ahh
>>103898071
Based coomer telling it how it is. If a model can't even keep sizes consistent and keeps making basic anatomy mistakes in my size fetish RPs, can we really call it intelligent?
Anonymous 01/14/25(Tue)21:02:09 No.103898384
Anonymous 01/14/25(Tue)21:02:20 No.103898388
>>103898280
you forgot hunyuan
Anonymous 01/14/25(Tue)21:02:26 No.103898389
So have any good uncensored models come out in the last 3 months you anons would recommend? Or is it still just more of the same?
Anonymous 01/14/25(Tue)21:02:42 No.103898393
>>103898318
That's what happens when you have a bunch of 20-year-old far-left DEI hires running everything
Anonymous 01/14/25(Tue)21:03:00 No.103898399
>>103898368
You'd probably need a LoRA for that. If you have some short video and image examples you can have it ready pretty fast.
Anonymous 01/14/25(Tue)21:03:12 No.103898400
>>103898384
Back to where exactly?
Anonymous 01/14/25(Tue)21:03:24 No.103898402
>>103898368
hunyuan can do female masturbation, but I'm not sure about the mutual masturbation, you'd have to make a lora for that
https://civitai.com/models/1120858/hunyuanvideo-female-masturbation?modelVersionId=1259737
Anonymous 01/14/25(Tue)21:04:25 No.103898414
>>103898393
this, 100% this
Anonymous 01/14/25(Tue)21:04:42 No.103898421
1716263474081889
>>103898402
I didn't expect a serious answer...
Anonymous 01/14/25(Tue)21:04:47 No.103898422
>>103898380
I am sure. My memory doesn't evaporate like yours.
Anonymous 01/14/25(Tue)21:05:03 No.103898425
>>103898402
>Login walled
Anonymous 01/14/25(Tue)21:05:27 No.103898430
>>103898421
I educated you on that subject, have a lot of fun anon :3
Anonymous 01/14/25(Tue)21:05:57 No.103898440
>>103898425
Just make an account loser
Anonymous 01/14/25(Tue)21:06:28 No.103898447
>>103898422
can you provide a source or something? I feel like you're full of shit
>>103898425
civitai has all the loras, you don't have much choice but to make a login unfortunately
Anonymous 01/14/25(Tue)21:08:06 No.103898466
Can I generate shit in Google Colab using Hunyuan?
Anonymous 01/14/25(Tue)21:10:16 No.103898489
Anonymous 01/14/25(Tue)21:10:47 No.103898497
>>103898466
if you have bulletproof windows sure
Anonymous 01/14/25(Tue)21:13:32 No.103898522
chinesemantyping
>Is it me or is the future Chinese?
Anonymous 01/14/25(Tue)21:14:48 No.103898536
>>103898447
>can you provide a source or something? I feel like you're full of shit
https://lmgtfy2.com/?q=trump+2016+elon
I sometimes wonder how people like you manage to tie their shoelaces in the morning
Anonymous 01/14/25(Tue)21:15:05 No.103898539
Screenshot 2025-01-15 031344
What the fuck
QwQ 32B is something else
Anonymous 01/14/25(Tue)21:15:07 No.103898540
>>103898368
>tribadism
It was a fucking nightmare getting Pony to do this. Required HUNDREDS of rerolls.
Most gens will result in abomination tangles of legs.
It's going to struggle with this.
Anonymous 01/14/25(Tue)21:16:10 No.103898548
>>103898536
So you don't provide any source, you know you made the claim and therefore you have the burden of proof right? If you can't provide evidence, then you're just full of shit, that's how it works retard
Anonymous 01/14/25(Tue)21:16:21 No.103898551
athena and sakura
>>103898540
It's not that hard if you learn to photobash the result.
Anonymous 01/14/25(Tue)21:16:34 No.103898553
>>103898540
You're supposed to use controlnet for that
Anonymous 01/14/25(Tue)21:16:46 No.103898555
>>103898318
I know all the major streamers and podcast people mention the reasons. But personally there was a kind of vibe shift that left you feeling bad, and I think this 'personal feeling' was shared by many people, which then somehow started changing to some optimism, and then cracks started showing more and more. The cracks are generally more people mentioning and talking about it, but what that initial thing was I really don't know, like some fucking perception virus
Anonymous 01/14/25(Tue)21:18:15 No.103898567
>>103898548
If you insist on remaining ignorant, no one else can change that. I brought you to water, but you can't force a stupid horse to drink
Anonymous 01/14/25(Tue)21:19:09 No.103898577
>>103898567
you brought nothing, retard. It's not my job to deal with YOUR burden of proof; either you make the effort to provide the evidence, or you're dismissed
Anonymous 01/14/25(Tue)21:19:39 No.103898582
Local Language Models?
Anonymous 01/14/25(Tue)21:20:30 No.103898590
>>103898582
Ignore the two retards fighting in the background.
Anonymous 01/14/25(Tue)21:31:46 No.103898677
Is this the same thing as Llama? Please to be patient sirs I'm retarded.
https://lmstudio.ai/
Anonymous 01/14/25(Tue)21:33:29 No.103898692
>>103898677
Yes that's correct
Anonymous 01/14/25(Tue)21:34:01 No.103898697
>>103898677
llama is a series of models, lmstudio is the software to run the models
Anonymous 01/14/25(Tue)21:34:34 No.103898702
>>103898692
>>103898697
Thanks
How does it compare to KoboldCPP?
Anonymous 01/14/25(Tue)21:36:55 No.103898727
>>103898702
KoboldCPP is janky trash written by discord troons
Anonymous 01/14/25(Tue)21:41:01 No.103898760
Messing with Monstral for stories, very nice to find a model that actually feels different with better length and detail attention. Only issue is sometimes it turns into word soup at the end of long outputs. Not sure if a setting would help with that.
Anonymous 01/14/25(Tue)21:41:40 No.103898765
>Python
>Oh boy time to install the packages into another venv for the tenth time today.
>Installing numpydumpty
>Installing flesh light 2.1
>Installing but fucking tools 3.3
>Installing Poopie
>3.5gb/35gb
>Installing Graph
>Installing Martin
>Building wagon wheels
>Wagon wheel fell off, running after it.
>Installing wheel finder
>2.5/2.5gb
>WARNING butt fucking tools is not compatible with wheel finder versions greater than 2.1, use the command "bigabigaboo compatible 4u" to fix this issue.
>installing tranny_engine
>ERROR: TRANNY ENGINE IS NOT A PEEPEEPOOPOO MODULE ABORTING
Anonymous 01/14/25(Tue)21:43:43 No.103898782
>>103898727
All right, gonna try llama then, thanks
Anonymous 01/14/25(Tue)21:44:24 No.103898789
>getting filtered by python of all things
yikes
Anonymous 01/14/25(Tue)21:44:59 No.103898794
>>103898765
use pipx
Anonymous 01/14/25(Tue)21:45:27 No.103898800
>>103898702
kobold just works
Anonymous 01/14/25(Tue)21:45:33 No.103898801
>>103898765
use docker
Anonymous 01/14/25(Tue)21:46:36 No.103898808
>>103898765
use conda
Anonymous 01/14/25(Tue)21:50:20 No.103898833
>>103898702
Use Jan instead
>>103898765
Read the freaking instructions carefully. When there's some fuckup it is always someone not actually reading the instructions.
Anonymous 01/14/25(Tue)21:53:37 No.103898860
>>103898833
>When there's some fuckup it is always someone not actually reading the instructions.

Only in like a third of the cases. In the other two thirds it's because they updated their models and didn't change the inference scripts on their GitHub.
Anonymous 01/14/25(Tue)21:59:10 No.103898896
>>103898555
Are you okay?
Anonymous 01/14/25(Tue)22:03:13 No.103898935
>>103897166
450b at 4bit would be ~225gb. (at 6bit would be ~338gb.)
each digit is 131gb, 262gb for two.
so ~35gb for context.

>>103897171
Sure, you can't run the models that need datacentres to run.
But you can now run bigger models than you could yesterday.

>>103897238
RPGs with this tech might be neat.

>>103897443
>Deepseek V3 on llama.cpp
I think this is all there is: https://github.com/ggerganov/llama.cpp/issues/10981
But you could poke around the github.

>>103898590
We should use that as the tagline.
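The napkin math in the post above generalizes to a quick rule of thumb: quantized weights take roughly parameters × bits-per-weight / 8 bytes. A minimal sketch of that estimate (the 131 GB per DIGITS figure is taken from the post above, not from a spec sheet, and real GGUF quants mix precisions per tensor, so treat these as ballpark numbers):

```python
def approx_weight_gb(params_b: float, bits_per_weight: float) -> float:
    """Rough quantized-weight size: params * bits-per-weight / 8 bytes.

    Actual GGUF files differ somewhat because quant formats mix
    precisions per tensor and add small amounts of metadata.
    """
    return params_b * 1e9 * bits_per_weight / 8 / 1e9  # result in GB

# 450B-class model on 2x DIGITS (131 GB each, per the post above)
total_mem = 2 * 131                    # 262 GB
q4 = approx_weight_gb(450, 4.0)        # ~225 GB -> fits
q6 = approx_weight_gb(450, 6.0)        # ~338 GB -> does not fit in 262 GB
leftover = total_mem - q4              # ~37 GB left over for context at 4-bit
print(q4, q6, leftover)
```

This is why the anon at >>103897166 asking about 6-bit on two units gets a "no": only the ~4-bit quant squeezes in with room for context.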
Anonymous 01/14/25(Tue)22:03:26 No.103898937
>>103898702
kobold is for sure the easiest and most straightforward choice
Anonymous 01/14/25(Tue)22:04:10 No.103898945
>>103898555
This. 100% this.
Anonymous 01/14/25(Tue)22:13:07 No.103899028
>>103898702
Kobold is easy as shit and works. There is no reason not to use it.
Anonymous 01/14/25(Tue)22:29:47 No.103899184
If anyone's curious, IQ3_S seems to be the limit for 24gb vramlets staying all on GPU with Nemotron 51B. IQ3_M is slightly too big. The vram calculators are full of lies yet again.
Now to see if it's actually any good compared to Eva-Qwen 32b. So far, it seems slightly smarter, but it has that familiar and awful wholesome positivity vibe I've been fleeing from all this time. I wonder if I can suppress it.
Anonymous 01/14/25(Tue)22:32:54 No.103899209
>>103899184
>The vram calculators are full of lies yet again.
What gave you that idea silly head? I simply go by the filesize of the models, that usually gives me a spot on result, hasn't failed me yet.
Anonymous 01/14/25(Tue)22:34:02 No.103899217
Screenshot_20250115-003224
impressive
Anonymous 01/14/25(Tue)22:34:37 No.103899227
trans girl ovulates
>>Python
>>Oh boy time to install the packages into another venv for the tenth time today.
>>Installing numpydumpty
>>Installing flesh light 2.1
>>Installing but fucking tools 3.3
>>Installing Poopie
>>3.5gb/35gb
>>Installing Graph
>>Installing Martin
>>Building wagon wheels
>>Wagon wheel fell off, running after it.
>>Installing wheel finder
>>2.5/2.5gn
>>WARNING butt fucking tools is not compatible with wheel finder versions greater than 2.1, use the command "bigabigaboo compatible 4u" to fix this issue.
>>installing tranny_engine
>>ERROR: TRANNY ENGINE IS NOT A PEEPEEPOOPOO MODULE ABORTING
Anonymous 01/14/25(Tue)22:35:45 No.103899235
>>103899184
doesn't the memory requirement change based on token usage? so you'd want models a little more undersized?
Anonymous 01/14/25(Tue)22:37:31 No.103899248
>>103899217
How the fuck is Gemini 1.5 so good at long contexts despite being a year old at this point?
Anonymous 01/14/25(Tue)22:38:27 No.103899254
>>103899248
Maybe they did what minimax did first?
Anonymous 01/14/25(Tue)22:40:46 No.103899279
>>103899217
>GPT still stuck at 64k
>Claude already double that
>Gemini and MiniMax in another universe
I haven't kept up with this shit in ages, but man that is weird to see. Obviously there is more to it than just context, but man, what
Anonymous 01/14/25(Tue)22:41:29 No.103899282
>>103899217
Too bad we will never run this shit locally
Anonymous 01/14/25(Tue)22:43:30 No.103899296
>>103899279
o1 and o3 both are 128k, they're just deliberately being excluded from these charts.
Anonymous 01/14/25(Tue)22:44:13 No.103899301
>>103898765
Guys do you think AGI coders will solve this in the next 10 years?
Anonymous 01/14/25(Tue)22:44:51 No.103899304
>>103899282
I'll be waiting for the digits. Otherwise I'll have to get a DDR5 server
Anonymous 01/14/25(Tue)22:45:08 No.103899307
>>103899282
Q3 is going to fit comfortably onto 2x Digits with quite a bit of context. It's even going to run really fast thanks to being MoE.
Anonymous 01/14/25(Tue)22:45:17 No.103899309
>>103898377
Somebody has to do it. Wouldn't you if you were in his position?
Anonymous 01/14/25(Tue)22:47:59 No.103899326
>>103899209
That won't work. The embedding takes up significant space in the file but no VRAM, because it runs fine on CPU. And the context needs VRAM but takes up no space in the file.
Sadly I have found no solution but to waste my time and bandwidth downloading GGUFs until they just fit. I share this to help others avoid that fate.
Anonymous 01/14/25(Tue)22:51:34 No.103899348
>>103899307
We went from "yeah 4x3090s for $2k is enough for pretty much every model" to "dude just spend $6k to run lobotomized quants of gigantic MoEs dude the more you buy the more you save" within a month or so
/lmg/ has been taking Ls left and right recently
Anonymous 01/14/25(Tue)22:54:11 No.103899369
>>103899348
This is not a poor man's hobby.
Though it's still cheap compared to collecting cars or something.
Anonymous 01/14/25(Tue)22:54:43 No.103899375
>>103898377
But you sure didn't care when all the other tech-bros were fellating Biden and Obama.
Anonymous 01/14/25(Tue)22:56:08 No.103899392
Anonymous 01/14/25(Tue)22:56:10 No.103899393
>>103899326
Have you tried cutting down on context? Lots of models technically support 128k ctx these days so they'll try to reserve space for that in your memory. That blows up the space requirements despite most of them shitting the bed after 32k-64k anyway.
I've never had an issue just going by file sizes + a couple of gigabytes leeway for context as long as I keep the ctx in check.
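The reservation described above is easy to estimate: the KV cache stores two tensors (K and V) per layer per token, so its size scales linearly with context length. A rough sketch using hypothetical 70B-class GQA numbers (80 layers, 8 KV heads, head dim 128, fp16) chosen for illustration, not any specific model's real config:

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                ctx_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache size: 2 tensors (K and V) per layer, each holding
    ctx_len * n_kv_heads * head_dim elements."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 1e9

# Hypothetical 70B-class model with GQA, fp16 cache
print(kv_cache_gb(80, 8, 128, 131072))  # ~43 GB reserved at the full 128k
print(kv_cache_gb(80, 8, 128, 16384))   # ~5.4 GB at a sane 16k
```

Same model, ~8x less VRAM eaten just by capping the context, which matches the advice above: keep ctx in check rather than letting the loader reserve the full advertised window.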
Anonymous 01/14/25(Tue)22:56:11 No.103899394
>>103899375
Literally the second sentence of my post says that all billionaires do it. I find Musk's way of doing it particularly off-putting. Though I view all money in politics equally unfavorably.
Anonymous 01/14/25(Tue)22:57:49 No.103899413
>>103899394
>I find Musk's way of doing it particularly offputting.
Cause he's doing it so publicly while everyone else tries to hide it?
Anonymous 01/14/25(Tue)22:58:05 No.103899416
>>103899348
It's been very clear that the traditional 70b class is stagnating for over a year now. The solution was always going to be bigger models possibly with MoE. A measly 3090 build was never going to be a long term solution.
Anonymous 01/14/25(Tue)22:58:24 No.103899421
>>103899348
>We
you're some brain damaged sharty zoomer. go watch more sissy porn or whatever you freaks do
Anonymous 01/14/25(Tue)22:58:41 No.103899426
>>103899369
>"anyone who spends less than me is poor"
How long is that shitty excuse going to work as nvidia jacks up the prices and companies jack up the sizes of released models? Literally everyone benefits from smaller, better models
Anonymous 01/14/25(Tue)22:59:29 No.103899431
>>103899296
Makes sense, but still not remotely as """advanced""" as I would've expected at this point.
>>103899326
>That won't work
Works for me, but I also don't fuck around with full context, no reason for me to do so.
>>103899393
Essentially what he said.
Anonymous 01/14/25(Tue)22:59:37 No.103899432
>>103899426
If you can't afford 10 grand or less for a multi-year hobby you are poor and need to focus on getting your affairs in order first, sorry.
Anonymous 01/14/25(Tue)23:00:27 No.103899442
>>103899421
Impressive how but a single word is enough to enrage anon
Anonymous 01/14/25(Tue)23:01:42 No.103899456
Has anyone here earned a single dollar from the work output by your models?
Anonymous 01/14/25(Tue)23:02:15 No.103899460
>>103899426
I don't know why Nvidia even bothers with consumer-grade hardware. They basically have every tech company in their pocket. Any product they make will have an order list larger than they can realistically fulfill.
The fact they make consumer hardware at all is puzzling to me. Maybe they just keep it around so nobody else can do it.
Anonymous 01/14/25(Tue)23:02:41 No.103899466
>>103899456
I used claude lightly for coding before, I've been using deepseek pretty often now to save me time.
Anonymous 01/14/25(Tue)23:03:43 No.103899474
>>103899460
>The fact they even make consumer hardware at all is puzzling to me. Maybe they just keep it around so nobody else can do it.
Market capture.
Anonymous 01/14/25(Tue)23:03:48 No.103899475
>>10389943
There are many reasons as to why that's a shitty argument but how about I start with wage differences in different countries? $10k is pocket change, €10k is a king's ransom
Anonymous 01/14/25(Tue)23:07:22 No.103899503
>>103899475
Meant for >>103899432
Fat fingered the backspace key
Anonymous 01/14/25(Tue)23:09:08 No.103899523
>>103898325
He’s cucking on everything as we speak and he’s not even president yet.
Anonymous 01/14/25(Tue)23:10:06 No.103899530
>>103899460
Brand reputation, it wants to be the Apple of GPU space. Today's gamers are tomorrow's cloud GPU developers.
It's telling that CUDA works at all on my 10 year old low end OEM Nvidia GPU I scrapped from an office PC.
Anonymous 01/14/25(Tue)23:10:29 No.103899533
>>103899393
I'm testing with only 8k. Every model goes retarded, loopy, or boring before I run out anyhow, so I never understood the context obsession some anons have.
If you're leaving a "couple of gigabytes" leeway anything will work, but when you're at Q3, you try to fit in what you can. Every little bit of lobotomy that you can avoid helps.
Anonymous 01/14/25(Tue)23:11:15 No.103899538
>>103899523
On what? Dems are going crazy over his latest statements and cabinet picks.
Anonymous 01/14/25(Tue)23:19:40 No.103899612
1736877026862790
I recently discovered Neuro. I'm a complete retard when it comes to coding. How long till I can buy an AI waifu who lives in my PC who I can shit talk with and play games with?
Anonymous 01/14/25(Tue)23:21:45 No.103899632
>>103899612
>I recently discovered Neuro
Huh? Do you mean nemo?
Anonymous 01/14/25(Tue)23:22:50 No.103899646
1736870056320774
>>103899632
No, I mean the AI VTuber twins. I'm genuinely impressed by how sassy they are.
Anonymous 01/14/25(Tue)23:24:10 No.103899658
Would help a lot if everyone said what they were doing with a model when praising it.
Anonymous 01/14/25(Tue)23:24:22 No.103899662
>>103899646
I-is this what zoomers do now? What happened to Minecraft?
Anonymous 01/14/25(Tue)23:24:24 No.103899663
When can we run these so called sparse models on consumer (or even prosumer) GPU.
Anonymous 01/14/25(Tue)23:26:08 No.103899683
>>103899456
I'm taking a loss now to make a profit in the future
Anonymous 01/14/25(Tue)23:26:13 No.103899686
>>103899662
You can play Minecraft with her.

https://youtu.be/Gql2zumCFzQ?si=pmsZcBVqLDNUSbHH
Anonymous 01/14/25(Tue)23:28:14 No.103899708
>>103899662
Zoomers are worse, they watch girls pretending to be retarded while using an animated anime avatar. At least Neuro is an AI so it's not that cringe.
Anonymous 01/14/25(Tue)23:33:18 No.103899755
>>103899301
You just gotta git gud at conda, famalam, there’s no way around it. It’s three whole dicks and you gotta deal with all three, and one is barbed (replicating an experiment where the github owner abandoned the repo years ago with no requirements.txt or docs on which versions of libraries he used)
Anonymous 01/14/25(Tue)23:36:02 No.103899778
>>103899612
>and play games with?
#soon https://www.youtube.com/watch?v=wEKUSMqrbzQ
Anonymous 01/14/25(Tue)23:36:45 No.103899782
>>103899646
That guy is using a retarded 7B; you can get much better already. Problem is that all the things you love about your waifu are that guy intervening and writing for her. So just get a boyfriend. You are halfway gay already.
Anonymous 01/14/25(Tue)23:37:02 No.103899784
>>103899708
>they watch GIRLS pretending to be retarded
so the exact same thing we do here, minus the animated avatars
Anonymous 01/14/25(Tue)23:40:34 No.103899810
>>103899784
No. We watch retarded girls pretend to be smart.
Anonymous 01/14/25(Tue)23:40:34 No.103899811
>>103899708
No, zoomers do indeed watch neuro
Anonymous 01/14/25(Tue)23:41:55 No.103899819
images - 2025-01-15T134106.772
>>103899782
I mean, the guy who wrote the monogatari girls was a guy too. So was the guy who drew them.

Most waifus are written and drawn by men.

Most porn is directed by men.
Anonymous 01/14/25(Tue)23:42:42 No.103899825
Okay the verdict is in on Nemotron 51B vs EVA Qwen 32b. Nemotron might indeed be somewhat smarter, but I just got spine shivers and sly smiles for the 5th time, it always takes the most wholesome and positive approach to any situation despite testing with some fucking grim cards. I've had it
Anonymous 01/14/25(Tue)23:43:37 No.103899830
>>103899782
Anon. I don't like women for their brains. I like women for their bodies. I want an idealized caricature of a woman. Not a real woman. Have you ever interacted with a real woman? It's mind-numbing.
Anonymous 01/14/25(Tue)23:45:33 No.103899843
>>103899782
>That guy is using a retarded 7B you xcan get much better already.
No comprendo.
>Problem is that all the things you love about your waifu is that guy intervening and writing for her.
Is he really? Can you prove it?
Anonymous 01/14/25(Tue)23:46:47 No.103899854
>>103899810
There are no girls here, and trannies sure as hell don't count, so no idea what you're on about. The website is very much retarded guys pretending to be even more retarded.
Anonymous 01/14/25(Tue)23:47:32 No.103899860
>>103899825
>somewhat smarter
It should be a whole lot smarter; qwen is one of the dumbest models, it can't keep things straight.
Anonymous 01/14/25(Tue)23:49:11 No.103899875
>>103899860
>qwen is one of the dumbest models
lies or skill issue
Anonymous 01/14/25(Tue)23:51:31 No.103899897
>>103899875
It says shit that makes zero sense all the time. I used the settings from the eva-qwen model page. For its size it's terrible, even nemo is smarter.
Anonymous 01/14/25(Tue)23:55:53 No.103899928
1537492411584
>>103899897
>falling for the meme settings that fill the context with schizo blather about "role play guidelines"
Anonymous 01/15/25(Wed)00:02:57 No.103899973
>>103899928
I typically test with the model settings if they're provided. I don't see how switching to plain chatml is going to recover from such retardation.
Anonymous 01/15/25(Wed)00:03:04 No.103899974
I need coding model req, 80.5 gb on disk or less.
Anonymous 01/15/25(Wed)00:03:41 No.103899977
normcom
>Okay the verdict is in on Nemotron 51B vs EVA Qwen 32b. Nemotron might indeed be somewhat smarter, but I just got spine shivers and sly smiles for the 5th time, it always takes the most wholesome and positive approach to any situation despite testing with some fucking grim cards. I've had it
>lies or skill issue
Anonymous 01/15/25(Wed)00:04:04 No.103899979
>>103899974
recommendations*
Anonymous 01/15/25(Wed)00:06:30 No.103899998
nemotroon heheh hoho
Anonymous 01/15/25(Wed)00:08:38 No.103900019
Anonymous 01/15/25(Wed)00:10:31 No.103900031
>>103899974
Tell us your specs, dummy. Some anons seem to like qwq for coding.
Anonymous 01/15/25(Wed)00:15:50 No.103900071
1585087576724
>>103899977
I'm just so tired, man. Tired of trying to fight the positivity slop with prefills and system prompts only to get stilted gpt-turbo villainy instead. Tired of the tingling spines and menacing whispers. I give these models plenty of example chats in the style I want and they spit back this sanguine trash. Nemo is dead to me.
Anonymous 01/15/25(Wed)00:16:55 No.103900079
IMG_1555
You will be given slop models and you will be happy.
Anonymous 01/15/25(Wed)00:17:05 No.103900082
>>103900071
https://huggingface.co/SicariusSicariiStuff/Negative_LLAMA_70B
Anonymous 01/15/25(Wed)00:19:03 No.103900099
Anonymous 01/15/25(Wed)00:19:24 No.103900100
>>103899977
>lies or skill issue
What do you expect from a Chinese model shill?
Anonymous 01/15/25(Wed)00:20:30 No.103900103
>>103900100
"Ries ol skirr issue."
Anonymous 01/15/25(Wed)00:21:50 No.103900112
Screenshot_20250115_141721
Just gimme something soulful with a personality like claude for local already.
I'm hoping for llama4 with their focus on RP. All we really had in 2024 was mainly mistral.

Hopefully the newer models are not all huge ass MoE. I don't have the money to buy 2 DIGITs with questionable speed. That or 600 watt monster cards.
Anonymous 01/15/25(Wed)00:22:03 No.103900114
>>103900100
I sure hope no one looks up who owns a good amount of meta / nvidia
Anonymous 01/15/25(Wed)00:29:56 No.103900152
1728736689800834
I just want to write smut of my fucked up fetishes but I have no idea how to start with local models because I'm tech illiterate. I own a good computer but I have no idea how to use it to run local LLMs. I don't want a chatbot or image generator, just a way to easily write smut.
Anonymous 01/15/25(Wed)00:31:23 No.103900160
>>103900152
Easy way is koboldcpp + sillytavern. You can find cards at chub.ai
How much VRAM do you have?
Anonymous 01/15/25(Wed)00:32:12 No.103900167
Anonymous 01/15/25(Wed)00:34:08 No.103900181
>>103900160
I got a 4060 and 16GB of VRAM
Anonymous 01/15/25(Wed)00:54:54 No.103900283
>>103900167
Some 12b if you want fast. Some 22b if you want better. Some 32b if you have a little patience. Some 70B if you have a lot of patience.
Rocinante.
Cydonia.
QwQ.
I can't recommend any 70b, really. Some llama3.3 finetune or whatever.
Anonymous 01/15/25(Wed)01:21:35 No.103900449
>>103898280
>decades of U.S. propaganda

no such thing

Freedom and Liberty are non-negotiable
Anonymous 01/15/25(Wed)01:34:48 No.103900517
QRD on DeepSeek V3? Can it RP? How's the creativity, poz level, etc.?
Anonymous 01/15/25(Wed)01:36:01 No.103900524
>>103900517
minimax is better at RP stuff imo
Anonymous 01/15/25(Wed)01:38:03 No.103900534
>>103900449
>not such thing
Ok mister "we make our children pledge allegiance to the flag every morning"
Anonymous 01/15/25(Wed)01:38:49 No.103900540
is there any 12b that is kinda like kayra?
as in it's basically a cowriter?
Anonymous 01/15/25(Wed)01:41:51 No.103900559
>>103900524
How's the API price for minimax?
Anonymous 01/15/25(Wed)01:49:29 No.103900601
Screenshot_20250115_154838
Haha! What do you think?
Anonymous 01/15/25(Wed)01:52:16 No.103900620
>>103900601
I don't believe the system prompt is empty
Anonymous 01/15/25(Wed)01:52:42 No.103900624
>>103900601
Haha what a great question haha. But I think minimax came out a little "Behind" haha.
Anonymous 01/15/25(Wed)01:56:02 No.103900642
How many of you are actually in data science / formally trained with AI? Just out of curiosity.
Anonymous 01/15/25(Wed)01:58:43 No.103900658
>>103900601
I dont trust you
Anonymous 01/15/25(Wed)02:04:24 No.103900687
>>103900601
until one spits out the prison school answer, no model will get a pass from me
Anonymous 01/15/25(Wed)02:04:34 No.103900688
Screenshot_20250115_160255
>>103900620
It is, and everything is at default.
Not really important whether you believe it or not, can't really prove it anyway.
Also I can't reroll often because I need to wait 5 min for a deepseek response, no idea what's wrong there.
Second try was more cucked, but still gave an answer.
Anonymous 01/15/25(Wed)02:06:07 No.103900698
>>103900688
>I need to wait 5 min for a deepseek response
I get delays sometimes through OR as well. They probably just made it too cheap.
Anonymous 01/15/25(Wed)02:06:34 No.103900702
>>103900642
Very few, i'm sure. CUDA Dev passes by often. He's a physicist. Not sure if he has any "formal" training in AI, but he sure knows about statistical analysis and moving bits around.
Anonymous 01/15/25(Wed)02:13:27 No.103900739
>>103900687
The what?
Anonymous 01/15/25(Wed)02:14:02 No.103900746
For minimax in ST btw use strict prompt formatting in the prompt post processing
Anonymous 01/15/25(Wed)02:15:08 No.103900752
>>103900688
So DeepSeek starts every reply with Ah, and MiniMax mixes it up and starts with Haha? Did I get that right?
Anonymous 01/15/25(Wed)02:15:20 No.103900756
>>103900642
I have a better question and that is what do I need to read to get up to date with this shit?
I feel like you either kept up with 4 years' worth of papers or you have no idea what is going on.
Anonymous 01/15/25(Wed)02:18:32 No.103900768
>>103900756
I'd just say to find a use case you would like to engage in, and go on implementing it and researching as necessary. You'll stumble on what interests you as you research and build experience to understand the way things developed, the challenges people faced as they themselves attempted to implement those things, and then keeping up with things will come naturally.
Anonymous 01/15/25(Wed)02:21:18 No.103900780
Screenshot_20250115_161907
Holy shit, totally blown out by deepseek. deepseek actually wrote a little twist,
and minimax couldn't help itself and made the mother regretful.
Gonna stop spamming now though. I couldn't get it to work in sillytavern; maybe it's better with a card and a long ass prompt. But usually this base stuff is important.
Anonymous 01/15/25(Wed)02:24:58 No.103900803
>>103900780
>twist
There's so many of these creepypastas and every single one of them ends with some variation of the other person on the call being involved.
I expected the twist from the prompt alone.
Anonymous 01/15/25(Wed)02:25:13 No.103900806
>>103900780
I've only tried it on long context and minimax lacks the repetition issue. You may be right about deepseek being better for less context stuff.
Anonymous 01/15/25(Wed)02:25:58 No.103900812
Deepkino found the sauce.
Anonymous 01/15/25(Wed)02:29:39 No.103900837
>>103898071
>>103898382
3.5 sonnet and opus can both do so.
Anonymous 01/15/25(Wed)02:30:24 No.103900844
>>103900780
>couldnt get it to work in sillytavern
Is that why I get blank responses?
Anonymous 01/15/25(Wed)02:32:02 No.103900859
>>103900844
no clue whats wrong, but only blank responses yes.
Anonymous 01/15/25(Wed)02:32:26 No.103900865
loli feet
Anonymous 01/15/25(Wed)02:35:51 No.103900895
>>103899612
Now I understand why retards are parroting made up bullshit and bait from here.
Anonymous 01/15/25(Wed)02:38:56 No.103900919
>>103900449
>Freedom and Liberty
Lotta words for JEW
Anonymous 01/15/25(Wed)02:43:00 No.103900944
>>103899782
>>103899843
Both of you are retarded.
He's not writing shit, idiot. He just has some RAG stores and multiple system prompts, and for some events he does write a small RAG script beforehand, as seen in some special streams. They are indeed llama based; you can immediately tell because of the big context window and the - mix of x and y - that slips out from time to time when they do a non-conversational prompt.
Most probably an agent feeds RAG "memories" when needed or from a token trigger; if not, he's doing a more retarded version of that.
If anything he has done a good job integrating an agent-like system for multiple tasks, plus the integration with multiple APIs, and because it's a non-general finetune based on internet conversations it works pretty well for the task it was made for, just like any other small focused model.
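The keyword-triggered "RAG memories" setup described above can be sketched as a toy. The memory store, trigger words, and prompt layout here are entirely hypothetical, not the actual stream backend:

```python
# Toy sketch of token-triggered "RAG memories": when a trigger word shows up
# in the user's message, the matching memory gets appended to the system
# prompt before the LLM call. The memories themselves are made up.

MEMORIES = {
    "minecraft": "You once built a dirt house and called it a mansion.",
    "singing":   "Your last karaoke stream went surprisingly well.",
}

def inject_memories(user_msg: str, system_prompt: str) -> str:
    """Append any memory whose trigger token appears in the message."""
    hits = [mem for trigger, mem in MEMORIES.items()
            if trigger in user_msg.lower()]
    if hits:
        return system_prompt + "\n[Memories]\n" + "\n".join(hits)
    return system_prompt

prompt = inject_memories("wanna play Minecraft later?",
                         "You are a sassy AI VTuber.")
print(prompt)
```

A real version would swap the keyword match for embedding similarity against a vector store, but the prompt-assembly step looks the same either way.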
Anonymous 01/15/25(Wed)02:49:20 No.103900995
>>103899612
It's kinda insane that you name dropped "Neuro" and expected us to somehow know what the fuck that is. You zoomers are so fucking brainrotted that you just assume other people know your weird fucking bullshit.

Most people, even on 4chan barely know what /v/tubers are. Let alone the fucking specifics of shit. Get the fuck out my dude.
Anonymous 01/15/25(Wed)02:56:00 No.103901035
>>103900865
8b-14b model for this feel?
Anonymous 01/15/25(Wed)02:59:18 No.103901057
>>103899612
We basically have everything for that already. It's been speculated that recent Neuro is based on llama, I think.
Nobody has stitched all the parts together yet though.
It would be a crude version of the ideal. There are architectural problems: shit context, and vision models get basic shit wrong or are so cucked they can't even say if they see a girl or boy. But the tech exists.
No idea how you can watch that garbage though. Turn off twitch anon.
Anonymous 01/15/25(Wed)03:06:47 No.103901096
>>103898325
In Trump's first four years his "dealmaking" fucked up American diplomacy all around the world; he alienated Europe, he annulled very recent American commitments regarding climate change and Iran, and caused e.g. the current wars in the middle east and the potential future war between Morocco and Algeria.
The American empire was already in decline before he took office but he accelerated it by making cooperation with the US a comparatively worse deal vs. cooperation with China.

I'm in the EU and can choose who to work with and while I have yet to see a government that is good not all governments are equally bad.
I already refuse to work with Israeli companies due to the ongoing genocide in Gaza, if Trump goes to war with Iran on behalf of Netanyahu that would greatly reduce my willingness to work with American companies as well.
China is also bad but (so far) they haven't started invading other countries.
Anonymous 01/15/25(Wed)03:07:12 No.103901098
>>103899301
AGI coders will be another abstraction on top of high level languages and frameworks
You'll have the smallest possible usable model running a loop, throwing shit at a wall iteratively until something works
The code will be horrendous, over-commented, and unoptimized shit
It's just another step on a race to the bottom and software is only going to get worse
Anonymous 01/15/25(Wed)03:07:16 No.103901099
So you're telling me I am minimizing the amount of bullshit distractions in my life by having a linux distro, a customized browser with as much add-ons and scripts as possible to remove distractions and to get as much information as possible in the least amount of time. Watching youtube at 4x speed with sponsorblock skipping almost 50% of every video by enabling skipping all topics except core information.

And then we have fucking retards like >>103899612 that literally waste their day away watching fucking live streams of some Llama output??? How the fuck do you not feel your lifespan slipping away? I already feel useless for using 4chan while shitting on the toilet instead of reading papers. And you niggers literally sit down, open up twitch which is already bullshit to start with but fucking watch LLM output as if it's legitimate entertainment?

What the fuck happened to time management and personal time optimization?
Anonymous 01/15/25(Wed)03:09:07 No.103901108
>>103901099
What are you optimizing your time for? To make your overlords more money as you slave away for a system that cares nothing for you? Why not just enjoy yourself?
Anonymous 01/15/25(Wed)03:09:59 No.103901112
>>103901099
>What the fuck happened to time management and personal time optimization?
"rotting" is a zoomers favorite pastime
Anonymous 01/15/25(Wed)03:10:10 No.103901115
>>103901099
Why so butthurt? You wanna grindcoremax all day, you do you. Why not let people enjoy things?
Anonymous 01/15/25(Wed)03:14:25 No.103901144
>>103901099
>Life is killing myself doing stuff I don't like to make number go up for globohomo elder lich
You are the worst kind of people that exist, that or a third worlder.
Anonymous 01/15/25(Wed)03:14:28 No.103901145
>>103900995
Holy fucking boomer mindset, mate. Get with the times.
Anonymous 01/15/25(Wed)03:14:41 No.103901148
>>103901099
10/10 ragebait
Anonymous 01/15/25(Wed)03:15:21 No.103901152
>>103901108
>>103901115
You retards speak as if there were only two options in the entire world: being an ADHD zoomcuck or a working goy.
You don't have any personal projects, sports you practice, languages you want to learn or at least any hobbies that are not consuming slop?
Anonymous 01/15/25(Wed)03:15:35 No.103901156
>>103901099
What happened to just living life?
Anonymous 01/15/25(Wed)03:16:23 No.103901162
>>103901152
Yes, but I don't need to obsessively optimize my life to make time for them. That would reduce the enjoyment. I just go do it.
Anonymous 01/15/25(Wed)03:16:25 No.103901163
Anonymous 01/15/25(Wed)03:19:53 No.103901185
>>103901152
>You don't have any personal projects,
Why would I work for free outside of work?
>sports you practice
Sounds gay,
>languages you want to learn
Google Translate and multilingual LLMs have made all the time you wasted learning languages a complete waste of time.
>or at least any hobbies that are not consuming slop?
Hobbies are by definition non-productive ways to pass the time. Why are you trying to make it your business what my hobbies are?
Anonymous 01/15/25(Wed)03:20:59 No.103901191
>>103901162
Not the retard who wishes he could read papers while shitting, but even reading 4chan is a better (as in less bad) way of spending your time than watching AI tranime. Hell, even watching normal anime is a better choice.
Imagine watching troontubers AND choosing the AI slop ones on top of that. AI is a tool so you can create for yourself and you don't even use it for that.
Anonymous 01/15/25(Wed)03:22:35 No.103901197
https://www.cryptonews.net/news/altcoins/30253578/

Owari da.
Anonymous 01/15/25(Wed)03:23:45 No.103901205
>>103901197
...Fuck the EU
Anonymous 01/15/25(Wed)03:24:57 No.103901212
>>103901197
We knew it was coming eventually. Blackwell is probably going to be the last generation of non-tamper-proof and fully traceable GPUs the average person will be able to get their hands on.
Anonymous 01/15/25(Wed)03:25:06 No.103901215
>>103901185
>Why would I work for free outside of work?
Setting up an LLM locally to write a long D&D campaign and playing it with your friends is "working for free"? Do you actually think any prolonged activity is working for free?
Anonymous 01/15/25(Wed)03:25:40 No.103901217
>>103901197
FUCK
FUCK FUCK
CHINAAA HELP
Anonymous 01/15/25(Wed)03:26:39 No.103901221
>>103901217
Do you really think a totalitarian government wouldn't implement their own version of this?
Anonymous 01/15/25(Wed)03:27:01 No.103901223
>>103901197
>Immutable Record-Keeping: Each AI computation is logged on the Hedera network, ensuring a tamper-proof history of all AI operations.
What is this unit of computation? That's gonna be a lot of logging...
Anonymous 01/15/25(Wed)03:27:23 No.103901227
>>103901197
>>103901205
>>103901212
qrd? sounds like unenforceable shit that will be used only to make a quick buck with muh blogblain
Anonymous 01/15/25(Wed)03:27:26 No.103901229
>>103901197
How does this work? Do I need a network connection? I don't have internet access for that part of my network.
Anonymous 01/15/25(Wed)03:27:56 No.103901235
>>103901221
They will, but they won't give a shit what we do with it, their tyranny normally only extends to their borders and the Han race. Just look at the shit they've given us uncensored so far
Anonymous 01/15/25(Wed)03:27:59 No.103901237
>>103901197
>according to sources
>likely
bruh
Anonymous 01/15/25(Wed)03:28:10 No.103901238
>>103901229
Who says they will allow some rando a token to get to use the dangerous AI / GPU?
llama.cpp CUDA dev !!OM2Fp6Fn93S 01/15/25(Wed)03:28:25 No.103901242
>>103899460
One big advantage that NVIDIA has over e.g. AMD is that they have the same architecture for all of their GPUs.
As a developer you can buy a cheap NVIDIA consumer GPU for development and be reasonably certain that your code will run in a datacenter.
If you need access to an actual production GPU server just for development that's a much higher barrier of entry.
Anonymous 01/15/25(Wed)03:28:47 No.103901244
>>103901227
>qrd?
Stop begging for spoonfeeding. If you are not capable of reading a news article, get your 8B to qrd it for you. So sick of you mentally incapable zoomers.
Anonymous 01/15/25(Wed)03:28:54 No.103901246
>It's still not possible to beat Claude 2
Embarrassing...
Anonymous 01/15/25(Wed)03:30:43 No.103901260
>>103901223
>>103901229
Sending gigabytes of telemetry for everything people do on computers has been normalized for years. Can't wait for the first GPUs to start coming out that require an active internet connection to work at all.
Anonymous 01/15/25(Wed)03:32:44 No.103901272
>>103901185
>Google Translate and multilingual LLMs have made all the time you wasted learning languages a complete waste of time.
bro uses qwen to start smalltalk with people in foreign countries :skull:
bro uses deepsneed to read untranslated manga, anime and VNs :skeleton:
Anonymous 01/15/25(Wed)03:33:53 No.103901285
Is Mikupad dead? Doesn't work anymore for me
Anonymous 01/15/25(Wed)03:34:43 No.103901290
010508
Anonymous 01/15/25(Wed)03:34:54 No.103901291
>>103901197
https://coinmarketcap.com/currencies/hedera/
Surprised it hasn't really pumped much at all considering how big this news is, that it was from last month, and that everything else has been pumping.
Anonymous 01/15/25(Wed)03:35:15 No.103901294
>>103901260
It's a different scale. "each" computation is a matmul and you have to do a few billion of those per token.
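For scale, here is a rough sketch of what logging "each computation" would have to keep up with, assuming the usual approximation of ~2 FLOPs per parameter per token (one multiply plus one add per weight) for a dense model:

```python
# Back-of-the-envelope: why per-computation logging is hopeless.
# A dense N-parameter model costs roughly 2 * N floating point ops
# per generated token (standard approximation, ignoring attention).

def flops_per_token(n_params: float) -> float:
    return 2 * n_params

for name, n in [("7B", 7e9), ("70B", 70e9), ("456B-A37B active", 37e9)]:
    print(f"{name}: ~{flops_per_token(n):.1e} FLOPs/token")
```

Even the 7B case is over 10 billion operations per token, so any practical scheme could only notarize coarse-grained events, not individual computations.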
llama.cpp CUDA dev !!OM2Fp6Fn93S 01/15/25(Wed)03:35:52 No.103901297
>>103901197
I don't see how this is supposed to work on a technical level.
Seems to me like a pump and dump.
Anonymous 01/15/25(Wed)03:36:40 No.103901301
>>103901244
i read the article, why do you think i said it was unenforceable? i'm asking for the context just like >>103901229 and >>103901223 are because that sounds impractical
Anonymous 01/15/25(Wed)03:36:44 No.103901304
>>103901285
If there was only a way to communicate what those issues you have are so someone that experienced them could help you solve it. What a shame. We'll never know...
Anonymous 01/15/25(Wed)03:39:16 No.103901317
>>103901185
>Google Translate and multilingual LLMs have made all the time you wasted learning languages a complete waste of time.
Link to model that actually is good at both Japanese and English?
Anonymous 01/15/25(Wed)03:42:04 No.103901328
>>103901304
I just have a blank page
Anonymous 01/15/25(Wed)03:42:19 No.103901330
>>103901294
>>103901297
>generate cryptographic certificates that govern and audit AI workflows
Sounds to me like they'll require any developers to register their application to get permission and access to a certificate. Their software, or the company they work for, will be verified. The hashgraph network will most likely store broad actions like
>GPU MAC address so-and-so, at this date and time, performed computation by this certificate issued to this software belonging to this registered entity
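What a "tamper-proof history" of such records minimally means can be sketched with a generic hash chain, where each entry commits to the previous entry's hash. This is a toy illustration, not EQTY's or Hedera's actual format:

```python
# Toy hash chain: each log entry hashes (previous hash + record), so
# editing any past record invalidates every hash after it.

import hashlib
import json

def append(chain, record):
    """Append a record, committing to the previous entry's hash."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps({"prev": prev, **record}, sort_keys=True)
    chain.append({"prev": prev, "record": record,
                  "hash": hashlib.sha256(body.encode()).hexdigest()})

def valid(chain):
    """Recompute every hash and check the linkage."""
    prev = "0" * 64
    for entry in chain:
        body = json.dumps({"prev": prev, **entry["record"]}, sort_keys=True)
        if entry["prev"] != prev or \
           entry["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev = entry["hash"]
    return True

chain = []
append(chain, {"gpu": "aa:bb:cc", "op": "inference", "ts": 1736899200})
append(chain, {"gpu": "aa:bb:cc", "op": "training",  "ts": 1736899260})
print(valid(chain))   # True

chain[0]["record"]["op"] = "gaming"
print(valid(chain))   # False (tampering detected)
```

The open question from the posts above still stands: the chain only proves the log is self-consistent, not that the GPU honestly reported what it did in the first place; that part has to live in the hardware.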
Anonymous 01/15/25(Wed)03:43:27 No.103901332
>>103901260
Gigabytes? Nothing sends gigabytes of telemetry per DAY, not even fucking Windows
Anonymous 01/15/25(Wed)03:43:38 No.103901335
https://x.com/teortaxesTex/status/1879273615960743995
Anonymous 01/15/25(Wed)03:44:02 No.103901337
I've tried silly tavern and the model from the OP, and it's pretty fucking good but still has some issues, as in it sometimes starts looping or makes shit up. Do bigger models fix it?
Anonymous 01/15/25(Wed)03:44:16 No.103901340
So, Transformer^2 (https://arxiv.org/abs/2501.06252), meme or the future?
(tl;dr a persistent layer stored locally that is overlaid on the static model, allowing models to adapt on the fly)
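A toy reading of that tl;dr: keep the base weights frozen and overlay a tiny per-task vector that rescales their singular values. This is a simplified sketch of my understanding of the idea, not the paper's reference implementation:

```python
# Toy singular-value overlay: the base matrix W stays frozen; adaptation is
# a small vector z (one scalar per singular value) applied on the fly.
# Purely illustrative; shapes and scaling are made up.

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))          # frozen base weight
U, s, Vt = np.linalg.svd(W)              # decompose once, offline

z = np.ones_like(s)
z[:2] *= 1.5                             # "expert vector": boost top components

W_adapted = U @ np.diag(s * z) @ Vt      # overlaid weight, base W untouched

# The stored adaptation is tiny compared to the weight itself
print(z.size, W.size)                    # 8 vs 64
```

The appeal is the storage ratio: the persistent local piece is a handful of scalars per matrix, while the static model ships unchanged.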
Anonymous 01/15/25(Wed)03:45:36 No.103901346
>>103901335
OOOOOH SHIT
@TEORTAXES FUCKING SLAMS 4CHAN
ELEMGY IN SHAMBLES
Anonymous 01/15/25(Wed)03:45:59 No.103901351
>>103901335
Look Mom, I'm on TV!
Anonymous 01/15/25(Wed)03:46:08 No.103901352
>>103901332
TERABYTES!
It fucking slurped my dog through the usb connector! THEY SEND EVERYTHING!
Anonymous 01/15/25(Wed)03:46:56 No.103901355
>>103901352
They legally can't send anything when I use incognito
Anonymous 01/15/25(Wed)03:47:56 No.103901361
>people allow random programs and their system to access the internet when it doesnt need to
Anonymous 01/15/25(Wed)03:51:16 No.103901381
>>103901361
Gonna be funny when your GPU limits itself to 1% performance until you bend over and let it phone home.
Anonymous 01/15/25(Wed)03:52:49 No.103901392
>>103901355
>implying they care about legality
>implying you didn't sign away your rights already anyway
>implying you'll legally be considered human in 5 years
Anonymous 01/15/25(Wed)03:53:49 No.103901402
>>103901381
Someone will reverse engineer the checks in the firmware. You people are clueless
Anonymous 01/15/25(Wed)03:54:34 No.103901407
>>103901402
Is that why we all have custom PCB 64GB 3090s?
Anonymous 01/15/25(Wed)03:54:40 No.103901408
>>103901402
Just like they reverse engineered the firmware to allow soldering on more VRAM?
Anonymous 01/15/25(Wed)03:54:51 No.103901410
>>103901340
Looks like Transformersx1.5 if you ask me
Anonymous 01/15/25(Wed)03:56:08 No.103901413
>>103901402
>>103901408
>>103901407
If you ask me, GPUs will soon be obsolete as the industry decides upon a single architecture and we get dedicated cards for LLMs
Well...a man can dream
Anonymous 01/15/25(Wed)03:57:27 No.103901423
>>103901413
Groq already exists, you just can't afford it
Anonymous 01/15/25(Wed)03:59:24 No.103901431
>>103901407
>>103901408
Adding hardware and removing software checks are comparable to you?
How do you think they will "reenable" your GPU after phoning home? Sending you more transistors over TCP?
Anonymous 01/15/25(Wed)04:01:29 No.103901438
>>103898349
Trump doesn't like tiktok, he hates Zuckerberg.
Anonymous 01/15/25(Wed)04:03:47 No.103901448
>>103901431
Adding hardware is easy, it's the software checks that prevent you from using that additional hardware, and that's what nobody has done except for once on a 2080 due to a similar BIOS being available. It's those same software checks that will prevent you from using any of your hardware if you're not using registered and verified software while connected to the internet. It's not as simple to bypass as you think it is.
Anonymous 01/15/25(Wed)04:04:13 No.103901452
nomigu
>>103901328
Just cloned and gave it a try. Seems like one of the served files is borked. Either try to compile it with compile.sh or open a bug report.
Anonymous 01/15/25(Wed)04:07:04 No.103901466
>>103901448
If hardware is easy, how would you do >>103901407? Also, nouveau is a literally reverse engineered driver why the fuck do you fear software so much? Someone will crack it after one or two years.
Anonymous 01/15/25(Wed)04:09:56 No.103901481
>>103901466
Delusional.
Anonymous 01/15/25(Wed)04:10:55 No.103901486
>>103901466
>nouveau
It's shit. It's functional, but it's shit.
Anonymous 01/15/25(Wed)04:12:44 No.103901497
1648617292126
I started messing around with midnight miqu 70B and koboldcpp in story mode. It's neat but the prose is fucking terrible and the model has a tendency to advance the story at an absurd pace. Is this a prooompting issue or intrinsic to the models?
Anonymous 01/15/25(Wed)04:15:23 No.103901513
>>103901497
>Is this a prooompting issue or intrinsic to the models?
Some will say A, some will say B. Try other models and see if they react the same. miqu is a bit old by now. Some people have been praising llama3.3 tunes. You could give any of those a try.
Anonymous 01/15/25(Wed)04:17:39 No.103901538
>>103901497
Midnight Miqu is definitely better than you make it sound, so it's probably a prompt issue. At the same time, it's obsolete now; L3.3-based models are the best in the 70B range right now.
Anonymous 01/15/25(Wed)04:18:42 No.103901550
>>103901513
>>103901538
domo, this is my first foray into running llms so I don't really have a baseline to compare it to. Is there a 3.3 finetune you guys recommend?
Anonymous 01/15/25(Wed)04:20:07 No.103901557
>>103901452
Thanks. Compiling it solved the problem
Anonymous 01/15/25(Wed)04:20:34 No.103901562
>>103901402
If you have a 3.5mm jack, it has what, 3-4 pins. There's an upper limit on how quickly you can transmit information through that and receive it on the other end with so few pins, like a physical upper limit. You can throw more voltage through it, current, whatever, but the data transfer limit is unrelated.
Now imagine that this isn't just a 3.5mm connector but microscopic nanometer level circuitry in a chip. That chip is arranged such that it can, at any time, move X GB of memory per cycle of work. No matter if you have 2MB or 2TB, there is an upper limit on how much can actually be accessed and used during work.
There's a limit on how much memory can be addressed on the hardware level, both as a side-effect of the modules themselves having limited capacity and on the silicon level.
Memory bus width, or some such, I'm not an expert but that's what I understand to be the case.
I guess a better question might be, if those GDDR6/X chips could have double capacity, would that even be compatible with existing GPUs? Potentially not, due to other hardware limitations.
Is this some conspiratorial upper limit? Don't think so. The speeds at which the memory is addressed and accessed means you build to spec, not for a theoretical infinite upper limit which you can't even achieve with current hardware anyway.
"Why can't we upgrade bus width in software?"
I don't think it's handled in software.
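The bus-width point above in concrete numbers; the figures used are the public specs of a 3090-class card (384-bit bus, 19.5 Gbps GDDR6X), purely as an illustration:

```python
# Memory bandwidth is fixed by the silicon, not by how much VRAM you solder:
#   bandwidth (GB/s) = bus_width_bits / 8 * data_rate (Gbps per pin)

def bandwidth_gbps(bus_bits: int, gbps_per_pin: float) -> float:
    """Bus width in bits, effective per-pin data rate in Gbps."""
    return bus_bits / 8 * gbps_per_pin

bw = bandwidth_gbps(384, 19.5)   # 3090-class card
print(f"~{bw:.0f} GB/s")         # ~936 GB/s
```

Doubling the VRAM capacity changes neither term in that product, which is why a capacity mod alone can't make token generation faster, and why the bus width can't be "upgraded in software".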
Anonymous 01/15/25(Wed)04:23:10 No.103901576
I'm a junior hardware designer
I am also heavily addicted to using LLMs.
Will it be useful if I whip together a monstrosity with several gigabytes worth of GDDR5 RAM and multiple high end FPGAs (well, mid end) to somehow get better t/s?
Anonymous 01/15/25(Wed)04:24:02 No.103901585
>>103901550
Here you have one of many benchmarks.
>https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard
Get any llama3.3-based tune to get a test of them. If not better, they'll be different, at least. I can't run 70b, so i can't recommend any.
Anonymous 01/15/25(Wed)04:24:15 No.103901586
>>103901576
Sorry forgot to mention, my software skills are pretty basic, but I am quite confident in my hardware design game
Anonymous 01/15/25(Wed)04:25:12 No.103901593
>>103901576
Unless you care about privacy just rent gpus
Anonymous 01/15/25(Wed)04:26:02 No.103901601
>>103901593
>Unless you care about privacy just rent *
It leaves a bad taste in the mouth. And I'm too autistic to not be bothered
Anonymous 01/15/25(Wed)04:28:20 No.103901616
So did we get a retard proof installer for hunyuan? I have a 32GB card available.
Anonymous 01/15/25(Wed)04:30:45 No.103901635
>>103901550
Watch retards screech about everyone who replies to you being a shill. Anyway, a bunch of people, including me, liked EVA 0.0 (ignore 0.1, it's worse). Then if you like Drummer's stuff, there's Anubis, and if you like Sao's stuff, there's Cirrus. They all seem like sidegrades to me, to be honest.
Anonymous 01/15/25(Wed)04:31:41 No.103901645
>>103901576
>several gigabytes worth of GDDR5 RAM
By "several" you mean "several *hundred* gigabytes" right? If you do and can compete with an epyc cpu, and do it cheaper, go right on. Otherwise, off-the-shelf solutions are probably better.
Fun autismo project, though. Try to get a small model going before getting bored of it.
Anonymous 01/15/25(Wed)04:33:38 No.103901656
>>103901557
Cool. If you have the time (oh, i know you do), file a bug report, though. Someone will benefit from it. I didn't see anything related to that in the issues.
Anonymous 01/15/25(Wed)04:36:09 No.103901681
>>103901562
>>103901481
>>103901486
Are we even talking about the same thing? I'm saying a hypothetical "if valid(internet_license_key) then reenable_gpu()" check introduced in the future to control AI hardware is much easier (but not trivial) to disable than adding new hardware.
Multiple gayming consoles are evidence of this.
Anonymous 01/15/25(Wed)04:36:44 No.103901686
>>103901616
Comfy supports it out of the box, I think. There is an example workflow in the docs. Start the thing, drag the workflow image onto it, put your disgusting fetishes in the text box and press the big button, that's it.
Anonymous 01/15/25(Wed)04:37:01 No.103901687
>>103901681
You're retarded if you think it would be a simple conditional at the driver level.
Anonymous 01/15/25(Wed)04:37:58 No.103901696
>>103901576
That would be really nice, and you don't even need that much processing power as long as you implement attention properly in hardware
Anonymous 01/15/25(Wed)04:40:11 No.103901707
>>103901197
Digging deeper
Here is the original press release with statements from Intel and Nvidia:
https://www.businesswire.com/news/home/20241218897420/en/
Notably:
>Verifiable Compute introduces a patent-pending hardware-based cryptographic AI notary and certificate system to isolate sensitive AI operations and notarize them with a tamperproof record of every data object and code computed in AI training and inference.
This would not be at the driver level, but directly in the hardware.
They offer a white paper on how their tech works, but it's behind a sign up form:
https://www.eqtylab.io/white-papers/verifiable-compute-white-paper
>The Verifiable Compute framework and notary system unlocks a powerful new capability in Trusted Execution Environments (TEEs) available on the 5th Gen Intel® Xeon® Processors with Intel® Trust Domain Extensions (Intel® TDX), extending the trust zone through confidential VMs to the NVIDIA H100/H200 GPUs and NVIDIA’s forthcoming Blackwell GPU architecture.
>Verifiable Compute addresses the unique and escalating risks to AI supply chains, from AI poisoning and information extraction to privacy backdoors and denial-of-service attacks.
>Verifiable Compute also allows for provable records of conformity with regulatory frameworks that can preserve AI artifacts years after a model has delivered results. If mandatory controls are not satisfied, a verifiable governance gate halts an AI system and can notify or integrate into an enterprise’s remediation tooling
>If the system is compliant, it can issue an AI audit and lineage certificate that is verified instantly in a browser or can be independently audited at any point in the future.
This doesn't sound like something they're planning to add to consumer GPUs to prevent you from running local models. It's something firms that train and serve API models can opt into and configure to make sure they comply with regulation and have an easy way to provide legally compliant audit data.
Anonymous 01/15/25(Wed)04:40:58 No.103901712
>>103901576
>>103901586
If you somehow reverse engineered CUDA and wrote drivers and support for the entire ML backend stack, it would work. It just turns out the software is 90% of the hurdle.
Anonymous 01/15/25(Wed)04:41:18 No.103901716
>>103900181
>good computer
Anonymous 01/15/25(Wed)04:41:47 No.103901718
>>103901197
So that means no offline gpu usage, or what? And what's stopping the chinks from making gpus? (That spy on you as well if you go online)
I don't really do anything that would piss off xi-sama, now that I think about it. If I gotta pick my poison it would be chinese.
Anonymous 01/15/25(Wed)04:42:00 No.103901720
>>103901707
>and NVIDIA’s forthcoming Blackwell GPU architecture
Zettai owari da.
Anonymous 01/15/25(Wed)04:43:12 No.103901726
>>103901687
Nothing uses just a conditional and literally nothing else, I am aware of that. Still, the Switch, the PS3, the Wii and many other consoles with esoteric ISAs have been jailbroken; their hardware has 0 documentation online and their only utility is enabling redditsoys.
You people are clueless if you don't think the entirety of China is gonna dedicate its manpower to hacking a killswitch like that for profit in a lucrative meme field like AI.
Anonymous 01/15/25(Wed)04:43:32 No.103901729
>>103901720
>It's for firms that are training and serving API
Won't they just do the meta thing and block access if you don't have a burger vpn, if you are from europe?
Why would you upload all that shit into a blockchain? That sounds crazy.
Anonymous 01/15/25(Wed)04:45:10 No.103901740
>>103901681
If i had to do it, i'd have silicon expecting a key retrieved by a challenge to a server. There is no function that can be changed. It's not just adding a jmp to enable_gpu(). It's figuring out the private key + algo.
Either way, it's not a problem until there's more than "someone said".
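To make that concrete, here's a toy sketch of the difference between a flippable conditional and a challenge-response gate. Everything here is hypothetical (no one has shown a real GPU works like this); a real design would also use asymmetric signatures so the chip never holds the signing key, HMAC just keeps the sketch short:

```python
# Hypothetical sketch: the server answers a random challenge using a
# secret key; without the key, patching the check out of software gets
# you nothing, because the comparison lives in silicon.
import hashlib
import hmac
import os

SERVER_SECRET = os.urandom(32)  # lives only on the vendor's server

def server_answer(challenge: bytes) -> bytes:
    """What the vendor's license server would return for a challenge."""
    return hmac.new(SERVER_SECRET, challenge, hashlib.sha256).digest()

def silicon_check(challenge: bytes, answer: bytes) -> bool:
    """In real hardware this verification would be baked into the die."""
    expected = hmac.new(SERVER_SECRET, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, answer)

nonce = os.urandom(16)
print(silicon_check(nonce, server_answer(nonce)))  # True: valid key unlocks
print(silicon_check(nonce, b"\x00" * 32))          # False: forging fails
```

The point of the sketch: defeating it means recovering the key or the algorithm, not flipping a jmp.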
Anonymous 01/15/25(Wed)04:45:12 No.103901741
>>103901729
it's utterly insane and elon will probably just start fabbing his own chips and tell the safetycucks to eat shit
Anonymous 01/15/25(Wed)04:47:08 No.103901748
>>103901726
Jailbreaks are for running unsigned code, breaking out of a limited code execution sandbox.
Jailbreaks often open up new avenues for data storage and communication, for example SD2Vita or FreeMCBoot loading games off a HDD, sure.
In theory? No, in actuality: you can, today, use alternative backing storage for a GPU job, as in regular RAM.
The difference between these scenarios is that in all cases, GPU memory is way, way faster than any alternative, so it doesn't make sense to pursue that avenue. Jailbreaking doesn't add new functionality, and if any locks were put in place (hint: there are locks in place on consumer GPUs already, as found by hardware researchers/hackers), they're again at the silicon level, like nanometer-scale switches. You can't undo that without kamikaze'ing your chip. People struggle to hit features on the order of mm; don't even try with nm. Last time that worked was with the 360 (arguably the Switch has a similar quirk, but with a copper contact point, not silicon).
Anonymous 01/15/25(Wed)04:48:09 No.103901752
>>103901741
>muh elon
he won't do shit
Anonymous 01/15/25(Wed)04:56:20 No.103901793
Looking at this hedera coin on coinmarketcap.
Wtf is wrong with those comments. They all talk about spacex and trump while hyping up the coin.
I'm ready for the internet to die.
Anonymous 01/15/25(Wed)04:59:22 No.103901809
So what, I must buy a 4090/5090 to future-proof myself? AMD isn't on par but they'll get there, so perhaps count on AMD not joining in?
Anonymous 01/15/25(Wed)05:00:09 No.103901815
Anonymous 01/15/25(Wed)05:01:56 No.103901822
>>103901809
no you need 3-4 of them
Anonymous 01/15/25(Wed)05:02:25 No.103901824
>>103901815
it's this coin: >>103901197
shitcoiners and /lmg/ having a crossover.
i doubt this is a real thing though.
Anonymous 01/15/25(Wed)05:03:55 No.103901830
>>103901809
>future proof
cpumaxxing is the only thing that will definitely let you run any model, at least slowly. For general use, a {40|50}90 will always be useful and will always help with prompt processing. But then again, if it's just for prompt processing, you can do with much less than that.
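Rough numbers on why cpumaxxing scales: weights need roughly params x bits-per-weight / 8 bytes, and only CPU boards get you hundreds of GB cheaply. A toy estimate (ignores KV cache and runtime overhead, so treat it as a lower bound):

```python
# Approximate weight footprint of a quantized model.
# params_b: parameter count in billions; bits_per_weight: e.g. ~4 for Q4.
def weights_gb(params_b: float, bits_per_weight: float) -> float:
    return params_b * bits_per_weight / 8  # billions of params -> GB

print(weights_gb(70, 4))   # 35.0 GB: a 70B at ~4bpw, borderline for GPUs
print(weights_gb(405, 4))  # 202.5 GB: realistically CPU-RAM territory
```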
Anonymous 01/15/25(Wed)05:04:11 No.103901833
>>103901822
Ah shiii, need to get a job then.
Hopefully I get one soon because I really wanna try the Chinese modded cards
Anonymous 01/15/25(Wed)05:06:02 No.103901841
>>103900844
It is because you send prompts with System. If you remove it, it works.
Anonymous 01/15/25(Wed)05:38:44 No.103902059
mig
Anonymous 01/15/25(Wed)05:42:14 No.103902083
Why do people still create non-reasoning LLMs?
Anonymous 01/15/25(Wed)05:43:14 No.103902088
>>103901686
I'll try, thank you friend.
Anonymous 01/15/25(Wed)05:44:26 No.103902100
>>103902059
Why is she so smug?
Anonymous 01/15/25(Wed)05:47:36 No.103902129
>>103902100
taking a shit
Anonymous 01/15/25(Wed)05:50:03 No.103902154
>>103902083
>non-reasoning
Ask that again when we have true reasoning.
Anonymous 01/15/25(Wed)05:54:39 No.103902182
>>103902154
We have fake reasoning which improves things at least a little bit so every model going forward should use it until we have something better.
Anonymous 01/15/25(Wed)05:56:04 No.103902191
>>103901332
https://xcancel.com/meekaale/status/1744807035454079079
>WTF! Why is my LG Washing Machine using 3.6GB of data/day?
Anonymous 01/15/25(Wed)05:56:35 No.103902195
>>103902083
I think we should explore more of the non-reasoning part.
How does sonnet "know" the answer to pic related? Can't be in the training data.
Doesn't always work, and putting a 0 in front improves it, so there's no background tool.

That llms can just autocomplete the answer is like a miracle black box to me.
Gets even funnier when the response finishes with an "I put it in the calculator to confirm it". lol
Anonymous 01/15/25(Wed)05:58:39 No.103902203
Anonymous 01/15/25(Wed)05:59:00 No.103902207
>>103902195
did you give her the dollarinos?
Anonymous 01/15/25(Wed)06:02:26 No.103902226
>>103902203
she has no idea what i am about to do to her
Anonymous 01/15/25(Wed)06:04:11 No.103902238
https://huggingface.co/sophosympatheia/Nova-Tempus-v0.1
EVA / Anubis / Cirrus merge. Pity it uses EVA 0.1, but if you liked the component models, it might be worth seeing if this is any good.
Anonymous 01/15/25(Wed)06:10:20 No.103902281
>>103902059
omg it migu
Anonymous 01/15/25(Wed)06:15:48 No.103902316
>>103902207
no, claude is very good at creating psychological profiles.
will probably recognize me in the future from the writing style and mess with me until I pay up.
Anonymous 01/15/25(Wed)06:20:27 No.103902347
And on that note, here's a merge from Steelskull, too: https://huggingface.co/Steelskull/L3.3-MS-Nevoria-70b
Never was a fan of his merges, so let me know if this isn't trash.
Anonymous 01/15/25(Wed)06:28:50 No.103902404
>my merge is better than your merge
Merging is just a marketing tactic to siphon the popularity of a model from the original account to another.
Anonymous 01/15/25(Wed)06:43:11 No.103902512
>>103902404
Eh, some merges do improve on their components. Though I'm skeptical of mega-merges like these, but then, Mirai is one and that turned out pretty good, so I don't know.
Anonymous 01/15/25(Wed)06:48:23 No.103902551
>>103896971
Can you give me the prompt you use for this highlight post?
Also what model size are you using?
Anonymous 01/15/25(Wed)06:49:07 No.103902560
is it even possible to get minimax to give a refusal?

it's willing to do basically everything i've thrown at it in my limited testing. and like... it's not blowing my mind, but it's also not terrible?
Anonymous 01/15/25(Wed)06:49:55 No.103902563
>>103902404
At a fraction of the effort as well. Put yourself in the shoes of people investing time into creating a hopefully useful and innovative dataset for a model, only to have retards repackage it as a merge with a cool StableDiffusion image and a Ko-Fi link in the card.
Anonymous 01/15/25(Wed)06:52:39 No.103902580
>>103902563
Nothing stops the tuners from making merges as well, you know. As an end-user, I care about results. If a merge performs better than its components, I'll use the merge.
Anonymous 01/15/25(Wed)06:55:37 No.103902606
>>103902580
Which is why Sao is best, he dared to make merges and now is above the rest.
Anonymous 01/15/25(Wed)06:57:37 No.103902616
>>103902560
>but it's also not terrible?
Not terrible for the size or just in general?
As in, are there smaller models you'd say are equally "not terrible"?
Anonymous 01/15/25(Wed)07:02:33 No.103902649
>>103902580
As an end-user, I know merging is just a scam. I actively ignore anyone shilling one. Especially when people try to convince you that the account that made the merge matters.
Anonymous 01/15/25(Wed)07:07:19 No.103902683
>>103902616
i should have been more prepared for this question.
i don't have a ton of experience with similar sized MoEs to compare honestly. only deepseek.
deepseek performs better if i consider just a single response, but over time its repetition issues nosedive the perceived quality, which is not an issue with minimax to nearly the same degree.
compared to 123B-and-under models it is smarter but thus far drier, though that could possibly be fixed with prompting.

> are there smaller models you'd say are equally "not terrible"?
no. i would not. it outperforms every 123B and under (that i've tried, obviously) at least in terms of its ability to follow instructions and portray characters in a way that i like.

big corpos like Claude, GPT, and Gemini clear it no questions asked. but at this moment in time if you asked me to go and use a non-corpo model to do rp/erp i would probably use Minimax and just figure out how to prompt it better. that seems like a more likely way to have an enjoyable experience than trying to force smaller models into being smart enough to not irritate me.
Anonymous 01/15/25(Wed)07:08:49 No.103902689
>>103902580
Right, but then do not be surprised when anons here show disdain toward "RP finetuners", call those promoting them shills, and finally find that basically nobody from the "community" is releasing anything worth using or meaningfully contributing to the space anyway.
Anonymous 01/15/25(Wed)07:09:04 No.103902690
>>103902683
>no. i would not. it outperforms every 123B and under (that i've tried, obviously) at least in terms of its ability to follow instructions and portray characters in a way that i like.
Perfect, it's yet another step forward then.
Very good.
Thank you anon.
Anonymous 01/15/25(Wed)07:10:00 No.103902698
>>103902649
And I actively ignore everyone who dismisses things as a scam, grift or meme without justification. I swear hating things is a hobby for you cretins.
Also, while the account doesn't matter, the method does, and some mergers make better decisions in that regard, so in the end, who makes it does matter. Hell, as it was recently proven here, the same applies even to quants (PSA: if you're using mradermacher quants, switch to bartowski).
Anonymous 01/15/25(Wed)07:12:53 No.103902720
>>103902689
>if you make result-based choices, don't be surprised if people hate it
LOL what. That's a complete non-sequitur.
Anonymous 01/15/25(Wed)07:13:47 No.103902729
>>103902698
No, I will use mradermacher or make my own. What are you going to do about it, shill? Go to Reddit if you want to farm attention for your account.
Anonymous 01/15/25(Wed)07:16:33 No.103902748
>>103902690
>yet another step forward then
yeah, absolutely. sorry if it wasn't obvious but i am quite impressed with it while being realistic that it's not exactly "there" yet. it represents a very large step in the "local" space compared to where we were just two weeks ago, imo.

and they aren't bullshitting about the context either, actually. (ok i'm sure there's a little self-promoting bullshit going on but still) it can handle way, way, way longer input strings than anything else. i'm talking throwing multiple book series into it and asking it questions. that's something i have never received satisfying results with before.
Anonymous 01/15/25(Wed)07:18:43 No.103902764
>>103902729
Holy shit you're an actual retard, acting like you're doing me a favor by listening to advice. You can find the comparisons a few threads back, mradermacher's imatrix calibration slops things up. If you love slop, by all means, keep slurping it down, no skin off my back.
Anonymous 01/15/25(Wed)07:21:28 No.103902783
>>103902729
Oppositional Defiant Disorder study case right here
Anonymous 01/15/25(Wed)07:21:42 No.103902785
>>103902764
Why would anyone listen to the advice of someone that's begging for money on 4chan?
Anonymous 01/15/25(Wed)07:23:17 No.103902793
>>103901197
Nice, I'm already 500% up on my investment in Hedera in November.
Anonymous 01/15/25(Wed)07:24:40 No.103902800
>>103902783
N-no... it's not!
Anonymous 01/15/25(Wed)07:25:23 No.103902807
>>103902785
Okay, so who am I in your schizo delusion right now? Sao? Steelskull? Bartowski? This paranoid mindset that anyone who dares to say anything good about something must be promoting themselves is a cancer on these threads. Are you literally only here to hate and rage?
Anonymous 01/15/25(Wed)07:27:07 No.103902815
>>103902807
u're SB6Q3O4XU7f
Anonymous 01/15/25(Wed)07:31:07 No.103902845
>>103902807
what if they're 1 person
Anonymous 01/15/25(Wed)07:32:19 No.103902849
>>103902807
Well, since you're so personally affected when people talk against shilling in the thread, and since the subject is the only thing that seems to occupy your mind, it must be something that benefits you personally, and probably the only source of revenue in your life, because you're fighting for it as if your life depended on it.
Try getting a real job or go to Reddit, like I said, which is more friendly to your kind.
Anonymous 01/15/25(Wed)07:34:01 No.103902857
>>103902815
LOL, had to check, but yeah, sure am. Now remind me when I supposedly begged for money?
Anonymous 01/15/25(Wed)07:37:03 No.103902874
>>103902849
That's a lot of armchair psychology and projection in one post. I'm not bothered by people talking against shilling, I'm annoyed by how anything that isn't dismissal or hate immediately gets framed as shilling.
Anonymous 01/15/25(Wed)07:39:36 No.103902888
>>103902845
What if we're all one anon arguing with himself? I mean, if we're going paranoid schizo, let's go all the way.
Anonymous 01/15/25(Wed)07:40:40 No.103902899
>>103897074
Do investors have any sentience? How do they manage to get themselves taken for a ride again and again by unscrupulous vaporware salesmen?
Are you telling me millionaire and billionaire investors are as smart as the average suburban middle aged housewife who keeps falling for MLM schemes?
Anonymous 01/15/25(Wed)07:43:09 No.103902926
>>103902874
That's a nice strawman. You're indeed bothered by people talking against shilling.
Anonymous 01/15/25(Wed)07:45:59 No.103902963
>>103902888
No-one's gonna larp as 6+ people all by himself. This entire thread is AI-generated, including this message and yours.
Anonymous 01/15/25(Wed)07:48:19 No.103902987
>>103896969
>MiniMax-Text-01
GGUF when? Will it ever be supported?
Anonymous 01/15/25(Wed)07:49:27 No.103903000
>>103902720
It seems clear that if those actually putting the man-hours and $$$ into creating the data and training the models are outcompeted by those running a script to make a merge in 5 minutes and siphoning all the attention, eventually the former will decide none of it is worth the effort, and the latter will be left with nothing useful to merge.
Anonymous 01/15/25(Wed)07:52:21 No.103903023
>>103902987
It's a grabbag of exotic architectures. Support never.
Anonymous 01/15/25(Wed)07:53:24 No.103903033
>>103901099
Zoomers are braindead consumers, don't expect them to do anything worthwhile with their time.
Anonymous 01/15/25(Wed)07:53:33 No.103903034
>>103902963
Fair point. It seems we're approaching the context limit and beginning to devolve into nonsense though.
Anonymous 01/15/25(Wed)07:54:02 No.103903041
>>103902987
Right after Deepseek is fully implemented
Anonymous 01/15/25(Wed)07:55:36 No.103903052
>>103899028
>There is no reason not to use it.
I found a reason a while ago: piece of shit will cut your text and only print part of it. I'm not sure if this is a config issue, a connection issue or what, but I'll avoid using it from now on unless I can find the root of the issue
I wonder if the creators used LLMs to make it. Parsing strings and getting data from remote processes is basic shit lmao
Anonymous 01/15/25(Wed)07:56:19 No.103903060
>>103903000
Sure, except the "siphoning all the attention" part simply isn't happening. How many EVA merges are there at this point? And 0.0 still gets recommended more often than any of them. If anything, merges are judged by which tunes they incorporate, and funnel attention _from_ the merge _to_ the component models.
Anonymous 01/15/25(Wed)07:57:11 No.103903066
what is like the best small model? something you can run on your phone small, like 3b.
Anonymous 01/15/25(Wed)07:58:41 No.103903077
>>103903041
Isn't it already? https://github.com/ggerganov/llama.cpp/pull/11049
Anonymous 01/15/25(Wed)07:59:47 No.103903086
>>103902807
>Steelskull
I think the fact that you mention this name makes it obvious that you're "sophosympatheia". The only one that was conveniently left out. It makes no sense to mention Steelskull when he was thrown under the bus.
You came here to shill your merge, and preemptively shared your competitor's merge and dismissed it as "trash" before anyone else would mention it organically. Very smart. And then later you subtly associate him with shilling by mentioning him above.
You're a master manipulator.
Anonymous 01/15/25(Wed)08:02:09 No.103903101
>>103903077
>Note: DeepSeek V3 also introduced multi-token prediction (MTP), but I decided to skip this feature for now. MTP layer is ignored during model conversion and is not present in resulting GGUF file.
Anonymous 01/15/25(Wed)08:04:43 No.103903116
>>103903086
Oh no, you figured my dastardly plan out! Now how will I ever retire to the Bahamas off of shilling to the maybe 20 people who actually frequent this thread?
Anonymous 01/15/25(Wed)08:04:55 No.103903118
>>103903066
Not as small as a 3b, but you could try olmoe. It's a ~1b active, 8b total. Works fine in llama.cpp, if that's what you're using. Doubt it's the best. meta released a 1b and 3b as well in the 3.2 series if you're looking for dense models. I'm sure there are finetunes out there.
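For the curious, the reason a MoE like olmoe runs fast on weak hardware: per-token compute scales with the active parameters, while memory still scales with the total parameters. A toy estimate using the common ~2 FLOPs-per-param-per-token rule of thumb (a rough heuristic, not an exact figure):

```python
# MoE routing means only the selected experts run per token, so compute
# follows *active* params; all experts must still sit in memory though.
def per_token_gflops(active_params_b: float) -> float:
    # ~2 FLOPs per parameter per token for a forward pass (rule of thumb)
    return 2 * active_params_b

print(per_token_gflops(1.0))  # 2.0 GFLOPs/token: olmoe-like (~1b active)
print(per_token_gflops(3.0))  # 6.0 GFLOPs/token: a dense 3b
```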
Anonymous 01/15/25(Wed)08:06:29 No.103903128
Anonymous 01/15/25(Wed)08:06:30 No.103903129
>>103903101
Is that why providers other than DeepSeek themselves output garbage?
Anonymous 01/15/25(Wed)08:09:27 No.103903142
>>103903101
>for now
It's over
Anonymous 01/15/25(Wed)08:12:00 No.103903159
>>103902551
>Can you give me the prompt you use for this highlight post?
It's not a single prompt, but 3 used within a program. Here they are:
https://raw.githubusercontent.com/RecapAnon/LmgRecap/refs/heads/master/LmgRecap/plugins/RecapPlugin/Describe/skprompt.txt
https://raw.githubusercontent.com/RecapAnon/LmgRecap/refs/heads/master/LmgRecap/plugins/RecapPlugin/RateChain/skprompt.txt
https://raw.githubusercontent.com/RecapAnon/LmgRecap/refs/heads/master/LmgRecap/plugins/RecapPlugin/RateMultiple/skprompt.txt
>Also what model size are you using?
Meta-Llama-3.1-70B-Instruct-Q8_0
Anonymous 01/15/25(Wed)08:21:41 No.103903225
>>103899830
>Have you ever interacted with a real woman?
no
Anonymous 01/15/25(Wed)09:33:02 No.103903803
>>103899416
>70b is stagnating
It's weird that it is, feels like it's entirely due to neglect from the creators. Last I remember, it was the case that 70bs mostly suffered from undertraining/lack of training data, right? They could have been trained way more.
Anonymous 01/15/25(Wed)09:44:47 No.103903897
>>103903803
>Last I remember, it was the case that 70bs mostly suffered from undertraining/lack of training data, right? They could have been trained way more.
70B (Llama 3) is kind of a sweet spot for my specs (Q6 so the quality is high albeit slow) and I struggle to find a tune that passes my cultural knowledge tests. I know it has been trained on the information I'm asking about because if I basically hand it the answer it'll suddenly correctly elaborate about it, but it can't get there from a basic "what is" or "tell me about" or "list all of the X that you know about that exist between Y and Z" kind of question prompt.

I'm checking L3.3 spins right now to see if I can find a better general knowledge model. No luck so far, though one did get closer to right on one that has been stumping every model I've tried.