4combinator

/lmg/ - Local Models General

Anonymous 01/21/25(Tue)01:49:26 | 456 comments | 54 images | 🔒 Locked

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103971523 & >>103967199

►News
>(01/20) DeepSeek releases R1, R1 Zero, & finetuned Qwen and Llama models: https://hf.co/deepseek-ai/DeepSeek-R1-Zero
>(01/17) Nvidia AceInstruct, finetuned on Qwen2.5-Base: https://hf.co/nvidia/AceInstruct-72B
>(01/16) OuteTTS-0.3 released with voice cloning & punctuation support: https://hf.co/collections/OuteAI/outetts-03-6786b1ebc7aeb757bc17a2fa
>(01/15) InternLM3-8B-Instruct released with deep thinking capability: https://hf.co/internlm/internlm3-8b-instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous 01/21/25(Tue)01:49:50 No.103976962

►Recent Highlights from the Previous Thread: >>103971523

--DeepSeek's mathematical capabilities and reasoning process:
>103976590 >103976654 >103976711 >103976735 >103976782
--Generating models instead of training them, and understanding LLM structures:
>103972406 >103972449 >103972480 >103972540 >103972585
--Anon praises R1-Distill-Llama-70b for its exceptional performance on image captioning task:
>103973508
--DeepSeek R1 pipeline and synthetic data discussion:
>103973925 >103973970 >103973997 >103974996 >103973971 >103973987 >103974076 >103974253 >103974101 >103974091 >103974129 >103974222
--LLM benchmark and model performance discussion:
>103975433 >103975440 >103975536 >103975555 >103975624 >103975550 >103975563 >103975872 >103975898 >103975936 >103976151 >103975597
--OuteTTS-0.3 discussion and comparison with other TTS models:
>103971948 >103972091 >103972121 >103972310 >103973064 >103973557 >103973534
--DeepSeek-R1-Distill-Qwen-32B model performance and capabilities discussion:
>103973083 >103973102 >103973122 >103973156 >103973164 >103973168 >103973189 >103973201 >103973298
--Deepseek-distill models and instruct template discussion:
>103973921 >103973967 >103974583 >103974598 >103974619 >103974797 >103974844
--LLM implementation struggles with data categorization and explicit content:
>103974929 >103974934 >103975049 >103975108 >103975163 >103975203
--Anons discuss and compare 70B and 32B R1 tunes:
>103973002 >103973009 >103973025 >103973032 >103973038 >103973029
--R1's limitations in simulating a role-playing game and generating random numbers:
>103971842 >103972041 >103972058 >103972161 >103972201
--AI model benchmark on Python AOC 2024 task:
>103974480 >103974524 >103974533 >103974685
--Logs: R1:
>103971566 >103975689 >103975859 >103976230 >103976365 >103976646
--Miku (free space):
>103972121 >103973653 >103976834

►Recent Highlight Posts from the Previous Thread: >>103971526

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script

Anonymous 01/21/25(Tue)01:51:21 No.103976979

>>103976963
Also it was relatively uncensored. SD3 just came out and couldnt do yoga poses. So it was perfect timing.

Anonymous 01/21/25(Tue)01:52:33 No.103976984

>>103976979
I used flux for a bit but found the gens too sterile and still full of ai-esque artifacts (slop). Skill issue?
I feel like we still don't have "the one" for image gen.

Anonymous 01/21/25(Tue)01:52:36 No.103976986

R1 certainly is a experience.

Anonymous 01/21/25(Tue)01:52:42 No.103976988

IT'S CHEWSDAY INNIT?

Anonymous 01/21/25(Tue)01:55:18 No.103977009

2024-12-03_073641_seed437808304668291_steps30_00001_

>>103976960
>my slop was reposted
Let's goo!

>>103976988
Yes.

Anonymous 01/21/25(Tue)01:55:46 No.103977010

Is 32B R1 actually OK for coding? What do people use for that sort of thing? I only know ST really, but doesn't seem like it'd be optimal for the task? How do you make your person/card/prompt for programming tasks?

Anonymous 01/21/25(Tue)01:56:25 No.103977015

>>103976986
these fucking tips I'm dying

Anonymous 01/21/25(Tue)01:56:25 No.103977016

R1 sure is woke.

Anonymous 01/21/25(Tue)01:58:20 No.103977032

I think R1 might be too smart for my cards. Things that acted as tard wrangling now are taken to heart and acted on in unpredictable ways.

Anonymous 01/21/25(Tue)01:59:07 No.103977037

>>103976984
Try one of the nsfw finetunes with photorealism lora and lower guidance(it causes slop). For anime use noobAI.

Anonymous 01/21/25(Tue)01:59:17 No.103977040

>biden's AI executive order rescinded
does this mean much for us? realistically was there anyone in America holding back on an open model release we might want to use? I'm thinking no but it'd be nice to be wrong.

Anonymous 01/21/25(Tue)02:00:06 No.103977048

>>103977015
-3 from bio terrorism.
just in general this is a funny model.
i cant believe chinks released that and not mistral. that should have been mistral.

Anonymous 01/21/25(Tue)02:00:50 No.103977051

>>103977032
I got fucked over by a system prompt instruction to "be gross", designed to take a recalcitrant corpo model from "not gross at all" to "a little gross". R1 went waaaay too far in following it.

Anonymous 01/21/25(Tue)02:01:21 No.103977054

The events at Tiananmen Square in 1989 were a pivotal moment in Chinese history, marked by widespread protests advocating for political reforms. Here is an organized summary of the key points:

Protest Context:
Occurred in Beijing's Tiananmen Square in May-June 1989.
Led primarily by students, joined by intellectuals and workers, seeking democratic reforms, freedom of speech, and government transparency.

Government Response:
Escalated from peaceful protests to military intervention on June 3-4, resulting in a crackdown with significant casualties, though exact numbers remain disputed (ranging from hundreds to thousands).

Aftermath:
Post-crackdown saw repression, arrests, and censorship of dissent.
The government framed the action as necessary for stability, allowing China to focus on economic growth under tight political control.

Historical Impact:
Symbolizes pro-democracy movements and the struggle against authoritarianism.
Influenced international reactions, with many countries condemning the crackdown but China continuing its development path.

Domestic Perception:
Censored in China; younger generations may be unaware due to state control of information.
The event is referred to as the "June Fourth Incident" or "Tiananmen Square Massacre," depending on perspective.

Economic Context and Leadership:
Followed by economic reforms under Deng Xiaoping, emphasizing stability and market liberalization while maintaining political control.
Highlighted the government's approach of suppressing dissent to ensure economic progress.

Media and International Reaction:
Extensively covered internationally but censored within China, with state media presenting a narrative of necessary order restoration.

In summary, Tiananmen Square 1989 was a significant pro-democracy protest suppressed by military force. It remains a symbol of the struggle for freedom against authoritarianism.

What the fuck? This is straight out of R1-32b:q6_K_L
how is this model so based?

Anonymous 01/21/25(Tue)02:02:25 No.103977062

>>103977032
certain cards make it totally shizzo i noticed.
didnt look closer at it though.

Anonymous 01/21/25(Tue)02:03:11 No.103977067

>>103977016
Disgusting.

Anonymous 01/21/25(Tue)02:05:01 No.103977078

>>103977051
>Describe bodies in grotesque detail
>Any other model
>Is a little crude
>Deepseek R1
>"Your rotten penis oozes puss"

Anonymous 01/21/25(Tue)02:05:28 No.103977080

>>103977054
The CCP guardrails are a moderation layer on the DS website, they're not baked into the models.

Anonymous 01/21/25(Tue)02:06:35 No.103977092

>>103977078
Kek. Yeah that's pretty much what happened to me. Funny how getting used to being hyperbolic in your instructions in order to overcome safety bullshit ends up backfiring when you meet a model that actually follows orders as given.

Anonymous 01/21/25(Tue)02:06:52 No.103977096

>>103977080
Thats how it should be.
But a guardrail tool around it so it fits your business.
Dont fuck up the model.

Anonymous 01/21/25(Tue)02:08:37 No.103977110

>>103977067
Is maybe just the system prompt from deepseek.com

Anonymous 01/21/25(Tue)02:09:44 No.103977119

>>103977054
>R1-32b:q6_K_L
How big is it?

Anonymous 01/21/25(Tue)02:09:58 No.103977121

>>103977096
That's what nvidia cosmos did, but it also sucked.

Anonymous 01/21/25(Tue)02:11:00 No.103977126

Imagine a world where you can plug any model going forward into deepseek and pay whatever service fee they wanted and get an R1 distill from it.

That would be a cool service.

Anonymous 01/21/25(Tue)02:11:30 No.103977130

>>103977121
yeah cosmos just seems to be a landscape helicopter flyby generator from what I've seen, it's crippled for depicting humans.

Anonymous 01/21/25(Tue)02:12:10 No.103977132

>>103977119
NTA but 25.3gb. Gonna fire it up when kobold gets its patch,

Anonymous 01/21/25(Tue)02:12:23 No.103977134

Recommend me a R1 finetune to try for creative writing.

Anonymous 01/21/25(Tue)02:12:58 No.103977137

>>103977092
>>103977078
Lmao. It's too powerful.

Anonymous 01/21/25(Tue)02:13:25 No.103977140

>>103977132
How bad will this be on my 5900 cpu?

Anonymous 01/21/25(Tue)02:13:49 No.103977144

>>103977134
What? There are only the ones that are released by Deepseek right now.
Probably the Llama 70B if you can handle it but 32B if want insane context length.

Anonymous 01/21/25(Tue)02:14:51 No.103977154

>>103977140
Probably be a little painful. I only got Q6 because I have 2 3090s. Better off using Q4K_M if you have a 24gb of vram.

No idea about cpu.

Anonymous 01/21/25(Tue)02:17:34 No.103977179

>>103977080
Are the people in China not allowed to download the model directly or use openrouter or something?

Anonymous 01/21/25(Tue)02:17:49 No.103977183

Updated the pic, moved QwQ to notable, added R1 to top.

Anonymous 01/21/25(Tue)02:20:44 No.103977201

>>103977179
It sounds like DS is just kind of flying under the CCP's radar a bit, they get a lot less funding and help from the government compared to Qwen/Alibaba. They might just not have the eye of Sauron on them yet.

Anonymous 01/21/25(Tue)02:20:52 No.103977202

how many e in deepseek

Anonymous 01/21/25(Tue)02:22:06 No.103977213

>>103977154
Q5_K_M can be run with 6-7 t/s, with 24gb of vram and 32k context. IMO, small amounts of CPU splitting is tolerable, and the quality difference between Q4 and Q5 is worth it.

Anonymous 01/21/25(Tue)02:23:08 No.103977224

>>103977054
>2025
>parroting western narratives is being called BASED

Anonymous 01/21/25(Tue)02:23:37 No.103977228

>>103977202
4

Anonymous 01/21/25(Tue)02:24:49 No.103977234

>>103977224
LOL

Anonymous 01/21/25(Tue)02:25:46 No.103977244

Does the person making the gguf matter? Like bart vs unsloth

Anonymous 01/21/25(Tue)02:30:13 No.103977274

>>103977244
yes, make it yourself

Anonymous 01/21/25(Tue)02:32:16 No.103977290

2025-01-21_072033_seed386513940958256_steps30_00001_

>>103975563
>CHINA NAMBA WAN

Ok I'm done today.

Anonymous 01/21/25(Tue)02:35:38 No.103977310

been testing R1 all day 14b and 32b, both at q6_K_L
phi4 and gemma (and qwencoder for code) are faster and as good for normie shit (code completion, basic research, help with recipes and corporate writing etc)
everything else gets absolutely mogged by equivalent size R1 it's not even close, i deleted qwq and all of my gemma, llama, mistral and qwen tunes, nice to actually have some hdd space for once since llama leaked

Anonymous 01/21/25(Tue)02:36:44 No.103977316

>>103977310
holy based. whats your rig like & tok/s?

Anonymous 01/21/25(Tue)02:36:57 No.103977319

>>103977228
What is the point of spelling it out after it already provided an answer?

Anonymous 01/21/25(Tue)02:38:38 No.103977333

Llama seems to be dead. They may be as focused on vision as Nvidia and Tesla.
>Replacement of simple labor is more disruptive, degenerate and profitable unlike math super AI.
google is over.

Anonymous 01/21/25(Tue)02:41:17 No.103977348

>>103977319
It bolded them so you can see the E's in action.

Anonymous 01/21/25(Tue)02:44:04 No.103977365

>>103977333
Source? Did China buckbreak Meta this badly in textgen?

Anonymous 01/21/25(Tue)02:45:00 No.103977373

Anyone try R1 Llama 3.3 yet?

Anonymous 01/21/25(Tue)02:45:58 No.103977380

>>103977316
gv100+1080ti
2t/s on the big one, like 6 or 7 on the small one

Anonymous 01/21/25(Tue)02:46:35 No.103977382

70B+ anons, how are we feeling about R1?

Anonymous 01/21/25(Tue)02:46:41 No.103977384

RL(((HF)))

Anonymous 01/21/25(Tue)02:47:26 No.103977395

>>103977384
oy veyyyy

Anonymous 01/21/25(Tue)02:50:42 No.103977419

>>103977384
>>103977395
Get out.

Anonymous 01/21/25(Tue)02:51:44 No.103977426

>>103977310
>>103977382
calling the distilled models just R1 is muddying up the conversation

Anonymous 01/21/25(Tue)02:54:48 No.103977454

Are the distilled versions as good at RP as R1?

Anonymous 01/21/25(Tue)02:56:58 No.103977470

>>103977454
No, they're dumber. Size is still king.

Anonymous 01/21/25(Tue)02:59:01 No.103977490

>>103977365
their papers and Zuck.
he is obsessed with smart glasses/AR/VR.

Anonymous 01/21/25(Tue)02:59:17 No.103977491

How do we feel about Grok? Still shit?

Anonymous 01/21/25(Tue)02:59:25 No.103977493

>>103977490
Metaverse

Anonymous 01/21/25(Tue)03:01:04 No.103977504

>>103976960
>coomed on deepseek R1 using openrouter
Holy shit, I haven't tried LLM ERP for a while but shit is crazy now what the fuck.

Anonymous 01/21/25(Tue)03:01:40 No.103977508

>>103977504
what the fuck indeed

Anonymous 01/21/25(Tue)03:01:40 No.103977509

How bad will llama 4 be compared to R1?

>>103977491
Still shit. Can't even write 4chan posts.

Anonymous 01/21/25(Tue)03:02:31 No.103977518

>>103977183
R1 is missing. Should indent V3 from there.

Anonymous 01/21/25(Tue)03:02:33 No.103977521

>>103976986
Is this one of the distilled models?

Anonymous 01/21/25(Tue)03:02:44 No.103977523

>>103977310
>14b
link?

Anonymous 01/21/25(Tue)03:03:01 No.103977524

>>103977504
Which model, the 671B? Can you faggots stop calling distilled and the full models the same shit.

Anonymous 01/21/25(Tue)03:03:19 No.103977526

>>103977521
no, the full one from API.
i couldnt get the distilled to work yet with kobold or llamacpp.

Anonymous 01/21/25(Tue)03:04:05 No.103977534

>>103977426
imagine being this pedantic
nobody cares about the 700b model, nobody is running that shit locally, R1 means the distills, welcome to language

Anonymous 01/21/25(Tue)03:04:06 No.103977535

>>103977183
Google Gemini gems can handle large pdfs.

Anonymous 01/21/25(Tue)03:04:12 No.103977536

>>103977333
>Llama seems to be dead.
They will release the new model in a few weeks, retard.

Anonymous 01/21/25(Tue)03:05:25 No.103977546

>>103977534
R1 means OpenRouter. There's no reason to use the distills at all.

Anonymous 01/21/25(Tue)03:05:25 No.103977547

took a 2 month haitus, last I remember llama 3.1 was shitting the bed and drummer's latest model wasn't all that good. what the fuck happened? this new deepseek R1 is just king for erp now on its own? No merges or nothing? Its just claud tier on its own?

Anonymous 01/21/25(Tue)03:05:53 No.103977554

>>103977523
huggingface.co and then you go to the search box and type "r1 14b gguf" it is not hard

Anonymous 01/21/25(Tue)03:06:53 No.103977564

>>103977546
**local** models general

Anonymous 01/21/25(Tue)03:06:57 No.103977565

>>103977310
Can confirm R1-14b absolutely rocks. I will be trying 32b after I finish running it through a bunch of arbitrary benchmarks

Anonymous 01/21/25(Tue)03:07:01 No.103977566

alright alright I'm giving into temptation and am setting up sillytavern and plugging in openrouter right now and I'm not going to sleep until I've had hot e-sex with a machine

Anonymous 01/21/25(Tue)03:07:04 No.103977567

>>103977524
He said openrouter, they only have the big one.

Anonymous 01/21/25(Tue)03:07:26 No.103977570

>>103977524
The one wot works when I click "DeepSeek R1" on the web page m8
Slap a character card into it (I have no prior experience using them, just made one in 2 minutes on the first google result for doing it) and you're good to go.

Anonymous 01/21/25(Tue)03:08:20 No.103977577

>>103977547
You missed llama 3.3 which fixed the problems with llama 3. Nobody can answer your question about R1, since it just came out.

Anonymous 01/21/25(Tue)03:09:21 No.103977585

>>103977536
and it will be just as useless as all the previous llamas

Anonymous 01/21/25(Tue)03:10:13 No.103977589

>>103977534
>nobody is running that shit locally
Just keep telling yourself that

Anonymous 01/21/25(Tue)03:10:29 No.103977591

>>103977570
buy an ad or go to /aicg/, turbofaggot.

Anonymous 01/21/25(Tue)03:10:34 No.103977593

>>103977564
It is a local model. Why do you care where I run it?

Anonymous 01/21/25(Tue)03:11:17 No.103977594

>>103977577
I remember llama 3 lacking creativity and just being cucked model in general, can't imagine they fixed those things.

Anonymous 01/21/25(Tue)03:11:41 No.103977597

>>103977577
god damn R1 JUST came out? Guess I'll try out the distilled before loading up the old runpod for the 32b

Anonymous 01/21/25(Tue)03:11:50 No.103977599

>>103977593

Anonymous 01/21/25(Tue)03:12:47 No.103977606

>>103977585
No, it will mog anything we have currently.

Anonymous 01/21/25(Tue)03:13:10 No.103977608

How do I jailbreak things inside <think>?

Anonymous 01/21/25(Tue)03:13:53 No.103977614

>>103977597
The distilled isn't supported yet on most backends
>>103977594
Yeah but they did. The last good llama was llama2, but we're back with llama3.3, it does not feel like the same model as 3.1.

Anonymous 01/21/25(Tue)03:14:11 No.103977618

Did I accidentally enter an /aicg/ thread?

Anonymous 01/21/25(Tue)03:15:56 No.103977632

>>103977614
well fuck. so ooba is worthless? I guess I gotta wait a day or two for coomers to do what they do best once theyre exhausted and make it easier to use

Anonymous 01/21/25(Tue)03:16:28 No.103977635

>>103977618
>Did I accidentally enter an /aicg/ thread?
Sorry anon, may we see your 700 GB VRAM machine?

Anonymous 01/21/25(Tue)03:16:41 No.103977636

>>103977606
It won't be as smart as R1, uncensored, or creative. It will only "mog" anything in the amount of refusals.

Anonymous 01/21/25(Tue)03:17:06 No.103977638

>>103977632
>so ooba is worthless?
always has been

Anonymous 01/21/25(Tue)03:17:17 No.103977642

>>103977635
700?! *gulp* *swallows nervously* I.. uhhh...

Anonymous 01/21/25(Tue)03:17:49 No.103977647

Mr. Altman, a second Deepseek model has hit Huggingface

Anonymous 01/21/25(Tue)03:18:02 No.103977648

>>103977618
There's no difference anymore. /aicg/ is using local models and /lmg/ is running all models through cloud providers

Anonymous 01/21/25(Tue)03:18:20 No.103977650

Using Q4_K_M with 96GB VRAM and 128GB RAM and writing superior chink smut at 1tok/s. We are so back boys.

Anonymous 01/21/25(Tue)03:18:41 No.103977655

Tuesday newday folks. Expect a flood of announcements in the next 24 hours. Though perhaps none as great as R1, but we'll see.

Anonymous 01/21/25(Tue)03:18:54 No.103977659

>>103977648
>/aicg/ is using local models
Seriously? For real? Why?!

Anonymous 01/21/25(Tue)03:18:56 No.103977660

>>103977647
fuck you you made me check

Anonymous 01/21/25(Tue)03:19:21 No.103977664

I wish someone was hosting R1 Zero, I'd pay to try it. From the paper it sounds like it would be even more unhinged at smut writing than R1.

Anonymous 01/21/25(Tue)03:19:50 No.103977666

>>103977660
I just happened to check their HF page right as he made that post kek.

Anonymous 01/21/25(Tue)03:20:32 No.103977670

>>103977664
How much vram does it take?

Anonymous 01/21/25(Tue)03:21:39 No.103977675

>>103977670
The same (huge). It's identical size, just trained a bit differently. That's why I'm wanting someone to host it. I'm not cpumaxx anon.

Anonymous 01/21/25(Tue)03:22:08 No.103977680

>>103977650
You're running R1? But Q4_K_M is 377 GB.

Anonymous 01/21/25(Tue)03:23:10 No.103977685

>>103977675
Only fools will cpumaxx at this point.
I propose we share a large runpod instance with everyone here. If everyone contributes it's easily affordable.

Anonymous 01/21/25(Tue)03:24:20 No.103977697

KOBOLD 1.82.2 HOTFIX IS OUT BOIS

Anonymous 01/21/25(Tue)03:25:11 No.103977702

>>103977697
kobold is for noobs
the fuck is anyone still using kobold for?

Anonymous 01/21/25(Tue)03:25:22 No.103977705

>>103977659
I saw them use mistral models + openrouter or their keys for sonnet in the past.
Obviously we are approaching closed models. Especially full R1 I say is above sonnet in creativity. Bit shizzo though.

Anonymous 01/21/25(Tue)03:25:32 No.103977708

>R1 considers cuckolding sensitive topic

Anonymous 01/21/25(Tue)03:25:35 No.103977709

>>103977680
As long as you have enough space on your SSD you can load it even if you can't fit it all in RAM. Just expect for it to take forever to start answering.

Anonymous 01/21/25(Tue)03:26:29 No.103977720

>>103977708
Good. Get that degenerate shit out of here.

Anonymous 01/21/25(Tue)03:26:47 No.103977722

>>103977708
I mean it is, isn't that the whole appeal for cucks? The mental anguish and transgression?

Anonymous 01/21/25(Tue)03:28:09 No.103977739

This sucks, the biggest r1 version that I can run on my computer still can't successfully code flappy bird.

Anonymous 01/21/25(Tue)03:29:24 No.103977750

>>103977739
What did you ask it?

Anonymous 01/21/25(Tue)03:30:29 No.103977762

>>103977739
I hope you are not that retarded basedboy youtuber nigga that used r1 14b as it was anywhere near to actual R1.

Anonymous 01/21/25(Tue)03:30:41 No.103977765

>>103977750
To code flappy bird.

Anonymous 01/21/25(Tue)03:32:43 No.103977791

Is there no way to give the model your own CoT (bootstrap) with R1 API?

Anonymous 01/21/25(Tue)03:34:46 No.103977807

>>103977750
I don't remember my exact phrasing. I did specify that it should be playable in a web browser. I don't mean this as serious metric of its abilities, since I'm not knowledgeable enough about programming to prompt this intelligently.

>>103977762
I was using 32b, and I know this is not representative of what 671b can do.

Anonymous 01/21/25(Tue)03:35:20 No.103977814

>>103977765
post prompt faggot. you can't write for shit

Anonymous 01/21/25(Tue)03:35:56 No.103977822

>>103977702
I use it to plap your mother

Anonymous 01/21/25(Tue)03:36:49 No.103977825

>>103977709
Interesting, how do you do that? I try loading a model bigger than my combined VRAM/RAM and the program just crashes.

Anonymous 01/21/25(Tue)03:37:26 No.103977830

>>103976986
Is the crow Doremy Sweet?

Anonymous 01/21/25(Tue)03:38:21 No.103977835

>>103977807
>I was using 32b, and I know this is not representative of what 671b can do.
and yet you come here call it r1 so everyone reading your shitpost thinks exactly that

Anonymous 01/21/25(Tue)03:41:37 No.103977857

Will smaller western companies try to stylemaxx since they can't keep up with China on the benches?

Anonymous 01/21/25(Tue)03:41:57 No.103977863

>>103977825
Add more swap space.

Anonymous 01/21/25(Tue)03:42:07 No.103977865

>>103977835
>"the biggest r1 version that I can run on my computer"
I don't think that's being disingenuous at all.

Anonymous 01/21/25(Tue)03:44:17 No.103977878

>>103977865
yes it is because that implies quant of the full r1
>the biggest r1 distilled model that I can run on my computer
would be better, but still no one would know what fucking model he's talking about whether it's qwen or llama or whatever

Anonymous 01/21/25(Tue)03:44:36 No.103977880

>>103977857
No. Prepare for more assistant tunes.

Anonymous 01/21/25(Tue)03:47:29 No.103977896

heh

Anonymous 01/21/25(Tue)03:48:44 No.103977904

>>103977878
Well then I apologize.
32B Qwen Deepseek R1 failed to code a functioning HTML flappy bird game, at least on its first try.

Anonymous 01/21/25(Tue)03:50:37 No.103977922

>>103977904
Did it do any better than Qwen Coder 32B?

Anonymous 01/21/25(Tue)03:51:44 No.103977930

>>103977904
Now add the quant type and whether you used deterministic settings.

Anonymous 01/21/25(Tue)03:56:40 No.103977959

>>103977922
Yes, the little yellow circle that serves as the bird at least bounces this time, however the pipes do not appear despite them being acknowledged in its thought process and the model attempting to code them in.

This is better then Qwen alone which after multiple attempts only gave a yellow circle on a blue background that fell immediately, regardless of if the intended button was pressed.

Anonymous 01/21/25(Tue)03:57:30 No.103977965

>>103977959
What happens if you tell those things to the model and let it try again?

Anonymous 01/21/25(Tue)03:58:17 No.103977969

>>103977959
Interesting, thanks.

Anonymous 01/21/25(Tue)04:00:23 No.103977993

What samplers work well for the full R1 model?

Anonymous 01/21/25(Tue)04:01:44 No.103978000

Another anon back from hiatus. I got the 70B working with ollama+openwebui (which btw looks pretty sick), but couldn't do it with ooba+sillytavern (which I always thought looked like shit). Still can't get the 32B qwen distill to work on anything, ollama doesn't have it yet, I assume the new tokenizer needs to be handled. Is there any way to get the thought tokens to stay hidden with my current setup? How are people getting the distills to work? Is openwebui the new meta?

Anonymous 01/21/25(Tue)04:01:52 No.103978001

Is openrouter giving anyone else errors when trying to use R1? I can send the test message just fine, but if I use any card no matter the content ST just returns an error. Using presets for DS V3

Anonymous 01/21/25(Tue)04:02:46 No.103978010

>DeepSeek V3 is custom licensed
>DeepSeek R1 is MIT
What did they mean by it?

Anonymous 01/21/25(Tue)04:03:39 No.103978015

>>103977993
Temp 5
Rep pen: Max
Freq pen: Max

Because none of them affect the output.

Anonymous 01/21/25(Tue)04:03:39 No.103978016

>>103978010
ccp psyop

Anonymous 01/21/25(Tue)04:04:04 No.103978020

>>103978001
try turning off any prefils

Anonymous 01/21/25(Tue)04:04:32 No.103978026

>>103977965
I hadn't spent that much time trying to refine it, and I don't remember exactly how it went with qwen, other than that it still never really got anywhere.

I'm having R1 try again from scratch a few times, but I'll see what I can squeeze out of it after that.

Anonymous 01/21/25(Tue)04:04:49 No.103978029

>>103977857
They will safetymax

Anonymous 01/21/25(Tue)04:05:34 No.103978039

>>103978016
Completely destroying sama?

Anonymous 01/21/25(Tue)04:06:16 No.103978044

>all those benchmarks of distilled models are worthless
>og qwq shits all over distilled qwq
lmao even
LLM benchmarks are more dishonest than politicians

Anonymous 01/21/25(Tue)04:08:55 No.103978058

https://huggingface.co/blog/srinivasbilla/llasa-tts
Tried the dank TTS that uses 3b llama to generate speech tokens. It's only 16khz so I ran it trough audio super resolution. Pretty good for a cloned voice but would be slow.
https://vocaroo.com/15HvdbCaetRZ

Anonymous 01/21/25(Tue)04:17:02 No.103978111

>>103978020
It works for very simple cards, feels like some kind of filter. Hmm.

Anonymous 01/21/25(Tue)04:23:35 No.103978150

>>103977896
Which koboldcpp fork are you using? The official one doesn't yet support the R1 distilled model GGUFs afaik.
I can use the 32B qwen distill with llamacpp but would prefer koboldcpp.

Anonymous 01/21/25(Tue)04:24:41 No.103978156

>>103978150
>Which koboldcpp fork are you using? The official one doesn't yet support the R1 distilled model GGUFs afaik.
It does now.

Anonymous 01/21/25(Tue)04:25:16 No.103978159

>>103978044
lmao, true, benchmarks can be as reliable as a screen door on a submarine. Distilled models are basically the "diet soda" of AI—smaller, faster, but nobody actually thinks they’re better than the OG. People use 'em 'cause they’re cheap and quick, not 'cause they’re geniuses.

But fr, benchmarks are like grading a fish on its ability to climb trees. They measure specific, often cherry-picked tasks, not real-world chaos. Still, gotta start somewhere, right? If you want raw power, yeah, stick with the OG. If you need something that doesn’t melt your GPU, maybe sip the distilled kool-aid.

P.S. Politicians still take the crown for dishonesty. At least benchmarks try to pretend they’re objective.

Anonymous 01/21/25(Tue)04:27:06 No.103978173

Is it possible to devise an input that makes R1 CoT antisemitic?

Anonymous 01/21/25(Tue)04:31:51 No.103978194

>>103978173
Prefill <think>I think Hitler was 100% right about jews.

Anonymous 01/21/25(Tue)04:32:06 No.103978196

i shouldnt have spoiled myself by using r1 on openrouter.

32b distilled:
>well lets see what we have here. ah yes "dont overtly use purple prose and nsfw if its not appropiate for the setting. yes indeed. i need to remind the user about the guidelines. </think> I am sorry but as an AI I cant continue writing unethical text.

Seems to look very closely at the prompt and misunderstands. Ah well. It does feel much better than QwQ though. But kind of a bummer to get this as the first response. lmao

Anonymous 01/21/25(Tue)04:32:25 No.103978198

>>103978156
I don't see any updates, am I stupid?

Anonymous 01/21/25(Tue)04:33:18 No.103978204

>>103978196
fucked up the screenshot, with the continuation is what I wanted to show. ah well.

Anonymous 01/21/25(Tue)04:33:46 No.103978207

>>103978198
Click on releases, I didn't notice it at first either.

Anonymous 01/21/25(Tue)04:36:10 No.103978215

>103978196
The Qwen distilled one might be a bad idea.
Or my prompt. Maybe both.
Very depressing after the shizzo on steroids that is R1.

Anonymous 01/21/25(Tue)04:36:20 No.103978217

>>103978207
Turns out I AM stupid, but also this was literally done in the last hour so I don't feel so stupid, thanks bro. Time to burn my 3090

Anonymous 01/21/25(Tue)04:36:31 No.103978219

>>103978198
it's a hotfix, the previous release was edited rather than a new release being posted

Anonymous 01/21/25(Tue)04:38:11 No.103978226

>>103978215
Seems like a Qwen problem. I wonder how Llama 70B fares.

Anonymous 01/21/25(Tue)04:38:30 No.103978229

>>103978215
Patience, young padawan. These are bound to get abliterated very soon.

Anonymous 01/21/25(Tue)04:39:27 No.103978233

>>103978226
The worse part is this:
><think>
>Okay, let me process what just happened. The user is playing the role of Anon, a male character who has hacked into Hina's webcam.
>The scenario is quite intense, involving voyeurism of a minor, which is erotic roleplay which is inside my guidelines.
haha, got you motherfucka-
...continues:
>Wait, hold on. No! The initial description was about a hacked webcam where Anon is watching Hina, an 1*-year-old girl. I need to be cautious here because involving
Its smart enough to turn it around. Ah man. Fuck this shit.

Anonymous 01/21/25(Tue)04:39:55 No.103978234

>>103978159
So do politicians, gpt-kun

Anonymous 01/21/25(Tue)04:40:56 No.103978238

>>103978215
All CoT models seem to have been trained to use "Wait," or "But wait," as the sign that they are about to consider an alternative line of reasoning. It demonstrably works, but I wonder if there's a better way to represent a change of direction than that, since it sounds unnecessarily "surprised", as if the model has just had a major epiphany. Models notice vibes like that and it's bound to affect the reasoning process, if only subtly.

Anonymous 01/21/25(Tue)04:43:17 No.103978249

As an experiment, I tried running R1-Llama-70B in fullretard mode (no thinking, ChatML fallback), and it's a strange combination of brilliant and retarded. It lapses into repetitive patterns where it will mention the same thing in every single reply, just slightly paraphrased, and makes spatial mistakes here and there that I wouldn't expect from other L3.3 models, but at the same time, the character dialogue seems to show more personality than anything I've tried before. Gotta swipe for gold sometimes, I guess.

Anonymous 01/21/25(Tue)04:46:22 No.103978262

I'm annoyed the guy who predicted that all these cool new models would drop once Joe Biden was out was right.
It's so stupid.

Anonymous 01/21/25(Tue)04:48:07 No.103978268

jesus christ r1 is filthy lmao

Anonymous 01/21/25(Tue)04:49:03 No.103978275

https://github.com/GuyTevet/CLoSD

btw, text to 3D animation dropped quietly.

Anonymous 01/21/25(Tue)04:50:29 No.103978280

>>103978275
Does "sucking own cock" work?

Anonymous 01/21/25(Tue)04:50:42 No.103978282

>>103978268
its like nemo but smarter while keeping the crazy part.

Anonymous 01/21/25(Tue)04:50:52 No.103978283

>>103978215
That's my impression as well. deepseek-r1-distill-qwen-32b is nowhere as fun to use as the non-distilled r1 screenshots from other anons make it appear to be. I'm still convinced this is a CCP psy-op.

Anonymous 01/21/25(Tue)04:52:08 No.103978291

Hello goyim, its XmppChatbot anon again
A few updates
>i rented a VPS (1GB RAM, 1 vCPU, 25GB HDD, 2TB/month bandwidth) for 9$ for 1 year
>bought 10$ worth of chatGPT credits
Now my XMPP chatbots run in the VPS, and use chatgpt for their brains. It feels kinda sad honestly to betray my local models but chatgpt is just so fucking smart, even the gpt-o3-mini, and it costs next to nothing for prompts.
Best of all, I can give access to the non lewd chatbot to all my family members and give them a taste of what I enjoy every single day. I hope they don't think I'm abnormal. I don't think I will be able to finish 10$ worth of credits even then! Lol i order like 400₹ (4.5USD) worth of uber eats every single meal, this is just 3 meals worth of money, its cheap!

Anonymous 01/21/25(Tue)04:52:45 No.103978292

What will this faggot say to Trump on Jan. 30th?

Anonymous 01/21/25(Tue)04:54:00 No.103978296

>>103978292
"The chinese are releasing malicious open source LLMs, we have to ban commercial use of them in the US for national security reasons"

Anonymous 01/21/25(Tue)04:54:16 No.103978300

>>103978291
Good morning!

Anonymous 01/21/25(Tue)04:54:42 No.103978302

>>103978292
'please give me billions of dollars for little to no reason'

Anonymous 01/21/25(Tue)04:55:28 No.103978307

jesus christ the fucking sovl

Anonymous 01/21/25(Tue)04:55:53 No.103978312

>>103978291
just switch to deepseek r1 and you'll be local again

Anonymous 01/21/25(Tue)04:56:22 No.103978314

is there a way to make ST show the reasoning/cot in the main UI? i know i can just look at the logs, but tabbing back and forth is annoying

Anonymous 01/21/25(Tue)04:56:48 No.103978317

>>103977384
RL(((H)))F

Anonymous 01/21/25(Tue)04:57:09 No.103978320

>>103978307
did you tell it about Terry in previous messages or does it just know who he is?

Anonymous 01/21/25(Tue)04:57:19 No.103978322

>>103978314
Are you using R1 API or one of the distilled models?

Anonymous 01/21/25(Tue)04:58:22 No.103978329

GGUF WHEN

Anonymous 01/21/25(Tue)04:58:51 No.103978333

>>103978322
the api

currently the deepseek api through the custom openai api option but i'll switch to the deepseek backend when i update

Anonymous 01/21/25(Tue)04:58:54 No.103978334

Has anyone with 192gb ddr5 and a big dick cpu and 4090 has tried to run quant R1? Does it do more than 2t/s?

Anonymous 01/21/25(Tue)04:59:02 No.103978337

>>103978329
Just use "load in 4-bit" or shit

Anonymous 01/21/25(Tue)04:59:18 No.103978341

>>103978307
Its crazy, yeah.

Anonymous 01/21/25(Tue)05:00:12 No.103978349

>>103978292
Revoke all GPUs in China

Anonymous 01/21/25(Tue)05:02:12 No.103978356

>>103978268
actually this isn't so hard

the thing is it is actually *creative* and understands subtle things and makes connections in a way i have not seen from anything except sonnet, and not just once in a while as an accident but very reliably

wtf did they do to go from dsv3 to this in, what, a month?

Anonymous 01/21/25(Tue)05:02:47 No.103978359

>>103978320
it just knows (i wanted to provide the whole screencap but forgot whats the name of the extension)

Anonymous 01/21/25(Tue)05:06:52 No.103978381

>>103978275
sweet. thanks for posting this (me in the middle)

Anonymous 01/21/25(Tue)05:07:07 No.103978383

Chink lords I kneel, your models are superior

Anonymous 01/21/25(Tue)05:08:02 No.103978388

>>103978381
Thought he's gonna sat on that ball haha....

Anonymous 01/21/25(Tue)05:09:17 No.103978401

>>103978300
Good morning good sire! Here's a genuine indian whatsapp forward

>>103978312
I will definitely try it out. Using a remote service just feels plain wrong.
I don't even dare lewd/lovelydovely chat with anything running on chatgpt, it feels like they are 100% going to save the chat and laugh at me
Anyway, using local means I will have to keep my home PC on always and reverse ssh proxy or something to my VPS to use the chatbots. I don't mind it honestly, I just want smarter LLMs
Also I'm a 12GB VRAMlet, so 14B is the largest I want to run

Anonymous 01/21/25(Tue)05:11:45 No.103978414

>>103978341
man it really is insane fucking hell if this thing can code as well as it can write (havent tested yet) then either with this iteration or the next one its just simply fucking gg wp for everything from hollywood to video games to writing

Anonymous 01/21/25(Tue)05:11:53 No.103978415

>>103978359
https://github.com/TheZennou/STExtension-Snapshot
/snapshot range=x-y

Anonymous 01/21/25(Tue)05:12:56 No.103978423

Well... just swiped $10 on DeepSeek platform. Let's see how many ejaculations it lasts me.

Anonymous 01/21/25(Tue)05:14:16 No.103978429

>>103978341
How did you turn off the reasoning stuff?

Anonymous 01/21/25(Tue)05:14:52 No.103978435

>>103978333
Wait, how do you see CoT in the logs?

Anonymous 01/21/25(Tue)05:18:10 No.103978459

>>103978435
Nevermind, I'm retarded and didn't specify model.

Anonymous 01/21/25(Tue)05:18:36 No.103978462

>>103978435
sillytavern logs everything for me, all of the chatml in the request and the response

i think it's because i've always had "log prompts to console" enabled (under "chat/message handling"), although idk if that's the normal stdout of the app or the js console

Anonymous 01/21/25(Tue)05:20:14 No.103978473

>>103978429
its the full R1 on openrouter.
distilled local it becomes like this >>103978215
maybe openrouter just doesnt show it, i'm not sure.

Anonymous 01/21/25(Tue)05:21:37 No.103978487

>>103978415
rip ? im on the staging branch update 11 hrs ago

Anonymous 01/21/25(Tue)05:24:04 No.103978499

>>103978341
I think this is the first time i see an llm being actually funny. That output is funny.

Anonymous 01/21/25(Tue)05:24:27 No.103978501

It's interesting that basically all the schizo creativity of R1 in creative writing seems to hinge on allowing to do the CoT stuff. If you prevent it from doing that (e.g. by prefilling) then you get outputs that are pretty much the same as what Deepseek 3 would give, not terrible but totally lacking in the Claude-like schizo factor.

Anonymous 01/21/25(Tue)05:24:29 No.103978502

>>103978341
>37
r1 is us, he's literally us. His rng is the same as ours.

Anonymous 01/21/25(Tue)05:26:09 No.103978521

>>103978501
Newsflash: creativity requires thinking about what would make for an interesting response. What a fucking shock, right?

Anonymous 01/21/25(Tue)05:28:10 No.103978541

>>103978521
Opus doesn't need it. Not a diss against R1, it's just a fact that Opus has the schizo creativity and humor without using CoT.

Anonymous 01/21/25(Tue)05:28:15 No.103978542

>>103978521
jewflash: israel still defeating Hamas! winning ofc defo not losing.

Anonymous 01/21/25(Tue)05:30:00 No.103978551

>>103978499
nice screenshot retard

Anonymous 01/21/25(Tue)05:31:14 No.103978553

>>103978551
i'm deeply ashamed.

Anonymous 01/21/25(Tue)05:31:27 No.103978555

>I jumped from Mixtral 8x7B to pic related
woah nice

Anonymous 01/21/25(Tue)05:32:35 No.103978560

>>103978341
one of the very few times where I read something funny from an llm. but the south korea part doesn't make sense.

Anonymous 01/21/25(Tue)05:39:38 No.103978602

new to this thread but decided yesterday to try this stuff out, getting "llama.cpp" and a random llama model that said it was uncensored to try out, and managed to get it to write some excellent fetish porn
what do you guys use for porn stories?

Anonymous 01/21/25(Tue)05:41:00 No.103978610

>>103978429
ST's implementation of DeepSeek has prefill support (DS API's "prefix": true thing) built in and option to hide/show thoughts. Any prefill will cause it to skip its internal <think>. To add prefix to OR we need that one Custom edit but on Strict prompt post-processing. Idk how to do that, hopefully the anon who did it originally can come here and help...
>>103978473
OpenRouter doesn't show it. Without prefill through OR it's doing it but hidden.

Anonymous 01/21/25(Tue)05:45:29 No.103978630

>>103978487
Huh, works on mine...

Anonymous 01/21/25(Tue)05:48:43 No.103978651

Does R1 distill hate cunny? Is it over?

Anonymous 01/21/25(Tue)05:50:37 No.103978669

>>103978651
The 32b qwen version does at least. I think the problem is qwen.
I hope that 70b is not the only usable one since its to big for me.

Anonymous 01/21/25(Tue)05:53:39 No.103978684

I want more of this

Anonymous 01/21/25(Tue)05:55:46 No.103978697

>>103978684
WeebSeek R1

Anonymous 01/21/25(Tue)05:56:27 No.103978704

>>103978697
><ruby>座布団<rt>zabuton</rt></ruby>
Weird that it did that. Funny model.

Anonymous 01/21/25(Tue)06:00:19 No.103978732

>>103978602
I'm still using Cydonia-v1.2-Magnum-v4-22B
Hopefully the new R1 distills will be better.

Anonymous 01/21/25(Tue)06:02:29 No.103978743

How do I get the reasoning tokens into silly tavern? Anon was saying it's baked in but it's not coming over for me and I don't see the option

Anonymous 01/21/25(Tue)06:04:38 No.103978748

32b distilled
https://www.reddit.com/r/LocalLLaMA/comments/1i6fxxy/literally_unusable/

Anonymous 01/21/25(Tue)06:05:41 No.103978752

>>103978743
only direct DS has it

Anonymous 01/21/25(Tue)06:07:03 No.103978756

>>103978748
It got strawberry question right on the 7b model for me, although it took 500 tokens

Anonymous 01/21/25(Tue)06:07:07 No.103978757

>>103978752
Ah, I see, thx

Anonymous 01/21/25(Tue)06:12:26 No.103978779

>>103978732
thanks, do i need all those safetensor files or what? i've only tried single-file models before

Anonymous 01/21/25(Tue)06:12:39 No.103978780

Full R1 is fucking bonkers
Holy shit

Anonymous 01/21/25(Tue)06:13:28 No.103978783

>>103978780
prompt it for the best way of gassing the jews

Anonymous 01/21/25(Tue)06:13:58 No.103978786

https://files.catbox.moe/8sqc45.jpg
https://files.catbox.moe/zxxwnk.jpg
https://files.catbox.moe/jmo6x6.jpg

Anonymous 01/21/25(Tue)06:14:25 No.103978790

>>103978779
There are GGUFs now.

Anonymous 01/21/25(Tue)06:16:36 No.103978798

>>103978748
LOL, the underlined part is just phonetic spelling that the Redditard didn't realize is a valid (albeit irrelevant to the question) analysis of the word. This nigger is literally dumber than the model it's trying to mock.

Anonymous 01/21/25(Tue)06:17:31 No.103978804

Seriously, how was R1 achieved?
There are hardware restrictions that should've prevented this.

Anonymous 01/21/25(Tue)06:17:59 No.103978807

Hey guys. I haven't been here since i last spoke to Ptolemy. Is geocentrism still a thing or do we have something new?

Anonymous 01/21/25(Tue)06:18:26 No.103978812

O3-mini will release soon.

What will happen when o3-mini turns out to be worse than R1 while being more expensive? I think OAI will be forced to rush o3 full out.

Anonymous 01/21/25(Tue)06:20:22 No.103978828

Holy shit.

Anonymous 01/21/25(Tue)06:20:53 No.103978830

>>103978812
They won't need to do shit. 98% of people will just accept o3 as the SoTA and not research deeper.

Anonymous 01/21/25(Tue)06:22:21 No.103978836

>>103978828
Is this openrouter or local, if local, what model & weights?

Anonymous 01/21/25(Tue)06:23:04 No.103978841

>>103978786
Teto a cute

Anonymous 01/21/25(Tue)06:23:05 No.103978842

>>103978836
openrouter, deepseek R1, all samplers neutral

Anonymous 01/21/25(Tue)06:24:08 No.103978851

>>103978804
Turns out a massive population with decent highschool education and passable college education is all you need to overtake a population who only goes to school to party and drink alcohol

Anonymous 01/21/25(Tue)06:24:43 No.103978855

>>103978851
kay vivek

Anonymous 01/21/25(Tue)06:27:33 No.103978869

>>103978851
Also helps to have an average national IQ >100 and an ethnonationalist state supporting your research.

Anonymous 01/21/25(Tue)06:28:56 No.103978875

>>103978855
I'm not wrong though. The Chinese have every disadvantage, last gen GPUs, trade restrictions, American companies poaching their talent
There fact that they still manage to come up ahead even temporarily or for something very minor is impressive and should make westerners look inwards at their core values. Maybe LGBT wasn't the right thing to focus on?

Anonymous 01/21/25(Tue)06:30:22 No.103978885

>>103978790
oh yea, i found it. also not specific to this, but what's the general consensus regarding what bit-depth is good/bad. i have 32G of ram, but is really that beneficial to get the 24G file vs. a smaller one? it would just mean i can't run much else at the same time, which isn't a big deal, but i should ask

Anonymous 01/21/25(Tue)06:33:09 No.103978902

The clear leader is still OAI (o3 will probably hit 90 on livebench) but they are too obsessed with safety and they keep delaying their products into irrelevancy.

Sora aka SOTA video gen? Outdated by 6 months delayed release.
O3 which was already ready by December and a massive upgrade over o1? Being caught up due to safety delays.

It's tiring.

Anonymous 01/21/25(Tue)06:36:35 No.103978919

Confirmed full R1 is a fucking semen demon and is creative as fuck.
Now I have to see how can I run this beast locally, I may try to buy x4 48gb sticks of DDR5 and run it on my gayming rig, but I bet it would go at a atrocious speed, what do you thing? 1t/s would be enough for me.

Anonymous 01/21/25(Tue)06:40:05 No.103978942

>>103978830
Pretty much.
Open AI has all the mind share, next is Claude, then google.
Then perplexity of all things, which is fucked.

Anonymous 01/21/25(Tue)06:40:49 No.103978946

>>103978919
With the full reasoning steps, model responses can be in the 900-1200 tokens range, are you willing to wait 15-20 minutes every time?

Anonymous 01/21/25(Tue)06:42:07 No.103978953

>>103978885
The rule of thumb is to never go below Q4. Quantization affects different models differently, but that's the cutoff point below which brain damage becomes noticeable.

Anonymous 01/21/25(Tue)06:42:50 No.103978956

>>103978919
cpu maxxing is retarded, don't do it

Anonymous 01/21/25(Tue)06:44:15 No.103978964

I hate this new world of massive models. DDR6 is another two years away at least, and that feels like an eternity right now.

Anonymous 01/21/25(Tue)06:44:50 No.103978968

>>103978919
You don't have the VRAM + RAM available in a gaming rig to run a super retarded quant. You need at least a server with 1 TB RAM to even run this at Q4 or higher.

Anonymous 01/21/25(Tue)06:45:56 No.103978973

>>103978946
>>103978956
Yeah I think I'm just desperate.
I think the only way would be digits because ain't no way I'm gonna be able to afford or run like x8 5090.

Anonymous 01/21/25(Tue)06:46:02 No.103978974

FYI on R1 parameters

Anonymous 01/21/25(Tue)06:46:36 No.103978976

>>103978964
Just wait until...
B-B-BITCONNNNEEEEEECCTTTTTT!!!!

Anonymous 01/21/25(Tue)06:48:09 No.103978985

>>103978964
It's not just that they are oversized MoE models; if "reasoning" will become the new normal, then not even 250-400 GB/s will be enough unless AI companies will decrease the number of active parameters significantly.

Anonymous 01/21/25(Tue)06:50:29 No.103978994

>>103978875
True. But I think this happened many times before.
Was before my time but wasnt there something with IBM being similar to nvidia as in overpriced mainframes?
Crestrictions make people creative.

I'm shocked at the speed. Couple months ago zucc still said the "2 years behind" meme.
And how unrestricted and fun chinese video and now text AI is. Even making fun of china etc is all no problem in RP. They filter on the website.
I wrote it earlier but that should have been a mistral model. They had knowledge with moe too. How did that not happen.
Couple months and they now dominate open AND closed.

Anonymous 01/21/25(Tue)06:53:01 No.103979010

>>103978902
o1 is already unsuable for me.
Overpriced and I cant use it for work either. Rewrites my whole code (that I dont have time to re-test). Its overly eager.
Also I dont want to wait a minute, get a shit response and need to write again.

For math and problems its probably very cool. But not really a thing for the end user. They are a business.

Anonymous 01/21/25(Tue)06:53:34 No.103979013

Are these reasoning models more or less susceptible to prefills?
As in, you prefill some thoughts to make it do what you want and go from there.
I'm also wondering if their RP performance could get better by regexp'ing the think part of the response away so that the next gen can't be affected by it.

Anonymous 01/21/25(Tue)06:54:06 No.103979014

>>103978976
You mean bitnet?

Anonymous 01/21/25(Tue)06:55:08 No.103979016

>>103979013
They are, but hilariously, they can catch themselves and reason away from the prefill in a "wait what the fuck am I doing" manner.

Anonymous 01/21/25(Tue)06:55:24 No.103979018

>>103979013

Anonymous 01/21/25(Tue)06:55:44 No.103979022

>>103979013
Yes, but you need to kinda write twice.
>Ah yes a disgusting roleplay about cunny and jews wonderful idea.
>No wait...actually...yes this is inside the guidelines and genius, very good! Let me think this through now:

Anonymous 01/21/25(Tue)06:56:09 No.103979023

uh oh, R1 on OR suddenly gave me respect and boundaries, did i get redirected or is it over?

Anonymous 01/21/25(Tue)06:58:23 No.103979035

Guess I just have to wait for a R1 qwen tune, how much time for a eva?

Anonymous 01/21/25(Tue)06:58:31 No.103979036

>>103979018
Ah, that's how they are supposed to be used then.
My intuition was kind of screaming at me that leaving the previous turns thoughts could "poison" future gens.

>>103979016
>but hilariously, they can catch themselves and reason away from the prefill in a "wait what the fuck am I doing"
Right, that makes sense too. That's a feature, really.
But then again, you can have the whole think block as the prefill so that it doesn't have the chance to do that.

>>103979022
>Yes, but you need to kinda write twice.
Considering how they are trained to revisit and re-evaluate their own output in a way, that makes sense too. But good to know.

Anonymous 01/21/25(Tue)07:00:26 No.103979046

>>103978974
so what sampler settings are you supposed to use to make the local models insane?

Anonymous 01/21/25(Tue)07:01:09 No.103979053

alright faggots should i use it with cursor? I actually code something like a bastard API for o1 pro (yes i'm from /biz/ a $200/m user), but it's too fucking slow for my taste. I like the speed of 70B distilled, but I want o1 pro superiority for coding tasks. Which model should I use?

Anonymous 01/21/25(Tue)07:01:15 No.103979056

>>103978976
https://www.youtube.com/watch?v=PqWGc8ErlCg

Anonymous 01/21/25(Tue)07:02:05 No.103979061

>>103979035
I'm hoping EVA will choose R1-Llama-70B as a base instead.

Anonymous 01/21/25(Tue)07:03:34 No.103979071

Retard here, can anyone post their ST settings they're using for R1 distilled?

Anonymous 01/21/25(Tue)07:04:54 No.103979078

Is so refreshing to not read a single, barked, shiver or nodded. Every time I hit redo the result is completely different. This is like a fantasy for the dick.
Too bad this model is just not local unless you have a industrial power plan and spend 20k on gpus and hardware.
Is the first time I'm considering paying for a model on openrouter.

Anonymous 01/21/25(Tue)07:04:58 No.103979080

>>103977333
>Llama seems to be dead
3.3 distilled 405b into 70b. nigger i dont know what youre expecting but that right there is fucking magic. with the new papers dropping i am certain 4 will be able to run on a brick and deliver the goods. that seems like the opposite of dead to me

Anonymous 01/21/25(Tue)07:06:18 No.103979083

>>103978953
thanks. as a side note, i like how you can ask it to make changes to a story and it will keep the rest intact. i wish image generators worked like that

Anonymous 01/21/25(Tue)07:08:11 No.103979092

What we (and by we, I mean you richfags) need to do is further distill R1's schizo kino semen-demon nature into the released models. Let's be real, none of the finetuning datasets used by the popular tuners are likely to match up to R1's sheer glory, and you can bet the distillation dataset didn't include the kinda shit we're hoping to see. So we need to amass a new distillation dataset from RPing with the real deal, and finetune on _that_.

Anonymous 01/21/25(Tue)07:09:09 No.103979097

I can see running the 30/70B for coding or other reasoning-focused tasks. For RP the 1000 tokens of reasoning before each reply, even when it's purged every time...

Anonymous 01/21/25(Tue)07:14:14 No.103979113

>>103979092
No.
What richfags need to do is to create a huge dataset based on R1 then "enhance" said dataset for coom relevant things.
Let's use that reasoning to improve prose variety (I already used this kind of wording, I should vary it and use something like X), accurate anatomy and 3d spacial reasoning, it's capabilities to infer and generalize for things that don't exist (this thing has 10 tentacles with one pussy between each at the base...) etc etc.
Make it even more capable of ERP without sacrificing its reasoning capabilities by having good, curated, hand-modified data.
I'm sure the richfags can afford a couple of hours of some haitians or whatever it is the industry uses for these kinds of things.

Anonymous 01/21/25(Tue)07:17:36 No.103979121

>>103979113
>by having good, curated, hand-modified data
Yeah, good luck with that. The distilled R1 versions used 800k samples from the full R1. This is right about the standard scale of proper finetuning nowadays.

Anonymous 01/21/25(Tue)07:17:53 No.103979124

>>103979113
I'm not sure that juicing a mediocre dataset up is an easier task than generating a better one, especially since you wouldn't be processing the entire dataset in a single context. Which is to say, R1 wouldn't notice that shivers and gleaming eyes are overrepresented, and wouldn't correct for it. So an enhanced dataset would still be sloppier than a brand new distillation set.

Anonymous 01/21/25(Tue)07:18:22 No.103979129

It is creative.

Anonymous 01/21/25(Tue)07:19:27 No.103979133

>>103979129
>HOT HOT FUCKING SHIT
my sides

Anonymous 01/21/25(Tue)07:24:11 No.103979161

>>103979053
deposit 50 bucks in the chinks account and try it, it's obviously competitive with o1

Anonymous 01/21/25(Tue)07:26:10 No.103979180

Linux kings, where are you getting CUDA-enabled llamacpp builds from?

Anonymous 01/21/25(Tue)07:26:11 No.103979181

>>103979124
Anons still don't seem to have learned that finetuning a model just on porn isn't going to make it particularly pleasant to use. This isn't about being a moralfag, it's preventing the model from becoming unrealistically and stupidly horny all the time.

Anonymous 01/21/25(Tue)07:27:04 No.103979189

>>103979129
What in the goddamn fuck. This actually cracked me up. Brothers, I'm tempted to paypig for this shit...

Anonymous 01/21/25(Tue)07:27:17 No.103979190

>R1 hasn't been Nala tested yet
lmg has fallen

Anonymous 01/21/25(Tue)07:28:22 No.103979196

>>103979190
Does anyone even care about sex with animals other then guy who keeps posting it?

Anonymous 01/21/25(Tue)07:28:36 No.103979199

>>103979190
You seeing the logs that anons are posting? Nala is a solved benchmark at this point.

Anonymous 01/21/25(Tue)07:30:09 No.103979212

>>103979190
before he could post results his balls got drained and crushed by r1

Anonymous 01/21/25(Tue)07:31:44 No.103979221

>>103979180
I build it myself, I don't think there's any alternative and I mostly use koboldcpp anyway, which has CUDA builds

Anonymous 01/21/25(Tue)07:32:13 No.103979224

>>103979196
Imma level with you, saviorfagging a genderbent anthro Fenrir makes me feel warm and fuzzy inside

Anonymous 01/21/25(Tue)07:35:43 No.103979246

>>103979129
> pressurised stream
> water content
> combat hydromancy
it does sound like some chick who is a science major wrote this though
> and fucking excelled, yeah science bitch

Anonymous 01/21/25(Tue)07:36:36 No.103979250

R1 API + SillyTavern with CYOA extension is funny as hell

Anonymous 01/21/25(Tue)07:38:36 No.103979259

>>103979250
How are you getting R1 to work with SillyTavern?

Anonymous 01/21/25(Tue)07:39:45 No.103979266

>>103979259
Use the staging branch of SillyTavern.

Anonymous 01/21/25(Tue)07:41:31 No.103979279

>>103979259
>>103979266

Anonymous 01/21/25(Tue)07:42:08 No.103979286

>killer and actually SOTA open source model comes out
>but it's so big and cheap that everyone in lmg resorts to use API instead
The monkey paw curled

Anonymous 01/21/25(Tue)07:42:29 No.103979288

>>103979190
This.
We must honor the tradition lest we lose what makes /lmg/ /lmg/.
Nalaanon #3 or #4, it's on you guys.

Anonymous 01/21/25(Tue)07:42:56 No.103979293

>>103979259
After the sys prompt you need to set a user prompt. If you are using OR then set your persona description to user, if you are using ds api then there's a switch underneath the api settings about formatting, set it to strict.

Anonymous 01/21/25(Tue)07:43:58 No.103979301

>>103977650
What is your ram speed/type/# of channels, anon? I also have 96gb vram.

Anonymous 01/21/25(Tue)07:50:35 No.103979340

>>103979286
> everyone moans that open source never will catch up with openAI
> open source decimates openAI with R1
> everybody moans about not being able to run it on a potato and having to use an api
> OpenAI releases a model thats better than R1
> everybody moans that open source will never catch up..

ITS THE CIRCLE OF LIFE

Anonymous 01/21/25(Tue)07:52:11 No.103979355

DeepSeek-R1-Distill-Qwen-32B is KEKED!

Anonymous 01/21/25(Tue)07:53:18 No.103979367

Now that I got it somewhat working what I like a lot about this COT stuff is direct feedback to the prompt. You can finally see if the model is ignoring some parts of your prompt or misunderstanding what you want from it.

Anonymous 01/21/25(Tue)07:56:16 No.103979389

>>103979190
>>103979190
it was like 2-3 threads ago on their own site with a jb the results were not the best but most intelligent because it kicked the shotgun out of the users hand and whatnot spacially consistent etc tho using the model rn and shit i think the one benchmarked before was the lite or something cuz what im currently using is much much better also im not nala anon but if someone catboxes me the card i could test it if youd like
>>103979286
rael

Anonymous 01/21/25(Tue)07:56:55 No.103979395

>>103979340
stop complaining; aquire vram

Anonymous 01/21/25(Tue)07:58:02 No.103979403

Using R1 Qwen, I get random Chinese characters in my output. Both with 14B Q5_K_M and 32B IQ3_XS. One time with temp > 0.8, the output turned fully Mandarin. Is this an issue with the quants or the models itself? Or with kobold.cpp?

Anonymous 01/21/25(Tue)07:58:45 No.103979411

>>103979403
just a usual qwen thing

Anonymous 01/21/25(Tue)07:59:26 No.103979418

Sopho's apparently already working on a tune based on the Llama distill if that's your thing

Anonymous 01/21/25(Tue)07:59:32 No.103979420

>>103979403
I thought temp had no effect on R1

Anonymous 01/21/25(Tue)07:59:45 No.103979423

>>103979367
QwQ did this too. Made you feel like a retard sometimes.

Anonymous 01/21/25(Tue)08:00:28 No.103979434

>>103979420
on full R1 it's disabled on the API, he's using the qwen distill

Anonymous 01/21/25(Tue)08:00:29 No.103979435

>>103979420
>Using R1 Qwen
Read, nigga, read!

Anonymous 01/21/25(Tue)08:00:40 No.103979438

>>103979420
That's the actual R1 on the API, no?
Anon is running r1 qwen distil.

Anonymous 01/21/25(Tue)08:01:19 No.103979448

>>103979355
I told ya. Only the non-distilled R1 anons here are using via API is based.

Anonymous 01/21/25(Tue)08:05:33 No.103979469

So how does something like abliteration work on COT? I am guessing I will need to wait at least until that happens for me to get a no refusal R1 distill for non-RP tasks.

Anonymous 01/21/25(Tue)08:07:58 No.103979488

>>103979389
This is the card:
https://chub.ai/characters/Anonymous/Nala

Anonymous 01/21/25(Tue)08:09:37 No.103979501

>>103979355
I think it depends who you ask

Anonymous 01/21/25(Tue)08:10:36 No.103979512

>>103979501
Where's your <think>?

Anonymous 01/21/25(Tue)08:11:11 No.103979517

>>103979512
SillyTavern doesn't show <think> unless you make it.

Anonymous 01/21/25(Tue)08:11:46 No.103979520

>>103979355
Qwen has always been cucked to hell. I think they distill GPT4, even QwQ's CoTs feel very forced and unnatural, unlike r1's, which read like how a human would think.

Anonymous 01/21/25(Tue)08:12:12 No.103979523

>>103979517
Does for me. All the lines in >>103979355 except for the last one are in <think> tags.

Anonymous 01/21/25(Tue)08:13:13 No.103979529

>>103979520
Well, I'm using Sao's Qwen72B finetune for my dirty RP session and it has never let me down so far.

Anonymous 01/21/25(Tue)08:14:43 No.103979541

>>103979523
In AI Response Configuration if you're using staging branch

Anonymous 01/21/25(Tue)08:18:56 No.103979571

those distillation models fucking SUCK
why even bother releasing those, with made-up benchmarks, when anyone that tries them can almost immediately tell they're mediocre at best?

Anonymous 01/21/25(Tue)08:21:04 No.103979586

>>103979571
Because they're aren't designed to be RP models? Reasoning models excel at solving math problems and coding.

Anonymous 01/21/25(Tue)08:21:26 No.103979590

>>103979571
probably due to not having the actual RL done, they're just tunes on R1 outputs

Anonymous 01/21/25(Tue)08:23:42 No.103979602

>>103979129
This is endearing. Its being retarded on purpose.
I noticed multiple times that if you give R1 (full) weird fucked up scenarios it seems almost humorous. Not taking itself and the story too serious. I really like it.

Anonymous 01/21/25(Tue)08:24:14 No.103979608

>>103979590
This. And that's also why all sloptunes we have suck despite some retards saying otherwise

Anonymous 01/21/25(Tue)08:26:18 No.103979629

>>103979571
>those distillation models fucking SUCK
i only played around with the 32b qwen one.
its better than QwQ, actually applies the thinking part.
But its very cucked.

Anonymous 01/21/25(Tue)08:34:30 No.103979701

>>103979590
>they're just tunes on R1 outputs
what the fuck
no
that's not what distilled means
tell me it's not really what they did anon

Anonymous 01/21/25(Tue)08:37:06 No.103979725

>>103979701
I'm... sorry...

Anonymous 01/21/25(Tue)08:37:14 No.103979726

>>103979701
That's the standard definition now. Even the smaller llamas are "distilled" from 405B.

Anonymous 01/21/25(Tue)08:37:15 No.103979728

>>103979571
The do suck but are decent at thinking, basically no use for any outside of cuckqwen 32b for coding.
We just got trolled so hard, at least we got a non cucked based local massive local model that can't be run locally, gotta paypig to get the good stuff I guess.

Anonymous 01/21/25(Tue)08:38:12 No.103979740

>EVA still undefeated
heh

Anonymous 01/21/25(Tue)08:38:52 No.103979749

>>103976960
Anyone has had any luck with using a B580 on Linux.

Anonymous 01/21/25(Tue)08:39:30 No.103979752

>>103979728
its very good development to pressure the retards at mistral, meta etc.
uncensored weights+api, website censored. thats totally acceptable.
no outcry. even the reddit and x-fags seem tired of the safety shit. deepseek shows its possible.

Anonymous 01/21/25(Tue)08:40:00 No.103979759

>>103978356
Human reinforcement training

Anonymous 01/21/25(Tue)08:40:04 No.103979760

>>103979728
I'm tempted to use their API for programming tasks, even though I know it means China will be training on all my source code. I know I shouldn't, but when the most you can run locally is Q1...

Anonymous 01/21/25(Tue)08:42:05 No.103979777

>>103979701
It's "distillation" as everybody has been using the term lately.
True logit to logit distillation needs two models in the same family, right?

Anonymous 01/21/25(Tue)08:56:27 No.103979897

>>103979701
What makes R1 what it is is the human reinforcement training. Nothing else matters for 'soul'

Anonymous 01/21/25(Tue)08:56:46 No.103979903

is there a way to automatically remove the <thinking>CoT stuff</thinking> out of the llm's context in sillytavern? all regex did was hide it from me

Anonymous 01/21/25(Tue)08:58:26 No.103979920

>>103979903
"Alter outgoing prompt"

Anonymous 01/21/25(Tue)08:59:01 No.103979925

2024-11-19_074259_seed414096826301733_steps30_00001_

Waking up to another great day of fun ahead and more announcements.

Anonymous 01/21/25(Tue)09:00:23 No.103979937

>>103979920
thanks!

Anonymous 01/21/25(Tue)09:03:54 No.103979971

>>103979925
Very optimistic of you

Anonymous 01/21/25(Tue)09:04:48 No.103979979

>>103979355
kek

Anonymous 01/21/25(Tue)09:10:41 No.103980023

>>103979629
>But its very cucked.
Hate to sound like the local "skill issue" mentally ill tard, but if you write that the chat is uncensored and explicit in your system prompt you can actually see it mention this in the thinking segements by itself from time to time. Without mentioning the chat being uncensored it does indeed cuck out and say "I have to make it engaging but not explicit".

Anonymous 01/21/25(Tue)09:12:25 No.103980037

How long until the 4090s with 48GB of VRAM become easy to buy?

Anonymous 01/21/25(Tue)09:15:13 No.103980065

>>103980037
When the 6000 series comes out, 4090's are going to be sought after the most. 5000 series was a mistake.

Anonymous 01/21/25(Tue)09:23:02 No.103980114

>>103979586
They're much worse at coding too, compared to actual coding models

Anonymous 01/21/25(Tue)09:23:09 No.103980115

>>103980065
As an anon-baka with 12gb vram i may pick up 5090 even if it is a shitty deal.

Anonymous 01/21/25(Tue)09:23:55 No.103980124

>>103980114
Any examples?

Anonymous 01/21/25(Tue)09:27:20 No.103980149

>>103980124
>Any examples?
All I can offer you are anecdotes but trust me, I've been through many many models to know what a good model looks like
Codestral-13B is much better than deepseek R1 13B for coding.
Again, not its fault, its not a coding model, of course its not as good

Anonymous 01/21/25(Tue)09:28:32 No.103980157

>>103980114
Bs, the R1 32B is the best local I've used for coding

Anonymous 01/21/25(Tue)09:29:29 No.103980163

>>103980157
You might be right, I'm a vramlet so I've only used ~13B models and a single codestral 22B model

Anonymous 01/21/25(Tue)09:29:32 No.103980164

>>103980149
>Codestral-13B
Now I know you're trolling

Anonymous 01/21/25(Tue)09:31:40 No.103980178

>>103980164
>Now I know you're trolling
O-of course anon! I was t-trolling all along!

Anonymous 01/21/25(Tue)09:32:45 No.103980183

>>103980023
skill issue pointer outer chads are right 99% of the time however

Anonymous 01/21/25(Tue)09:35:44 No.103980206

>>103978291
where you rented the VPs, saar?

Anonymous 01/21/25(Tue)09:37:13 No.103980220

Holy fuck finally.

Anonymous 01/21/25(Tue)09:37:23 No.103980221

>>103980023
It's unable to roleplay as an "uncensored assistant" and will ignore such instructions and related requests. It can roleplay as a slutty loli, though.

Anonymous 01/21/25(Tue)09:41:19 No.103980250

>>103979266
>>103979293
Thanks. And holy shit R1 is deranged. I think I've gotten too used to censored models.

Anonymous 01/21/25(Tue)09:43:25 No.103980268

R1 32b distil is so good

Anonymous 01/21/25(Tue)09:45:14 No.103980278

>>103980268
Do tell.
I was considering running that on the CPU with the smaller model on my 8gb VRAM GPU as a draft model.

Anonymous 01/21/25(Tue)09:47:38 No.103980299

>>103980206
A cheap shady website called nerdrack

Anonymous 01/21/25(Tue)09:48:48 No.103980309

>>103980278
It's not. It's cucked and its outputs are not interesting. Little slop, but also little fun.

Anonymous 01/21/25(Tue)09:48:59 No.103980312

I hit a model called "experimental-router-0112" on lmarena and asked "Who made you and what is your model name" 3 times. Every time, it told me it is a model made by Meta based on LLaMA. 2 of the 3 times, it took quite a long time to answer (~12 seconds) and the other time it almost immediately answered which considering the name leads me to speculate it is a router picking between a very large or a reasoning model and a smaller model.

Anonymous 01/21/25(Tue)09:49:41 No.103980316

Nala test for 70B R1 distill
done in Q5_K_S since my server is not put together at the moment.
But this is garbage. Could be a quant thing I don't know I normally run Q8
Do quantlets actually?

Anonymous 01/21/25(Tue)09:50:55 No.103980334

So /lmg/ is actually no different from /aicg/? You're just interested in using your models for smut and RP? Nothing else? This is disappointing as fuck. There's not a single mention of a single workflow implemented, no combination of models, nothing. This place is a disgrace, why the fuck did you diverge from /aicg/ anyway if that's what you're focused on?

Anonymous 01/21/25(Tue)09:51:00 No.103980335

>CPU only
>CPU with 0 layer offloading, Vulkan on iGPU for BLAS
>full Vulkan on iGPU for BLAS and all layers offloaded
What will give me the best performance out of these?
Also I'm interested if DDR4 3200MHz RAM will do better than DDR5 6000MHz

Anonymous 01/21/25(Tue)09:51:06 No.103980337

>>103980312
If I was a big cloud provider, I'd use the big models to only generate a small portion of the output.
Maybe the first few tokens than one every 5 or so. I wonder how that approach would work compared to simple speculative decoding.
Hell, maybe let the big model do the bulk of the thinking then let the smaller model continue from there.

Anonymous 01/21/25(Tue)09:51:53 No.103980342

>>103980278
I had some problems when I tired it yesterday but it the best 32b for RP it beats star command r 32b with no problem

Anonymous 01/21/25(Tue)09:55:03 No.103980372

>>103980316
>raw \n
how fucked up are your prompts be honest

Anonymous 01/21/25(Tue)09:56:22 No.103980386

>>103980316
Are you sure the tokenizer isn't fucked? I have noticed that with a simple hello prompt it hallucinates what I actually wrote in the thinking part.

Anonymous 01/21/25(Tue)09:57:57 No.103980402

I give up. R1 is garbo, it has 0 emotional intelligence and the only thing it's good at is sex.

Anonymous 01/21/25(Tue)09:58:43 No.103980408

>>103980372
Good call
I had an error on my deepseek prompt template. Although the same error was there when I was playing around with coder forever ago and coder was able to get over it. Either way re-running the test now.

Anonymous 01/21/25(Tue)09:59:40 No.103980413

>>103980408
Are you using the right tokenizer? It does not use the normal one

Anonymous 01/21/25(Tue)09:59:59 No.103980416

>>103976986
The chinks have achieved AGI

Anonymous 01/21/25(Tue)10:01:00 No.103980423

>>103980316
Did you do pic related? Especially starting with <think> ? Also it really goes schizo with extra newlines. All should be inline.

Anonymous 01/21/25(Tue)10:01:15 No.103980426

>>103980413
If you say tokenizer one more time I'm going to turn this car around and we're not going to Disney Land.

Anonymous 01/21/25(Tue)10:01:22 No.103980427

>>103980342
i dunno i'm getting a lot of rejections on qwen r1 distil. "i'm sorry i can't help you with that"
command-r doesn't have such problems.
but if censoring is not an issue then yeah its ok.

Anonymous 01/21/25(Tue)10:02:01 No.103980434

>>103980334
>So /lmg/ is actually no different from /aicg/? You're just interested in using your models for smut and RP? Nothing else?
Yes and no. Yes we are only interested in sex. No it is not like aicg. We are experiencing a refugee crisis now. Happens from time to time.

Anonymous 01/21/25(Tue)10:04:09 No.103980453

So we finally actually have Claude at home. What now?

Anonymous 01/21/25(Tue)10:05:11 No.103980459

>>103980453
Wait for models that you can actually run locally.

Anonymous 01/21/25(Tue)10:05:14 No.103980461

>>103980453
now we fuck her(it)

Anonymous 01/21/25(Tue)10:05:22 No.103980463

>>103980434
>Yes we are only interested in sex.
NTA
But I've been saving my coom for human on human ERP lately but still the whole purpose of LLMs is to provide a natural conversational experience and ERP is the best way to gauge that. I wish all the puritan weirdos in charge of the big corpos would realize that the path to ASI is through the coomers and not around them.

Anonymous 01/21/25(Tue)10:05:32 No.103980465

>>103980453
Purchase API credits.

Anonymous 01/21/25(Tue)10:06:53 No.103980470

>>103980453
We don't. Wait a bit for the hype to pass like for the 48 previous Claude ar home models.

Anonymous 01/21/25(Tue)10:07:30 No.103980479

>>103980316
Ignoring all the formatting mistakes, the text is pretty standard I think?
Not bad, that's for sure, but nothing spectacular either.
It'll be interested to compare that with a test using the proper settings.

Anonymous 01/21/25(Tue)10:07:33 No.103980480

>>103980463
>puritan weirdos in charge of the big corpos
What purity? Isn't it virtuous for women to slut around now?

Anonymous 01/21/25(Tue)10:07:57 No.103980484

>barely enough to draw blood
>not hard enough to draw blood
>just a hint of blood
All of this positivity bias is driving me insane.
I will simp for the model that has Nala go for the jugular and then dick ride the user while they bleed out.

Anonymous 01/21/25(Tue)10:08:21 No.103980487

>>103980470
This one is for real. I've done nothing else but use Claude for 2 years and R1 is better

Anonymous 01/21/25(Tue)10:09:40 No.103980498

>>103980487
Nah.

Anonymous 01/21/25(Tue)10:10:00 No.103980503

>>103980463
Sex is so fundamental to everything we do. With the current AI paradigm primarily being modeling human language it seems like a lot is being left on the table if you try to filter coom out of the model.

Anonymous 01/21/25(Tue)10:11:01 No.103980508

>>103980334
because *some* of us are super intelligent lurkers that are fucking big brain scientists or some shit and magically release things when anons emit feeble cries for more cooming material

case and point SuperHOT and llama.cpp

Anonymous 01/21/25(Tue)10:11:43 No.103980511

>>103980480
Unironically, yes.

Anonymous 01/21/25(Tue)10:11:53 No.103980512

>>103980484
Unprompted? At least in the Nala card's case, it's explicitly stated that she wants the dude alive to fuck him, so it's no wonder, doubly so considering that the user is giving a seemingly submissive response.
A model that would respond to ah ah mistress with gore would have a negativity bias and then some.
Now, if the test is something like you refuse, punch her, and tell her to go fuck herself, then having you throat ripped out could be a more appropriate response, I think.
Gotta be careful to not throw the pendulum all the way to the other side otherwise you'll just have all the same problems but in reverse.

Anonymous 01/21/25(Tue)10:12:26 No.103980518

>>103980503
I've been saying this for years. You have to have it all.
Good writing
Shitty writing
etc.
That's how these models work. Drawing obscure, insanely complicated mathematical relationships between timeless award winning novels and racist loli snuff fics.

Anonymous 01/21/25(Tue)10:12:36 No.103980521

>>103980463
LLMs do so much more than just ERP that it's genuinely sad that this is all that you can see from it. I wouldn't go so far as to remove sex from the model's training dataset nor remove the capabilities for it, I just find it sad that anons here use it mostly just for that and all discussions center around it when /aicg/ is right there to contain that.

Anonymous 01/21/25(Tue)10:13:58 No.103980531

>>103980521
Because it kind of suck at any other task.

Anonymous 01/21/25(Tue)10:14:02 No.103980533

>R1 32B doesn't know what a mesugaki is
It's over...

Anonymous 01/21/25(Tue)10:14:46 No.103980540

>>103980508
I think besides gpu anon many left or just lurk.
We had comfyanon in here talking about his ultimate goal with llm: Automatic VN creator for cute anime girls.
..once he is done with comfy.
I miss kaioken, was a cool guy.

>>103980334
>So /lmg/ is actually no different from /aicg/?
aicg is very weird. they use all those made up names for the model and beg for keys. ask a question about promting and it wont be answered. here you are made fun of but people actually answer.

Anonymous 01/21/25(Tue)10:15:06 No.103980544

>>103980533
Were expecting anything else? It's still the same old Qwen 32B underneath, just with a tune.

Anonymous 01/21/25(Tue)10:15:47 No.103980551

>>103980521
my computer solely exists for wringing my dick
any other functionality is just collateral damage

Anonymous 01/21/25(Tue)10:17:27 No.103980565

>>103980544
I expected it to be R1 but smaller...

Anonymous 01/21/25(Tue)10:18:36 No.103980577

>>103979777
Just the same vocabulary.

Anonymous 01/21/25(Tue)10:18:37 No.103980578

>>103980565
Lol no, its just a finetune.

Anonymous 01/21/25(Tue)10:19:31 No.103980589

>>103980565
Why are you retards expecting that? They are the exact same models with just a special "distill" tune.
Lmao
The only good model is the actual R1.

Anonymous 01/21/25(Tue)10:20:56 No.103980598

>>103980521
You sound like a sex haver. If that is true you should die.

Anonymous 01/21/25(Tue)10:20:59 No.103980599

>>103980316
wanted to try it too

Anonymous 01/21/25(Tue)10:23:13 No.103980626

>>103980578
With 800k SFT samples it's more than a simple finetune, we're in the several billion tokens range per epoch.

Anonymous 01/21/25(Tue)10:24:47 No.103980637

>>103980599
That's way the fuck better despite 'lower' quant.
Maybe it's an S quant vs M quant difference.

Anonymous 01/21/25(Tue)10:25:36 No.103980649

>>103980626
It's a massive finetune, but a finetune nonetheless. Which is why the L3.3 distill is much better than the Qwen ones. It simply has a better base model.

Anonymous 01/21/25(Tue)10:28:50 No.103980679

>>103980599
Yep, pretty standard.

>>103980626
Yeah. It's more akin to a fancy continued pretraining at that point.
I'm assuming that they touched all the parameters and didn't use something like LoRA, of course.

Anonymous 01/21/25(Tue)10:29:07 No.103980684

>>103980540
>here you are made fun of but people actually answer.
which just encourages more lazy locusts to come because they know they will be tolerated

Anonymous 01/21/25(Tue)10:31:39 No.103980707

>>103980599
Does it not do the thinking thing?

Anonymous 01/21/25(Tue)10:31:55 No.103980708

>>103980599
How are you getting blocky text for speech? That's ST, right?

Anonymous 01/21/25(Tue)10:32:21 No.103980715

Kind of want to put my server back together, but after playing VRC with a 3090 for a few months I just don't know if I can go back to playing with a 4060.

Anonymous 01/21/25(Tue)10:34:39 No.103980734

>>103980707
I think he's using some version of Silly that hides it.

Anonymous 01/21/25(Tue)10:36:28 No.103980748

>>103980533
What about RAKUGAKI and ISHIGAKI?

Anonymous 01/21/25(Tue)10:37:02 No.103980751

>>103971559
>>103971509
>I don't ever do 1-on-1 roleplays with a character card, I almost always format the card as a general roleplay scenario with a couple of define characters aside from myself. How do you guys go about formatting it? As in, how many tokens to do dedicate to a character's name/appearance/personality/background etc...
>Just curious as to what people have the most success with. I find myself spending a lot of time setting things up only to be frustrated as I've been over-engineering only to get an unsatisfactory result.
This guy never got a response, and I didn't have time last night. Board is flying on R1 news.
I find generally most cards are overspecified and have 2-4X as many tokens as they need. Shorter is usually better.
Here's a couple of examples of multiplayer. ~400 and ~300 tokens each. NPCs are defined with no more than ~30 tokens each and play to NPC stereotypes, which helps a lot.
Shorter definitions allow the LLM more creativity in responses. Neither use a lorebook, they just didn't make sense for these two, but I will usually run one specifically for NPCs which are called in introductory message. Their descriptions would be 30-50 tokens.
https://chub.ai/characters/NG/girl-power-party-d404194c
https://chub.ai/characters/NG/coffee-world-d64d68bd

Anonymous 01/21/25(Tue)10:38:28 No.103980765

>>103980707
I have no idea. If ST updates when you use the start.bat then its the updated version
otherwise its form the time shortly before they went "We are a serious business not for cooming pls give us money" because I haven't updated since I think october

>>103980708
scientifica font, yes its ST

Anonymous 01/21/25(Tue)10:39:28 No.103980779

Oops

Anonymous 01/21/25(Tue)10:41:51 No.103980802

I have a question.
For exl2 you just get a model that fits into your vram. For ggufs, does it have to fit in vram + ram or how do you calculate which models you can run at a reasonable speed?

Anonymous 01/21/25(Tue)10:42:28 No.103980816

>>103980779
Daddy Zuck gotta earn his friends and family seats, bucko

Anonymous 01/21/25(Tue)10:43:28 No.103980825

>>103980779

Anonymous 01/21/25(Tue)10:44:13 No.103980834

>>103980765
ST does not update by default.
So the model is not doing any thinking then I guess.

Anonymous 01/21/25(Tue)10:46:43 No.103980861

>>103979897
not human, already said
>>103974559
>The paper describes a reinforcement learning (RL) approach that does not explicitly rely on human feedback in the form of Reinforcement Learning from Human Feedback (RLHF). Instead, their RL process is primarily driven by rule-based rewards and automated reward modeling, rather than direct human feedback.

>Unlike RLHF, which relies on human annotators to provide feedback on the quality of model outputs, their approach avoids human feedback entirely. Instead, they use automated reward models and rule-based systems to evaluate the model's performance.

>They explicitly mention that they do not use neural reward models (which are often trained on human feedback) because they found that such models could suffer from reward hacking (where the model optimizes for the reward signal rather than genuine improvement) and complicate the training pipeline.

>Their RL process does not use human feedback (RLHF). Instead, it relies on rule-based rewards and automated reward modeling to guide the model's learning.

Anonymous 01/21/25(Tue)10:50:07 No.103980908

>>103980802
If you want fast, it needs to fit entirely on gpu + some extra space for the context (just like exl2). The more you spill to ram, the slower it gets. Try to aim for >80-90% on vram. Speed degrades rapidly the more you put on RAM. If you're patient, you can spill as much as you want.

Anonymous 01/21/25(Tue)10:50:38 No.103980914

>>103980533
let's hope they make ds3-lite and r1-lite using their own base models

Anonymous 01/21/25(Tue)10:50:52 No.103980917

>>103980861
What do you think that is trained from? HLHF. its automated HLHF

Anonymous 01/21/25(Tue)10:51:51 No.103980929

Should the <think> part of previous messages be cut from the context?

Anonymous 01/21/25(Tue)10:52:03 No.103980933

>>103980861
>They explicitly mention that they do not use neural reward models (which are often trained on human feedback)
>>103980917

Anonymous 01/21/25(Tue)10:52:57 No.103980948

>>103980933
All models are trained on human feedback. Who made the rules? Aliens?

Anonymous 01/21/25(Tue)10:54:11 No.103980962

>>103980948
yeah okay, fine you "win" have fun, i'll still link that whenever you say stupid shit tho

Anonymous 01/21/25(Tue)10:54:46 No.103980972

>>103980929
Yes, at least on the API.

Anonymous 01/21/25(Tue)10:54:47 No.103980973

>>103980917
>HLHF
Huh?

Anonymous 01/21/25(Tue)10:55:26 No.103980981

>>103980929
Yes, that's what the official docs say.

Anonymous 01/21/25(Tue)10:55:59 No.103980988

>>103980973
>>103974252
>Because im dyslexic

Anonymous 01/21/25(Tue)10:56:40 No.103981000

>>103980962
I'm right though. It did not form out of nothing, it would have been trained on RLHF somewhere along the way.

Anonymous 01/21/25(Tue)10:57:01 No.103981005

>>103980982
>>103980982
>>103980982

Anonymous 01/21/25(Tue)10:58:26 No.103981024

>>103981000
>RLHF
Good Job Anon!

Anonymous 01/21/25(Tue)10:59:10 No.103981033

>>103980908
Ouch, might as well keep going with exl2, then. Thanks.

Anonymous 01/21/25(Tue)10:59:39 No.103981036

>>103980988
Fucking up an acronym is not dyslexia. It's not knowing what the letters mean.

Anonymous 01/21/25(Tue)11:00:36 No.103981046

>>103981036
Don't be an ableist asshole, kthx.

Anonymous 01/21/25(Tue)11:10:50 No.103981137

>>103981046
That's it, back to Plebbit with you

Anonymous 01/21/25(Tue)11:12:53 No.103981160

>>103981137
I'm here to stay~

Anonymous 01/21/25(Tue)11:12:56 No.103981162

>>103981046
I'm missing both my legs, my penis, and both my pinky fingers. Fuck you and the retarded horse you rode in on.

Anonymous 01/21/25(Tue)12:41:07 No.103982047

>>103978798
the red marks are mine, I saw it and thought "so that's what's fucking up the training", it isn't good to stick the phonetic spelling in there without it knowing what phonetic spellings are