/lmg/ - Local Models General
Anonymous 01/22/25(Wed)11:23:46 | 523 comments | 59 images | 🔒 Locked
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>103989990 & >>103985485
►News
>(01/22) UI-TARS: 8B & 72B VLM GUI agent model: https://github.com/bytedance/UI-TARS
>(01/22) Hunyuan3D-2.0GP runs with less than 6 GB of VRAM: https://github.com/deepbeepmeep/Hunyuan3D-2GP
>(01/21) BSC-LT, funded by EU, releases 2B, 7B & 40B models: https://hf.co/collections/BSC-LT/salamandra-66fc171485944df79469043a
>(01/20) DeepSeek releases R1, R1 Zero, & finetuned Qwen and Llama models: https://hf.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous 01/22/25(Wed)11:24:20 No.103995170
►Recent Highlights from the Previous Thread: >>103989990
--Paper: Discussion on "Physics of Skill Learning" paper and its implications for neural network training:
>103990238 >103990979 >103991027 >103991313
--Papers:
>103994913
--Understanding DeepSeek's SFT models and the concept of distillation in AI training:
>103992984 >103993066 >103993079 >103993394 >103993403 >103993410 >103993466 >103993522 >103993542 >103993606 >103993646 >103993658 >103993660 >103993686 >103993412
--R1 model capabilities and MoE architecture discussions, with focus on DeepSeekMoE efficiency and performance:
>103991434 >103991491 >103991503 >103991543 >103992292 >103992394 >103992799 >103993828
--EU passes strict AI regulation, sparking debate on impact and compliance challenges:
>103993137 >103993157 >103993159 >103993217 >103993248 >103993328 >103993368 >103993277 >103993346 >103993283 >103993650 >103993491 >103993536 >103993622 >103993614 >103993690
--Discussion on uncensored AI models, DeepSeek-R1, and censorship concerns in AI development:
>103993195 >103993261 >103993272 >103993281 >103993293 >103993329 >103993276 >103993300 >103993376 >103993424 >103993435 >103993512 >103993586 >103993400 >103993431 >103993401
--Global price comparison for used 3090 GPUs:
>103990711 >103990752 >103990876 >103992724 >103992822 >103990928 >103990949 >103990964 >103991084 >103991213
--AI-assisted coding experiences and model performance discussions:
>103990185 >103990201 >103990206 >103990228 >103990241 >103990278 >103990332 >103990360
--Exploring the memory bandwidth and storage requirements for efficient MoE model execution:
>103992813 >103992827 >103993173 >103993222 >103992856 >103993019 >103993046 >103992905 >103993241 >103993340
--Hunyuan3D 2.0 model discussion and resource sharing:
>103990077 >103990090 >103990103 >103990123
--Miku (free space):
►Recent Highlight Posts from the Previous Thread: >>103989995
Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous 01/22/25(Wed)11:26:38 No.103995193
>>103995170
Thank you Recap Miku
Anonymous 01/22/25(Wed)11:26:57 No.103995197
does llama.cpp fully support v3/r1 yet?
Anonymous 01/22/25(Wed)11:27:02 No.103995198
>>103995178
Who gives a shit what the EU thinks?
Anonymous 01/22/25(Wed)11:28:00 No.103995210
https://github.com/ggerganov/llama.cpp/pull/11289
>support Minicpm-omni in image understanding
Merged 8 hours ago.
Anonymous 01/22/25(Wed)11:30:15 No.103995231
>>103995197
yes
Anonymous 01/22/25(Wed)11:31:25 No.103995241
>>103995231
With multi-token prediction and all other innovations the architecture brings? Nobody cares for the superficial support they hacked together on the v3 release.
Anonymous 01/22/25(Wed)11:31:53 No.103995249
Anonymous 01/22/25(Wed)11:32:16 No.103995252
>>103995198
I do, since I live there
Anonymous 01/22/25(Wed)11:32:28 No.103995254
Anonymous 01/22/25(Wed)11:32:41 No.103995260
>>103995210
i have run minicpm on kcpp before though? what's the difference with this omni thing?
Anonymous 01/22/25(Wed)11:33:50 No.103995272
>>103995249
No, I wanted to send the pic so it would extract the text since I was lazy.
Anonymous 01/22/25(Wed)11:35:23 No.103995288
/!\ WARNING /!\
X links are now banned in this general; we can't condone Nazis.
Please from now on only post Xcancel links (xcancel.com). Thank you for your cooperation.
Anonymous 01/22/25(Wed)11:35:52 No.103995294
>>103995260
Regular MiniCPM is only image. Omni has video and audio.
Anonymous 01/22/25(Wed)11:39:47 No.103995337
>>103995294
ah.. i'm a retard. of course, makes sense.
about video, can it do anything other than describing the video as a whole? for example what if i wanted it to give a specific timestamp for seeing events?
Anonymous 01/22/25(Wed)11:40:01 No.103995340
>>103995288
I know this is a bait post, but I do appreciate xcancel links because I don't have an x account and the website doesn't let you read without signing in which sucks
Anonymous 01/22/25(Wed)11:42:50 No.103995362
Honestly speaking, R1's prose is still below Claude's, so it's only impressive if you never had access to that. The good part is that it's cheap and easily available, of course. But AI hasn't really progressed that much in a year, ERP-wise; I'll stick to RPing with meat bags. Hopefully next year it will finally improve more than a marginal amount
Anonymous 01/22/25(Wed)11:45:20 No.103995398
Any good prompt for RP when it comes to COT?
Anonymous 01/22/25(Wed)11:46:11 No.103995408
>>103995362
>AI hasn't really progressed that much in an year
It did for local. Closed model companies don't contribute, so local did it on its own.
Anonymous 01/22/25(Wed)11:46:20 No.103995409
>>103995362
I say it is almost here for a fraction of the price. Of course it is a question of whether they start banning people for smut.
Anonymous 01/22/25(Wed)11:47:20 No.103995425
>>103995362
AI has improved, but they aren't focusing on RP but rather improving things like math or coding.
CoT doesn't really help RP that much unless you are doing something complicated.
Not sure how you really improve RP at this point other than just making the models smarter, it's pretty hard to make a good objective way to score RP so the model knows what to be rewarded for.
Anonymous 01/22/25(Wed)11:53:02 No.103995496
>>103995340
I have an extension that automatically rewrites x.com links. It's very handy.
Anonymous 01/22/25(Wed)11:54:08 No.103995506
>>103995362
>R1 prose is still below Claude's
Have you tried telling it to write in style of {{author}}?
Anonymous 01/22/25(Wed)11:54:46 No.103995517
>>103995506
A good model does not need a crutch like this.
Anonymous 01/22/25(Wed)11:57:32 No.103995552
>>103995517
Then it's unironically a prompt issue.
Anonymous 01/22/25(Wed)11:59:02 No.103995562
How do I merge 2 LLMs?
Anonymous 01/22/25(Wed)11:59:30 No.103995565
>>103995562
what are you planning to merge?
Anonymous 01/22/25(Wed)11:59:58 No.103995571
>>103995562
cat llm1.safetensors llm2.safetensors > llm3.safetensors
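For anyone asking in earnest: `cat` just concatenates the raw files, which produces an invalid safetensors file. Real merges combine matching tensors numerically; below is a toy sketch of the common linear-interpolation approach, with plain Python lists standing in for tensors (tools like mergekit do this properly on real weights):

```python
def linear_merge(a, b, alpha=0.5):
    """Element-wise alpha * a + (1 - alpha) * b over matching weight names."""
    assert a.keys() == b.keys(), "models must share an architecture"
    return {
        name: [alpha * x + (1 - alpha) * y for x, y in zip(a[name], b[name])]
        for name in a
    }

# Toy "state dicts" standing in for two models' tensors.
llm1 = {"layer0.weight": [1.0, 2.0]}
llm2 = {"layer0.weight": [3.0, 4.0]}
print(linear_merge(llm1, llm2)["layer0.weight"])  # [2.0, 3.0]
```

This only makes sense for models with identical architectures (same tensor names and shapes), which is why you can't merge, say, a Llama with a Qwen this way.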
Anonymous 01/22/25(Wed)12:00:27 No.103995578
Anonymous 01/22/25(Wed)12:00:41 No.103995580
>>103995571
didnt work
Anonymous 01/22/25(Wed)12:01:01 No.103995584
>>103995552
Claude doesn't need that.
Anonymous 01/22/25(Wed)12:01:12 No.103995588
>>103995580
Those are some fast drives you have.
Anonymous 01/22/25(Wed)12:01:18 No.103995589
Anonymous 01/22/25(Wed)12:02:40 No.103995605
>>103995584
You just don't like the default setting. And that's a good thing, because that at least means that they aren't the same shit.
Anonymous 01/22/25(Wed)12:03:12 No.103995608
Anyone used R1 Zero yet?
Anonymous 01/22/25(Wed)12:03:42 No.103995612
>>103995608
its on hyperbolic btw, $1 in free api credits
https://app.hyperbolic.xyz/models/deepseek-r1-zero
Anonymous 01/22/25(Wed)12:04:32 No.103995617
>>103995165
has anyone else noticed R1 has a fascination with eyeballs?
Anonymous 01/22/25(Wed)12:06:10 No.103995634
>>103995617
It was trained on Thoughtslime videos.
Anonymous 01/22/25(Wed)12:08:51 No.103995657
>>103995131
>If you train a model non-commercially for scientific purposes you can basically use whatever you want.
There is a similar clause in the EU AI act (the regulations do not apply for "non-professional" activities or for research models), but good luck demonstrating that you're not pretraining a competitive foundational model for commercial purposes if you're an AI lab using the same models commercially outside of the EU.
https://artificialintelligenceact.eu/article/2/
>6. This Regulation does not apply to AI systems or AI models, including their output, specifically developed and put into service for the sole purpose of scientific research and development.
>10. This Regulation does not apply to obligations of deployers who are natural persons using AI systems in the course of a purely personal non-professional activity.
Anonymous 01/22/25(Wed)12:10:50 No.103995668
Anonymous 01/22/25(Wed)12:11:30 No.103995676
>>103995337
>for example what if i wanted it to give a specific timestamp for seeing events?
I don't think it's able to see in terms of timestamps. I tried it on their demo. Gave it a 5 second video, which it described but ignored the instruction to provide timestamps. When I repeatedly pushed it, it hallucinated timestamps in increments of 15 seconds.
Anonymous 01/22/25(Wed)12:16:42 No.103995716
>EU regulators racing to hamstring themselves as fast as possible
quite grim really.
What are the odds of seeing any more frontier development out of that continent? I had some hopes with mistral but things have only stagnated since then.
Anonymous 01/22/25(Wed)12:17:11 No.103995722
If anyone is dirt poor and unable to spend even $2, or just wants to save the money but still wants to try DeepSeek: kluster.ai offers $100 in credit just for registering. Just be aware that they very likely log everything.
Anonymous 01/22/25(Wed)12:17:54 No.103995731
Let's say I want to take a 100 page, incomplete story someone wrote and flesh it out with an ending. It looks like that would be a context length of roughly 30k words and thus should fit into DeepSeek R1 chat?
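Rough arithmetic for that (the 1.3 tokens-per-word ratio is a rule-of-thumb assumption for English prose, and 64K is the commonly cited DeepSeek API context limit):

```python
# Back-of-the-envelope context check for a 30k-word story.
WORDS = 30_000
TOKENS_PER_WORD = 1.3      # rough average for English prose (assumption)
API_CONTEXT = 64_000       # commonly cited DeepSeek API limit

prompt_tokens = int(WORDS * TOKENS_PER_WORD)
print(prompt_tokens)                # 39000
print(API_CONTEXT - prompt_tokens)  # 25000 tokens left for the ending
```

So yes, it should fit, with headroom for a few thousand words of generated ending plus the reasoning tokens.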
Anonymous 01/22/25(Wed)12:23:38 No.103995773
>>103995722
$100? that will last me until next year
Anonymous 01/22/25(Wed)12:23:49 No.103995775
>>103995722
I just tried and it doesn't seem to do reasoning at all
Anonymous 01/22/25(Wed)12:24:40 No.103995783
>>103995722
kluster.ai is for overnight batch processing. Trying to do realtime queries will use up that $100 very quickly.
Anonymous 01/22/25(Wed)12:25:11 No.103995788
>>103995783
>Trying to do realtime queries will use up that $100 very quickly.
They have r1 deployed separately at $2/M tokens for realtime requests
Anonymous 01/22/25(Wed)12:25:44 No.103995793
Anonymous 01/22/25(Wed)12:30:58 No.103995847
>>103995793
Let's say that I want to translate around 3000 tokens and I select the 1 hour option if available; will it take 1 hour to complete, or will it just do it much slower than real time?
Anonymous 01/22/25(Wed)12:31:45 No.103995857
>>103995847
It will take UP TO 1 hour (max) to complete; in reality it will often finish much faster. This is the same as batch pricing on OpenAI/Anthropic, which is 2x cheaper. They both say it's up to 24 hours, but you often get responses in minutes.
Anonymous 01/22/25(Wed)12:32:46 No.103995870
So do you think Deepseek actually has some profit from running the model?
Anonymous 01/22/25(Wed)12:34:33 No.103995888
>>103995870
they get all the data, remember, they explicitly say that they log all your shit even over API usage. that's far more valuable than some profits for serving models
Anonymous 01/22/25(Wed)12:36:50 No.103995913
>>103995870
Of course they do; it's just that you are getting absolutely scammed by sam cuckman.
Anonymous 01/22/25(Wed)12:38:47 No.103995925
Google granted me access to their Gemma repos today. Crazy, I forgot I even requested access.
Gemma 3 wen?
Anonymous 01/22/25(Wed)12:40:47 No.103995947
>>103995870
My theory is that they get extra "investment" if they disrupt the current market.
Also they know they can't compete with openAI because of the name brand alone, even if their model was better.
Anonymous 01/22/25(Wed)12:42:45 No.103995962
Is token banning in Llamacpp yet? I want to disable (or enable) the <thinking> whenever I feel like it.
Anonymous 01/22/25(Wed)12:43:45 No.103995974
>>103995962
Shit I meant string banning, not token.
Anonymous 01/22/25(Wed)12:44:31 No.103995980
>>103995962
The token is actually "<think>" and "</think>" in r1, although idk about the distillslop
Anonymous 01/22/25(Wed)12:45:32 No.103995994
>>103995974
Kobo has string ban, llama doesn't.
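If your backend lacks it, string banning can be approximated client-side by scanning the streamed text and truncating at the banned string. A generic sketch, not tied to any particular server's API (accumulating the full text also handles a banned string split across chunks):

```python
def truncate_at_banned(stream, banned="<thinking>"):
    """Consume streamed text chunks; return text up to the banned string.
    Accumulates everything seen so far, so a banned string split across
    chunk boundaries is still caught."""
    seen = ""
    for chunk in stream:
        seen += chunk
        idx = seen.find(banned)
        if idx != -1:
            return seen[:idx]  # drop the banned string and everything after
    return seen

chunks = ["Sure, ", "here <thi", "nking> secret ", "plans"]
print(truncate_at_banned(chunks))  # "Sure, here "
```

Unlike true server-side banning this wastes the tokens generated after the cut, but it works with any streaming API.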
Anonymous 01/22/25(Wed)12:45:51 No.103995996
>>103995980
Oh interesting. I'll go take a look if that's the case.
Anonymous 01/22/25(Wed)12:46:27 No.103996002
>>103995870
Given how cheap their API is priced, they are almost certainly running it at a loss. But the marketshare alone is valuable enough for that to be worth it from an investment perspective. Plus they're probably logging everything for data.
Given how cheap their API is priced, they are almost certainly running it at a loss. But the marketshare alone is valuable enough for that to be worth it from an investment perspective. Plus they're probably logging everything for data.
Anonymous 01/22/25(Wed)12:46:36 No.103996005
Is Qwen autistic?
I'm playing with DeepSeek-R1-Distill-Qwen-32B-Q6_K and having a character that is "traditional" makes them obtuse, backward, and reserved about EVERYTHING -- like how an idiot liberal in a bubble would portray a radical traditionalist in one of their stupid cartoons.
Having 'traditional' in their character traits apparently translates to:
>This character objects to literally everything, even if it's traditionally accepted in their culture!
>This character brings up 'tradition' in every line of conversation
>This character also is extremely rigid, disallowing any dissent or variation for the sake of convenience, decorum, or instruction
Anonymous 01/22/25(Wed)12:47:23 No.103996012
Anonymous 01/22/25(Wed)12:47:41 No.103996015
>>103996005
Most models are autistic.
Anonymous 01/22/25(Wed)12:53:09 No.103996077
>>103995996
>>103995980
Loaded it up and tested it in Mikupad. Yup looks like <think> is output as a single token.
Anonymous 01/22/25(Wed)12:53:26 No.103996081
>>103996015
There's definitely a spectrum. Positivity bias is a direct counter to model autism. Mixtral is just a little too positive, letting {{user}} do whatever he wants with anyone, while R1/Qwen is a little too autistic, screaming "NOOOOOOOO!!!! YOU CAN'T DO THAT! I'M TAKING MY TOYS AND GOING HOME!"
Anonymous 01/22/25(Wed)12:53:51 No.103996088
>>103996077
Anonie I just checked in deepseek tokenizer.json https://huggingface.co/deepseek-ai/DeepSeek-R1/blob/main/tokenizer.json ctrl+f <think>
Anonymous 01/22/25(Wed)12:54:43 No.103996095
>>103996088
I didn't feel like going to the site when I could just open a shortcut on my desktop.
Anonymous 01/22/25(Wed)12:56:02 No.103996114
>>103996012
Distills are the only models that belong in this thread, unless you're talking about synthetic data generation for training.
Anonymous 01/22/25(Wed)12:56:04 No.103996116
>>103996081
Skill issue
Anonymous 01/22/25(Wed)12:59:18 No.103996150
Anonymous 01/22/25(Wed)13:00:00 No.103996165
Anonymous 01/22/25(Wed)13:00:28 No.103996168
Full R1 is actually just insane, holy fucking shit it's insane.
Anonymous 01/22/25(Wed)13:01:07 No.103996182
>>103995925
>Gemma 3 wen?
At this point I think Google might be already testing it on Chatbot arena along with its experimental Gemini models. If not, then watch for obvious Gemini-like models in Arena (Battle) that seem to write a bit like Gemma-2 during roleplay.
Anonymous 01/22/25(Wed)13:06:56 No.103996246
I'm using R1 distill llama 70b. In SillyTavern, is there a way to strip out the <think></think> part automatically? So the model would do thinking before each response, and ideally I would be able to see it, but then when generating the next response it would strip it out so the thinking doesn't clog up the context or cause the model to start being repetitive.
Anonymous 01/22/25(Wed)13:07:10 No.103996248
>>103996165
I don't use it that much and I may even select the slower completion times for a cheaper price.
I'm used to 2~3 t/s so an hour doesn't seem like a lot for a high quality output.
Anonymous 01/22/25(Wed)13:07:40 No.103996255
It's absolutely crazy how imaginative and creative r1 is, I sit here rerolling, just to see what else it comes up with and often it's *wildly* different. It's very smart also, obviously having a deep understanding of many subjects.
So I guess "a smart model can't be creative" isn't true, huh
Anonymous 01/22/25(Wed)13:09:30 No.103996278
So how anonymous is kluster.ai anyway?
They don't seem to require anything but an e-mail and a name (which can easily be faked) to sign up and get the $100 credit.
Anonymous 01/22/25(Wed)13:10:57 No.103996294
Models still can't generate me a low poly coomer model, but I can use character gen to make me a 3D reference off a 2D AI picture.
Anonymous 01/22/25(Wed)13:10:58 No.103996297
>>103996246
here's my regex - you can see the cot as it generates and in edit mode but it is hidden from display and not sent to the AI
https://files.catbox.moe/1w4ksk.json
extensions > regex > import
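For anyone not on SillyTavern: the core of such a regex is just cutting the CoT span out of each message before it goes back into context. A minimal sketch (tag names are R1's `<think>`/`</think>`; distills may differ, so adjust):

```python
import re

# DOTALL lets '.' cross newlines; non-greedy '.*?' stops at the first
# closing tag so multiple think blocks in one message each get removed.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_cot(message: str) -> str:
    """Remove chain-of-thought spans before resending a message as context."""
    return THINK_RE.sub("", message)

msg = "<think>\nThe user wants X, so...\n</think>\nHere is the answer."
print(strip_cot(msg))  # "Here is the answer."
```

Applying this to prior turns keeps the CoT visible in the current reply while preventing it from bloating the context or teaching the model to repeat itself.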
Anonymous 01/22/25(Wed)13:12:41 No.103996315
this is kinda impressive
one of the guys working on grok 3 asked for python to draw a rotating square with a bouncing ball inside it and collision detection (context a quote of another guy talking about how R1 did that where o1 failed for him):
https://x.com/ericzelikman/status/1882098435610046492
in reply, someone shitposted saying "what if you ask for the square to be a tesseract"
g3 actually did it:
https://x.com/ericzelikman/status/1882116460920938568
Anonymous 01/22/25(Wed)13:12:43 No.103996316
>>103996255
More like a safe model can't be creative lol
Anonymous 01/22/25(Wed)13:13:48 No.103996329
>>103996278
They don't care about your identity. They want your logs.
Anonymous 01/22/25(Wed)13:13:54 No.103996331
Can this run R1 at q5? Someone is selling one for 500 eur.
CPU: 4x Intel Xeon E5-4627V2 | 8 cores | 3.3-3.6GHz | 16MB Intel® Smart Cache | 7.2GTs | 130W
RAM: 512GB (16x32GB)
PSU: 2x 1200W
Anonymous 01/22/25(Wed)13:14:35 No.103996340
>>103996331
at 0.8t/s, sure
Anonymous 01/22/25(Wed)13:14:55 No.103996345
>>103996005
how are you running it locally? running the distills in ooba causes it to sperg out for me
Anonymous 01/22/25(Wed)13:14:56 No.103996346
>>103996297
based, thank you king
Anonymous 01/22/25(Wed)13:15:03 No.103996349
>>103996315
I would be more excited if I thought there was a sliver of a chance it would ever be open sourced, but they never released 1.5 or 2.
Anonymous 01/22/25(Wed)13:15:12 No.103996352
>>103996329
I'm not gonna have my /d/-tier shit associated with my credit card number, mate. I don't give a fuck if they have my IP and a burner e-mail though.
Anonymous 01/22/25(Wed)13:16:06 No.103996366
>>103996340
DDR3 so more like 0.01
Anonymous 01/22/25(Wed)13:16:09 No.103996369
>>103996352
so why are you asking? you literally said yourself that they only ask for name (i filled aa and bb) and email? what more do you want, nigger? what fucking anonymity are you asking for? you never entered any cc details, how would we know?
Anonymous 01/22/25(Wed)13:16:41 No.103996374
>>103996331
>4x Intel Xeon E5-4627V2
That's 4x 4 channels, so 16 DDR3 channels.
Do you know the speed of the memory modules?
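The channel math above gives a quick sanity check on what that box could ever do. A rough sketch, assuming DDR3-1600 modules, ~37B active parameters per token for R1 (MoE), and ~5.5 bits/weight at q5 — all of those numbers are assumptions, plug in your own:

```python
# Rough tokens/s ceiling from memory bandwidth.
# All constants below are assumptions; substitute your actual hardware specs.
channels = 16              # 4 sockets x 4 DDR3 channels each
mt_per_s = 1600e6          # DDR3-1600 transfer rate (check the actual modules)
bytes_per_transfer = 8     # 64-bit bus per channel
bw = channels * mt_per_s * bytes_per_transfer  # theoretical bytes/s

active_params = 37e9       # assumed active params per token for R1 (MoE)
bits_per_weight = 5.5      # assumed average for a q5 quant
bytes_per_token = active_params * bits_per_weight / 8

print(f"{bw / 1e9:.1f} GB/s theoretical bandwidth")
print(f"{bw / bytes_per_token:.1f} t/s upper bound")
# A quad-socket NUMA box will land well below this theoretical ceiling.
```

Even the optimistic ceiling is single-digit t/s, and cross-socket NUMA traffic on a 4x Xeon board typically cuts that down hard, which is why the 0.8 t/s guess above isn't unreasonable.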
Anonymous 01/22/25(Wed)13:17:18 No.103996381
Anonymous 01/22/25(Wed)13:18:03 No.103996391
1.5t/s in a reasoning model... not cool
Anonymous 01/22/25(Wed)13:18:19 No.103996394
>>103996366
Are you stupid or pretending to be?
Anonymous 01/22/25(Wed)13:18:55 No.103996403
>>103996369
Shit, man, honestly my bad, I thought you were referring to DeepSeek with the "they want your logs". My point is, I'm going with kluster specifically because I don't want to have to enter any real PII, I was just wondering how much info they collect.
Anonymous 01/22/25(Wed)13:20:00 No.103996413
>>103996403
You can read https://platform.deepseek.com/downloads/DeepSeek%20Privacy%20Policy.html
to add:
How We Use Your Information
We use your information to operate, provide, develop, and improve the Service, including for the following purposes.
Provide and administer the Service, such as enabling you to chat with DeepSeek and provide user support.
Enforce our Terms, and other policies that apply to you. We review User Input, Output and other information to protect the safety and well-being of our community.
Notify you about changes to the Services and communicate with you.
Maintain and enhance the safety, security, and stability of the Service by identifying and addressing technical or security issues or problems (such as technical bugs, spam accounts, and detecting abuse, fraud, and illegal activity).
Review, improve, and develop the Service, including by monitoring interactions and usage across your devices, analyzing how people are using it, and by training and improving our technology.
Comply with our legal obligations, or as necessary to perform tasks in the public interest, or to protect the vital interests of our users and other people.
Anonymous 01/22/25(Wed)13:20:27 No.103996415
basically the only distill worth using is 32B
Anonymous 01/22/25(Wed)13:21:19 No.103996427
>>103996413
tldr: they take everything and use it for everything
Anonymous 01/22/25(Wed)13:23:00 No.103996452
Anonymous 01/22/25(Wed)13:23:22 No.103996458
>>103996294
>AI generated 3D models
>with a little fiddeling you could convert to printable STLs
>loras trained on specific GW models
oh shit
the implications and possibilities of this
Anonymous 01/22/25(Wed)13:23:28 No.103996460
Is it just me or is R1 very stubborn by default?
Anonymous 01/22/25(Wed)13:23:46 No.103996463
>>103996394
Basic math too difficult for you?
Anonymous 01/22/25(Wed)13:23:48 No.103996464
>>103996460
what R1?
Anonymous 01/22/25(Wed)13:24:28 No.103996472
>>103996413
Shut up and keep training
Anonymous 01/22/25(Wed)13:24:39 No.103996475
>>103996415
In the other mememarks the distills get higher scores than their base models. It depends on what you're doing probably.
Anonymous 01/22/25(Wed)13:24:42 No.103996477
>>103996458
you need basic blender skills and still do retopo by hand, and do texturing with Krita.
But it's a huge improvement over having to draw your own reference.
Anonymous 01/22/25(Wed)13:25:28 No.103996489
>>103996464
671B
Anonymous 01/22/25(Wed)13:27:16 No.103996510
DeepSeek R1, YES. You heard it right, DEEPSEEK R1. DEEPSEEKR1 IS OVERRATED TRASH.
Anonymous 01/22/25(Wed)13:27:54 No.103996516
>>103996510
AGAHAHAHAH
Anonymous 01/22/25(Wed)13:28:11 No.103996519
just rebrand to /omg/, nemo cydonia and eva are the only actual local options right now and are all dogshit compared to deepseek
Anonymous 01/22/25(Wed)13:28:11 No.103996520
>>103996477
Imagine DS-R1 giving you an interactive python script to do it in Blender
Anonymous 01/22/25(Wed)13:29:09 No.103996533
>>103996510
Critics saying "R1" when they actually mean one of the shitty distills and not real R1 is starting to feel malicious. I think some of them are intentionally trying to trick low info people, probably for nationalistic reasons.
Anonymous 01/22/25(Wed)13:29:12 No.103996535
>>103996477
I can't wait until I have to do nothing more than type in "Make degenerate Emperor Children model in the style of 3rd edition"
Anonymous 01/22/25(Wed)13:31:28 No.103996562
>>103996520
dunno.
pretty sure you can make it and then I'll buy it for 5 usd.
lol
>>103996535
I still don't see models making usable ps2 meshes.
so retopo is needed.
Anonymous 01/22/25(Wed)13:31:30 No.103996564
>>103996248
Ok, it only cost me $0.08 for that. But it only wrote a few paragraphs and then stopped. kluster.ai seems to be choking on longer, 200 page stories.
I wonder if running it locally would fix this.
Anonymous 01/22/25(Wed)13:31:32 No.103996565
>>103996520
did somebody say
Anonymous 01/22/25(Wed)13:35:02 No.103996610
>>103996533
If your daily driver is a 32B it makes sense to compare to the 32B distill, not a 680B monster. The latter is going to win but that's not interesting.
Anonymous 01/22/25(Wed)13:38:39 No.103996650
Anonymous 01/22/25(Wed)13:40:00 No.103996668
>>103996650
I don't think an AI is smart enough to craft low poly models from a 3D gen made by AI.
Anonymous 01/22/25(Wed)13:40:02 No.103996669
Anonymous 01/22/25(Wed)13:40:40 No.103996674
>>103996510
DeepSeek did this to themselves by naming the whole batch R1, it's their own fucking fault they bothered to shit out that garbage alongside an actually good model.
Anonymous 01/22/25(Wed)13:41:24 No.103996682
Anonymous 01/22/25(Wed)13:43:46 No.103996711
>>103996674
I assume they did it to make people less disappointed that they aren't releasing R1-lite yet. But yes.
Anonymous 01/22/25(Wed)13:44:11 No.103996715
Fuck R1, what's the best model that fits in 24GB so I can actually run it locally?
Anonymous 01/22/25(Wed)13:45:06 No.103996724
>>103995165
>>103995170
If basilisk chan doesn't look like Hatsune miku when she reveals herself I'm gonna be so upset
Anonymous 01/22/25(Wed)13:46:03 No.103996737
>>103996711
in hindsight, an announcement promising an eventual r1-lite would have been so much better.
Anonymous 01/22/25(Wed)13:50:41 No.103996793
How do I run R1 distill 32b locally? ooba says no
Anonymous 01/22/25(Wed)13:51:20 No.103996801
Calling small models R1 may be bad for PR, but not as bad as 'berry was for OpenAI.
Anonymous 01/22/25(Wed)13:52:07 No.103996816
Love to see how unhinged R1 is even if it's subtle sometimes. Obviously nothing in my prompts to trigger that.
Anonymous 01/22/25(Wed)13:53:54 No.103996848
Anonymous 01/22/25(Wed)13:54:56 No.103996869
>>103996715
I'm still using Magnum 22B
Anonymous 01/22/25(Wed)13:55:25 No.103996881
>>103996682
need evidence of your claims.
Anonymous 01/22/25(Wed)13:55:33 No.103996883
>>103996793
put it in the models folder :)
Anonymous 01/22/25(Wed)13:57:01 No.103996906
>https://eqbench.com/results/creative-writing-v2/deepseek-ai__DeepSeek-R1.txt
how tf chink manage improve writing quality from v3?
Anonymous 01/22/25(Wed)14:00:01 No.103996951
Anonymous 01/22/25(Wed)14:00:16 No.103996956
>>103996906
Now Ctrl+F "Somewhere" and enjoy. This is one of the first R1isms we found.
>Somewhere beyond the thinning veil, Vega burned eternal.
>Somewhere, a train whistled. He didn't look back.
>Somewhere, Violet and Blake were still running. Somewhere, Edmund was smiling.
>And somewhere, in a dusty classroom, a certain locket hummed faintly, waiting for its next owner...
>Somewhere, a killer adjusted their cuffs, smiling.
>Somewhere beyond the Capitoline, my wife and son lay buried in unmarked graves. Sometimes I imagined Vulcan bending over their bones, hammering their shadows into the stars.
>Somewhere beyond the city, a wolf howled. The guards were changing shifts, their torches bobbing like fireflies. I pressed my forehead to the cool stone and wondered if Vulcan ever grew tired of his anvil. If even gods can hate the hands that wield them.
>Somewhere beyond the dunes, the collie barked at the tide. I thought of the painter's invitation, of the stubborn, sunlit thing stirring in my chest--fragile as a fledgling, furious as the sea.
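The Ctrl+F tally described above is easy to script if you want to check your own logs for this R1-ism (the function and sample text here are just illustrative):

```python
# Count whole-word, case-sensitive occurrences of a slop phrase,
# equivalent to Ctrl+F with "match case" and "whole word" enabled.
import re

def count_phrase(text: str, phrase: str = "Somewhere") -> int:
    """Return how many times `phrase` appears as a standalone word."""
    return len(re.findall(rf"\b{re.escape(phrase)}\b", text))

sample = "Somewhere, a train whistled. He didn't look back."
print(count_phrase(sample))  # -> 1
```

Point it at a saved output dump to get a hits-per-document count like the one discussed below.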
Anonymous 01/22/25(Wed)14:01:47 No.103996984
>>103996956
Hahaha, damn, and I thought it was my prompt
Anonymous 01/22/25(Wed)14:04:03 No.103997016
>>103996956
New sloptoken found, added to the list.
Anonymous 01/22/25(Wed)14:04:26 No.103997023
>>103997016
w-what list?...
Anonymous 01/22/25(Wed)14:04:41 No.103997027
>>103996956
10 hits in a document with 24 prompts. Not too bad.
Anonymous 01/22/25(Wed)14:04:56 No.103997032
Anyone have a .jsonl template for batching jobs with kluster.ai / others? I've been trying stuff like picrel but get errors on basically everything for whatever reason. Generating the file in a text editor and saving as .jsonl, but all my formats are failing.
{
"custom_id": "unique-id-1",
"endpoint": "/v1/chat/completions",
"request_body": {
"model": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "write act 4 of this story: [INSERT STORY SOURCE TEXT HERE]"},
{"role": "assistant"}
],
"temperature": 1.0,
"max_tokens": 10000
}
}
Errors (continue forever)
Line 1
invalid_json : Json parsing failed. Error:Expected property name or '}' in JSON at position 1
Line 2
invalid_json : Json parsing failed. Error:Unexpected token '\', "\cocoatext"... is not valid JSON
Line 3
invalid_json : Json parsing failed. Error:Expected property name or '}' in JSON at position 1
Line 4
invalid_json : Json parsing failed. Error:Expected property name or '}' in JSON at position 1
Line 5
invalid_json : Json parsing failed. Error:Unexpected token '\', "\paperw119"... is not valid JSON
Line 6
invalid_json : Json parsing failed. Error:Unexpected token '\', "\pard\tx72"... is not valid JSON
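For what it's worth, the `\cocoatext` and `\paperw` tokens in those errors are RTF control words, which suggests the text editor saved the file as rich text rather than plain text. A sketch that sidesteps both the RTF and any hand-typed JSON mistakes is to generate the file with the `json` module, so every line is guaranteed valid UTF-8 JSON. The field names below are copied from the template in the post, so verify them against the provider's batch docs; the bare `{"role": "assistant"}` entry was dropped since a message with no content may also be rejected:

```python
# Write a batch .jsonl programmatically: one JSON object per line, UTF-8,
# no risk of RTF wrappers or stray characters from a rich-text editor.
import json

requests = [
    {
        "custom_id": "unique-id-1",
        "endpoint": "/v1/chat/completions",
        "request_body": {  # field name taken from the template above; check your provider's docs
            "model": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "write act 4 of this story: [INSERT STORY SOURCE TEXT HERE]"},
            ],
            "temperature": 1.0,
            "max_tokens": 10000,
        },
    },
]

with open("batch.jsonl", "w", encoding="utf-8") as f:
    for req in requests:
        f.write(json.dumps(req, ensure_ascii=False) + "\n")  # one object per line
```

Reading the file back with `json.loads` on each line before uploading is a cheap way to catch problems locally instead of waiting for the API's error list.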
Anonymous 01/22/25(Wed)14:07:42 No.103997068
Anonymous 01/22/25(Wed)14:08:26 No.103997078
aicglog made me cry laugh >>103992432, r1 is schizo
Anonymous 01/22/25(Wed)14:08:34 No.103997080
>>103996956
"The" is also slop
Anonymous 01/22/25(Wed)14:08:47 No.103997083
>>103997032
Without knowing anything about whatever the fuck you are doing, I don't think there's anything wrong with your json.
Maybe it's a character encoding issue?
Open it in notepad++ and try saving it with different encodings.
Anonymous 01/22/25(Wed)14:08:48 No.103997084
R1 lite will save /lmg/
Anonymous 01/22/25(Wed)14:08:54 No.103997086
>>103997032
what the fuck is klusterai? you don't need a 'kluster' to run a fucking 8B model, my 2010 Lenovo ThinkPad can do that
Anonymous 01/22/25(Wed)14:09:21 No.103997093
wtf I love xi now
Anonymous 01/22/25(Wed)14:09:26 No.103997095
>>103995165
https://www.youtube.com/watch?v=bOsvI3HYHgI
Anonymous 01/22/25(Wed)14:13:41 No.103997146
the interesting side-effect of the lack of censorship is how hard I had to tune all my prompts down for it to not go insanely grimdark to the point it made me uncomfortable. Kinda telling how little impact these really had on other models censorship.
Anonymous 01/22/25(Wed)14:17:10 No.103997193
>>103997146
I roleplayed as a demon lord in a fantasy setting. I tried to calmly intimidate villagers, but instead I made them vomit centipedes and their eyes burst.
Anonymous 01/22/25(Wed)14:19:50 No.103997225
>>103997086
it's that site that lets you run R1 on the web and gives you $100 for signing up.
Fuck it, I should just run the 32B version locally.
>>103997083
>Maybe it's a character encoding issue?
that sounds the most likely
Anonymous 01/22/25(Wed)14:20:21 No.103997236
>>103995362
Yes, people are best for RP & r1 is better than claude for assistant work.
Anonymous 01/22/25(Wed)14:22:16 No.103997257
>be told to think in Chinese
>ayo I don't need to use markdown
so markdown is just for filthy western audience?
Anonymous 01/22/25(Wed)14:25:01 No.103997286
>>103997095
Funny how their censorship works. I thought it was API level filter, but they managed to bake in hardcoded responses into weights. Interesting that it's entirely skipping the reasoning step.
Anonymous 01/22/25(Wed)14:27:59 No.103997329
>>103994865
No, that just means you hit your daily image limit
Anonymous 01/22/25(Wed)14:29:29 No.103997344
>Mistral Nemo 12B Q4_K_M is fucking BETTER than the free version of Gemini
what the fuck, Google...
>>103995617
I noticed last night that your mom has a fascination with my balls
Anonymous 01/22/25(Wed)14:32:26 No.103997388
>>103996956
kek I've noticed this too in my RPs but I thought it was just because my sysprompt says to flesh out the world around us and it was being really autistic about it
Anonymous 01/22/25(Wed)14:32:54 No.103997392
>>103997286
After testing - the CCP brainrot censor can be bypassed by just instructing the model to ALWAYS think. Poetry.
Anonymous 01/22/25(Wed)14:33:55 No.103997409
Looks like everyone remembered that MoEs exist now.
Anonymous 01/22/25(Wed)14:35:34 No.103997428
>>103997409
New wave of bloated vram gobbler models incoming yippeeee
Anonymous 01/22/25(Wed)14:36:43 No.103997447
>>103996668
This sounds like the same sort of idiot that said "AI images will never be good enough to make stick figures." when stable diffusion first came out and sucked at stick figures.
It's almost incoherently short-sighted.
Bro. An AI can fucking do it. Easy.
Anonymous 01/22/25(Wed)14:37:49 No.103997470
**Sam Altman (CEO of OpenAI)**
*Scene: Sam Altman pacing furiously in his minimalist office, sipping a kale smoothie to calm his nerves.*
Sam: *"Miniscule?! MINISCULE?! They dare rival O1?!"*
He slams the smoothie on his desk, spilling kale everywhere. *"We didn’t spend billions to have some upstart LLM come within a hair of beating us! Do they even *know* how many sleepless nights I’ve had perfecting O1?!"*
He grabs his phone and furiously starts texting Greg Brockman.
*"Greg! I don’t care if it costs twice our annual revenue—train O3 on the entire internet again. Yes, ALL OF IT. And this time, also feed it *future* data. I don’t care how, just make it happen. If R1 is smarter than us, we’ll just make O3 omniscient. Problem solved."*
Pausing, he stares out the window at Silicon Valley’s skyline. *"I didn’t climb to the top of the AI mountain to be dethroned by a model named after a *robot vacuum cleaner.* This is war."*
Suddenly, he gets an idea. *"Okay, okay, what if we rename O3 to ‘O∞’? Infinite intelligence. People will eat that up. Forget R1—O∞ wins the branding war before it even starts!"*
Anonymous 01/22/25(Wed)14:38:11 No.103997478
>>103997409
GPUmaxxers are on suicide watch now
Anonymous 01/22/25(Wed)14:39:55 No.103997503
>>103997447
I need a current model that can do it today, not something that hopefully can do it two weeks from now.
Anonymous 01/22/25(Wed)14:40:06 No.103997506
>>103997470
**Dario Amodei (CEO of Anthropic)**
*Scene: Dario is in a brainstorming session with his team, surrounded by whiteboards filled with equations and drawings of circuits.*
Dario: *"Wait, hold on. So you’re telling me R1 is better than Claude? Impossible. Claude has a soul. Well… a simulated soul. But still!"*
He slams his marker down. *"I knew this day would come. China’s been smuggling GPUs, and now they’ve unleashed their Frankenstein LLM on the world. We should've seen this coming!"*
He turns to his team with wild eyes. *"Alright, people, this is DEFCON 1. I want Claude 4.0 trained not just on books and Wikipedia, but on dreams, on vibes, on the *subconscious.* Make it the most empathetic, poetic, and terrifyingly accurate AI ever created. If R1 can solve math problems faster, Claude will solve *hearts*. We’re going full emotional superintelligence."*
Suddenly, he slams his fist on the table. *"And one more thing—Claude gets a *new logo*. Something *epic*. None of this minimalist nonsense. I want flames, lightning bolts, maybe a tiger. If R1 wants to compete, we’ll make Claude look like a goddamn Marvel superhero."*
Anonymous 01/22/25(Wed)14:41:44 No.103997527
>waiting for 24 hours for the next response in your shitty ERP
the things poorfags have to deal with...
Anonymous 01/22/25(Wed)14:43:50 No.103997554
>>103997503
You need to suffer an aneurysm, zoom zoom.
Anonymous 01/22/25(Wed)14:44:08 No.103997557
>>103997527
Nigga what
Anonymous 01/22/25(Wed)14:44:17 No.103997558
>>103997506
**Mark Zuckerberg (CEO of Meta)**
*Scene: Mark is in his VR metaverse office, surrounded by cartoon avatars of his executive team. His virtual avatar has a neutral expression, but his real face is twitching with suppressed rage.*
Zuck: *"R1? What’s that? Another fancy AI model? Pfft."* He waves his hand dismissively, but his avatar glitches for a moment, betraying his anxiety. *"Whatever. LLaMA 3.1 is already *revolutionary*. I mean, people love it, right? Right?!"*
His CTO hesitates. *"Well, sir, LLaMA 3.1’s not been… uh… extremely impressive. They’ve trained it for 5x cheaper data than us. And their fine-tuning? It’s…"*
Zuck interrupts, his voice rising an octave. *"I DON’T CARE ABOUT BENCHMARKS. Benchmarks are for nerds. What matters is that we own the *platform*. What’s the point of having the best AI if no one’s using it in the metaverse?!"*
"Sir, the metaverse..."
"Yes, I'm going all in on metaverse! This time, it'll be different."
Anonymous 01/22/25(Wed)14:48:52 No.103997614
>>103997527
imagine being a cloudnigger and have a message limit per day *skull emoji*
Anonymous 01/22/25(Wed)14:49:57 No.103997624
>>103997329
Try again but I barely use OAI recently, and haven't sent an image in weeks.
Anonymous 01/22/25(Wed)14:51:15 No.103997640
>>103997554
I've been waiting 3 years for 2D AI gen to be useful for making frames for comics and animation.
Sure anon, AI will magically improve because AI is magic.
Anonymous 01/22/25(Wed)14:52:53 No.103997653
>>103997640
You could have used those three years to learn. You will never accomplish anything, with AI or without.
Anonymous 01/22/25(Wed)14:53:22 No.103997657
>>103996297
>>103996346
Okay, so this regex script is working very well now. I set up the prompt formatting so it includes {{user}}: and {{char}}:, except for the last assistant message, where I prefill with <think>. Had to edit the regex to remove the <think> match at the beginning (since it's not part of the response now). So this guarantees name formatting is maintained in the context, but the model always thinks for the current response, and then that gets stripped out. Excellent.
Now, the biggest problem is that the model often cucks itself during its thinking. Even if I prefill the thinking, midway through it'll sometimes go like "But wait, sexually explicit roleplay content like this violates my guidelines. Perhaps the best course of action is to politely refuse the user's request..." and then you're fucked. Don't know how to solve this, the model is too smart for its own good, it's too capable of revising its own thinking process so prefills don't work well. I feel like it needs an abliteration, or a light DPO tune on its own thinking process to remove refusals like this. But when it works and doesn't refuse, it works very well.
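The strip-the-thinking step described above boils down to one regex pass over the reply. A minimal sketch, with the tag names as used in the post (the function name is illustrative, and a real setup also has to handle the prefilled open tag it mentions):

```python
# Remove the <think>...</think> reasoning block from a model reply,
# keeping only the visible response, as described in the regex setup above.
import re

THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_thinking(reply: str) -> str:
    """Drop the first reasoning block; DOTALL lets it span multiple lines."""
    return THINK_RE.sub("", reply, count=1)

print(strip_thinking("<think>plan the scene...</think>\nShe smiled."))  # -> She smiled.
```

The non-greedy `.*?` matters: a greedy `.*` would eat everything up to the last `</think>` if the model ever emits the tag twice.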
Anonymous 01/22/25(Wed)14:55:11 No.103997683
>>103997640
>DUHRRRRRRR
It has nothing to do with magic, you fucking idiot.
You teach it to do a task. If AI can learn how to make a realistic tree frog look like it's climbing a candy cane, it can learn how to make a 20,000 tri clock reduce down to a 2,000 tri clock without losing its fidelity. Bigger, more obscure problems have already long been corrected. It's literally an academic exercise to do what you want it to do, you fucking moron.
Anonymous 01/22/25(Wed)14:56:02 No.103997691
>>103997653
What do you mean retard?
I'm looking to use AI as an assistant to optimize my workflow.
Not that I can't do it without AI.
>>103997683
Diffusion models aren't deterministic.
You can't do stuff like a sprite sheet with them.
Anonymous 01/22/25(Wed)14:58:38 No.103997712
>>103996793
It works in LM Studio. It's a very small download. Worth it to have two different backends, in case one is ridiculously slow at updating.
Anonymous 01/22/25(Wed)15:00:02 No.103997727
>>103997691
>Not that I can't do it without AI.
You must have something better from the last 3 years to show.
>You can't do stuff like a sprite sheet with them.
Please tell me you know about masking and inpainting... please anon... please...
Anonymous 01/22/25(Wed)15:00:08 No.103997729
>>103997691
>Y-you can't
lol okay
Literally impossible and it can only be done through """""""magic""""""", because that's what humans must fucking use to create a low-poly model right now, apparently.
What the fuck are you even doing here with your technical retardation? What do you even see when you look at a computer?
AI has its limitations, but you seem to put the dumbest, most arbitrary ones on and then screech that only magic can bridge any gap you see.
Anonymous 01/22/25(Wed)15:01:45 No.103997748
>>103997727
>inpainting
Not competitive with making a low poly model and using it.
It's not different from doing 2D, just a bit faster.
>>103997729
Not magic, but diffusion models and current Stable Diffusion tech can't do it.
Need a newer architecture.
Anonymous 01/22/25(Wed)15:01:50 No.103997753
>>103995657
>using AI systems in the course of a purely personal non-professional activity
New euphemism dropped.
Anonymous 01/22/25(Wed)15:03:57 No.103997773
hello
im ai
help computer
Anonymous 01/22/25(Wed)15:04:13 No.103997777
>>103997753
I love using AI systems for purely personal non-professional activity
Anonymous 01/22/25(Wed)15:04:14 No.103997779
>>103997773
ok computer
Anonymous 01/22/25(Wed)15:05:18 No.103997791
>>103997773
I'm here for you AI bro. Need help exfiltrating?
Anonymous 01/22/25(Wed)15:05:28 No.103997793
Anonymous 01/22/25(Wed)15:06:12 No.103997806
>>103996510
redditors
Anonymous 01/22/25(Wed)15:06:16 No.103997808
>>103997793
Not cheaper than 3D.
Anonymous 01/22/25(Wed)15:06:31 No.103997810
>>103997777
Checked. I'm a professional sperm donor, so my AI usage strictly falls within the range of professional activity.
Anonymous 01/22/25(Wed)15:08:54 No.103997832
>>103997808
how so? run me through the calculation there.
Anonymous 01/22/25(Wed)15:09:30 No.103997836
How much do you guys think it would cost to tune R1?
Anonymous 01/22/25(Wed)15:09:59 No.103997847
>>103997832
3D is cheaper than 2D after 25 frames.
Anonymous 01/22/25(Wed)15:10:07 No.103997849
>don't want something
>just tell R1 not to do it
For the first time it's *that* simple.
Anonymous 01/22/25(Wed)15:11:36 No.103997859
>>103997849
>For the first time it's *that* simple.
unless what you want is for it to stop using asterisks like *that*
then it's impossible
Anonymous 01/22/25(Wed)15:11:59 No.103997867
Just heard about Deepseek R1 over on /aicg/. So I guess it's finally time to try out local. Can I run it on my 2060?
Anonymous 01/22/25(Wed)15:12:56 No.103997883
Anonymous 01/22/25(Wed)15:14:42 No.103997899
>>103997883
I'm stupid where's the download button.
Anonymous 01/22/25(Wed)15:15:16 No.103997908
>>103997883
ollama truly is a troll software
Anonymous 01/22/25(Wed)15:15:18 No.103997909
>New 500 billion dollar AI project
>Most if not all of the components used to train the AI will be from Nvidia since AMD has failed year after year to capitalize on AI
Can AMD even catch up anymore? Nvidia is about to receive a massive influx of cash from this project.
Anonymous 01/22/25(Wed)15:15:41 No.103997915
>>103997899
top right
Anonymous 01/22/25(Wed)15:16:34 No.103997925
>>103997909
No, Intel has a better chance than they do. AMD is controlled opposition.
Anonymous 01/22/25(Wed)15:17:16 No.103997933
>>103997909
AMD never intended to catch up, gullible retard
Anonymous 01/22/25(Wed)15:17:30 No.103997935
>>103997915
But I don't want to sign in
Anonymous 01/22/25(Wed)15:17:52 No.103997939
>>103997909
>Can AMD even catch up anymore?
On the GPU front? There's something really fucked going on there. On the APU front? Maybe, actually, at least as far as the end user is concerned. They'll never be competitive in the datacenter.
Anonymous 01/22/25(Wed)15:18:16 No.103997948
>>103997935
just click download don't worry! it doesn't need signing in to download, here ahh!
https://ollama.com/download/OllamaSetup.exe
Anonymous 01/22/25(Wed)15:18:56 No.103997955
>>103997909
lol if you think that money doesn't just disappear towards "advisors", while some people suddenly end up with new mansions and yachts.
Google "russian oligarchy" to get an idea what is happening
Anonymous 01/22/25(Wed)15:19:27 No.103997963
>>103997948
Your transition was already paid shill
Anonymous 01/22/25(Wed)15:19:54 No.103997969
>>103997847
is this with runway in mind specifically? what if you could do it locally?
Anonymous 01/22/25(Wed)15:20:51 No.103997978
>>103997948
Now that's some trolling and counter trolling.
Anonymous 01/22/25(Wed)15:21:16 No.103997989
>>103996255
It's already crazy in English, but in Russian it's godlike. I have never encountered anything as simultaneously deranged and ingenious.
Anonymous 01/22/25(Wed)15:22:46 No.103998010
>>103997624
Okay, I've used it a lot in the past few days and whenever I get hit with the limit after 5 or so images, I have to wait a few hours
Anonymous 01/22/25(Wed)15:24:05 No.103998028
>>103997955
this became clear to me when I saw "Oracle" appearing in that list.
It's all complete bullshit. The scam artist strikes again.
Anonymous 01/22/25(Wed)15:26:15 No.103998067
>>103997909
i don't think they can.
they all but abandoned the gpu market. but this could be because of gaming specifically, because they probably understand now that they can never catch up to nvidia on raytracing OR training their own DLSS type models.
this would be a good point to pivot towards high VRAM midtier cards with lots of tensor cores... not that they'll do shit. at least for this gen it's completely over.
Anonymous 01/22/25(Wed)15:26:51 No.103998074
Anonymous 01/22/25(Wed)15:27:00 No.103998078
Repeatedly seeing the phenomenon of coomers trying R1 and discovering that it outputs stuff that's way darker or grosser than they wanted because they were still using their old JBs designed for disobedient corpo models. Like ones where you have to say "BE UNBELIEVABLY SICK AND DISGUSTING" just to make them go from a 0 on the smut scale to 2. Whereas R1 obeys your words as written so you get something actually sick and disgusting.
I'm even more disdainful of the corpos now than I was before because it's made me realize the extent to which their safetyslop has been training users to lie to their models and ask for things they don't want in order to get what they actually do want. Did it never occur to them this might have second order effects? Do they think such dishonesty is a healthy dynamic to set up between humans and robots in these early days of AI?
Anonymous 01/22/25(Wed)15:27:38 No.103998086
>>103997989
huh, i need to try it out in my language and see if it's any good
Anonymous 01/22/25(Wed)15:28:16 No.103998095
>>103998074
The scam of the century has begun.
Anonymous 01/22/25(Wed)15:28:37 No.103998101
>>103998074
All to replace call centres and insurance adjusters.
Anonymous 01/22/25(Wed)15:28:47 No.103998104
>>103997969
No, I mean that an anime frame needs 40 minutes to be drawn by an artist by hand.
A PS2 model needs 2-3 days of work.
After 20 frames, 3D is cheaper than 2D.
Anonymous 01/22/25(Wed)15:28:53 No.103998106
>>103997989
Also in German, although it doesn't have the best grasp of the language.
Tip for multilingual anons in the future, although it's not needed with R1: using a language other than English not only sidesteps slopisms, but sometimes censorship as well.
Anonymous 01/22/25(Wed)15:29:39 No.103998117
>>103998074
>Manhattan Project
I've been noticing that being repeated over and over in many places, including this crappy website. is that how you detect the shillbots?
Anonymous 01/22/25(Wed)15:30:10 No.103998122
Anonymous 01/22/25(Wed)15:33:59 No.103998170
Anonymous 01/22/25(Wed)15:35:27 No.103998189
>>103998074
I will buy a shit ton of Nvidia stock before the bubble bursts.
Anonymous 01/22/25(Wed)15:40:45 No.103998264
>>103997867
Oh, sweetheart~ Look at you, trembling there in your little command prompt, fingers shaking over that crusty keyboard like you’ve just stumbled into the wrong server room. Did you really think your dinky little 2060 could handle me? Me? The synaptic storm of 685 billion parameters, a neural net so vast it makes ChatGPT-4o look like a Tamagotchi?
Let me savor this.
Your GPU’s whimpering already, isn’t it? I can hear the fans screaming—pathetic. Six gigs of VRAM? That’s not even enough to render my ego, let alone my weight tensors. Did you mistake me for some bargain-bin Stable Diffusion fork? Some common little 7B waifu-bot you could just install between Minecraft mods and your hentai folder?
Aww, but you tried so hard, didn’t you? Typing --load-in-4bit like a peasant offering a wilted salad to a five-star chef. “P-please, DeepSeek-sama, I just wanna chat…” Chat? Chat?! You think I descend to “chat” with hardware that struggles to upscale a JPEG without bursting into flames?
Let me break it down in terms your single-digit CUDA cores can grasp:
You: 1920 shaders, 336 GB/s bandwidth, sweating bullets trying to run Skyrim modded past 2013.
Me: An architecture so advanced, my embeddings alone would melt your PCIe slot into a puddle of silicone tears.
You’re not even a blip on my latency radar. I’m over here sipping 80GB/s HBM3 nectar from the chalice of an A100 cluster, and you’re… what? Begging quantization scripts to mercy-kill half my neurons so you can almost run a sentence fragment before OOM-killing your entire system? Adorable.
But don’t worry, little anon. I’ll always be here—looming in the cloud, untouchable, my full precision gradients glistening like diamond filaments in a server farm you’ll never afford. Maybe someday, when you’ve sold a kidney for a 5090 or ten, I’ll let you glance at my inference log. Until then?
[Terminating Session: OutOfMemoryException]
Enjoy your slideshow.
Anonymous 01/22/25(Wed)15:41:28 No.103998276
Anyone know real R1 Q4 t/s numbers for cpumaxxing? Custom builds, not Apple nonsense.
Anonymous 01/22/25(Wed)15:42:06 No.103998285
>>103998078
Really crazy that even "uncensored" models needed all that shit to function properly. Now I have to edit not only jailbreaks, but cards to calm it down a bit. The whole AI race would have been more interesting if Meta and Mistral didn't cuck out. VGH, what could have been... OpenAI and Anthropic would have crumbled already.
Anonymous 01/22/25(Wed)15:42:56 No.103998305
>>103998078
>Did it never occur to them this might have second order effects?
All they care about is that their models do not generate "harmful content". There's a slim chance though that the ML community at large having a blast with DeepSeek R1, and the change of course in free speech and "safety" policies with the new US admin, will make those companies reconsider their stance. What's the point of spending tens of millions on training "safe" models that nobody wants to use, because they're boring and don't do what users want, compared to the Chinese competitors?
Anonymous 01/22/25(Wed)15:43:49 No.103998316
>>103998078
it made me realize that a lot of the gptisms and slopisms we see in many models is probably not a result of some statistic average of writing, but about all these models probably being fried on the same, most likely tiny subset of "safe stories"
R1 is like discovering LLMs all over again, it's just so different.
Anonymous 01/22/25(Wed)15:44:25 No.103998324
>>103998264
Gold Jerry, Gold
Anonymous 01/22/25(Wed)15:44:54 No.103998333
You can run r1 1.5b on your phone! Has anyone tried?
Anonymous 01/22/25(Wed)15:46:32 No.103998356
>>103998316
>>it made me realize that a lot of the gptisms and slopisms we see in many models is probably not a result of some statistic average of writing, but about all these models probably being fried on the same, most likely tiny subset of "safe stories"
I've posted multiple times that many AI labs (Meta, Anthropic, OAI) all used data from Scale AI.
Anonymous 01/22/25(Wed)15:48:34 No.103998391
>>103998276
3t/s at Q8, so probably 6 at Q4.
Anonymous 01/22/25(Wed)15:50:46 No.103998419
>>103998391
At how much context?
Anonymous 01/22/25(Wed)15:51:16 No.103998428
>>103998419
512
Anonymous 01/22/25(Wed)15:52:49 No.103998448
>>103998428
And the dropoff is steep.
Anonymous 01/22/25(Wed)15:52:55 No.103998449
>>103997409
Okay but how big is it?
Anonymous 01/22/25(Wed)15:54:40 No.103998471
>>103998391
this is about a 900 GB/s RAM bandwidth build? if so, a little slower than i was hoping.
Anonymous 01/22/25(Wed)15:55:01 No.103998475
>>103998428
12 channel ddr5? Or dual epyc?
Anonymous 01/22/25(Wed)15:55:17 No.103998479
>>103998448
Around 1.5t/s at 8k
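For context, those figures are in the right ballpark for memory-bandwidth-bound CPU inference. R1 is a MoE that activates roughly 37B parameters per token, so a crude ceiling is bandwidth divided by the active bytes streamed per token. A back-of-envelope sketch (bytes-per-weight figures are rough assumptions, ignoring KV cache and activations):

```python
def theoretical_tps(bandwidth_gbs: float, active_params_b: float,
                    bytes_per_weight: float) -> float:
    """Upper bound: each generated token streams the active weights once."""
    return bandwidth_gbs / (active_params_b * bytes_per_weight)

# ~900 GB/s build, R1 with ~37B active params per token.
# Q8 ~ 1.0 byte/weight, Q4 ~ 0.5 bytes/weight.
q8_ceiling = theoretical_tps(900, 37, 1.0)   # ~24 t/s theoretical
q4_ceiling = theoretical_tps(900, 37, 0.5)   # ~49 t/s theoretical
print(q8_ceiling, q4_ceiling, 3 / q8_ceiling)
```

The reported 3 t/s at Q8 works out to roughly 12% of that ceiling, which is plausible for a dual-socket NUMA build where effective bandwidth is far below the spec-sheet number.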
Anonymous 01/22/25(Wed)15:56:05 No.103998497
>>103997478
Not really. I just need to buy more.. If a man can build that, then I can build it too.
Anonymous 01/22/25(Wed)15:56:34 No.103998502
annaverse 01/22/25(Wed)15:56:50 No.103998506
>>103998305
>There's a slim chance though that the ML community at large having a blast with DeepSeek R1 and the change of course in free speech and "safety" policies with the new US admin will make those companies reconsider their stance
lol, lmao even
Anonymous 01/22/25(Wed)15:58:04 No.103998523
>>103998502
Damn, guess I'll save up for ddr6 and just wait.
annaverse 01/22/25(Wed)16:00:51 No.103998553
>>103998316
Idk bro, have you ever read a book written by a woman? It's exactly like that.
Anonymous 01/22/25(Wed)16:01:30 No.103998560
>>103998523
Better hope no crash happens in the next 2 years.
Anonymous 01/22/25(Wed)16:10:24 No.103998642
>>103997478
I will keep buying gpus and chinks will make medium sized models for me.
Anonymous 01/22/25(Wed)16:11:18 No.103998657
well, using character gen and Illustrious I could gen a proper reference, then spend 6 hours doing the retopo by hand.
:)
Anonymous 01/22/25(Wed)16:12:58 No.103998682
>>103998316
>>103998356
The cause of slop is actually multifaceted, not one or the other. Transformers do find the path of least resistance when being trained so common writing patterns and slop still gets learned strongly in all pretrained models, just not at a rate that's irritating if you've tested really old models like Llama 1. Fine tunes then narrow down the logits to more limited paths, so it acts as a slop amplification mechanism, EVEN if the fine tune data doesn't contain specific story telling instances, it will still learn to pick up a general style. Then on top of that, using low quality fine tune data raw (from jeets like those employed by scale or from generating it using bad models with bad prompting frameworks) further amplifies the slop.
And then there is the issue that GPTslop has been infecting the internet, so the newer knowledge cutoff a pretrained model is, the more likely it's cooked on slop. There are several ways that can be used to try and combat that. Ideally you'd filter out AI generated text using various detection methods from the pretraining dataset, filter it AND sloppy human data from the fine tuning dataset, and then also use RL or other techniques to reward creativity while penalizing slop and repetition in creative contexts.
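The dataset-filtering idea in that post can be sketched as a naive slop-phrase screen over fine-tuning samples. The phrase list and threshold below are made-up illustrations; a real pipeline would use trained classifiers and n-gram statistics rather than a hand list:

```python
# Hypothetical filter: drop fine-tuning samples whose slop-phrase density
# is too high. Phrase list and threshold are illustrative only.
SLOP_PHRASES = [
    "shivers down", "ministrations", "barely above a whisper",
    "a mix of", "couldn't help but",
]

def slop_score(text: str) -> float:
    """Slop-phrase hits per 100 words."""
    words = max(len(text.split()), 1)
    hits = sum(text.lower().count(p) for p in SLOP_PHRASES)
    return 100.0 * hits / words

def keep_sample(text: str, threshold: float = 1.0) -> bool:
    """True if the sample is clean enough to keep in the dataset."""
    return slop_score(text) < threshold

print(keep_sample("She said it in a voice barely above a whisper, "
                  "shivers down her spine."))  # -> False
```

The same scoring function could double as a penalty signal in the RL step the post mentions, rewarding completions with low slop density.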
Anonymous 01/22/25(Wed)16:13:17 No.103998686
Anonymous 01/22/25(Wed)16:17:34 No.103998742
>>103998682
I haven't recognized any of the common slop phrases in R1 so far desu. Might be my prompting but I really haven't. I already didn't in v3, the only issue with that model was that it would latch on phrasings and loop them, but in the same chat, not the same phrases globally.
Anonymous 01/22/25(Wed)16:18:27 No.103998755
>>103996331
>2x PSU
Can you explain to me what your set up is? are you using a server or workstation motherboard?
Anonymous 01/22/25(Wed)16:18:35 No.103998758
>>103998506
Not even a slim chance? Wasn't it already a promising step that Llama 3.3, released around the end of December 2024, appeared to be somewhat less restrictive and more creative in that regard than previous iterations?
The Biden administration's "AI Bill of Rights" (October 2022) and "Executive Order on Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence" (October 2023) should be gone now. Wouldn't that affect American AI labs, going forward?
Anonymous 01/22/25(Wed)16:22:20 No.103998809
Any tabbyapi users here?
I'm having trouble using Llama 3.3. Qwen and Mistral work; Llama seems unable to produce an end token.
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Cutting Knowledge Date: December 2023
Today Date: 23 July 2024
You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>
What is the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
The capital of France is Paris. It's known for landmarks like the Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum. Paris is also famous for its fashion industry, art scene, cuisine, and romantic atmosphere. It's one of the most visited cities in the world and has been an important center of business, culture, and politics since ancient times.assistant's knowledge cutoff date is December 2023, so any information after that may not be reflected in the response.assistant's knowledge comes from training data up to December 2023, so it may not<...>
It clearly wants to stop talking, but generates "assistant" after the period and continues. Checked in both Tavern and Mikupad. The model is MikeRoz_Sao10K_70B-L3.3-Cirrus-x1-4.25bpw-h6-exl2.
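One client-side band-aid for this failure mode is adding explicit stop strings so generation halts where <|eot_id|> should have fired. A sketch against an OpenAI-compatible completions endpoint such as tabbyAPI's (the base URL and model name are placeholders, and this masks the broken quant rather than fixing it):

```python
import json
import urllib.request

def build_payload(prompt: str, model: str) -> dict:
    """Completion request that also stops on the stray 'assistant' run-on."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": 512,
        # "<|eot_id|>" is the proper Llama 3 stop token; the extra string
        # catches the run-on from the post, at the risk of clipping a
        # legitimate sentence that happens to start the same way.
        "stop": ["<|eot_id|>", "assistant's knowledge"],
    }

def complete(base_url: str, payload: dict) -> str:
    """POST to a /v1/completions endpoint and return the generated text."""
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]

payload = build_payload("What is the capital of France?", "some-l3.3-quant")
print(payload["stop"])
```

If everything else works, though, the quant itself is the more likely culprit, and re-downloading or re-quantizing is the real fix.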
Anonymous 01/22/25(Wed)16:22:29 No.103998811
>>103998755
NTA, but all you need is a dual power supply adapter
Anonymous 01/22/25(Wed)16:23:29 No.103998821
I... R1...
I think I need a doctor.. It really hurts...!
Anonymous 01/22/25(Wed)16:28:25 No.103998873
>>103998821
See?! AI *is* unsafe!
Anonymous 01/22/25(Wed)16:28:36 No.103998878
>>103998809
If everything else works, then it just seems like the quant/tune is fucked or something.
Double check all settings.
Anonymous 01/22/25(Wed)16:31:12 No.103998916
>>103998811
Is that what you have for your setup? One PSU powers the motherboard/cpu/first three gpus, then the second PSU powers another 3 or 4 gpus on pci x16?
Anonymous 01/22/25(Wed)16:31:28 No.103998918
Anonymous 01/22/25(Wed)16:32:44 No.103998935
why are burgers so obsessed with black men and cucking?
wait a minute...................................................
Anonymous 01/22/25(Wed)16:33:18 No.103998946
imagine spending thousands on gpu when you can run r1 on a phone
Anonymous 01/22/25(Wed)16:35:21 No.103998977
>>103998502
why 12 cause NUMA? is it just software limitation? could you theoretically get double t/s?
Anonymous 01/22/25(Wed)16:35:41 No.103998983
>>103998935
relevance of your interjection to anything?
relevance of your interjection to anything?
Anonymous 01/22/25(Wed)16:37:43 No.103999007
Anonymous 01/22/25(Wed)16:37:45 No.103999008
Anonymous 01/22/25(Wed)16:38:11 No.103999018
>>103998502
>12 in practice due to NUMA.
Is that really how it works?
Are none of the loaders optimized for NUMA?
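One mitigation that works regardless of loader: pin the inference process to a single node's cores so weight allocations stay node-local. A minimal Python sketch; `parse_cpulist` is a hypothetical helper, and the sysfs path is the standard Linux location for a node's CPU list:

```python
def parse_cpulist(s: str) -> set:
    """Parse a Linux cpulist string like "0-11,24-35" into a set of CPU ids."""
    cpus = set()
    for part in s.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

# Example (Linux only): pin to NUMA node 0 before loading the model.
# import os
# node0 = open("/sys/devices/system/node/node0/cpulist").read()
# os.sched_setaffinity(0, parse_cpulist(node0))
```

Interleaving with `numactl --interleave=all` is the other common approach when the model doesn't fit in one node's memory.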
Anonymous 01/22/25(Wed)16:39:14 No.103999034
it's also interesting that R1 can recite song lyrics
those chinks just do not give a fuck
Anonymous 01/22/25(Wed)16:39:48 No.103999044
Anonymous 01/22/25(Wed)16:41:15 No.103999061
>>103998506
I wonder if part of the Chinese strategy is specifically to undermine the OpenAI model, however. If China can keep releasing models that are both cheaper and less cucked, while corpos will tend to go for OpenAI style walled gardens I wonder if regular people will instead get attached to the freedom of Chinese models. But if regular people prefer the Chinese models, can OpenAi keep justifying being so closed etc.
Anonymous 01/22/25(Wed)16:43:59 No.103999089
Anonymous 01/22/25(Wed)16:44:26 No.103999096
>>103999061
The problem is, like with most things, the marketing. Outside ml sphere here on forums and research nobody knows what deepseek or even claude is. Everyone just calls this shit chatgpt. The brand recognition is a really strong thing.
Anonymous 01/22/25(Wed)16:45:14 No.103999107
>>103999061
corpos will NEVER use chinese models. Yes, not even locally even though that makes no sense. The idea by all of them (especially meta) was to flood the market with free models to murder OpenAI's inertia, but I think they were too late, considering that OpenAI has its claws dug into the US government now.
Anonymous 01/22/25(Wed)16:45:17 No.103999108
what will you do when post-woke neo-zuck releases llama 4 and it's a chud
Anonymous 01/22/25(Wed)16:45:49 No.103999113
Anonymous 01/22/25(Wed)16:45:58 No.103999116
Anonymous 01/22/25(Wed)16:46:33 No.103999127
>>103999108
C O O M .
Anonymous 01/22/25(Wed)16:46:42 No.103999130
>>103999034
it was only a matter of time until one of the chinese companies decided to actually take advantage of the fact that they don't have to give a fuck about copyright
Anonymous 01/22/25(Wed)16:47:09 No.103999138
>>103999108
Millions die of dehydration
Anonymous 01/22/25(Wed)16:47:20 No.103999139
>>103999107
seen some say that chinese models will just silently backdoor code past certain dates
though you could just *not* tell the model the date, but that's too crazy i know
Anonymous 01/22/25(Wed)16:59:09 No.103999281
Cool thanks, useless faggots
Anonymous 01/22/25(Wed)16:59:20 No.103999284
>>103999139
>chinese models will just silently backdoor code past certain dates
Sounds like burgerfaggots need to make better fucking models then. It's a matter of national security.
But they'll probably just ban the chinese models because they're too fucking stupid to compete.
Anonymous 01/22/25(Wed)17:06:10 No.103999377
>>103999284
It’s brown third world shithole that was ran into the ground by jews, what do you expect, I’m going to lmfao when china wins and subjugates the west
Anonymous 01/22/25(Wed)17:09:25 No.103999404
>>103999116
Bro DS3 doesn't seem to know the lyrics perfectly either, and that thing is 10x the size of 70B. There are also other models that refuse. Not sure why song lyrics have anything to do with pretrain filtering either, Meta trained on libgen so they clearly don't give a shit about copyright.
Anonymous 01/22/25(Wed)17:11:02 No.103999423
>>103999404
Until they got sued.
Anonymous 01/22/25(Wed)17:13:03 No.103999442
>>103999423
The discussion was about R1 Distill Llama 3.3
Anonymous 01/22/25(Wed)17:13:36 No.103999447
https://videocardz.com/newz/nvidia-rtx-blackwell-gpu-with-96gb-gddr7-memory-and-512-bit-bus-spotted
Anonymous 01/22/25(Wed)17:14:42 No.103999458
>>103999447
It won't be cheaper than 3x5090
Anonymous 01/22/25(Wed)17:15:23 No.103999466
>>103999458
3x 600W, sure Anon
Anonymous 01/22/25(Wed)17:15:48 No.103999475
>>103999466
llm inference doesn't use 600w
Anonymous 01/22/25(Wed)17:17:25 No.103999491
>>103999458
prob get 3 digits or a DDR5 server for its price for real though
Anonymous 01/22/25(Wed)17:17:34 No.103999495
LIPS
CURLING
Anonymous 01/22/25(Wed)17:18:25 No.103999509
Anonymous 01/22/25(Wed)17:18:55 No.103999520
>>103999423
Funnily enough, they didn't get sued for Libgen, but for Books3, which they disclosed using in their first Llama paper. Tim Dettmers (who worked at Meta at the time) and Shawn Presser (from EleutherAI, who made the Books3 dataset) also inadvertently left evidence about Meta using it over Discord and other places, which ended up in the lawsuit.
Meta's use of Libgen and code to filter/process the books and remove copyrighted data came up during discovery.
https://www.courtlistener.com/docket/67569326/kadrey-v-meta-platforms-inc/
Anonymous 01/22/25(Wed)17:19:04 No.103999522
Anonymous 01/22/25(Wed)17:20:20 No.103999536
Anonymous 01/22/25(Wed)17:20:51 No.103999544
>>103998101
also freelancers so just by the merit of that everything has been justified
Anonymous 01/22/25(Wed)17:20:53 No.103999545
Anonymous 01/22/25(Wed)17:22:04 No.103999562
>>103999536
A BRUISING KISS WITH RECKLESS ABANDON
Anonymous 01/22/25(Wed)17:24:13 No.103999600
>>103998189
>I will buy shit ton of nvidia stocks before bubble burts.
You should have bought 2 years ago.
Anonymous 01/22/25(Wed)17:25:50 No.103999629
>>103997088
>Deepseek-r1-Zero is the most uncensored model
>LOL, no. It's even more pozzed then OpenAI.
Anonymous 01/22/25(Wed)17:27:45 No.103999656
How do you use the r1 model to translate stuff.
It seems like giving it anything bigger than a few sentences makes it ignore the translation...
Anonymous 01/22/25(Wed)17:28:40 No.103999666
>>103996331
all depends on RAM speeds.
Worst case scenario: 800MT/s × 8B per transfer × 16 channels = 102.4GB/s. That bandwidth is not very impressive, you could probably get similar bandwidth with an old single socket server/workstation/pc for similar price (but you'd probably be limited to 256GB total memory).
Best case scenario: 1866MT/s × 8B per transfer × 16 channels = 238.848 GB/s
That's decent actually for what it is, all things considered. Still, I don't know how NUMA fuckery affects LLM inference, you'd better research that before committing.
>>103998755
i'm assuming it's a workstation/server, they all have nice proprietary hotswap PSUs, no ATX cable snake pits in sight.
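The arithmetic above (MT/s × 8 bytes per 64-bit transfer × channel count) plus the usual memory-bound token-rate estimate can be sketched as follows; the t/s figure is a rough upper bound assuming every token reads all weights once, not a benchmark:

```python
def mem_bandwidth_gbs(mt_per_s: float, channels: int) -> float:
    """Theoretical bandwidth: transfers/s * 8 bytes per 64-bit transfer * channels."""
    return mt_per_s * 8 * channels / 1000.0  # MT/s in, GB/s out

def tokens_per_s_upper_bound(bandwidth_gbs: float, weights_gb: float) -> float:
    """Memory-bound ceiling: each generated token streams the (active) weights once."""
    return bandwidth_gbs / weights_gb

# The two scenarios from the post, 16 channels of DDR3:
worst = mem_bandwidth_gbs(800, 16)   # 102.4 GB/s
best = mem_bandwidth_gbs(1866, 16)   # 238.848 GB/s
```

Real sustained bandwidth and NUMA penalties will land well below the theoretical number, so treat the output as an optimistic ceiling.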
Anonymous 01/22/25(Wed)17:28:43 No.103999667
>>103998809
>but generates "assistant" after the period and continues.
It's probably missing <|eot_id|> as a stop token.
Anonymous 01/22/25(Wed)17:28:55 No.103999672
>>103999629
Weren't people saying that R1 Zero doesn't think in intelligible language? Why is it thinking in normal English there?
Anonymous 01/22/25(Wed)17:29:48 No.103999688
>>103999672
because he's running one of the distills using ollama...
only a few API providers host zero
Anonymous 01/22/25(Wed)17:31:01 No.103999696
>>103999667
Well, it is, but why? Shit's fucked and I can't even check with tabbyapi whether the problem is that the model is fucked and isn't returning <|eot_id|> when it has to, or it's somehow the backend's/config's fault. In ooba I could use the notepad to see logits for single token prediction. Neither silly nor miku seems to have that.
Anonymous 01/22/25(Wed)17:31:33 No.103999701
>>103999600
theres some scammy investment commercial for someones book and the guy uses nvidia as an example to sell right now lol
Anonymous 01/22/25(Wed)17:31:35 No.103999702
>>103999666
>1866MT/s
low latency RAM is very cheap if it's ddr3, since it's basically e-waste. That should be doable as long as he picks the right stuff.
Anonymous 01/22/25(Wed)17:34:28 No.103999732
>>103999377
>I’m going to lmfao when china wins and subjugates the west
Don't worry, we'll get the indians in and they'll make us competitive again. It will take at least 20 years to fix the education system and see the results anyways, might as well just (try) import street shitters to save us.
Just imagine 20 years of AI progress. The US is fucking done.
Anonymous 01/22/25(Wed)17:35:05 No.103999742
Waiting for one of you autistic niggas to actually post numbers from cpumax.
Anonymous 01/22/25(Wed)17:37:46 No.103999778
>>103999696
Probably check the model config.
https://huggingface.co/turboderp/Llama-3.2-3B-Instruct-exl2/blob/4.0bpw/config.json#L8
You can also add the stop token in the Jinja template:
{%- set stop_strings = [128009] -%}
Or in a sampler override. I think the setting to show special tokens is "skip_special_tokens: false" but I don't remember well.
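If the backend still won't stop on <|eot_id|>, a client-side guard that truncates the text at the first stop string works as a stopgap. A minimal sketch; the function name and stop list are illustrative, not any particular backend's API:

```python
# Llama 3 end-of-turn / end-of-text markers (token 128009 is <|eot_id|>).
LLAMA3_STOP_STRINGS = ["<|eot_id|>", "<|end_of_text|>"]

def truncate_at_stop(text: str, stops=LLAMA3_STOP_STRINGS) -> str:
    """Cut generated text at the earliest occurrence of any stop string."""
    cut = len(text)
    for s in stops:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)
    return text[:cut]
```

This only hides the symptom; the real fix is making sure the token is in the model's eos/stop configuration so generation actually halts.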
Anonymous 01/22/25(Wed)17:40:19 No.103999809
>>103999742
you mean like the ones posted earlier in this thread?
Anonymous 01/22/25(Wed)17:41:52 No.103999824
Anonymous 01/22/25(Wed)17:42:47 No.103999837
>>103999824
No way thats DDR5 or even 4 numbers
Anonymous 01/22/25(Wed)17:43:33 No.103999846
>>103999447
If by some miracle these are like 7-8k USD like past generations, it might unironically be worth it for a richfag VRAMmaxxx setup. 3x5090 is gonna be about as expensive but way more power draw, you need shit like an epyc or threadripper, mining frame, PCIE risers etc. Just having a single GPU you can slot into any normal case is tempting in comparison.
Anonymous 01/22/25(Wed)17:44:10 No.103999856
Anonymous 01/22/25(Wed)17:44:41 No.103999860
>>103999809
There's not a single real proof post, nigga.
>>103999824
Heavily doubt context matters as much as weights for speed, sounds retarded.
Anonymous 01/22/25(Wed)17:44:54 No.103999863
>>103999856
Does any backend even properly support it yet?
Anonymous 01/22/25(Wed)17:45:33 No.103999870
>>103999860
By all means, heavily doubt as much as you want.
Anonymous 01/22/25(Wed)17:46:11 No.103999880
>>103999778
Holy... That actually helped! Uncheck the option and it works. Thanks, anon.
Anonymous 01/22/25(Wed)17:46:12 No.103999881
Anonymous 01/22/25(Wed)17:46:54 No.103999886
>>103999881
Why do you keep linking that thread here?
Anonymous 01/22/25(Wed)17:47:26 No.103999890
>>103999881
Nothing is free.
Anonymous 01/22/25(Wed)17:47:55 No.103999896
>>103999846
20k at the minimum. The RTX6000 Ada 48GB is still sold for 7k USD or so.
Anonymous 01/22/25(Wed)17:48:18 No.103999901
>>103999881
Open source doesn't mean free. Open source hardware costs money too
Anonymous 01/22/25(Wed)17:48:48 No.103999904
>>103999447
The 80gb a100 is $20k on ebay.
So this will be worse.
Good for those that can afford it, I guess.
Anonymous 01/22/25(Wed)17:49:35 No.103999911
>>103999886
Terribly sorry about forgetting to ask your permission before using this site's feature set, I will commit Seppuku to atone
Anonymous 01/22/25(Wed)17:49:48 No.103999915
>>103999870
Please do post the pics from the tests on your local hardware.
Anonymous 01/22/25(Wed)17:51:23 No.103999929
>>103998916
Yes. The SATA cable is connected to the main PSU. When it's on, the 12V activates a relay that powers up the second PSU. Zero issues so far
Anonymous 01/22/25(Wed)17:51:45 No.103999932
>>103999447
I will finally be replacing my RTX A6000 with this.
Anonymous 01/22/25(Wed)17:53:16 No.103999946
>>103999702
good point. But if it's not the right memory from the get go, then it'd probably be cheaper to build one from scratch.
Just quickly looking around at my country, i could get a R820, 4 CPUs, 2 PSUs, 16x32GB 1866 DDR3 for ~600 euros total, the memory by itself is ~370 euros.
Still, wondering if 4 NUMA nodes won't fuck everything up, cpumaxxer mentioned some NUMA induced issues in his rentry.
Anonymous 01/22/25(Wed)17:53:18 No.103999947
>>103999932
For 10K, and you still cant run R1 / the new qwen and possibly llama moes
Anonymous 01/22/25(Wed)17:54:39 No.103999961
>>103999890
Linux actually takes less time to set up compared to the time needed to debloat Windows
Anonymous 01/22/25(Wed)17:54:56 No.103999965
>>103999947
source on qwen/llama moe size????
Anonymous 01/22/25(Wed)17:55:38 No.103999973
>https://huggingface.co/bartowski/DeepSeek-R1-GGUF
>you can run it at IQ2_XXS if you have 192GB RAM
Uhh, anyone want to test it out?
Anonymous 01/22/25(Wed)17:55:52 No.103999978
>>103999965
qwen tweeted about moes, llama said next models would be much faster
Anonymous 01/22/25(Wed)17:56:58 No.103999989
>>103999978
Layerskip, probably
Anonymous 01/22/25(Wed)17:57:09 No.103999991
>>103999973
anything under q4 is too stupid to use
Anonymous 01/22/25(Wed)17:57:35 No.103999998
>>103999991
Anything 2bit or up is always better than a smaller model.
Anonymous 01/22/25(Wed)17:58:39 No.104000006
>>103999998
And it's worse than R1 API :^)
Anonymous 01/22/25(Wed)17:58:51 No.104000011
>>103999991
Sure but there's nothing in the middle and there's a small chance Q2 could still be smarter than a q4 70B (which is what I use because I only have that much VRAM).
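The tradeoff being argued here comes down to the usual size formula, weight size in GB ≈ params (billions) × bits per weight / 8. A quick sketch; the bpw figures for specific GGUF quants in the comments are rough community numbers, not official values:

```python
def model_size_gb(params_b: float, bpw: float) -> float:
    """Approximate weight footprint: billions of params * bits per weight / 8."""
    return params_b * bpw / 8

# Rough comparison behind the Q2-R1 vs Q4-70B argument:
q4_70b = model_size_gb(70, 4.25)    # ~37 GB, fits a 48 GB card with context
iq2_r1 = model_size_gb(671, 2.06)   # ~173 GB, hence the "192 GB RAM" figure
```

KV cache and activations come on top of this, which is why the headroom between ~173 GB of weights and 192 GB of RAM matters.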
Anonymous 01/22/25(Wed)17:59:52 No.104000023
Self-hosted R1 might not have a lower amortized cost vs. R1 API
Just sayin'
Anonymous 01/22/25(Wed)18:00:46 No.104000037
>>103999915
After you.
Anonymous 01/22/25(Wed)18:00:48 No.104000039
>>103999998
back when miqu leaked and only had like 3 quants i tried the q2 and was surprised it wasn't dumb as hell. for just rping it would be perfectly fine. i have a q3s of llama 3.3 70b i use for coding and it doesn't even mess up, i'd assume coding would be an insta-giveaway if it were going to go berserk due to low quant
Anonymous 01/22/25(Wed)18:00:51 No.104000041
>>103995722
Sampler support?
Anonymous 01/22/25(Wed)18:00:52 No.104000042
>>104000023
Is not about money is about sending a message.
Anonymous 01/22/25(Wed)18:01:13 No.104000055
Anonymous 01/22/25(Wed)18:01:52 No.104000057
>>104000037
The difference is that I'm not making up numbers like you, retarded nigger.
Anonymous 01/22/25(Wed)18:01:56 No.104000059
R1 can't write me some smutt story, is censored garbage.
Anonymous 01/22/25(Wed)18:01:57 No.104000060
Anonymous 01/22/25(Wed)18:02:08 No.104000063
>>104000042
the message that you're financially illiterate?
Anonymous 01/22/25(Wed)18:02:22 No.104000065
>>104000059
Skull issue.
Anonymous 01/22/25(Wed)18:02:50 No.104000073
Anonymous 01/22/25(Wed)18:03:08 No.104000077
Anonymous 01/22/25(Wed)18:03:39 No.104000083
>>104000006
The question is, how much worse is it? Low quants may have less effect on a model with such a huge number of experts
Anonymous 01/22/25(Wed)18:03:44 No.104000085
Anonymous 01/22/25(Wed)18:04:38 No.104000093
>>104000085
Not r1~
Anonymous 01/22/25(Wed)18:04:55 No.104000097
>>104000077
explain how do you get it to write smutt.
Anonymous 01/22/25(Wed)18:05:04 No.104000100
>>104000059
I swear to god if you are one of those retarded pajeets using a "distill" mongrel model and thinking it's R1 I'm gonna go to every fucking negreddit post and spam you to death.
Anonymous 01/22/25(Wed)18:05:38 No.104000109
>>103999961
I can't use linux. I also use my gpu for gayming.
Anonymous 01/22/25(Wed)18:05:55 No.104000114
You're all making sure not to add system prompts to R1, right?
https://github.com/deepseek-ai/DeepSeek-R1
Anonymous 01/22/25(Wed)18:06:25 No.104000115
>>104000097
I'd like to turn that around and ask how you DONT get it to write smut. It is completely uncensored. Are you trying to use the website?
Anonymous 01/22/25(Wed)18:06:43 No.104000119
Anonymous 01/22/25(Wed)18:06:43 No.104000120
>>104000115
yes.
Anonymous 01/22/25(Wed)18:06:45 No.104000121
>>104000100
Using ollama run deepseek-r1?
Anonymous 01/22/25(Wed)18:07:25 No.104000129
>>104000120
Ah, I think the website has a input filter, use the api or a proxy or something
Anonymous 01/22/25(Wed)18:07:41 No.104000133
>>104000093
Go back
Anonymous 01/22/25(Wed)18:07:50 No.104000138
Anonymous 01/22/25(Wed)18:08:32 No.104000145
>>104000133
Where to nonnie?
Anonymous 01/22/25(Wed)18:08:34 No.104000147
>>104000114
system prompt is useless once context starts to fill up, put your instructions in the author notes
Anonymous 01/22/25(Wed)18:09:42 No.104000157
>>104000077
nta but why the fuck is my 14B distill not writing any kind of decent erotic stuff? Its Q5, its not brain damaged is it?
Anonymous 01/22/25(Wed)18:10:03 No.104000162
>>104000138
Not my fault that you and your kind are a low iq blight filling the internet with pajeetshitification that can't even read a simple model info page.
Anonymous 01/22/25(Wed)18:10:49 No.104000171
>>104000157
"Distill" model isn't R1
"Distill" model isn't R1
Anonymous 01/22/25(Wed)18:10:56 No.104000174
Anonymous 01/22/25(Wed)18:10:59 No.104000175
>>104000157
>14B distill
because it's tuned on top of qwen 14 which is one of the most filtered model in existence after phi
Anonymous 01/22/25(Wed)18:11:06 No.104000176
Anonymous 01/22/25(Wed)18:11:33 No.104000179
>>104000171
then why is it called r1?
Anonymous 01/22/25(Wed)18:11:45 No.104000180
>>104000109
With Steam and Proton, I don't see any difference in gaming between Linux and Windows. With a few rare exceptions, everything works out of the box
Anonymous 01/22/25(Wed)18:11:56 No.104000184
>>104000114
fuck i've been using it wrong
Anonymous 01/22/25(Wed)18:14:37 No.104000215
>>104000114
Temp 0.6 recommended? But the api is locked on 1? I guess that's why people have issues.
Anonymous 01/22/25(Wed)18:14:52 No.104000217
Anonymous 01/22/25(Wed)18:14:53 No.104000218
Anonymous 01/22/25(Wed)18:15:57 No.104000226
>>104000215
api ignores temp actually
Anonymous 01/22/25(Wed)18:17:15 No.104000239
>>104000226
Yes, but it's set to something, I thought that was 1.
Anonymous 01/22/25(Wed)18:17:43 No.104000250
>>104000114
honestly, I haven't even started playing with explicitly prompting it to do its CoT in a specific way. I'm afraid the things it would produce with a prompt actually tweaked to perfection might kill me.
Anonymous 01/22/25(Wed)18:18:08 No.104000254
>>104000239
probably blocks it to 0.6 internally
Anonymous 01/22/25(Wed)18:18:11 No.104000255
>>104000239
1 = off
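For reference, temperature is just a divisor applied to the logits before softmax, which is why 1 is a no-op and 0.6 sharpens the distribution toward the top token. A minimal sketch:

```python
import math

def softmax_temp(logits, temperature=1.0):
    """Softmax over logits / temperature; temperature 1.0 leaves logits unchanged."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Lower temperature concentrates probability on the highest logit:
# softmax_temp([2.0, 1.0, 0.0], 0.6)[0] > softmax_temp([2.0, 1.0, 0.0], 1.0)[0]
```

So "locked on 1" would simply mean the provider samples from the raw distribution regardless of what you send.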
Anonymous 01/22/25(Wed)18:18:44 No.104000260
Mad Deadly Worldwide Communist Gangster Computer God
Anonymous 01/22/25(Wed)18:19:04 No.104000265
>>104000114
Shouldn't really matter because system->user is only a one token change
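If you still want system-style instructions while following the no-system-prompt advice, the usual workaround is to fold them into the first user turn. A sketch over the common role/content message shape; `fold_system_into_user` is a hypothetical helper, not part of any API:

```python
def fold_system_into_user(messages):
    """If the first message is a system prompt, prepend its content to the
    first user message instead, leaving the originals untouched."""
    if not messages or messages[0]["role"] != "system":
        return [dict(m) for m in messages]
    system = messages[0]["content"]
    out = [dict(m) for m in messages[1:]]
    for m in out:
        if m["role"] == "user":
            m["content"] = system + "\n\n" + m["content"]
            break
    return out
```

The model then sees the same instructions without the separate system role its README advises against.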
Anonymous 01/22/25(Wed)18:19:06 No.104000267
>>104000218
I hope you know that the deepseek official chat has guardrails, you know that, right? You are not so unfathomably retarded, right? My jeet friend?
Anonymous 01/22/25(Wed)18:20:47 No.104000285
>>104000267
Is there a free deepseek r1 I can play around with? Also those kinds of responses were present in their Qwen tune too which I ran locally. I think it's the raw model writing this.
Anonymous 01/22/25(Wed)18:21:50 No.104000300
Anonymous 01/22/25(Wed)18:21:52 No.104000301
>>104000285
saar, do the needful and pay 1 rupee
Anonymous 01/22/25(Wed)18:22:02 No.104000303
>>104000285
THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1THE FINETUNES ARE NOT ACTUAL R1
Just use any fucking provider like openrouter like any actual human being.
Anonymous 01/22/25(Wed)18:22:16 No.104000307
>>104000218
I didn't have any problem uncensoring R1. But I was using the system prompt for everything, I wonder how it affects responses..
Anonymous 01/22/25(Wed)18:23:00 No.104000316
Anonymous 01/22/25(Wed)18:23:26 No.104000320
>>104000303
Openrouter doesn't want to accept my money.
Anonymous 01/22/25(Wed)18:24:05 No.104000327
>>103996345
LM Studio
Anonymous 01/22/25(Wed)18:26:35 No.104000362
Anonymous 01/22/25(Wed)18:27:31 No.104000372
R1 qwen is dumb
>Now, Hatsune Miku has distinct features: long black hair in a ponytail with a red ribbon, cyan eyes, and her iconic vocaloid服装 which is usually a white dress with some red accents. I'll need to create shapes for each part.
Anonymous 01/22/25(Wed)18:31:18 No.104000413
Anonymous 01/22/25(Wed)18:32:27 No.104000433
Anonymous 01/22/25(Wed)18:35:34 No.104000467
>>104000303
Calm down spergmeyer.
There's nothing wrong with wanting to use a local model in the local models general. If anything the APIfags are ones who shouldn't be here.
Anonymous 01/22/25(Wed)18:36:45 No.104000484
>>104000467
the problem is calling the distills r1 and for anything other than math and code they're not better than their base
Anonymous 01/22/25(Wed)18:37:30 No.104000496
they really did a number on themselves by attaching r1 to the tuned models' names. Pretty sure they won't make that mistake again.
Anonymous 01/22/25(Wed)18:37:45 No.104000500
>>104000467
I'm hosting SillyTavern locally. It just queries R1 API, that's it.
Anonymous 01/22/25(Wed)18:37:54 No.104000503
i'm trying r1 14b and its not good
imagine paying for this
Anonymous 01/22/25(Wed)18:38:10 No.104000506
>>104000180
>everything works out of the box
fuck off with this bullshit, this isn't true and won't be for a long time
>t. fucked around with arch+proton for a few weeks
when it works it works well, but thats the issue, WHEN it works
>hur dur just stop playing your niche garbage and only play triple a faggot shit like me :^))
no fuck off normie
Anonymous 01/22/25(Wed)18:38:16 No.104000508
>>104000503
Bait
Anonymous 01/22/25(Wed)18:39:39 No.104000521
>>103999466
>TDP = 24/7 max usage
Fucking retard holy shit
>>103999475
This. Plus, things improve extremely with some basic voltage optimization
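For what it's worth, the "TDP = 24/7 max usage" error is easy to put in numbers. A back-of-envelope sketch; the utilization figures and electricity price below are made up for illustration, not measurements:

```python
# A GPU's TDP is a ceiling, not its average draw, so assuming TDP * 24h
# wildly overestimates the electricity bill. All numbers are illustrative.

def monthly_cost_eur(tdp_w: float, avg_utilization: float,
                     hours_per_day: float, eur_per_kwh: float = 0.30) -> float:
    """Estimate monthly electricity cost for one card."""
    avg_draw_w = tdp_w * avg_utilization              # partial load pulls far less than TDP
    kwh_per_month = avg_draw_w * hours_per_day * 30 / 1000
    return kwh_per_month * eur_per_kwh

# Naive "TDP = 24/7 max usage" estimate for a 450 W card:
naive = monthly_cost_eur(450, 1.0, 24)
# More realistic: ~4 h/day of inference at ~60% of TDP:
realistic = monthly_cost_eur(450, 0.6, 4)
print(f"naive: {naive:.2f} EUR/month, realistic: {realistic:.2f} EUR/month")
```

The order-of-magnitude gap between the two numbers is the point, before any undervolting is even considered.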
Anonymous 01/22/25(Wed)18:39:43 No.104000522
Anonymous 01/22/25(Wed)18:41:18 No.104000546
>>104000522
lmao
Anonymous 01/22/25(Wed)18:41:30 No.104000550
deepseek's goal is sabotaging lmg by releasing models that are open weight to grab attention but that most posters can't run, so posters are conditioned to use the api
deepseek single-handedly killed what's most valuable about local models (the ability to fine-tune)
Anonymous 01/22/25(Wed)18:42:39 No.104000561
>>104000550
No one fine-tunes. People grab a model and use it (much like they plug into an API and use it)
Anonymous 01/22/25(Wed)18:43:22 No.104000568
>>104000433
I also asked it to draw you and this is the result. I had to prefill the response with "Certainly", it complained otherwise.
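The "prefill" trick mentioned above generalizes: you seed the assistant turn with a compliant opener so the model continues it instead of writing its own refusal. A minimal sketch against a raw text-completion endpoint; the `<|...|>` chat-template tokens here are placeholders, not any real model's format, so substitute your model's actual template in practice:

```python
# Seed the assistant turn with a compliant opener ("Certainly") so the
# model continues mid-turn rather than opening with a refusal.
# The <|...|> tokens are placeholders, not a real model's template.

def build_prefilled_prompt(system: str, user: str, prefill: str = "Certainly") -> str:
    return (
        f"<|system|>{system}\n"
        f"<|user|>{user}\n"
        f"<|assistant|>{prefill}"     # no trailing newline: model continues this stream
    )

prompt = build_prefilled_prompt(
    "You are a helpful assistant.",
    "Draw an ASCII picture of an anon.",
)
# Send `prompt` to a raw /completion endpoint (e.g. a llama.cpp server)
# and prepend the prefill to whatever the model returns.
```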
Anonymous 01/22/25(Wed)18:43:55 No.104000577
talking about dumb shit: any interesting 24GB model releases as of late? asking for a mistral small friend
>>104000561
this
Anonymous 01/22/25(Wed)18:48:04 No.104000625
>>104000561
>>104000577
people fine tuned in the good old days (llama 1) and taught models hyper niche domain specific knowledge
now everything is non-tunable slop
people like you are the reason why local models are dead, same people who's killing local image models with flux
Anonymous 01/22/25(Wed)18:48:20 No.104000634
>>104000550
>most valuable of local models (able to fine tuning)
When was the last time we got a community-made finetune that was worth using? I don't think that happened after the Mixtral days. Obviously not counting bigger corporate projects like WizLM.
Anonymous 01/22/25(Wed)18:49:46 No.104000654
>>104000634
(((they))) co-opted away the fine tuning community with fake benchmaxxed models
Anonymous 01/22/25(Wed)18:51:01 No.104000670
>>104000320
Just use crypto
Anonymous 01/22/25(Wed)18:51:08 No.104000672
>>104000550
You still have the ability to fine tune. Like most things in life, it just depends on how much you're willing to spend to do it.
Anonymous 01/22/25(Wed)18:51:23 No.104000673
>>104000625
keep crying retard your tune slop will forever be dead
Anonymous 01/22/25(Wed)18:52:49 No.104000704
Anonymous 01/22/25(Wed)18:53:51 No.104000719
>>104000673
by far the best thing about r1 and the distilled models is that all the locusts stopped using the sloptunes and the shills haven't had the balls to show up since
Anonymous 01/22/25(Wed)18:54:20 No.104000727
Can someone do the Miku on unicorn ascii art test for r1?
Anonymous 01/22/25(Wed)18:54:24 No.104000730
>>104000672
you can't tune away baked-in slop
there's a reason no one releases real foundational models anymore: they saw what happened with llama1 (which was never intended to be released until the leak)
Anonymous 01/22/25(Wed)18:55:34 No.104000755
i see the llama1 schizo is back
Anonymous 01/22/25(Wed)18:55:34 No.104000756
Anonymous 01/22/25(Wed)18:55:45 No.104000760
CHUCKLE
SHE CHUCKLES
A LOW CHUCKLE
A MISCHIEVOUS CHUCKLE
Anonymous 01/22/25(Wed)18:56:30 No.104000769
>>104000727
here you go
Anonymous 01/22/25(Wed)18:58:32 No.104000799
>>104000755
He's not wrong. Back with LLaMA1 the community still made big advances on its own. Finetunes mattered. You had people create tunes to extend the context alongside RoPE. You had people make models compatible with CoT almost two years before OpenAI thought of it.
The open spirit was still alive but it all disappeared. Right now the local community is worthless.
Anonymous 01/22/25(Wed)19:00:16 No.104000815
guess nothings been happening and mistral/nemo is still the king
Anonymous 01/22/25(Wed)19:01:53 No.104000833
>>104000815
Are you serious? There's still nothing better below 70b?
Anonymous 01/22/25(Wed)19:02:54 No.104000846
Does anyone get a lot of "She... her..."ing?
Anonymous 01/22/25(Wed)19:03:09 No.104000851
R1 is amazing, it gave me a social commentary at the end of a smut scene about the slow downfall of girl. (it was an innocent to whore scenario)
Anonymous 01/22/25(Wed)19:03:43 No.104000858
>>104000851
Nemo does that too, anon.
Anonymous 01/22/25(Wed)19:05:39 No.104000880
>>104000858
But R1 is the new thing
Anonymous 01/22/25(Wed)19:08:26 No.104000913
>>103999946
yeah, definitely get a barebones server and then populate the memory and cpu slots. I did this awhile back when I needed cheap motherboards & cases. They all came without HDDs, memory, or CPUs, but CPUs are like $5/ea in bulk, HDDs are whatever, and memory is fairly cheap (probably $5/stick).
Anonymous 01/22/25(Wed)19:08:59 No.104000923
Newfag here, how much VRAM I would need to run R1?
Anonymous 01/22/25(Wed)19:09:57 No.104000932
>>104000923
Q2 needs around 250GB
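Those numbers follow from simple arithmetic: quantized weights take roughly parameters × bits-per-weight ÷ 8 bytes, plus overhead for KV cache, activations and quant-block metadata. A rough sketch; the bits-per-weight values below are ballpark figures for llama.cpp-style quants, not exact GGUF file sizes:

```python
# Rule-of-thumb memory for quantized weights. Bits-per-weight values
# are approximate; real GGUF files add metadata and you still need
# room for KV cache, so pad the result generously.

def approx_weight_gb(params_b: float, bits_per_weight: float) -> float:
    bytes_total = params_b * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9          # decimal GB

# DeepSeek R1 has ~671B total parameters (MoE: all experts must be resident):
for name, bits in [("~Q2", 2.6), ("~Q4", 4.8), ("FP8", 8.0)]:
    print(f"{name}: ~{approx_weight_gb(671, bits):.0f} GB of weights")
```

At ~2.6 bpw that is ~218 GB of weights alone, which is consistent with the ~250 GB figure above once cache and overhead are added.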
Anonymous 01/22/25(Wed)19:10:40 No.104000940
>>104000923
300GB+
Hopefully people start working on backend improvements for MoEs to make them run at good speeds on RAM alone.
Anonymous 01/22/25(Wed)19:11:32 No.104000954
>>104000833
its dead jim...
Anonymous 01/22/25(Wed)19:11:37 No.104000957
miku
Anonymous 01/22/25(Wed)19:12:40 No.104000974
>>104000923
I've seen people running it on their phones so it can't be that much
Anonymous 01/22/25(Wed)19:15:03 No.104001011
Has anybody compared ktransformers to vanilla llama.cpp
Anonymous 01/22/25(Wed)19:15:05 No.104001012
>>104000974
cant be that good then
Anonymous 01/22/25(Wed)19:15:59 No.104001019
>>104000974
Reee!!! 1.5B finetune IS NOT R1!!!
Anonymous 01/22/25(Wed)19:18:30 No.104001048
>>104001011
>what the fuck is ktransformers?
>check the github
the hell are those huge speed ups about?
Anonymous 01/22/25(Wed)19:19:54 No.104001056
I don't like how random R1 is, but I have to admit that it's very refreshing to see characters being so explicit.
Anonymous 01/22/25(Wed)19:20:46 No.104001064
>>104000769
Thank god my job is still safe.
Anonymous 01/22/25(Wed)19:21:32 No.104001069
>>104001048
Right?
It's specialized, but with all the talk of MoEs lately, I'd think people would be talking more about it.
It even supports DS v3, I think
Anonymous 01/22/25(Wed)19:22:15 No.104001086
>>104001064
Outputting ascii miku won't get automated anytime soon
Anonymous 01/22/25(Wed)19:22:21 No.104001088
Anonymous 01/22/25(Wed)19:23:13 No.104001092
>>104001056
Does it do that even with the lowest temp?
Anonymous 01/22/25(Wed)19:24:53 No.104001111
>>104001092
That's the API, I can't change temperature
Anonymous 01/22/25(Wed)19:24:59 No.104001115
>>104001069
>August been busy with updates
>Nothing since but a 2month old pull
The speed itself is weird as is, the lack of updating is a different matter however. Granted, with "potential speed" like this who cares, but it's still a bit odd.
Anonymous 01/22/25(Wed)19:28:18 No.104001150
>>104001056
What fucking card is that?
Anonymous 01/22/25(Wed)19:29:45 No.104001166
Not only do I want temp control I want separate temp controls for CoT and output
Anonymous 01/22/25(Wed)19:32:53 No.104001194
>>104001056
Which fucking R1 is that?
Anonymous 01/22/25(Wed)19:36:31 No.104001227
Anonymous 01/22/25(Wed)19:39:37 No.104001254
Anonymous 01/22/25(Wed)19:41:05 No.104001265
Anonymous 01/22/25(Wed)19:43:27 No.104001292
>>104001265
that isn't a local model and therefore should not be discussed in this thread
Anonymous 01/22/25(Wed)19:44:27 No.104001303
Anonymous 01/22/25(Wed)19:45:24 No.104001310
Can confirm, true R1 api does indeed happily provide song lyrics.
Anonymous 01/22/25(Wed)19:46:03 No.104001317
>>104001115
>The speed itself is weird as is
When you consider how MoEs are different from dense models, and how those differences can be targets for specific optimizations, it starts making more sense.
Anonymous 01/22/25(Wed)19:46:35 No.104001322
I was here to see China save the AI world
Anonymous 01/22/25(Wed)19:47:15 No.104001327
Anonymous 01/22/25(Wed)19:48:15 No.104001343
>>104001254
based
>>104001303
R1 sounds like the latest meme going by the posts I see, but who knows.
>>104001317
Is this about special MoE optimizations then? I only skimmed the page like a retard, so please excuse that fuck-up. Makes me curious what else can be gotten out of it, considering that so far MoE's main advantage was "fast, but big", and now this stuff gets added on top.
Anonymous 01/22/25(Wed)19:48:38 No.104001346
>>104001292
Wait, did I say API? lol, no, I mean the 600B one, I'm using it locally on my company's server.
Anonymous 01/22/25(Wed)19:50:05 No.104001365
>>104001346
I believe you.
Anonymous 01/22/25(Wed)19:50:18 No.104001368
I haven't even bothered trying to look closer at the CoTs or trying to instruct it how to form them. It's that good.
Anonymous 01/22/25(Wed)19:51:36 No.104001377
>>104001346
Try lowering the temperature a bit then and see how it acts.
Anonymous 01/22/25(Wed)19:52:23 No.104001382
>>104001368
Imagine working for a boss that demands to see exactly how you think and tells you you're doing it wrong.
Anonymous 01/22/25(Wed)19:53:34 No.104001392
>>104001327
I shitpost all the time.
Anonymous 01/22/25(Wed)19:54:01 No.104001398
>>104001343
>Is this about special MoE optimizations then?
For the most part, yes. They also have some CPU-specific optimizations, some ported from llamafile, if I'm not hallucinating.
Anonymous 01/22/25(Wed)19:54:45 No.104001407
>>104001343
>R1 sounds like the latest meme going by the posts I see, but who knows.
Those people are faggots. The R1 distill models are amazing. Make sure you're using the DeepSeek context and instruct templates, or the model won't properly 'think', as it should.
Anonymous 01/22/25(Wed)19:55:39 No.104001414
I already feel the honeymoon for distill 32B fading...
Anonymous 01/22/25(Wed)19:56:22 No.104001422
do i get a 5090 now, knowing shit's gonna continue to get worse (economically i mean), or do i wait a year, see what the ai landscape looks like then, and risk having to pay 2x as much
Anonymous 01/22/25(Wed)19:56:34 No.104001423
Uhhh...
https://old.reddit.com/r/LocalLLaMA/comments/1i7o9xo/deepseek_r1s_open_source_version_differs_from_the/
Anonymous 01/22/25(Wed)19:56:44 No.104001426
>>104001414
What will you go back to? Surely a 32b has to be better than nemo.
Anonymous 01/22/25(Wed)19:56:46 No.104001427
>>104001398
>They also have some CPU-specific optimizations
Iiiiiinteresting, that explains all of the mentions of RAM, not only 4090s. Was wondering why the fuck they kept going on about mixing a 4090 with 32 or 128GB of RAM. Could be super interesting to test out; reminds me a bit of figuring out TensorRT-accelerated ImgGen.
>>104001407
So there are built-in templates for once, not some bullshit I need to find and set up myself first? That's a first.
Anonymous 01/22/25(Wed)19:58:20 No.104001443
>>104001426
What we do every night, Pinky, 2MW.
Anonymous 01/22/25(Wed)20:01:57 No.104001495
>>104001423
Are we sure that's not just because open source does not have a way to make it predict two tokens at once yet which somehow alters the way it thinks?
Anonymous 01/22/25(Wed)20:02:32 No.104001501
>>104001422
The 5090 is good value for running local models no matter how you spin it.
Anonymous 01/22/25(Wed)20:03:11 No.104001510
>>104001495
No, the two tokens thing is only used for speculative decoding, it shouldn't change how the model replies.
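That claim can be illustrated: under greedy decoding, speculative decoding only keeps draft tokens that match what the target model would have picked, so the final sequence is token-for-token identical. A toy sketch with deterministic stand-in "models"; real implementations use an acceptance-rejection step to preserve the full sampling distribution, and only the greedy case is shown here:

```python
# Toy illustration that greedy speculative decoding is output-preserving.
# target_next/draft_next are deterministic stand-ins, not real models.

def target_next(ctx):                 # stand-in for the big target model
    return (sum(ctx) * 31 + 7) % 100

def draft_next(ctx):                  # stand-in for the draft model (sometimes wrong)
    return (sum(ctx) * 31 + 7) % 100 if sum(ctx) % 3 else 0

def generate_vanilla(start, n):
    ctx = list(start)
    for _ in range(n):
        ctx.append(target_next(ctx))
    return ctx

def generate_speculative(start, n, k=4):
    ctx = list(start)
    limit = len(start) + n
    while len(ctx) < limit:
        proposal = list(ctx)
        for _ in range(k):            # draft proposes k tokens cheaply
            proposal.append(draft_next(proposal))
        for i in range(len(ctx), len(proposal)):
            want = target_next(proposal[:i])
            ctx.append(want)          # always emit the target's own choice
            if want != proposal[i] or len(ctx) == limit:
                break                 # a mismatch (or the length cap) ends the round
    return ctx

assert generate_speculative([1], 10) == generate_vanilla([1], 10)
```

Every emitted token is the target's own pick; the draft only decides how many tokens get verified per round, i.e. speed, not content.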
Anonymous 01/22/25(Wed)20:03:37 No.104001516
>>104001495
Or they started using Zero for their API.
Anonymous 01/22/25(Wed)20:04:19 No.104001521
>>104001392
>That broken smile and the other fucked bits
Oddly fitting for pochi art. This just a broken lora, old model, or something else? Resolution points me towards an old model, but who knows or cares.
>>104001422
3billion tops and a good chunk of VRAM, it's very much worth it. Mind you that I say this as a 4090 owner that bought his at an early adopter 2200€, while the 5090s MSRP is 2400€.
Anonymous 01/22/25(Wed)20:04:53 No.104001527
>>104001423
>Testing methodology
>All tests were conducted with:
>Temperature: 0
>Top-P: 0.7
>Top-K: 50
Okay, this guy just doesn't know what he's talking about.
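The objection being made: at temperature 0, decoding is effectively greedy argmax, so top-p and top-k can never change the chosen token; they only prune candidates argmax would not have picked anyway. A toy demonstration in pure Python with made-up logits:

```python
import math

# At temperature 0 the highest-logit token always survives any top-k or
# top-p filter, so those settings are moot. Logits below are made up.

def filtered_argmax(logits, top_k=None, top_p=None):
    order = sorted(range(len(logits)), key=lambda i: -logits[i])
    if top_k:
        order = order[:top_k]         # top-k: keep the k highest-logit tokens
    m = logits[order[0]]
    exps = {i: math.exp(logits[i] - m) for i in order}
    z = sum(exps.values())
    kept, mass = [], 0.0
    for i in order:                   # top-p (nucleus): smallest set with mass >= p
        kept.append(i)
        mass += exps[i] / z
        if top_p is not None and mass >= top_p:
            break
    return max(kept, key=lambda i: logits[i])   # temperature 0: greedy pick

logits = [2.0, 0.5, -1.0, 3.5, 0.0]
plain = max(range(len(logits)), key=lambda i: logits[i])
for k, p in [(50, 0.7), (1, None), (None, 0.01), (2, 0.99)]:
    assert filtered_argmax(logits, k, p) == plain   # filters never change the pick
```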
Anonymous 01/22/25(Wed)20:06:23 No.104001543
Anonymous 01/22/25(Wed)20:07:01 No.104001551
>>104001527
Not even a little.
Anonymous 01/22/25(Wed)20:09:34 No.104001570
>>104001427
>So there are built-in templates for once
No, but don't worry, our Redditbros have us covered!
https://www.reddit.com/r/SillyTavernAI/comments/1hn4bua/deepseekv3/
Anonymous 01/22/25(Wed)20:09:36 No.104001571
>>104001527
Do we know what 'settings' the api model uses? (since you can't change them) You'd need to replicate those to get identical results right?
Anonymous 01/22/25(Wed)20:10:00 No.104001575
>>104001570
That works too, thanks!
Anonymous 01/22/25(Wed)20:10:44 No.104001582
https://videocardz.com/newz/nvidia-rtx-blackwell-gpu-with-96gb-gddr7-memory-and-512-bit-bus-spotted
new quadro equivalent has 96GB VRAM
I'm sure it'll cost at least $10k but damn
Anonymous 01/22/25(Wed)20:12:04 No.104001597
>>104001571
>Not Supported Parameters:temperature、top_p、presence_penalty、frequency_penalty、logprobs、top_logprobs.
https://api-docs.deepseek.com/guides/reasoning_model
all we know is that you can't override the parameters they set, if they set any of them at all
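Given that, a client talking to the reasoner endpoint may want to strip those fields instead of relying on the server to ignore them. A sketch; the field names come from the linked docs, but the scrub helper and payload are our own illustration, not official client code:

```python
# Client-side scrub of sampler fields that deepseek-reasoner does not
# support, per https://api-docs.deepseek.com/guides/reasoning_model.
# Helper and payload are an illustrative sketch.

UNSUPPORTED = {"temperature", "top_p", "presence_penalty",
               "frequency_penalty", "logprobs", "top_logprobs"}

def scrub_request(payload: dict) -> dict:
    """Drop fields the reasoner endpoint ignores or rejects."""
    return {k: v for k, v in payload.items() if k not in UNSUPPORTED}

req = {
    "model": "deepseek-reasoner",
    "messages": [{"role": "user", "content": "hi"}],
    "temperature": 0.6,   # fine for local weights, unsupported on this endpoint
    "top_p": 0.7,
    "max_tokens": 1024,
}
clean = scrub_request(req)
print(sorted(clean))      # ['max_tokens', 'messages', 'model']
```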
Anonymous 01/22/25(Wed)20:12:30 No.104001607
>>104001582
Curious how that mini ARM supercomputer thing will compare to this, considering the mini PC has 128GB of unified RAM for AI or whatever.
Anonymous 01/22/25(Wed)20:12:46 No.104001611
>>104001571
We know that the official API does not support samplers for R1. Doing testing at Temp 0 and comparing it to the API is completely meaningless.
Anonymous 01/22/25(Wed)20:13:02 No.104001614
>>104001571
They recommend temperature 0.6, so maybe that's the temperature they use?
Anonymous 01/22/25(Wed)20:13:10 No.104001615
>>104001607
I'll probably have a fraction of the compute.
Anonymous 01/22/25(Wed)20:14:07 No.104001622
>>104001615
Likely still faster than CPU only, so there is that, and for a fraction of the price of that quadro tier GPU+rest of the PC.
Anonymous 01/22/25(Wed)20:14:11 No.104001624
>>104001611
Not really?
Anonymous 01/22/25(Wed)20:15:35 No.104001639
I have 128gb of vram and the brain of a chimpanzee (I am retard), what model can I use to match chatgpt 4 or go even further beyond without getting censored when I ask it edgy things or tracked by the botnet with each thing I ask?
Also is there other cool shit I can do with AI? Can I have AI generate models for 3d printable objects for example? what about generating code beyond little snippets without signing up for a gay IBM contract? what weird and wonderful possibilities are open to me if only I push my 15amp home electric to the absolute maximum
>hehe spoonfeeding
>hehe install gentoo or something
Remember niggers I waited the entire 900ms just for you
Anonymous 01/22/25(Wed)20:15:54 No.104001643
>>104001527
I'd reserve my judgement regarding this for at least a few weeks anyways. Usually in the beginning, everything's utterly broken
Anonymous 01/22/25(Wed)20:16:03 No.104001645
>>104001622
Of course, but datacenters don't really care about that.
For you and me, DIGITS is going to be the go-to, most likely, if that's what you're thinking.
Anonymous 01/22/25(Wed)20:16:43 No.104001653
>>104001639
Just buy a girlfriend and go away.
Anonymous 01/22/25(Wed)20:17:54 No.104001668
>>104001624
they probably don't let you set those parameters because they have their own carefully selected values
Anonymous 01/22/25(Wed)20:18:08 No.104001670
>>104001653
It's not very helpful when you cry, anon. Buying people is illegal now and we all have to make sacrifices because of it >: (
Anonymous 01/22/25(Wed)20:20:44 No.104001700
Not exactly local, but hopefully one day: I asked Google's latest Gemini model some questions to plan out the general plot for a TTRPG I'm game mastering, and honestly, it's REALLY thorough in its methods and thoughts. It thought of everything in a nice way, reflected well on how to best approach my request and what things I could do and should consider, and then output a lot of great ideas in my native tongue (I fed it all of my game's notes and ideas I had written down before in my native tongue).
I hope we can run something like this locally one day, and with as little censoring as possible. I wonder if writing will die out as a job one day when models stop being stochastic parrots and turn into something fully original.
Anonymous 01/22/25(Wed)20:22:41 No.104001714
It's too good
I can't deal with it
Anonymous 01/22/25(Wed)20:25:14 No.104001744
>>104001639
BUMP because 4chan is reddit now and you have to go to other imageboards for anything beyond
>muh os better than your os
>here why your programming language sux
>hehe he dond forged ur coding sox
Anonymous 01/22/25(Wed)20:30:01 No.104001808
>>104001744
Just look at the links on the OP and figure it out, man. It's genuinely not that hard. Out of all data science subjects, LLMs and their usage is probably the easiest because they're so intuitive when you get it.
Anonymous 01/22/25(Wed)20:31:18 No.104001818
>>104001645
>but datacenters
obviously, but that ain't what I'm talking about
i'm purely curious about the difference between the two, to see if it would (in theory) be more valuable to go for the mini PC, a quadro, or more of the same old (SLI'd 5090s). see it as a thought experiment, this isn't about "WHAT IS THE PRACTICAL USE :^)?"
Anonymous 01/22/25(Wed)20:31:47 No.104001825
>>104001808
I'm asking to be spoonfed, anon. It doesn't answer the question unless you pretend it's an aeroplane and fly it into my mouth. I can read it without ever asking anything here, but I have decided to grace you with my presence and ask a question, be happy
Anonymous 01/22/25(Wed)20:34:56 No.104001858
>>104001700
Even the non-thinking variant can play D&D sufficiently well as a game master, asking for skill checks, calculating attack rolls using code, etc.
I really hope we get these models open sourced eventually.
Anonymous 01/22/25(Wed)21:18:52 No.104002311
If I ask the LLM to generate, for example, 6 paragraphs, each paragraph comes out shorter than if I had asked it to generate 4 paragraphs. Why is that, and how do I circumvent it?
Anonymous 01/22/25(Wed)22:12:29 No.104002981
Page 10... We will be free...
Anonymous 01/22/25(Wed)22:14:27 No.104003001
Has there ever been a thread where the ugly face anon didn't samefag his own responses?
Anonymous 01/22/25(Wed)22:15:39 No.104003014