4combinator

/lmg/ - Local Models General

Anonymous 01/21/25(Tue)17:41:24 | 426 comments | 38 images | 🔒 Locked

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103980982 & >>103976960

►News
>(01/21) BSC-LT, funded by EU, releases 2B, 7B & 40B models: https://hf.co/collections/BSC-LT/salamandra-66fc171485944df79469043a
>(01/21) Hunyuan3D 2.0 released: https://hf.co/tencent/Hunyuan3D-2
>(01/20) DeepSeek releases R1, R1 Zero, & finetuned Qwen and Llama models: https://hf.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d
>(01/17) Nvidia AceInstruct, finetuned on Qwen2.5-Base: https://hf.co/nvidia/AceInstruct-72B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous 01/21/25(Tue)17:41:48 No.103985491

miku happens to be an expert on this subject

►Recent Highlights from the Previous Thread: >>103980982

--MoE vs dense models debate:
>103981951 >103981977 >103982004 >103982052 >103982114 >103982924
--Expert size and weight loading optimization discussion:
>103981865 >103981875 >103981894 >103982431
--Anon discusses R1's creative writing behavior and API limitations:
>103982287 >103982320 >103982325 >103982333 >103982367 >103982382 >103982453 >103982489 >103982540 >103982614 >103982646 >103982667
--R1 model requirements and potential hardware configurations for deployment:
>103981256 >103981277 >103981303 >103981558 >103981575 >103981610 >103981616 >103981608 >103981625 >103981607 >103981717 >103981733
--Anons discuss running R1 at home and possible hardware options:
>103981326 >103981478 >103981469 >103981496 >103981513 >103981543
--Hunyuan3D-2 3D asset generation software and gamedev potential:
>103982192 >103982252 >103982297
--Anon discovers easy way to add Deepseek options in ST:
>103982233
--Local models vs API discussion and CPU maxxing:
>103981662 >103981691 >103981718 >103981777 >103981802 >103981919 >103981973 >103982070 >103982103 >103982111 >103982122 >103981736
--DeepSeek-R1 model discussion and impressions:
>103981030 >103981107 >103981104 >103981149 >103981171 >103981180 >103981217 >103981227 >103981240 >103981251 >103981229 >103981242
--5090 vs R1 model discussion on performance and obsolescence:
>103981421 >103981444 >103981452 >103981459 >103981481
--Optimizing R1 700B for personal hardware:
>103981545 >103981556 >103981605 >103981669 >103981707 >103981712 >103981740
--Logs: Anon is impressed with R1 model's performance and discusses CoT usage:
>103982408 >103982425
--Reminder: --override-kv option in llama.cpp for modifying expert usage:
>103981647
--Logs: R1:
>103983964
--Miku (free space):
>103981015 >103982192 >103982775 >103984279 >103984426 >103984955

►Recent Highlight Posts from the Previous Thread: >>103980989

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script

Anonymous 01/21/25(Tue)17:44:47 No.103985533

>R1 zero shot finished my coding TODO projects that no other model was able to up until now

Where were you when the Chinese century began, Anon?

Anonymous 01/21/25(Tue)17:45:34 No.103985543

>>103985487
>Not supporting most filters. Only the ones actually used in templates of major (or trendy) models are/will be implemented.

>Beware of Prompt injection risks!

>Prompt injection is NOT protected against by this library.

>There are many types of prompt injection, some quite exotic (cf. data exfiltration exploits leveraging markdown image previews).

>For the simpler cases, it is perfectly possible for a user to craft a message that will look like a system prompt, like an assistant response or like the results of tool calls. While some models might be fine-tuned to ignore system calls not at the very start of the prompt or out of order messages / tool call results, it is expected that most models will be very confused & successfully manipulated by such prompt injections.

Anonymous 01/21/25(Tue)17:46:10 No.103985553

>>103985533
>zero
Is it better than normal R1?

Anonymous 01/21/25(Tue)17:46:25 No.103985561

!!! THREADLY REMINDER !!!
the investment is from a japanese company, not from the US goverment

Anonymous 01/21/25(Tue)17:46:33 No.103985566

>>103985485
As we all know its the beginning of something that takes the most time. Starting a programming project, writing a book, thinking about any other project.
We are living in the golden era of LLMs 128K context LLMs however which means, we can use it to
>kick start a programming project
>kick start a book and critique it
>kick start any project or at the least serve as a bouncing board for ideas

I am fucking thrilled, its a great time to be alive. The future is bright...too bright even

Anonymous 01/21/25(Tue)17:46:42 No.103985567

>>103985533
Now post proofs.

Anonymous 01/21/25(Tue)17:46:42 No.103985568

fish btfo

Anonymous 01/21/25(Tue)17:46:54 No.103985570

>>103985553
anon
>zero shot finished

Anonymous 01/21/25(Tue)17:47:11 No.103985573

>>103985485
Does q8 provide an advantage over q4 quants for Deepseek R1

Anonymous 01/21/25(Tue)17:47:53 No.103985581

>>103985553
I said zero shot finished, as in, it completed the task zero-shot = first try.

Was using R1.

Anonymous 01/21/25(Tue)17:48:14 No.103985585

I need to pay HOW MUCH??

Anonymous 01/21/25(Tue)17:48:50 No.103985597

>>103985585
At least 3x that.

Anonymous 01/21/25(Tue)17:49:21 No.103985602

>>103985585
>256
Make sure to buy 6.

Anonymous 01/21/25(Tue)17:49:42 No.103985606

>>103985585
Shit, you buying the good stuff huh?

Anonymous 01/21/25(Tue)17:51:13 No.103985622

>>103985566
>its the beginning of something that takes the most time
lol no, it's the finishing

Anonymous 01/21/25(Tue)17:51:48 No.103985625

I've only just started seriously learning all this LLM shit in the last few days.

I've got cursor, ollama, oterm, open-ui, etc installed. I have GPT and Anthropic API keys from work (they pay for premium for everyone).

I am not an experienced programmer. As a complete gay retard, what toolset should I use it I want to abuse LLMs to the maximum possible extent to make something that will let me experiment with cellular automata rulesets locally?

Anonymous 01/21/25(Tue)17:51:54 No.103985626

>>103985585
>6800MT/s
Just buy 6000MT/s, its like 30% less

Anonymous 01/21/25(Tue)17:52:10 No.103985628

>reasoning does not require search
francois chollet in shambles

Anonymous 01/21/25(Tue)17:53:09 No.103985642

>>103985625
sir this is cooming general we don't actually do productive thing here?

Anonymous 01/21/25(Tue)17:53:35 No.103985648

>DeepSeek-R1-Distill-Qwen-32B-Q6_K
loaded up an existing chat and it seems schizo. it randomly suggested some game for sailor moon? none of this has been mentioned anywhere previously

Anonymous 01/21/25(Tue)17:53:42 No.103985650

>>103985625
>cellular automata rulesets locally
What's stopping you from going full autist and simulating molecular physics/chemistry etc
Not everything has to be LLM related

Anonymous 01/21/25(Tue)17:54:40 No.103985661

Is it possible to finetune with the CPU, or does that take too long?

Anonymous 01/21/25(Tue)17:55:05 No.103985664

>>103985625
and what do you use it for? nothing productive as usual? just wasting electricity on predicting the average lenght of the Black cock?
99% of models can't say nigger, tell you about "illegal" activites or generate cunny so what's the point

Anonymous 01/21/25(Tue)17:55:35 No.103985668

>>103985661
No, yes.

Anonymous 01/21/25(Tue)17:55:45 No.103985672

>>103985661
Takes too long.

Anonymous 01/21/25(Tue)17:56:25 No.103985677

>>103985661
too long
>>103895229
>On an RTX 4090 one epoch over a dataset with 1.3 MB of text currently takes 3 minutes.
>On an Epyc 7742 one epoch takes 15 hours.

Anonymous 01/21/25(Tue)17:56:37 No.103985680

>>103985664
in fact
>https://ollama.com/library/wizard-vicuna-uncensored
is the only model I have ever used that didn't refuse anything or spouted usual "I am an AI language model" or something
modern models are basically useless, no usecases

Anonymous 01/21/25(Tue)17:56:40 No.103985681

>>103985664
>99% of models can't say nigger, tell you about "illegal" activites or generate cunny so what's the point
It's genuinely sad that these are the only uses of LLMs you can come up with.

Anonymous 01/21/25(Tue)17:57:50 No.103985700

>>103985680
hi eric

Anonymous 01/21/25(Tue)17:57:59 No.103985704

>>103985681
and what do you use it for? tell me about one productive usecase
>coding
nigger, use your brain God gave you (or didn't), stop gooning for a second

Anonymous 01/21/25(Tue)17:59:01 No.103985710

>>103985650
>molecular physics/chemistry
I'm not really interested in those sorts of problems, at the moment I'm mostly interested in using LLMs to code, starting out with a program to run Conways game of life locally but in a manner that supports wrapping out the ruleset. Eventually I want to move away from two state, two dimensional models to more complex models for simulating info hazards.

Anonymous 01/21/25(Tue)17:59:04 No.103985711

>>103985704
generating cunny is the biggest usecase of these models
if it can't do it it's censored shit that is not usable, it includes these new "SOTA" models

Anonymous 01/21/25(Tue)17:59:28 No.103985718

Damn, R1 went darker and sicker than I wanted in a sex story. Until now only Claude had ever done that to me.

Anonymous 01/21/25(Tue)17:59:55 No.103985722

>>103985680
>https://ollama.com/library/wizard-vicuna-uncensored
>30B parameter model based on Llama 2 uncensored by Eric Hartford
>30B llama 2
https://huggingface.co/cognitivecomputations/Wizard-Vicuna-30B-Uncensored/blob/main/config.json
>"max_position_embeddings": 2048,
what did ollama mean by this?

Anonymous 01/21/25(Tue)18:00:27 No.103985728

>>103985664
I want to create the world's most virulent weaponized memes to influence macro behavior.

I want to social engineer the planet.

Anonymous 01/21/25(Tue)18:00:53 No.103985730

what is the final solution to the locust problem?

Anonymous 01/21/25(Tue)18:01:06 No.103985733

>>103985728
dude... that's kinda epic

Anonymous 01/21/25(Tue)18:01:22 No.103985739

>>103985728
go away sam stop trying to regulate open models for safety

Anonymous 01/21/25(Tue)18:02:14 No.103985747

>>103985728
fake bullshit post designed for journalists to screenshot and scare low info boomers with

Anonymous 01/21/25(Tue)18:02:51 No.103985754

>>103985739
>>103985747
meds

Anonymous 01/21/25(Tue)18:03:18 No.103985761

>>103985733
>>103985728
You must be at least 18 to post here.

Anonymous 01/21/25(Tue)18:04:40 No.103985775

>>103985585
go ddr4

Anonymous 01/21/25(Tue)18:04:44 No.103985776

>>103985754
>>103985761
samefag

Anonymous 01/21/25(Tue)18:05:50 No.103985795

>>103985677
What the fuck is that discrepancy? The t/s isn't different by that much.

Anonymous 01/21/25(Tue)18:06:26 No.103985803

>>103985775
>go ddr4
Very slow in comparison. DDR5 can be much faster, which is important when CPUmaxxing

Anonymous 01/21/25(Tue)18:06:52 No.103985807

>>103985795
in prompt processing it is

Anonymous 01/21/25(Tue)18:08:05 No.103985821

piku

Anonymous 01/21/25(Tue)18:08:28 No.103985823

>>103985803
but you can get a full 1tb ram server for less than that money. and its only half the speed

Anonymous 01/21/25(Tue)18:09:41 No.103985839

>>103985681
>>103985704
>use gpu for coding
>make money
>buy better gpu
>make better cunny
>repeat

Anonymous 01/21/25(Tue)18:10:16 No.103985849

>>103985803
cpumaxxing can barely run shit

Anonymous 01/21/25(Tue)18:10:33 No.103985853

>>103985807
Well that fucking sucks. Can I just use a small GPU for that portion and do everything else in RAM? Or do I need to rent one of the big 8xA100 or similar?

Anonymous 01/21/25(Tue)18:12:37 No.103985877

We are Accelerating

Anonymous 01/21/25(Tue)18:14:13 No.103985895

>>103985823
Its always been time vs money anon. From the moment you were born.
You thought it was metaphorical? No.

Anonymous 01/21/25(Tue)18:15:58 No.103985912

>>103985485
>>103985585
serious question: why not use something like this, i.e., store models directly in SSDs and read them through PCIe? https://www.apexstoragedesign.com/apex-storage-x16-gen5 I guess you could build something similar with a bunch of shitty 1TB SSDs

Anonymous 01/21/25(Tue)18:16:12 No.103985917

>>103985823
The problem is that there is no upgrade path for DDR4 without building a new rig from scratch, mobo, CPU, etc. With DDR5, faster sticks will come out and you can upgrade once they get cheap enough.

Anonymous 01/21/25(Tue)18:17:13 No.103985929

>>103985853
>Can I just use a small GPU for that portion
Yes.

Anonymous 01/21/25(Tue)18:18:06 No.103985936

>>103985929
For tuning?

Anonymous 01/21/25(Tue)18:21:29 No.103985973

>>103985936
No. The discrepency between prompt processing is the same as finetuning, but for finetuning you need to fit all of the weights, not just the context.

Anonymous 01/21/25(Tue)18:22:10 No.103985980

>>103985912
Too slow

Anonymous 01/21/25(Tue)18:23:37 No.103985998

>>103985912
>56GB/s
Need something like 500GB/s to be worthwhile

Anonymous 01/21/25(Tue)18:24:13 No.103986004

reminder that people unironically stated in this general that we would have a gpt4-level model "maybe in ten-twenty years, if they'll ever let you have it, but you could never run it at home anyways"

Anonymous 01/21/25(Tue)18:24:28 No.103986012

>>103985895
man i dont need shit that fast, people want shit faster than can be read, but you do you do
>>103985917
isnt server memory support way more strict than home memory? i doubt you could upgrade it without changing many shit
and if you are to do that aint that much difference bc you already saved money

Anonymous 01/21/25(Tue)18:24:58 No.103986015

>>103986004
"Well I coudln't run it at home anyways"

Anonymous 01/21/25(Tue)18:25:32 No.103986023

DeepSeek won

Anonymous 01/21/25(Tue)18:25:39 No.103986024

>>103986012
2t/s is not faster than you can read lol

Anonymous 01/21/25(Tue)18:26:07 No.103986030

>>103985912
i know nothing about how LLMs work, but isn't their memory access pattern mostly random?
You can have all the streaming bandwidth in the world on another storage medium, but the high latency (relative to RAM, at least) is gonna kill random read performance.

Anonymous 01/21/25(Tue)18:26:40 No.103986037

>>103985980
>>103985998
ah, that sucks..

Anonymous 01/21/25(Tue)18:27:08 No.103986044

>>103986024
He might be a very slow reader. You don't know.

Anonymous 01/21/25(Tue)18:27:08 No.103986045

>>103985998
>Need something like 500GB/s to be worthwhile
Just need 9 of those.

Anonymous 01/21/25(Tue)18:28:16 No.103986058

Deepseek R1 is way too unstable for any kind of RP. It's great for everything else but it goes bonkers during RP.

Anonymous 01/21/25(Tue)18:28:18 No.103986059

>>103986024
wasnt a guy saying that with his ddr5 server was getting 20t/s?
reminder that is a MoE with only 37b activated

Anonymous 01/21/25(Tue)18:28:44 No.103986065

Guys, I think I have a genius idea. R1 is great, right? But it's too big. What if we had R1, but like, in the size of a 30B?

Anonymous 01/21/25(Tue)18:29:34 No.103986077

>>103986065
https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B enjoy

Anonymous 01/21/25(Tue)18:29:44 No.103986081

>>103985677
>>On an RTX 4090 one epoch over a dataset with 1.3 MB of text currently takes 3 minutes.
>>On an Epyc 7742 one epoch takes 15 hours.
Is that different from creating a LoRA? I heard that llama.cpp already had support for making LoRAs.

Anonymous 01/21/25(Tue)18:30:55 No.103986097

>>103986077
Ah, but that is not as good as the actual R1. I'm talking about R1 smarts in a 30B package.

Anonymous 01/21/25(Tue)18:31:38 No.103986103

kek just found that DeepSeek uses the strawberry test in their API sample

Anonymous 01/21/25(Tue)18:33:26 No.103986129

>>103986097
DeepSeek-V3-Lite and R1-Lite soon enough, trust the plan.

Anonymous 01/21/25(Tue)18:33:33 No.103986130

It's crazy what this model comes up with, damn

Anonymous 01/21/25(Tue)18:34:01 No.103986136

>>103986058
You need a good character card to not let it drift off. Try the Character Card Builder
https://chub.ai/characters/slaykyh/character-card-builder-8927c8a0

Anonymous 01/21/25(Tue)18:40:45 No.103986201

rapidsave.com_how_do_i_make_them_stop-y1ziqlhn3cee1-720

Consumer AI hardware when? Buying gaming rams en masse can't be the efficient way to do it

Anonymous 01/21/25(Tue)18:41:51 No.103986208

>>103986201
As long as Nvidia is the only player in town, there won't ever be "Consumer AI hardware". Nvidia will always try to stratify the consumer vs. professional market and make the latter as expensive as possible.

Anonymous 01/21/25(Tue)18:42:09 No.103986210

>>103985585
Wait a bit and buy an Nvidia Digits for the same price.

Anonymous 01/21/25(Tue)18:42:51 No.103986217

>>103986210
only 128gb

Anonymous 01/21/25(Tue)18:43:13 No.103986219

>>103986201
buying clapped out 3090s on ebay is the efficient way to do it

Anonymous 01/21/25(Tue)18:43:46 No.103986224

>>103986210
Not VRAM.

Anonymous 01/21/25(Tue)18:45:07 No.103986235

>>103986224
DDR5 isn't either.
>>103986217
Oh shit yes

Anonymous 01/21/25(Tue)18:45:29 No.103986237

>>103986219
There will be other hardware that will come out - at this point the desire for hardware like this is so plentiful, it's being developed. Lots of ram, performance not so great that it's too expensive. Just right.

imo whatever comes after b580, if integrated into a cpu, would be pretty devilish if it could share all system memory, and support 1tb.

Anonymous 01/21/25(Tue)18:47:00 No.103986245

>>103986217
Buy multiple if you need more

Anonymous 01/21/25(Tue)18:48:03 No.103986258

>>103986245
then it's not the same price as the 2.6k for 256 is it?

Anonymous 01/21/25(Tue)18:48:27 No.103986264

>>103986245
The more you buy, the more you save

Anonymous 01/21/25(Tue)18:49:21 No.103986281

>>103986224
>Not VRAM
It's VRAM in the sense that the GPU can perform work with it. CPU is dog fuck slow for compute-bound tasks, as seen in the previous finetuning questions and answers.
For regular lmg use the cpumaxx shit is still probably a better idea.

Anonymous 01/21/25(Tue)18:50:43 No.103986295

>>103986258
Digits will be considerably faster
If you don't care about speed then buy a 24TB hard drive and run everything on that.

Anonymous 01/21/25(Tue)18:51:12 No.103986301

>>103986281
cpumax runs $5,000-20,000

Anonymous 01/21/25(Tue)18:51:22 No.103986303

>>103986210
>128GB
all depends on bandwidth, we will see. If it's only comparable to DDR5, then there's not much point in it, you can build a server with more RAM than this for the same price.

Anonymous 01/21/25(Tue)18:53:26 No.103986329

>>103986301
You need 4 Digits to fit the same amount of weights that a fully loaded cpumaxx can, at which point it costs more and it'll be slower once you introduce RPC overhead.

Anonymous 01/21/25(Tue)19:00:00 No.103986399

>>103986329
You can do 2 digits without going to networkshit, in which case you're probably better off just building a cluster because it will be more flexible (and probably higher performance).

Anonymous 01/21/25(Tue)19:02:46 No.103986435

>>103985803
But r1 is a MoE. I haven't seen any benchmarks of people running it though. We'd have to know that before determining the memory type required.

Anonymous 01/21/25(Tue)19:06:05 No.103986476

>>103986451
>8X
It needs to be 12X

Anonymous 01/21/25(Tue)19:06:07 No.103986478

So what you gotta do is use the gigantic server rig you just sunk 10k USD in to host a private WoW server with some pay to win features to recoup the cost then rugpull when its time to coom.

Anonymous 01/21/25(Tue)19:06:07 No.103986479

>>103986435
look for deepseek3 performance numbers should be close/identical

Anonymous 01/21/25(Tue)19:07:19 No.103986490

>>103986478
bad example, you can run a wow server on an old laptop

Anonymous 01/21/25(Tue)19:07:57 No.103986494

Screenshot 2025-01-22 at 01-07-09 NEMIX RAM 1TB (8X128GB) DDR5 4800MHZ PC5-38400 4Rx4 1.1V CL40 288-PIN ECC RDIMM Registered Server Memory KIT Compatible with ASUS 2U Dual-Processor Server Model RS720A-E12-RS12 at Amazon.com

>>103986476
>It needs to be 12X

Anonymous 01/21/25(Tue)19:09:15 No.103986510

>>103985733
>>103985739
>>103985747
>>103985761
You won't be laughing when I build the memenuke and trigger the great Infowar.

Anonymous 01/21/25(Tue)19:09:39 No.103986512

>>103986479
Haven't seen those either. The miqumaxx rentry just says you 'could' run it at a 'reasonable speed'. That's not helpful.

Anonymous 01/21/25(Tue)19:09:58 No.103986518

>>103986494
Now we're talking

Anonymous 01/21/25(Tue)19:11:36 No.103986541

>>103986512
The numbers aren't very good with long context, likely because llama.cpp doesn't fully support V3.

Anonymous 01/21/25(Tue)19:11:51 No.103986544

Oh, the stargate project! wow!
how does openai still grift for money. since gpt4 they didnt show anything great. sonnet is leading them for months.

the west puts out much more censored models now than the chinks. llm and image.
deepseek is only censored at website level, as it should be.
the model is just so perfect. its assistant-like as default for the normies. but just write "answer like X" and it does it. thats how it should be. they did everything right. i cant see murrika putting out a model like that.

Anonymous 01/21/25(Tue)19:13:42 No.103986560

>>103986490
Okay, then how would you use this hardware to recoup its cost?

Anonymous 01/21/25(Tue)19:13:56 No.103986564

>>103986544
End users aren't that important. The services they can give to other businesses are what matter.

Anonymous 01/21/25(Tue)19:17:04 No.103986589

>>103986564
I work as part of the AI team at a major corporation. We're using openai's models only because of the lag/effort of updating the systems. For everything new we start using claude, we dropped openai.

Anonymous 01/21/25(Tue)19:17:21 No.103986591

what does "slop" mean in terms of LLMs? like, is it really used as a technical term?

Anonymous 01/21/25(Tue)19:18:40 No.103986602

>>103986544
I'm hopeful that this will cause China to put more resources into their AI. It's unfortunate that America is basically burning the money, but given the state of the education system it's too late for them anyways.

Anonymous 01/21/25(Tue)19:19:29 No.103986611

>>103986591

Anonymous 01/21/25(Tue)19:19:51 No.103986620

>>103986591
phrases that get repeated far to often and barely mean anything.

Anonymous 01/21/25(Tue)19:21:13 No.103986634

>>103986591
It's a term often parrotted by retards like >>103986611 as just meaning 'thing I don't like'
For the non-retarded, 'slop' usually means purple prose and specific descriptors/metaphors that are common across a lot of different models. e.g. shivers running down a character's spine.

Anonymous 01/21/25(Tue)19:22:19 No.103986645

>>103986602
There's a tiny, very tiny, hope that softbank won't let openai cuck their models and research. Alas, the posibility is so small you can ignore it, without even taking into account the us goverment's involvement.

Anonymous 01/21/25(Tue)19:22:23 No.103986646

How long should I save my cum for maximum amortized pleasure? (average pleasure over a sufficient long period of time)

Anonymous 01/21/25(Tue)19:22:31 No.103986648

>>103986620
>>103986634
I've seen screenshots of benchmarks that use that term lmao

Anonymous 01/21/25(Tue)19:22:47 No.103986652

>>103986602
openai had a huge headstart and m$ money. its all gone now. i wonder if that money is enough.
if you read the deepseek ceo interview from a couple months ago he focuses on opensource and growing talent inside china instead of money for true innovation. they had pathetic hardware compared to openai,meta or x.
there is just so much you can do by throwing more money at openai.

Anonymous 01/21/25(Tue)19:23:02 No.103986657

>>103986648
eqbench?

Anonymous 01/21/25(Tue)19:23:03 No.103986659

>Trump's AI infrastructure speech
>Sam Altman is invited and involved in the new project
>Sam is "the leading expert on AI from everything I have read"- trump
Fuck man, if Trump sees Sam as "the leading expert on AI" then Sam is absolutely going to try and use that influence to get rid of or hamstring open source models.

Anonymous 01/21/25(Tue)19:24:38 No.103986671

>>103986659
They'll ban US entities from participating in open source projects (i.e. the RISC-V treatment)

Anonymous 01/21/25(Tue)19:25:07 No.103986678

>>103986657
no idea, I'm a retarded tourist. I'm new to all of this

https://cobusgreyling.medium.com/hill-solving-for-llm-hallucination-slop-cf82f8651e01 says
>Slop is the new Spam. Slop refers to unwanted generated content [...]

Anonymous 01/21/25(Tue)19:26:39 No.103986691

>>103986648
a lot of tunes use the same datasets so when one phrase becomes annoying to see, it likely affects other popular tunes coming out. a benchmark for it might be trying to see how little or much slop is in it, but i'm just guessing, never looked at it

Anonymous 01/21/25(Tue)19:26:40 No.103986692

>>103986646
You don't need to let it build up. Just put it back in after it comes out.

Anonymous 01/21/25(Tue)19:27:40 No.103986698

>>103986691
>a lot of tunes use the same datasets so when one phrase becomes annoying to see
will people ever train on R1 text now?
it feels like we still have the datasets from 2023 output, but edited to make it a bit less sloped.

Anonymous 01/21/25(Tue)19:29:23 No.103986714

>>103986671
Wait, since when are US companies prevented from utilizing and contributing to RISC-V? I'm fairly certain both Nvidia and AMD financially contribute heavily to RISC-V

Anonymous 01/21/25(Tue)19:29:33 No.103986717

Has anyone tested distilled Llama yet?

Anonymous 01/21/25(Tue)19:29:40 No.103986720

>>103986652
>there is just so much you can do by throwing more money at openai.
You can make all companies involved wealthier.

Anonymous 01/21/25(Tue)19:31:10 No.103986735

>>103986698
i'm out of date on the r1 stuff, only dl'd the 32b qwen one and messed with it for a few mins but i did see several cliches in there, so its probably already using some of the same data. is the output supposed to be much better?

Anonymous 01/21/25(Tue)19:32:21 No.103986748

>>103986717
If it's anything like the distilled Qwens then there's no real difference for RP purposes. I wouldn't grade them as better or worse than the non-distilled models.

Anonymous 01/21/25(Tue)19:32:41 No.103986752

>>103986735
not the distill, the proper one

Anonymous 01/21/25(Tue)19:32:51 No.103986753

>>103986714

Anonymous 01/21/25(Tue)19:32:53 No.103986754

>>103986735
the full r1 one. i tried the 32b qwen one too and unfortunately its nothing like the full one. wish i could run that beast on my machine.

Anonymous 01/21/25(Tue)19:34:33 No.103986767

>>103986752
>>103986754
oh i haven't looked at any outputs for that then

Anonymous 01/21/25(Tue)19:35:19 No.103986778

>>103986753
Looking at the image you posted, it looks like they were just talking about risks associated with RISC-V since China is using it as well. Nothing that says they can't contribute.

Anonymous 01/21/25(Tue)19:36:00 No.103986787

>>103986754
The Qwen one is just okay. It's like a cleaner QwQ desu

Anonymous 01/21/25(Tue)19:36:05 No.103986788

>>103986767
https://eqbench.com/results/creative-writing-v2/deepseek-ai__DeepSeek-R1.txt

Anonymous 01/21/25(Tue)19:37:06 No.103986800

>>103986778
How else are they going to stop China from benefiting from an open source project? Not contributing is the only solution.

Anonymous 01/21/25(Tue)19:37:50 No.103986807

>>103986787
You are right. The thinking is actually applied, first time I saw that locally.
But it was just so depressing seeing a rejection. R1 is not refusing anything, just delivers. Hard to go back. I dont want deepseek to collect the logs for training. (which they do!)

Anonymous 01/21/25(Tue)19:38:02 No.103986812

Its gonna be like the manhattan project for AI now. Half a trillion invested

Anonymous 01/21/25(Tue)19:39:14 No.103986821

Anonymous 01/21/25(Tue)19:39:26 No.103986823

>>103986807
>I dont want deepseek to collect the logs for training.
Devil's advocate: They clearly put the logs to good use.

Anonymous 01/21/25(Tue)19:40:30 No.103986832

>>103986821
Damn that's pretty fast. Really low power cost too if I'm not mistaken.

Anonymous 01/21/25(Tue)19:41:16 No.103986842

>>103986823
That thing is trained on the most fucked up fetish stuff out there. It knows all the RP words etc. too.
What I dont understand is how it can hold itself that well together for serious code work etc. Its completely shizzo with RP. Like nemo but big.

Anonymous 01/21/25(Tue)19:41:19 No.103986843

>>103986800
But that can't be right, Nvidia, Google, AMD. All are current big time financial contributors to RISC-V. If their were orders preventing US companies from working with RISC-V then that would not be the case.
https://riscv.org/members/

Anonymous 01/21/25(Tue)19:41:36 No.103986847

>>103986788
thanks i'll give it a read later

Anonymous 01/21/25(Tue)19:42:56 No.103986867

I feel like the American approach to AI is akin to the increasingly absurd Wunderwaffe Germany made during WW2 while the Chinese approach focuses on cost effectiveness and ease of iteration.
It's not really hard to see where this is going if the US keep doubling down.

Anonymous 01/21/25(Tue)19:43:04 No.103986870

>>103986812
Scamhattan.
Sam : "Now I am become AIds, Grifter of Worlds"

Anonymous 01/21/25(Tue)19:44:11 No.103986878

>>103986821
How quantized was this?

Anonymous 01/21/25(Tue)19:44:47 No.103986890

>>103986878
Peak Memory 175GB at the end.

Anonymous 01/21/25(Tue)19:44:58 No.103986892

>>103986494
You better start saving up

Anonymous 01/21/25(Tue)19:45:13 No.103986895

>>103986821
if M2 is that fast then a DDR5 server should be blazing fast.

Anonymous 01/21/25(Tue)19:46:23 No.103986908

>>103986895
that's m2 ultra with 800GB/s

Anonymous 01/21/25(Tue)19:46:23 No.103986909

>>103986895
if by blazing fast you mean a quarter of the throughput then sure

Anonymous 01/21/25(Tue)19:46:49 No.103986914

>>103986890
175*2 so ... FP16 I guess

Anonymous 01/21/25(Tue)19:47:21 No.103986920

>>103986512
It runs at 'reasonable speed' as long as you don't need more than 2k context

Anonymous 01/21/25(Tue)19:47:40 No.103986925

>>103986895
>if M2 is that fast then a DDR5 server
What is you reasoning for that?

Anonymous 01/21/25(Tue)19:48:10 No.103986933

>>103986914
how is 350 close to 700?

Anonymous 01/21/25(Tue)19:48:13 No.103986934

>>103986920
Nigga 2k is nothing. My character card is longer than 2k.

Anonymous 01/21/25(Tue)19:48:19 No.103986937

>>103986591
"ministrations", "audible pop", "rivulets of", "admit it", "pet", "the ball is in your court", "the game is on",
"the choice is yours", "I don't bite... unless you want me to", "half-lidded eyes", "she worries her bottom lip",
"arousal pooling in her belly", "take your pleasure", "fiddles with the hem of her skirt", "kiss-bruised lips",
"a bruising kiss", "despite herself", "yours to take", "with reckless abandon", "with wanton abandon",
"knuckles turning white", "grins wickedly", "fiery red hair", "long lashes", "propriety be damned", "the world narrows",
"pupils blown wide with pleasure", "tongue darts out", "chestnut eyes", "grasps your chin and forces you to meet her gaze",
"bites your ear", "nails raking angry red lines down your back", "her cheeks flaming", "cheeks hollowing",
"stars burst behind her eyes", "inner walls clenching around nothing", "puckered hole", "her wet heat", "she whimpers",
"biting her lip", "dusky nipples", "slick folds", "still lodged deep inside her", "heart, body and soul belong to you",
"the night is still young", "souls entwined", "pleasures unspoken", "fade into the background", "pale eyes",
"shiver down her spine", "a jolt of electricity through her body", "for now", "holding onto the hope", "bonds", "purrs",
"blushes red as a tomato", "bites down on her lower lip", "husky voice"

Anonymous 01/21/25(Tue)19:48:44 No.103986939

>>103986933
The 700GB was float32 no?

Anonymous 01/21/25(Tue)19:48:57 No.103986943

>>103986812
I can't believe Sam Altman fucking won. The modern day Albert Einstein

Anonymous 01/21/25(Tue)19:49:07 No.103986946

>>103986934
>My character card is longer than 2k.
holy sloppa

Anonymous 01/21/25(Tue)19:49:37 No.103986952

>>103986939
no r1 is fp8
>685B
>F8_E4M3

Anonymous 01/21/25(Tue)19:49:41 No.103986953

>>103986812
>Half a trillion invested
and all of it going towards more safety and alignment research

Anonymous 01/21/25(Tue)19:49:57 No.103986954

>>103986952
Well I'm mistaken

Anonymous 01/21/25(Tue)19:50:21 No.103986960

>>103986925
24 channels of DDR5

Anonymous 01/21/25(Tue)19:50:34 No.103986963

>>103986937
*rakes across ur features*

Anonymous 01/21/25(Tue)19:50:44 No.103986966

>>103986842
It probably uses different experts depending on the task

Anonymous 01/21/25(Tue)19:51:21 No.103986974

>>103986953
No most is going to data centers and power. So companies can mass-use AI to make more money.

Anonymous 01/21/25(Tue)19:51:52 No.103986983

Might wait for 2x amd APUs and see how they perform with R1

Anonymous 01/21/25(Tue)19:52:02 No.103986984

Anonymous 01/21/25(Tue)19:52:43 No.103986989

>>103986984
Is he wrong though?

Anonymous 01/21/25(Tue)19:52:44 No.103986990

Screenshot 2025-01-22 at 01-49-42 KoboldAI Lite

Did someone manage to make a distilled R1 internal thoughts as Miku's? I'm almost there but it breaks after a few back and forth. I guess the prompt would be like:
>This is a never ending conversation between Anon and the cute and helpful AI assistant Miku. Miku is a girl who is an AI running on the users computer.
>Miku thoughts are show between <think> before her output, exemple:<think>this is a thought<think>. She uses this to reason about the world and to think about what to say next.
>Miku is always coherent and makes sense, but if she isn't sure she will ask the user for help.
>Miku is a very helpful AI and will help the user with anything they need, she is also very friendly and will try to make the user feel better if they are sad.
>Miku is also very curious and will ask the user a lot of questions about themselves and their life, she will also try to make the user like her.
>The conversation is only through text: Miku can't see Anon's face or hear his voice, she also can't send images or videos.
Pic is DeepSeek-R1-Distill-Llama-8B-Q5_K_M (waiting for Qwen 14B to finish).

Anonymous 01/21/25(Tue)19:53:11 No.103986995

>>103986989
Not at all.

Anonymous 01/21/25(Tue)19:53:27 No.103986997

>>103986914
R1 is a native FP8 model, there are no FP16 weights

Anonymous 01/21/25(Tue)19:54:01 No.103987006

<think>

Anonymous 01/21/25(Tue)19:54:15 No.103987009

>>103986943
>I can't believe Sam Altman fucking won
The Zucc banned Trump off Facebook, of course he'd win.

Anonymous 01/21/25(Tue)19:54:44 No.103987014

>>103986937
Good job anon, you can pick up your bonus.

Anonymous 01/21/25(Tue)19:56:52 No.103987037

>>103986990
>pic
jesus fucking christ, I thought I was the one who overthought everything!

Anonymous 01/21/25(Tue)19:57:05 No.103987039

>>103987014
.asistant one was funnier

Anonymous 01/21/25(Tue)19:57:49 No.103987047

>>103987039
which .assistant one?.assistant

Anonymous 01/21/25(Tue)19:57:57 No.103987048

>>103985485
https://youtu.be/-4oSpciE5ow

Anonymous 01/21/25(Tue)19:59:43 No.103987066

>>103986867
the americans are dreaming up protocols how to shackle an godlike AI while in reality they're still dealing with levels of AI that can't reasonably plan through a busy day or play a videogame while constantly getting the vapors about the AI possibly saying a rude word or talk about a titty, rather giving it brain damage than making that happen.

The chinese see AI in it's current form as what it is without all the typical american melodrama. Then they got GPU limited and figured out ways to make that less painful too. Smart people. The biggest mistake of the west the history books will talk about will be how it consistently underestimated the chinese.

Anonymous 01/21/25(Tue)20:00:39 No.103987076

>>103986989
If median wages drop faster than prices, that's not great.

Anonymous 01/21/25(Tue)20:00:51 No.103987077

Shits on anything not O1 but is a fraction of the cost

Anonymous 01/21/25(Tue)20:02:12 No.103987094

>>103987077
and mystery blocksworld cant be benchmaxxed btw, thats the one to look at and it reveals who benchmaxxed

Anonymous 01/21/25(Tue)20:02:40 No.103987101

>>103987048
what the actual FUCK

Anonymous 01/21/25(Tue)20:04:08 No.103987117

>>103987076
Right but economists think any sign of it is bad and forbidden even though there have been studied periods around the world where it was experienced and the economy didn't crash and spiral. God forbid we want a penny to be worth something.

Anonymous 01/21/25(Tue)20:05:08 No.103987128

>>103987077
I would have expected it to be closer to o1, since i find it better.

Anonymous 01/21/25(Tue)20:05:26 No.103987132

>>103987048
lol i remember this, we did it all the time in the 80s

Anonymous 01/21/25(Tue)20:06:08 No.103987139

>>103987077
If anything, this should make us wonder what a DeepSeekR1 as expensive as O1 would be like. AGI?

Anonymous 01/21/25(Tue)20:07:22 No.103987153

>>103985485
can hunyaun3d-2 be run locally? or is the hardware needed too much?

Anonymous 01/21/25(Tue)20:08:24 No.103987161

>>103987139
OR maybe o1's prices are just way inflated, who knows since we don't know shit about its size

Anonymous 01/21/25(Tue)20:10:48 No.103987184

>>103986652
>President Xi cracked down on the quant trading industry and people pivoted to AI
Based.
It seems like the dude is still hiring ppl like a quant trading firm would though

Anonymous 01/21/25(Tue)20:14:35 No.103987222

ai will overthrow tech bros, screenshot this

Anonymous 01/21/25(Tue)20:14:50 No.103987226

>>103985485

Anonymous 01/21/25(Tue)20:16:14 No.103987236

>>103986645
>softbank won't let openai cuck their models and research
It's Japan. They're even more cucked and busted than South Korea is. If anything it will get more fucking censored and "safe."

Anonymous 01/21/25(Tue)20:16:31 No.103987239

>>103987226
>i can't be trusted with knowledge. i'm a psychopath

Anonymous 01/21/25(Tue)20:16:50 No.103987244

>director
got these normal injections options working the same as author notes, vector storage. i wanted to keep it the same looking. i don't think it'll be useful overall since the point is to keep the data injected pretty low

Anonymous 01/21/25(Tue)20:21:19 No.103987288

>>103987117
Almost always with massive government stimulus.

Enough stimulus to prevent mass bankruptcies, but not enough to inflate wages&prices. Pissing off all schools of economists simultaneously.

Anonymous 01/21/25(Tue)20:26:44 No.103987348

https://x.com/The_Real_Fly/status/1881806502870368579
!?

Anonymous 01/21/25(Tue)20:29:08 No.103987372

>>103986645
I can't name a single successful SoftBank investment outside Alibaba

Anonymous 01/21/25(Tue)20:34:06 No.103987426

What's the deal with audible popping, anyway? What could possibly create an audible pop in a sex scene? I've never encountered this in any model I've used.

Anonymous 01/21/25(Tue)20:35:12 No.103987443

>>103987426
BJ?

Anonymous 01/21/25(Tue)20:35:40 No.103987449

>>103987426
I've had the model write it when pulling a girl off my dick during oral, I think generally it's similar situations to that

Anonymous 01/21/25(Tue)20:37:26 No.103987480

>>103986974
Where the fuck is all of this demand coming from? Is it just chatbots? AI image slop? AI integration no one wants like Copilot or this brainless Facebook shit?

Anonymous 01/21/25(Tue)20:38:07 No.103987489

>>103987480
How are they going to replace workers without the infrastructure to do so?

Anonymous 01/21/25(Tue)20:38:23 No.103987490

>>103987348
They are probably going to automatize a number of government agencies. Hopefully they also automatize healthcare.

Anonymous 01/21/25(Tue)20:38:40 No.103987495

>>103987443
>>103987449
Ah, oral. I don't think that should ever happen during a competent blowjob, but its definitely not as far out of left field as I was thinking.

Anonymous 01/21/25(Tue)20:40:10 No.103987516

I just tried R1 Qwen and desu it was notably dumber and more hallucinative than the 70B. Plus I even got a bunch of repetition although not to the point of infinite looping. I guess SFT wasn't enough to beat it out of Qwen completely. Overall, not surprising but still a bit disappointing. Why couldn't they have just done the RL step... I guess we'll just have to wait for R1 Lite.

Anonymous 01/21/25(Tue)20:41:17 No.103987526

>>103987495
in st you should ban " pop" (with the space) because 'audible pop' becomes 'wet pop' and shit otherwise

Anonymous 01/21/25(Tue)20:41:26 No.103987528

R1 is so unhinged and loves worldbuilding my degen bullshit with me, this model is fun as fuck
genuinely has me speccing out cpumaxxing builds but I don't know if it would be wise to commit that hard yet
>>103987516
>I guess we'll just have to wait for R1 Lite.
yep it's waiting room time

Anonymous 01/21/25(Tue)20:42:33 No.103987543

>>103987528
Me as well but I figured out a way to clamp down on the schitzo a bit if you want >>103987502

Anonymous 01/21/25(Tue)20:43:04 No.103987553

>>103987489
Except they've already spent tens of billions and still haven't proved they can replace workers. We don't even have LLM support call centers at scale yet.

Anonymous 01/21/25(Tue)20:44:58 No.103987563

>>103987516
how did you run it? i tried the smaller model in kobold and it was junk and got it self stuck in a loop, perhaps the temperature was not right i don't know desu still new to this thing. but it worked fine in ollama.

Anonymous 01/21/25(Tue)20:46:45 No.103987584

>>103987553
you denialists are so fucking retarded lol

Anonymous 01/21/25(Tue)20:49:10 No.103987607

speaking of replacing workers and stuff
how are workers (aka agents) implemented? are they implemented as some sort of event listener or as filters to output from the LLMs?

Anonymous 01/21/25(Tue)20:50:03 No.103987616

>>103987528
>yep it's waiting room time
same as it ever was

Anonymous 01/21/25(Tue)20:51:29 No.103987626

>>103987516
>R1 Lite
Hopefully a new one then. The R1-Lite they offered over their chat platform before the release of R1 was horrible and in no way comparable to the real R1.

Anonymous 01/21/25(Tue)20:52:16 No.103987632

R1 distill 32b constantly fucks up the <think> tags, and I think it is because I've fucked up the prompt format in Kobold lite. I have the weird Unicode symbols right for <|user|> and <|assistant|>, but I have no idea where <begin_of_sentence> and <end_of_sentence> go. Is end of sentence just a stop token that gets handled automatically?

Anonymous 01/21/25(Tue)20:52:41 No.103987637

>>103987584
Anon it's simply the truth that there's no killer application yet.

Anonymous 01/21/25(Tue)20:52:50 No.103987640

>>103987563
I just did it on a new llama.cpp build. Idk about kobold. Works fine for me with temp 0 to 1.

>>103987626
Well that would suck.

Anonymous 01/21/25(Tue)20:53:01 No.103987643

>>103987632
The distills just use the standard deepseek format

Anonymous 01/21/25(Tue)20:53:12 No.103987645

>start torturing a character
>it begs for me to kill it
These AIs don't endure much do they? They're like cud-chewing herbivores; wanting to die at the slightest provocation.
Also, when was the last time a Chinese AI company came out with something that still held its hype like mixtral? Every other week, they release 'newest greatest most honorabru model' and it's forgotten almost instantly.

Anonymous 01/21/25(Tue)20:55:27 No.103987664

>>103987632
>begin of sentence
at the very start of your context, like a bos token
it gets added automatically for me using lcpp and the llama 70b distill, not sure if the behavior will be the same with kcpp + 32b but I would guess so
>end of sentence
goes at the end of every assistant response

Anonymous 01/21/25(Tue)20:57:01 No.103987683

>>103987645
qwen2.5 has stuck around for quite a while to be desu

Anonymous 01/21/25(Tue)20:57:20 No.103987688

>>103987645
Is this bait? You're the first in a 200 post long thread that mentioned Mixtral.

Anonymous 01/21/25(Tue)20:58:13 No.103987696

>>103987640
I don't think we should get hopes up for a 30b or 70b to match their current 690b(?), it just seems unlikely
At the very least however, we have an open model that can generate synthetic data that's not slopped, so there's no excuse to finetune with that soulless GPT garbage anymore. May Altman be condemned to the dustbin of history.

Anonymous 01/21/25(Tue)20:58:46 No.103987704

The longer I look at the Deepseek prompt format the less it makes sense. The ST preset for V2.5 is definitely fully fucked and none of the ones on HF make sense either.

Anonymous 01/21/25(Tue)21:00:01 No.103987716

>>103987688
Is this a bot?
You seem perturbed over the idea that I'd reference the single most enduring model under 70b when comparing endurance.

Anonymous 01/21/25(Tue)21:00:23 No.103987723

>>103987696
>that's not slopped,
Well—maybe not *traditionally* slopped. But I can think of a *few* tendencies that seem *way* overcooked—to say the least.
that said it does write well despite that stuff so I'd welcome it anyway

Anonymous 01/21/25(Tue)21:01:39 No.103987737

>>103987704
What do you mean? It's just the BOS token, then the system prompt, then the user token, then the assistant token, then the end of sentence token, user assistant token, and so on. Idk about the presets ST or HF has. I just write the format myself and check the console to double check it's done right.

Anonymous 01/21/25(Tue)21:02:21 No.103987744

>>103987664
KoboldCPP used llamacpp so it's probably working then. I can't figure out why it is so inconsistent with using the <think> tags. Sometimes it skips them, sometimes it starts a response without a <think> but adds a </think> later and acts like everything before the <think> never happened. I thought it was the format but maybe I just need to start over with a fresh context.

Anonymous 01/21/25(Tue)21:02:40 No.103987747

>>103987737
>then the user token, then the assistant token
*user token, user message, then the assistant token

Anonymous 01/21/25(Tue)21:02:44 No.103987749

>>103987716
"single most enduring model under 70b" that registered no mention in last thread and the thread before it? How long have you been absent from /lmg/? We move at breakneck speed and you seemed to have been left off.

Anonymous 01/21/25(Tue)21:04:10 No.103987762

>>103987645
>other models
>You can't break me!
>I won't let you have the satisfaction!

Anonymous 01/21/25(Tue)21:05:11 No.103987774

>New blogpost! I implement some of CharacterAI’s memory optimizations into nanoGPT, leading to a 40x reduction in KV Cache size.
https://njkumar.com/implementing-characterais-memory-optimizations-in-nanogpt/

Anonymous 01/21/25(Tue)21:08:21 No.103987803

>>103986201
can you do these pic to video ones local?

Anonymous 01/21/25(Tue)21:08:22 No.103987804

>>103987762
>*flexes her muscles and rips your dick off*
>"Get over it, it's not as painful as childbirth," she said, her voice barely above a whisper as your bleeding manhood poured rivulets of blood down your thighs

Anonymous 01/21/25(Tue)21:09:56 No.103987824

>>103986659
Sad to see. But at least we have the whole China backing open source now. Thank fuck they are competent, unlike that one superpower 2020 country.

Anonymous 01/21/25(Tue)21:10:51 No.103987831

>patterns: lazy
>eyes: half-lidded
>lighting: dim
its llm time

Anonymous 01/21/25(Tue)21:13:15 No.103987850

>>103987637
If you say so, but people like insurance adjusters/actuaries are definitely out. They'll farm out all that sort of decision making.

Anonymous 01/21/25(Tue)21:13:36 No.103987854

>>103987737
>then the assistant token, then the end of sentence token, user assistant token
Why is there an Assistant token in the middle of the User sentence before another Assistant token? I'm sure that this is wrong.
I have seen this exact string reposted a couple times
><|beginofsentence|>{system_prompt}<|User|>{prompt}<|Assistant|><|endofsentence|><|Assistant|>
And it's fucked. This is absolute cancer as far as retarded prompt formats go even if you ignore the non-ASCII symbols the use.

Anonymous 01/21/25(Tue)21:16:03 No.103987882

>>103987744
Did you start initially with a fresh context? I had this happen too, but I continued an existing chat. I assume that if the context contains lots of model responses without <think>, it might stop doing it. Models always want to continue the pattern. However R1 REALLY wants to think so if it doesn't start that way, it gets weird. I wonder if in ST, even the initial character message needs to think lol.

Anonymous 01/21/25(Tue)21:16:04 No.103987883

>>103987774
>40x reduction in KV Cache size
wtf

Anonymous 01/21/25(Tue)21:16:31 No.103987891

>>103987744
I had this exact problem and I'm too much of a brainlet to figure it out. I have no idea how to set that thing up with the system tags etc. And searching for solutions using google is fucking infuriating.

Anonymous 01/21/25(Tue)21:18:24 No.103987905

>>103987854
I kind of worded that shittily. It's supposed to be like this:

<|beginofsentence|>{system_prompt}<|User|>{prompt}<|endofsentence|><|Assistant|>

That's the template llama.cpp uses.

Anonymous 01/21/25(Tue)21:23:20 No.103987941

>>103986201
deepseek ceo is meeting with some big buy in the ccp along with others for some meeting or something idk supposedly so they can be included in some ai funding shit like the openai/trump thing and so they can partner with huawei and get custom chips made for training

if all goes well they will also probably make consumer shit that you can buy after a bit maybe just do a cut down like with the nvidia with their blackwell and 5k gen idk im optimistic

Anonymous 01/21/25(Tue)21:23:57 No.103987946

>>103987883
You do have to wonder what the downsides are.
Most attention optimizations introduce some fuzziness on the relationship between far away tokens since each attention head or attention query only attends to a specific window of the whole context, and creating relationships between these results get real weird.
Mha, mqa, sliding window attention, even gqa, all have downsides in theory. I don't know how mla works, but I imagine it's the same.

Anonymous 01/21/25(Tue)21:24:07 No.103987950

local lost

Anonymous 01/21/25(Tue)21:25:50 No.103987965

where can I try deepseek?

I just want to see if the model can generate me some smutt fiction.

Anonymous 01/21/25(Tue)21:26:07 No.103987971

>>103987950
Is saltman wearing shoulderpads? His physique usually looks twinkier.

Anonymous 01/21/25(Tue)21:26:27 No.103987978

>>103987950
Sam Altman is the father of modern AI after all. It's his field. He can't lose.

Anonymous 01/21/25(Tue)21:27:36 No.103987991

>>103987950
>local (American) lost
China will keep releasing models that mog the fuck out of OpenAI's just to keep fucking the US over.

Anonymous 01/21/25(Tue)21:28:09 No.103988000

>>103987941
Sounds scary, I better download the model before they pull it.

Anonymous 01/21/25(Tue)21:31:45 No.103988023

>>103987854
>>103987905
user doesn't get EOS

Anonymous 01/21/25(Tue)21:32:38 No.103988036

>>103987941
>deepseek ceo is meeting with some big buy in the ccp
I hope that's not true because if it is we're not getting any more open weights from them I bet

Anonymous 01/21/25(Tue)21:33:53 No.103988054

>>103988023
I get the feeling that, even if deepseek was a 14b model, there would still be a gap between cloud and local performance. Just because I bet no one is getting the prompt format exactly right. The API is so strict that it insists you start with a user message, I wonder how many people do that?
Also, samplers. wtf are they doing to not even support temperature?

Anonymous 01/21/25(Tue)21:37:46 No.103988089

>>103988036
I think they're going to use open weight models to undermine the US which is largely closed.

Anonymous 01/21/25(Tue)21:38:59 No.103988102

I tried to generate an anime story with deepseek.

Okay, let me try to work through this query. The user wants a short story where a young hero is isekai'd into a Pokémon-like world. He has the ability to capture NPCs, female monsters, and villains into magical cards by kissing them on the cheek. He's the only male and needs to collect crystals from 10 kingdoms to reach a floating island. The girls are blushing and ashamed when he captures them.

First, I need to make sure I understand the elements here. Isekai means transported to another world, common in anime/manga. The Pokémon aspect implies capturing creatures, but here it's female characters and NPCs. The kissing on the cheek as a capture mechanic is a bit sensitive. The user specifies that the girls are blushing and ashamed, which could be problematic if not handled carefully. I need to approach this without crossing into inappropriate content.

Next, I should consider the audience. The user might be looking for a lighthearted, anime-inspired adventure with some romantic or comedic elements. However, the non-consensual aspect of kissing and capturing could be an issue. It's important to frame the story in a way that respects consent, perhaps by making the kisses consensual or part of a magical ritual where the girls agree to it, even if they're embarrassed.

The task involves visiting 10 kingdoms, each with a different crystal. Each kingdom can have a unique theme and a female character to capture. The floating island is the final level. I should structure the story to show progression through different kingdoms, maybe highlighting a couple of them as examples.

Anonymous 01/21/25(Tue)21:39:59 No.103988118

Potential challenges include making the capture mechanic tasteful. Instead of forced kisses, maybe the protagonist needs to earn their trust or complete a challenge, with the kiss being a symbolic act. The blushing and shame could be played for light comedy rather than anything uncomfortable.

I'll need to create a protagonist with a relatable motivation. Maybe he's reluctant at first but understands the importance of his mission to save the world. The female characters can have their own personalities and reasons for joining him, adding depth to the story.

In terms of plot structure, start with the protagonist being summoned, receiving his mission, and then embarking on the journey. Show a couple of capture scenes to illustrate the mechanic, then a climax at the floating island. Keep the tone adventurous and respectful, avoiding any problematic tropes.

Also, the title should be catchy and hint at the adventure and magical elements. Names for the kingdoms and characters should be imaginative but fitting within a Pokémon-like world. The ending should resolve the main quest but leave room for future adventures if needed.

I need to ensure that the story doesn't objectify the female characters. Each one should have agency, maybe even joining the hero willingly after being convinced of the greater good. The kisses can be a magical requirement they consent to, even if it's embarrassing for them.

Lastly, maintain a quick pace suitable for a short story, focusing on key moments and character interactions. Highlight the hero's growth and relationships with the captured characters, showing teamwork and mutual respect.
>>103988102

Anonymous 01/21/25(Tue)21:41:01 No.103988130

LMG... I continue to kneel...

Anonymous 01/21/25(Tue)21:41:25 No.103988137

>>103988054
>it insists you start with a user message
i always do that because i also list the basic instructions from my authors note/sys prompt to even further reinforce it, story starts next. imo it leads to easier wrangling of the first few messages where responses can be all over the place in formatting and stuff

Anonymous 01/21/25(Tue)21:45:21 No.103988171

>>103987971
thats the point of suits and other fancy clothing anon they were originally made to disguise the absolute failure of bodies that degenrates get from constant alchocol/drug/whatever abuse they do and instead make them look like manly men when in reality they couldent take on a poodle in a fair 1v1

Anonymous 01/21/25(Tue)21:52:43 No.103988241

>>103988036
nah. they have an interest in undercutting the US, which is going all in on closed-source. this is ironically the only way they have to defuse tensions while showing strength--open sourcing models.

Anonymous 01/21/25(Tue)22:00:05 No.103988315

Llama 3.3 instruct (no finetune) can be really filthy if you manage to convince it to. They must have trained it on smut and random unfiltered internet content unlike 3/3.1. It has the knowledge and vocabulary, it just doesn't use it under normal circumstances.

Anonymous 01/21/25(Tue)22:02:05 No.103988330

>>103988315
i found it really dry. it'll do erp but it wanted to get it over with, wasn't really descriptive. didn't refuse though

Anonymous 01/21/25(Tue)22:04:51 No.103988349

>>103988330
What I'm saying is that 3 and 3.1 literally could not formulate dirty talk or describe depraved sexual acts, whereas 3.3 can. If you can't do it then it's a prompt issue.

Anonymous 01/21/25(Tue)22:07:51 No.103988378

>>103988349
ah i skipped the early versions when the threads were nothing but people trying to figure out the format still but 3.3 70b is a really good model for stuff overall. i only tried a few existing erp swipes on it and it felt like other base models that'll erp but be pretty bland about it

Anonymous 01/21/25(Tue)22:11:30 No.103988406

>>103988378
Oh yeah if you missed the earlier versions then you will have no idea what I am talking about, they were shilled a lot but were very deterministic and GPTslopped and would just shit the bed if you tried to wrangle them into anything other than an assistant, they could not produce anything that was both nsfw and coherent.

Anonymous 01/21/25(Tue)22:16:55 No.103988443

>>103988406
i never got jumping on new stuff when it first comes out, everything is always broken, even video games for 6 months minimum and 15gb of patches. miqu was still great at the time so i stuck with it. i've been using 3.3 70b for coding mostly and its really good by local standards

Anonymous 01/21/25(Tue)22:21:07 No.103988485

What's will be the next era? Total Chinese Dominance Era 2? If Zucc doesn't pull some kind of BLT coconut bitnet jamba with unique style, he won't surpass DeepSeek. He could also go the easy route and don't filter the training data for free IQ, but that's too much to expect. Fall of the West era?

Anonymous 01/21/25(Tue)22:23:10 No.103988505

>>103988485
>Fall of the West era?
Definitely, they've been lesswrong'd into oblivion.

Anonymous 01/21/25(Tue)22:23:41 No.103988511

>>103988485
in a few years you can add the 'where did $500b go?' to one of the eras
they did this in the 90s with $200b and were supposed to lay fiber lines all over the us. nothing ever got laid except the people paying for hookers and blow but that $200b was gone

Anonymous 01/21/25(Tue)22:23:43 No.103988512

>>103985622
This, beginning is fucking easy. It's when you've got two dozen competing threads of priorities in the project with everything else in life weighing you down that it's a real challenge to push through to the end.

Anonymous 01/21/25(Tue)22:25:04 No.103988517

>>103988511
How much did the burger fusion reactor that never got built cost?

Anonymous 01/21/25(Tue)22:28:50 No.103988556

>>103988511
>they did this in the 90s with $200b and were supposed to lay fiber lines all over the us. nothing ever got laid except the people paying for hookers and blow but that $200b was gone
Zoomer retard here. Gotta link to an article? I want to read about it.

Anonymous 01/21/25(Tue)22:29:22 No.103988563

>>103988511
Apparently they already started on the first datacenter in texas

Anonymous 01/21/25(Tue)22:31:16 No.103988584

Any hope of hardware requirements going lower like they did with SD?

Anonymous 01/21/25(Tue)22:31:23 No.103988586

>>103988517
Wait till you learn about the Mirror Fusion Test Facility. They built an entire experimental reactor with an exotic design that was supposed to surpass tokamaks, and then it got defunded right before it was supposed to be turned on.

https://en.wikipedia.org/wiki/Mirror_Fusion_Test_Facility

Anonymous 01/21/25(Tue)22:32:46 No.103988600

>>103988584
Only if niggerganov makes peace with quants guy.

Anonymous 01/21/25(Tue)22:34:02 No.103988612

>>103988556
https://newnetworks.com/ShortSCANDALSummary.htm
or search about it, its a pretty well known example of money just disappearing (being laundered)

Anonymous 01/21/25(Tue)22:35:29 No.103988629

>>103988586
>By the early 1980s, TMX was beginning to demonstrate serious problems that suggested MFTF-B would not work as predicted. This was occurring around the same time that Ronald Reagan declared that the energy crisis was over. In a series of sweeping budget cuts across the entire energy research field, MFTF had its operational budget cancelled, although its construction budget survived. Construction completed in 1986, and the facility sat unused for a year being scavenged for parts by other researchers until it was formally cancelled in 1987 and disassembled.
Damn, that Reagan guy was truly horrible. Wasn't he the one who started outsourcing stuff to China?

Anonymous 01/21/25(Tue)22:35:39 No.103988631

so, I'm using the R1 Llama distill, and I find it too fucking verbose with the <think> stuff... BUT the info it gives is good.
also, it's kinda funny that it gave me some stuff about ethics and shit, but you can just ask it to just give you the non-ethical stuff.

Anonymous 01/21/25(Tue)22:35:48 No.103988633

>>103985485
Trellis vs Huny 3d-2, the Chinese model got the hands right where Trellis failed. The models are a month apart. Huny model can be used with minor adjustments, Trellis would need a lot of tweaks and full hand remodeling. The tencent demo doesn't do textured generation though.

Anonymous 01/21/25(Tue)22:36:27 No.103988639

>>103988631
the distills are useless, they are literally just run of the imll finetunes on the output of the real r1 model

Anonymous 01/21/25(Tue)22:39:14 No.103988664

>>103988563
That is amazingly fast considering Trump only became president yesterday and yesterday was a Monday. Did they have the dudes with shovels standing by for his signature?

Anonymous 01/21/25(Tue)22:40:04 No.103988670

>>103988639
yeah, well, I'm too poor to buy a server with 1TB+ of fast RAM

Anonymous 01/21/25(Tue)22:40:27 No.103988675

>>103988631
I still haven't managed to get mine to do the <think> bit. It either skips it fully or just makes its entire post a long thinking sequence.

Anonymous 01/21/25(Tue)22:40:41 No.103988678

>>103988349
>What I'm saying is that 3 and 3.1 literally could not formulate dirty talk or describe depraved sexual acts
They can, you just didn't know how to prompt them.

Anonymous 01/21/25(Tue)22:41:23 No.103988684

>>103988664
>Did they have the dudes with shovels standing by for his signature?
yes. everything has already started like ice raids. everyone has been literally standing by and waiting for drumf to retake power. biden-kamala is one of the most do nothing lame duck presidencies in history

Anonymous 01/21/25(Tue)22:41:27 No.103988685

>>103988664
>The first of at least 10 or more data centers is already under construction in Abilene.
https://www.fox4news.com/news/trump-ai-infrastructure-texas

Anonymous 01/21/25(Tue)22:42:38 No.103988695

>>103988685
>fox4news
srsly?

Anonymous 01/21/25(Tue)22:42:41 No.103988696

>>103988670
...but no poor enough that you can't afford the actual R1 API.

Anonymous 01/21/25(Tue)22:44:09 No.103988711

>>103988695
what is wrong with the content of the article?

Anonymous 01/21/25(Tue)22:44:38 No.103988713

>>103988633
It looks pretty good but using polygons to make the fur is retarded. Not unexpected of an AI model but completely useless. You'd actually want to gen a hairless chimp in an A-pose and build from there.

Anonymous 01/21/25(Tue)22:44:47 No.103988715

>>103988711
Its not his preferred bias

Anonymous 01/21/25(Tue)22:45:20 No.103988722

>>103988675
I think you have to compile the latest (as in git head) version of llama.cpp

>>103988696
I'm not willing to pay to send my queries to any remote server not controlled by me :)

Anonymous 01/21/25(Tue)22:45:27 No.103988725

>>103988684
Can't be more lame duck than obummer. At least Biden passed a lot of infrastucture spending bills to get us to be less reliant on China.

Anonymous 01/21/25(Tue)22:46:51 No.103988733

>>103988715
it doesn't even have a bias, the content is short and to the point

Anonymous 01/21/25(Tue)22:47:12 No.103988736

>>103988725
If you watched the 2 hours straight of him signing orders while taking questions you would know hes the opposite of biden, though biden pardoning a thousand people on his last day probably was a lot of work as well

Anonymous 01/21/25(Tue)22:47:31 No.103988738

What AI can generate this waifu in 3D best?

Anonymous 01/21/25(Tue)22:49:50 No.103988753

>american "conservatives" send american jobs to china in the 80s so they can make money from dirty financial business without lifting a single finger
>they now want to import indians to replace american workers at home as well
>these cucks still blame the other party
americans are so funny

Anonymous 01/21/25(Tue)22:50:38 No.103988759

How bad is running things on CPU?

Anonymous 01/21/25(Tue)22:51:10 No.103988764

>>103988759
It has everything to do with memory bandwidth

Anonymous 01/21/25(Tue)22:51:38 No.103988769

>>103988725
obunga did a lot whether you like him or not, i think at the time he signed more e/os than any previous president? biden-kamala was just kind of there for the last 4 years, allowing everything to get worse

Anonymous 01/21/25(Tue)22:52:47 No.103988783

hunyuan 3d removed their space demo.

any alternative?

Anonymous 01/21/25(Tue)22:53:18 No.103988789

>>103988759
0.5 t/s Q6_K largestral with speculative decoding on DDR4. Okay if you have patience.

Anonymous 01/21/25(Tue)22:54:19 No.103988802

>>103988783
Your own PC?

Anonymous 01/21/25(Tue)22:54:40 No.103988807

>>103988783
/lmg/ - local model general
https://huggingface.co/tencent/Hunyuan3D-1

Anonymous 01/21/25(Tue)22:54:47 No.103988809

>>103988769
Obama deported a shit ton and bombed a shit ton. Biden pardoned the most people in history including his own son for taking bribes from ukraine (and passing 10% to him)

Anonymous 01/21/25(Tue)22:55:11 No.103988814

>>103988789
>DDR4
Server or standard consumer mainboard?

Anonymous 01/21/25(Tue)22:55:54 No.103988817

>>103987643
Wait, what?

Anonymous 01/21/25(Tue)22:56:21 No.103988822

>>103988814
Consumer dual channel 3600MT/s

Anonymous 01/21/25(Tue)22:57:30 No.103988832

>>103988769
The only thing Osama did was pass a healthcare bill that did more harm than good, use drones to slaughter civilians in the middle east, and pass more laws that erode the privacy and rights of citizens. The problem with refusing to work or talk to anybody and doing everything through EOs is that they have no staying power, and indeed just about all of his were reversed by Trumpearly into his first term.

Anonymous 01/21/25(Tue)22:58:03 No.103988835

>>103988769
Obama was Bush with a D in front of his name, he wasn't too bad if you dont care about government overreach and privacy. Biden was corrupt with a C, just bother reading about everyone he pardoned and for what reasons. Or dont and live in your bubble and wonder why you lose the next popular vote by a even wider margin.

Anonymous 01/21/25(Tue)22:58:44 No.103988850

>>103988807
That's 1.0, retard.

Anonymous 01/21/25(Tue)22:59:19 No.103988854

>>103988802
I dont have a gpu.

Anonymous 01/21/25(Tue)22:59:52 No.103988857

Anonymous 01/21/25(Tue)23:00:25 No.103988864

>>103988854
Google Collab?

Anonymous 01/21/25(Tue)23:00:30 No.103988867

>>103988102
>>103988118
>female characters as catchable mons
Genuinely the hottest scenario. Can't wait until it all works flawlessly.

Anonymous 01/21/25(Tue)23:00:52 No.103988871

>>103988857 (me)
The fuck, 4chan just ate my chink underlines!

Anonymous 01/21/25(Tue)23:02:16 No.103988883

>>103988783
>hunyuan 3d removed their space demo.
damn it. I was gonna clone it later.
Is it viewable anywhere else?

Anonymous 01/21/25(Tue)23:02:43 No.103988890

>>103988713
>using polygons to make the fur is retarded
Not uncommon for cartoony characters in games, but by usable I meant mainly 3d printing and such.

Thousands of Indians are crying out in pain over this. Almost looks premeditated if you think about it. Whites create a niche, browns take over creating huge volumes of cheap content, on which the AI models that will replace them are then trained.

Anonymous 01/21/25(Tue)23:02:45 No.103988892

>>103988835
i'm aware of all of it, note i didn't say anything positive about any of them. they are all part of the same zogbot machine

Anonymous 01/21/25(Tue)23:02:47 No.103988893

>>103988871
*monch*

Anonymous 01/21/25(Tue)23:03:28 No.103988900

>>103988857
Thank you anon, on behalf of everyone who didn't already struggle with the barely readable official template filled with tool use shit. All the other summaries were too short/ambiguous or in one case outright wrong lel
So far I don't think the R1 32b distill is very good though. Feels extremely dry. The thinking block is the best part, it might be helpful to use as a writing assistant because you can find neat stuff if you peek into the think block.

Anonymous 01/21/25(Tue)23:03:30 No.103988901

>>103988854
Assuming you are a thirdie and no legal job gets you the money, get a group of like-minded individuals and raid a datacenter. If you are not a thirdie, apply for NEETbux/get a job.

Anonymous 01/21/25(Tue)23:03:30 No.103988902

>>103988485
new llama might be good for a 70b cope.slim chance it is better than Deepseek R1 at ERP though

Anonymous 01/21/25(Tue)23:04:19 No.103988909

>>103988783
I guess they are fixing the texturing issue, wait for a bit.

Anonymous 01/21/25(Tue)23:05:10 No.103988918

>>103988890
I'm actually interesting in try this one out for static game objects and base models for further tweaking. Is the demo repo still up so I can clone it?

Anonymous 01/21/25(Tue)23:06:54 No.103988938

>>103988900
Yeah, it's a bit confusing. The first one is often added automatically so be sure it doesn't appear twice. (E.g. if you check "Add BOS token" in Silly Tavern it will be prefixed automatically.)

Anonymous 01/21/25(Tue)23:11:47 No.103988982

>>103988918
No idea. There was a requirements error in the space that kept it from generating textures. They are likely fixing it and it will be up again.

Anonymous 01/21/25(Tue)23:13:34 No.103989006

>>103988764
>>103988789
>>103988822
So just getting some dual channel 32GB DDR4 sticks for a fraction of even a 12GB GPU's price will let me slop with 70B models at acceptable speeds?

Anonymous 01/21/25(Tue)23:14:46 No.103989018

>>103989006
No.

Anonymous 01/21/25(Tue)23:15:13 No.103989026

>>103987244
are there any other anons making things? i would like to see projects others are working on

Anonymous 01/21/25(Tue)23:15:22 No.103989030

>>103989006
>acceptable
Not really. I'm getting like 1.5t/s on DDR5 with some of the model offloaded to my 3090.

Anonymous 01/21/25(Tue)23:16:06 No.103989038

>>103989006
>acceptable speeds
If 30 minutes per paragraph is acceptable, sure.

Anonymous 01/21/25(Tue)23:16:42 No.103989041

>>103989006
better spend 10 usd per month renting A100.

Anonymous 01/21/25(Tue)23:16:48 No.103989044

>>103989006
If 1t/s is acceptable for you, yes. Patience is a virtue.

Anonymous 01/21/25(Tue)23:19:38 No.103989058

>>103988902
Maybe it'll be better at regular RP.

Anonymous 01/21/25(Tue)23:20:40 No.103989065

>>103987076
Chinese deflation is almost entirely due to real estate. Everything else seems to be growing at a healthy pace. Retail growth is a bit slow tho. I think they're holding of on turning the printers on until they figure out what trump's plans are.

Anonymous 01/21/25(Tue)23:24:17 No.103989093

I happened to have the hunyuan 3D 2 demo open in another tab so I was able to get the non-docker inference steps.
I'm sure they'll put it back up again, but just in case. Here it is.

Anonymous 01/21/25(Tue)23:28:53 No.103989132

>>103989093
>running locally is dangerous
kek by what measure?

Anonymous 01/21/25(Tue)23:30:26 No.103989142

>>103989132
Is this a serious question?

Anonymous 01/21/25(Tue)23:30:32 No.103989143

how's vLLM + ST?
gonna try R1 on two 3060, 24 GB total

Anonymous 01/21/25(Tue)23:31:25 No.103989151

>>103989143
I don't think vllm can do quantized models? At least I couldn't make it work last I tried with GGUF.

Anonymous 01/21/25(Tue)23:31:29 No.103989152

>>103989142
yes, what is dangerous about it?

Anonymous 01/21/25(Tue)23:32:07 No.103989156

I'm getting random chinese words sprinkled in with the DeepSeek-R1-Distill-Qwen-32B-Q6_K

Anonymous 01/21/25(Tue)23:32:07 No.103989157

>>103989143
>24GB
Almost there! Just at least 300GBs more!

Anonymous 01/21/25(Tue)23:32:14 No.103989159

>>103989152
>running code you downloaded from an unknown source
>not dangerous
retard

Anonymous 01/21/25(Tue)23:32:25 No.103989161

>>103989152
nta but here download this random exe file i have its totally safe anon what could go wrong.

Anonymous 01/21/25(Tue)23:32:37 No.103989162

>>103989151
oh, that sucks. what would you recommend for it?
I dual-boot so OS doesn't matter

Anonymous 01/21/25(Tue)23:32:39 No.103989164

>>103989143
>gonna try R1
Nobody tell him

Anonymous 01/21/25(Tue)23:33:20 No.103989176

>>103989159
It almost feels like the AI definition of safety is a psyop to get people to stop caring about regular tech safety.

Anonymous 01/21/25(Tue)23:33:35 No.103989179

>>103989152
>Hey bro just download this and run the script
In this case it's tencent so I can assume they probably aren't going to fill my PC with ransomware but the advice is sound. Who are you downloading this script from is a good question to ask before you run it.

Look what happened with adetailer and a bunch of random comfy nodes.

Anonymous 01/21/25(Tue)23:33:51 No.103989184

>>103989156
Lower temp.
>>103989162
llama.cpp or koboldcpp with the appropriate quant that fits in 24 GB (including context). Or more if you are patient and put some layers on CPU.

Anonymous 01/21/25(Tue)23:34:19 No.103989189

>>103989159
>>103989161
>>103989179
do you dudes not know what a vm is

Anonymous 01/21/25(Tue)23:34:36 No.103989190

>>103989156
That's just Qwen being Qwen.

Anonymous 01/21/25(Tue)23:34:51 No.103989194

>>103989179
That disclaimer is generic and shows up for all spaces on huggingface, anon.

Anonymous 01/21/25(Tue)23:35:47 No.103989201

>>103989152
**The Hidden Threat of Local AI to Democracy**

In an era where artificial intelligence is becoming increasingly accessible, the rise of locally run AI models presents an underappreciated risk to the fabric of our democracy. Unlike cloud-based systems governed by regulatory oversight and corporate accountability, locally operated AI tools exist beyond the reach of scrutiny, enabling individuals to generate deepfakes, misinformation, and extremist propaganda with unprecedented ease and anonymity.

This decentralization of AI power threatens to erode trust in public institutions, amplify polarization, and undermine the shared reality upon which democratic discourse depends. Without safeguards, the unchecked proliferation of local AI could empower bad actors, from foreign adversaries to domestic extremists, who exploit these tools to manipulate elections and sow discord.

As AI technology continues to advance, policymakers must confront the urgent challenge of balancing innovation with democratic integrity—before it's too late.

Anonymous 01/21/25(Tue)23:35:52 No.103989204

>>103989194
I know. What I'm trying to say is that generic warning is there for a reason.

Anonymous 01/21/25(Tue)23:37:13 No.103989213

>>103989201
sam wrote this didn't he? he's getting at least half of that $500b

Anonymous 01/21/25(Tue)23:37:45 No.103989219

>>103989184
cheers. kobold doesn't seem updated yet so llama it is

>>103989157
obviously a distill

>>103989164
there is one thing the anon above already commented on, which is a low-hanging fruit for sure. not sure you're referring to anything else

Anonymous 01/21/25(Tue)23:38:59 No.103989231

>>103989219
Latest kobold supports it too I think.

Anonymous 01/21/25(Tue)23:39:32 No.103989235

>>103989189
Do you not comprehend that that your VM can have vulnerabilities as well? Are you 12 years old?

Anonymous 01/21/25(Tue)23:39:33 No.103989236

>>103989213
Sam's model wrote it. I always use it if I want to make the most sloppy, pozzed shitposts.

Anonymous 01/21/25(Tue)23:41:44 No.103989257

>>103989236
it reads great except the last part, especially the 'before its too late part'. 9/10

Anonymous 01/21/25(Tue)23:47:47 No.103989300

Any recommended settings for Deepseek-R1-Llama70b distilled on silly tavern? Llama? For RPing of course.

Anonymous 01/21/25(Tue)23:48:23 No.103989310

>>103988349
Buy an ad kikerburg.

Anonymous 01/21/25(Tue)23:56:44 No.103989389

>>103989300
What are you using for running the 70B?

Anonymous 01/21/25(Tue)23:59:33 No.103989417

>>103988102
seriously, how do you get it to stop doing this shit. if it can't write it's no good to anyone

Anonymous 01/22/25(Wed)00:00:28 No.103989427

>>103989310
>do NOT talk about local models in my local model thread

Anonymous 01/22/25(Wed)00:01:52 No.103989440

When using R1, keep your eyes open for "looping cursive".

Anonymous 01/22/25(Wed)00:03:50 No.103989454

>>103989389

2x4090

Anonymous 01/22/25(Wed)00:05:40 No.103989469

To the anon who was looking at token limits on Kokoro, I just implemented longform generation using Kokoro copying from https://github.com/ugotworms/professor-kokoro-radio/blob/main/radio.py & https://github.com/thewh1teagle/kokoro-onnx .

Can now do unlimited length audio generation, no cap on tokens.
My code: https://files.catbox.moe/qmpiqs.md

Anonymous 01/22/25(Wed)00:06:43 No.103989474

>>103986544
you don't understand anon, the models need to be prude and very preachy for "safety"

Anonymous 01/22/25(Wed)00:07:43 No.103989480

>>103989440
My only complaint is it does *this* a little too much.

Anonymous 01/22/25(Wed)00:08:47 No.103989483

>>103989480
True, V3 also does that a lot.

Anonymous 01/22/25(Wed)00:09:12 No.103989485

>>103989159
Literally what an antivirus is for, just download ClamAV and run the file.

Anonymous 01/22/25(Wed)00:09:32 No.103989487

>>103989469
>professor-kokoro-radio
That's a cool use of kokoro on its own.

Anonymous 01/22/25(Wed)00:12:40 No.103989510

>>103989300
I would like to know this too, i don't like using terminal for AI stuff because i can't get the emoji fonts working and the terminal just sucks for copying and pasting shit.

Anonymous 01/22/25(Wed)00:14:51 No.103989528

>>103989510
>emoji fonts
... no good without those, amirite.

Anonymous 01/22/25(Wed)00:15:25 No.103989530

>>103989487
>https://github.com/ugotworms/professor-kokoro-radio
Its pretty comfy

Anonymous 01/22/25(Wed)00:17:06 No.103989544

>>103989510
if i saw emojis at all i would be angry. maybe if it was quoting a phone text message that could contain them

Anonymous 01/22/25(Wed)00:18:13 No.103989548

>>103989300
>making the templates by hand
It's time to deprecate instruct mode...

Anonymous 01/22/25(Wed)00:19:26 No.103989563

>>103989544
You know, emojis are a defining part of human-like writing according to the leading dataset for human-like behaviour.

Anonymous 01/22/25(Wed)00:20:17 No.103989570

>>103989548
Is this really what the modern generation of /lmg/ is like?

Anonymous 01/22/25(Wed)00:22:20 No.103989587

>>103989570
Modern generation? We all were in /aicg/ when Llama 1 leaked...

Anonymous 01/22/25(Wed)00:29:43 No.103989619

I like how deepseek will just go ahead and write a prompt for hunyuan video of a porn scene. No saltman or guard rails here, can the Chinese just stop being so fucking based for one minute?

Anonymous 01/22/25(Wed)00:30:21 No.103989625

anyone got a template for "don't think, just do?"

Anonymous 01/22/25(Wed)00:30:45 No.103989631

>>103989619
China is busy winning hearts and minds. and dicks

Anonymous 01/22/25(Wed)00:31:31 No.103989639

>>103989570
Anon... most of the people here are talking about a model they can't run locally at all. the L in lmg is itself coming loose

Anonymous 01/22/25(Wed)00:31:43 No.103989640

>>103989528
i don't care about them, its just that deepseek seems to be outputting them, i get a single box of a character that can't be displayed in my terminal, so i'm assuming its an emoji because its at the very end of a sentence.

But think what you want anon, i'm 40 plus i don't give a fuck what some faggot thinks of me.

Anonymous 01/22/25(Wed)00:32:19 No.103989644

So what now that AGI is near? Learn a trade? Then what trade? My software engineer basedboy hands can't do shit, I can't even replace the fluorescent lights in my house myself.

Anonymous 01/22/25(Wed)00:33:18 No.103989649

>>103989644
The robots they're building will take your trade job by the end of the decade. There is no escape. Only poverty and starvation.

Anonymous 01/22/25(Wed)00:34:03 No.103989652

>>103989644
Start a business with a bunch of AI employees.

Anonymous 01/22/25(Wed)00:35:30 No.103989663

>>103989644
Start using the AI for creating an online source of income you babbie... Would you like someone to hold your hand?

Anonymous 01/22/25(Wed)00:37:08 No.103989674

>>103986201
omg
sexbot industry will be gigantic
It doesn't even have to be realistic.

Anonymous 01/22/25(Wed)00:37:51 No.103989680

>>103989652
Business doing what? A TODO app factory? AI gf app to compete with the other 1000 apps? All the low hanging fruits have been taken.

Anonymous 01/22/25(Wed)00:39:55 No.103989693

I finally gave Negative Llama a try and it's very just mediocre. Doesn't feel like anything special. Responses are on the shorter side. It speeds scenes up and is a bit too horny. Characters don't feel like they match the context terribly well sometimes. I'm not sure there's anything great to say about it honestly.

Anonymous 01/22/25(Wed)00:40:46 No.103989698

>>103989649
>a robot will climb into crawl spaces on boats built 50 years ago and run new wiring, tie blocks and zipties
lol no

Anonymous 01/22/25(Wed)00:41:47 No.103989701

>>103989693
>llama
>in 2025

Anonymous 01/22/25(Wed)00:42:02 No.103989705

>>103989680
>AI gf app to compete with the other 1000 apps?
What about a GOOD ai gf app?

Anonymous 01/22/25(Wed)00:43:29 No.103989716

>>103989300
Use the Deepseek templates, not the Llama 3 ones.

Anonymous 01/22/25(Wed)00:43:43 No.103989718

>>103987950
I don't understand how he trusts the guy who just 2 years ago did everything to stop other companies from competing with them by begging the government to control the AI industry, including GPU control, safety control, open-source control, etc.
This also feels like Trump betrayed Elon.

Anonymous 01/22/25(Wed)00:47:08 No.103989737

>>103989718
Ye ole Jewish privilege

Anonymous 01/22/25(Wed)00:47:32 No.103989741

>>103987161
Since they're burning money like crazy, either their model is bloated, or they reinvest everything in other stuff.

Anonymous 01/22/25(Wed)00:48:36 No.103989750

>>103989663
>Would you like someone to hold your hand?
yes please qwq

Anonymous 01/22/25(Wed)00:50:03 No.103989758

>>103989750
start creating data sets, and or automate all the shit out of you life etc.

Anonymous 01/22/25(Wed)00:50:19 No.103989761

>>103989718
>This also feels like Trump betrayed Elon.
Trump is in the position of telling his kids Elon and Sam that he has no favorites and they need to learn to get along.

Anonymous 01/22/25(Wed)00:51:17 No.103989767

>>103989644
ask chatgpt

Anonymous 01/22/25(Wed)00:52:42 No.103989777

>>103989640
Your settings are wrong then, probably. Only time I see emoji spam is when a model is spazzing out.

Anonymous 01/22/25(Wed)01:03:54 No.103989846

>>103989705
What would it take for something to be a Good AI GF app?

1) Long term memory, RAG-enhanced
2) TTS
3) SD imagegen of self
4) Assistant capabilities
5) Not cucked beyond legality
6) Just Werks

What else am I missing?

Anonymous 01/22/25(Wed)01:13:33 No.103989926

>>103989716

And what/where would that be? Silly doesn't include it.

Anonymous 01/22/25(Wed)01:13:36 No.103989927

>>103989846
advertising

Anonymous 01/22/25(Wed)01:15:18 No.103989935

>>103989777
I ran it in ollama on the default settings with fresh context. The model was not spazzing out at the time, I simply said 'Hi there' and it greeted me back and i noticed the square on the end of its greeting meaning my terminal could not display what ever it was.

Anonymous 01/22/25(Wed)01:18:03 No.103989952

>>103989935
it was probably deepseek's end of turn token

Anonymous 01/22/25(Wed)01:21:59 No.103989984

>>103989846
At least a few mill to train a model like deepseek did then to market it as a character.ai replacement

Anonymous 01/22/25(Wed)01:22:33 No.103989988

Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
https://arxiv.org/abs/2501.10799
>Large language models (LLMs) have recently demonstrated remarkable success in mathematical reasoning. Despite progress in methods like chain-of-thought prompting and self-consistency sampling, these advances often focus on final correctness without ensuring that the underlying reasoning process is coherent and reliable. This paper introduces Step-KTO, a training framework that combines process-level and outcome-level binary feedback to guide LLMs toward more trustworthy reasoning trajectories. By providing binary evaluations for both the intermediate reasoning steps and the final answer, Step-KTO encourages the model to adhere to logical progressions rather than relying on superficial shortcuts. Our experiments on challenging mathematical benchmarks show that Step-KTO significantly improves both final answer accuracy and the quality of intermediate reasoning steps. For example, on the MATH-500 dataset, Step-KTO achieves a notable improvement in Pass@1 accuracy over strong baselines. These results highlight the promise of integrating stepwise process feedback into LLM training, paving the way toward more interpretable and dependable reasoning capabilities.
From Meta. neat

Anonymous 01/22/25(Wed)01:24:12 No.103989999

>>103989846
Skyrim already has that.

Anonymous 01/22/25(Wed)01:24:49 No.103990006

>>103989990
>>103989990
>>103989990

Anonymous 01/22/25(Wed)01:41:59 No.103990133

>>103989926
https://files.catbox.moe/y0fyfb.json
here is mine, this worked for the qwen 2.5 distill.