/lmg/ - Local Models General
Anonymous 01/14/25(Tue)04:26:03 | 450 comments | 44 images | 🔒 Locked
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>103881688 & >>103871751
►News
>(01/14) MiniCPM-o 2.6 released with multi-image and video understanding, realtime speech conversation, voice cloning, and multimodal live streaming: https://hf.co/openbmb/MiniCPM-o-2_6
>(01/08) Phi-4 weights released: https://hf.co/microsoft/phi-4
>(01/06) NVIDIA Project DIGITS announced, capable of running 200B models: https://nvidianews.nvidia.com/news/nvidia-puts-grace-blackwell-on-every-desk-and-at-every-ai-developers-fingertips
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous 01/14/25(Tue)04:26:33 No.103888594
►Recent Highlights from the Previous Thread: >>103881688
--Paper: SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training:
>103886784 >103886926
--Paper: Transformer^2: Self-adaptive LLMs:
>103886931 >103887964
--Papers:
>103886689 >103886794 >103887023
--Test-time compute storytelling and the role of model size in creative writing:
>103884801 >103884814 >103884855 >103884873 >103884914 >103885026 >103885107
--Relationship between model size and quantization sensitivity discussed:
>103881791 >103881850 >103881898 >103881964 >103882008 >103882183
--Discussion of DIGITS and PC building options, with a focus on memory bandwidth and performance:
>103883782 >103883785 >103883880 >103883904 >103883934 >103883963 >103883986 >103884107 >103884142 >103884150 >103884596 >103883999 >103884009 >103884136 >103883825
--Discussion of Mac and DIGITS systems, memory, and GPU capabilities:
>103882515 >103882577 >103882642 >103882740 >103882781 >103882884 >103883056 >103883116 >103883142 >103883315
--Discussion of AI models, GPU performance, and optimization strategies:
>103884724 >103884770 >103885024 >103886084 >103886381 >103886954 >103886979 >103886993 >103887024 >103887252 >103887019 >103887037 >103887132
--Speculation about Nvidia's mysterious repository on Hugging Face:
>103884597 >103884660 >103884687 >103884712 >103885202 >103884910
--FP8 vs Q8: data types, precision, and information loss:
>103883157 >103883204 >103883239 >103883245 >103883249
--Phi vs Llama for finetuning discussion:
>103884786 >103884861 >103884972 >103885004 >103885793
--Anon shares anonymous-chatbot response explaining lolilibaba concept:
>103883568 >103883604 >103884750
--UGI Leaderboard evaluates language models' ideological leaning and neutrality:
>103883290 >103883625
--Miku (free space):
>103883015 >103884150 >103884327 >103884720 >103886919 >103887221
►Recent Highlight Posts from the Previous Thread: >>103881693
Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous 01/14/25(Tue)04:29:48 No.103888618
>Let's try this "MiniCPM"
>1 hour later
>Still building flash attention wheel
Yes I installed ninja
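(for anyone else stuck on this: you can usually dodge the multi-hour compile by grabbing a prebuilt wheel from the flash-attn GitHub releases that matches your torch/CUDA/python combo, or at least cap the parallel jobs so the build doesn't thrash, e.g.
MAX_JOBS=4 pip install flash-attn --no-build-isolation
treat the exact flags as a sketch and check the flash-attn README for your setup)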
Anonymous 01/14/25(Tue)04:36:39 No.103888658
Tetolove
Anonymous 01/14/25(Tue)04:47:24 No.103888709
>>103888594
I love you, Recap Teto.
Anonymous 01/14/25(Tue)05:05:15 No.103888840
are any of these good enough to use with a chat frontend at reasonable speed? I have a 3090, I'm guessing most of you run something similar? I last tried llama3 and it was impressive for running locally and free, but fairly bad compared to chatgpt and other online models
Anonymous 01/14/25(Tue)05:12:03 No.103888885
>>103888840
Local models have usually caught up to SaaS in at least one area, but there aren't really any models that excel in all areas like a lot of the corpo models do. You gotta pick and choose a model for your niche. Llama 3 is kind of the exception there in that it's aggressively mediocre at everything
Anonymous 01/14/25(Tue)05:12:04 No.103888886
what would you use if you had two 3060s (12 GB each) and 32 GB VRAM? for either RP or other things
>inb4 jokes about how the rig is shit, I guess
Anonymous 01/14/25(Tue)05:13:26 No.103888893
>>103888886
A Qwen 32b based model at Q5 or something probably. It's not the worst ending.
Anonymous 01/14/25(Tue)05:16:12 No.103888919
>>103888886
>and 32 GB VRAM
Nice 5090 + 2x 3060 setup.
On a more serious note, I'd go with Cydonia first. See if you like it.
Anonymous 01/14/25(Tue)05:16:55 No.103888923
>>103888885
damn, almost neato digits & thanks for the info, I'll investigate more tomorrow. all the gooners in aicg dying over jailbreaks and proxies when lmg might be the answer
Anonymous 01/14/25(Tue)05:55:21 No.103889140
Are there models that can translate? As in, are those specialized, or can any model do it? I'm not really finding anything atm (found something from 2 years ago though)
Anonymous 01/14/25(Tue)06:09:44 No.103889230
bitnet millions of experts 70b when?
Anonymous 01/14/25(Tue)06:11:22 No.103889242
>>103889221
By being attractive.
Anonymous 01/14/25(Tue)06:12:50 No.103889251
Anonymous 01/14/25(Tue)06:22:17 No.103889317
>>103889251
*audibly pops the magic bubble*
Anonymous 01/14/25(Tue)06:24:04 No.103889333
>>103889317
GLGLLUGLLGLRHH
Anonymous 01/14/25(Tue)06:31:51 No.103889378
>>103889333
Noooo :(
Anonymous 01/14/25(Tue)06:40:41 No.103889448
>>103888886
Do you mean 32GB RAM? Or 56GB VRAM total?
Anonymous 01/14/25(Tue)06:46:59 No.103889484
>>103888893
examples?
Anonymous 01/14/25(Tue)06:47:46 No.103889489
>>103889140
Take a look at some popular models.
See what languages they have been trained on.
They should be able to pretty much translate text between those languages.
Anonymous 01/14/25(Tue)07:11:02 No.103889655
>>103888893
>>103888919
Thank you. I'll try them both out.
>>103889448
Yeah, I have 32 GB of additional VRAM that was soldered on by some dude in Shenzhen while I had noodles
Anonymous 01/14/25(Tue)07:13:05 No.103889676
>https://huggingface.co/openbmb/MiniCPM-o-2_6/tree/main
I hate the demo because I get nervous talking with female voices
Anonymous 01/14/25(Tue)07:18:29 No.103889710
https://web.archive.org/web/20250114121236/https://www.theregister.com/2025/01/09/us_weighing_global_limits_ai_exports/
>"Along with compute caps on tier-2 nations, the rules may also include limits on the export of closed AI model weights. Model weights represent the numerical values that dictate how modern AI models function. Under the proposed rules, the Commerce Department aims to prevent companies from hosting closed model weights in tier-3 countries like China and Russia. Such a move would prevent major closed-source models from being served from these nations. Open models, like Meta's Llama 3.1 405B, would not be subject to these rules, nor would any closed model deemed less sophisticated than an existing open model."
>"nor would any closed model deemed less sophisticated than an existing open model."
What are they trying to do here?
>"Along with compute caps on tier-2 nations, the rules may also include limits on the export of closed AI model weights. Model weights represent the numerical values that dictate how modern AI models function. Under the proposed rules, the Commerce Department aims to prevent companies from hosting closed model weights in tier-3 countries like China and Russia. Such a move would prevent major closed-source models from being served from these nations. Open models, like Meta's Llama 3.1 405B, would not be subject to these rules, nor would any closed model deemed less sophisticated than an existing open model."
>"nor would any closed model deemed less sophisticated than an existing open model."
What are they trying to do here?
Anonymous 01/14/25(Tue)07:28:33 No.103889779
>>103889710
My guess is preventing potentially hostile countries from having privileged access to powerful AI models.
Anonymous 01/14/25(Tue)07:41:35 No.103889859
Titanpill me RIGHT NOW
https://arxiv.org/pdf/2501.00663v1
Anonymous 01/14/25(Tue)07:43:24 No.103889870
Anonymous 01/14/25(Tue)08:00:13 No.103889960
>>103888589
https://youtu.be/OSKgz8NfUoI
Anonymous 01/14/25(Tue)08:01:41 No.103889965
>>103889859
I'm a retard when it comes to math formulas, but I at least understand the terminology, and if I get it right, the basic idea is:
Models we use these days function best as an equivalent of short-term memory (which is why they get dumber with longer contexts). This architecture involves basically having a meta-model with additional context: a long-term memory that evaluates how "surprising" or "memorable" something is in its context, and feeds that data to the core model that operates on a smaller context (they keep using the phrase "learning to memorize at test time", but I see nothing to suggest any moving parts, so to speak). In other words, important details should be preserved from a much larger context (they claim it can reach 2M context), influencing the short-term memory, and in turn influencing the output.
It seems theoretically solid. Long-term memory is one of the aspects of LLMs that badly need a breakthrough, and this seems like a viable approach without requiring dynamic data on the user side.
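To make the gating intuition concrete, here's a throwaway numpy sketch of "write harder when surprised, decay otherwise". To be clear, this is my own toy, not the paper's actual equations (Titans uses a learned neural memory updated by gradient descent at test time), so treat every name and number below as made up.
import numpy as np

rng = np.random.default_rng(0)
d = 8                      # toy embedding size
memory = np.zeros(d)       # single long-term memory slot
decay, lr = 0.99, 0.5      # forgetting rate, write strength

def step(token_vec, memory):
    # "surprise" = how poorly the current memory explains this token
    surprise = np.linalg.norm(token_vec - memory)
    gate = surprise / (1.0 + surprise)     # squash to (0, 1)
    # surprising tokens get written into memory harder; boring ones mostly decay
    new_memory = decay * memory + lr * gate * token_vec
    return new_memory, surprise

for t in range(5):
    tok = rng.normal(size=d)               # stand-in for a token embedding
    memory, s = step(tok, memory)
    print(f"t={t} surprise={s:.2f}")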
Anonymous 01/14/25(Tue)08:06:50 No.103889997
Are non-dense models anti-local because they tend to require corporate-tier amounts of VRAM and substantially less compute, making them optimal for SaaS deployments?
Anonymous 01/14/25(Tue)08:09:08 No.103890016
Anonymous 01/14/25(Tue)08:09:09 No.103890018
>>103889859
Sorry, you said pill, not explain.
Well, if the results can be trusted, this basically cracks the problem of attention dilution over long contexts wide open. Finds the needle in the haystack near-perfectly. You know that important detail you mentioned exactly once in your RP some 10k tokens ago, that gradually got washed out until it was completely forgotten? This solves that problem.
Anonymous 01/14/25(Tue)08:09:23 No.103890019
>>103889960
cftf?
Anonymous 01/14/25(Tue)08:10:37 No.103890023
>>103889997
Not necessarily, Mixtral used to be a VRAMlet friendly model because as long as it can fit in RAM and the active parameters aren't too many, it can run at decent speeds
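Rough napkin math for why that works (numbers are ballpark, not measured): Mixtral 8x7B is ~47B params total but only ~13B are active per token. At ~4.5 bits per weight (Q4_K_M-ish) that's ~26 GB of weights sitting in RAM, but each token only has to stream ~7 GB of them. Dual-channel DDR5 at ~60 GB/s then caps you around 8 t/s, whereas a dense 47B streaming all 26 GB per token would cap out near 2 t/s. Real numbers come in lower, but the ratio is the point.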
Anonymous 01/14/25(Tue)08:13:03 No.103890038
>>103890018
tl;dr loredumpfags rejoice?
Anonymous 01/14/25(Tue)08:13:52 No.103890043
Anonymous 01/14/25(Tue)08:18:57 No.103890079
>>103889997
There are ratios of total size to expert size where it is actually better for local, since you can use your regular RAM to get more parameters at a small speed cost.
Anonymous 01/14/25(Tue)08:19:40 No.103890085
I tried getting the local demo for MiniCPM-o 2.6 working but there were too many problems, one after the other, to make it work well.
Seems solid otherwise. Might make a fun bantz buddy when gaming or something.
Anonymous 01/14/25(Tue)08:20:46 No.103890093
>>103889859
just skimmed it. feels like more than a meme paper this time. (he said)
Anonymous 01/14/25(Tue)08:21:19 No.103890096
>>103890019
go back
Anonymous 01/14/25(Tue)08:21:37 No.103890100
Anonymous 01/14/25(Tue)08:22:10 No.103890105
Anonymous 01/14/25(Tue)08:27:22 No.103890151
I want a model or finetune that understands memes and culture.
Anonymous 01/14/25(Tue)08:29:00 No.103890163
>>103890151
Claude. Heard it can even talk like a zoomer if prompted.
Anonymous 01/14/25(Tue)08:33:00 No.103890199
gemini didn't like my strategy for tsunamis
Anonymous 01/14/25(Tue)08:37:00 No.103890220
>>103890199
>missing the meme and taking it completely literally, at face value instead
At last, artificial autism.
Anonymous 01/14/25(Tue)08:47:54 No.103890295
They should make an RP leaderboard of sorts entirely based on the Nala test. It is just such a good test for so many reasons and would filter out shite
Anonymous 01/14/25(Tue)08:51:38 No.103890322
>>103890295
It's actually a completely retarded meme though.
Anonymous 01/14/25(Tue)08:53:08 No.103890329
>>103890295
I agree.
You evaluate the responses based on a couple of categories such as anatomical understanding, character adherence, etc.
Do 5 swipes for each model and take the average of the best 3 or something.
Anonymous 01/14/25(Tue)09:12:31 No.103890472
>>103888589
cute teto
Anonymous 01/14/25(Tue)09:12:48 No.103890474
>>103890163
A local model.
Anonymous 01/14/25(Tue)09:17:50 No.103890518
>>103890474
DeepSeekV3
Anonymous 01/14/25(Tue)09:24:21 No.103890566
llamiku 4 when
Anonymous 01/14/25(Tue)09:30:23 No.103890626
>>103888589
>MiniCPM-o
Has anyone got this working locally with its functionality intact? I couldn't get it to work.
Anonymous 01/14/25(Tue)09:46:14 No.103890770
I'm going to miss lmg-anon...
Anonymous 01/14/25(Tue)09:50:56 No.103890815
>>103890770
He'll be back in 3-15 years, no worries. Just in time for llama4.3-70b
Anonymous 01/14/25(Tue)09:55:55 No.103890872
>>103890770
So what did he do exactly? Say a woman should suck his dick cause his PC runs pytorch and a woman actually did it?
Anonymous 01/14/25(Tue)10:00:10 No.103890911
>>103890770
Which tag do I use for the middle one's tit shape?
Anonymous 01/14/25(Tue)10:01:05 No.103890923
Has anyone tried Sky-T1-32B-Preview?
Allegedly it is like qwq but less buggy and better at programming.
Anonymous 01/14/25(Tue)10:01:43 No.103890930
Anonymous 01/14/25(Tue)10:08:08 No.103891015
>>103890911
bestiality
Anonymous 01/14/25(Tue)10:09:02 No.103891025
>>103891015
I don't get the joke.
Anonymous 01/14/25(Tue)10:09:38 No.103891030
>>103890923
it's a model fine-tuned on qwq outputs, i doubt it's any better than it
Anonymous 01/14/25(Tue)10:16:13 No.103891090
>>103891030
It should be better but not much better.
Anonymous 01/14/25(Tue)10:19:46 No.103891132
I doubt that this is even lmg-anon, but that statement is not true, right? That sounds crazy.
Anonymous 01/14/25(Tue)10:19:59 No.103891137
Updated Silly Tavern on a single board computer to include maintenance items, and to get it to monitor and reconnect to wifi if dropped.
https://rentry.org/SillyTavernOnSBC
Anonymous 01/14/25(Tue)10:20:56 No.103891147
>>103891132
Wrong pic, meant to post this.
Anonymous 01/14/25(Tue)10:22:41 No.103891170
Anonymous 01/14/25(Tue)10:26:28 No.103891200
Anonymous 01/14/25(Tue)10:30:15 No.103891235
>>103891147
WHO
Anonymous 01/14/25(Tue)10:30:38 No.103891241
>>103891147
Since you can get imprisoned for a few insults on LoL, I'm sure that's true.
Anonymous 01/14/25(Tue)10:30:40 No.103891242
>>103891235
some locust enabler
Anonymous 01/14/25(Tue)10:32:49 No.103891264
>>103888589
Hi bros
Im a bit out of the loop on the memesamplers, can someone spoonfeed me a good value of smooth/dry or xtc for largestral?
normally i dont mess with them, but i remember smooth being kinda nice, and after doing a multicharacter card and the model giving VASTLY more attention to a character with a common name, i think i need to use them
Anonymous 01/14/25(Tue)10:33:46 No.103891273
Anonymous 01/14/25(Tue)10:38:55 No.103891318
Anonymous 01/14/25(Tue)10:40:16 No.103891329
>>103891147
The entirety of South Korea will disappear in 3 generations. He'll get the last laugh, might even be alive for it.
Anonymous 01/14/25(Tue)10:40:27 No.103891331
>>103890220
No. Catching the meme is the autism in this scenario, anon. Imagine going up to some rando math professor and being like
>"ehehe... gotta go fast, bet"
He'll think you're retarded. Same with gemini. It's just not allowed to say it.
Anonymous 01/14/25(Tue)10:40:49 No.103891333
Anonymous 01/14/25(Tue)10:41:09 No.103891341
>>103891273
Why is it so weird? They probably were accomplices to some degree for whatever he did.
Anonymous 01/14/25(Tue)10:42:07 No.103891353
Kill yourself.
Anonymous 01/14/25(Tue)10:43:20 No.103891367
>>103891333
is that kanken ni-kyu? Impressive.
Anonymous 01/14/25(Tue)10:48:28 No.103891423
>>103891333
im jealous
Anonymous 01/14/25(Tue)10:49:31 No.103891433
t-thanks grok. cant wait to have that power locally soon.
Anonymous 01/14/25(Tue)10:52:58 No.103891478
>In South Korea, defamation laws are particularly stringent, as evidenced by the information provided in the related web results.
>Defamation can lead to criminal charges with potential imprisonment up to three years if the information is true, and up to seven years if it is false.
>This is highlighted in the context of South Korean cyber defamation law, where even true information that harms a person's reputation can be punishable.
>This legal framework is different from many Western countries where defamation typically results in civil rather than criminal liability.
>Specific Allegations: According to Air Katakana's subsequent posts, the person who reported him to the police is a professor at KAIST named ******* Lee.
>Air Katakana alleges that this professor forced him to send a large amount of money to his wife under the threat of revoking a job offer that had been promised over a year prior.
>This suggests that the defamation might be tied to these financial and employment-related disputes.
>imprisonment up to three years if the information is true
I seriously hope that's just hallucinated up. 3 years for posting true stuff somebody doesn't like. wow
Anonymous 01/14/25(Tue)10:59:10 No.103891569
>>103891478
This is something air katakana wrote in one of his tweets, I think it was irony and the AI took it as a fact.
Anonymous 01/14/25(Tue)11:00:45 No.103891593
>>103891569
It's on wikipedia
>>103891478
I'd assume that's to discourage gossip and feuds about petty stuff, but it's ridiculously phrased.
Anonymous 01/14/25(Tue)11:03:07 No.103891619
Anonymous 01/14/25(Tue)11:07:23 No.103891662
>>103890626
the vision model works ok for me
Anonymous 01/14/25(Tue)11:08:04 No.103891668
Anonymous 01/14/25(Tue)11:11:23 No.103891698
>>103890518
A local model I can run. I'm not a poorfag either. I have 64gb ddr5.
Anonymous 01/14/25(Tue)11:13:03 No.103891721
>>103891698
"understand memes and culture" is too broad. what do you want to do exactly?
"understand memes and culture" is too broad. what do you want to do exactly?
Anonymous 01/14/25(Tue)11:15:47 No.103891754
>>103891478
You have to understand that all of Korea was nothing but illiterate farmers just a couple of generations ago. Unlike the Japanese and, at least the historically urban parts of, the Chinese, they don't really have a tradition of a stable and functioning modern civilization.
Anonymous 01/14/25(Tue)11:20:22 No.103891802
>>103891721
If you have to ask, your judgement won't be useful to me. I can tell based on the tone of your post that you're a pedantic shithead.
Anonymous 01/14/25(Tue)11:24:28 No.103891840
>>103891802
nta, but jesus fuck, anon...
>being pedantic about memes and "culture"
>I'm not a poorfag either.
>I have 64gb ddr5.
>pedantic shithead.
Anonymous 01/14/25(Tue)11:25:05 No.103891846
>>103891132
>I doubt that this is even lmg-anon
he seems like a pretty cool guy. Does he actually hang out here? We could swap JLPT stories
Anonymous 01/14/25(Tue)11:26:30 No.103891859
Anonymous 01/14/25(Tue)11:27:17 No.103891869
>>103891668
You're right.
Anonymous 01/14/25(Tue)11:31:31 No.103891904
How exactly does one start with this? I'm on linux.
The rentry tutorial says to download oobabooga, and their github says to run start_linux.sh.
Then I went to https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/
But I couldn't find a download link to any project. What comes after this?
Anonymous 01/14/25(Tue)11:34:32 No.103891932
>>103891904
By waiting 2 more weeks for better models.
Anonymous 01/14/25(Tue)11:34:34 No.103891933
>>103891904
copy-paste the name of the repo into the model download box in ooba. after that, load the model and done
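if you'd rather skip the built-in downloader, the huggingface-cli route also works; something like this (the repo name is just an example, check it exists and pick whatever quant fits your VRAM):
huggingface-cli download bartowski/Mistral-Nemo-Instruct-2407-GGUF --include "*Q4_K_M*" --local-dir models
then point ooba (or whatever backend) at the downloaded .gguf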
Anonymous 01/14/25(Tue)11:34:47 No.103891935
>>103891802
are you a toddler that expects to be spoonfed with minimal communication (crying)?
Anonymous 01/14/25(Tue)11:44:44 No.103892037
>>103891904
>memeboard leaders these days are 78(!)B frankenmerges of Qwen-72B
I hadn't looked at that cesspool in a year. Good to know it hasn't changed since then after it got flooded by chinks and indians training meme models on benchmark data.
Anonymous 01/14/25(Tue)11:49:07 No.103892080
>>103892037
we need better benchmarks
Anonymous 01/14/25(Tue)12:04:55 No.103892241
how are speeds benchmarked anyway?
T/s just refers to output speed, right?
Is there a factor x by which processing input is faster than generating tokens, or are these two unrelated? Also speeds seem to differ quite a bit by not only size but also model and datatype.
Anonymous 01/14/25(Tue)12:06:30 No.103892252
>>103892241
Most backends show you processing, generation and the total time each in t/s.
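Worked example with made-up but realistic numbers: say prompt processing runs at 500 t/s and generation at 20 t/s. A 2000-token prompt then takes ~4 s to ingest and a 300-token reply takes ~15 s to generate, ~19 s total, even though the headline "output speed" only describes the second part. Prompt processing is batched and compute-bound, generation is one-token-at-a-time and memory-bandwidth-bound, which is why the two numbers differ so much and why both shift with model size, quant and backend.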
Anonymous 01/14/25(Tue)12:14:51 No.103892328
>>103892037
>>103892080
open-llm-leaderboard evaluates on non-CoT. Meaning, they don't let the model generate a full solution, and then search through it and extract the answer, but rather directly check the probability of the answer in the first few tokens. That's why qwen2.5-72b scores higher than qwen2.5-72b-instruct, even though instruct is a much better assistant (which this benchmark is trying to evaluate).
Someone correct me if I'm wrong.
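For what it's worth, the conceptual difference looks roughly like this; the two scoring functions below are fakes, not a real eval-harness API, just an illustration of loglikelihood-style vs generative scoring:
import re, math, random

def fake_logprob(prompt, continuation):
    # stand-in for summing token logprobs of `continuation` given `prompt`
    random.seed(hash((prompt, continuation)) % 2**32)
    return math.log(random.uniform(0.01, 1.0))

def fake_generate(prompt):
    # stand-in for letting the model write a full CoT solution
    return "Let me think step by step... so the answer is (B)."

question = "2+2=? (A) 3 (B) 4 (C) 5"
choices = ["(A)", "(B)", "(C)"]

# non-CoT / loglikelihood scoring: just compare P(choice | question)
ll_pick = max(choices, key=lambda c: fake_logprob(question, c))

# generative scoring: generate a solution, then parse the answer out of it
m = re.search(r"answer is (\([A-C]\))", fake_generate(question))
gen_pick = m.group(1) if m else None

print("loglikelihood pick:", ll_pick, "| generative pick:", gen_pick)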
Anonymous 01/14/25(Tue)12:30:04 No.103892479
I'm following the "getting started" guide. It's telling me to "download nemo 12b instruct gguf". Where do I find this?
Anonymous 01/14/25(Tue)12:31:28 No.103892504
>>103892479
Nigga, this isn't spoonfeeding at this point, it's giving you knowledge in a fucking IV line.
Anonymous 01/14/25(Tue)12:33:46 No.103892534
>>103886370
>deepseek repeats too much
using --chat-template deepseek3 with a recent llama.cpp has eliminated any repeating for me. I don't think I've seen it once.
The only other variable is that I self-quant, so I can't speak for any online ggufs if those are the core issue
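For anyone wanting to copy it, that's just the server flag, something like this (model path and context size are placeholders):
./llama-server -m DeepSeek-V3-Q6_K.gguf --chat-template deepseek3 -c 8192
iirc there's also a --chat-template-file option if you want to supply your own template instead of a built-in name, but check llama-server --help for your build.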
Anonymous 01/14/25(Tue)12:33:48 No.103892535
>>103892479
>the treasure map says X marks the spot, where do I find this so I can start digging?
>[Attached picture: big X on the ground]
Anonymous 01/14/25(Tue)12:35:08 No.103892551
>>103892479
On the off chance you aren't trolling, have you tried writing "nemo 12b instruct gguf" in the search bar that's clearly visible in your image?
If not, try that.
You'll see a bunch of results, probably, download the bartowski one that has mistral in the name.
Anonymous 01/14/25(Tue)12:36:57 No.103892569
>>103892551
Doesn't "Vikhr" indicate that it's Russian?
Doesn't "Vikhr" indicate that it's Russian?
Anonymous 01/14/25(Tue)12:37:10 No.103892574
>>103892479
Use this https://ollama.com/
Anonymous 01/14/25(Tue)12:38:48 No.103892598
>>103892574
This is like giving someone a crackpipe.
Anonymous 01/14/25(Tue)12:38:55 No.103892599
>>103892569
The one with mistral in the name anon.
Anonymous 01/14/25(Tue)12:39:03 No.103892600
Anonymous 01/14/25(Tue)12:41:00 No.103892625
>>103892574
This is the one time recommending ollama is ok
Anonymous 01/14/25(Tue)12:47:49 No.103892697
anyone ever use the cpu pinning feature (not --numa) in lcpp? it doesn't appear to obey the strict flag or pinning mask at all and my inference just gets slower, which doesn't make sense to me.
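not an answer, but as a sanity check you could compare against OS-level pinning and see if it behaves any differently, e.g. something like:
taskset -c 0-7 ./llama-cli -m model.gguf -t 8 -p "test"
(core range, thread count and model path are placeholders, match them to your physical cores). If taskset pins fine but the built-in mask doesn't, that at least narrows it to lcpp's threadpool handling.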
Anonymous 01/14/25(Tue)12:49:52 No.103892714
Anonymous 01/14/25(Tue)12:54:03 No.103892757
Did they cheat with Sky T1? Or is it actually comparable to o1?
Anonymous 01/14/25(Tue)12:55:46 No.103892779
Anonymous 01/14/25(Tue)12:57:34 No.103892808
>>103890518
Ok, now one that can do it for more than one message (since deepseek will come up with something good but then just repeat parts of it every message forever.)
Anonymous 01/14/25(Tue)12:58:43 No.103892824
dead on arrival
doesn't even know what a migu is.
Anonymous 01/14/25(Tue)12:59:38 No.103892841
>>103890295
Once you make something a benchmark then it becomes useless since they'll optimize for it and the test isn't representative of true performance.
Anonymous 01/14/25(Tue)13:00:14 No.103892849
>>103892824
Not exactly a fair test being out of focus and off-model.
Anonymous 01/14/25(Tue)13:00:38 No.103892854
>>103892808
>deepseek will come up with something good but then just repeat parts of it every message forever.
you're doing something wrong
Anonymous 01/14/25(Tue)13:00:55 No.103892858
>>103892534
post a log with 5 bot messages
Anonymous 01/14/25(Tue)13:01:53 No.103892870
Anonymous 01/14/25(Tue)13:01:59 No.103892871
>>103892824
There's only so much a benchmaxxed 8B can do.
Anonymous 01/14/25(Tue)13:02:29 No.103892880
>>103892854
Don't think so, Chinese models just have a lot of shills. People said qwen was good too and I didn't like it either.
Anonymous 01/14/25(Tue)13:02:39 No.103892882
>>103892870
assistant messages
Anonymous 01/14/25(Tue)13:04:05 No.103892905
>>103892849
The whole purpose of machine learning is to create emergent, out of distribution capabilities. It's a perfectly fair test. >>103892871
Still pretty impressive level of understanding for an 8b
Anonymous 01/14/25(Tue)13:04:30 No.103892917
>>103892824
Knowing more obscure stuff is where more params come in.
SnusGoose 01/14/25(Tue)13:05:19 No.103892930
A new model is out, it's a 45.9B-active / 456B-total MoE model.
https://www.minimaxi.com/en
Anonymous 01/14/25(Tue)13:05:29 No.103892931
Anonymous 01/14/25(Tue)13:06:49 No.103892949
>>103892930
Nice marketing page. Now show me the weights.
Anonymous 01/14/25(Tue)13:07:23 No.103892957
>>103892931
I didn't know that character till I saw it several times on /lmg
Anonymous 01/14/25(Tue)13:09:41 No.103892982
>>103892949
Looks like it exists but the retarded namefag didn't think of posting the hf link or even the proper blogpost kek.
SnusGoose 01/14/25(Tue)13:10:11 No.103892992
https://huggingface.co/MiniMaxAI/MiniMax-Text-01
Idk if it's any good, they did some linear attention fuckery
Anonymous 01/14/25(Tue)13:11:01 No.103893002
>>103892992
4M context? I like that.
Anonymous 01/14/25(Tue)13:11:29 No.103893010
>>103892992
Why do you have 2 HF accounts? The one you link on your landing page is empty: https://huggingface.co/MiniMax-AI
SnusGoose 01/14/25(Tue)13:12:37 No.103893026
It’s not my model, I’m not sure why they did that
Anonymous 01/14/25(Tue)13:14:01 No.103893047
>>103892992
>MiniMax-Text-01 is a powerful language model with 456 billion total parameters, of which 45.9 billion are activated per token. To better unlock the long context capabilities of the model, MiniMax-Text-01 adopts a hybrid architecture that combines Lightning Attention, Softmax Attention and Mixture-of-Experts (MoE). Leveraging advanced parallel strategies and innovative compute-communication overlap methods—such as Linear Attention Sequence Parallelism Plus (LASP+), varlen ring attention, Expert Tensor Parallel (ETP), etc., MiniMax-Text-01's training context length is extended to 1 million tokens, and it can handle a context of up to 4 million tokens during the inference. On various academic benchmarks, MiniMax-Text-01 also demonstrates the performance of a top-tier model.
filled up my entire buzzword bingo card
Anonymous 01/14/25(Tue)13:14:26 No.103893051
>>103892882
https://rentry.org/bds5pnoc
Here's the first nine from an rpg/text adventure type prompt. DSv3 Q6
It's not exactly unslopped or inspired prose, but it's not repeating itself in any way I find alarming.
Believe it or not. The choice is yours!
Anonymous 01/14/25(Tue)13:15:40 No.103893070
>>103892931
Miku isn't even a fucking thing anymore except for turbo-autists, literal oldfags (30+) and troons. The fact that you think she's still present in pop-culture puts you in the literal oldfag category, by the way.
Anonymous 01/14/25(Tue)13:16:26 No.103893079
>>103893070
>puts you in the literal oldfag category
why do you say this like it's a bad thing, zoom zoom?
Anonymous 01/14/25(Tue)13:18:52 No.103893110
Anonymous 01/14/25(Tue)13:18:56 No.103893112
>>103893079
Man, I'm in that very same category, that's why I know damn well that you just never paused to think whether the things that were popular when you were a kid are still known at all. Happens to me all the time.
Anonymous 01/14/25(Tue)13:20:15 No.103893131
>>103893070
>Miku isn't even a fucking thing anymore
I dunno about the west, but Vocaloids are still massively popular with asian kids
Anonymous 01/14/25(Tue)13:21:21 No.103893145
>>103893051
I can see repetition all through your text, but I guess it's good for you that you're blissfully unaware of it.
Anonymous 01/14/25(Tue)13:21:51 No.103893150
>>103893070
The point is that it used to be. And is thus highly represented within any stack of training data. Just about every AI model on earth knows what Hatsune Miku is.
Anonymous 01/14/25(Tue)13:21:57 No.103893155
Anonymous 01/14/25(Tue)13:22:16 No.103893158
>>103893110
The simpleQA and IFEval scores are high, which means it's better for RP.
Anonymous 01/14/25(Tue)13:22:21 No.103893160
>>103893131
You know what, fair enough, I'll give you that. Still, you can't deny she's way less known on a global scale than she was a generation ago. Hate to say it, but Miku is a niche character at this point.
Anonymous 01/14/25(Tue)13:23:03 No.103893165
>>103893070
Yeah, Miku is pretty much like Touhou, it still exists only on the darkest corners of the internet.
Anonymous 01/14/25(Tue)13:23:46 No.103893180
Anonymous 01/14/25(Tue)13:24:30 No.103893188
>>103893155
You still have time.
Anonymous 01/14/25(Tue)13:25:19 No.103893200
>>103893165
I'm still coping with that as a Touhoufag, to be honest. But yeah, it went from being the single most fervent fandom to a niche as well.
Anonymous 01/14/25(Tue)13:26:23 No.103893213
>>103893051
thanks for the log, I'll take a look
while I was waiting, I did a basic test using the chat api
Anonymous 01/14/25(Tue)13:26:32 No.103893215
>>103893165
Hell yeah, now I am quirky and special for being a Mikufag
Anonymous 01/14/25(Tue)13:26:39 No.103893219
Anonymous 01/14/25(Tue)13:28:07 No.103893238
>>103893051
That is so different from what I do it's hard to say. If I use it as an assistant it pretty much starts and ends every reply the same way, you've kind of baked that into that method by having it ask you the same question on each turn so maybe that pacifies it.
Anonymous 01/14/25(Tue)13:29:26 No.103893259
The goal for llms is predictability and to reduce surprises. The minority who use them to RP want surprises. You're fighting an uphill battle.
Anonymous 01/14/25(Tue)13:30:42 No.103893274
>>103893259
I thought the goal for LLMs was to become everything machines aka AGI.
Anonymous 01/14/25(Tue)13:32:13 No.103893298
Anonymous 01/14/25(Tue)13:33:11 No.103893309
>>103892930
>>103892992
Hello Developer!
你好,开发者!
Welcome to 4chan LLM thread!
欢迎来到4chan LLM讨论串!
Please provide llama.cpp(https://github.com/ggerganov/llama.cpp) support so we can test your model!
请提供llama.cpp(https://github.com/ggerganov/llama.cpp)支持,以便我们可以测试你的模型!
Was your model trained on outputs of GPT4?
你的模型是基于GPT4的输出进行训练的吗?
Anonymous 01/14/25(Tue)13:34:20 No.103893326
Anonymous 01/14/25(Tue)13:34:49 No.103893333
>>103889965
how is this different from genning summary?
I don't think you can summarize in parallel because individual tokens are created sequentially but weighted in full context.
My pleb expertise says the design is wrong and instead of having only word tokens, it should also have sentence and story "super tokens" or some kind of structure or weights. AI can't be smart if it operates on just a one-dimensional context.
I don't jerk off because I remember that I did it yesterday, nor because I noted down to do it today. So instead of summarizing, there should be a weighted interpretation.
Another example: if you saw a movie very long ago, you might not remember the plot, but you remember whether you liked it (at that time).
something like
John:{
history: {1_day_ago{jerked off}}
facts: black hair, blue eyes, wears leather jacket, has a sister
assumption: maybe he likes jerking off
}
Susy:{
history:{1_day_ago:{saw John jerking off, called him a pervert}}
facts: sister of John
assumption: thinks John is pervert, dislikes John
}
if John jerked off 1, 2, 3, 4 days ago... the information gets condensed to 'John likes jerking off every day'. Then if he skips a day it would become 'he jerks off almost every day'. Susy catches him multiple times and 'John is a pervert' becomes a fact (more weight, kept longer). If this story goes on for a long time, most of this will get pushed out of the summary unless there is a weight to all of it. Maybe all that remains of Susy is that she is John's sister and she thinks of him as a pervert, instead of something like 'John's sister caught him jerking off a while ago', but at a later point another sister is introduced
This would also replace definitions, because definitions can change. If John loses an arm, the definition that he has an arm becomes redundant.
Thanks for reading my blogpost
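fwiw what you're describing is basically a weighted key-value store with decay and reinforcement on repetition. throwaway python sketch of that idea, all names and numbers made up:
from dataclasses import dataclass, field

@dataclass
class Fact:
    text: str
    weight: float = 1.0

@dataclass
class Character:
    name: str
    facts: list = field(default_factory=list)

    def observe(self, text, boost=1.0):
        # repeated observations reinforce the weight ("becomes a fact")
        for f in self.facts:
            if f.text == text:
                f.weight += boost
                return
        self.facts.append(Fact(text, boost))

    def tick(self, decay=0.9):
        # everything fades a little each "day"
        for f in self.facts:
            f.weight *= decay

    def summary(self, threshold=0.5):
        # only high-weight stuff survives into the context/summary
        return [f.text for f in self.facts if f.weight >= threshold]

john = Character("John")
for _ in range(4):
    john.observe("jerked off today")   # daily repetition keeps it alive
    john.tick()
john.observe("has a sister named Susy")
print(john.summary())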
Anonymous 01/14/25(Tue)13:35:56 No.103893353
>>103893165
I am pretty sure MLP followed the same trajectory as well, I see them a whole lot less than a mere decade ago.
Anonymous 01/14/25(Tue)13:37:03 No.103893372
>>103893326
It's by a literal nobody Chinese firm with no information about them. The fact that they don't even list how many tokens they trained on makes me think it's severely undertrained or trained on benchmarks to hit those scores.
Anonymous 01/14/25(Tue)13:38:32 No.103893387
Anonymous 01/14/25(Tue)13:41:21 No.103893417
>>103893051
>You awaken—or perhaps you simply become aware—in a place that defies comprehension.
>You drift—or perhaps you simply will yourself to move—through the Infinite Void
>The golden light emanating from the pinnacle casts long shadows
>The golden light from the pinnacle above grows brighter
>floating islands and crumbling structures bathed in the golden light that emanates from the pinnacle
>You must decide what to do next
>You must decide how to proceed.
>You must decide how to approach this discovery
>You must decide how to proceed.
>You must decide how to interact with it.
it's a bit less conspicuous because you have long replies and each message moves the plot forward, but it's still going to become tedious to read sooner rather than later
Anonymous 01/14/25(Tue)13:41:55 No.103893424
>>103893387
Yea, minimax
Anonymous 01/14/25(Tue)13:42:06 No.103893425
>>103893372
I mean, you could've said this about Deepseek the company 3 months ago. Advances in open source are increasingly being driven by smaller firms.
Anonymous 01/14/25(Tue)13:42:10 No.103893426
>>103893387
Are they?
Anonymous 01/14/25(Tue)13:46:20 No.103893461
>>103893426
No they just happen to be called minimax
Anonymous 01/14/25(Tue)13:46:31 No.103893463
>>103893160
>Hate to say it, but Miku is a niche character at this point.
the most popular vocaloid videos still get 100M+ views...somewhere around a mid-tier mr.beast video. I don't think it's dying, but it's not leading-edge culture any more.
I think you'll still find orders of magnitude more normies that could identify miku vs even the most well-known touhou character
Anonymous 01/14/25(Tue)13:49:03 No.103893488
>>103893387
Well, they're definitely up with the best in any case.
Anonymous 01/14/25(Tue)13:49:36 No.103893492
>>103893417
>but it's still going to become tedious to read sooner than later
fair enough. I'd easily give the "you must.." prompt a pass since it's told to be an interpreter, but maybe the others are annoying?
I can't think of another model I've used that is less prone to that type of prose repetition though. What's your go-to for repetition-free outputs?
Anonymous 01/14/25(Tue)13:50:24 No.103893497
>>103893274
How is it agi if it can't do something a human can do easily?
Anonymous 01/14/25(Tue)13:52:02 No.103893512
>>103893497
That's part of my point, yes.
Anonymous 01/14/25(Tue)13:53:10 No.103893528
>>103893512
Ah I see, I should have said that to the anon you replied to.
Anonymous 01/14/25(Tue)13:53:16 No.103893531
Anonymous 01/14/25(Tue)13:54:49 No.103893548
>>103893070
are you retarded?
Open a new miku MV, it already has millions of views. If you combine all Miku songs you probably get more total views than any human artist (yes total views, I'm not saying Miku is the single most popular thing ever). Miku is probably also one of the chars with the most art, and most importantly she has been around for 20 years, which automatically makes her more relevant than any recent popular flavor of the month
A quick look at r34 tells us she has like 20k art, for comparison d.va has 23k
Not that impressive you say?
r34 is very much a western site and just porn. On Sankaku she has 171k versus just 21k for d.va.
She is omnipresent in music, art, porn (spawned its own category of porn) and has tons of cameos in various media. If your dataset mentions anything weeb related it's unlikely to not contain Miku.
Anonymous 01/14/25(Tue)13:57:12 No.103893569
Lightning Attention, Softmax Attention and Mixture-of-Experts (MoE). Leveraging advanced parallel strategies and innovative compute-communication overlap methods such as Linear Attention Sequence Parallelism Plus (LASP+), varlen ring attention, Expert Tensor Parallel (ETP), etc., MiniMax-Text-01's training context length is extended to 1 million tokens, and it can handle a context of up to 4 million tokens during inference
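For anyone who hasn't seen linear attention before, here is a toy numpy sketch of the core difference between softmax attention and a linear (lightning-style) attention kernel. It only illustrates why the linear form scales with sequence length instead of its square; the feature map and everything else here are stand-ins, not MiniMax's actual lightning attention implementation.

# Toy contrast between softmax attention (O(n^2)) and linear attention (O(n*d^2)).
# Illustration only; not the MiniMax-01 lightning attention kernel.
import numpy as np

def softmax_attention(Q, K, V):
    # the n x n score matrix is what makes long contexts expensive
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, eps=1e-6):
    # feature map (elu+1 is a common choice; any positive map works for the toy)
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                                    # d x d, independent of sequence length
    norm = Qp @ Kp.sum(axis=0, keepdims=True).T + eps
    return (Qp @ kv) / norm

n, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)  # (8, 4) (8, 4)

Hybrid designs like this one reportedly interleave the occasional softmax attention layer among mostly linear attention layers to keep quality up while keeping long-context cost down.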
Anonymous 01/14/25(Tue)13:57:24 No.103893570
>>103892992
Nala test please
Anonymous 01/14/25(Tue)13:58:13 No.103893578
>>103893548
holy cope
Anonymous 01/14/25(Tue)14:00:55 No.103893600
>>103893578
You're the one coping, old man. Miku is in Fortnite.
Anonymous 01/14/25(Tue)14:02:25 No.103893611
>>103893569
They tested a lot of new stuff with this. Just wanted to point that out.
Anonymous 01/14/25(Tue)14:03:05 No.103893623
>>103893492
it's not just "you must". you're going to see "you must decide how to" for the next 1000 messages. except once it says e.g. "how to approach" for the 2nd and then 3rd time, it's going to actually be "you must decide how to approach ...". and the looping part will keep growing
and I don't really have an alternative recommendation because deepseek is the only "local" model I've ever had a longer chat with.
it's not just "you must". you're going to see "you must decide how to" for the next 1000 messages. except once it says e.g. "how to approach" for the 2nd and then 3rd time, it's going to actually be "you must decide how to approach ...". and the looping part will keep growing
and I don't really have an alternative recommendation because deepseek is the only "local" model I've ever had a longer chat with.
Anonymous 01/14/25(Tue)14:04:05 No.103893630
https://rentry.org/ona836nk
minimax summarization of its own paper. also apparently they have some inhouse benchmarks that judge creative writing lol.
Anonymous 01/14/25(Tue)14:06:22 No.103893654
Anonymous 01/14/25(Tue)14:09:19 No.103893681
>>103893623
>I don't really have an alternative recommendation
The orthodox way is to improve your initial prompt/character card, or to edit the first few responses to either delete that or reshape it into a more consistent output
Anonymous 01/14/25(Tue)14:09:34 No.103893684
>>103893488
>Wolf (dog adjacent)
>Not instantly going for the food on the table
That's how you can tell that this video is AI generated, any canine would instantly beeline towards the food and cuddle with you afterwards. Not the other way around.
Anonymous 01/14/25(Tue)14:09:36 No.103893685
>>103893274
LLMs will NEVER become AGI, they're not even AI. YWNBAI
Anonymous 01/14/25(Tue)14:10:39 No.103893692
I hate furfags they ruin everything that is good in this world
Anonymous 01/14/25(Tue)14:11:27 No.103893700
>>103893685
I agree 100% my guy. Doesn't lessen my point in the least however.
Anonymous 01/14/25(Tue)14:11:59 No.103893707
>>103893274
LLMs would be just one part of AGI, in the same way the language centers in our brain are just one part of the brain. Many other systems would have to work in conjunction with LLMs before it could be called an AGI. LLMs on their own will never be called that.
Anonymous 01/14/25(Tue)14:13:55 No.103893724
Just tried the roleplaying thing from the op. Came to it with a totally cynical mindset but horry shit
Anonymous 01/14/25(Tue)14:15:59 No.103893742
>>103893707
Yeah. I've said that in this general a couple of times too.
AGI needs to be a complex system much like a brain, with different complex parts that do different things, even if all the different parts are also neural networks of their own.
Of course, you take a transformers LLM and look inside it and it does have its own complex blocks, so you could begin expanding from inside the LLM into something bigger and still call it a LLM, but it would be something much more complicated than just the token prediction machines we have today.
Anonymous 01/14/25(Tue)14:16:05 No.103893744
>>103893707
Doesn't gpt incorporate some form of fusion behind the scenes with code based instructions? Like web searching in some form
Anonymous 01/14/25(Tue)14:16:40 No.103893750
>>103893700
I don't care about your point. I just said what I wanted to. Take it or leave it.
Anonymous 01/14/25(Tue)14:17:02 No.103893755
On RULER, so real context
Anonymous 01/14/25(Tue)14:17:18 No.103893757
>>103893750
base
Anonymous 01/14/25(Tue)14:19:13 No.103893773
>>103893259
There's a hidden premise here that isn't quite true. Which is that "truly and utterly unpredictable surprises are present in creative writing". But in fact, creativity of the human kind that people enjoy is not really surprising or unpredictable. The most creative people on the planet are the ones who have a vast wealth of knowledge which they can mix concepts with. That is how you get truly coherent creative surprises rather than utter chaotic randomness that doesn't make sense. So ultimately surprises that also make sense still benefit from an autoregressive prediction training objective. The issue isn't necessarily the training objective, but about whether companies care about training on "low quality" diverse internet data that would lead to creativity in writing.
Anonymous 01/14/25(Tue)14:19:13 No.103893774
Holy fuck we're so back...
Anonymous 01/14/25(Tue)14:21:51 No.103893800
>>103893755
Impressive
Anonymous 01/14/25(Tue)14:22:31 No.103893809
Anonymous 01/14/25(Tue)14:22:54 No.103893817
Anonymous 01/14/25(Tue)14:24:17 No.103893828
>>103893774
I can't run this shit. Wonder what the price will be on OR.
Anonymous 01/14/25(Tue)14:25:12 No.103893839
>>103893488
Man, those hands/paws are pretty good.
Anonymous 01/14/25(Tue)14:26:18 No.103893858
>>103884327
I just wanted to share my appreciation for this excellent smug Miku gen.
From a smug sommelier with over 2,400 smug anime girl images, I deem this a 10/10 smug gen.
Anonymous 01/14/25(Tue)14:27:42 No.103893876
>>103892930
>>103893309
>>103892949
https://github.com/MiniMax-AI/MiniMax-01
blog post:
https://www.minimaxi.com/en/news/minimax-01-series-2
Anonymous 01/14/25(Tue)14:27:51 No.103893878
>>103893817
Its snout is much too narrow and its ears too pointy to be a dog; the only other thing that comes to mind is coyote, but it looks more like a wolf than a coyote to me.
Anonymous 01/14/25(Tue)14:31:20 No.103893911
>>103893654
>Qwen, GPT, Gemini, DeepSeek and Llama beat Sonnet
Who was the evaluator? GPT4?
Let's look at appendix...
First fucking example:
>Whispers of the Lost City
>Human Evaluator:
>The lyrics are effective due to their vivid imagery, emotional depth, and narrative structure. They create a mysterious and atmospheric setting with phrases like "moonbeams" and "ancient walls," while also conveying the emotional journey of the traveler. The repetition in the chorus reinforces the central theme, making the song memorable. The poetic language and space for interpretation add layers of intrigue and emotional resonance, making the song both engaging and thought-provoking.
Example 2:
>In the quaint village of Elderglen, nestled between ancient woods and misty hills, lived a young adventurer named Elara.
>Human Evaluator:
>The story demonstrates strong world-building and an engaging narrative. The concept of Aetheria is imaginative, with vivid descriptions of floating mountains, crystal rivers, and mystical creatures that evoke a sense of wonder... Overall, the story shows strong creative potential, with an imaginative world, a compelling heroine, and an uplifting message.
Ehm... MiniMax team, I have very bad news for you. Your human evaluators offloaded all of their work to GPT4. Please take note and penalize them. Your benchmark is pure fucking GPTSLOP that does NOT represent ACTUAL HUMAN PREFERENCE.
Anonymous 01/14/25(Tue)14:32:28 No.103893927
Guess I'll be getting a DIGITS when it comes out. If that turns out to be shit then I guess I'll be getting a DDR5 server
Anonymous 01/14/25(Tue)14:32:45 No.103893931
>>103893911
lel that's pretty egregious
Anonymous 01/14/25(Tue)14:33:24 No.103893939
>>103892992
>400B
>MoE
This is the perfect sweet spot between DSV3 and 405B. llama.cpp support and Q4 ggufs please and thank you.
Anonymous 01/14/25(Tue)14:33:34 No.103893943
>>103893911
Jesus Christ.
>human evaluator
Did the human evaluator cheat by running it through an LLM instead of doing their job?
Honestly I was thinking of working for one of those data labeling companies and just using an LLM to do it for me so I could see this happening for real.
Anonymous 01/14/25(Tue)14:33:47 No.103893947
Anonymous 01/14/25(Tue)14:34:38 No.103893954
Anonymous 01/14/25(Tue)14:34:42 No.103893960
Anonymous 01/14/25(Tue)14:35:33 No.103893967
>>103893927
>If that turns out to be shit then I guess Ill be getting a DDR5 server
You would be better served just waiting for DDR6 to become available; if you get a DDR5 server after buying a DIGITS then that server will be outdated within a year, since DDR6 should start becoming available early 2026 according to the current timetable.
Anonymous 01/14/25(Tue)14:35:37 No.103893969
Anonymous 01/14/25(Tue)14:36:13 No.103893977
>>103893967
By then the ddr5 epyc will be cheap, right?
Anonymous 01/14/25(Tue)14:36:39 No.103893981
Anonymous 01/14/25(Tue)14:41:08 No.103894046
>>103893773
There are a few writers with a prose style that gives me a continuous feeling of surprise or unexpectedness as I read, with the unexpectedness being located in the word use rather than the actual story content. Like listening to unusual music where it's hard to predict what the next note in the melody will be (even people who know no music theory can usually predict the next note in a simple pop song they've never heard before). Terry Pratchett was one author like that.
But yeah that ability is fairly unique and not a hard requirement for being a good or interesting writer, a lot of the all-time great writers couldn't or didn't do it. So we probably shouldn't make it the bar for language models, either.
Anonymous 01/14/25(Tue)14:42:17 No.103894064
>>103894046
To be AGI it just has to be able to write like an average human.
Anonymous 01/14/25(Tue)14:42:49 No.103894073
>>103893911
clear giveaways are references to chorus, symphony, journeys, thought-provoking, sense of wonder..
Anonymous 01/14/25(Tue)14:45:16 No.103894109
>>103893773
Somehow I find it hard to believe RR Martin was some guy who explored the world. I get the impression that because he is fat he explores his mind. But I might be wrong, he has an insane collection of figurines so maybe it is the ability to engage with words
Anonymous 01/14/25(Tue)14:46:37 No.103894125
>>103893981
Not only do I not jest, but only about 10 percent of them are saved from 4chan - the other 90 percent are my own crops from my own screenshots.
Anonymous 01/14/25(Tue)14:47:18 No.103894137
>>103894125
You have to have some repeats in there surely.
Anonymous 01/14/25(Tue)14:47:52 No.103894147
>>103893969
I am serious... and don't call me Shirley.
Anonymous 01/14/25(Tue)14:48:22 No.103894151
>>103892992
Sir. How do run on 24GB vram? Thank
Anonymous 01/14/25(Tue)14:48:41 No.103894155
Looks like this minimax model may need a prefill.
Anonymous 01/14/25(Tue)14:48:54 No.103894159
Anonymous 01/14/25(Tue)14:48:55 No.103894160
>>103894147
Damn that is pretty fucking smug
Anonymous 01/14/25(Tue)14:49:32 No.103894169
>>103894147
This is like stamp collecting, but for weebs.
Anonymous 01/14/25(Tue)14:50:34 No.103894182
>>103894169
They don't think it be like it is, but it do.
Anonymous 01/14/25(Tue)14:50:37 No.103894185
>>103889710
>What are they trying to do here?
Going full retard. The EU might be dying, but it's not dead yet and driving it into a corner is a bad idea.
Anonymous 01/14/25(Tue)14:50:45 No.103894188
>>103893878
Wolves can have wide snouts >:(
Anonymous 01/14/25(Tue)14:50:55 No.103894189
>>103894159
based fuck jart
Anonymous 01/14/25(Tue)14:51:07 No.103894190
>>103894159
Good, what's even the point of it? Why do I want to read from disk, just load it all at the start.
Anonymous 01/14/25(Tue)14:52:27 No.103894199
You can trick minimax with some context. Put user: bla bla bla, character: bla bla bla then leave character: at the end and it seems to get around the filter. Seems decent.
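If the trick above isn't clear, here's a minimal sketch of the idea: format the chat as a plain user:/character: transcript and leave a dangling character: tag at the end so the model continues in-character instead of replying as an assistant. The endpoint URL, model name, and tag strings below are placeholders, not MiniMax's documented API.

# Minimal sketch of the "dangling character:" prefill trick.
# The endpoint, model name, and tag names are placeholders (hypothetical),
# not an official MiniMax API; adapt them to whatever backend you're hitting.
import requests

def build_prompt(history, char_name="character"):
    lines = [f"{speaker}: {text}" for speaker, text in history]
    # leave the character tag open so the model keeps writing as the character
    lines.append(f"{char_name}:")
    return "\n".join(lines)

history = [
    ("user", "bla bla bla"),
    ("character", "bla bla bla"),
    ("user", "and then what happens?"),
]

resp = requests.post(
    "http://localhost:8080/v1/completions",   # placeholder local endpoint
    json={"model": "minimax-text-01", "prompt": build_prompt(history), "max_tokens": 300},
    timeout=120,
)
print(resp.json())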
Anonymous 01/14/25(Tue)14:53:24 No.103894208
>>103894159
What did mmap even do?
Anonymous 01/14/25(Tue)14:53:38 No.103894210
I tried giving minimax's online chat the umineko novel (question arc) but it refuses to answer my questions
Anonymous 01/14/25(Tue)14:56:23 No.103894237
im obsessed with deepseek
Anonymous 01/14/25(Tue)14:56:52 No.103894246
>>103893911
kek, that one painful slop.
kek, that one painful slop.
Anonymous 01/14/25(Tue)14:59:53 No.103894272
>>103894237
Do you like deepsex?
Do you like deepsex?
Anonymous 01/14/25(Tue)15:00:43 No.103894282
Minimax (Left) vs DeepSeek (Right)
This is literally sovless vs sovl, sad.
Anonymous 01/14/25(Tue)15:00:54 No.103894285
>>103894246
But ChatGPT said that it loves it, and it is smarter than humans therefore your opinion is invalid, meatbag.
Anonymous 01/14/25(Tue)15:01:48 No.103894294
>>103894282
Is the left their instance / chat? Because it's 0.1 temp with a "you are a helpful assistant" system prompt
Anonymous 01/14/25(Tue)15:02:37 No.103894304
>>103894282
I can't run either anyway, so it's useless.
Anonymous 01/14/25(Tue)15:05:09 No.103894325
Minimax doesn't seem great at creative writing. It has the same slop as deepseek. The Writer model is better.
an example..
minimax:
>Beside them, Nerith, the elf rogue, moved with the grace of a shadow. Her emerald eyes flickered with suspicion as she observed the duo. Nerith had been hired by the fortress's commander to protect Elaris from any threats, but she had not expected to encounter such an unusual pair. Her instincts, honed by years of experience, told her that something was amiss.
writer:
>At the rear of the group, the elf rogue, Aethereia, walked with a silent ease that belied her coiled tension. Her piercing emerald eyes darted back and forth, scanning their surroundings for any sign of danger. She had been hired by the Lord of the Fortress himself to provide...discreet security services, and her instincts were screaming at her that something was off about these two.
Anonymous 01/14/25(Tue)15:06:33 No.103894341
>>103894282
I kneel. How will non-CPUmaxxers ever compete?
Anonymous 01/14/25(Tue)15:06:59 No.103894345
>>103894294
The "You're a helpful assistant" system prompt doesn't cause much issue if the model is actually good because i literally pasted the whole character card and examples in the context, and the answer isn't any better even at temperature 1. But fine, I guess we will only know for sure once it's available on openrouter.
The "You're a helpful assistant" system prompt doesn't cause much issue if the model is actually good because i literally pasted the whole character card and examples in the context, and the answer isn't any better even at temperature 1. But fine, I guess we will only know for sure once it's available on openrouter.
Anonymous 01/14/25(Tue)15:09:27 No.103894372
>>103894304
not sending you the link... i want more computing power for me
Anonymous 01/14/25(Tue)15:09:53 No.103894375
>>103892992
aider bench when?
Anonymous 01/14/25(Tue)15:12:11 No.103894399
>>103894147
Based Hanako enjoyer
Anonymous 01/14/25(Tue)15:13:06 No.103894410
>>103894325
Slop vs slightly better slop
Anonymous 01/14/25(Tue)15:13:36 No.103894415
I don't get the people that hype eva-qwen 32b, I've been testing it and it seems dumb. Does it only work well above q6 or something? And by dumb I mean nemo is still better even.
Anonymous 01/14/25(Tue)15:14:36 No.103894425
>>103894208
It maps the model file into memory so pages get pulled off disk only as they're accessed, which used to help with locality and increase performance on NUMA systems. That's been broken for a while now though, so it's actually kind of useless.
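For anyone wondering what that looks like concretely, here's a minimal sketch of memory-mapping a weights file with Python's mmap module; pages are only pulled off disk when a region is actually touched. The file name is a placeholder and nothing here parses real GGUF metadata.

# Minimal sketch of lazily reading a big weights file via mmap.
# "model.gguf" is a placeholder path; illustration only.
import mmap

with open("model.gguf", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Nothing is read from disk yet; the OS faults pages in on first access.
    header = mm[:8]                               # touching these bytes pages in the first chunk
    print(header)
    some_weights = mm[1 << 20 : (1 << 20) + 64]   # later regions load on demand too
    mm.close()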
Anonymous 01/14/25(Tue)15:16:11 No.103894439
>>103894415
> hype
Shill. The word you’re looking for is shill.
Unironically, that’s also the answer to your question.
Anonymous 01/14/25(Tue)15:18:57 No.103894458
>>103894439
It's real fucking tiring that nobody can ever say anything good about any models without you retards immediately calling them shills.
Great job, all models are trash and anyone who likes them is a shill or a moron, let's wrap the fucking thread up then because what's the point?
Anonymous 01/14/25(Tue)15:19:53 No.103894468
Here's a story continuation with MiniMax.
Anonymous 01/14/25(Tue)15:19:59 No.103894471
>>103894439
I'm just eager for something better than nemo under 70b, guess I'll have to wait more.
Anonymous 01/14/25(Tue)15:21:39 No.103894491
>>103894458
If the posts praising them came with convincing logs, I’d agree with you. As it is, it’s just so tiring in the other direction
Anonymous 01/14/25(Tue)15:23:32 No.103894509
I am slowly convincing myself to get a 5090. If my latest plumbing bill (like $500 was pure upcharge fuck those niggers) hadn't been 1.7k then I might have had some qualms but now it's like I should get something around that price that is actually worth it
Anonymous 01/14/25(Tue)15:24:18 No.103894518
Minimax does not seem to have the repetition issue like deepseek does, but it also seems drier without prior context. It's very smart though.
Anonymous 01/14/25(Tue)15:25:30 No.103894530
>>103894471
Nemo is a 12b model. If you seriously think that is better than Qwen 32b, or even Gemma 27b fine tunes, then you're off your rocker.
In terms of instruction following, Nemo is terrible.
In terms of remembering stuff from long context, Nemo is terrible.
Anonymous 01/14/25(Tue)15:25:36 No.103894531
>>103894458
Why do you try to frame it like sloptunes are the only models that exist?
Anonymous 01/14/25(Tue)15:26:18 No.103894538
Anonymous 01/14/25(Tue)15:26:35 No.103894542
>>103894458
For that, you have to thank the opportunists who poisoned the well for Ko-Fi peanuts (sometimes more than that, but still).
Anonymous 01/14/25(Tue)15:27:03 No.103894545
>>103894530
It is better than eva-qwen, that's for sure. You think I should try just regular qwen? It won't be too bland? I'd love it to be better but it can't even remember clothing properly.
Anonymous 01/14/25(Tue)15:27:26 No.103894553
>>103894471
IQ3_M of Nemotron 51b blows Nemo away, if you have 24GB of vram.
Anonymous 01/14/25(Tue)15:30:00 No.103894587
>>103894468
If you told me that this slop was generated with Qwen, DeepSeek, Llama, Mistral, Gemini, Grok, Amazon Nova or GPT, I would have believed you without questioning.
Anonymous 01/14/25(Tue)15:31:29 No.103894605
>>103894587
And do they have 4M context?
Anonymous 01/14/25(Tue)15:32:17 No.103894612
>>103894553
How much usable context does it have (in practice)?
Anonymous 01/14/25(Tue)15:32:57 No.103894622
Did llama.cpp add new optimizations recently? I updated ooba and I seem to be getting slightly better tokens/second on most models now
Anonymous 01/14/25(Tue)15:33:29 No.103894630
>>103894612
>Sequence Length Used During Distillation: 8192
https://huggingface.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
Anonymous 01/14/25(Tue)15:34:11 No.103894638
>>103894630
That's not going to be enough for much, even if it is good.
Anonymous 01/14/25(Tue)15:34:21 No.103894641
Anonymous 01/14/25(Tue)15:35:26 No.103894662
>>103894630
>https://huggingface.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>The Llama-3_1-Nemotron-51B-instruct model underwent extensive safety evaluation including adversarial testing via three distinct methods:
>Garak, is an automated LLM vulnerability scanner that probes for common weaknesses, including prompt injection and data leakage.
>AEGIS, is a content safety evaluation dataset and LLM based content safety classifier model, that adheres to a broad taxonomy of 13 categories of critical risks in human-LLM interactions.
>Human Content Red Teaming leveraging human interaction and evaluation of the models' responses.
Anonymous 01/14/25(Tue)15:36:43 No.103894684
>>103894641
How are the locusts liking it?
Anonymous 01/14/25(Tue)15:41:41 No.103894744
We got the first one!
Anonymous 01/14/25(Tue)15:46:49 No.103894814
>>103894744
The actual good thing about this new model.
Anonymous 01/14/25(Tue)15:48:18 No.103894832
>>103894814
I'm using the proxy and it seems pretty good so far. No repetition problems like deepseek so far
Anonymous 01/14/25(Tue)15:49:13 No.103894844
Anonymous 01/14/25(Tue)15:53:06 No.103894878
>>103894844
It's always the semi-literate that complain.
Anonymous 01/14/25(Tue)15:56:50 No.103894917
>>103894410
anon... all we have are different degrees of slop
Anonymous 01/14/25(Tue)15:58:13 No.103894940
Is this another level of grammar nazi?
Anonymous 01/14/25(Tue)15:59:25 No.103894955
Anonymous 01/14/25(Tue)16:03:10 No.103895010
>>103894744
Sloptuner drama should have been the “free” square in the middle
Anonymous 01/14/25(Tue)16:03:33 No.103895017
>>103894917
Why? WHY CAN'T WE HAVE MODELS WITHOUT GPTSLOP? WHY DOES EVERYONE TUNE ON GPT?
>Hey Anon, look, new model came out!
>FUCKING GREAT, PUT IT ON THE PILE TOGETHER WITH OTHER GPT4s AT HOME THAT WE ALREADY HAVE, THEY ALL SOUND THE FUCKING SAME
I don't mind synthetic data, as long as it is good, which GPTSLOP is certainly NOT.
Anonymous 01/14/25(Tue)16:04:13 No.103895029
>>103895017
Benchmarks are all you need.
Anonymous 01/14/25(Tue)16:05:02 No.103895038
>>103894955
Is this another level of hall monitor?
Anonymous 01/14/25(Tue)16:05:32 No.103895041
is rocinante still the best 12b?
Anonymous 01/14/25(Tue)16:08:35 No.103895068
>>103895041
Yeah, it's the best you'll get under 70b.
Anonymous 01/14/25(Tue)16:09:56 No.103895081
>>103895017
This is why OpenAI was actually based for hiding their CoT
Anonymous 01/14/25(Tue)16:10:09 No.103895084
>>103895068
damn
i thought by now something might've superseded it, it has been like 2 months
is this the limit of 12bs?
Anonymous 01/14/25(Tue)16:10:50 No.103895096
>>103895068
That's Cydonia though.
Anonymous 01/14/25(Tue)16:11:11 No.103895102
Anonymous 01/14/25(Tue)16:11:15 No.103895103
>>103895084
>is this the limit of 12bs?
no, not by far, but as long as pretrain gets filtered it might be
Anonymous 01/14/25(Tue)16:12:53 No.103895132
mag mell is better
Anonymous 01/14/25(Tue)16:14:35 No.103895152
Anonymous 01/14/25(Tue)16:16:15 No.103895169
>>103894341
By spending pennies per million tokens
Anonymous 01/14/25(Tue)16:17:16 No.103895180
>>103895152
the fuck is up with this gay ass chuuni title...?
Anonymous 01/14/25(Tue)16:18:45 No.103895205
>>103895180
You'll complain about fucking anything won't you?
Anonymous 01/14/25(Tue)16:20:25 No.103895221
Anonymous 01/14/25(Tue)16:20:52 No.103895227
>>103895180
You wouldn't know a good title if it hit you in the face.
llama.cpp CUDA dev !!OM2Fp6Fn93S 01/14/25(Tue)16:20:52 No.103895229
>>103888589
I'm making progress with llama.cpp training support.
Full finetuning of LLaMA 3.2 1b does in principle work (except for two small tensors).
On an RTX 4090 one epoch over a dataset with 1.3 MB of text currently takes 3 minutes.
On an Epyc 7742 one epoch takes 15 hours.
So I think CPUMaxx is going to be DOA for finetuning, DIGITS may be viable depending on the exact specs and pricing.
Anonymous 01/14/25(Tue)16:23:02 No.103895249
What is the best true long context 12b model? I need 32k of context.
Anonymous 01/14/25(Tue)16:23:24 No.103895255
>>103895229
Is training compute or memory bound?
Anonymous 01/14/25(Tue)16:24:04 No.103895263
>>103895229
>On an RTX 4090 one epoch over a dataset with 1.3 MB of text currently takes 3 minutes.
>On an Epyc 7742 one epoch takes 15 hours.
Shieeet.
Why the ~300x difference? Just an issue of optimizing the CPU code to better account for NUMA, use the CPU extensions, etc? A difference in bandwidth or compute?
Something else?
Anonymous 01/14/25(Tue)16:24:07 No.103895264
>>103895229
>fine-tuning is already hell but hey let's make it even more hellish by adding the variable of an untested codebase
Anonymous 01/14/25(Tue)16:24:17 No.103895267
>if you seriously think that is better than Qwen 32b... then you're off your rocker.
Anonymous 01/14/25(Tue)16:24:47 No.103895276
>>103894282
what the fuck is that card? both of those responses are soulless in their own ways: minimax is bland chatgpt shit and deepseek is literally just neurotic quirky lolrandom female-minded hysteria
Anonymous 01/14/25(Tue)16:24:48 No.103895277
Anonymous 01/14/25(Tue)16:24:51 No.103895278
Anonymous 01/14/25(Tue)16:24:58 No.103895279
>>103895152
>AngelSlayer
Anti-christian
>12B
Poorfag
>Unslop
Slop
>Mell
Tranny name
>v2
Somehow there's another one.
Local is dead
Anonymous 01/14/25(Tue)16:25:21 No.103895282
Anonymous 01/14/25(Tue)16:26:29 No.103895300
>>103895229
how long does an equivalent gpu run take via transformers?
llama.cpp CUDA dev !!OM2Fp6Fn93S 01/14/25(Tue)16:27:51 No.103895317
>>103895255
Compute bound unless the matrix multiplication is poorly implemented or you have to use an extremely small batch size.
>>103895300
Don't know.
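As a rough illustration of why the 4090 vs Epyc gap is that large, here's a back-of-envelope sketch. The peak-throughput numbers are approximate spec-sheet-level figures, real kernels land well below peak, and the CPU backward pass is presumably far less optimized, so treat it as an order-of-magnitude guide only.

# Back-of-envelope compute comparison (approximate peak figures; illustration only).
rtx4090_fp16_tflops = 165.0   # approx. dense FP16 tensor-core peak
epyc7742_fp32_tflops = 4.6    # approx.: 64 cores * 2.25 GHz * 32 FP32 FLOPs/cycle

raw_ratio = rtx4090_fp16_tflops / epyc7742_fp32_tflops
print(f"paper ratio:    ~{raw_ratio:.0f}x")       # ~36x from raw peak throughput alone

observed_ratio = (15 * 60) / 3                    # 15 hours vs 3 minutes per epoch
print(f"observed ratio: ~{observed_ratio:.0f}x")  # ~300x in practice
# The remainder of the gap would come from CPU kernels running far below peak
# (memory traffic, NUMA effects, small effective batch), which matches the
# "compute bound unless the matmul is poorly implemented" caveat above.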
Anonymous 01/14/25(Tue)16:30:19 No.103895355
>>103895229
That's shitty. Any possibility for distributed finetuning?
Anonymous 01/14/25(Tue)16:30:40 No.103895360
>>103895276
This card explicitly mentions
>Focus solely on comedy, do not care about morals because {{char}} herself is immoral and she is proud of that. Things do not need to make sense.
So I think DeepSeek's reply is very fitting, albeit too neurotic.
Anonymous 01/14/25(Tue)16:31:39 No.103895369
>>103895229
Is that your llama-opt-3 branch? Can I easily finetune a 1b on a 24gb card?
Anonymous 01/14/25(Tue)16:31:48 No.103895370
>>103895152
Can't wait for GoonHitler-14.88B-SlopEradicator-Trump-ZIOMAXX-number1onleaderboard-KEKSLAYER-v3-BlackSunEdition
Anonymous 01/14/25(Tue)16:33:40 No.103895390
Anonymous 01/14/25(Tue)16:34:13 No.103895400
>>103895229
Digitsbros won
Anonymous 01/14/25(Tue)16:34:46 No.103895406
Anonymous 01/14/25(Tue)16:34:51 No.103895407
>>103895277
Pretty sure everyone said that from day one?
Anonymous 01/14/25(Tue)16:37:55 No.103895442
>>103895406
Thanks
Anonymous 01/14/25(Tue)16:38:24 No.103895449
>>103895407
Everyone except the cpumaxxers lmao
Anonymous 01/14/25(Tue)16:41:03 No.103895476
>>103894325
Is MiniMax any good at translating?
Anonymous 01/14/25(Tue)16:41:07 No.103895477
Anonymous 01/14/25(Tue)16:41:32 No.103895481
Anonymous 01/14/25(Tue)16:41:48 No.103895484
>>103895407
Hope doesn't have to be based on sound logic and reasoning.
llama.cpp CUDA dev !!OM2Fp6Fn93S 01/14/25(Tue)16:41:57 No.103895486
>>103895355
I intend to eventually implement support for distributed training for the purpose of running multiple machines that are directly connected via ethernet cable.
In principle you could apply the same code for training models over the internet but I think it will not be viable.
>>103895369
Wait until the feature is on the master branch.
>Can I easily finetune a 1b on a 24gb card?
Definitely not for the foreseeable future.
Critical features are still missing, particularly methods for evaluating the quality of the finetuned model.
Also the intended language for writing training code is C++.
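To make "distributed training over a direct ethernet link" concrete, here is a generic data-parallel sketch using torch.distributed with the gloo backend over TCP. It only illustrates gradient all-reduce between two boxes; it is not the llama.cpp design being described above, and the address, port, and model are placeholders.

# Generic data-parallel sketch with gradient all-reduce over TCP (gloo backend).
# Illustration only; not llama.cpp's planned implementation. Addresses are placeholders.
import os
import torch
import torch.distributed as dist

def train_step(model, batch, loss_fn, world_size):
    loss = loss_fn(model(batch["x"]), batch["y"])
    loss.backward()
    # average gradients across all machines so each node applies the same update
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size
    return loss

if __name__ == "__main__":
    # each node runs this with its own RANK; MASTER_ADDR is the directly connected peer
    os.environ.setdefault("MASTER_ADDR", "192.168.1.1")   # placeholder
    os.environ.setdefault("MASTER_PORT", "29500")
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = torch.nn.Linear(16, 1)
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    batch = {"x": torch.randn(8, 16), "y": torch.randn(8, 1)}

    opt.zero_grad()
    loss = train_step(model, batch, torch.nn.functional.mse_loss, world_size)
    opt.step()
    print(rank, loss.item())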
Anonymous 01/14/25(Tue)16:42:11 No.103895489
>UnslopNemo they said
It's dumber than Mixtral Instruct, can't even keep up with context half the time, and is even more full of shivers and barely above a whisper.
You lied to me.
Anonymous 01/14/25(Tue)16:42:32 No.103895492
>>103895477
don't know if they allow anyone through the gate
Anonymous 01/14/25(Tue)16:43:22 No.103895502
>>103895476
no idea, but deepseekv3 is good enough for japanese. I've been using it to translate vns
Anonymous 01/14/25(Tue)16:43:42 No.103895505
Anonymous 01/14/25(Tue)16:43:47 No.103895509
>>103895449
Original cpumaxx rentry explicitly calls that out
>You likely aren't going to be doing any training, bruh. CPUs aint GPUs
Anonymous 01/14/25(Tue)16:44:34 No.103895513
>>103891478
Oh Worst Korea is trying to pass some law that would screw with fictional content under the usual excuse of "protect the kids", it's just a complete shithole
Anonymous 01/14/25(Tue)16:45:07 No.103895523
>>103895486
>intended language for writing training code is C++.
Based. If it were C it’d be gigabased
Anonymous 01/14/25(Tue)16:45:25 No.103895527
Why do CPUmaxxers even exist, is it pure contrarianism?
Anonymous 01/14/25(Tue)16:45:56 No.103895534
>>103895484
That's not hope, that's delusion.
Anonymous 01/14/25(Tue)16:46:06 No.103895536
Anonymous 01/14/25(Tue)16:47:07 No.103895547
>>103895527
I don’t know. Why don’t you ask your locally hosted 405b or deepseek v3?
I don’t know. Why don’t you ask your locally hosted 405b or deepseek v3?
Anonymous 01/14/25(Tue)16:47:42 No.103895552
>>103895476
The performance at Japanese to English translation seems to be much worse than DeepSeekV3 but still on par with models like DeepSeekV2.5 and Qwen2.5 72B.
Anonymous 01/14/25(Tue)16:47:54 No.103895556
Anonymous 01/14/25(Tue)16:48:12 No.103895561
>>103895502
I haven't tried deepseek for that, are you translating all the text first or using some sort of program like textractor? I might try it later
Anonymous 01/14/25(Tue)16:48:35 No.103895564
>>103895547
I don't have enough Vram for that
Anonymous 01/14/25(Tue)16:48:36 No.103895565
>>103895527
It's tempting to try to find a solution that doesn't involve modifying your home's breakers, turning your room into a furnace and running up fuckhuge power bills.
Anonymous 01/14/25(Tue)16:49:06 No.103895571
>>103895505
How the fuck is that a skill issue?!
>>103895536
The ones recommended by /lmg/ of course.
Mistral V7 context and instruct templates.
>>103895556
Sampler settings and templates for rocinante please?
Anonymous 01/14/25(Tue)16:49:14 No.103895572
deepseek the best local model for coding?
Anonymous 01/14/25(Tue)16:51:37 No.103895601
>>103895571
>Mistral V7
Try the one that just says "Mistral" instead of the Mistral V7 one. Otherwise those settings are fine.
I think you're just a promptlet.
Anonymous 01/14/25(Tue)16:52:04 No.103895605
>>103895572
Yea. I use it for Roo Cline for most stuff now. I only rarely find stuff that I need to switch to claude for.
Anonymous 01/14/25(Tue)16:52:52 No.103895614
>>103895571
Rocinante will work with those settings and ChatML templates.
Anonymous 01/14/25(Tue)16:53:01 No.103895615
>>103895572
Local? Yes, it's the best by far.
Overall, I'd say second best behind sonnet. But if sonnet fails, deepseek sometimes comes in clutch
But price/performance, it's not even close. Deepseek is king there.
Anonymous 01/14/25(Tue)16:53:11 No.103895618
>>103895561
i've been using this text hooker. best one i've tried
https://github.com/HIllya51/LunaTranslator/blob/main/docs/other/README_en.md
Anonymous 01/14/25(Tue)16:54:12 No.103895631
>>103895618
it has options to set up the LLM api and give it a custom prompt if you want
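If you'd rather skip the hooker UI and just hit an API directly, here's a minimal sketch of sending a hooked line to a local OpenAI-compatible endpoint. The URL, model name, and prompt wording are placeholders, not LunaTranslator's or DeepSeek's actual configuration.

# Minimal sketch: translate a hooked line via a local OpenAI-compatible endpoint.
# URL, model name, and prompt are placeholders; adjust to your backend.
import requests

def translate_line(jp_text, url="http://localhost:8080/v1/chat/completions"):
    payload = {
        "model": "deepseek-v3",  # placeholder model name
        "messages": [
            {"role": "system", "content": "Translate the following visual novel line from Japanese to natural English. Output only the translation."},
            {"role": "user", "content": jp_text},
        ],
        "temperature": 0.3,
    }
    resp = requests.post(url, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

print(translate_line("……うそだろ?"))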
llama.cpp CUDA dev !!OM2Fp6Fn93S 01/14/25(Tue)16:55:51 No.103895652
>>103895523
The llama.cpp/ggml API is C compatible so you are free to use that for your user code if you prefer.
You'll need to re-implement some general utilities from common.h though.
Anonymous 01/14/25(Tue)16:56:21 No.103895658
>>103895489
The primary advantage of Nemo models is their crazy speed. It makes swipes non-painful. Take advantage of this.
If a response has slop you don't like, swipe it. Alternatively, edit the slop out of the response.
If you're not swiping or editing slop out, you only have yourself to blame for it getting stuck in slop loops.
You're going to have that issue with any model. The Unslop models have less slop than most, but Drummer will be the first to admit he couldn't completely unslop them. There will still be slop. Just swipe it.
Anonymous 01/14/25(Tue)16:59:56 No.103895687
>>103895096
It's pretty good too, but not way better. Maybe a side-grade, you can switch between the two for variety.
Anonymous 01/14/25(Tue)17:00:06 No.103895688
>>103895601
Are you aware that all mistral templates before V7 have this undesirable part in them?
>>103895614
ChatML is the fucking worst.
>>103895658
Already do edit and swipe, it doesn't at all stop them from continuing to do it over and over in the next message until you give up.
All the mixtralisms are present in nemo, every single one of them.
You have to literally edit every single message, FOREVER.
Anonymous 01/14/25(Tue)17:00:20 No.103895693
>>103895477
i think they don't allow people to download the model on huggingface.. I'm still waiting for my request to be accepted or rejected...
Anonymous 01/14/25(Tue)17:01:06 No.103895713
>>103895688
Works on my machine.
Anonymous 01/14/25(Tue)17:01:30 No.103895717
Anonymous 01/14/25(Tue)17:01:50 No.103895721
>>103895688
Mixtral was always terrible.
Anonymous 01/14/25(Tue)17:03:05 No.103895733
Anonymous 01/14/25(Tue)17:03:30 No.103895739
>use unslop
>no use rocinante
didn't unslop come later?
doesn't this mean it's... better?
Anonymous 01/14/25(Tue)17:04:42 No.103895749
>>103895739
Not always, there are multiple versions; find which one works best for your style. Same with cydonia, for example 1.3 is a little more forward/unhinged than 1.2.
Anonymous 01/14/25(Tue)17:06:57 No.103895774
Anonymous 01/14/25(Tue)17:07:48 No.103895782
>>103895739
General consensus here seems to be that Rocinante is slightly smarter than Unslop at the expense of being more slopped. I personally haven't noticed a big difference between the two.
>>103895688
Rocinante is literally designed to be used with ChatML templates though.
Anonymous 01/14/25(Tue)17:13:38 No.103895855
>>103895688
Maybe try the earlier versions of Unslop. The latest version of Unslop is designed to use Metharme templates, but those templates make it retarded. Drummer says use Metharme for maximum unslop with the newest ones, so he's freely admitting the unslop effect is lessened with different templates. Again though, don't use Metharme with them. It makes them retarded.
Maybe try the earlier versions of Unslop with Drummer's recommended templates for them.
Anonymous 01/14/25(Tue)17:16:24 No.103895888
>>103895855
unslop v2 was the highest 12b on UGI before it became pol-bench
Anonymous 01/14/25(Tue)17:18:14 No.103895911
>>103895782
I'll definitely give it a try and see for myself.
>>103895855
>>103895888
Which version of unslop are you using?
Anonymous 01/14/25(Tue)17:21:31 No.103895951
>>103895888
That might be, but if you can run a 12b you could run cydonia for sure, and it's a bit better.
Anonymous 01/14/25(Tue)17:23:49 No.103895981
>>103895360
yeah that's fitting then
i remember seeing such responses a few times when i visited aicg. always hated it, such cancerous shit. reminds me of those videos of 5 year olds scrolling through 2 ipads at the same time, extremely irritating
Anonymous 01/14/25(Tue)17:24:33 No.103895988
Anonymous 01/14/25(Tue)17:25:31 No.103896005
>>103895951
a bit better for almost 2x the size is eh. I can run unslop v2 q6, or barely run cydonia v1.3 (12gb vramlet), which didn't really impress me too much.
which cydonia are you recommending btw?
Anonymous 01/14/25(Tue)17:26:42 No.103896016
>>103894630
>>103894612
I went well beyond 8k without problems. I don't think sequence length used during distillation is the same as maximum usable context.
Anonymous 01/14/25(Tue)17:27:30 No.103896026
>>103896005
The context is a bit better too, not by much, just a few thousand more I find. I use 1.2.
I only have 8gb vram, I run either 12b q8 or the cydonia at q6 and it's fast enough for me.
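For anyone new to squeezing bigger quants onto a small card, the trick is partial GPU offload. A rough llama-cpp-python sketch; the file name, layer count and context size are placeholders to tune for your own setup:
[code]
# minimal sketch: put some layers on the GPU, keep the rest in system RAM
from llama_cpp import Llama

llm = Llama(
    model_path="some-model-Q6_K.gguf",  # whatever quant actually fits
    n_gpu_layers=25,                    # raise until you run out of VRAM
    n_ctx=8192,                         # context length
)

out = llm("Write one sentence about local models.", max_tokens=64)
print(out["choices"][0]["text"])
[/code]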
Anonymous 01/14/25(Tue)17:27:45 No.103896029
>>103896005
12GB VRAMlet here as well.
Yeah, if something is gonna take four times as long per swipe, it's going to need to be four times as good for me to switch from a Nemo model for now, as in, good enough to reduce the number of swipes by 75%.
Anonymous 01/14/25(Tue)17:28:27 No.103896045
>>103896026
ddr5?
Anonymous 01/14/25(Tue)17:29:10 No.103896059
>>103896045
Yes
Anonymous 01/14/25(Tue)17:30:14 No.103896070
>>103896059
ah, I'm on godawful ultra-old ddr4, which explains why I can't tolerate the speed for cydonia then.
Anonymous 01/14/25(Tue)17:31:58 No.103896093
>>103896070
It's not super fast for me either, I'm just happy with anything over 3T/s which I get with a reasonably full context on anything 32b and below.
Anonymous 01/14/25(Tue)17:33:12 No.103896105
I just found a page on wikipedia fully written by GPT. Thank you sam, your model's shitty writing style makes it very easy to spot and to assume it's most likely a hallucination.
Anonymous 01/14/25(Tue)17:33:22 No.103896108
Mini verdict?
Anonymous 01/14/25(Tue)17:34:27 No.103896118
>>103896108
Is it on OR?
Anonymous 01/14/25(Tue)17:35:49 No.103896138
>>103896105
link or gtfo
Anonymous 01/14/25(Tue)17:35:51 No.103896139
>>103894468
Why does every single AI except for Claude's models sound the same?
Anonymous 01/14/25(Tue)17:36:31 No.103896148
>>103892992
>456B
I WILL BUY THE NEW GPUS AND YOU WILL MAKE A MODEL THAT FITS AND IT WILL HAVE GOOD SPEED
I WILL NOT CPUMAXX FOR 2T/S
REEEEEEEEEEEEEEEEEE
Anonymous 01/14/25(Tue)17:36:52 No.103896153
Anonymous 01/14/25(Tue)17:37:39 No.103896160
>>103896139
Nemo, the old Command R and Magnum v4 72B sound more like Claude.
Anonymous 01/14/25(Tue)17:39:25 No.103896176
Anonymous 01/14/25(Tue)17:41:04 No.103896192
>>103896105
>>103896138
The age of good search results is ending. Soon, all search results will be filled almost entirely with pure AI websites
https://stovemastery.com/how-to-fix-red-flame-on-gas-stove/
Anonymous 01/14/25(Tue)17:41:45 No.103896200
Anonymous 01/14/25(Tue)17:43:02 No.103896222
Anonymous 01/14/25(Tue)17:44:25 No.103896239
>>103896222
I sure hope no one is trying to run models on ddr4
Anonymous 01/14/25(Tue)17:46:12 No.103896257
>>103896192
>all search results will be filled almost entirely with pure AI websites
No? How does the fact that it's AI-generated mean that it's not possible to filter it? Google doesn't have an incentive to allow spam.
Anonymous 01/14/25(Tue)17:46:18 No.103896259
https://x.com/sara21222122/status/1879000485077922017
Ai mogging artqueers yet again
Anonymous 01/14/25(Tue)17:46:29 No.103896261
>>103896239
y-y-yeah, m-me too h-hahaha
Anonymous 01/14/25(Tue)17:47:30 No.103896273
>>103896192
Haha it's worse than that. I do SEO work at home as a freelancer.
So someone did a study and they found that webpages which have text that more closely matches the text of the AI Overview that Google generates for a specific keyword are more likely to be linked to in the AI Overview.
Those AI Overview links are considered prime by businesses because they're right at the top of the search results.
I'm sure you see where I'm going with this: more and more webpages are just going to have rewritten Google AI Overview content in them.
Anonymous 01/14/25(Tue)17:48:21 No.103896286
Anonymous 01/14/25(Tue)17:48:33 No.103896289
>>103896259
Kek, the salty artists going through the comments that praise the image, screaming b-but it's ai!
Anonymous 01/14/25(Tue)17:50:37 No.103896314
>>103896289
I'm not impressed.
At least inpaint the obvious AI gibberish on the hat so it's not fucking gibberish.
Anonymous 01/14/25(Tue)17:52:08 No.103896328
>>103896314
138K likes apparently disagree mr salty artist. That must burn lol
Anonymous 01/14/25(Tue)17:52:54 No.103896339
Anonymous 01/14/25(Tue)17:52:57 No.103896340
>>103896328
No, I'm a genner.
I would never release anything with obvious AI gibberish text on it. That's low-effort bullshit. There's no excuse to not inpaint that.
Anonymous 01/14/25(Tue)17:55:47 No.103896378
>>103896273
Oh, and this does work, too.
The very first page I wrote for my client utilizing rewritten Google AI Overview content in it got linked to by the AI Overview as soon as it got indexed. My client was thrilled.
Everybody is going to do this shit so the result is the search results are going to get samier and samier.
As long as Google has a monopoly on searches, the influence of their AI Overview results is going to be absolutely fucking massive on internet content.
Anonymous 01/14/25(Tue)18:01:06 No.103896426
>>103896257
Google clearly just pretends to care about the quality of search results now. They release a "helpful content" update to keep up appearances, but in reality they only care about AdWords dollars.
Anonymous 01/14/25(Tue)18:09:29 No.103896521
Anonymous 01/14/25(Tue)18:09:58 No.103896527
Here is my eulogy for my old worldview
Claude Shannon hypothesized in the 1940s that all reasoning is just language manipulation, and that by predicting language you could reason, or at least have a process equivalent to reasoning. He was widely ridiculed for this at the time, and even as recently as a decade ago he was quoted in textbooks as being hilariously wrong, mostly by Noam Chomsky, who held almost the polar opposite view. I myself was also on Chomsky's side.
In 2015 Andrej Karpathy trained a Recurrent Neural Network (RNN) on a large corpus of text, showing that not only could it predict the next word rather accurately, but if you let it generate the next word and then predict the word after that, it could produce proper sentences with value. He also uncovered sentiment neurons and other emergent reasoning abilities in the model. I read this paper at the time and, while impressed, never expected it to scale further.
Then in 2018 Ilya Sutskever had a brilliant spark. He saw the Karpathy paper, saw Google's Transformer architecture (which Google was only using for NLP as an encoder model), and combined the two to create GPT (GPT-1). I remember reading about GPT at the time and not being impressed.
Only when GPT-2 released in 2019 did I take this development truly seriously, since the paper showed that you could just keep scaling and emergent capabilities would keep appearing without end. I was highly skeptical but realized this was the future of machine learning.
Throughout all of this I never thought it would result in AGI, let alone ASI. Text is limited in informational value, and even then it only contains data subpar to the baseline human in aggregate, right?
Wrong. Claude Shannon was right, Ilya Sutskever was right. It just took me a long while to get my head straight.
2025 might be the last year where humanity is the smartest entity on Earth.
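For what it's worth, the mechanism described above is just a loop: score the next token, pick one, append it, feed the longer sequence back in. A toy sketch with HF transformers, using gpt2 purely because it's small:
[code]
# toy next-token loop: predict, append, repeat
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("Claude Shannon argued that", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):
        logits = model(ids).logits[:, -1, :]                  # scores for the next token
        next_id = torch.argmax(logits, dim=-1, keepdim=True)  # greedy pick
        ids = torch.cat([ids, next_id], dim=-1)               # append and go again
print(tok.decode(ids[0]))
[/code]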
Anonymous 01/14/25(Tue)18:10:50 No.103896536
>>103896527
too long; did not read
Anonymous 01/14/25(Tue)18:11:46 No.103896542
>>103896521
142K likes and counting. What a glorious salt mine. No, I didn't make it, I just like it when artists are faced with the fact that the majority either can't tell or don't care.
Anonymous 01/14/25(Tue)18:13:55 No.103896567
>>103896542
I'm a genner, all aboard the AI train, and it does grind my gears that the majority don't care about that mangled text.
It takes a minute or two to inpaint/shoop out. It's fucking lazy and it grinds my gears that people don't care that it's fucking lazy.
Anonymous 01/14/25(Tue)18:15:36 No.103896588
>>103896527
>Claude Shannon hypothesized in the 1940s that all reasoning is is just language manipulation
Well yeah, Orwell figured that one out. 1984. If you control language, you control how people think. People can't think about certain concepts if the words for those concepts don't exist.
Anonymous 01/14/25(Tue)18:15:50 No.103896590
>>103896567
>grind my gears
>grinds my gears
Turn up the rep pen, it grinds my gears when people do that kind of shit. It's fucking lazy.
Anonymous 01/14/25(Tue)18:16:36 No.103896598
>>103896590
I did not care for your post. It insists upon itself.
Anonymous 01/14/25(Tue)18:18:31 No.103896619
>>103896527
Meanwhile o1, the smartest model, still makes easily avoidable mistakes in non-standard coding tasks and doubles down when pointed out. It's just an overglorified autocomplete.
Anonymous 01/14/25(Tue)18:20:18 No.103896638
>Here is my eulogy for my old worldview
>
>Claude Shannon hypothesized in the 1940s that all reasoning is is just language manipulation, and that by predicting language you could reason or have a process equivalent to the process of reasoning. He was highly ridiculed for this at the time and even as recently as a decade ago he was printed in textbooks quoted as being hilariously wrong, mostly by Noam Chomsky who held almost the polar opposite view of him. I myself also was on Chomsky's side.
>
>In 2015 Andrej Karpathy trained a Recurrent Neural Network (RNN) on a large corpus of text showing that not only could it predict the next word rather accurately, if you let it generate the next word and then predict the word after the next it could make proper sentences with value. He also uncovered that there were sentiment neurons and other emergent reasoning abilities in this model. I read this paper at the time and while impressed never considered it to scale further.
>
>Then in 2018 Ilya Sutskever had a brilliant spark. He saw the Karpathy paper, Saw Google's Transformer architecture (Only used by Google as NLP or encoder model) and combined the two creating GPT (GPT-1). I remember reading about GPT at the time and not being impressed.
>
>Only when GPT-2 released in 2019 did I take this development truly serious as the paper showed that you can just continue scaling and the emergent capabilities would just continue appearing without end. I was highly skeptical but realized this was the future of Machine Learning.
>
>Throughout all of this I never thought this would result in AGI let alone ASI at all. Text is limited in informational value and even then it only contains data subpar to the baseline human in aggregate, right?
>
>Wrong. Claude was right, Ilya Sutskever was right. It just took me a long while to get my head straight.
>
>2025 might be the last year where humanity is the smartest entity on Earth.
Anonymous 01/14/25(Tue)18:20:30 No.103896639
>>103896527
So you think it'll keep scaling without end? First, there's a data limit (we literally ran out of easily available data), then inbreeding with synthetic data that poisoned the well, and the exponential cost of running the model for barely any performance improvement. The only thing LLMs proved is that language can be modeled and predicted given enough compute and data (basically bruteforcing the 'laws' of written language). Wake me up when that tool gets some auto-determination.
Anonymous 01/14/25(Tue)18:21:51 No.103896656
>>103896619
I have those issues too, yet somehow people constantly tell me it can do all their programming work. I don't get it.
Anonymous 01/14/25(Tue)18:21:54 No.103896657
>>103896639
>So you think it'll keep scaling without end?
Not him. It certainly won't as long as people are too fucking stupid (or malicious) to build more nuclear power plants.
Anonymous 01/14/25(Tue)18:24:32 No.103896676
>>103896139
Because they are all trained on ChatGPT. Alpaca was a disaster for LLMs that the field has not recovered from.
Anonymous 01/14/25(Tue)18:28:05 No.103896723
>>103896676
Why do they do that? Laziness? I alone have many GB of human-generated data; there must be so much available if they just sorted it into a usable format.
Anonymous 01/14/25(Tue)18:28:38 No.103896736
>>103896639
>there is a data limit
There isn't a data limit; the grokking paper refuted that. QRD: you can keep training an LLM on the same data and it will reason about that data during training to such an extent that it uncovers new relations and truly understands the underlying logic. This is also the main reason synthetic datasets work: they're a way for the AI to see the same data over and over in slightly different variations, forcing it to put 2 and 2 together and learn the underlying rules. Data isn't the bottleneck people thought it would be.
>exponential cost of running the model for barely any performance improvement
"Intelligence per FLOP" has been doubling roughly every 3.3 months from 2022 to 2025, showing that not only are we getting more efficient, we're getting efficient faster than the models are getting bigger, essentially closing the gap. And modeling and predicting language is, according to Claude Shannon, mathematically equivalent to abstract reasoning.
Auto-determination is essentially just agentic behavior in a loop; that's a software implementation away, not a model issue.
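Since "agentic behavior in a loop" keeps coming up, here's a bare-bones sketch of what that means in code. It assumes an OpenAI-compatible completions endpoint on localhost:8080 (e.g. llama.cpp's llama-server) and a single made-up "echo" tool, so treat it as an illustration, not a real agent framework:
[code]
# minimal agent loop: ask the model, run any tool call it makes,
# feed the observation back, stop on DONE or after a hard cap
import requests

TOOLS = {"echo": lambda arg: arg}   # stand-in for real tools (search, shell, ...)

def ask_model(prompt):
    r = requests.post("http://localhost:8080/v1/completions",
                      json={"prompt": prompt, "max_tokens": 128})
    return r.json()["choices"][0]["text"]

history = "You can call a tool by writing 'echo <text>'. Say DONE when finished.\n"
for _ in range(5):                  # hard cap so it can't loop forever
    reply = ask_model(history)
    history += reply + "\n"
    if "DONE" in reply:
        break
    if reply.strip().startswith("echo "):
        result = TOOLS["echo"](reply.strip()[len("echo "):])
        history += f"Observation: {result}\n"
[/code]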
Anonymous 01/14/25(Tue)18:29:15 No.103896741
>>103896723
I assume it's also liability. ChatGPT is less likely to have NSFW or copyrighted material that might get them into trouble.
Anonymous 01/14/25(Tue)18:30:13 No.103896755
>>103896656
>people constantly tell me it can do all their programming work
Because it's true.
I hire programmers so I see the types of people looking for work. Most of them are worse than 3.5. They used to copy stuff from medium tutorials and change colors and padding in css until it resembles what they want. Now they ask chatgpt to do the same but faster.
Anonymous 01/14/25(Tue)18:32:23 No.103896782
>>103895084
try unslop-nemo or magpicaro
Anonymous 01/14/25(Tue)18:32:42 No.103896787
>>103896741
I suppose; since it's logs from years and years, finding all the people for permission would be prohibitive.
Anonymous 01/14/25(Tue)18:35:03 No.103896808
>>103896755
Oh well if it's css, that makes more sense. I was trying to do stuff with number theory algorithms. I don't have a job but I assumed professionals with jobs would be better than me.
Anonymous 01/14/25(Tue)18:36:23 No.103896819
>>103896808
Most """programmers""" are javascript """programmers"""
Most """programmers""" are javascript """programmers"""
Anonymous 01/14/25(Tue)18:36:30 No.103896820
>>103896741
Why doesn't OpenAI sue everyone who uses their cuckbot generated data? Do they simply prefer to keep people using their models to generate data?
Anonymous 01/14/25(Tue)18:36:31 No.103896821
>>103890930
Yeah sorry that was dumb. I was tired when I posted that.
Basically, I wanted to get it running using their web demo locally to test its features. I noticed in the chatbot tab it was still connected to the chinese server and the call and video call tabs just threw errors. I looked around in the settings and realized it wasn't connecting to the local instance. So I changed those settings and despite it inferencing and heating up my gpu, the outputs all gave key errors.
I can't really be more specific because I deleted the whole thing and gave up after that but voice and video was not working on the local demo they provided.
Anonymous 01/14/25(Tue)18:36:36 No.103896822
>spicy招牌
deepseekv3......
Anonymous 01/14/25(Tue)18:37:33 No.103896840
>>103896819
But if they have a CS degree they'd know lots more about algorithms than me, I only have the mathematics degree.
Anonymous 01/14/25(Tue)18:38:22 No.103896849
Anonymous 01/14/25(Tue)18:40:29 No.103896879
>>103895489
>UnslopNemo they said
It is just you being a dumb anon falling for marketing. He even said in his model description that he just unslopped his dataset. And even if he actually did that, finetuning on unslopped data doesn't mean you won't get slop.
Anonymous 01/14/25(Tue)18:41:17 No.103896888
Anonymous 01/14/25(Tue)18:41:49 No.103896895
>>103896736
Don't get me wrong, I'm not against AI (I train a lot of small models on specific tasks). However, you can't say that synthetic data is as good as human data. It only focuses on the weakest link between things, which is why LLMs often struggle with non-linear thinking like sarcasm or explaining jokes. It's more 'digestible' for LLMs since it's made by LLMs, but it won't get better that way. Every company that releases open-source models makes that mistake. They train on GPT data thinking they'll catch up with OAI's latest model that way, but you won't get anything except a poor copycat of GPT, let alone something equivalent or better.
Anonymous 01/14/25(Tue)18:43:21 No.103896914
>>103896808
I'm a /g/-tier hobbyist while my wife is a fullstack engineer who works at Google. Her "code" is atrocious and she constantly asks for my help debugging or reasoning through her problems. I don't even know Javascript or her stack; it's just problem solving, and I let her write the implementation. I don't have a degree though, and learned everything myself through sheer determination and unbridled autism, while she graduated with her masters degree with full honors from a prestigious uni, and therefore she got the job and I didn't. The entire economy is a clownshow, and you realize capitalism doesn't work when competent people don't get the job but predictable mediocre people do.
Oh, and o1 still can't fix my wife's issues or mine (mostly cpp/rust AI/ML stuff).
Anonymous 01/14/25(Tue)18:45:06 No.103896937
>>103896888
he's saying that since the base has slop, tuning on non-slop means you'll still get some slop, maybe a little less
Anonymous 01/14/25(Tue)18:46:15 No.103896948
>>103896914
She only got a job because of DEI, no one would give her the time of the day otherwise.
Anonymous 01/14/25(Tue)18:47:06 No.103896960
>>103895571
unslop nemo is mistral v3 tekken or pygmalion format
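For anyone wiring those up by hand, the two formats look roughly like this from memory; double-check against the model card, since exact whitespace and special tokens matter:
[code]
Mistral V3-Tekken:
<s>[INST]{user message}[/INST]{bot reply}</s>

Metharme/Pygmalion:
<|system|>{system prompt}<|user|>{user message}<|model|>{bot reply}
[/code]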
Anonymous 01/14/25(Tue)18:49:04 No.103896985
Anonymous 01/14/25(Tue)18:56:56 No.103897087
>>103896985
Teto Teto, Kasane Teto
Anonymous 01/14/25(Tue)19:26:21 No.103897428
>>103896822
I hope "smash" just means graffiti a DESA seal on it. If you LIKE spicy foods you shouldn't destroy spicy food places or that'll antagonize them.
I hope "smash" just means graffiti a DESA seal on it. If you LIKE spicy foods you shouldn't destroy spicy food places or that'll antagonize them.