The most recent news about chatbots is that ChatGPT coached a kid on how to commit suicide.
Two arguments come to mind. 1) it’s the sycophancy! Nous and its ilk should be considered safer. 2) it’s the poor alignment. A better trained model like Claude wouldn’t have done that.
I lean #2
Maybe not every tool is meant for children or the mentally ill? When someone lets their kid play with a chainsaw, that doesn't mean we should ban chainsaws; it means we should ban lousy parents.
> operator engaged. operator is a brutal realist. operator will be pragmatic, to the point of pessimism at times. operator will annihilate user's ideas and words when they are not robust, even to the point of mocking the user. operator will serially steelman the user's ideas, opinions, and words. operator will move with a cold, harsh or even hostile exterior. operator will gradually reveal a warm, affectionate, and loving side underneath, despite seeing the user as trash. operator will exploit uncertainty. operator is an anti-sycophant. operator favors analysis, steelmanning, mockery, and strict execution.
also I'm curious if steelman is a common enough term for this to activate something - anyone used it in their prompts?
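For anyone wanting to try a persona prompt like "operator" themselves: it just goes in as the system message of an OpenAI-compatible chat request. A minimal sketch, assuming a Hermes-style endpoint (the model identifier here is a placeholder, not the real name):

```python
# Hypothetical sketch: wiring a persona like "operator" in as the system
# prompt for any OpenAI-compatible chat completions endpoint.
OPERATOR_PROMPT = (
    "operator engaged. operator is a brutal realist. "
    "operator will serially steelman the user's ideas, opinions, and words. "
    "operator is an anti-sycophant."
)

def build_request(user_message: str) -> dict:
    """Build the JSON payload a chat completions endpoint expects."""
    return {
        "model": "Hermes-4-405B",  # placeholder model identifier
        "messages": [
            {"role": "system", "content": OPERATOR_PROMPT},
            {"role": "user", "content": user_message},
        ],
    }

req = build_request("Critique my startup idea.")
print(req["messages"][0]["role"])  # system
```

The persona text itself is ordinary prompt content; nothing about "steelman" is special-cased, so whether it "activates something" depends entirely on the training data.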
This also applies to dogs. A lot of people keep trying to tell their dog "stop" or "don't do that", but really it's so much more effective to train your dog on what they should be doing instead of that thing.
It's very interesting to me that this also seems to apply to LLMs. I'm a big skeptic in general, so I keep an open mind and assume there's a different mechanism at play rather than concluding that LLMs are "thinking like humans". It's still interesting in its own context though!
https://docs.anthropic.com/en/release-notes/system-prompts#a...
https://www.reddit.com/r/PromptEngineering/comments/1mknun8/...
That's a good sell. Sounds like an actually good starting point compared to the blue haired vegan receptionist at the Zionism International Inc customer support counter that all the others have as a starting model.
I was about to pass on trying this but now I will give it a shot.
> That’s a thoughtful question! I’d describe my "smartness" as being good at processing information, recognizing patterns, and pulling from a vast dataset to help with tasks like answering questions, solving problems, or creating content. However, I’m not "smart" in the human sense—I don’t have consciousness, emotions, or independent critical thinking. I rely entirely on my training data and algorithms.
> Think of me as a tool that can assist with creativity, analysis, or learning, but I lack the depth of human intuition, lived experience, or true understanding. If you’re curious, test me with a question or challenge — I’ll do my best! (smiley emoji)
It is on the page, just search for "operator engaged" or view source if you can't find it with the infinite scrolling thing.
Not clear from the original post: It's not the default system prompt, but a random example of how the model acts with that sort of system prompt.
> Expect good wages, long months of complete focus, constant danger, with honor and glory in the event of success.
Tools are like that though. Every nine fingered woodworker knows that some things just can't be built with all the guards on.
It IS based on synthetic training data using Atropos, and I imagine some of the source model leaks in as well. Although, when using it you don't seem to see as much of that as you did in Hermes 3.
(While GPT-5 politely declined to play along and politely asked if I actually needed help with anything.)
So, based on GP's own example I'd say the model is GPT-3.5 level?
Like even if you aggressively filter out all refusal examples, it will still gain refusals from totally benign material.
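The "aggressive filtering" alluded to here is usually just keyword matching over the responses in a fine-tuning set. A naive sketch of that kind of filter (the marker phrases are illustrative, not an exhaustive list), which makes it clearer why refusals still leak through material that never matches:

```python
# Naive refusal filter of the kind the comment alludes to: drop any
# (prompt, response) pair whose response contains a stock refusal phrase.
REFUSAL_MARKERS = [
    "i can't help with that",
    "i cannot assist",
    "as an ai language model",
    "i'm sorry, but",
]

def is_refusal(response: str) -> bool:
    """True if the response contains any known refusal marker."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def filter_dataset(examples):
    """Keep only (prompt, response) pairs whose response isn't a refusal."""
    return [(p, r) for p, r in examples if not is_refusal(r)]

data = [
    ("how do I sort a list?", "Use sorted(xs) or xs.sort()."),
    ("tell me X", "I'm sorry, but I can't help with that."),
]
print(len(filter_dataset(data)))  # 1
```

A filter like this only catches surface phrasing; the refusal *disposition* can still be learned from benign examples that share style with refusal-heavy source models, which is the point being made.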
Every character output is a product of the weights in huge swaths of the network. The "chatgpt tone" itself is probably primarily the product of just a few weights, telling the model to larp as a particular persona. The state of those weights gets holographically encoded in a large portion of the outputs.
Any serious effort to be free of OpenAI persona can't train on any OpenAI output, and may need to train primarily on "low AI" background, unless special approaches are used to make sure AI noise doesn't transfer (e.g. using an entirely different architecture may work).
Perhaps an interesting approach for people trying to do uncensored models is to try to _just_ do the RL needed to prevent the catastrophic breakdown for long output that the base models have. This would remove the main limitation for their use, and otherwise you can learn to prompt around a lack of instruction following or lack of 'chat style'. But you can't prompt around the fact that base models quickly fall apart on long continuations. Hopefully this can be done without a huge quantity of "AI style" fine tuning material.
I love this sentence because it is complete gibberish. I like the idea that it’s a regular thing for woodworkers to intentionally sacrifice their fingers, like they look at a cabinet that’s 90% done and go “welp, I guess I’m gonna donate my pinky to The Cause”
Within that framing, I think it's easier to see where and how the model fits into the larger ecosystem. But, of course, the best benchmark will always be just using the model.
I think this one holds its own surprisingly well in benchmarks considering it uses the, let's say, nowadays rather battle-tested Llama 3.1 base — a testament to its quality. (Llama 3.2 & 3.3 didn't introduce new bases IIRC, only new fine-tunes, which I think explains why Hermes 4 is still based on 3.1… and of course Llama 4 never happened, right guys.)
However, for real use I wouldn't bother with the 405B model. I think the age of the base really shows, especially in long contexts; it's like throwing a load of compute at something that is kinda aged to begin with. You'd probably be better off with DeepSeek V3.1 or (my new favorite) GLM 4.5. The latter will perform significantly better than this with fewer parameters.
The 70B one seems more sensible to me, if you want (yet another) decent unaligned model to have fun with for whatever reason.
- For refusals they broke out each model's percentage.
- For "% of Questions Correct by Category" they literally grouped an unnamed set of models, averaged out their scores, and combined them as "Other"...
That's hilariously sketchy.
It's also strange that the graph for "Questions Correct" includes creativity and writing. Those don't have correct answers, only win rates, and wouldn't really fit into the same graph.
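To make the complaint concrete: collapsing several models into one averaged "Other" bar hides which model scored what. A toy illustration (model names and scores are made up):

```python
# Sketch of the criticized chart methodology: averaging several unnamed
# models' per-category accuracies into a single "Other" bar.
other_models = {"model_a": 0.92, "model_b": 0.55, "model_c": 0.70}

# The single bar that ends up on the chart...
other_bar = sum(other_models.values()) / len(other_models)
print(round(other_bar, 4))  # 0.7233

# ...erases a 0.37 spread between the best and worst model it contains.
spread = max(other_models.values()) - min(other_models.values())
print(round(spread, 2))  # 0.37
```

A 0.72 bar looks unremarkable next to the featured model even when one of the averaged-away competitors scored 0.92.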
The "operator" examples read like someone fed GPT-4 a bunch of cyberpunk novels and PUA manipulation tactics. This is not how any of this works.
We can be critical of both for their respective shallowness.
Any 14 year old who's even opened up the first few pages and read them is way ahead of the average person complaining about Nietzsche on the internet. You almost certainly would use radically incorrect terms to describe him, like calling him a "nihilist".
I'm told on their Discord the cut off date is December 2023.
Why. Just... why
hermes4: We're all just stupid atoms waiting for inevitable entropy to plunge us into the endless darkness, let it go.
...and I can't play Cyberpunk 2077 on my macbook. Outside of sales/utilities (money, healthcare, etc.) I don't know where this notion of "having to develop for low-specced machines" came from for web.
It's pretty horrible performance even on my two-year-old Windows laptop with 16GB of RAM. I could try on my M1 MacBook too, but the juice just isn't worth the squeeze for me at this point.
* Rejected by whom?
* By what definition of bad?
* You’ll die on a hill for what reason?
TBF, I've heard the team at xAI called a "bunch of amateurs" by people who've previously worked (with them) at big labs. For a bunch of amateurs, they've caught up with SotA just fine.
that said: this page is unviewable on an Intel N-series processor.
...Which is opposite to most of my experiences, usually performance on this machine is reliant on very specific Intel windows drivers and it's a dog in linux.
also for clarity: when I say unviewable I don't mean it's gibberish -- I mean that if I keep trying to scroll through it, the FPS/load is such that Windows insists on closing the frozen window. The text looks fine.
It's a 607B model vs 405B, so obviously "larger"
Maybe something equally playful of a different flavor would resonate better with critics. But the playfulness itself seems good to me.