> The ~23,000 tokens in the system prompt – taking up just over 1% of the available context window
Am I missing something, or is this a typo? 23,000 tokens out of a 200,000-token context window is closer to 11% than 1%.
Has anybody been working on better ways to prevent the model from telling people how to make a dirty bomb from readily available materials, besides putting "don't do that" in the prompt?
It may feel like the company censoring users at this stage, but there will come a stage where we’re no longer really driving the bus. That’s what this stuff is ultimately for.
That's what Anthropic's "constitutional AI" approach is meant to solve: https://www.anthropic.com/research/constitutional-ai-harmles...
These are matrices of tokens that produce other tokens based on training.
They do not understand the world, existence, or human beings beyond words. Period.
How do we get HGI (human general intelligence) to understand this? We've not solved the human alignment problem.
I think that's the best shot here as well. You want the first AGIs and the most powerful AGIs and the most common AGIs to understand it. Then when we inevitably get ones that don't, intentionally or unintentionally, the more-aligned majority can help stop the misaligned minority.
Whether that actually works, who knows. But it doesn't seem like anyone has come up with a better plan yet.
These LLMs won't be magically more moral than humans are, even in the best case (and I have a hard time believing such a case is realistic; there's too much power in these). Humans are deeply flawed creatures: easy to manipulate via emotions, shooting themselves in the foot all the time, and happy to self-destruct as long as the dopamine kicks keep coming.
Imagine if the rm command refused to delete a file because Trump deemed it could contain secrets of the Democrats. That's where we are and no one is bothered. Hackers are dead and it's sad.
Seems like we are already here today with cybersecurity.
Learning how malicious code works is pretty important to be able to defend against it.
Motivated by the link to Metamorphosis of Prime Intellect posted recently here on HN, I grabbed the HTML, textified it and ran it through api.openai.com/v1/audio/speech. Out came a rather neat 5h30m audio book. However, there was at least one paragraph that ended up saying "I am sorry, I can not help with that", meaning the "safety" filter decided to not read it.
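For anyone curious, the pipeline is roughly this (a sketch with assumed parameters; the model and voice names are just examples, and the speech endpoint caps input length, so the text has to be chunked first):

    # Sketch: POST each text chunk to the speech endpoint and append the
    # returned MP3 bytes to one output file (naive concatenation, but it plays).
    import os
    import requests

    def synthesize(chunk: str, out_path: str) -> None:
        resp = requests.post(
            "https://api.openai.com/v1/audio/speech",
            headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
            json={"model": "tts-1", "voice": "alloy", "input": chunk},
            timeout=120,
        )
        resp.raise_for_status()
        with open(out_path, "ab") as f:
            f.write(resp.content)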
So the infamous USian "beep" over certain words is about to be implemented in synthesized speech. Great, that doesn't remind me of 1984 at all. We don't even need newspeak to prevent certain things from being said.
Today they won’t let me drive 200mph on the freeway. Tomorrow it could be putting speed bumps in the fast lane. The next day combat aircraft will shoot any moving vehicles with Hellfire missiles and we’ll all have to sit still in our cars and starve to death. That’s why we must allow drivers to go 200mph.
Claude’s “Golden Gate” experiment shows that precise behavioral changes can be made around specific topics, as well. I assume this capability is used internally (or a better one has been found), since it has been demonstrated publicly.
What’s more difficult to prevent are emergent cases such as “a model which can write good non-malicious code appears to also be good at writing malicious code”. The line between malicious and not is very blurry depending on how and where the code will execute.
They probably don't give Claude.ai's prompt too much attention anyway; it's always been weird. They've had many glaring bugs over time ("Don't start your response with Of course!" followed by clearly generated examples doing exactly that), they refer to Claude in the third person despite first person measurably performing better, they try to shove everything into a single prompt, etc.
>I assume this capability is used internally (or a better one has been found)
By doing so they would force users to rewrite and re-eval their prompts (costly and unexpected, to put it mildly). Besides, they admitted it was way too crude (and indeed found a slightly better way), and from replications of their work it's known to be expensive and generally not feasible for this purpose.
“Is this thing dangerous?”
> Nope.
So it’ll need to be contained, and it’ll find its way to the warez groups, rinse, repeat.
1) A model with a system prompt: "you are a specialist in USDA dairy regulations".
2) A model fine-tuned to know a lot about USDA regulations related to dairy production.
The fine-tuned model is going to be a lot more effective at dealing with milk-related topics. In general the system prompt gets diluted quickly as the context grows, but the fine-tuning is baked into the model (rough sketch of both setups below).
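Roughly, the two setups look like this (using the common OpenAI-style chat format purely as an illustration; the model name and answer text are placeholders):

    # Option 1: steer at inference time with a system prompt. It competes with
    # everything else in the context window as the conversation grows.
    request_with_system_prompt = {
        "model": "gpt-4o-mini",  # placeholder model name
        "messages": [
            {"role": "system", "content": "You are a specialist in USDA dairy regulations."},
            {"role": "user", "content": "What are the Grade A labeling requirements?"},
        ],
    }

    # Option 2: bake the behavior in with fine-tuning. One chat-format training
    # example (of many); the knowledge no longer has to ride along in the context.
    fine_tuning_example = {
        "messages": [
            {"role": "user", "content": "What are the Grade A labeling requirements?"},
            {"role": "assistant", "content": "Under the Pasteurized Milk Ordinance, Grade A products must ..."},
        ],
    }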
And this repo provides no documentation about how they were extracted, which would be useful at least to try to verify them by replication.
It’s still curious that things like these need prompting, instead of there being an awareness mechanism from which this would be obvious to the LLM (given that the LLM knows its knowledge cutoff, in the above case).
Of course, I can imagine many things.
The base models are eerie. People have done some amazing creative work with them, but I honestly think the base models are so disconcerting as to effectively force nearly every R&D lab out there to run to instruction tuning and otherwise avoid having to work with base models.
I think dealing with the edge cases of the good, big base models is so frustrating, uncanny-valley, and alien that we're missing out on a lot of fun and creative use cases.
The performance hit from fine-tuning is what happens when the instruct-tuning and alignment post-training datasets distort the model of reality learned by the AI, and there are all sorts of unintended consequences, ranging from full-on Golden Gate Claude levels of delusion to nearly imperceptible biases.
Robopsychology is in its infancy, and I can't wait for the nuanced and skillful engineering of minds to begin.
There is a 3-level hierarchy:
System prompt > Developer prompt > User chat
You provide that middle level.
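Concretely (a sketch assuming the Anthropic Messages API; the model alias is a placeholder): the system parameter is the developer-level prompt, the messages are the user chat, and Claude.ai's own system prompt sits a level above all of this on the consumer product.

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    response = client.messages.create(
        model="claude-3-7-sonnet-latest",  # placeholder model alias
        max_tokens=512,
        # Developer prompt: the middle level you control via the API.
        system="Answer only from the attached policy documents.",
        messages=[
            # User chat: the bottom level.
            {"role": "user", "content": "What does the policy say about refunds?"},
        ],
    )
    print(response.content[0].text)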
If you haven’t read the system prompts before, you should.
Might change how you see things. Might change what you see.
https://docs.anthropic.com/en/release-notes/system-prompts
This is the ~24k-token, unofficial Claude 3.7 system prompt (as claimed): https://github.com/asgeirtj/system_prompts_leaks/blob/main/A...