I built an open-source library to enforce these logic/safety rules outside the model loop: https://github.com/imtt-dev/steer
Unlike a student, an LLM never arrives at a sort of epistemic coherence, where it knows what it knows, how it knows it, and how true it's likely to be. So you have to structure every problem into a format where the response can be evaluated against an external source of truth.
It's like writing a script with the attitude of "yeah, I'm good at this, I don't need to actually run it to know it works" - well, it likely won't work, maybe because of a trivial mistake.
Technically not, we just don't have it high enough
You're doing exactly what you said you wouldn't though. Betting that network requests are more reliable than an LLM: fixing probability with more probability.
Not saying anything about the code - I didn't look at it - but just wanted to highlight the hypocritical statements which could be fixed.
An LLM provides inputs to your system like any human would, so you have to validate them. Something like Pydantic or Django forms is good for this.
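For example, a minimal sketch of that idea, assuming Pydantic v2 and a made-up order schema (the field names are mine, not from any real system):

    # Treat LLM output as untrusted user input and validate it before use.
    from pydantic import BaseModel, Field, ValidationError

    class Order(BaseModel):
        sku: str = Field(min_length=1)
        quantity: int = Field(gt=0)
        price: float = Field(gt=0)

    llm_output = '{"sku": "A-123", "quantity": 2, "price": 9.99}'  # hypothetical model response

    try:
        order = Order.model_validate_json(llm_output)
    except ValidationError as err:
        print(err)  # reject or retry, exactly as you would with a malformed web form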
Anyway, I've written a library in the past (way way before LLMs) that is very similar. It validates stuff and outputs translatable text saying what went wrong.
Someone ported the whole thing (core, DSL and validators) to python a while ago:
https://github.com/gurkin33/respect_validation/
Maybe you can use it. It seems it would save you time by not having to write so many verifiers: just use existing validators.
I would use this sort of thing very differently though (as a component in data synthesis).
> The next time the agent runs, that rule is injected into its context. It essentially allows me to “Patch” the model’s behavior without rewriting my prompt templates or redeploying code.
Must be satire, right?
When Steer catches a failure (like an agent wrapping JSON in Markdown), it doesn’t just crash.
Say you are using AI slop without saying you are using AI slop.
> It's not X, it's Y.
Your investment is justified! I promise! There's no way you've made a devastating financial mistake!
I’ve found after about 3 prompts to edit an image with Gemini, it will respond randomly with an entirely new image. Another quirk is it will respond “here’s the image with those edits” with no edits made. It’s like a toaster that will catch on fire every eighth or ninth time.
I am not sure how to mitigate this behavior. I think maybe an LLM-as-a-judge step with vision could evaluate the output before passing it on to the poor user.
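Something like the sketch below, maybe. It assumes the OpenAI Python SDK and a vision-capable model; the model name, prompt, and function name are my own guesses, not a tested recipe.

    from openai import OpenAI

    client = OpenAI()

    def edit_was_applied(instruction: str, image_url: str) -> bool:
        """Ask a vision model to judge whether the requested edit is visible."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # any vision-capable model would do
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": f"Was this edit applied to the image: '{instruction}'? Answer YES or NO."},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }],
        )
        answer = response.choices[0].message.content.strip().upper()
        return answer.startswith("YES")

    # Only pass the image on to the user if the judge agrees the edit happened.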
o Claude goes away for 15 minutes, doesn't profile anything, many code changes.
o Announces project now performs much better, saving 70% CPU.
- Claude, test the performance.
o Performance is 1% _slower_ than previous.
- Claude, can I have a refund for the $15 you just wasted?
o [Claude waffles], "no".
This is the loop (and honestly, I predicted it way before it started):
1) LLMs can generate code from "natural language" prompts!
2) Oh wait, I actually need to improve my prompt to get LLMs to follow my instructions...
3) Oh wait, no matter how good my prompt is, I need an agent (aka a for loop) that goes through a list of deterministic steps so that it actually follows my instructions...
4) Oh wait, now I need to add deterministic checks (aka, the code that I was actually trying to avoid writing in step 1) so that the LLM follows my instructions...
5) <some time in the future>: I came up with this precise set of keywords that I can feed to the LLM so that it produces the code that I need. Wait a second... I just turned the LLM into a compiler.
The error is believing that "coding" is just accidental complexity. "You don't need a precise specification of the behavior of the computer" - this is the assumption that would have to hold for LLM agents to actually be viable. And I cannot believe that there are software engineers who think coding is accidental complexity. I understand why PMs, CEOs, and other fun people believe this.
Side note: I am not arguing that LLMs/coding agents aren't nice. T9 was nice, autocomplete is nice. LLMs are very nice! But I am getting a bit too fed up with seeing everyone believe that you can get rid of coding.
Models definitely need less and less of this with each version that comes out, but it's still what you need to do today if you want to be able to trust the output. And even in a future where models approach perfection, I think this approach will be the way to reduce latency and keep tabs on whether your prompts are producing the output you expected on a larger scale. You will also be building good evaluation data for testing alternative approaches, or even fine-tuning.
jqpabc123•3d ago
Thanks for pointing out the elephant in the room with LLMs.
The basic design is non-deterministic. Trying to extract "facts" or "truth" or "accuracy" is an exercise in futility.
steerlabs•3d ago
My thesis isn't that we can stop the hallucinating (non-determinism), but that we can bound it.
If we wrap the generation in hard assertions (e.g., assert response.price > 0), we turn 'probability' into 'manageable software engineering.' The generation remains probabilistic, but the acceptance criterion becomes binary and deterministic.
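In plain Python that might look something like the sketch below; the function and field names are stand-ins, not Steer's actual API:

    import json

    def generate_quote(prompt: str) -> str:
        """Stand-in for whatever LLM call you use; returns the model's raw text."""
        return '{"price": 12.5, "currency": "USD"}'  # hypothetical model output

    def accept(raw: str) -> dict:
        response = json.loads(raw)    # must parse as JSON at all
        assert response["price"] > 0  # the hard assertion from above
        return response

    for _ in range(3):  # bound the retries, and with them the non-determinism
        try:
            result = accept(generate_quote("Quote me a price for part #42"))
            break
        except (AssertionError, KeyError, ValueError):
            continue
    else:
        raise RuntimeError("model never produced an acceptable response")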
jqpabc123•2d ago
Unfortunately, the use-case for AI is often where the acceptance criteria is not easily defined --- a matter of judgment. For example, "Does this patient have cancer?".
In cases where the criteria can be easily and clearly stipulated, AI often isn't really required.
steerlabs•15h ago
My thesis is that even in those "fuzzy" workflows, the agent's process is full of small, deterministic sub-tasks that can and should be verified.
For example, before the AI even attempts to analyze the X-ray for cancer, it must: 1/ Verify it has the correct patient file (PatientIDVerifier). 2/ Verify the image is a chest X-ray and not a brain MRI (ModalityVerifier). 3/ Verify the date of the scan is within the relevant timeframe (DateVerifier).
These are "boring," deterministic checks. But a failure on any one of them makes the final "judgment" output completely useless.
steer isn't designed to automate the final, high-stakes judgment. It's designed to automate the pre-flight checklist, ensuring the agent has the correct, factually grounded information before it even begins the complex reasoning task. It's about reducing the "unforced errors" so the human expert can focus only on the truly hard part.
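As a rough illustration of that pre-flight checklist (the verifier names are from the comment above; the code is a sketch, not Steer's API):

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class ScanRequest:
        patient_id: str
        expected_patient_id: str
        modality: str
        scan_year: int

    def patient_id_verifier(req: ScanRequest) -> bool:
        return req.patient_id == req.expected_patient_id

    def modality_verifier(req: ScanRequest) -> bool:
        return req.modality == "chest_xray"

    def date_verifier(req: ScanRequest) -> bool:
        return req.scan_year >= 2023  # hypothetical "relevant timeframe"

    PREFLIGHT: list[tuple[str, Callable[[ScanRequest], bool]]] = [
        ("PatientIDVerifier", patient_id_verifier),
        ("ModalityVerifier", modality_verifier),
        ("DateVerifier", date_verifier),
    ]

    def run_preflight(req: ScanRequest) -> None:
        for name, check in PREFLIGHT:
            if not check(req):
                # Fail fast: the expensive judgment step never runs on bad inputs.
                raise ValueError(f"Pre-flight check failed: {name}")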
scotty79•49m ago
Which is kind of crazy because we don't even treat people as databases. Or at least we shouldn't.
Maybe it's one of those things that will disappear from culture one funeral at a time.
philipallstar•32m ago
The overwhelming majority of what?
fzeindl•25m ago
"Willison’s insight was that this isn’t just a filtering problem; it’s architectural. There is no privilege separation, and there is no separation between the data and control paths. The very mechanism that makes modern AI powerful - treating all inputs uniformly - is what makes it vulnerable. The security challenges we face today are structural consequences of using AI for everything."
- https://www.schneier.com/crypto-gram/archives/2025/1115.html...
HarHarVeryFunny•22m ago
You can't blame an LLM for getting the facts wrong, or hallucinating, when by design they don't even attempt to store facts in the first place. All they store are language statistics, boiling down to "with preceding context X, most statistically likely next words are A, B or C". The LLM wasn't designed to know or care that outputting "B" would represent a lie or hallucination, just that it's a statistically plausible potential next word.
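A toy illustration of what "language statistics" means here, using nothing but a word-frequency table:

    from collections import Counter, defaultdict

    corpus = "the sky is blue the sky is green the sky is blue".split()

    # Count which word tends to follow which: frequencies, not facts.
    next_word = defaultdict(Counter)
    for prev, cur in zip(corpus, corpus[1:]):
        next_word[prev][cur] += 1

    # The "model" happily ranks both continuations of "is"; it has no notion
    # that one of them might be a lie.
    print(next_word["is"].most_common())  # [('blue', 2), ('green', 1)]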
DoctorOetker•21m ago
When numeric models are fit to, say, scientific measurements, they do quite a good job of modeling the probability distribution. With a corpus of text we are not modeling truths but claims. The corpus contains contradicting claims. Humans have conflicting interests.
Source-aware training (which can't be done as an afterthought LoRA tweak, but needs to be done during base model training, AKA pretraining) could enable LLMs to express which answers apply according to which sources. It could provide a review of competing interpretations and opinions, and source every belief, instead of having to rely on tool use / search engines.
None of the base model providers would do it at scale, since it would reveal the corpus and result in attribution.
In theory, entities like the European Union could mandate that LLMs used for processing government data, or sensitive citizen / corporate data, MUST be trained source-aware, which would improve the situation, also making the decisions and reasoning more traceable. This would also ease the discussions and arguments about copyright issues, since it is clear LLMs COULD BE MADE TO ATTRIBUTE THEIR SOURCES.
I also think it would be undesirable to eliminate speculative output; it should just be marked explicitly:
"ACCORDING to <source(s) A(,B,C,..)> this can be explained by ...., ACCORDING to <other school of thought source(s) D,(E,F,...)> it is better explained by ...., however I SUSPECT that ...., since ...."
If it could explicitly separate the schools of thought sourced from the corpus, and also separate its own interpretations and mark them as LLM-speculated suspicions, then we could still have the traceable references without losing the potential novel insights LLMs may offer.
DoctorOetker•12m ago
https://arxiv.org/abs/2404.01019
"Source-Aware Training Enables Knowledge Attribution in Language Models"
sweezyjeezy•15m ago
I don't think using deterministic / stochastic as the dividing property is useful here if we're talking about a tool to mimic humans. Describing a human coder as 'deterministic' doesn't seem right - if you give one the same tasks under different environmental conditions, I don't think you get exactly the same outputs either. I think what we're really talking about is some sort of fundamental 'instability' of LLMs, à la chaos theory.
pydry•14m ago