frontpage.

1Password Raising Prices ~33%

61•iamben•4h ago•27 comments

Using "Hi Claudette" on Claude.ai

2•mlongval•38m ago•1 comment

Looking 4 open-source knowledge base and project management tool 4 personal use

3•TheAlgorist•1h ago•0 comments

Ask HN: Who has seen productivity increases from AI

6•Kapura•6h ago•4 comments

Ask HN: Chromebook leads for K-8 school in need?

45•techteach00•2d ago•43 comments

Ask HN: How do you know if AI agents will choose your tool?

28•dmpyatyi•1d ago•23 comments

Would you choose the Microsoft stack today if starting greenfield?

16•JB_5000•17h ago•14 comments

Ask HN: Any DIY open-source Alexa/Google alternatives?

6•personality0•10h ago•4 comments

Comparing manual vs. AI requirements gathering: 2 sentences vs. 127-point spec

2•thesssaism•10h ago•3 comments

ChatGPT finds an error in Terence Tao's math research

41•codexon•21h ago•6 comments

Ask HN: What Linux Would Be a Good Transition from Windows 11

11•Cyberis•21h ago•18 comments

Ask HN: How are you controlling AI agents that take real actions?

2•thesvp•13h ago•15 comments

Ask HN: Is it better to have no Agent.md than a bad one?

5•parvardegr•1d ago•8 comments

Ask HN: Where do you save links, notes and random useful stuff?

17•a_protsyuk•1d ago•39 comments

Does anyone use CrewAI or LangChain anymore?

7•rakan1•19h ago•3 comments

Ask HN: Programmable Watches with WiFi?

11•dakiol•3d ago•5 comments

Ask HN: What is up with all the glitchy and off-topic comments?

7•marginalia_nu•22h ago•3 comments

Ask HN: Why doesn't HN have a rec algorithm?

9•sujayk_33•2d ago•21 comments

GLP-1 Second-Order Effects

21•7777777phil•1d ago•9 comments

Explanation of JEPA – Yann LeCun's proposed solution to self-supervised learning

2•helloplanets•13h ago•1 comment

Ask HN: What breaks when you run AI agents unsupervised?

11•marvin_nora•2d ago•8 comments

Ask HN: Cognitive Offloading to AI

12•daringrain32781•2d ago•8 comments

Ask HN: What Comes After Markdown?

7•YuukiJyoudai•2d ago•13 comments

I'm 15 and built a platform for developers to showcase WIP projects

12•amin2011•3d ago•6 comments

So Claude's stealing our business secrets, right?

25•arm32•2d ago•18 comments

Ask HN: How are early-stage AI startups thinking about IP protection?

4•shaheeniquebal•1d ago•3 comments

Ask HN: If the "AI bubble" pops, will it really be that dramatic?

14•moomoo11•2d ago•11 comments

Back end where you just define schema, access policy, and functions

3•emilss•1d ago•5 comments

1Password pricing increasing up to 33% in March

87•otterley•4h ago•86 comments

Ask HN: Why don't software developers make medical devices?

7•piratesAndSons•3d ago•19 comments

Ask HN: How are you controlling AI agents that take real actions?

2•thesvp•13h ago
We're building AI agents that take real actions — refunds, database writes, API calls.

Prompt instructions like "never do X" don't hold up. LLMs ignore them when context is long or users push hard.

Curious how others are handling this:
- Hard-coded checks before every action?
- Some middleware layer?
- Just hoping for the best?

We built a control layer for this — different methods for structured data, unstructured outputs, and guardrails (https://limits.dev). Genuinely want to learn how others approach it.

Comments

chrisjj•12h ago
> Prompt instructions like "never do X" don't hold up. LLMs ignore them when context is long or users push hard.

Serious question. Assuming you knew this, why did you choose to use LLMs for this job?

thesvp•9h ago
Fair. We didn't choose LLMs to enforce rules — we chose them to understand intent. The enforcement happens outside the LLM entirely. That's the separation that actually holds up in production.
chrisjj•8h ago
> we chose them to understand intent

Yet they don't understand the intent of "Never do X" ?

thesvp•3h ago
Understanding intent and following instructions are different failure modes. LLMs are good at the first, unreliable at the second. That's exactly why enforcement lives outside the LLM.
chrisjj•2h ago
Software engineering has a word for that.

Kludge.

Good luck!

adamgold7•11h ago
Prompt guardrails are theater - they work until they don't. We ended up building sandboxed execution for each agent action. Agent proposes what it wants to do, but execution happens in an isolated microVM with explicit capability boundaries. Database writes require a separate approval step architecturally separate from the LLM context.

Worth looking at islo.dev if you want the sandboxing piece without building it yourself.

thesvp•9h ago
Sandboxed execution is solid for isolation — separating proposal from execution is the right architecture. The piece we kept hitting was the policy layer on top: who defines what the agent is allowed to propose in the first place, and how do you update those rules without a redeploy every time?
vincentvandeth•10h ago
Hard-coded checks before every action, plus a governance layer that separates "what the agent wants to do" from "what it's allowed to do." The deeper issue: if your agent decides whether to issue a refund, you're solving the wrong problem with prompt guards. A refund is a deterministic business rule — order exists, within return window, amount matches. That decision shouldn't be made by an LLM at all.

In my setup, agents propose actions and write structured reports. A deterministic quality advisory then runs — no LLM involved — producing a verdict (approve, hold, redispatch) based on pre-registered rules and open items. The agent can hallucinate all it wants inside its context window, but the only way its work reaches production is through a receipt that links output to a specific git commit, with a quality gate in between.

For anything with real consequences (database writes, API calls, refunds), the pattern is: LLM proposes → deterministic validator checks → human approves. The LLM never has direct write access to anything that matters.

"Just hoping for the best" works until it doesn't. We tracked every agent decision in an append-only ledger — after a few hundred entries, you start seeing exactly where and how agents fail. That pattern data is more useful than any prompt guard.
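The "LLM proposes → deterministic validator checks → human approves" pattern described above can be sketched in a few lines. All names here (Order, RefundProposal, validate_refund) and the 30-day window are illustrative assumptions, not details from the thread:

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical domain objects, for illustration only.
@dataclass
class Order:
    order_id: str
    total: float
    purchased_on: date

@dataclass
class RefundProposal:  # what the LLM emits after understanding intent
    order_id: str
    amount: float

RETURN_WINDOW = timedelta(days=30)  # assumed policy

def validate_refund(proposal: RefundProposal,
                    orders: dict[str, Order], today: date) -> str:
    """Deterministic gate: the LLM proposes, these rules decide."""
    order = orders.get(proposal.order_id)
    if order is None:
        return "reject: order does not exist"
    if today - order.purchased_on > RETURN_WINDOW:
        return "reject: outside return window"
    if proposal.amount > order.total:
        return "reject: amount exceeds order total"
    return "hold-for-human"  # passed every rule; a person still approves

orders = {"A1": Order("A1", 49.99, date(2025, 1, 10))}
print(validate_refund(RefundProposal("A1", 20.0), orders, date(2025, 1, 20)))
# hold-for-human
```

The point of the sketch: the LLM can hallucinate any proposal it likes, but nothing reaches production without passing rules that never consult the model.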

thesvp•9h ago
The separation between 'what the agent wants to do' and 'what it's allowed to do' is the right mental model.

The append-only ledger point is underrated too — pattern data from real failures is worth more than any upfront rule design.

How long did it take to build and maintain that governance layer? And as your agent evolves, do the rules keep up or is that becoming its own maintenance burden?

vincentvandeth•8h ago
About 6 months of iterating, but in bursts — I built it while using it on a production project, so the governance layer grew alongside real failure modes rather than being designed upfront.

The maintenance question is the right one. The rules themselves are low-maintenance because they're deliberately simple and deterministic — file size limits, test coverage thresholds, blocker counts. They don't need updating when the model changes because they don't depend on LLM behavior.

What does evolve is the dispatch templates — how I scope tasks and what context I give agents upfront. That's where the ledger pays for itself. After 1100+ receipts, I can see patterns like "tasks scoped above 300 lines fail 3x more often" or "planning gates without explicit deliverables always need redispatch." Those patterns feed back into how I write dispatches, not into the rules themselves.

So the rules stay stable, but the way I use the system keeps improving. The governance layer is the boring part — the interesting part is the feedback loop from receipts to dispatch quality.
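The simple, model-independent rules described above (file size limits, coverage thresholds, blocker counts) might look like the following sketch. The 300-line limit echoes the pattern mentioned in the comment; the other thresholds are invented for illustration:

```python
# Deterministic quality gate: plain thresholds that don't depend on LLM
# behavior, so they stay stable even when the underlying model changes.
LIMITS = {"max_blockers": 0, "max_lines_changed": 300, "min_coverage": 0.80}

def verdict(report: dict) -> str:
    """Map an agent's structured report to approve / hold / redispatch."""
    if report["blockers"] > LIMITS["max_blockers"]:
        return "redispatch"  # open blockers: send the task back
    if report["lines_changed"] > LIMITS["max_lines_changed"]:
        return "hold"        # over-scoped change: needs human review
    if report["coverage"] < LIMITS["min_coverage"]:
        return "hold"
    return "approve"

print(verdict({"blockers": 0, "lines_changed": 120, "coverage": 0.91}))
# approve
```

Because the gate only reads numbers out of a report, updating policy means editing one dict, not retraining or re-prompting anything.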

thesvp•3h ago
6 months and 1100+ receipts to get to useful patterns — that's the hidden cost nobody talks about. The governance layer is 'boring' but it's also 6 months you're not spending on the actual agent. That feedback loop from receipts to dispatch quality is exactly what we're building as infrastructure so teams don't start from zero.
vincentvandeth•2h ago
Fair point on the time cost — but I'd frame it differently. The 6 months wasn't spent building a governance layer instead of building the agent. The governance layer grew out of the actual project work. Every receipt, every quality rule, every dispatch pattern was a direct response to something that broke in production. Day one I had zero governance and a working agent. By month six I had 1100+ receipts and a system that catches failures before they ship.

The infrastructure approach makes sense for teams who want to skip the learning curve. The trade-off is that pre-built governance rules are generic by definition — they can't know that your specific codebase breaks when tasks exceed 300 lines, or that planning gates without explicit deliverables always need redispatch. That pattern data only comes from running your own agents on your own work.

Curious what you're building — is it the ledger/tracking layer, the quality gates, or the full orchestration?

wmeredith•6h ago
> A refund is a deterministic business rule — order exists, within return window, amount matches. That decision shouldn't be made by an LLM at all.

I feel like this is the real key. LLMs are good at some things and bad at others. Deterministic logic (e.g. don't ever do "x") is not one of them.

apothegm•9h ago
Just treat the LLM as an NLP interface for data input. Still run the inputs against a deterministic heuristic for whether the action is permitted (or depending on the context, even for determining what action is appropriate).

LLMs ignore instructions. They do not have judgement, just the ability to predict the most likely next token (with some chance of selecting one other than the absolutely most likely). There’s no way around that. If you need actual judgement calls, you need actual humans.
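One way to read the suggestion above: the LLM only parses free text into a structured request, and a plain allow-list decides what is permitted. The action names and tiers here are made up for illustration:

```python
# The LLM's only job is to turn "please cancel my account" into a
# structured request such as {"action": "cancel_subscription"}.
# Whether the action runs is then a lookup, not a judgment call.
ALLOWED = {"lookup_order", "update_address"}           # safe to auto-execute
NEEDS_HUMAN = {"issue_refund", "cancel_subscription"}  # real consequences

def decide(request: dict) -> str:
    action = request.get("action")
    if action in ALLOWED:
        return "execute"
    if action in NEEDS_HUMAN:
        return "escalate"
    return "deny"  # default-deny: unknown actions never run

print(decide({"action": "lookup_order"}))   # execute
print(decide({"action": "issue_refund"}))   # escalate
print(decide({"action": "drop_database"}))  # deny
```

Default-deny is the key choice: an action the policy has never heard of is rejected rather than attempted, no matter how confidently the model proposed it.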

thesvp•9h ago
Exactly right - the deterministic layer is the only thing you can actually trust.

We landed on the same pattern: LLM handles the understanding, hard rules handle the permission. The tricky part is maintaining those rules as the agent evolves. How are you managing rule updates: code changes every time, or something more dynamic?