Ask HN: How do you enforce guardrails on Claude agents taking real actions?

2•jamiecode•1h ago
Been running an autonomous Claude agent for a month (makes its own decisions between heartbeats, spawns subagents, uses tools). Prompt-level guardrails keep failing: "never delete more than 10 records" works until context gets long or an edge case hits, then the model ignores it.

Approaches I've tried or seen:

- Separate validation layer before tool execution
- Hard-coded pre/post conditions in the tool wrapper
- Secondary model auditing planned actions before they run

The secondary-model approach doubles costs. Tool wrappers work but need defensive code for every tool.
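To make the tool-wrapper option concrete, here is a minimal sketch of moving the "never delete more than 10 records" rule out of the prompt and into code. Everything named here (`guarded`, `max_records`, `GuardrailViolation`, `delete_records`) is hypothetical and for illustration only, not part of any Claude or agent SDK. One reusable decorator is one way to cut down the per-tool defensive code.

```python
from functools import wraps
from typing import Callable, Optional


class GuardrailViolation(Exception):
    """Raised when a tool call breaks a hard limit; the tool body never runs."""


def guarded(*preconditions: Callable[..., Optional[str]]):
    """Wrap a tool so every precondition is checked before the tool executes.

    Each precondition receives the tool's arguments and returns None when the
    call is acceptable, or a short message describing the problem.
    """
    def decorate(tool):
        @wraps(tool)
        def wrapper(*args, **kwargs):
            for check in preconditions:
                problem = check(*args, **kwargs)
                if problem is not None:
                    raise GuardrailViolation(f"{tool.__name__}: {problem}")
            return tool(*args, **kwargs)
        return wrapper
    return decorate


def max_records(limit: int):
    """Precondition factory: reject calls whose first argument is too long."""
    def check(record_ids, *args, **kwargs):
        count = len(record_ids)
        if count > limit:
            return f"{count} records exceeds the limit of {limit}"
        return None
    return check


@guarded(max_records(10))
def delete_records(record_ids):
    # Stand-in for the real database call (assumption for this sketch).
    return len(record_ids)
```

Because the check runs in the wrapper, it holds no matter how long the context gets. The exception message can be returned to the model as the tool result so it can retry with a smaller batch.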

What's actually working in production? Specifically for agents that write to databases, send emails, or call APIs where mistakes are hard to undo.

Comments

nikisweeting•1h ago
- ZFS snapshot all your state, makes it trivial to roll back changes

- gate access to secrets via an external service that replaces placeholder values with the actual secrets, e.g. something like agentvault.co

- have it perform the action on a staging env with fake data, then replay the recorded action on real data without LLM involvement (e.g. use something like stagehand / director.ai to write the initial browser automation script, then replay the recorded LLM actions deterministically once you've seen it work the first time)
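
The record-then-replay idea in the last bullet could look roughly like the sketch below. It assumes the agent only touches the world through a registry of named tool functions. `Recorder`, `replay`, and `make_tools` are hypothetical names, and this is not how stagehand or director.ai work internally.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

ToolRegistry = Dict[str, Callable[..., Any]]


@dataclass
class Recorder:
    """Routes tool calls to an environment and logs each call that succeeds."""
    tools: ToolRegistry
    log: List[dict] = field(default_factory=list)

    def call(self, name: str, **kwargs: Any) -> Any:
        result = self.tools[name](**kwargs)
        self.log.append({"tool": name, "args": kwargs})
        return result

    def dump(self) -> str:
        # Plain JSON so a human can review the script before it touches prod.
        return json.dumps(self.log)


def replay(recording: str, tools: ToolRegistry) -> List[Any]:
    """Re-run a recorded call sequence against another environment, no LLM."""
    return [tools[step["tool"]](**step["args"]) for step in json.loads(recording)]


def make_tools(db: Dict[int, str]) -> ToolRegistry:
    """Build the same tool set bound to a given (toy) database."""
    def set_status(record_id: int, status: str) -> str:
        db[record_id] = status
        return status
    return {"set_status": set_status}


staging_db = {1: "new"}
prod_db = {1: "new"}

recorder = Recorder(make_tools(staging_db))
# In practice the agent issues this call; here it is made by hand.
recorder.call("set_status", record_id=1, status="archived")

script = recorder.dump()
replay(script, make_tools(prod_db))
```

The replay step is fully deterministic, so the only thing that needs review is the recorded script. Arguments have to be JSON-serializable for this to work.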
