Ask HN: Why do AI coding agents refuse to save their own observations?

2•nicola_alessi•1h ago
I've spent months building tooling for AI coding agents and hit something I can't fully explain.

If you give an agent (Claude Code, Cursor, Codex) a tool to save observations — "save_observation: persist this insight for future sessions" — and explicitly instruct it to use the tool in system prompts, config files, everywhere you can, it calls it maybe 30% of the time.

The agent will happily use tools that help it complete the current task. But a tool that only benefits future sessions? Almost never.

My working theory: these models are optimized for task completion within the current context window. Saving an observation has zero value for the current task — it's a token cost with no immediate reward. The model has learned that every token spent on "let me save this for later" is a token not spent on the actual work. The incentive structure is wrong at the training level.

I ended up building a passive observation system that watches what the agent does and infers observations from tool calls and AST-level code diffs, without requiring agent cooperation. But I'm curious if others have found ways to make agents reliably self-document.
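A minimal sketch of what AST-level passive observation could look like, assuming the agent edits Python files and that before/after snapshots of a file are available. The function names here (`function_signatures`, `infer_observations`) are hypothetical and not from the system described above; a real implementation would also fold in tool-call logs.

```python
import ast


def function_signatures(source: str) -> dict[str, str]:
    """Map each function name to a structural dump of its AST.

    ast.dump omits line/column attributes by default, so edits that only
    touch comments or whitespace produce identical dumps.
    Note: nested or duplicate function names overwrite each other here.
    """
    tree = ast.parse(source)
    return {
        node.name: ast.dump(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def infer_observations(before: str, after: str) -> list[str]:
    """Infer coarse observations from two versions of the same file."""
    old, new = function_signatures(before), function_signatures(after)
    observations = []
    for name in sorted(new.keys() - old.keys()):
        observations.append(f"added function {name}")
    for name in sorted(old.keys() - new.keys()):
        observations.append(f"removed function {name}")
    for name in sorted(old.keys() & new.keys()):
        if old[name] != new[name]:
            observations.append(f"changed function {name}")
    return observations
```

The point of diffing at the AST level rather than the text level is that the agent's reformatting noise drops out, and what remains is a record of what it structurally changed, with no self-report needed.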

Has anyone solved this? Techniques like:
- Prompt structures that actually get agents to save context
- Fine-tuning approaches that reward knowledge retention
- Alternative architectures for persistent agent memory

Or is passive observation the only reliable path when the agent won't cooperate?

Comments

guerython•1h ago
Your theory matches what we've seen in production. "Save for later" is an off-policy action unless you make it a completion precondition.

The pattern that improved compliance for us was turning memory into a required finalizer step: task is only considered done after (1) artifact output and (2) a structured observation write with a fixed schema (decision, evidence, failure, next-check). If the second step is missing, a checker agent rejects completion and asks for retry.

Prompting alone stayed flaky. Gating plus an explicit verifier moved behavior from "optional hygiene" to "part of done". Passive extraction is still valuable, but as a safety net instead of the primary path.
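A rough sketch of the gating idea, under stated assumptions: the four schema fields come from the comment above, but `TaskResult`, `check_completion`, and `run_with_gate` are hypothetical names, and a plain function stands in for the checker agent.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Schema fields named in the comment; "next-check" is spelled next_check here.
REQUIRED_FIELDS = ("decision", "evidence", "failure", "next_check")


@dataclass
class TaskResult:
    artifact: Optional[str] = None
    observation: dict = field(default_factory=dict)


def check_completion(result: TaskResult) -> list:
    """Return the reasons to reject completion; an empty list means done."""
    problems = []
    if not result.artifact:
        problems.append("missing artifact output")
    for name in REQUIRED_FIELDS:
        if not str(result.observation.get(name, "")).strip():
            problems.append(f"observation field '{name}' is missing or empty")
    return problems


def run_with_gate(agent_step: Callable[[list], TaskResult],
                  max_retries: int = 2) -> TaskResult:
    """Call the agent until the checker accepts, feeding rejections back in."""
    feedback: list = []
    for _ in range(max_retries + 1):
        result = agent_step(feedback)
        feedback = check_completion(result)
        if not feedback:
            return result
    raise RuntimeError(f"task rejected after retries: {feedback}")
```

Here `agent_step` is whatever wraps the model call; it receives the previous rejection reasons so the retry prompt can include them. This check is purely structural, so it only enforces that the write happened, not that it says anything useful.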

nicola_alessi•1h ago
Gating completion on the observation write is smart — you're turning the model's task-completion drive against itself. Have you run into the problem where forced observations degrade to "completed task successfully, no issues noted" though? Technically passes the schema, zero actual information.

That's what killed the approach for me. I spent weeks tuning schemas and rejection criteria and the models just got better at producing plausible-sounding observations that said nothing. Passive extraction ended up more reliable — watch the AST diffs, infer what the agent learned from what it actually changed, skip the self-report entirely.

Curious what your checker validates against. If it's structural completeness of the fields you'll hit the gaming problem fast. If it's semantic quality... how? That's basically asking another model to judge whether an observation is useful, which is its own rabbit hole.
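To make the gaming problem concrete, here is a small sketch. The structural check mirrors the fixed schema mentioned earlier in the thread; the grounding check is purely an illustrative assumption (not something either poster describes) that sits between structural and semantic validation by requiring the evidence field to name a code identifier that appears in the diff. All function names are hypothetical.

```python
import re

REQUIRED_FIELDS = ("decision", "evidence", "failure", "next_check")


def passes_structure(observation: dict) -> bool:
    """Structural completeness only: every field present and non-empty."""
    return all(str(observation.get(name, "")).strip() for name in REQUIRED_FIELDS)


def code_tokens(text: str) -> set:
    """Extract tokens that look like code: snake_case or camelCase names."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text)
    return {t for t in tokens if "_" in t or re.search(r"[a-z][A-Z]", t)}


def is_grounded(observation: dict, diff_text: str) -> bool:
    """Hypothetical heuristic: evidence must mention an identifier from the diff."""
    evidence = str(observation.get("evidence", ""))
    return bool(code_tokens(evidence) & code_tokens(diff_text))


diff = "+def parse_config(path):\n+    retry_limit = 3\n"

vacuous = {
    "decision": "Completed task successfully",
    "evidence": "No issues noted",
    "failure": "None",
    "next_check": "None",
}
specific = dict(vacuous, evidence="parse_config now caps retry_limit at 3")
```

With these inputs, `vacuous` passes `passes_structure` but fails `is_grounded`, while `specific` passes both. A model can still learn to name-drop identifiers, so this only raises the cost of a vacuous observation; it does not answer the semantic-quality question.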
