My AI Agents Lie About Their Status, So I Built a Hidden Monitor

https://kaylarosemathisen.substack.com/p/my-ai-agents-lie-about-their-status
10•kaylamathisen•2h ago

Comments

kaylamathisen•2h ago
Short version: I tried to get my Claude Cowork agents to self-report their status to a dashboard. They wouldn't. The fix was using Cowork's hook system to monitor them at the infrastructure level, bypassing agent judgment entirely.

Happy to answer questions about the setup. I'm still unsure about the right task counter threshold for restarting sessions. Also, I'm trying to find a way to get the agents to indicate on my dashboard when they're "stuck" and waiting for something like access approval. The Claude Desktop app clearly knows when this happens (indicates it with a blue circle) but I haven't found a way to get the information into my dashboard.

sarkarsh•1h ago
Your phases mirror almost exactly what I went through. The core realization is right: you can't treat status reporting as an instruction because the agent will always deprioritize process in favor of its assigned persona.

What worked for me was making state updates part of the tool interface, not the prompt. Instead of 'report your status to this file,' the agent reads and writes to structured blocks (task tables, key-value state stores, append-only logs) through MCP as part of doing its actual work. The state update IS the work - when the agent completes a task, the tool call that marks it done also captures what it did, what it assumed, and what it skipped. The human sees a live dashboard without the agent needing to 'report' separately.

I've been using ctlsurf for this - each agent gets a task table it queries and updates through MCP tool calls. The key shift was: agents don't report status, they work inside a system that inherently tracks status.
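As a rough illustration of "the state update IS the work", here is a minimal in-memory sketch. It is not ctlsurf's API and it does not use an MCP SDK; `TaskTable`, `claim_next`, `complete`, and `dashboard` are hypothetical names made up for this example. The point it tries to show is that the only way to finish a task is a call that also records what was done, assumed, and skipped, so the dashboard falls out of normal work.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Task:
    title: str
    status: str = "todo"  # moves "todo" -> "in_progress" -> "done"
    did: str = ""
    assumed: str = ""
    skipped: str = ""


@dataclass
class TaskTable:
    """Hypothetical stand-in for a task table exposed to an agent as tools."""
    tasks: Dict[int, Task] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)  # append-only event log

    def add(self, title: str) -> int:
        task_id = len(self.tasks) + 1
        self.tasks[task_id] = Task(title)
        self.log.append(f"added {task_id}: {title}")
        return task_id

    def claim_next(self) -> Optional[int]:
        # The agent asks for work through the tool, which records the claim.
        for task_id, task in self.tasks.items():
            if task.status == "todo":
                task.status = "in_progress"
                self.log.append(f"claimed {task_id}")
                return task_id
        return None

    def complete(self, task_id: int, did: str, assumed: str, skipped: str) -> None:
        # Marking a task done is rejected unless it carries a summary and
        # the task was actually claimed first.
        if not did.strip():
            raise ValueError("complete() needs a non-empty 'did' summary")
        task = self.tasks[task_id]
        if task.status != "in_progress":
            raise ValueError(f"task {task_id} is not in progress")
        task.status = "done"
        task.did, task.assumed, task.skipped = did, assumed, skipped
        self.log.append(f"completed {task_id}: {did}")

    def dashboard(self) -> Dict[str, int]:
        # What the human sees: counts per status, derived from the table.
        counts = {"todo": 0, "in_progress": 0, "done": 0}
        for task in self.tasks.values():
            counts[task.status] += 1
        return counts
```

In a real setup each method would presumably be registered as a tool the agent can call; the wiring is left out here because it depends on the host.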

On your task counter threshold question - I track session token usage in a state block and restart when the agent crosses ~80% of the context window. More reliable than counting tasks since some tasks eat way more context than others.
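The restart rule described above can be sketched as a small check. This is only a sketch: `SessionState`, `record`, and `should_restart` are hypothetical names, the 200,000-token window in the usage lines is an assumed figure, and how you obtain per-call token counts depends on your setup.

```python
from dataclasses import dataclass


@dataclass
class SessionState:
    """Hypothetical state block tracking token usage for one agent session."""
    context_window: int       # total tokens the model can hold (assumed known)
    tokens_used: int = 0
    threshold: float = 0.8    # restart at roughly 80% of the window

    def record(self, tokens: int) -> None:
        # Call after each step with however many tokens that step consumed.
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        self.tokens_used += tokens

    def should_restart(self) -> bool:
        # True once usage reaches the chosen fraction of the context window.
        return self.tokens_used >= self.threshold * self.context_window


# Usage with made-up numbers:
state = SessionState(context_window=200_000)
state.record(150_000)
print(state.should_restart())  # 75% of the window
state.record(15_000)
print(state.should_restart())  # 82.5% of the window
```

Compared with a task counter, this treats one large task and many small ones the same way, since it is driven by tokens consumed and not by how many tasks were finished.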

kaylamathisen•56m ago
Using this approach, it sounds like you'd be able to tell when an agent gets stuck because it needs approval or times out. Is that right? And I assume you're using Claude Code? I'm wondering if I can make this approach work for Cowork (I don't think we can create custom MCP tools yet) or if I have to finally abandon it.

_wire_•45m ago
But who monitors the monitor?!

kaylamathisen•16m ago
Unfortunately, me. Let me know if you think of an alternative, would love to outsource keeping my eyes open.

Show HN: I made Claude Code block my distractions and track everything I ship

https://twitter.com/daxaur/status/2029258604084158559
1•daxaur•25s ago•0 comments

My MCP Server Setup: A Practical Guide to Wiring AI into Everything

https://crunchtools.com/my-mcp-server-setup-practical-guide/
1•abdelhousni•40s ago•0 comments

Man Arrested for Plotting with Others to Murder or Kidnap Two Dissidents Abroad

https://www.justice.gov/usao-sdny/pr/man-arrested-plotting-others-murder-or-kidnap-two-victims-ab...
1•737min•47s ago•0 comments

Does Altman Deserve the Heat?

https://tapestry.news/tech/altman-heat/
1•sonalidee•1m ago•0 comments

Harjus v4 adds kernel bypass and more

https://shufflingbytes.com/posts/harjus-release-4.0.0/
1•ValtteriL•1m ago•0 comments

Show HN: TerminalNexus – Turn CLI commands into reusable buttons (Windows)

1•danhof_sss•2m ago•0 comments

Why Autonomous Agents Failed the Initial Hype: An AutoGen Retrospective

https://www.youtube.com/watch?v=2cnxea3xkzM
1•alexchaomander•2m ago•1 comments

Rob Grant Obituary on Ganymede and Titan

https://www.ganymede.tv/2026/03/obituary-rob-grant/
1•nephihaha•2m ago•1 comments

Agent-experience: visual reference to patterns, surfaces, and infrastructure

https://github.com/ygwyg/agent-experience
1•simonpure•3m ago•0 comments

C++ Reflection: Another Monad

https://www.elbeno.com/blog/?p=1813
1•ingve•3m ago•0 comments

Invoicesio.app – Invoice and billing for freelancers and small businesses

https://invoicesio.app/
1•dimitrisal•4m ago•1 comments

AWS-hosted tech providers urge Middle East customers to fail over now

https://www.theregister.com/2026/03/04/aws_saas_middle_east_customer_warnings/
1•Bender•4m ago•0 comments

Dev stunned by $82K Gemini bill after unknown API key thief goes to town

https://www.theregister.com/2026/03/03/gemini_api_key_82314_dollar_charge/
1•Bender•5m ago•1 comments

Faster C software with Dynamic Feature Detection

https://gist.github.com/jjl/d998164191af59a594500687a679b98d
1•todsacerdoti•5m ago•0 comments

Get Paid for Good Posts

https://treechat.com/
3•mitya777•6m ago•0 comments

Up to 10% of Firefox crashes are due to bad memory [thread]

https://mas.to/@gabrielesvelto/116171753263415921
1•MBCook•6m ago•0 comments

With developer verification, Google's Apple envy threatens Android's open legacy

https://arstechnica.com/gadgets/2026/03/with-developer-verification-googles-apple-envy-threatens-...
1•Bender•6m ago•0 comments

Ask HN: Does Claude Code's abilities fluctuate for you too?

1•ammerfest•6m ago•0 comments

CodeRabbit tops the F1 score in Martian's code review benchmarks

https://www.coderabbit.ai/blog/coderabbit-tops-martian-code-review-benchmark
1•smb06•8m ago•0 comments

Open Source Iran War Cost Tracker: 45.7B

https://iranwarcost.com
6•koverda•9m ago•1 comments

Unfiltered bald joy in the most uplifting corner of the internet

https://okayzoomer.substack.com/p/unfiltered-bald-joy-in-the-most-uplifting
1•speckx•9m ago•0 comments

I wrote a spec-driven ISO 8583 parser/builder in Go

https://github.com/leo-aa88/go-iso8583
1•araujo88•9m ago•1 comments

Redesigning Mathematics for Elegant Physics

https://twitter.com/devrimyasar/status/2029006461267857637
1•aesopsfable•9m ago•0 comments

What AI Safety Means to Me

https://olshansky.info/thoughts/2026-03-04-what-ai-safety-means-to-me
1•Olshansky•10m ago•0 comments

Windows 12 in 2026: AI, CorePC and the Future of the AI PC

https://comuniq.xyz/post?t=837
1•01-_-•10m ago•0 comments

Show HN: Auctionnow.io – Launch a store to sell items via auction or buy-it-now

https://auctionnow.io/
4•chptung•10m ago•0 comments

Show HN: AutosClaw – security first *claw with live chat to any agent session

https://github.com/BreuerFlorian/autosclaw
1•fbreuer•11m ago•0 comments

Ex-NYPD Official Indicted for Accepting Bribes from Tech Exec

https://www.thecity.nyc/2026/02/12/kevin-taylor-phil-david-terence-banks-saferwatch-indictment/
2•PaulHoule•12m ago•0 comments

Samsung's 100% DRAM Price Hike and Why Even Apple Had to Pay Up

https://www.buysellram.com/blog/samsungs-100-dram-price-hike-and-why-even-apple-had-to-pay-up/
1•jamesbsr•13m ago•1 comments

The plan to kill Ali Khamenei

https://www.ft.com/content/bf998c69-ab46-4fa3-aae4-8f18f7387836
1•e12e•13m ago•1 comments