I built FailWatch because I couldn't trust my financial AI agent with a production wallet. No matter how much I optimized the system prompt (e.g., "Do not refund > $500"), the LLM would occasionally hallucinate or drift logically.
The scariest part wasn't the hallucination itself, but the failure mode: if my external validation service crashed or timed out, the default behavior in many frameworks was to "fail-open" and execute the tool anyway.
FailWatch is a Python middleware that sits between the agent and the tool execution to enforce fail-closed safety:
Math > Prompts: It uses deterministic Python logic (Pydantic/regex validators) for hard constraints; see the first sketch below.
Fail-Closed Architecture: If the guard server is unreachable or times out, the action is blocked by default.
Logic Drift Detection: It can optionally inspect the agent's "chain of thought" steps to detect intent mismatch before execution; see the second sketch below.
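To make the first two points concrete, here's a minimal sketch of the pattern (hypothetical names and endpoint, not FailWatch's actual API): the $500 limit lives in a Pydantic model rather than the prompt, and any error reaching the guard server blocks the call.

    import requests
    from pydantic import BaseModel, Field, ValidationError

    class RefundArgs(BaseModel):
        amount: float = Field(gt=0, le=500)  # the $500 hard limit lives in code
        account_id: str

    GUARD_URL = "http://localhost:8000/validate"  # hypothetical guard endpoint

    def execute_refund(args: RefundArgs) -> None:
        # Stand-in for the real tool call.
        print(f"refunded ${args.amount:.2f} to {args.account_id}")

    def guarded_refund(raw_args: dict) -> bool:
        # 1. Deterministic constraint: anything Pydantic rejects never runs.
        try:
            args = RefundArgs(**raw_args)
        except ValidationError:
            return False

        # 2. Remote guard check: any network error or timeout fails CLOSED.
        try:
            resp = requests.post(GUARD_URL, json=args.model_dump(), timeout=2)
            resp.raise_for_status()
            approved = bool(resp.json().get("approved", False))
        except requests.RequestException:
            approved = False  # guard down or unreachable -> block by default

        if approved:
            execute_refund(args)
        return approved

    # A $9,000 refund is rejected by the Pydantic model; a $120 refund is
    # still blocked if the guard server can't be reached.
    guarded_refund({"amount": 9000, "account_id": "acct_42"})  # -> False
    guarded_refund({"amount": 120, "account_id": "acct_42"})

The key design choice is that "approved" starts from False on every error path, so a crashed or slow guard can never silently let the tool run.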
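The second sketch is a purely illustrative drift check, not FailWatch's actual logic: compare the agent's last reasoning step against the call it is about to make. A real implementation would need something sturdier (a small classifier or embedding similarity), but even a crude check catches the classic mismatch.

    def intent_matches(reasoning_steps: list[str], tool_name: str, args: dict) -> bool:
        # Naive heuristic: the final reasoning step should mention both the
        # tool being called and the amount it is about to move.
        last = reasoning_steps[-1].lower() if reasoning_steps else ""
        mentions_tool = tool_name.lower() in last
        mentions_amount = str(args.get("amount", "")) in last
        return mentions_tool and mentions_amount

    # The agent reasoned about a $120 refund but is calling with amount=9000:
    steps = ["Customer is owed $120.", "I will call refund for 120 dollars."]
    print(intent_matches(steps, "refund", {"amount": 9000}))  # False -> block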
It's open source (MIT). I'd love to hear feedback on the architecture or how you handle "safety-critical" tool calls in your agents.