AgentUQ uses provider logprobs to localize brittle action-bearing spans in an agent step and route actions like continue, retry, verify, ask for confirmation, or block.
The claim is intentionally narrow. It isn’t trying to determine truth. I know uncertainty work can feel theoretical, so I wanted to test whether a smaller operational use of it could be useful in practice.
My bigger belief is that agents need infrastructure that learns from production history (failed runs as well as unconfident ones) instead of just accumulating patches. This is one concrete experiment in that direction.