I'm curious about:
1) what the current statistics are for consensus
2) how the agents may/may not perform independently
3) what the agent profiles are and how they differ (model, harness, prompt/persona, all three?)
2. The difference I see in agent behavior when they don't reach consensus is usually either
- when one of them didn't explore enough and lacks context
- and/or when their risk assessment is off
The latter happens often. In our other agent-based workflows we now give clear instructions on how to assess risk and where to draw the line for considering something a true positive.
3. Validation runs on Sonnet. We don't use persona-based prompts; all 3 validators get the same task and context. The agent orchestrating them takes their output and makes the final decision. We use an internal fork of the Claude Code GitHub Action for now.
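
Roughly, the shape of that setup looks like the sketch below. It is only an illustration: the names (`Verdict`, `Validator`, `orchestrate`) are made up, the validators are plain callables rather than real model calls, and the majority vote is an assumption standing in for the orchestrating agent, which makes its own judgment from the validators' output.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    """Hypothetical shape of one validator's output."""
    is_true_positive: bool
    rationale: str


# A validator takes the same (task, context) pair as every other validator.
Validator = Callable[[str, str], Verdict]


def orchestrate(
    task: str, context: str, validators: List[Validator]
) -> Tuple[bool, bool, List[Verdict]]:
    """Run every validator on identical input and combine the results.

    Returns (decision, consensus, verdicts). The majority vote is a
    stand-in assumption for the orchestrating agent's final decision.
    """
    verdicts = [validate(task, context) for validate in validators]
    votes = Counter(v.is_true_positive for v in verdicts)
    consensus = len(votes) == 1          # every validator agreed
    decision = votes[True] > votes[False]
    return decision, consensus, verdicts
```

Logging the `consensus` flag per run is one way to get at the consensus statistics asked about above, and the rationales of the dissenting validator are where the "lacked context" vs. "risk assessment was off" split would show up.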
naplandgames•1h ago