Hey HN — Dave here. While building a multi-agent system recently, I kept noticing that the diagrams of agents passing feedback to each other looked just like the electrical circuit diagrams from control theory. I got curious whether the established math transferred, and to my surprise it did. LoopGain is the first product of that research: an open-source library that replaces the max_iterations=N cap on agent loops with an actual measurement of whether the loop is still improving.
The cap is how nearly every verify-revise loop gets stopped today, and it's wrong in both directions. Stop too early and you clip a loop that was still improving. Stop too late and you pay for iterations after the loop already found its best answer — and ship the final attempt, which is sometimes worse than one it already had.
On a 2,000-trial benchmark (paired real-API runs, five loop patterns across six framework adapters, three model providers, pre-registered protocol with kill criteria), LoopGain cut total API spend by 92.8% vs max_iterations=20 ($27.05 → $1.94) and median wall-clock ~15× (30.9s → 2.1s). A cross-vendor judge preferred LoopGain's outputs on the weighted average (0.678 across 1,800 pairwise comparisons) — mostly because LoopGain returns the iteration with the lowest error it saw, while a fixed cap ships whatever the final iteration produced. The raw data and methodology are public, and the full run is browsable at dashboard.loopgain.ai/benchmark.
How it works: each iteration, LoopGain takes the ratio of the current error to the previous error — the loop's empirical loop gain (Aβ, borrowed from control theory). Aβ<1 means the error shrank — the loop is improving. Aβ≥1 means it held or grew — the loop is stuck or making things worse. A trajectory classifier reads the recent Aβ values, labels the loop (FAST_CONVERGE / CONVERGING / STALLING / OSCILLATING / DIVERGING), and decides whether to keep going, stop here, or stop and roll back to the lowest-error output so far.
Integration is a few lines around any loop that produces an error signal:
from loopgain import LoopGain
lg = LoopGain(target_error=0.0)
output = generate(task) # first attempt
while lg.should_continue():
errors = verify(output) # e.g. count of failing tests
lg.observe(errors, output=output) # the only LoopGain call in the loop
output = revise(output, errors)
result = lg.result # best_output, outcome, convergence_profile, savings_vs_fixed_cap
Honest limits, because they matter more than the headline: LoopGain detects convergence, not correctness — it inherits your verifier's blind spots. I re-graded my own benchmark and 4.5% of "converged" code-gen runs passed every check the loop ran but failed a fuller held-out test suite. And savings depend on workload: failure-heavy loops save ~78–84%, not 92.8%. There's a writeup on designing verifiers strong enough to trust on the blog.
Apache-2.0, pip install loopgain. Adapters for LangGraph, CrewAI, AutoGen, LangChain, OpenAI Agents SDK, and the Claude Agent SDK; the raw API works for anything with a measurable error.
What I'd really love from HN: if you run production agent loops, I'm interested in whether the stop decisions match what you see empirically — and what your loops' error signals actually look like, since the verifier is what makes or breaks the stop. Happy to answer anything.
fitz2882•1h ago
The cap is how nearly every verify-revise loop gets stopped today, and it's wrong in both directions. Stop too early and you clip a loop that was still improving. Stop too late and you pay for iterations after the loop already found its best answer — and ship the final attempt, which is sometimes worse than one it already had.
On a 2,000-trial benchmark (paired real-API runs, five loop patterns across six framework adapters, three model providers, pre-registered protocol with kill criteria), LoopGain cut total API spend by 92.8% vs max_iterations=20 ($27.05 → $1.94) and median wall-clock ~15× (30.9s → 2.1s). A cross-vendor judge preferred LoopGain's outputs on the weighted average (0.678 across 1,800 pairwise comparisons) — mostly because LoopGain returns the iteration with the lowest error it saw, while a fixed cap ships whatever the final iteration produced. The raw data and methodology are public, and the full run is browsable at dashboard.loopgain.ai/benchmark.
How it works: each iteration, LoopGain takes the ratio of the current error to the previous error — the loop's empirical loop gain (Aβ, borrowed from control theory). Aβ<1 means the error shrank — the loop is improving. Aβ≥1 means it held or grew — the loop is stuck or making things worse. A trajectory classifier reads the recent Aβ values, labels the loop (FAST_CONVERGE / CONVERGING / STALLING / OSCILLATING / DIVERGING), and decides whether to keep going, stop here, or stop and roll back to the lowest-error output so far.
Integration is a few lines around any loop that produces an error signal:
Honest limits, because they matter more than the headline: LoopGain detects convergence, not correctness — it inherits your verifier's blind spots. I re-graded my own benchmark and 4.5% of "converged" code-gen runs passed every check the loop ran but failed a fuller held-out test suite. And savings depend on workload: failure-heavy loops save ~78–84%, not 92.8%. There's a writeup on designing verifiers strong enough to trust on the blog.Apache-2.0, pip install loopgain. Adapters for LangGraph, CrewAI, AutoGen, LangChain, OpenAI Agents SDK, and the Claude Agent SDK; the raw API works for anything with a measurable error.
What I'd really love from HN: if you run production agent loops, I'm interested in whether the stop decisions match what you see empirically — and what your loops' error signals actually look like, since the verifier is what makes or breaks the stop. Happy to answer anything.