To manage operations for my independent video game studio, I built a trust system that works more like onboarding a new hire. Agents start in draft mode (every action needs approval), and earn autonomy over time based on their track record in specific task categories.
The core idea: each agent maintains a separate Beta distribution per task category (support triage, expense reports, publisher emails, etc.). A Beta distribution is basically a track record parameterized by successes and failures. But the raw expectation E[p] = α/(α+β) can't tell the difference between "9 successes, 1 failure" and "90 successes, 10 failures": both give E[p] = 0.90. So I use Jøsang's Subjective Logic to map these to opinion tuples that explicitly separate belief from uncertainty. High uncertainty means "not enough data yet," which is different from "we know this agent is bad."
Every action passes through a gate:
VoI = stakes × (1 - trust) × uncertainty
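The gate is just a product of three signals, each normalized to [0, 1]. A minimal sketch (the function names and the 0.05 threshold here are illustrative, not from the real system):

```python
def voi(stakes: float, trust: float, uncertainty: float) -> float:
    """Value of Information: high stakes, low trust, or high uncertainty
    each push VoI up, which pushes the action toward human review."""
    return stakes * (1 - trust) * uncertainty

def route(stakes: float, trust: float, uncertainty: float,
          threshold: float = 0.05) -> str:
    # Threshold is illustrative; tune it per deployment.
    return "draft" if voi(stakes, trust, uncertainty) > threshold else "auto"

# A trusted agent on a routine task sails through; the same agent on a
# high-stakes task with a thin track record gets drafted for review.
```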
Low VoI = auto-execute. High VoI = draft for human review. Static trust thresholds set the maximum autonomy level an agent can reach (Auto-Execute, Soft-Execute, Draft, Restricted), and VoI acts as a secondary gate that can restrict it further based on context: an agent might qualify for auto-execute in general, but a high-stakes situation still gets flagged.

Three things that made the biggest difference:
1. Edit distance feedback. If you rewrite half an email before hitting "approve," the system notices. A 0% edit = full trust credit. A 71%+ rewrite = penalty. This single change prevented agents from reaching auto-execute on work users were quietly fixing.
2. Time-based decay. Trust scores decay daily for inactive categories (λ = 0.95). If an agent hasn't done a task in two months, it gets supervised again. This also handles model upgrades, since the track record was earned on a different model.
3. Weakest-link chains. Multi-step workflows (send welcome email → create project → schedule meeting → notify team) use a weakest-link model. If any step needs approval, the whole chain surfaces as one inbox item. Nothing runs until you approve the full picture.
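The edit-distance feedback (item 1) can be approximated with the standard library's difflib; the 0% and 71% thresholds come from the post, but the linear mapping in between is my assumption:

```python
import difflib

def edit_fraction(draft: str, final: str) -> float:
    """Approximate fraction of the draft the user rewrote (0.0 to 1.0)."""
    return 1.0 - difflib.SequenceMatcher(None, draft, final).ratio()

def trust_credit(frac: float) -> float:
    # Untouched drafts earn full credit; heavy rewrites (>= 71% changed)
    # count against the agent. The linear middle band is illustrative.
    if frac == 0.0:
        return 1.0
    if frac >= 0.71:
        return -1.0
    return 1.0 - frac
```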
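For the time-based decay (item 2), the post doesn't spell out the mechanics; one plausible sketch shrinks the evidence counts toward the uniform Beta(1, 1) prior by λ per idle day, so stale track records regain uncertainty rather than keeping old belief:

```python
def decay_counts(alpha: float, beta: float, days_idle: int,
                 lam: float = 0.95) -> tuple[float, float]:
    """Decay the evidence (pseudo-counts beyond the Beta(1, 1) prior).

    After two idle months, 0.95**60 ~= 0.05, so almost all of the old
    evidence is gone and the agent is effectively supervised again.
    """
    factor = lam ** days_idle
    return 1 + (alpha - 1) * factor, 1 + (beta - 1) * factor
```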
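And the weakest-link rule (item 3) reduces to taking the most restrictive tier across a chain's steps. A sketch, assuming the four tiers from above ordered from most to least autonomous:

```python
def chain_tier(step_tiers: list[str]) -> str:
    """Weakest-link: the whole chain runs at the most restrictive tier
    of any step, so a single 'draft' step drafts the entire chain."""
    order = ["auto_execute", "soft_execute", "draft", "restricted"]
    return max(step_tiers, key=order.index)
```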
The core mapping from track record to opinion looks like this:
from collections import namedtuple

Opinion = namedtuple("Opinion", "belief disbelief uncertainty base_rate")

def beta_to_opinion(alpha, beta, base_rate=0.5):
    # With a uniform Beta(1, 1) prior, alpha - 1 and beta - 1 are the
    # observed successes and failures; belief + disbelief + uncertainty = 1.
    n = alpha + beta
    return Opinion(
        belief=(alpha - 1) / n,
        disbelief=(beta - 1) / n,
        uncertainty=2 / n,
        base_rate=base_rate,
    )
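To see why the separation matters, compare two track records with the same expected success rate (a self-contained rerun of the same mapping, returning a plain dict):

```python
def beta_to_opinion(alpha, beta, base_rate=0.5):
    n = alpha + beta
    return {"belief": (alpha - 1) / n, "disbelief": (beta - 1) / n,
            "uncertainty": 2 / n, "base_rate": base_rate}

thin = beta_to_opinion(9, 1)     # little evidence, E[p] = 9/10 = 0.90
thick = beta_to_opinion(90, 10)  # lots of evidence, E[p] = 90/100 = 0.90
# thin:  belief 0.80, uncertainty 0.20  -> "not enough data yet"
# thick: belief 0.89, uncertainty 0.02  -> "we actually know this agent"
```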
The math is all well-established (Beta distributions, Subjective Logic, Value of Information). The part that worked was combining them into something that mirrors how trust actually develops between people.

Article with full implementation details, code examples, and diagrams: https://kenschachter.substack.com/p/earned-autonomy