This introduces a specific failure mode: AI can generate text that constitutes an irreversible business commitment (refunds, credits, billing changes, contractual promises).
Once emitted, the commitment exists. Detection after delivery is irrelevant.
This project implements a hard authority boundary.
Model -------- AI systems propose messages. They do not decide whether those messages are allowed to commit the company.
A gateway enforces that decision.
AI drafts message ↓ Authority Gateway ↓ Send | Block → Approval
Behavior --------- For each outbound message:
Inspect text for commitment signals
Classify outcome as reversible or irreversible
If reversible → allow
If irreversible → block and require explicit approval
Log decision and evidence
No attempt is made to assess advice quality, intent, or correctness. Only enforceability is considered.
API surface ------------ /v1/messages/send Enforces authority on outbound messages
/v1/support/messages/decide Decision-only endpoint for support systems Returns structured reasons and a safe fallback reply
Properties ---------- Deterministic enforcement
Idempotent execution
Explicit approval for irreversible actions
No reliance on prompt discipline or training hygiene
No agent autonomy
This is a sandbox implementation to test whether a narrow authority layer is useful in practice.
Feedback welcome from teams already routing AI-generated messages into real customer workflows.