Earlier this week I launched Transactional AI v0.1 to solve a problem I kept hitting: AI agents that half-executed and left systems in broken states.
The core idea: apply the Saga pattern (from distributed systems) to AI workflows. Every step declares a compensating action that runs automatically if a later step fails. If the OpenAI call succeeds but a later Stripe step fails, the system deletes the AI-generated content and refunds any payment already taken, with no manual cleanup.
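For anyone who hasn't met sagas before, here is a rough sketch of the mechanism. It is not the library's source: `Saga`, `runSaga`, and the positional `doFn`/`undoFn` arguments are hypothetical names chosen for illustration, and persistence, locking, and retries are left out.

```typescript
// Hypothetical sketch of the saga idea; not the transactional-ai implementation.
type Compensation = () => Promise<void>;

class Saga {
  private readonly compensations: Compensation[] = [];

  // Run a step and, once it has succeeded, remember how to undo it.
  async step<T>(
    doFn: () => Promise<T>,
    undoFn: (result: T) => Promise<void>
  ): Promise<T> {
    const result = await doFn();
    this.compensations.push(() => undoFn(result));
    return result;
  }

  // Undo every completed step, most recent first.
  async rollback(): Promise<void> {
    const pending = this.compensations.splice(0).reverse();
    for (const compensate of pending) {
      await compensate();
    }
  }
}

// Run the workflow body; on any error, roll back and rethrow the original error.
async function runSaga<T>(body: (saga: Saga) => Promise<T>): Promise<T> {
  const saga = new Saga();
  try {
    return await body(saga);
  } catch (err) {
    await saga.rollback();
    throw err;
  }
}
```

Only steps that actually completed get a compensation registered, so a step that throws partway through is never "undone" with a result it didn't produce.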
v0.2 adds production features based on feedback:
Distributed Execution (v0.2.0):
Redis-based distributed locking (prevents race conditions with multiple workers)
PostgreSQL storage adapter (ACID compliance for regulated industries)
Retry policies with exponential backoff (handles flaky LLM APIs)
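A retry policy with exponential backoff could look roughly like the sketch below. The `attempts` and `backoffMs` option names come from the example in this post; the helper names (`withRetry`, `backoffDelay`, `pause`) and the doubling schedule are assumptions, and the library's real schedule may differ.

```typescript
// Hypothetical helper; the doubling schedule is an assumption, not the library's documented behavior.
interface RetryPolicy {
  attempts: number;  // total tries, including the first one
  backoffMs: number; // delay before the first retry
}

const pause = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Delay before retry number `retryIndex` (0-based): backoffMs * 2^retryIndex.
function backoffDelay(policy: RetryPolicy, retryIndex: number): number {
  return policy.backoffMs * Math.pow(2, retryIndex);
}

async function withRetry<T>(
  fn: (attempt: number) => Promise<T>,
  policy: RetryPolicy
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= policy.attempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      // Wait only if another attempt is coming.
      if (attempt < policy.attempts) {
        await pause(backoffDelay(policy, attempt - 1));
      }
    }
  }
  throw lastError;
}
```

Under this schedule, `{ attempts: 3, backoffMs: 2000 }` would wait 2000 ms before the second try and 4000 ms before the third. Production code would usually add jitter so that many workers don't retry in lockstep.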
Observability & Reliability (v0.2.1):
Event hooks for monitoring (12 lifecycle events: step start/complete/fail/timeout/retry, compensation events, transaction lifecycle)
Per-step timeouts (kill hung OpenAI calls after 30s)
Testing utilities (in-memory storage/locks, no Redis/Postgres needed for tests)
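A per-step timeout can be sketched by pairing the step's promise with a timer. This is an illustration under assumptions, not the library's code: `withTimeout` and `StepTimeoutError` are hypothetical names.

```typescript
// Hypothetical sketch of a per-step timeout; names are illustrative only.
class StepTimeoutError extends Error {
  constructor(public readonly step: string, public readonly ms: number) {
    super(`${step} timed out after ${ms}ms`);
    this.name = "StepTimeoutError";
  }
}

function withTimeout<T>(
  step: string,
  ms: number,
  work: () => Promise<T>
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    // Reject if the work has not settled within `ms` milliseconds.
    const timer = setTimeout(() => reject(new StepTimeoutError(step, ms)), ms);
    work().then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); }
    );
  });
}
```

A timer like this only stops waiting; it does not cancel the underlying request. Actually aborting a hung HTTP call would also need something like an `AbortController` signal passed to the client, if the client supports one.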
Example:
grafikui•1h ago
  const tx = new Transaction('workflow-123', storage, {
    lock: new RedisLock('redis://localhost'),
    events: {
      onStepTimeout: (step, ms) => alerting.sendAlert(`${step} hung after ${ms}ms`),
      onStepFailed: (step, err, attempt) => logger.error(`${step} failed`, { err, attempt })
    }
  });
  await tx.run(async (t) => {
    const report = await t.step('generate-ai-report', {
      do: async () => await openai.createCompletion({...}),
      undo: async (result) => await db.reports.delete(result.id),
      retry: { attempts: 3, backoffMs: 2000 },
      timeout: 30000
    });
  });
If anything fails: automatic rollback in reverse order. Report deleted, payment refunded.
Architecture:
TypeScript, 21 passing tests, strict mode
Storage adapters: File (dev), Redis (performance), Postgres (ACID), Memory (tests)
Lock adapters: NoOp (single process), Redis (distributed), Mock (tests)
CLI inspector: tai-inspect for debugging transaction state
No heavyweight orchestration engines (Temporal, AWS Step Functions). Just a 450-line TypeScript library.
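To show why swapping adapters is cheap, here is a sketch of what in-memory storage and lock adapters might look like. Every interface and method name here (`StorageAdapter`, `LockAdapter`, `StepRecord`, `save`, `load`, `acquire`, `release`) is an assumption made for illustration; the library's real contracts may be shaped differently.

```typescript
// Hypothetical adapter contracts; the real interfaces in transactional-ai may differ.
interface StepRecord {
  name: string;
  status: "completed" | "compensated";
  result?: unknown;
}

interface StorageAdapter {
  save(txId: string, steps: StepRecord[]): Promise<void>;
  load(txId: string): Promise<StepRecord[] | null>;
}

interface LockAdapter {
  acquire(key: string): Promise<boolean>; // false if the key is already held
  release(key: string): Promise<void>;
}

// Keeps transaction state in a Map; suitable for tests only.
class MemoryStorage implements StorageAdapter {
  private readonly data = new Map<string, StepRecord[]>();

  async save(txId: string, steps: StepRecord[]): Promise<void> {
    // Store shallow copies so later mutation by the caller does not leak in.
    this.data.set(txId, steps.map((s) => ({ ...s })));
  }

  async load(txId: string): Promise<StepRecord[] | null> {
    const found = this.data.get(txId);
    return found === undefined ? null : found;
  }
}

// Single-process lock: enough for tests, not safe across workers.
class MemoryLock implements LockAdapter {
  private readonly held = new Set<string>();

  async acquire(key: string): Promise<boolean> {
    if (this.held.has(key)) return false;
    this.held.add(key);
    return true;
  }

  async release(key: string): Promise<void> {
    this.held.delete(key);
  }
}
```

With contracts this small, a Redis or Postgres adapter only has to implement the same few methods, and tests can run against the in-memory versions without any external services.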
Production readiness: 8.0/10 (up from 6.5 in v0.1)
Considering for v0.3.0: compensation retry policies, parallel steps, OpenTelemetry integration, MongoDB/DynamoDB adapters.
GitHub: https://github.com/Grafikui/Transactional-ai
NPM: npm install transactional-ai
Happy to answer questions about the implementation, saga patterns, or production experiences!