I was building agentic workflows for my CRM — Otter.ai recordings → Clay enrichment → CRM updates — and got tired of LLM-generated pipelines silently doing the wrong thing. A pipeline that "worked" was pushing contacts without validating email format, making API calls I didn't authorize, and failing silently when field names didn't match between steps.
The problem isn't that LLMs write bad code. It's that there's no contract between what you asked for and what runs. Structured outputs solve format. Guardrails AI solves content safety. Temporal solves execution. Nobody checks whether the workflow itself makes sense as a pipeline.
So I built a verification layer. The LLM outputs a workflow AST via structured outputs. Before anything executes, the engine type-checks data flow across steps, validates schemas at boundaries, and requires every side effect (API calls, DB writes, webhooks) to be explicitly declared. You get a manifest — "this workflow READs from Salesforce and WRITEs to HubSpot" — that a compliance system can review without reading code.
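To make the idea concrete, here's a minimal sketch of the declared-effects + data-flow check in plain Python with stdlib dataclasses. All the names here (`Step`, `Effect`, `verify_dataflow`, `build_manifest`) are illustrative, not the project's actual API, and the real engine works over a richer AST:

```python
from dataclasses import dataclass, field
from enum import Enum

class Mode(Enum):
    READ = "READ"
    WRITE = "WRITE"

@dataclass(frozen=True)
class Effect:
    mode: Mode
    system: str  # e.g. "Salesforce", "HubSpot"

@dataclass
class Step:
    name: str
    inputs: dict[str, str]   # field name -> type name consumed
    outputs: dict[str, str]  # field name -> type name produced
    effects: list[Effect] = field(default_factory=list)

def verify_dataflow(steps: list[Step]) -> None:
    """Reject the workflow if a step consumes a field no upstream step
    produced, or produced with a different type."""
    produced: dict[str, str] = {}
    for step in steps:
        for name, typ in step.inputs.items():
            if produced.get(name) != typ:
                raise TypeError(
                    f"{step.name}: needs {name}: {typ}, "
                    f"upstream provides {produced.get(name)}"
                )
        produced.update(step.outputs)

def build_manifest(steps: list[Step]) -> list[str]:
    """Human-reviewable summary of every declared side effect."""
    return sorted({f"{e.mode.value} {e.system}" for s in steps for e in s.effects})

steps = [
    Step("pull_leads", inputs={}, outputs={"email": "str"},
         effects=[Effect(Mode.READ, "Salesforce")]),
    Step("push_contacts", inputs={"email": "str"}, outputs={},
         effects=[Effect(Mode.WRITE, "HubSpot")]),
]
verify_dataflow(steps)        # passes: "email" is produced before it's consumed
print(build_manifest(steps))  # ['READ Salesforce', 'WRITE HubSpot']
```

If the LLM emits a step that reads a field nothing upstream produced (the "field names didn't match between steps" failure above), `verify_dataflow` raises before anything executes, and the manifest is what a compliance reviewer would see instead of code.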
~800 lines of Python, zero deps beyond Pydantic, MIT licensed. Would especially love feedback from folks building agentic systems in production — the schema library for domain-specific patterns is the most obvious area for contributions.
ConvertlyAI•1h ago
I love this approach to verification. I literally just launched my own AI formatting engine yesterday, and the hardest part wasn't the generation—it was building strict system-level guardrails to stop the model from outputting generic fluff words and breaking my slide formatting. Are you doing this pre-execution verification purely through secondary prompt checks, or are you running it through a separate smaller model first?
jaredwaxman•1h ago
Thanks! We're doing pre-execution verification through static analysis of the workflow AST — no secondary model involved. The verifier runs deterministically against declared effects and type constraints, so it catches issues before anything executes. Curious about your approach — are your guardrails rule-based or are you using a classifier?