I was building agentic workflows for my CRM — Otter.ai recordings → Clay enrichment → CRM updates — and got tired of LLM-generated pipelines silently doing the wrong thing. A pipeline that "worked" was pushing contacts without validating email format, making API calls I didn't authorize, and failing silently when field names didn't match between steps.
The problem isn't that LLMs write bad code. It's that there's no contract between what you asked for and what runs. Structured outputs solve format. Guardrails AI solves content safety. Temporal solves execution. Nobody checks whether the workflow itself makes sense as a pipeline.
So I built a verification layer. The LLM outputs a workflow AST via structured outputs. Before anything executes, the engine type-checks data flow across steps, validates schemas at boundaries, and requires every side effect (API calls, DB writes, webhooks) to be explicitly declared. You get a manifest — "this workflow READs from Salesforce and WRITEs to HubSpot" — that a compliance system can review without reading code.
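To make the idea concrete, here's a minimal sketch of the declared-effects + data-flow check in plain Python with stdlib dataclasses. All the names here (`Step`, `Effect`, `verify_dataflow`, `build_manifest`) are illustrative, not the project's actual API, and the real engine works over a richer AST:

```python
from dataclasses import dataclass, field
from enum import Enum

class Mode(Enum):
    READ = "READ"
    WRITE = "WRITE"

@dataclass(frozen=True)
class Effect:
    mode: Mode
    system: str  # e.g. "Salesforce", "HubSpot"

@dataclass
class Step:
    name: str
    inputs: dict[str, str]   # field name -> type name consumed
    outputs: dict[str, str]  # field name -> type name produced
    effects: list[Effect] = field(default_factory=list)

def verify_dataflow(steps: list[Step]) -> None:
    """Reject the workflow if a step consumes a field no upstream step
    produced, or produced with a different type."""
    produced: dict[str, str] = {}
    for step in steps:
        for name, typ in step.inputs.items():
            if produced.get(name) != typ:
                raise TypeError(
                    f"{step.name}: needs {name}: {typ}, "
                    f"upstream provides {produced.get(name)}"
                )
        produced.update(step.outputs)

def build_manifest(steps: list[Step]) -> list[str]:
    """Human-reviewable summary of every declared side effect."""
    return sorted({f"{e.mode.value} {e.system}" for s in steps for e in s.effects})

steps = [
    Step("pull_leads", inputs={}, outputs={"email": "str"},
         effects=[Effect(Mode.READ, "Salesforce")]),
    Step("push_contacts", inputs={"email": "str"}, outputs={},
         effects=[Effect(Mode.WRITE, "HubSpot")]),
]
verify_dataflow(steps)        # passes: "email" is produced before it's consumed
print(build_manifest(steps))  # ['READ Salesforce', 'WRITE HubSpot']
```

If the LLM emits a step that reads a field nothing upstream produced (the "field names didn't match between steps" failure above), `verify_dataflow` raises before anything executes, and the manifest is what a compliance reviewer would see instead of code.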
~800 lines of Python, zero deps beyond Pydantic, MIT licensed. Would especially love feedback from folks building agentic systems in production — the schema library for domain-specific patterns is the most obvious area for contributions.
ConvertlyAI•1h ago
I love this approach to verification. I literally just launched my own AI formatting engine yesterday, and the hardest part wasn't the generation—it was building strict system-level guardrails to stop the model from outputting generic fluff words and breaking my slide formatting. Are you doing this pre-execution verification purely through secondary prompt checks, or are you running it through a separate smaller model first?
jaredwaxman•1h ago
Thanks! We're doing pre-execution verification through static analysis of the workflow AST — no secondary model involved. The verifier runs deterministically against declared effects and type constraints, so it catches issues before anything executes. Curious about your approach — are your guardrails rule-based or are you using a classifier?