Hey everyone,
I built llm-schema-guard because LLMs are amazing at spitting out JSON... until they suddenly aren't. Even with JSON mode or function calling, you still get missing fields, wrong types, or just plain broken syntax that kills your agents, RAG flows, or any tool-calling setup.
This is a lightweight Rust HTTP proxy that sits in front of any OpenAI-compatible API (think Ollama, vLLM, LocalAI, OpenAI itself, Groq, you name it). It grabs the generated output, checks it against a JSON Schema you provide, and only lets it through if it's valid.
If it's invalid, strict mode kicks back a clean 400 with details about what failed. Permissive mode instead retries a few times, appending a fix instruction to the prompt and backing off exponentially between attempts.
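To give a feel for it, here's roughly what hitting the proxy in strict mode looks like from a client. This is just a sketch based on my assumptions: the port, path, and model name are placeholders, not anything the project mandates.

```rust
// Sketch of a client calling the proxy in strict mode.
// Assumed Cargo deps: reqwest = { version = "0.12", features = ["blocking", "json"] }, serde_json = "1"
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();

    // Same request you'd send to the upstream OpenAI-compatible API,
    // just pointed at the proxy instead (URL is a placeholder).
    let resp = client
        .post("http://localhost:8080/v1/chat/completions")
        .json(&json!({
            "model": "llama3",
            "messages": [{ "role": "user", "content": "Give me a user profile as JSON" }]
        }))
        .send()?;

    if resp.status() == reqwest::StatusCode::BAD_REQUEST {
        // Strict mode: the generated output failed schema validation; the body says why.
        eprintln!("schema validation failed: {}", resp.text()?);
    } else {
        // Output that already passed the schema check.
        let body: serde_json::Value = resp.json()?;
        println!("{body}");
    }
    Ok(())
}
```

The point is that your application code never sees malformed output: it either gets JSON that matches the schema or an explicit error it can handle.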
Everything else passes through unchanged. You get full streaming support (the response is buffered so it can be validated), plus Prometheus metrics for validation failures, retries, latency, and more. Config is a simple YAML file covering upstreams, per-model schemas, rate limiting, caching, etc. There's also an offline CLI if you just want to test schemas locally.
It's built with Axum and Tokio for low latency and high throughput, with jsonschema-rs doing the validation under the hood. A Docker Compose setup makes it dead simple to spin up alongside Ollama.
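If you're curious what the validation core boils down to, here's a rough sketch with the jsonschema crate. This isn't the proxy's actual code, and the crate's API has shifted a bit between versions; the schema and output below are made-up examples.

```rust
// Assumed Cargo deps: jsonschema = "0.17", serde_json = "1"
use jsonschema::JSONSchema;
use serde_json::json;

fn main() {
    // Example schema the model output must satisfy.
    let schema = json!({
        "type": "object",
        "required": ["name", "age"],
        "properties": {
            "name": { "type": "string" },
            "age":  { "type": "integer", "minimum": 0 }
        }
    });

    // Pretend this came back from the model.
    let output = json!({ "name": "Ada", "age": "thirty-six" });

    let compiled = JSONSchema::compile(&schema).expect("schema itself is valid");
    if let Err(errors) = compiled.validate(&output) {
        for err in errors {
            // e.g. '"thirty-six" is not of type "integer"' at /age
            eprintln!("{} (at {})", err, err.instance_path);
        }
    }
}
```

Strict mode surfaces errors like these to the caller; permissive mode feeds them back into the retry prompt.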
This grew out of my earlier schema-gateway project, and I'm happy to add stuff like Anthropic support, tool calling validation, or better streaming fixes if people find it useful.
Stars or contributions are very welcome!
Thanks for taking a look :)