I got tired of JSON parsing failures and of manually switching between LLMs on a per-task basis. So I built Perf, a single endpoint that picks the right model and guarantees valid JSON back.
The problem: LLMs are unreliable with structured output. You end up writing retry logic, validation, and fallback chains yourself, or overpaying for a flagship model on trivial tasks. The boilerplate looks roughly like the sketch below.
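(A minimal sketch of that generic pattern using the OpenAI Python client. The model names, retry counts, and fallback order are illustrative, not how Perf works internally.)

```python
import json
from openai import OpenAI

client = OpenAI()
FALLBACK_CHAIN = ["gpt-4o-mini", "gpt-4o"]  # illustrative: cheap first, escalate on failure

def chat_json(messages, retries_per_model=2):
    for model in FALLBACK_CHAIN:
        for _ in range(retries_per_model):
            resp = client.chat.completions.create(model=model, messages=messages)
            try:
                # Accept the response only if it parses as JSON
                return json.loads(resp.choices[0].message.content)
            except json.JSONDecodeError:
                continue  # invalid JSON: retry, then escalate to the next model
    raise ValueError("no model returned valid JSON")
```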
What Perf does: You call perf.chat(messages) and it analyzes the prompt, routes to the best model for cost/quality, validates the output, and repairs or retries if needed.
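Here's a minimal usage sketch. Only the perf.chat(messages) call comes from the design described above; the package import and the response handling are assumptions for illustration.

```python
import perf  # assumed client package name

messages = [
    {"role": "user", "content": "Extract name and price from: 'Acme widget, $9.99'"}
]

# One call: prompt analysis, model routing, output validation, and
# repair/retry all happen behind this endpoint.
result = perf.chat(messages)
print(result)  # valid JSON back, per the guarantee above
```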
Benchmark results (~500 prompts):
- 100% valid JSON (vs ~85-95% from direct API calls)
- 90% cheaper than a GPT-4o baseline (identical prompts, no retries)
Tradeoff: Routing and validation add 1-3s latency. Good for batch/async workloads, not ideal for real-time chat yet.
Live demo: https://demo.withperf.pro
Landing page: https://withperf.pro
Would love your feedback and questions.