This is not a polished product; it's a technical experiment exploring methods for evaluating LLM inputs.
I tried to make it more useful than generic "fix my prompt" tools by having it check for the specific failure modes that seem most common in LLM pipelines.
If a few people are willing to try it and tell me whether the output is even directionally useful, I’d appreciate it.
(There's a limit of 5 prompts per day; after that you'll start getting rate-limit errors.)