The ONLY thing we care about is the ability to: - Log an LLM completion and press a button that re-runs the exact same completion in a UI (the industry seems to call this a "playground"), so we can replay the completion exactly as it ran in production.
What we DO NOT care about: - "datasets" - "scores" - "prompt enhancers"
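A minimal sketch of what "log and replay exactly" could look like, assuming a JSONL log file: store the full request payload (model, messages, sampling parameters, seed) verbatim alongside the response, then hand that payload back untouched for a playground re-run. Function names and the record shape here are hypothetical, not any particular vendor's API.

```python
import json

def log_completion(path, request, response):
    # Append the complete request payload plus the response, one JSON
    # object per line, so the request can later be replayed verbatim.
    record = {"request": request, "response": response}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

def load_replay_payload(path, index=0):
    # Return the exact request payload for a logged completion; feeding
    # this back to the same endpoint reproduces the production call.
    with open(path) as f:
        records = [json.loads(line) for line in f]
    return records[index]["request"]
```

The key design point is that nothing is normalized or summarized on the way in: the bytes you send to the playground are the bytes production sent.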
uaas•1d ago
muzani•3h ago
LLM outputs are qualitative; they can't really be scored automatically, and prompt enhancements tend to multiply bugs: they may solve one problem but introduce a new one. It's more practical to just do it manually.