I previously worked as an AI engineer at a compliance AI startup, and the worst parts of the job were gathering data and tuning prompts. Evals were awful.
I'm currently exploring ideas and was surprised to find that a lot of engineers at accelerator-stage, and even seed-stage, companies are pretty much skipping LLM evals and prompt tuning altogether.
Curious to hear others' experiences with prompt tuning and how they view the work.