Does this guide cover systematic eval at all?
Chapter 5, on RAG, goes through precision and recall (for RAG systems the emphasis is typically on recall).
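For anyone unfamiliar with the metrics, here is a minimal sketch of precision and recall at k for a retriever. It is not the guide's code: the function name `precision_recall_at_k` and the document IDs are made up for illustration, and it assumes you have a hand-labeled set of relevant chunks per query.

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Score one query: `retrieved` is a ranked list of doc IDs,
    `relevant` is the set of IDs labeled relevant (hypothetical labels)."""
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    # Precision: what fraction of the returned chunks were relevant.
    precision = hits / len(top_k) if top_k else 0.0
    # Recall: what fraction of the relevant chunks were returned.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative data only: 2 of the 4 retrieved chunks are relevant,
# and 2 of the 3 relevant chunks were found.
retrieved = ["d1", "d7", "d3", "d9"]
relevant = {"d1", "d3", "d4"}
precision, recall = precision_recall_at_k(retrieved, relevant, k=4)
print(f"precision@4={precision:.2f} recall@4={recall:.2f}")
```

Recall tends to matter more in RAG because a chunk that never gets retrieved cannot be used by the generator, while an extra irrelevant chunk can often be ignored.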
For Chapter 6, I demo LLM-as-a-judge (using structured outputs so the judge looks for specific errors) to evaluate a fuzzier objective: writing a report based on table output.
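Roughly, the structured-output idea looks like the sketch below. This is an assumption-heavy illustration, not the chapter's code: the error categories, the JSON shape, and the names `ErrorType`, `Finding`, `parse_verdict`, and `flagged_rate` are all hypothetical, and the judge model call itself is left out (the `raw` string stands in for its response).

```python
import json
from dataclasses import dataclass
from enum import Enum


class ErrorType(str, Enum):
    # Hypothetical error categories the judge is asked to look for.
    WRONG_NUMBER = "wrong_number"
    MISSING_ROW = "missing_row"
    UNSUPPORTED_CLAIM = "unsupported_claim"


@dataclass
class Finding:
    error_type: ErrorType
    quote: str  # the sentence from the report that the judge flagged


def parse_verdict(raw_json):
    """Validate the judge's JSON; an unknown error_type raises ValueError."""
    data = json.loads(raw_json)
    return [Finding(ErrorType(f["error_type"]), f["quote"])
            for f in data["findings"]]


def flagged_rate(verdicts):
    """Fraction of reports with at least one finding."""
    if not verdicts:
        return 0.0
    return sum(1 for findings in verdicts if findings) / len(verdicts)


# Stand-in for a judge response; a real run would get this from the model.
raw = '{"findings": [{"error_type": "wrong_number", "quote": "Revenue rose 40%."}]}'
verdicts = [parse_verdict(raw), parse_verdict('{"findings": []}')]
print(flagged_rate(verdicts))
```

Constraining the judge to a fixed set of error types makes its output countable across runs, which is what turns a fuzzy "is this report good?" into something you can track.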
Schlagbohrer•1h ago