What are some ways to avoid common methodological pitfalls when using automation to generate test cases for "groundedness" benchmarks?
Confirmation bias is one obvious pitfall that comes to mind, but I also wonder how reproducibility can be achieved when the input is stochastic.
this_steve_j•2h ago