evil-olive•51m ago
> This is the same distinction GroundEval makes for question answering agents.
> GroundEval treats agent behavior as something that can be tested against a state contract.
> That is the class of failure GroundEval is designed to catch.
this is an ad shaped like a blog post
jflynt76•37m ago