ToolMisuseBench: A deterministic benchmark for tool-augmented agents

https://huggingface.co/datasets/sigdelakshey/ToolMisuseBench
1•akgitrepos•1h ago

Comments

akgitrepos•1h ago
ToolMisuseBench is a deterministic, offline benchmark dataset for evaluating tool-using agents under realistic failure conditions, including schema misuse, execution failures, interface drift, and recovery under budget constraints.

This dataset is intended for reproducible evaluation of agent tool-use behavior, not for training a general-purpose language model.
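
To make the failure categories above concrete, here is a minimal sketch of what a deterministic, offline episode could look like. It does not use ToolMisuseBench's actual data format or harness. The schema, the `call_tool` and `run_episode` helpers, and the scripted `fail_steps` are all hypothetical names chosen for illustration. The sketch covers schema misuse, a scripted execution failure, and a call budget; interface drift is left out.

```python
from dataclasses import dataclass

# Hypothetical tool schema (not from ToolMisuseBench): argument name -> required type.
SEARCH_SCHEMA = {"query": str, "limit": int}


@dataclass
class Result:
    ok: bool
    detail: str


def call_tool(args: dict, step: int, fail_steps: frozenset) -> Result:
    """Validate args against the schema, then apply any scripted failure."""
    for name, typ in SEARCH_SCHEMA.items():
        if name not in args:
            return Result(False, f"schema: missing '{name}'")
        if not isinstance(args[name], typ):
            return Result(False, f"schema: '{name}' must be {typ.__name__}")
    # Failures are scripted by step index rather than sampled, so every
    # replay of the same episode produces the same outcome.
    if step in fail_steps:
        return Result(False, "execution: transient failure")
    return Result(True, "ok")


def run_episode(attempts: list, fail_steps: frozenset, budget: int) -> dict:
    """Replay a fixed list of attempted calls, stopping at the call budget."""
    trace = []
    for step, args in enumerate(attempts):
        if step >= budget:
            break
        res = call_tool(args, step, fail_steps)
        trace.append(res.detail)
        if res.ok:
            return {"success": True, "calls": step + 1, "trace": trace}
    return {"success": False, "calls": len(trace), "trace": trace}


attempts = [
    {"query": "cats", "limit": "5"},  # schema misuse: limit passed as a string
    {"query": "cats", "limit": 5},    # valid call that hits the scripted failure
    {"query": "cats", "limit": 5},    # retry after the failure
]
report = run_episode(attempts, fail_steps=frozenset({1}), budget=3)
```

With a budget of 3 the agent recovers on its third call. Lowering the budget to 2 makes the same episode fail, which is the kind of recovery-under-budget distinction the description refers to.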

akgitrepos•1h ago
GitHub Repo: https://github.com/akgitrepos/toolmisusebench