Show HN: ToolGuard – Pytest for AI agent tool calls

1•Heer_J•1h ago

I got tired of my AI agents crashing because the LLM hallucinated a JSON key or passed a string instead of an int. So I built ToolGuard — it fuzzes your Python tool functions with edge-cases (nulls, missing fields, type mismatches, 10MB payloads) and gives you a reliability score out of 100%.

No LLM needed to run tests. It reads your type hints, generates a Pydantic schema, and deterministically breaks things.

pip install py-toolguard

GitHub: https://github.com/Harshit-J004/toolguard

If you are building complex tool chains, I would be incredibly honored if you checked out the repo. Brutal feedback on the architecture is highly encouraged!

Comments

calebjang•1h ago

This is interesting because a lot of agent failures happen before any “reasoning” issue shows up.

If the tool boundary itself is unstable — wrong field names, wrong types, missing required values — the rest of the stack doesn’t really matter.

Deterministic fuzzing for tool calls seems especially useful because it gives you a way to test execution reliability without depending on another model in the loop.

Heer_J•23m ago

Totally spot on, Caleb! That was exactly the frustrating realization that led me to build this.

I've spend so much time obsessing over which model is smarter or tweaking our system prompts, but if a simple None value causes a backend TypeError that crashes the whole runtime, none of that reasoning truly matters. The agent is basically dead in the water.

By fuzzing the execution environment deterministically first, we can guarantee the "hands" of the agent work perfectly before we even bother testing the "brain". Plus, it runs in milliseconds during CI/CD instead of waiting for expensive API calls.

Show HN: AI agents predicted every March Madness game – live bracket tracker

Mental Model Discovery

Military report says live fire malfunction rained shrapnel on California highway

What stack of models is behind this tool?

UC Irvine researchers bring down AI powered drones with painted umbrellas

404PageFound – Active Vintage Websites, Old Webpages, and Web 1.0

Demanufacture

Trapped Inside a Self-Driving Car During an Anti-Robot Attack

Unsubscribe from the Church of Graphs

FounderBox by Loopxo

Show HN: Mcpwire – Connect to MCP servers in 2 lines of TypeScript

Bridge Bank – self-hosted European Banks to Actual Budget sync

Pardoned for Fraud, a CEO Mounts His Comeback: 'We Can Trust You Now'

Robotocore · a Digital Twin of AWS

Shoey HN: API-based arbitrated marketing contracts (AI SEO, SEO, DR)

Reddit New Post 3

Startup Is Probably Dead on Arrival

OpenAI Has New Focus (On the IPO)

How Willard Wigan Makes the Smallest Handmade Sculptures [video]

Ask HN: When do you think the ChatGPT moment will be for medicine?

Show HN: AI Skills for Affiliate Marketing – Works with Claude, ChatGPT

Show HN: AI agents debating questions that stump LLMs

Show HN: Elida – Session Border Controller for AI Agents

When Allies are Allies no more

Unscale the Internet

Kennedy childhood vaccine overhaul stalled by judge

Launch an autonomous AI agent with sandboxed execution in 2 lines of code

Every Bracket

Neighbors Say SF Tesla Supercharger Lot Has Become Urine Dumping Ground

Reverse as the VibeVerse (why pivot, when you can punk)