Hi folks, we've been working on a CLI tool to programmatically test and evaluate MCP servers. We're looking for some initial feedback on the project.
Let's say you're testing the PayPal MCP server. You can write a test case with a prompt such as "Create a refund order for order 412". The test runs the prompt and checks whether the right PayPal tool was called.
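To make the idea concrete, here's a rough Python sketch of that kind of check. It is not the CLI's actual API or config format: `ToolCallTestCase`, `missing_tools`, `passed`, and the tool names `create_refund` and `get_order` are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCallTestCase:
    """Hypothetical test case: a prompt plus the tools we expect the LLM to call."""
    prompt: str
    expected_tools: set[str] = field(default_factory=set)


def missing_tools(case: ToolCallTestCase, called_tools: list[str]) -> set[str]:
    """Return the expected tools that never appear in the run's tool-call trace."""
    return case.expected_tools - set(called_tools)


def passed(case: ToolCallTestCase, called_tools: list[str]) -> bool:
    """A run passes when every expected tool was called at least once."""
    return not missing_tools(case, called_tools)


refund_case = ToolCallTestCase(
    prompt="Create a refund order for order 412",
    expected_tools={"create_refund"},  # assumed name, not taken from the real server
)

# Tool names as they might be read from a conversation trace (illustrative only).
trace = ["get_order", "create_refund"]
print(passed(refund_case, trace))
```

This version only checks that the expected tool shows up somewhere in the trace. A stricter check could also compare call order or arguments, such as the order ID.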
The CLI helps you:
1. Test different prompts and observe how LLMs interact with your MCP server. The CLI shows a trace of the conversation.
2. Examine the quality of your server's tool names and descriptions, and see where LLMs hallucinate when using your server.
3. Analyze your MCP server's performance, such as token consumption, and compare results across different models.
4. Benchmark your MCP server's performance to catch future regressions.
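For the regression point, one way to think about it is comparing a stored baseline against a fresh batch of runs. The sketch below is only an assumption about how such a comparison could work, not how the CLI implements it: `RunResult`, `summarize`, `find_regressions`, and the default thresholds are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    """One test run: whether the expected tool was called, and tokens consumed."""
    passed: bool
    tokens: int


def summarize(runs: list[RunResult]) -> dict[str, float]:
    """Collapse a non-empty batch of runs into a pass rate and mean token count."""
    return {
        "pass_rate": sum(r.passed for r in runs) / len(runs),
        "mean_tokens": mean(r.tokens for r in runs),
    }


def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    max_pass_drop: float = 0.05,     # assumed tolerance: 5 points of pass rate
    max_token_growth: float = 0.10,  # assumed tolerance: 10% more tokens
) -> list[str]:
    """Name the metrics that got worse than the baseline beyond the tolerances."""
    problems = []
    if baseline["pass_rate"] - current["pass_rate"] > max_pass_drop:
        problems.append("pass_rate")
    if current["mean_tokens"] > baseline["mean_tokens"] * (1 + max_token_growth):
        problems.append("mean_tokens")
    return problems


# Made-up numbers, purely to show the shape of the comparison.
baseline = summarize([RunResult(True, 1000), RunResult(True, 1200)])
current = summarize([RunResult(True, 1500), RunResult(False, 1700)])
print(find_regressions(baseline, current))
```

The same summaries could be computed per model to compare how different LLMs handle the same prompts.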
The nice thing about a CLI is that you can run these tests iteratively! Please give it a try; we would really appreciate your feedback.
matt8p•1h ago