Show HN: How we made MCP development feel good

https://manufact.com/blog/mcp-testing

5•pzullo•1h ago

Hey HN, I'm Pietro from Manufact (https://manufact.com). We build open source dev tools and infrastructure for MCP.

You might know us for mcp-use (https://github.com/mcp-use/mcp-use), our open source full-stack SDK for building MCP servers and clients.

At Manufact we gave ourselves the mission, and the delight, of writing as many MCP servers as we could. Along the way we honed our SDK to offer the best developer/agent experience we could.

Testing/developing MCP servers is a pain because:

- Configuring MCP servers in regular clients is not easy. People already complain that installing them is hard; imagine having to refresh them every time you make a change.

- Testing does not only mean checking that tools work one at a time, but also making sure agents understand them and call them in the right way and order.

- If installing an MCP server locally is a challenge, it is even harder on the remote clients where people will actually use your product (claude.ai, chatgpt.com).

- The model and system prompt (the agent) that end up using your server vary greatly. Some people might be using Opus 4.7 from Claude Code, some might use Instant on chatgpt.com, and the model's ability to call your tools varies a lot. Testing on GPT5.5 locally and testing on ChatGPT with the same model yield very different experiences.

First: local development loop

Two things made web development frameworks like Next and Vite better than anything else: HMR (hot module replacement) and a preview on localhost.

What is the preview of an MCP server? In our opinion, a chat. Every time you run npm run dev on an mcp-use server, we serve an inspector on localhost that is automatically connected to your MCP server. It has a BYOK (bring your own key) chat, a way to test tools one by one, and detailed metadata about your MCP server so you can check that it is compliant.

The interesting technical challenge here was building an MCP client that runs completely (or almost completely) in the browser.

About HMR: this was not easy. There are a few ways to do it, and we chose the hard but proper one: we implemented HMR using the protocol's own primitives. If you change a tool, we do not hard-restart the server and cancel the existing MCP session. Instead we send a notifications/tools/list_changed notification (part of the spec) to the client, which then knows it should reload the tools. For UI elements we use Vite HMR and forward UI changes across all elements of the inspector, so for instance you can change the UI element your MCP server returns and see the change live in the embedded chat. (This is pretty marvellous to look at.)
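To make the protocol-level reload concrete, here is a minimal in-process sketch of the idea. It is not the mcp-use implementation: ToolRegistry, InspectorClient, and the listeners array are hypothetical names, and the transport is replaced by a direct function call. The only thing taken from the MCP spec is the notification method name.

```typescript
// Sketch only: ToolRegistry and InspectorClient are hypothetical, not mcp-use APIs.
type JsonRpcNotification = { jsonrpc: "2.0"; method: string };
type Tool = { name: string; description: string };
type Send = (msg: JsonRpcNotification) => void;

class ToolRegistry {
  private tools = new Map<string, Tool>();
  private send: Send;

  constructor(send: Send) {
    this.send = send;
  }

  // Would be called by a file watcher when a tool definition changes on disk.
  upsert(tool: Tool): void {
    this.tools.set(tool.name, tool);
    // The session stays open; the client is only told that its tool list is stale.
    this.send({ jsonrpc: "2.0", method: "notifications/tools/list_changed" });
  }

  list(): Tool[] {
    return Array.from(this.tools.values());
  }
}

class InspectorClient {
  tools: Tool[] = [];
  reloads = 0;
  private fetchTools: () => Tool[];

  constructor(fetchTools: () => Tool[]) {
    this.fetchTools = fetchTools;
  }

  handle(msg: JsonRpcNotification): void {
    if (msg.method === "notifications/tools/list_changed") {
      // Stands in for sending a real tools/list request over the existing session.
      this.tools = this.fetchTools();
      this.reloads += 1;
    }
  }
}

// Wire both ends together in-process instead of over a real transport.
const listeners: Send[] = [];
const registry = new ToolRegistry((msg) => listeners.forEach((l) => l(msg)));
const client = new InspectorClient(() => registry.list());
listeners.push((msg) => client.handle(msg));

registry.upsert({ name: "search", description: "v1" });
registry.upsert({ name: "search", description: "v2, edited mid-session" });
console.log(client.tools, client.reloads);
```

In a real server the capability tools.listChanged would also need to be declared during initialization so that clients know to expect this notification.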

This sped up the development of MCPs by a lot.

You can try out our inspector by running npx @mcp-use/inspector, or just by using our SDK.

Bonus: one thing I often do is launch Claude Code with --chrome enabled and tell it to go to the inspector URL to test the server. This creates a closed loop for the agent that makes developing MCP servers with it much more predictable.

Second: testing on other clients (Disclaimer: this is a cloud feature)

Testing on actual clients is possibly even more painful. We created an automated testing feature: you define the test cases associated with an MCP server in the usual agent-testing shape (user message, expected tool calls, rubrics). Since "Testing on GPT5.5 locally and testing on ChatGPT with the same model yield very different experiences.", we need to test on the actual client, so we use browser agents to install the app and run the tests directly on the clients themselves.
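For readers who have not seen the "user message, expected tool calls, rubrics" shape before, here is a rough sketch of what such a test case and an ordering check could look like. The type names, field names, and tool names are all assumptions made for illustration, not Manufact's actual schema, and the rubrics are only carried along here since they would normally be graded by a human or a model.

```typescript
// Hypothetical shapes: these field names are illustrative, not Manufact's schema.
type ToolCall = { name: string; args?: Record<string, unknown> };

type McpTestCase = {
  userMessage: string;
  expectedToolCalls: ToolCall[];
  rubrics: string[]; // free-text criteria, graded separately from the tool-call check
};

// True when every expected call shows up in the observed calls in the same
// relative order. Extra observed calls in between are tolerated.
function callsMatchInOrder(expected: ToolCall[], observed: ToolCall[]): boolean {
  let next = 0;
  for (const call of observed) {
    const want = expected[next];
    if (want !== undefined && call.name === want.name) {
      next += 1;
    }
  }
  return next === expected.length;
}

const bookingCase: McpTestCase = {
  userMessage: "Book me the cheapest flight from Rome to Berlin tomorrow",
  expectedToolCalls: [{ name: "search_flights" }, { name: "book_flight" }],
  rubrics: ["Confirms the price with the user before booking"],
};

// Tool calls as they might be extracted from a recorded client session.
const goodRun: ToolCall[] = [
  { name: "search_flights" },
  { name: "get_weather" },
  { name: "book_flight" },
];
const badRun: ToolCall[] = [{ name: "book_flight" }, { name: "search_flights" }];

const goodResult = callsMatchInOrder(bookingCase.expectedToolCalls, goodRun);
const badResult = callsMatchInOrder(bookingCase.expectedToolCalls, badRun);
console.log(goodResult, badResult); // true false
```

Matching on order rather than exact equality is a deliberate choice in this sketch: different clients may insert extra calls, and what matters for the first pain point listed above is that the required tools are called in the right sequence.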

Once the session is over, you get the results along with screenshots and screen recordings of the conversations. These turned out to be super useful for sharing new versions of MCP apps between teams as well.

I'd love to hear your thoughts and feedback, and specifically to know how (or if) people are testing their MCP servers, both in production and locally.

(I started writing MCP servers in Feb 2025, when no tooling was available and clients had hardly any support, so I'd love to see how people are doing this today.)
