Show HN: How we made MCP development feel good

https://manufact.com/blog/mcp-testing
5•pzullo•1h ago
Hey HN, I am Pietro from Manufact (https://manufact.com). We build open source dev tools and infrastructure for MCP.

You might know us for mcp-use (https://github.com/mcp-use/mcp-use), our open source full stack SDK for building MCP servers and clients.

At Manufact we gave ourselves the mission, and the delight, of writing as many MCP servers as we could. Along the way we honed our SDK to offer the best possible developer/agent experience.

Testing and developing MCP servers is a pain because:

- Configuring MCP servers in normal clients is no easy feat. People already complain that installing them is hard; imagine having to refresh them every time you make a change.
- Testing does not only mean checking that tools work one at a time, but also making sure agents understand them and can call them in the right way and order.
- If installing an MCP server locally is a challenge, it is even harder on the remote clients where people are actually going to use your product (claude.ai, chatgpt.com).
- The model capabilities and system prompt (the agent) that will end up using your server vary greatly. Some people might be using Opus 4.7 from Claude Code, some might use Instant on chatgpt.com, and the model's ability to call your tool varies a lot. Testing on GPT5.5 locally and testing on ChatGPT with the same model yield very different experiences.

First: local development loop

Two things made web development frameworks like Next and Vite (etc.) better than anything else: HMR and preview on localhost.

What is the preview of an MCP server? In our opinion, a chat. Every time you npm run dev an mcp-use server, we serve an inspector on localhost, automatically connected to your MCP server. It has a BYOK chat, a way to test tools one by one, and super detailed metadata about your MCP server to make sure it is compliant.

An interesting technical challenge here was building an MCP client that runs completely (or almost completely) in the browser.

About HMR: this was not super easy. There are a few ways to do this, and we chose the hard but proper way: we implemented HMR using the protocol primitives. If you change a tool, we do not hard refresh the server and cancel the previous MCP session; we send a notifications/tools/list_changed notification (in spec) to the client, which then knows it should reload the tools. For UI elements we use Vite HMR and forward the UI changes across all elements of the inspector, so for instance you can change the UI element your MCP server returns and see the change live in the embedded chat. (This is pretty marvellous to look at.)
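To make the tool-reload flow concrete, here is a minimal in-memory sketch. It is not the mcp-use implementation: SketchServer, SketchClient, upsertTool, and the session id are hypothetical names made up for illustration. The only thing taken from the MCP spec is the notification method name, notifications/tools/list_changed, after which a real client would re-request the list with tools/list.

```typescript
// Hypothetical in-memory model of protocol-level HMR for tools.
// Only the method name "notifications/tools/list_changed" comes from the MCP spec.
type JsonRpcNotification = {
  jsonrpc: "2.0";
  method: string;
  params?: Record<string, unknown>;
};

interface Tool {
  name: string;
  description: string;
}

class SketchServer {
  // The session id never changes when a tool is edited: no hard refresh.
  readonly sessionId: string = "session-1";
  private tools: Map<string, Tool> = new Map();
  private listeners: Array<(n: JsonRpcNotification) => void> = [];

  onNotification(listener: (n: JsonRpcNotification) => void): void {
    this.listeners.push(listener);
  }

  // Stands in for answering a "tools/list" request.
  listTools(): Tool[] {
    return [...this.tools.values()];
  }

  // Called when a tool's source changes on disk (assumed trigger).
  upsertTool(tool: Tool): void {
    this.tools.set(tool.name, tool);
    const note: JsonRpcNotification = {
      jsonrpc: "2.0",
      method: "notifications/tools/list_changed",
    };
    for (const listener of this.listeners) listener(note);
  }
}

class SketchClient {
  tools: Tool[];
  reloads: number = 0;

  constructor(server: SketchServer) {
    this.tools = server.listTools();
    server.onNotification((n) => {
      // Reload the tool list in place instead of reconnecting.
      if (n.method === "notifications/tools/list_changed") {
        this.tools = server.listTools();
        this.reloads += 1;
      }
    });
  }
}
```

The point of the design is that the session survives the edit: the client keeps its connection and chat state and only swaps its tool list.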

This sped up the development of MCPs by a lot.

You can try out our inspector by running npx @mcp-use/inspector or just by using our SDK.

Bonus: one thing I often do is launch Claude Code with --chrome enabled and tell it to go to the inspector URL to test the server. This creates a closed loop for the agent that makes developing MCP servers with it much, much more predictable.

Second: testing on other clients (Disclaimer: this is a cloud feature)

Testing on actual clients is possibly more painful. We created an automated testing feature: you define the test cases associated with an MCP server in the regular agent testing shape (user message, expected tool calls, rubrics). Since "Testing on GPT5.5 locally and testing on ChatGPT with the same model yield very different experiences.", we need to test on the actual client, so we use browser agents to install the app and run the tests directly on the clients themselves.
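As a rough illustration of that test-case shape, here is a sketch in TypeScript. The field names (userMessage, expectedToolCalls, rubrics), the tool names, and the toolCallsMatch helper are all assumptions made up for this example, not Manufact's actual schema. The helper only checks that the expected tools were called in order, allowing extra calls in between; rubrics are free-text criteria that would be graded separately and are not evaluated here.

```typescript
// Hypothetical test-case shape: user message, expected tool calls, rubrics.
interface ToolCall {
  tool: string;
  args?: Record<string, unknown>;
}

interface McpTestCase {
  userMessage: string;
  expectedToolCalls: ToolCall[];
  rubrics: string[]; // graded separately, e.g. by a reviewer; not checked below
}

// True when every expected tool appears in `actual` in the same order.
// Extra calls between the expected ones are tolerated.
function toolCallsMatch(expected: ToolCall[], actual: ToolCall[]): boolean {
  let matched = 0;
  for (const call of actual) {
    if (matched < expected.length && call.tool === expected[matched]?.tool) {
      matched += 1;
    }
  }
  return matched === expected.length;
}

// Example case with made-up tool names.
const exampleCase: McpTestCase = {
  userMessage: "Find the pricing page and open it",
  expectedToolCalls: [{ tool: "search_docs" }, { tool: "open_doc" }],
  rubrics: ["The answer quotes the pricing page rather than guessing"],
};
```

An ordered-subsequence check is one deliberately lenient choice; a stricter harness could also compare arguments or reject unexpected calls.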

Once the session is over, you get the results and both screenshots and screen recordings of the conversations. These turned out to be super useful to share new versions of MCP apps between teams as well.

I'd love to hear thoughts and feedback, and specifically how (or if) people are testing their MCP servers, both in production and locally.

(I started writing MCP servers in Feb 25, when no tooling was available and clients had hardly any support. I'd love to see how people are doing this today.)
