Show HN: mcp-recorder – VCR.py for MCP servers. Record, replay, verify

https://github.com/devhelmhq/mcp-recorder
6•caballeto•8h ago
Hi HN, I'm Vlad. I've been building MCP servers and related tooling for a while now, and I kept hitting a class of bug that no unit test caught: someone on the team renames a tool parameter or tweaks a tool description, all the tests pass, but the AI agent that was calling that tool silently breaks. This happens because the model reads tool descriptions and parameter schemas to decide which tool to call and how, so a renamed parameter or a reworded description isn't just a cosmetic change — it directly affects the model's behavior.

The MCP spec doesn't have tool versioning yet, and there's no static artifact describing what a server exposes. tools/list just returns whatever is in memory at runtime, so there's nothing to commit or diff against, and changes that break downstream workflows slip through without anyone noticing.

The same problem for HTTP was already solved a long time ago with VCR.py, and I realized the same pattern works here. mcp-recorder captures the full MCP interaction sequence — initialize, tools/list, tools/call — into a JSON cassette file. Because it records complete protocol exchanges rather than just schema snapshots, you're testing actual behavior: if a tool call that used to return a specific format now returns something different, or a capability quietly disappears during the handshake, the cassette catches it. From that single recording you can replay it as a mock server (no API keys, fully deterministic), or verify your changed server against it and catch any diff:

  Verifying golden.json against node dist/index.js
    1. initialize [PASS]
    2. tools/list [PASS]
    3. tools/call [search] [FAIL]
         $.result.content[0].text: "old output" != "new output"
    4. tools/call [analyze] [PASS]
  Result: 3/4 passed, 1 failed

Non-zero exit code on any mismatch, so it plugs straight into CI.
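To make the mismatch line in that output concrete, here is a minimal sketch of how a recorded response could be compared with a live one to produce path-style diffs and an exit code. It is an illustration only, not mcp-recorder's actual implementation: the function name `diff_json` and the sample payloads are assumptions made for this example.

```python
import json

def diff_json(expected, actual, path="$"):
    """Return 'path: expected != actual' strings for every mismatch found."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        mismatches = []
        for key in sorted(set(expected) | set(actual)):
            child = f"{path}.{key}"
            if key not in actual:
                mismatches.append(f"{child}: missing in actual response")
            elif key not in expected:
                mismatches.append(f"{child}: not present in recording")
            else:
                mismatches.extend(diff_json(expected[key], actual[key], child))
        return mismatches
    if isinstance(expected, list) and isinstance(actual, list):
        mismatches = []
        if len(expected) != len(actual):
            mismatches.append(f"{path}: length {len(expected)} != {len(actual)}")
        # Compare the overlapping prefix element by element.
        for i, (e, a) in enumerate(zip(expected, actual)):
            mismatches.extend(diff_json(e, a, f"{path}[{i}]"))
        return mismatches
    if expected != actual:
        return [f"{path}: {json.dumps(expected)} != {json.dumps(actual)}"]
    return []

# Hypothetical recorded and live tools/call responses.
recorded = {"result": {"content": [{"type": "text", "text": "old output"}]}}
live = {"result": {"content": [{"type": "text", "text": "new output"}]}}

mismatches = diff_json(recorded, live)
for line in mismatches:
    print(line)

# A CI wrapper would pass this to sys.exit() so any mismatch fails the build.
exit_code = 1 if mismatches else 0
```

Comparing whole responses structurally, rather than as strings, keeps the report pointed at the exact field that changed.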

You can try it right now with minimal setup, since a public demo server and a scenarios file are included:

  pip install mcp-recorder
  mcp-recorder record-scenarios scenarios.yml
  mcp-recorder verify --cassette cassettes/demo_walkthrough.json \
    --target https://mcp.devhelm.io

It works with both HTTP and stdio transports. Scenarios are defined in YAML so it works with MCP servers in any language, and there's a pytest plugin if you want tighter integration. Secret redaction and environment variable interpolation are built in.
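For readers unfamiliar with the two ideas named last, the sketch below shows one way secret redaction and environment variable interpolation can work in general. Everything in it is an assumption made for illustration: the `SENSITIVE_KEYS` list, the `[REDACTED]` placeholder, and the `${NAME}` syntax are hypothetical and may not match what mcp-recorder actually does.

```python
import os
import re

# Hypothetical list of keys whose values should never land in a cassette.
SENSITIVE_KEYS = {"authorization", "api_key", "token"}
# Hypothetical ${NAME} placeholder syntax for environment variables.
ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def redact(value):
    """Recursively replace values stored under sensitive keys with a placeholder."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value

def interpolate(text, env=os.environ):
    """Expand ${NAME} references from env; raises KeyError if a name is unset."""
    return ENV_PATTERN.sub(lambda match: env[match.group(1)], text)

# Hypothetical recorded request containing a secret header.
request = {"headers": {"Authorization": "Bearer abc123"}, "params": {"query": "mcp"}}
print(redact(request))
print(interpolate("Bearer ${DEMO_TOKEN}", {"DEMO_TOKEN": "abc123"}))
```

Redacting at record time means the cassette is safe to commit, while interpolation lets the same scenario file run against different environments without hard-coding credentials.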

To make sure this actually works on real codebases, I submitted several PRs to production MCP servers: monday.com's MCP server (https://github.com/mondaycom/mcp/pull/222), Tavily's MCP server (https://github.com/tavily-ai/tavily-mcp/pull/113), and Firecrawl's MCP server (https://github.com/firecrawl/firecrawl-mcp-server/pull/175). They went from zero schema coverage to full tool surface verification with a clean schema diff available on each tool change. One big benefit is that you can do verification and replay with no API keys — deterministic responses, no live requests to real servers.
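Replay without API keys can be pictured as a lookup table built from the recording. The sketch below is a simplified illustration under stated assumptions, not the tool's real cassette format or server: the `request`/`response` layout, the `CassetteReplayer` class, and the choice of JSON-RPC error code -32601 for unrecorded requests are all hypothetical.

```python
import json

def request_key(method, params):
    """Build a stable lookup key from a method name and its parameters."""
    return method + ":" + json.dumps(params, sort_keys=True)

class CassetteReplayer:
    """Serve recorded responses instead of calling a live server."""

    def __init__(self, interactions):
        self._responses = {
            request_key(i["request"]["method"], i["request"].get("params", {})): i["response"]
            for i in interactions
        }

    def handle(self, request):
        key = request_key(request["method"], request.get("params", {}))
        recorded = self._responses.get(key)
        if recorded is None:
            # Assumption: report requests that were never recorded as errors.
            return {
                "jsonrpc": "2.0",
                "id": request.get("id"),
                "error": {"code": -32601, "message": "no recorded response"},
            }
        # Echo the caller's request id so the reply matches the live request.
        return {**recorded, "id": request.get("id")}

# Hypothetical one-interaction cassette.
cassette = [
    {
        "request": {"method": "tools/call",
                    "params": {"name": "search", "arguments": {"q": "mcp"}}},
        "response": {"jsonrpc": "2.0", "id": 1,
                     "result": {"content": [{"type": "text", "text": "old output"}]}},
    },
]

replayer = CassetteReplayer(cassette)
reply = replayer.handle({"jsonrpc": "2.0", "id": 7, "method": "tools/call",
                         "params": {"name": "search", "arguments": {"q": "mcp"}}})
print(reply["result"]["content"][0]["text"])
```

Because every reply comes from the recording, the same request always yields the same response, which is what makes the replayed run deterministic.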

I wrote up a deeper dive into the schema drift problem and the VCR pattern for MCP here: https://devhelm.io/blog/regression-testing-mcp-servers

mcp-recorder is MIT-licensed and on PyPI. Source is at https://github.com/devhelmhq/mcp-recorder — issues and PRs are welcome.

I'm building more tooling around MCP and agent reliability, so if you're dealing with similar problems, I'd genuinely like to hear what's been painful for you.
