I’m not talking about LLM evals here. I mean integration-style testing.
There are workarounds like VCR-style record/replay (e.g., Python's vcrpy and its “cassettes”) and similar ideas for capturing LLM calls. They can work for certain frameworks, as long as your test process is the one making the HTTP requests (rough sketch below).
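For reference, here's roughly what that looks like with vcrpy; the cassette path, model name, and test body are just illustrative:

```python
# Record/replay sketch using vcrpy + requests: the first run hits the real
# API and writes a cassette; later runs replay it (deterministic, offline, free).
import os
import requests
import vcr

my_vcr = vcr.VCR(
    cassette_library_dir="tests/cassettes",  # illustrative path
    filter_headers=["authorization"],        # keep the API key out of the cassette
    record_mode="once",                      # record once, replay thereafter
)

@my_vcr.use_cassette("chat_completion.yaml")
def test_chat_completion():
    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": "Reply with 'ok'."}],
        },
        timeout=30,
    )
    assert resp.status_code == 200
    assert resp.json()["choices"][0]["message"]["content"]
```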
Using local LLMs to save costs adds significant latency to every run (and doesn't solve nondeterminism anyway).
However, record/replay becomes significantly harder for MCP-style flows, where the interaction can be IDE-driven (stdio/JSON-RPC) and the HTTP calls may not even originate from your code, so there's nothing for an HTTP-level recorder to hook (one workaround sketched below).
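One capture point that still exists in that setup is the stdio pipe itself: point the IDE at a thin wrapper instead of the real server and tee the JSON-RPC traffic to a file. This is just a sketch; the server command and log path are placeholders:

```python
#!/usr/bin/env python3
# Hypothetical stdio tee for an MCP server: the IDE launches this script,
# which spawns the real server, passes every JSON-RPC line through
# unchanged, and logs both directions for later inspection or replay.
import subprocess
import sys
import threading

SERVER_CMD = ["python", "my_mcp_server.py"]  # the real server (placeholder)
LOG_PATH = "mcp_traffic.jsonl"               # captured traffic (placeholder)

def pump(src, dst, log, tag):
    # MCP's stdio transport is newline-delimited JSON, so line-wise copying works.
    for line in iter(src.readline, b""):
        log.write(tag + line)
        log.flush()
        dst.write(line)
        dst.flush()

proc = subprocess.Popen(SERVER_CMD, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
with open(LOG_PATH, "ab") as log:
    # server -> IDE on a background thread
    threading.Thread(
        target=pump, args=(proc.stdout, sys.stdout.buffer, log, b">> "), daemon=True
    ).start()
    # IDE -> server on the main thread; ends when the IDE closes stdin
    pump(sys.stdin.buffer, proc.stdin, log, b"<< ")
    proc.stdin.close()
    proc.wait()
```

In principle, replay is then a matter of feeding the logged `<<` lines back to the server and diffing the `>>` responses, which sidesteps the HTTP layer entirely, but I haven't seen this packaged up nicely anywhere.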
I'm curious what people are using.