I'll give it a try. MCP apps are full of promise, but the protocols are so unstable that I wouldn't want to write the boilerplate myself.
That's a fair point about the protocols being unstable. Maybe this could be automated: detect changes in the official docs, then have a coding agent update the template's docs and tests to match.
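For what it's worth, the detection half could be fairly small. Here is a rough sketch, assuming the official docs have already been downloaded as plain text and that only a hash per doc is stored between runs. The names `snapshot` and `changedDocs` are made up for illustration and are not part of the template or of any MCP tooling.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes: doc name -> text, and doc name -> stored SHA-256 hash.
type DocTexts = Record<string, string>;
type DocHashes = Record<string, string>;

// SHA-256 fingerprint of one document's text.
function fingerprint(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Hash every doc so only the fingerprints need to be stored between runs.
function snapshot(docs: DocTexts): DocHashes {
  const hashes: DocHashes = {};
  for (const name of Object.keys(docs)) {
    const text = docs[name];
    if (text !== undefined) hashes[name] = fingerprint(text);
  }
  return hashes;
}

// Names of docs that were added, removed, or edited since the stored snapshot.
function changedDocs(stored: DocHashes, current: DocTexts): string[] {
  const latest = snapshot(current);
  const names = Object.keys(stored).concat(Object.keys(latest));
  const changed: string[] = [];
  for (const name of names) {
    // A missing entry on either side reads as undefined, so it counts as a change.
    if (stored[name] !== latest[name] && changed.indexOf(name) === -1) {
      changed.push(name);
    }
  }
  return changed.sort();
}
```

A scheduled job could call `changedDocs` with the last stored snapshot and, if the list is non-empty, hand those doc names to the coding agent as the starting point for updating docs and tests.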
sebderhy•1h ago
When I tried building MCP Apps [1], the official repos (https://github.com/openai/openai-apps-sdk-examples, https://github.com/modelcontextprotocol/ext-apps/tree/main/e...) were great starting points, but they're designed for human developers. When I used them with Claude Code, I ended up in the usual loop: agent writes code → I manually test the app on ChatGPT → describe errors back → repeat. Plus, we didn't know what the best practices were, and struggled to enforce them.
So I built an MCP App template designed for coding agents to work as autonomously as possible on an MCP app.
The key idea: orthogonal testing. 450+ tests parameterized across 12 widget modules that verify infrastructure (protocol compliance, best practices grade, browser rendering), not business logic. Modify widgets, change data, add features — the tests should still pass. Agents iterate freely and get feedback without a human in the loop.
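To make the idea concrete, here is a minimal sketch of what "orthogonal" checks might look like. It is not the template's actual test suite: the `WidgetModule` shape, the three checks, and `runChecks` are all hypothetical, chosen only to show that every check runs against every widget and none of them inspects what the widget displays.

```typescript
// Hypothetical widget shape; the real template's module interface may differ.
interface WidgetModule {
  toolName: string;                  // e.g. "show_carousel"
  render: (data: unknown) => string; // returns the widget's HTML
  sampleData: unknown;               // business data, free to change
}

interface CheckFailure {
  tool: string;
  check: string;
}

// Infrastructure checks only: none of them depend on the widget's content.
const checks: { name: string; passes: (w: WidgetModule) => boolean }[] = [
  {
    name: "tool name is snake_case",
    passes: (w) => /^[a-z][a-z0-9_]*$/.test(w.toolName),
  },
  {
    name: "render returns non-empty HTML",
    passes: (w) => {
      const html = w.render(w.sampleData);
      return html.trim().length > 0 && html.indexOf("<") !== -1;
    },
  },
  {
    name: "render does not throw on missing data",
    passes: (w) => {
      try {
        w.render(undefined);
        return true;
      } catch (_err) {
        return false;
      }
    },
  },
];

// Run every check against every widget (the "parameterized across modules" part).
function runChecks(widgets: WidgetModule[]): CheckFailure[] {
  const failures: CheckFailure[] = [];
  for (const widget of widgets) {
    for (const check of checks) {
      if (!check.passes(widget)) {
        failures.push({ tool: widget.toolName, check: check.name });
      }
    }
  }
  return failures;
}
```

Swapping a widget's `sampleData` or changing what `render` draws leaves the result of `runChecks` unchanged as long as the widget still returns valid markup, which is what lets an agent iterate on features without the tests getting in the way.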
Other features:
- Hierarchical documentation that includes the MCP-App & OpenAI Apps SDK official llms.txt files
- Local chat simulator app that works even without API keys via Puter.js
- Visual testing of every widget: pnpm run ui-test --tool show_carousel → screenshot at /tmp/ui-test/screenshot.png
- 12 working examples (QR codes to 3D solar system) gathered from the official repos mentioned above.
The repo includes an unedited ~15 min video of Claude Code autonomously building an app that worked directly within ChatGPT.
I'd love to hear how it goes if you try it. Or even better: ask your agent for feedback, and post it here!
[1] MCP Apps (https://modelcontextprotocol.io/docs/extensions/apps) let you build interactive widgets that run inside Claude, ChatGPT, VS Code, and other AI hosts. Unlike smartphone apps, the same code can be deployed to all platforms.