WebMCP Proposal

https://webmachinelearning.github.io/webmcp/

67•Alifatisk•2h ago

Comments

Flux159•1h ago

This was announced in early preview a few days ago by Chrome as well: https://developer.chrome.com/blog/webmcp-epp

I think that the github repo's README may be more useful: https://github.com/webmachinelearning/webmcp?tab=readme-ov-f...

Also, the prior implementations may be useful to look at: https://github.com/MiguelsPizza/WebMCP and https://github.com/jasonjmcghee/WebMCP

politelemon•1h ago

This GitHub readme was helpful in understanding their motivation, cheers for sharing it.

> Integrating agents into it prevents fragmentation of their service and allows them to keep ownership of their interface, branding and connection with their users

Looking at the contrived examples given, I just don't see how they're achieving this. In fact it looks like creating MCP specific tools will achieve exactly the opposite. There will immediately be two ways to accomplish a thing and this will result in a drift over time as developers need to take into account two ways of interacting with a component on screen. There should be no difference, but there will be.

Having the LLM interpret and understand a page context would be much more in line with assistive technologies. It would require site owners to provide a more useful interface for people in need of assistance.

bastawhiz•19m ago

> Having the LLM interpret and understand a page context

The problem is fundamentally that it's difficult to create structured data that's easily presentable to both humans and machines. Consider: ARIA doesn't really help LLMs. What you're suggesting is much more in line with microformats and schema.org, both of which were essentially complete failures.

LLMs can already read web pages, just not efficiently. It's not an understanding problem, it's a usability problem. You can give a computer a schema and ask it to make valid API calls and it'll do a pretty decent job. You can't tell a blind person or their screen reader to do that. It's a different problem space entirely.

mcintyre1994•1h ago

Wes Bos has a pretty cool demo of this: https://www.youtube.com/watch?v=sOPhVSeimtI

I really like the way you can expose your schema through adding fields to a web form, that feels like a really nice extension and a great way to piggyback on your existing logic.

To me this seems much more promising than either needing an MCP server or the MCP Apps proposal.

innagadadavida•20m ago

Demo I built 5 months ago: https://www.youtube.com/watch?v=02O2OaNsLIk This exposes ecommerce specific tool calls as regular javascript functions as it is more lightweight than going the MCP route.

It's great they are working on standardizing this so websites don't have to integrate with LLMs. The real opportunity seems to be automatically generating the tool calls / MCP schema by inspecting the website offline. I automated this using Playwright MCP.

jayd16•1h ago

Have any sickos tried to point AI at SOAP APIs with WSDL definitions, yet?

chopete3•51m ago

Likely no.

Every generation needs its own acronyms and specifications. If a new one looks like an old one, the old one was likely ahead of its time.

vessenes•1h ago

I’m just personally really excited about building cli tools that are deployed with uvx. One line, instructions to add a skill, no faffing about with the mcp spec and server implementations. Feels like so much less dev friction.

baalimago•1h ago

Very cool! I imagine it'll be possible to start a static webserver + WebMCP app then use browser as virtualization layer instead of npm/uvx.

The browser has tons of functionality baked in, everything from web workers to persistence.

This would also allow for interesting ways of authenticating/manipulating data from existing sites. Say I'm logged into image-website-x. I can then use WebMCP to allow agents to interact with the images I've stored there. WebMCP becomes a much more intuitive route than interpreting the DOM elements.

cadamsdotcom•1h ago

Great to see people thinking about this. But it feels like a step on the road to something simpler.

For example, web accessibility has potential as a starting point for making actions automatable, with the advantage that the automatable things are visible to humans, so are less likely to drift / break over time.

Any work happening in that space?

egeozcan•52m ago

As someone heavily involved in a11y testing and improvement, the status quo, for better or worse, is to do it the other way around. Most people use automated, LLM based tooling with Playwright to improve accessibility.

cadamsdotcom•34m ago

I certainly do - it’s wonderful that making your site accessible is a single prompt away!

jayd16•51m ago

In theory you could use a protocol like this, one where the tools are specified in the page, to build a human readable but structured dashboard of functionality.

I'm not sure if this is really all that much better than, say, a swagger API. The js interface has the double edge of access to your cookies and such.

wongarsu•12m ago

We've been here before: You can use XML to describe the data on the dashboard and the possible actions in a structured way. Then you attach an XSLT script that visual user agents use to turn that into pretty XHTML, and programmatic user agents can directly consume the XML.

A swagger API with react frontend is kind of the modern jsonified version of that, with better interactivity. But just wait five years and people will use a WebMCP backend rendered by WebGPT /s

thevinter•14m ago

We're building an app that automatically generates machine/human readable JSON by parsing semantic HTML tags. Then, using a reverse proxy, we serve that JSON to agents instead of HTML.

kekqqq•57m ago

Finally, I was hoping for this to be implemented in 2026. Rendered DOM is for humans, not for agents.

charcircuit•56m ago

This is coming late as skills have largely replaced MCP. Now your site can just host a SKILL.md to tell agents how to use the site.

ATechGuy•54m ago

Interesting. I'd appreciate an example. Thanks!

ednc•53m ago

check out https://moltbook.com/skill.md

Spivak•40m ago

I really like how the shell and regular API calls have basically wholesale replaced tools. Real life example of worse-is-better working in the real world.

Just give your AI agent a little Linux VM to play around in, one that it already knows how to use, rather than some specialized protocol that has to predict everything an agent might want to do.

hnlmorg•49m ago

The purpose of this appears to be for sites that cannot be controlled via prompt instructions alone.

I do like agent skills, but I’m really not convinced by the hype that they make MCP redundant.

dvt•35m ago

I’m working on a DOM agent and I think MCP is overkill. You have a few “layers” you can imply by just executing some simple JS (eg: visible text, clickable surfaces, forms, etc). 90% of the time, the agent can imply the full functionality, except for the obvious edge cases (which trip up even humans): infinite scrolling, hijacking navigation, etc.
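A minimal sketch of the "layers" idea described above. It is not dvt's implementation: the `PageNode` type is a hypothetical plain-object stand-in for DOM elements so the example runs outside a browser, and the three layer names are assumptions chosen to mirror the comment (visible text, clickable surfaces, forms).

```typescript
// Hypothetical stand-in for a DOM element; a real agent would walk the live DOM.
interface PageNode {
  tag: string;
  text?: string;
  hidden?: boolean;
  attrs?: Record<string, string>;
  children?: PageNode[];
}

interface PageLayers {
  visibleText: string[];
  clickable: string[];
  formFields: string[];
}

// One pass over the tree, sorting what it finds into the three layers.
function extractLayers(root: PageNode): PageLayers {
  const layers: PageLayers = { visibleText: [], clickable: [], formFields: [] };
  const visit = (node: PageNode): void => {
    if (node.hidden) return; // skip hidden subtrees entirely
    if (node.text) layers.visibleText.push(node.text);
    if (node.tag === "a" || node.tag === "button") {
      layers.clickable.push(node.text ?? node.tag);
    }
    if (node.tag === "input" || node.tag === "select" || node.tag === "textarea") {
      layers.formFields.push(node.attrs?.name ?? node.tag);
    }
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return layers;
}

// Made-up page used only to exercise the function.
const page: PageNode = {
  tag: "body",
  children: [
    { tag: "h1", text: "Shop" },
    { tag: "div", hidden: true, children: [{ tag: "button", text: "Secret" }] },
    {
      tag: "form",
      children: [
        { tag: "input", attrs: { name: "q" } },
        { tag: "button", text: "Search" },
      ],
    },
  ],
};

const layers = extractLayers(page);
```

The edge cases the comment mentions (infinite scrolling, hijacked navigation) are exactly what a static pass like this cannot see.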

0x696C6961•27m ago

In what world is this simpler than just giving the agent a list of functions it can call?
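For comparison, "a list of functions it can call" might look roughly like the sketch below. The `Tool` and `ToolRegistry` shapes are hypothetical and are not taken from the WebMCP proposal; the `required` array is a deliberately tiny stand-in for a real input schema.

```typescript
// Hypothetical tool descriptor; not the WebMCP API.
interface Tool {
  name: string;
  description: string;
  required: string[]; // names of required string arguments
  run: (args: Record<string, string>) => string;
}

class ToolRegistry {
  private tools: Record<string, Tool> = {};

  register(tool: Tool): void {
    this.tools[tool.name] = tool;
  }

  // What the agent is shown: names and descriptions, no implementation.
  list(): { name: string; description: string; required: string[] }[] {
    return Object.keys(this.tools).map((key) => {
      const { name, description, required } = this.tools[key];
      return { name, description, required };
    });
  }

  // Validate arguments before dispatching to the site's own logic.
  call(name: string, args: Record<string, string>): string {
    const tool = this.tools[name];
    if (!tool) throw new Error(`unknown tool: ${name}`);
    const missing = tool.required.filter((key) => !(key in args));
    if (missing.length > 0) {
      throw new Error(`missing arguments: ${missing.join(", ")}`);
    }
    return tool.run(args);
  }
}

const registry = new ToolRegistry();
registry.register({
  name: "store.order",
  description: "Place an order for a single item.",
  required: ["sku"],
  run: (args) => `ordered ${args.sku}`,
});
```

The site still has to implement `run`; the registry only makes the available actions explicit instead of leaving the agent to infer them.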

Mic92•22m ago

So usually MCP tool calls are sequential and therefore waste a lot of tokens. There is some research from Anthropic (I think there was also a blog post from Cloudflare) on how code sandboxes are actually a more efficient interface for LLM agents, because they are really good at writing code and combining multiple "calls" into one piece of code. Another data point is that code is more deterministic and reliable, so you reduce LLM hallucination.
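A toy illustration of that argument, under stated assumptions: `getPrice` and its data are made up, and `modelTurns` is only a rough proxy for how many times results pass back through the model's context. It is not a token measurement.

```typescript
// Hypothetical tool and data, for illustration only.
const prices: Record<string, number> = { apple: 3, pear: 4, plum: 5 };

function getPrice(item: string): number {
  const price = prices[item];
  if (price === undefined) throw new Error(`no price for ${item}`);
  return price;
}

interface RunResult {
  total: number;
  modelTurns: number;
}

// Style 1: one tool call per model turn. Each intermediate result is
// returned to the model before it decides on the next call.
function sequentialTotal(items: string[]): RunResult {
  let total = 0;
  let modelTurns = 0;
  for (const item of items) {
    modelTurns += 1; // model emits a call and reads the result
    total += getPrice(item);
  }
  modelTurns += 1; // final turn to report the answer
  return { total, modelTurns };
}

// Style 2: the model writes one script; a sandbox runs it and only the
// final value is returned to the model.
function composedTotal(items: string[]): RunResult {
  const script = (list: string[]): number =>
    list.reduce((sum, item) => sum + getPrice(item), 0);
  return { total: script(items), modelTurns: 1 };
}

const basket = ["apple", "pear", "plum"];
const sequential = sequentialTotal(basket); // total 12, modelTurns 4
const composed = composedTotal(basket); // total 12, modelTurns 1
```

Both styles compute the same total; the difference in this model is only how many intermediate results the LLM has to read.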

foota•13m ago

What do the calls being sequential have to do with tokens? Do you just mean that the LLM has to think every time it gets a response (as opposed to being able to compose them)?

dvt•19m ago

Who implements those functions? E.g., store.order has to have its logic somewhere.

Mic92•25m ago

Do you expose the accessibility tree of a website to LLMs? What do you do with websites that lack that? Some agents I saw use screenshots, but that also seems kind of wasteful. Something in-between would be interesting.

dvt•20m ago

I actually do use cross-platform accessibility shenanigans, but for websites this is rarely as good as just doing like two passes on the DOM. It even figures out hard stuff like Google search (where ids/classes are mangled).

Garlef•15m ago

Question: Are you writing this under the assumption that the proposed WebMCP is for navigating websites? If so: It is not. From what I've gathered, this is an alternative to providing an MCP server.

Instead of letting the agent call a server (MCP), the agent downloads javascript and executes it itself (WebMCP).

gavmor•25m ago

This seems backwards, somehow. Like you're asking for an nth view and an nth API, and services are being asked to provide accessibility bridges redundant with our extant offerings.

Sites are now expected to duplicate effort by manually defining schemas for the same actions, like re-describing a button's purpose in JSON when it's already semantically marked up?

foota•15m ago

No, I don't think you're thinking about this right. It's more like hacker news would expose an MCP when you visit it that would present an alternative and parallel interface to the page, not "click button" tools.

wongarsu•18m ago

Now we just need a proxy server that automatically turns any API with published openapi spec into a WebMCP server, and we've completed the loop
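The conversion half of such a proxy could be sketched as below. This is an assumption-laden illustration, not behindsight's project or any real tool: it reads only a small subset of OpenAPI (`paths`, `operationId`, `summary`, `parameters`) and emits a hypothetical `ToolDescriptor` with a name, description, and JSON-Schema-style `inputSchema`.

```typescript
// Minimal subset of an OpenAPI document; real specs carry much more.
interface OpenApiParameter {
  name: string;
  in: string;
  required?: boolean;
  schema?: { type?: string };
}
interface OpenApiOperation {
  operationId?: string;
  summary?: string;
  parameters?: OpenApiParameter[];
}
interface OpenApiDoc {
  paths: Record<string, Record<string, OpenApiOperation>>;
}

// Hypothetical tool descriptor shape used for this sketch.
interface ToolDescriptor {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string }>;
    required: string[];
  };
}

// Emit one tool per (path, method) operation in the document.
function toolsFromOpenApi(doc: OpenApiDoc): ToolDescriptor[] {
  const tools: ToolDescriptor[] = [];
  for (const path of Object.keys(doc.paths)) {
    for (const method of Object.keys(doc.paths[path])) {
      const op = doc.paths[path][method];
      const properties: Record<string, { type: string }> = {};
      const required: string[] = [];
      for (const param of op.parameters ?? []) {
        // Default to "string" when the spec omits a parameter type.
        properties[param.name] = { type: param.schema?.type ?? "string" };
        if (param.required) required.push(param.name);
      }
      tools.push({
        name: op.operationId ?? `${method}_${path}`,
        description: op.summary ?? `${method.toUpperCase()} ${path}`,
        inputSchema: { type: "object", properties, required },
      });
    }
  }
  return tools;
}

// Made-up spec fragment used only to exercise the function.
const exampleDoc: OpenApiDoc = {
  paths: {
    "/items/{id}": {
      get: {
        operationId: "getItem",
        summary: "Fetch one item",
        parameters: [
          { name: "id", in: "path", required: true, schema: { type: "integer" } },
        ],
      },
    },
  },
};

const exampleTools = toolsFromOpenApi(exampleDoc);
```

Request bodies, authentication, and actually executing the HTTP call are left out; those are where most of the real work in such a proxy would be.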

behindsight•12m ago

I'm building this. Initially it was to do codegen for tools/sdks/docs but will incorporate webmcp as part of it.

I wanted to make FOSS codegen that was not locked behind paywalls + had wasm plugins to extend it.

nip•5m ago

The web was initially meant to be browsed by desktop computers.

Then came mobile phones and their touch control which forced the web to adapt: responsive design.

Now it’s the turn of agents that need to see and interact with websites.

Sure you could keep on feeding them html/js and have them write logic to interact with the page, just like you can open a website in desktop mode and still navigate it: but it’s clunky.

Don’t get hung up on the name “MCP”, debased as it is: this is much bigger than that.

Garlef•5m ago

I think this is a good idea.

The next one would be to also decouple the visual part of a website from the data/interactions: Let the users tell their in-browser agent how to render - or even offer different views on the same data. (And possibly also WHAT to render: So your LLM could work as an in-website adblocker for example; Similar to browser extensions such as a LinkedIn/Facebook feed blocker)

