The examples focus mostly on extensions injecting clients at website load time, but you can ship a client with your server javascript. That being said, if the client and server live in the the same script I recommend just using the InMemoryTransports from the official SDK.
Or, in other words, it helps users get around all the bullshit that actually makes money for the business. Ads, upsells, cross-marketing opportunities. Engagement. LLMs help users avoid all that, and adding an MCP to your site makes it trivial for them.
Maybe I'm too cynical, but think about this: the technologies we needed for this level of automation became ubiquitous decades ago. This was, after all, also the original hype behind APIs, that burned bright on the web for a year or two - before everyone realized that letting people interact with webservices the way they want is bad for business, and everything closed down behind contracts backed by strong auth. Instead of User Agents connecting diverse resources of the web for the benefit of users, we got... Zapier. That's what the future of service-side MCPs on the web is.
But user scripts were always a way to let at least power users show a middle finger to "attention economy" and force some ergonomy out of web apps that desperately try to make users waste time. Giving users the ability to turn any website into MCP, regardless of whether that website wants it, will supercharge this. So for better or worse, that's where the future is.
Adversarial interoperability remains the name of the game.
MCP-B is a different approach. Website owners create MCP servers `inside` their websites, and MCP-B clients are either injected by browser extensions or included in the websites JS.
Instead of visual parsing like Playwright, you get standard deterministic function calls.
You can see the blog post for code examples: https://mcp-b.ai/blogs
It's like an OpenAPI definition but for JS/MCP? (outside of the extension to interact with that definition)
I mean all this MCP stuff certainly seems useful even though this example isn't so good, the bigger uses will be when larger APIs and interactions are offered by the website like "Make a purchase" or "sort a table" and the AI would have to implement very complex set of DOM operations and XHR requests to make that happen and instead of flailing to do that, it can call an MCP tool which is just a js function.
MCP-B doesn't do any DOM parsing. It exchanges data purely over browser events.
I doubt that, first and not least because Home Depot stocks lumber.
It’s manufacturing all the way down.
Don’t sell yourself short.
table()
Is the extension itself open source? Or only the extension-tools?
In theory I should be able to write a chrome extension for any website to expose my own custom tools on that site right (with some reverse engineering of their APIs I assume)?
The extension itself is a MCP server which can be connected to by other extension over cross extension messaging. Since the extension is part of the protocol, I'd like for the community to pull from the same important parts of the extension (MCPHub, content script) so they are consistent across extension implementations.
Haven't had enough time to look through all the code there - interesting problem I guess since a single domain could have multiple accounts connected (ex: gmail w/ account 0 vs account 1 in different tabs) or just a single account (ex: HN).
Like you said there are some edge cases where two tabs of the same website expose different tool sets or have tools of the same name but would result in different outcomes when called.
Curios if you have any thoughts on how to handle this
MiguelsPizza | 3 commits | 89++ | 410--
claude | 2 commits | 31,799++ | 0--
https://github.com/MiguelsPizza/WebMCP/commit/26ec4a75354b1c...
MiguelsPizza / Alex Nahas
He admits it here https://news.ycombinator.com/item?id=44516104
Because for sure they didn't DMCAed their way to own that account, right? Right?
--Shoutout to Go-Rod https://pkg.go.dev/github.com/go-rod/rod@v0.116.2#Page
I'll need to look a bit more, but at a glance, MCP-B is more putting the onus of browser automation (i.e. how the agent will interact with the web page) on the website owner. They get to expose exactly the functionality they want to the agent
Would like to just provide a runtime for an LLM to solve captchas.
My main focus is (anti) bot detection.
There's space for both IMO. The more generic tool that figures it out on it's own, and the streamlined tool that accesses a site's guiderails. There's also the backend service of course which doesn't require the browser or UI, but as he describes this entails complexity around authentication and I would assume discoverability.
The bigger challenge I think is figuring out how to build MCPs easily for SaaS and other legacy portals. I see some push on the OpenAPI side of things which is promising but requires you to make significant changes to existing apps. Perhaps web frameworks (rails, next, laravel, etc) can agree on a standard.
The premise of MCP-B is that it's in fact not easy to reliably navigate websites today with LLMs, if you're just relying on DOM traversal or computer vision.
And when it comes to authenticated and read/write operations, I think you need the reliability and control that comes from something like MCP-B, rather than just trusting the LLM to figure it out.
Both Wordpress and Shopify allow users to heavily customize their front-end, and therefore ship garbage HTML + CSS if they choose to (or don't know any better). I certainly wouldn't want to rely on LLMs parsing arbitrary HTML if I'm trying to automate a purchase or some other activity that involves trust and/or sensitive data.
Some middle ground where an agent reverse engineers the api as a starting point would be cool, then is promoted to use the "official" mcp api if a site publishes it.
For react in particular, lots of the form ecosystem (react hook form) can be directly ported to MCP tools. I am currently working on a zero config react hook form integration.
But yes, MCP-B is more "work" than having the agent use the website like a user. The admission here is that it's not looking like models will be able to reliably do browser automation like humans for a while. Thus, we need to make an effort to build out better tooling for them (at least in the short term)
wonder if it was inspired by `broadcast-mcp` [1] (hackathon project by me and a friend from may based on the same concept but not fleshed out)
this concept is awesome- glad someone really fleshed it out
Only nitpick is that the home page says "cross-browser" at the bottom but the extension is only available for Chrome..
You can ask an agent to browse a web page and click a button etc. They will work out how to use a browser automation library.
But it’s not worth the cost, time spent waiting or the inconsistency between implementations.
MCP just offloads that overload, much like how they can use bash tools when they are quite capable of writing an implementation of grep etc.
We should be focusing on llms using self discovery to figure out information.
Can you expand? What does that mean, and why is the right (or better) pathTwo different goals.
Better get ready to quit your day job and get funded buddy, as my 30 years worth of tech instincts tell me this will take off vertically!
This basically leaves up to the user to establish authenticated session manually.
Assuming claude is smart enough to pick up API key from prompt/config, and can use swagger based api client, wouldnt that be the same?
"The Auth problem At this point, the auth issues with MCP are well known. OAuth2.1 is great, but we are basically trying to re-invent auth for agents that act on behalf of the user. This is a good long term goal, but we are quickly realizing that LLM sessions with no distinguishable credentials of their own are difficult to authorize and will require a complete re-imagining of our authorization systems. Data leakage in multi-tenant apps that have MCP servers is just not a solved problem yet.
I think a very strong case for MCP is to limit the amount of damage the model can do and the amount of data it will ever have access to. The nice thing about client side APIs in multi-tenant apps is they are hopefully already scoped to the user. If we just give the model access to that, there's not much damage they can do.
It's also worth mentioning that OAuth2.1 is basically incompatible with internal Auth at Amazon (where I work). I won't go to much into this, but the implications of this reach beyond Amazon internal."
1. Oauth is not working in Amazon ==> need solution.
2. Oauth are difficult to authorize
3. limit the amount of damage the model can do WHILE "ulti-tenant apps is they are hopefully already scoped to the user".
I feel from a security side there is an issue here in this logic.
Oauth for apps can be far more tuned than current web user permission as usually, user have modification permission, that you may not want to provide.
Oauth not implemented in Amazon, is not really an issue.
Also this means you backdoor the App with another APP you establish trust with it. ==> This is a major no go for security as all actions on MCP app will be logged in the same scope as USER access.
You might just copy your session ID/ Cookie and do the same with an MCP.
I may be wrong the idea seem intersting but from a security side, I feel it's a bypass that will have a lot of issues with compliance.
Paraphrasing: Connect your editor's assistant to your web framework runtime via MCP and augment your agentic workflows and chats with: Database integration; Logs and runtime introspection; Code evaluation; and Documentation context.
Edit: Re-reading MCP-B docs, that is more geared towards allowing visitors to your site to use MCP, while Tidewave is definitely focussed on Developers.
What do we have to do that's so important we need AI, and not a chat AI but AI on steroids (supposedly)?
Example:
1. An author has a website for their self-published book. It currently checks book availability with their database when add to cart is clicked.
2. The website publishes "check book availability" and "add to cart" as "tools", using this MCP-B protocol.
3. A user instructs ChatGPT or some AI agent to "Buy 3 copies of author's book from https://theirbooksite"
4. The AI agent visits the site. Finds that it's MCP-B compliant. Using MCP-B, it gets the list of available tools. It finds a tool called "check book availability", and uses it to figure out if ordering 3 copies is possible. If yes, it'll next call "add to cart" tool on the website.
The website here is actively cooperating with the agent/LLM and supplying structured data. Instead of being a passive collection of UI elements that AI chatbots have to figure out based on UI layouts or UI captions, which are generally very brittle approaches.
But any accessibility tool will be exploited by nefarious actors so I wonder how many main stream websites/apps would implement these MCP.
Has anyone tried any MCP for improving accessibility?
How so?
So people like ticket sales sites, eBay etc. It will make it easier for those sites to have all the tickets purchased or for auctions to be sniped etc.
FWIU, these sort of sites actually (currently at least) put on measures to try and stop bots using them for these reasons.
“Use it, but not like that” is not a legitimate position to take.
Sounds like a very strange world of robots fighting robots
I have a fundamental question though: how is it different from directly connecting my web app's JS APIs with tool calling functions and talking directly with a LLM server with tool-call support?
Is it the same thing, but with a protocol? or am I missing the bigger picture?
It's a protocol which allows the user to bring their own model to interact with the tools on your website
Meaning what? RSS remains ubiquitous. It’s rare to find a website which doesn’t support it, even if the owners don’t realise it or link to it on their page. RSS remains as useful as it ever was. Even if some websites only share partial post content via RSS, it’s still useful to know when they are available (and can be used as an automation hook to get the full thing).
RSS is alive and well. It’s like if you wrote “this will go the same way as the microwave oven”.
Google killed Google Reader. (Other products exist you can use instead.)
Facebook removed support for RSS feeds. (You can replace it with third party tools or API calls.)
It’s not dead dead, but it did seem to lose some momentum and support over time on several fronts.
It’s not dead, period. Not dead, dead dead, dead dead dead, or any other combination.
Yes, some integrations were removed, but on the whole you have more apps and services for it than ever. The death of the behemoth that was Google Reader was a positive there.
Maybe fewer people are using it, but the technology itself is fine and continues to be widely available and supported by most websites, which was the point.
Maybe Facebook and Instagram don’t have RSS access, but you can’t even navigate two pages on them without an account, anyway. They are closed to everything they don’t control, which has nothing to do with RSS.
"Real" RSS gives you the whole content. The blog platform I use does this, for example. They are not greedy people and just want to provide a blog platform so, they use the thing as it's supposed to be.
Remember the time when REST was the new hot thing, everyone started doing API-first design, and people thought it'll empower people by letting programs navigate services for them programmatically? Remember when "mashups" were the future?
It all died before it could come to pass, because businesses quickly remembered that all their money comes specifically from denying users those capabilities.
bustodisgusto•13h ago
This was an idea I had while trying to build MCP servers internally at Amazon. Today I am open sourcing it. TLDR it's an extension of the Model Context Protocol which allows you to treat your website as an MCP server which can be discovered and called by MCP-B compliant web extensions.
You can read a more detailed and breakdown here (with gifs): https://mcp-b.ai/blogs
bustodisgusto•13h ago