A2UI: A Protocol for Agent-Driven Interfaces

164•makeramen•1mo ago

Comments

tasoeur•1mo ago

In an ideal world, people would be implementing UI/UX accessibility in the first place, and a lot of those problems would be solved in the first place. But one can also hope that having the motivation to get agents running on those things could actually bring a lot of accessibility features to newer apps.

qsort•1mo ago

This is very interesting if used judiciously, I can see many use cases where I'd want interfaces to be drawn dynamically (e.g. charts for business intelligence.)

What scares me is that even without arbitrary code generation, there's the potential for hallucinations and prompt injection to hit hard if a solution like this isn't sandboxed properly. An automatically generated "confirm purchase" button like in the shown example is... probably something I'd not make entirely unsupervised just yet.

jy14898•1mo ago

I never want to unknowingly use an app that's driven this way.

However, I'm happy it's happening because you don't need an LLM to use the protocol.

mbossie•1mo ago

So there's MCP-UI, OpenAI's ChatKit widgets and now Google's A2UI, that I know of. And probably some more...

How many more variants are we introducing to solve the same problem. Sounds like a lot of wasted manhours to me.

MrOrelliOReilly•1mo ago

I agree that it's annoying to have competing standards, but when dealing with a lot of unknowns it's better to allow divergence and exploration. It's a worse use of time to quibble over the best way to do things when we have no meaningful data yet to justify any decision. Companies need freedom to experiment on the best approach for all these new AI use cases. We'll then learn what is great/terrible in each approach. Over time, we should expect and encourage consolidation around a single set of standards.

pscanf•1mo ago

> when dealing with a lot of unknowns it's better to allow divergence and exploration

I completely agree, though I'm personally sitting out all of these protocols/frameworks/libraries. In 6 months time half of them will have been abandoned, and the other half will have morphed into something very different and incompatible.

For the time being, I just build things from scratch, which–as others have noted¹–is actually not that difficult, gives you understanding of what goes on under the hood, and doesn't tie you to someone else's innovation pace (whether it's higher or lower).

¹ https://fly.io/blog/everyone-write-an-agent/

kridsdale3•1mo ago

I recently heard that when automobiles were new the USA quickly ended up in a state with 80 competing manufacturing brands. In a couple decades, the market figured out what customers actually want and what styles and features mattered, and the competition ecosystem consolidated to 5 brands.

The same happened with GPUs in the 90s. When Jensen formed Nvidia there were 70 other companies selling Graphics Cards that you could put in a PCI slot. Now there are 2.

askl•1mo ago

Obligatory https://xkcd.com/927/

zeroasterisk•1mo ago

Oh yes. I send this all the time. And I also see the irony.

I can justify A2UI as doing something not otherwise accomplishable in the market today, but you saw how long the blog post was trying to explain that. :shrug:

mystifyingpoi•1mo ago

> Sounds like a lot of wasted manhours to me

Sounds like a lot of people got paid because of it. That's a win for them. It wasn't their decision, it was company decision to take part in the race. Most likely there will be more than 1 winner anyway.

kridsdale3•1mo ago

I'm one of these people. We have to start working on the problem many months before the competition announces that they exist. So we are all just doing parallel evolution here. Everyone agrees that to sit and wait for a standard means you wouldn't waste energy, but you'd also have no influence.

Like you mentioned, its a good time to be employed.

hobofan•1mo ago

MCP-UI and OpenAI Apps are converging into the MCP Apps extension specification: https://blog.modelcontextprotocol.io/posts/2025-11-21-mcp-ap...

p_v_doom•1mo ago

We should make one new standard for everyone to use ...

shireboy•1mo ago

AGUI sounds similar: https://github.com/ag-ui-protocol/ag-ui

epec254•1mo ago

Same team! AGUI uses a2UI as the protocol under the hood.

swiftlyTyped•1mo ago

Hi, one of the AG-UI authors here.

AG-UI is a launch partner of A2UI, but it is a separate project by CopilotKit, not google.

We have a day-0 handshake between AG-UI & A2UI

zeroasterisk•1mo ago

And thank you CopilotKit team!

I think AG UI is great if you are building the UI and the Agent at the same time and want a high bandwidth sync between them and the UI supports AG UI as an adaptor layer (they have done a lot of work making this easier for folks).

A2UI is most interesting for it's LLM generation options (not tools but structured output), it's remote message passing options (if you don't own the UI), and it's general-purpose-ness (same fairly simple standard which can work for many models, transports, and renderers).

They do fit nicely together. Sorry the naming conventions are complicated. There are 2 hard things in computer science: naming things, cache invalidation, off by one errors.

meander_water•1mo ago

This provides a bit more detail on how they relate to each other

https://www.copilotkit.ai/ag-ui-and-a2ui

adamesque•1mo ago

Unlike many of those approaches which concern themselves with delivery of human-designed static UI, this seems to be a tool designed to support generative UIs. I personally think that's a non-starter and much prefer the more incremental "let the agent call a tool that renders a specific pre-made UI" approach of MCP UI/Apps, OpenAI Apps SDK, etc for now.

zeroasterisk•1mo ago

Legitimate curiosity - why?

Making an agent call a tool to manipulate a UI does feel like normal application development and an event driven interaction... I get that.

What else drives your preference?

raybb•1mo ago

Is there a standard protocol for the way things like Cline sometimes give you multiple choice buttons to click on? Or how does that compare to something like this?

codethief•1mo ago

> A2UI lets agents send declarative component descriptions that clients render using their own native widgets. It's like having agents speak a universal UI language.

(emphasis mine)

Sounds like agents are suddenly able to do what developers have failed at for decades: Writing platform-independent UIs. Maybe this works for simple use cases but beyond that I'm skeptical.

rockwotj•1mo ago

this isn’t the right way to look at it. It’s really server side rendering where the LLM is doing the markup language generation instead of a template. The custom UI is usually higher level. Airbnb has been doing this for years: https://medium.com/airbnb-engineering/a-deep-dive-into-airbn...

mentalgear•1mo ago

It still needs language-specific libraries [1] (and no sveltekit even announced yet :( ).

[1] https://a2ui.org/renderers/

ddrdrck_•1mo ago

Well it is open source and they expect the community to add more renderers. So if you are a sveltekit specialist this could actually be an opportunity.

epec254•1mo ago

Plus 1! We’d love community contributions here!

zeroasterisk•1mo ago

I tell you what, I'll add Svelte/Kit to the list we want to target:

https://github.com/google/A2UI/pull/352

Thanks for the recommendation.

hurturue•1mo ago

platform independent UIs exist - HTML and Electron

kridsdale3•1mo ago

Sure. HTML is a Markup-Language (it's in the acronym). Markdown is also a Markup Language. LLMs are super good at Markdown and just about every chatbot frontend now has a renderer built in.

A2UI is a superset, expanding in to more element types. If we're going to have the origin of all our data streams be string-output-generators, this seems like an ok way to go.

I've joined an effort inside Google to work in this exact space, though what we're doing has no plan to become open source, other groups are working on stuff like A2UI and we collaborate with them.

My career previous to this was nearly 20 years of native platform UI programming and things like Flutter, React Native, etc have always really annoyed me. But I've come around this year to accept that as long as LLMs on servers are going to be where the applications of the future live, we need a client-OS agnostic framework like this.

codethief•1mo ago

HTML is not rendered to native widgets (or barely so, in a few <select> cases).

hulitu•1mo ago

They run in a browser.

giancarlostoro•1mo ago

I've thought about how to write a platform independent UI framework that doesn't care what language you write it in, and every time I find myself reinventing X.org or at least my gut tells me I'm just reinventing a cross-platform X server implementation.

observationist•1mo ago

Nope, it's just a repackaging of the same problem, except in this case, the problem is solved with APIs and CLI and not jumping through hoops in order to get the AI to do what humans do.

It's about accomplishing a task, not making a bot accomplish a task using the same tools and embodiment context as a human - there's no upside, unless the bot is actually using a humanoid embodiment, and even then, using a CLI and service API is going to be preferable to doing things with UI in nearly every possible case, except where you want to limit to human-ish capabilities, like with gaming, or you want to deceive any monitors into thinking that a human is operating.

It's going to be infinitely easier to wrap a json get/push wrapper around existing APIs or automation interfaces than to universalize some sort of GUI interactions, because LLM's don't have the realtime memory you need to adapt to all the edge cases on the fly. It's incredibly difficult for humans, and hundreds of billions of dollars have been spent trying to make software universally accessible and dumbed down for users, and still ends up being either stupidly limited, or fractally complex in the tail, and no developer can ever account for all the possible ways in which users interact with a feature for any moderately complex piece of software.

Just use existing automation patterns. This is one case where if an AI picks up this capability alongside other advances, then awesome, but any sort of middleware is going to be a huge hack that immediately gets obsoleted by frontier models as a matter of course.

evalstate•1mo ago

I quite like the look of this one - seems to fit somewhere between the rigid structure of MCP Elicitations and the freeform nature of MCP-UI/Skybridge.

lowsong•1mo ago

> A2UI lets agents send declarative component descriptions that clients render using their own native widgets. It's like having agents speak a universal UI language.

Why the hell would anyone want this? Why on earth would you trust an LLM to output a UI? You're just asking for security bugs, UI impersonation attacks, terrible usability, and more. This is a nightmare.

vidarh•1mo ago

If done in chat, it's just an alternative to talking to you freeform. Consider Claude Code's multiple-choice questions, which you can trigger by asking it to invoke the right tool, for example.

DannyBee•1mo ago

None of the issues go away just because it's in chat?

Freeform looks and acts like text, except for a set of things that someone vetted and made work.

If the interactive diagram or UI you click on now owns you, it doesn't matter if it was inside the chat window or outside the chat window.

Now, in this case, it's not arbitrary UI, but if you believe that the parsing/validation/rendering/two way data binding/incremental composition (the spec requires that you be able to build up UI incrementally) of these components: https://a2ui.org/specification/v0.9-a2ui/#standard-component...

as transported/renderered/etc by NxM combinations of implementations (there are 4 renderers and a bunch of transports right now), is not going to have security issues, i've got a bridge to sell you.

Here, i'll sell it to you in gemini, just click a few times on the "totally safe text box" for me before you sign your name.

My friend once called something a babydoggle - something you know will be a boondoggle, but is still in its small formative stages.

This feels like a babydoggle to me.

vidarh•1mo ago

> None of the issues go away just because it's in chat?

There is a wast difference in risk between me clicking a button provided by Claude in my Claude chat, on the basis of conversations I have had with Claude, and clicking a random button on a random website. Both can contain a malicious. One is substantially higher risk. Separately, linking a UI constructed this way up to an agent and let third parties interact with it, is much riskier to you than to them.

> If the interactive diagram or UI you click on now owns you, it doesn't matter if it was inside the chat window or outside the chat window.

In that scenario, the UI elements are irrelevant barring a buggy implementation (yes, I've read the rest, see below), as you can achieve the same things as you can do that way with just presenting the user with a basic link and telling them to press it.

> as transported/renderered/etc by NxM combinations of implementations (there are 4 renderers and a bunch of transports right now), is not going to have security issues, i've got a bridge to sell you.

I very much doubt we'll see many implementations that won't just use a web view for this, and I very much doubt these issues will even fall in the top 10 security issues people will run into with AI tooling. Sure, there will be bugs. You can use this argument against anything that requires changes to client software.

But if you're concerned about the security of clients, mcp and hooks is a far bigger rats nest of things that are inherently risky due to the way they are designed.

wongarsu•1mo ago

I wouldn't want this anywhere near production, but for rapid prototyping this seems great. People famously can't articulate what they want until they get to play around with it. This lets you skip right to the part where you realize they want something completely different from what was first described without having to build the first iteration by hand

turnsout•1mo ago

Honestly the point of this is not to help app developers—it's to replace the need for apps altogether.

The vision here is that you can chat with Gemini, and it can generate an app on the fly to solve your problem. For the visualized landscaping app, it could just connect to landscapers via their Google Business Profile.

As an app developer, I'm actually not even against this. The amount of human effort that goes into creating and maintaining thousands of duplicative apps is wasteful.

verdverm•1mo ago

This sounds like they creators think that even more duplicative apps that no one knows how it works or what the code even looks like... is a better idea?

How many times are users going to spin GPUs to create the same app?

turnsout•1mo ago

If Google's paying for the GPU time, I guess it's up to them how they want to cache apps for frequently-used queries. Glad I'm not paying for it!

_pdp_•1mo ago

I am fan of using markdown to describe the UI.

It is simple, effective and feels more native to me than some rigid data structure designed for very specific use-cases that may not fit well into your own problem.

Honestly, we should think of Emacs when working with LLMs and kind of try to apply the same philosophy. I am not a fan of Emacs per-se but the parallels are there. Everything is a file and everything is a text in a buffer. The text can be rendered in various ways depending on the consumer.

This is also the philosophy that we use in our own product and it works remarkably well for diverse set of customers. I have not encountered anything that cannot be modelled in this way. It is simple, effective and it allows for a great degree of flexibility when things are not going as well as planned. It works well with streaming too (streaming parsers are not so difficult to do with simple text structures and we have been doing this for ages) and LLMs are trained very well how to produce this type of output - vs anything custom that has not been seen or adopted yet by anyone.

Besides, given that LLMs are getting good at coding and the browser can render iframes in seamless mode, a better and more flexible approach would be to use HTML, CSS and JavaScript instead of what Slack has been doing for ages with their block kit API which we know is very rigid and frustrating to work with. I get why you might want to have a data structures for UI in order to cover CLI tools as well but at the end of the day browsers and clis are completely different things and I don not believe you can meaningfully make it work for both of them unless you are also prepared to dumb it down and target only the lowest common dominator.

pedrozieg•1mo ago

We’ve had variations of “JSON describes the screen, clients render it” for years; the hard parts weren’t the wire format, they were versioning components, debugging state when something breaks on a specific client, and not painting yourself into a corner with a too-clever layout DSL.

The genuinely interesting bit here is the security boundary: agents can only speak in terms of a vetted component catalog, and the client owns execution. If you get that right, you can swap the agent for a rules engine or a human operator and keep the same protocol. My guess is the spec that wins won’t be the one with the coolest demos, but the one boring enough that a product team can live with it for 5-10 years.

nsonha•1mo ago

What's agent/AI specific about this? Seems just backend-driven UI

mentalgear•1mo ago

The way to do this would be to come together and design a common W3C-like standard.

iristenteije•1mo ago

I think ultimately GenUI can be integrated into apps more seamlessly, but even if today it's more in context of chat interfaces with prompts, I think it's clear that a wall of text isn't always the best UX/output and it's already a win.

ceuk•1mo ago

A few days ago I was predicting to some colleagues a revival of ideas around "server-driven UI" (which never really seemed to catch on) in order to facilitate agentic UIs.

Feels good to have been on the money, but I'm also glad I didn't start a project only to be harpooned by Google straight away

kridsdale3•1mo ago

Server Driven UI has absolutely caught on. Not including all the Electron apps out there, things like Instagram's native mobile apps have about half of their screens being SDUI at this point because Meta needs to be able to change them instantly, not with a 3 week release cycle.

ceuk•1mo ago

Didn't know Instagram used it, that's cool

empath75•1mo ago

I couldn't get this to work with the default model because it's overloaded, but I tried flash-lite, which at least gave me a response, but it only presents an actual UI 1/3rd of the time that I tried the suggested questions in the demo, and otherwise it attempts to ask me a question which doesn't present a ui at all or even do anything in the app -- i had to look at the logs to see what it was trying to do.

barbazoo•1mo ago

This sounds like a way to have the LLM client render dynamic UI. Is this for use during the chat session or yet another way to build actual applications?

epec254•1mo ago

Google PM here. Right now, it’s designed for rendering UI widgets inline with a chat conversation - it’s an extension to a2a that lets you stream JSON defining UI components in addition to chat messages.

kridsdale3•1mo ago

Google SWE working in this space here. Look up my username (minus the digit) on Moma, let's talk. I can't ID you from your HN handle.

zwarag•1mo ago

Could this be the link that allows designers to design a UI in Figma and let an agent build it via A2UI?

uptownhr•1mo ago

My approach/prototype using XState with websockets from an MCP server https://github.com/uptownhr/mcp-agentic-ui

oddrationale•1mo ago

Seems similar to [Adaptive Cards](https://adaptivecards.io/). Both have a JSON-based UI builder system.

awei•1mo ago

I see how useful a universal UI language working across platforms is, but when I look at some examples from this protocol, I have the feeling it will eventually converge to what we already have, html. Instead of making all platforms support this new universal markup language, why not make them support html, which some already do, and which llms are already trained on.

Some examples from the documentation: { "id": "settings-tabs", "component": { "Tabs": { "tabItems": [ {"title": {"literalString": "General"}, "child": "general-settings"}, {"title": {"literalString": "Privacy"}, "child": "privacy-settings"}, {"title": {"literalString": "Advanced"}, "child": "advanced-settings"} ] } } }

{ "id": "email-input", "component": { "TextField": { "label": {"literalString": "Email Address"}, "text": {"path": "/user/email"}, "textFieldType": "shortText" } } }

epec254•1mo ago

A key challenge with HTML is client side trust. How do I enable an agent platform (say Gemini, Claude, OpenAI) to render UI from an untrusted 3p agent that’s integrated with the platform? This is a common scenario in the enterprise version of these apps - eg I want to use the agent from (insert saas vendor) alongside my company’s home grown agents and data.

Most HTML is actually HTML+CSS+JS - IMO, accepting this is a code injection attack waiting to happen. By abstracting to JSON, a client can safely render UI without this concern.

awei•1mo ago

Right this makes sense, I wonder if it would then be a good idea to abstract html to JSON, making it impossible to include css and js into it

epec254•1mo ago

Curious to learn more what you are thinking?

One challenge is you do likely want JS to process/capture the data - for example, taking the data from a form and turning it into json to send back to the agent

oooyay•1mo ago

If you play with A2UIs generator that's effectively what it does, just layer of abstraction or two above what you're describing.

awei•1mo ago

That's what I thought too skimming through the documentation, my thinking is that since it does that, which makes sense to avoid script injection, why not do it with "jsonized" html.

oooyay•1mo ago

I was thinking that raw html might be too verbose, but canned components have signatures and types.

lunar_mycroft•1mo ago

If the JSON protocol in question supports arbitrary behaviors and styles, then you still have an injection problem even over JSON. If it doesn't support them you don't need to support those in an HTML protocol either, and you can solve the injection problem the way we already do: sanitizing the HTML to remove all/some (depending on your specific requirements) script tags, event listeners, etc.

epicurean•1mo ago

Perhaps the protocol, is then html/css/js in a strict sandbox. Component has no access to anything outside of component bounds (no network, no dom/object access, no draw access, etc).

awei•1mo ago

I think you can do that with an iframe, but it always makes me nervous

hulitu•1mo ago

> A key challenge with HTML is client side trust. How do I enable an agent platform (say Gemini, Claude, OpenAI) to render UI from an untrusted 3p agent that’s integrated with the platform?

Just like you do with your web browser. A web browser is a Remote Code Execution engine.

mannanj•1mo ago

I want instead of being told “here’s what I think you want to see, now look at it”, “what do you want to see?” And be shown that.

Yes yes we claim the user doesn’t know what they want. I think that’s largely used as an excuse to avoid rethinking how things should meet the users needs and keep status quo where people are made to rely on systems and walled gardens. The goal of this article is UIs should work better for the user. What better way then to let them imagine (or even nudge them with example actions, buttons, text to click to render specific views) in the UI! I’ve been wanting to build something where I just ask in English from options I know I have or otherwise play and hit edges to discover what’s possible and not.

Anyone else thinking along this direction or think I’m missing something obvious here?

alexgotoi•1mo ago

So we're reinventing SOAP but for AI agents. Not saying that's bad - sometimes you need to remake old mistakes before you figure out what actually works.

The real question: do UIs even make sense for agents? Like the whole point of a UI is to expose functionality to humans with constraints (screens, mice, attention). Agents don't have those constraints. They can read JSON, call APIs directly, parse docs. Why are we building them middleware to click buttons?

I think this makes sense as a transition layer while we figure out what agent-native architecture looks like. But long-term it's probably training wheels.

Will include this in my https://hackernewsai.com/ newsletter.

kridsdale3•1mo ago

The need here is at some point an agent has to produce an output that is consumed by a human with eyes. A pixel grid on a screen is far more high bandwidth to send information to a human than a linear string of text.

ChrisArchitect•1mo ago

Blog post: https://developers.googleblog.com/introducing-a2ui-an-open-p...

jadelcastillo•1mo ago

I think this is a good and pragmatic way to approach the use of LLM systems. By translating to an intermediate language, and then processing further symbolically. But probably you can be prompt injected also if you expose sensible "tools" to the LLM.

verdverm•1mo ago

Am I reading (7) of the data flow correctly?

1. Establish SSE connection

... user event

7. send updates over origin SSE connection

So the client is required to maintain an SSE capable connection for the entire chat session? What if my network drops or I switch to another agent?

Seems an onerous requirement to maintain a connection for the life-time of a session, which can span days (as some people have told us they have done with agents)

skybrian•1mo ago

It seems like latency will be poor if you have to wait for a server-side round trip to an LLM to update the UI whenever you press a button?

In a context where you're chatting with an LLM, I suppose the user would expect some lag, but it would be unwelcome in regular apps.

This also means that a lot of other UI performance issues don't matter - form submission is going to be slow anyway, so just be transparent about the delay.

France's homegrown open source online office suite

Start all of your commands with a comma (2009)

Hoot: Scheme on WebAssembly

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

Reinforcement Learning from Human Feedback

Coding agents have replaced every framework I used

The Waymo World Model

Vocal Guide – belt sing without killing yourself

Unseen Footage of Atari Battlezone Arcade Cabinet Production

Ga68, a GNU Algol 68 Compiler

Google staff call for firm to cut ties with ICE

Making geo joins faster with H3 indexes

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Monty: A minimal, secure Python interpreter written in Rust for use by AI

Cross-Region MSK Replication: K2K vs. MirrorMaker2

Show HN: I spent 4 years building a UI design tool with only the features I use

Hackers (1995) Animated Experience

Sheldon Brown's Bicycle Technical Info

Show HN: Kappal – CLI to Run Docker Compose YML on Kubernetes for Local Dev

Show HN: If you lose your memory, how to regain access to your computer?

What Is Ruliology?

Microsoft open-sources LiteBox, a security-focused library OS

An Update on Heroku

Dark Alley Mathematics

How to effectively write quality code with AI

Female Asian Elephant Calf Born at the Smithsonian National Zoo

Was Benoit Mandelbrot a hedgehog or a fox?

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

I now assume that all ads on Apple news are scams

PC Floppy Copy Protection: Vault Prolok

France's homegrown open source online office suite

Start all of your commands with a comma (2009)

Hoot: Scheme on WebAssembly

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

Reinforcement Learning from Human Feedback

Coding agents have replaced every framework I used

The Waymo World Model

Vocal Guide – belt sing without killing yourself

Unseen Footage of Atari Battlezone Arcade Cabinet Production

Ga68, a GNU Algol 68 Compiler

Google staff call for firm to cut ties with ICE

Making geo joins faster with H3 indexes

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Monty: A minimal, secure Python interpreter written in Rust for use by AI

Cross-Region MSK Replication: K2K vs. MirrorMaker2

Show HN: I spent 4 years building a UI design tool with only the features I use

Hackers (1995) Animated Experience

Sheldon Brown's Bicycle Technical Info

Show HN: Kappal – CLI to Run Docker Compose YML on Kubernetes for Local Dev

Show HN: If you lose your memory, how to regain access to your computer?

What Is Ruliology?

Microsoft open-sources LiteBox, a security-focused library OS

An Update on Heroku

Dark Alley Mathematics

How to effectively write quality code with AI

Female Asian Elephant Calf Born at the Smithsonian National Zoo

Was Benoit Mandelbrot a hedgehog or a fox?

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

I now assume that all ads on Apple news are scams

PC Floppy Copy Protection: Vault Prolok

A2UI: A Protocol for Agent-Driven Interfaces

Comments