After your initial question, it always follows up with some clarifying questions, but it's completely up to the user to format their responses, and I always wonder whether the LLM gets confused when people answer sloppily. It would make much more sense for OpenAI to break out each question and give it a dedicated answer box. That way the user's responses stay consistent and there's less chance they make a mistake or forget to answer a question.
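Roughly, the model could return the follow-up as structured data instead of prose, and the client could render one labelled input per question. A minimal sketch in TypeScript (the shape and field names are my own invention, not anything OpenAI actually ships):

    // Hypothetical shape for structured clarifying questions, so the client
    // can render one dedicated input per question instead of free-form text.
    type ClarifyingQuestion =
      | { id: string; label: string; kind: "text" }
      | { id: string; label: string; kind: "choice"; options: string[] };

    interface FollowUp {
      questions: ClarifyingQuestion[];
    }

    // The model would emit something like this as JSON...
    const followUp: FollowUp = {
      questions: [
        { id: "budget", label: "What is your budget?", kind: "text" },
        { id: "tone", label: "Preferred tone?", kind: "choice", options: ["formal", "casual"] },
      ],
    };

    // ...and the client renders each question as its own labelled input,
    // then sends back answers keyed by id, e.g. { budget: "...", tone: "formal" }.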
AI or not, these sorts of UI won't change too much.
When you want to order a pizza, you won't have to click. Just browse and ask the AI assistant to place an order as you would in a restaurant. Better UX.
Isn't on-demand generation what chat LLMs already do nowadays, btw?
The point being that generating visual UI components is easy: ChatGPT does it, server-driven UI does it.
But multimodal interaction is something else that goes further.
You might say naming the color is enough, but in reality, a color picker is the more natural way to interact.
As humans, we don’t communicate only through words. Other forms of interaction matter too.
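To make the "generating components is easy" part concrete, here's a minimal server-driven-UI-style sketch in TypeScript (all names are hypothetical): the server or model emits a declarative description, and the client maps it to native widgets, including a color picker instead of a free-text color field.

    // Hypothetical declarative UI payload, in the spirit of server-driven UI:
    // the server (or model) describes widgets, the client renders them natively.
    type Widget =
      | { type: "text"; id: string; label: string }
      | { type: "colorPicker"; id: string; label: string; default?: string }
      | { type: "button"; id: string; label: string; action: string };

    const screen: Widget[] = [
      { type: "text", id: "name", label: "Project name" },
      // A color picker is a more natural interaction than typing "teal-ish blue".
      { type: "colorPicker", id: "accent", label: "Accent color", default: "#1e90ff" },
      { type: "button", id: "save", label: "Save", action: "submit" },
    ];

    // Trivial renderer stub: a real client would map each widget
    // to a native control instead of logging it.
    for (const w of screen) {
      console.log(`${w.type}: ${w.label}`);
    }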
It will probably have access to a list of components with their specifications, especially the types of data each component can represent, mutably or not.
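Something like the following catalog, say (purely illustrative, TypeScript just for concreteness):

    // Illustrative component catalog the model could be prompted with:
    // what each component represents, and whether the user can mutate the value.
    interface ComponentSpec {
      name: string;
      dataType: "string" | "number" | "boolean" | "date" | "enum" | "color";
      mutable: boolean;
      description: string;
    }

    const catalog: ComponentSpec[] = [
      { name: "Slider",      dataType: "number", mutable: true,  description: "Bounded numeric input" },
      { name: "DatePicker",  dataType: "date",   mutable: true,  description: "Calendar-based date input" },
      { name: "Dropdown",    dataType: "enum",   mutable: true,  description: "Pick one of a fixed set of options" },
      { name: "ColorPicker", dataType: "color",  mutable: true,  description: "Visual color selection" },
      { name: "Badge",       dataType: "string", mutable: false, description: "Read-only status label" },
    ];

    // Given a field's type and whether it should be editable,
    // picking a matching component is a lookup, not an inference problem.
    const pick = (t: ComponentSpec["dataType"], mutable: boolean) =>
      catalog.find(c => c.dataType === t && c.mutable === mutable);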
Or respond to a query from a database by presenting a graph automatically.
But in my opinion the hard part is turning natural language into a SQL query. It's not really the choice of data representation, which is heavily informed by the data itself (types and values) and doesn't require much inference.
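To illustrate why the representation is the easy half: once you know the column types of the result set, a simple lookup gets you most of the way. A toy heuristic, not a claim about any particular product:

    // Toy heuristic: pick a chart purely from the shape of the result set.
    // The hard part (turning the question into SQL) happens before this.
    type Column = { name: string; kind: "categorical" | "numeric" | "temporal" };

    function chooseChart(columns: Column[]): string {
      const kinds = columns.map(c => c.kind);
      if (kinds.includes("temporal") && kinds.includes("numeric")) return "line chart";
      if (kinds.includes("categorical") && kinds.includes("numeric")) return "bar chart";
      if (kinds.filter(k => k === "numeric").length >= 2) return "scatter plot";
      return "table";
    }

    // e.g. SELECT month, revenue FROM sales GROUP BY month
    console.log(chooseChart([
      { name: "month", kind: "temporal" },
      { name: "revenue", kind: "numeric" },
    ])); // "line chart"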
Conversations are error prone and noisy.
UI distills down the mode of interaction into something defined and well understood by both parties.
Humans have been able to speak to each other for a long time, but we fill out forms for anything formal.
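A tiny example of what the form buys you over conversation: every field has a name, a type, and a checkable constraint, so validation is mechanical rather than interpretive (field names and rules below are made up):

    // Free text: "ship it to my office sometime next week, the usual card"
    // Form: each field has a name, a type, and a constraint a machine can check.
    interface ShippingRequest {
      addressLine: string;
      postalCode: string;   // must match a known format
      deliveryDate: string; // ISO date, must be in the future
    }

    function validate(r: ShippingRequest): string[] {
      const errors: string[] = [];
      if (r.addressLine.trim() === "") errors.push("address is required");
      if (!/^\d{5}$/.test(r.postalCode)) errors.push("postal code must be 5 digits");
      if (new Date(r.deliveryDate).getTime() <= Date.now()) errors.push("delivery date must be in the future");
      return errors;
    }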
I thought you'd say not being able to reload the form at a later time from the same URL is bad. This would be a "quantum UI" slightly different every time you load it.
If you look at many of the current innovations around working with LLMs and agents, they are largely about constraining and tracking context in a structured way. Emergent patterns for this will likely appear over time; for now I'm implementing my own approach, hopefully with abstractions good enough to allow future portability.
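As one example of what "constraining context in a structured way" can mean in code (my own sketch, not the author's implementation): keep the agent's working context as a typed object with explicit slots rather than an ever-growing transcript.

    // Sketch: track agent context as explicit, typed slots instead of
    // appending everything to one free-form transcript.
    interface TaskContext {
      goal: string;
      constraints: string[];         // hard requirements the agent must respect
      facts: Record<string, string>; // verified facts, keyed so they can be updated
      openQuestions: string[];       // things still needing user input
    }

    function addFact(ctx: TaskContext, key: string, value: string): TaskContext {
      // Returning a new object keeps the history of context states inspectable.
      return { ...ctx, facts: { ...ctx.facts, [key]: value } };
    }

    let ctx: TaskContext = {
      goal: "book a flight",
      constraints: ["budget under $400", "no overnight layovers"],
      facts: {},
      openQuestions: ["departure city?"],
    };
    ctx = addFact(ctx, "departureCity", "Lisbon");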
For sure! UIs are also most of the past and present way to interact with a computer, off or online. Even Hacker News - which is mostly text - has some UI to vote, navigate, flag…
Imagine the mess of a text-field-only interface where you had to type "upvote the upper ActionHank message" or "open the third article's comments on the front page, the one that talks about On-demand UI generation…" then press enter.
Don’t get me wrong: LLMs are great and it’s fascinating to see experiments with them. Kudos to the author.
I'd love to see folks find the same sort of energy and innovation that drove early projects such as Momenta and PenPoint and so forth.
We found it lowered barriers to providing context to AI, improved user perception of control over AI, and provided users guidance for steering AI interactions.
Shipping forms usually need address verification; sometimes they even include a map.
Especially if, on the other end, the data entered into this form is going to be stored in a traditional DB.
A much better use case would be something that is dynamic by nature, for example an advanced prompt generator for image generation models (sliders for the size of objects in a scene; dropdown menus with variants of backgrounds or styles, instead of the usual lists).
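Something in this direction, say (all names invented, TypeScript just for concreteness):

    // Sketch of a generated control panel for building an image prompt:
    // sliders and dropdowns instead of hand-editing a comma-separated list.
    type Control =
      | { kind: "slider"; id: string; label: string; min: number; max: number; value: number }
      | { kind: "dropdown"; id: string; label: string; options: string[]; value: string };

    const controls: Control[] = [
      { kind: "slider",   id: "subjectSize", label: "Subject size in scene", min: 0, max: 100, value: 60 },
      { kind: "dropdown", id: "background",  label: "Background", options: ["forest", "city at night", "studio"], value: "forest" },
      { kind: "dropdown", id: "style",       label: "Style", options: ["watercolor", "photoreal", "pixel art"], value: "photoreal" },
    ];

    // Flatten the current control values into a prompt string.
    const prompt = controls
      .map(c => c.kind === "slider"
        ? `${c.label.toLowerCase()}: ${c.value}%`
        : `${c.label.toLowerCase()}: ${c.value}`)
      .join(", ");
    console.log(prompt);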
I don't know why you wouldn't develop whatever forms you wanted to support upfront and make them available to the agent (and hopefully provide old-fashioned search). You can still use AI to develop and maintain the forms. Since the output can be used as many times as you want, you can probably use more expensive/capable models to develop the forms rather than cheaper/faster but less capable models that you're probably limited to for customer service.
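Concretely, the pre-built forms could be exposed to the agent as a catalog it selects from rather than something it generates on the fly. A sketch, with made-up names:

    // Sketch: forms are authored (and reviewed) ahead of time; the agent
    // only picks which one to show, it never generates UI on the fly.
    interface FormDef {
      id: string;
      title: string;
      keywords: string[]; // usable for old-fashioned search as well as agent lookup
    }

    const forms: FormDef[] = [
      { id: "refund-request", title: "Request a refund",        keywords: ["refund", "money back", "return"] },
      { id: "address-change", title: "Change shipping address", keywords: ["address", "shipping", "move"] },
      { id: "cancel-order",   title: "Cancel an order",         keywords: ["cancel", "order"] },
    ];

    // The agent (or a plain search box) resolves the user's request to a form id.
    function findForm(query: string): FormDef | undefined {
      const q = query.toLowerCase();
      return forms.find(f => f.keywords.some(k => q.includes(k)));
    }

    console.log(findForm("I'd like my money back")?.id); // "refund-request"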