Enough AI copilots, we need AI HUDs

https://www.geoffreylitt.com/2025/07/27/enough-ai-copilots-we-need-ai-huds

373•walterbell•9h ago

Comments

keyle•7h ago

> anyone serious about designing for AI should consider non-copilot form factors that more directly extend the human mind.

Aren't auto-completes doing exactly this? It's not a co-pilot in the sense of a virtual human, but already more in the direction of a HUD.

Sure you can converse with LLMs but you can also clearly just send orders and they eagerly follow and auto-complete.

I think what the author might be trying to express in a quirky fashion, is that AI should work alongside us, looking in the same direction as we are, and not being opposite to us at the table, staring at each other's and arguing. We'll have true AI when they'll be doing our bidding, without any interaction from us.

gklitt•6h ago

Author here. Yes, I think the original GitHub Copilot autocomplete UI is (ironically) a good example of a HUD! Tab autocomplete just becomes part of your mental flow.

Recent coding interfaces are all trending towards chat agents though.

It’s interesting to consider what a “tab autocomplete” UI for coding might look like at a higher level of abstraction, letting you mold code in a direct-feeling way without being bogged down in details.

samfriedman•7h ago

On this topic, can anyone find a document I saw on HN but can no longer locate? A historical computing essay, it was presented in a plaintext (monospaced) text page. It outlined a computer assistant and how it should feel to use. The author believed it should be unobtrusive, something that pops into awareness when needed and then gets back out of the way. I don't believe any of the references in TFA are what it was.

thehappypm•6h ago

Designing Calm Technology?

cadamsdotcom•7h ago

Love the idea & spitballing ways to generalize to coding..

Thought experiment: as you write code, an LLM generates tests for it & the IDE runs those tests as you type, showing which ones are passing & failing, updating in real time. Imagine 10-100 tests that take <1ms to run, being rerun with every keystroke, and the result being shown in a non-intrusive way.

The tests could appear in a separated panel next to your code, and pass/fail status in the gutter of that panel. As simple as red and green dots for tests that passed or failed in the last run.

The presence or absence and content of certain tests, plus their pass/fail state, tells you what the code you’re writing does from an outside perspective. Not seeing the LLM write a test you think you’ll need? Either your test generator prompt is wrong, or the code you’re writing doesn’t do the things you think they do!

Making it realtime helps you shape the code.

Or if you want to do traditional TDD, the tooling could be reversed so you write the tests and the LLM makes them pass as soon as you stop typing by writing the code.

hnthrowaway121•6h ago

Yes the reverse makes much more sense to me. AI help to spec out the software & then the code has an accepted definition of correctness. People focus on this way less than they should I think

callc•5h ago

Humans writing the test first and LLM writing the code is much better than the reverse. And that is because tests are simply the “truth” and “intention” of the code as a contract.

When you give up the work of deciding what the expected inputs and outputs of the code/program is you are no longer in the drivers seat.

JimDabell•5h ago

> When you give up the work of deciding what the expected inputs and outputs of the code/program is you are no longer in the drivers seat.

You don’t need to write tests for that, you need to write acceptance criteria.

ThunderSizzle•4h ago

As in, a developer would write something in e.g. gherkin, and AI would automatically create the matching unit tests and the production code?

That would be interesting. Of course, gherkin tends to just be transpiled into generated code that is customized for the particular test, so I'm not sure how AI can really abstract it away too much.

kamaal•4h ago

All of this at the end reduces to a simple fact at the end of the discussion.

You need some of way of precisely telling AI what to do. As it turns out there is only that much you can do with text. Come to think of it, you can write a whole book about a scenery, and yet 100 people will imagine it quite differently. And still that actual photograph would be totally different compared to the imagination of all those 100 people.

As it turns out if you wish to describe something accurately enough, you have to write mathematical statements, in other words statements that reduce to true/false answers. We could skip to the end of the discussion here, and say you are better of either writing code directly or test cases.

This is just people revisiting logic programming all over again.

motorest•1h ago

> You need some of way of precisely telling AI what to do.

I think this is the detail you are not getting quite right. The truth of the matter is that you don't need precision to get acceptable results, at least in 100% of the cases. As everything in software engineering, there is indeed "good enough".

Also worth noting, LLMs allow anyone to improve upon "good enough".

> As it turns out if you wish to describe something accurately enough, you have to write mathematical statements, in other words statements that reduce to true/false answers.

Not really. Nothing prevents you to refer to high-level sets of requirements. For example, if you tell a LLM "enforce Google's style guide", you don't have to concern yourself with how many spaces are in a tab. LLMs have been migrating towards instruction files and prompt files for a while, too.

kamaal•1h ago

Yes, you are right. But in the sense that a human decides if AI generated code is right.

But if you want a near 100% automation, you need precise way to specify what you want, else there is no reliable way interpreting what you mean. And by that definition lots of regression/breakage has to be endured everytime a release is made.

JimDabell•4h ago

I’m talking higher level than that. Think about the acceptance criteria you would put in a user story. I’m specifically responding to this:

> When you give up the work of deciding what the expected inputs and outputs of the code/program is you are no longer in the drivers seat.

You don’t need to personally write code that mechanically iterates over every possible state to remain in the driver’s seat. You need to describe the acceptance criteria.

skydhash•3h ago

I think your perspective is heavily influenced by the imperative paradigm where you actually write the state transition. Compare that to functional programming where you only describe the relation between the initial and final state. Or logic programming where you describe the properties of the final state and where it would find the elements with those properties in the initial state.

Those does not involves writing state transitions. You are merely describing the acceptance criteria. Imperative is the norm because that's how computers works, but there are other abstractions that maps more to how people thinks. Or how the problem is already solved.

JimDabell•2h ago

I didn’t mention state transitions. When I said “mechanically iterate over every possible state”, I was referring to writing tests that cover every type of input and output.

Acceptance criteria might be something like “the user can enter their email address”.

Tests might cover what happens when the user enters an email address, what happens when the user tries to enter the empty string, what happens when the user tries to enter a non-email address, what happens when the user tries to enter more than one email address…

In order to be in the driver’s seat, you only need to define the acceptance criteria. You don’t need to write all the tests.

sitkack•2h ago

Acceptance criteria describes the thing being accepted, it describes a property of the final state.

There is no prescriptive manner in which to deliver the solution, unless it was built into the acceptance criteria.

You are not talking about the same thing as the parent.

motorest•1h ago

> When you give up the work of deciding what the expected inputs and outputs of the code/program is you are no longer in the drivers seat.

You're describing the happy path of BDD-style testing frameworks.

JimDabell•1h ago

I know about BDD frameworks. I’m talking higher level than that.

motorest•1h ago

> I know about BDD frameworks. I’m talking higher level than that.

What level do you think there is above "Given I'm logged in as a Regular User When I go to the front page Then I see the Profile button"?

JimDabell•17m ago

The line you wrote does not describe a feature. Typically you have many of those cases and they collectively describe one feature. I’m talking about describing the feature. Do you seriously think there is no higher level than given/when/thens?

motorest•1h ago

> That would be interesting. Of course, gherkin tends to just be transpiled into generated code that is customized for the particular test, so I'm not sure how AI can really abstract it away too much.

I don't think that's how gherkin is used. Take for example Cucumber. Cucumber only uses it's feature files to specify which steps a test should execute, whereas steps are pretty vanilla JavaScript code.

In theory, nowadays all you need is a skeleton of your test project, including feature files specifying the scenarios you want to run, and prompt LLMs to fill in the steps required by your test scenarios.

You can also use a LLM to generate feature files, but if the goal is to specify requirements and have a test suite enforce them, implicitly the scenarios are the starting point.

motorest•1h ago

> You don’t need to write tests for that, you need to write acceptance criteria.

Sir, those are called tests.

kamaal•5h ago

>>Humans writing the test first and LLM writing the code is much better than the reverse.

Isn't that logic programming/Prolog?

You basically write the sequence of conditions(i.e tests in our lingo) that have to be true, and the compiler(now AI) generates code for your.

Perhaps there has to be a relook on how Logic programming can be done in the modern era to make this more seamless.

cjonas•4h ago

Then do you need tests to validate your tests are correct, otherwise the LLM might just generate passing code even if the test is bad? Or write code that games the system because it's easier to hardcode an output value then to do the actual work.

There probably is a setup where this works well, but the LLM and humans need to be able to move across the respective boundaries fluidly...

Writing clear requirements and letting the AI take care of the bulk of both sides seems more streamlined and productive.

cellis•3h ago

The harder part is “test invalidation”. For instance if a feature no longer makes sense, the human / test validator must painstakingly go through and delete obsolete specs. An idea I’d like to try is to “separate” the concerns; only QA agents can delete specs, engineer agents must conform to the suite, and make a strong case to the qa agent for deletion.

scottgg•2h ago

WallabyJS does something along these lines, although I don’t think it is contextually understanding which tests to highlight

https://wallabyjs.com/

andsoitis•2h ago

> Imagine 10-100 tests that take <1ms to run, being rerun with every keystroke, and the result being shown in a non-intrusive way.

Doesn’t seem like high ROI to run full suite of tests on each keystroke. Most keystrokes yield an incomplete program, so you want to be smarter about when you run the tests to get a reasonably good trade off.

William_BB•2h ago

There's no way this would work for any serious C++ codebase. Compile times alone make this impossible

I'm also not sure how LLM could guess what the tests should be without having written all of the code, e.g. imagine writing code for a new data structure

motorest•1h ago

> There's no way this would work for any serious C++ codebase. Compile times alone make this impossible

There's nothing in C++ that prevents this. If build times are your bogeyman, you'd be pleased to know that all mainstream build systems support incremental builds.

William_BB•1h ago

The original example was (paraphrasing) "rerunning 10-100 tests that take 1ms after each keystroke".

Even with incremental builds, that surely does not sound plausible? I only mentioned C++ because that's my main working language, but this wouldn't sound reasonable for Rust either, no?

motorest•1h ago

> The original example was (paraphrasing) "rerunning 10-100 tests that take 1ms after each keystroke".

Yeah, OP's point is completely unrealistic and doesn't reflect real-world experience. This sort of test watchers is mundane in any project involving JavaScript, and not even those tests re-run at each keystroke. Watch mode triggers tests when they detect changes, and waits for test executions to finish to re-run tests.

This feature consists of running a small command line app that is designed to run a command whenever specific files within a project tree are touched. There is zero requirement to only watch for JavaScript files or only trigger npm build when a file changes.

To be very clear, this means that right now anyone at all, including you and me, can install a watcher, configure it to run make test/cutest/etc when any file in your project is touched, and call it a day. This is a 5 minute job.

By the way, nowadays even Microsoft's dotnet tool supports watch mode, which means there's out-of-the-box support to "rerunning 10-100 tests that take 1ms after each keystroke".

motorest•1h ago

> Thought experiment: as you write code, an LLM generates tests for it & the IDE runs those tests as you type, showing which ones are passing & failing, updating in real time. Imagine 10-100 tests that take <1ms to run, being rerun with every keystroke, and the result being shown in a non-intrusive way.

I think this is a bad approach. Tests enforce invariants, and they are exactly the type of code we don't want LLMs to touch willy-nilly.

You want your tests to only change if you explicitly want them to, and even then only the tests should change.

Once you adopt that constraint, you'll quickly realize ever single detail of your thought experiment is already a mundane workflow in any developer's day-to-day activities.

Consider the fact that watch mode is a staple of any JavaScript testing framework, and those even found their way into .NET a couple of years ago.

So, your thought experiment is something professional software developers have been doing for what? A decade now?

cadamsdotcom•1h ago

I think tests should be rewritten as much as needed. But to counter the invariant part, maybe let the user zoom back and forth through past revisions and pull in whatever they want to the current version, in case something important is deleted? And then allow “pinning” of some stuff so it can’t be changed? Would that solve for your concerns?

motorest•1h ago

> I think tests should be rewritten as much as needed.

Yes, I agree. The nuance is that they need to be rewritten independently and without touching the code. You can't change both and expect to get a working system.

I'm speaking based on personal experience, by the way. Today's LLMs don't enforce correctness out of the box and agent mode has only one goal: getting things to work. I had agent mode flip invariants in tests when trying to fix unit tests it broke, and I'm talking about egregious changes such as flipping requirements in line with "normal users should not have access to the admin panel" to "normal users should have access to the admin panel". The worst part is that if agent mode is left unsupervised, it will even adjust the CSS to make sure normal users have a seamless experience going through the admin panel.

jawns•6h ago

The author gives an example of a HUD: an AI that writes a debugger program to enable the human developer to more easily debug code.

But why should it be only the human developer who benefits? What if that debugger program becomes a tool that AI agents can use to more accurately resolve bugs?

Indeed, why can't any programming HUD be used by AI tools? If they benefit humans, wouldn't they benefit AI as well?

I think we'll be pretty quickly at the point where AI agents are more often than not autonomously taking care of business, and humans only need to know about that work at critical points (like when approvals are needed). Once we're there, the idea that this HUD concept should be only human-oriented breaks down.

hi_hi•6h ago

Doesn't it all come down to "what is the ideal interface for humans to deal with digital information"?

We're getting more and more information thrown at us each day, and the AIs are adding to that, not reducing it. The ability to summarise dense and specialist information (I'm thinking error logs, but could be anything really) just means more ways for people to access and view that information who previously wouldn't.

How do we, as individuals, best deal with all this information efficiently? Currently we have a variety of interfaces, websites, dashboards, emails, chat. Are all these necessary anymore? They might be now, but what about the next 10 years. Do I even need to visit a companies website if can get the same information from some single chat interface?

The fact we have AIs building us websites, apps, web UI's just seems so...redundant.

sipjca•6h ago

yep I think this is the fundamental question as well, everything else is intermediate

guardiang•6h ago

Every human is different, don't generalize the interface. Dynamically customize it on the fly.

moomoo11•5h ago

I like the smartphone. It’s honestly perfect and underutilized.

AlotOfReading•5h ago

Websites were a way to get authoritative information about a company, from that company (or another trusted source like Wikipedia). That trust is powerful, which is why we collectively spent so much time trying to educate users about the "line of death" in browsers, drawing padlock icons, chasing down impersonator sites, mitigating homoglyph attacks, etc. This all rested on the assumption that certain sites were authoritative sources of information worth seeking out.

I'm not really sure what trust means in a world where everyone relies uncritically on LLM output. Even if the information from the LLM is usually accurate, can I rely on that in some particularly important instance?

hi_hi•3h ago

You raise a good point, and one I rarely see discussed.

I still believe it fundamentally comes down to an interface issue, but how trust gets decoupled from the interface (as you said, the padlock shown in the browser and certs to validate a website source), thats an interesting one to think about :-)

energy123•3h ago

The designers of 6th gen fighter jets are confronting the same challenge. The cockpit, which is an interface between the pilot and the airframe, will be optionally manned. If the cockpit is manned, the pilot will take on a reduced set of roles focused on higher-level decision making.

By the 7th generation it's hard to see how humans will still be value-add, unless it's for international law reasons to keep a human in the loop before executing the kill chain, or to reduce Skynet-like tail risks in line with Paul Christiano's arms race doom scenario.

Perhaps interfaces in every domain will evolve this way. The interface will shrink in complexity, until it's only humans describing what they want to the system, at higher and higher levels of abstraction. That doesn't necessarily have to be an English-language interface if precision in specification is required.

darkwater•38m ago

Is this just what you think it might happen or are you directly involved in these decisions and first-hand exposing a challenge?

bravesoul2•11m ago

We will get to the dream of Homer Simpson gorging on donuts and "operating" a nuclear power plant.

thbb123•5m ago

> keep a human in the loop before executing the kill chain, or to reduce Skynet-like tail risks in line with Paul Christiano's arms race doom scenario.

It is a little known secret that plenty of defense systems are already set up to dispense of the human in the loop protocol before a fire action. For defense primarily, but also for attack once a target has been designated. I worked on protocols in the 90's, and this decision was already accepted.

It happens to be so effective that the military won't bulge on this.

Also, it is not much worse to have a decision system act autonomously for a kill system, if you consider that the alternative is a dumb system such as a landmine.

roywiggins•6h ago

> The agentic option is a “copilot” — a virtual human who you talk with to get help flying the plane. If you’re about to run into another plane it might yell at you “collision, go right and down!”

Planes do actually have this now. It seems to work okay:

https://en.m.wikipedia.org/wiki/Traffic_collision_avoidance_...

gklitt•5h ago

Author here. I thought about including TCAS in the article but it felt too tangential…

You’re right that there’s a voice alert. But TCAS also has a map of nearby planes which is much more “HUD”! So it’s a combo of both approaches.

(Interestingly it seems that TCAS may predate Weiser’s 1992 talk)

droideqa•6h ago

I want a link to the GitHub for this[0] which he linked to. Makes Prolog quite interesting.

[0]: https://www.geoffreylitt.com/2024/12/22/making-programming-m...

johnisgood•1h ago

Yeah, it looks interesting. I would give it a spin. Maybe then I could finally make a MUD in Prolog. :D

thepuglor•6h ago

I'd rather flip this around and be in a fully immersive environment watching agents do things in such a way that I am able to interject and guide in realtime. How do we build that, and build it in such a way that the content and delivery of my guidance becomes critical to what they learn? The best teacher gets the best AI students.

sitkack•2h ago

> fully immersive environment

Start with a snapshot of what you are envisioning using Blender.

aaron695•6h ago

This is a mess.

The 1992 talk wasn't at all about AI and since then our phones have given us "ubiquitous computing" en masse.

The original talk required no 'artificial intelligence' for relevance which makes it strange to apply to todays artificial intelligence.

The original talk made good points for instance "voice recognition" has been solved forever at a reasonable level, yet people kept claiming if it was 'better' a 'magic experience' would pop out as if voice was different to typing. Idiots have been around for a long time.

Don't get what OP is trying to say.

'AI HUD metaphors' are very hard, that's why they are not ubiquitous, they require constant input. Spellcheck runs every character typed. Agents are because of less $.

'Hallucinations' also make 'AI HUD metaphors' problematic, for spellcheck squiggly red lines would be blinking on off all over the page as a LLM keeps coming back with different results.

precompute•48m ago

FYI, you're shadowbanned.

jpm_sd•6h ago

This is how ship's AI is depicted in The Expanse (TV series) and I think it's really compelling. Quiet and unobtrusive, but Alex can ask the Rocinante to plot a new course or display the tactical situation and it's fast, effective and effortlessly superhuman with no back-talk or unnecessary personality.

Compare another sci-fi depiction taken to the opposite extreme: Sirius Cybernetics products in the Hitchhikers Guide books. "Thank you for making a simple door very happy!"

skydhash•3h ago

I may remember wrongly, but I don't believe the expanse was depicting AI. It was more powerful computation and unobtrusive interfaces. There was nothing like Jarvis. Rocinante was a war vessel and all its feature was tailored to that. I believe even the mechanical suits of the Martians were very manual (no cortana a la Master Chief).

kova12•6h ago

aren't we missing the point that co-pilot is there in case pilot gets incapacitated?

melagonster•4h ago

The pilot is the copilot today. These computer can handle most of the tasks automatically.

benjaminwootton•5h ago

I think there is a third and distinct model which is AI that runs in the background autonomously amd over a long period and pushes things to you.

It can detect situations intelligently, do the filtering, summarisation of what’s happening and possibly a recommendation.

This feels a lot more natural to me, especially in a business context when you want to monitor for 100 situations about thousands of customers.

skydhash•2h ago

Actually defining those situations and collecting the data (which should help identify those situations) are the hard parts. Having an autonomous system that do it has been solved for ages.

ares623•5h ago

It should have a paperclip mascot

stan_kirdey•5h ago

needed to find a kids’ orthodontist. made a tiny voice agent: feed it numbers, it calls, asks about price/availability/insurance, logs the gist.

it kind of worked. the magic was the smallest UI around it:

- timeline of dials + retries

- "call me back" flags

- when it tried, who picked up

- short summaries with links to the raw transcript

once i could see the behavior, it stopped feeling spooky and started feeling useful.

so yeah, copilots are cool, but i want HUDs: quiet most of the time, glanceable, easy to interrupt, receipts for every action.

perching_aix•5h ago

Been thinking about something similar, from fairly grounded ideas like letting a model autogenerate new features with their own name, keybind and icon, all the way to silly ideas, like letting a model synthesize arbitrary shader code and just letting it do whatever within the viewport. Think the entire UI being created on the fly specifically for the task you're working on, constantly evolving in mesh with your workflow habits. Now if only I went beyond being an idea man...

caleblloyd•5h ago

The reason we are not seeing this in mainstream software may also be due to cost. Paying for tokens on every interaction means paying to use the app. Upfront development may actually be cheaper, but the incremental cost per interaction could cost much more in the long term, especially if the software is used frequently and has a long lifetime.

As the cost of tokens goes down, or commodity hardware can handle running models capable of driving these interactions, we may start to see these UIs emerge.

perching_aix•5h ago

Oh yeah, I was 100% thinking in terms of local models.

ag2s•5h ago

A very relevant article from lesswrong, titled Cyborgism https://www.lesswrong.com/s/f2YA4eGskeztcJsqT/p/bxt7uCiHam4Q...

sothatsit•5h ago

AI building complex visualisations for you on-the-fly seems like a great use-case.

For example, if you are debugging memory leaks in a specific code path, you could get AI to write a visualisation of all the memory allocations and frees under that code path to help you identify the problem. This opens up an interesting new direction where building visualisations to debug specific problems is probably becoming viable.

This idea reminds me of Jonathan Blow's recent talk at LambdaConf. In it, he shows a tool he made to visualise his programs in different ways to help with identifying potential problems. I could imagine AI being good at building these. The talk: https://youtu.be/IdpD5QIVOKQ?si=roTcCcHHMqCPzqSh&t=1108

wewewedxfgdf•5h ago

Kind of a weird article because the computer systems that is "invisible" i.e. an integrated part of the flight control systems - is exactly what we have now. He's sort of arguing for .... computer software.

Like, we have HUDs - that's what a HUD is - it's a computer program.

CGamesPlay•4h ago

A HUD is typically non-interactive, which is the core distinction he’s advocating for. The “copilot” responds to your requests, the “HUD” surfaces relevant information passively.

skydhash•3h ago

Isn't this one of the core selling of an IDE compared to a simple editor? You have the analyzer running in the background and telling you things. And a host of buttons around the editor to act on stuff, all within the context of a project.

nioj•4h ago

Concurrent posting from 5 hours ago (currently no comments): https://news.ycombinator.com/item?id=44705018

satisfice•4h ago

Yes, this is a non-creepy way of applying AI.

SilverElfin•4h ago

Isn’t that what all the AI browsers like Comet, or the things like Cluey are trying to do

eboynyc32•4h ago

Excited for the next wave of ai innovation.

clbrmbr•4h ago

A thought-provoking analogy!

What comes immediately to mind for me is using embeddings to show closest matches to current cursor position on the right tab for fast jumping to related files.

henriquegodoy•4h ago

Great post! i've been thinking along similar lines about human-AI interfaces beyond the copilot paradigm. I see two major patterns emerging:

Orchestration platforms - Evolution of tools like n8n/Make into cybernetic process design systems where each node is an intelligent agent with its own optimization criteria. The key insight: treat processes as processes, not anthropomorphize LLMs as humans. Build walls around probabilistic systems to ensure deterministic outcomes where needed. This solves massive "communication problems"

Oracle systems - AI that holds entire organizations in working memory, understanding temporal context and extracting implicit knowledge from all communications. Not just storage but active synthesis. Imagine AI digesting every email/doc/meeting to build a living organizational consciousness that identifies patterns humans miss and generates strategic insights.

just explored more about it on my personal blog https://henriquegodoy.com/blog/stream-of-consciousness

ankit219•4h ago

The current paradigm is driven by two factors: one is the reliability of the models and that constraints how much autonomy you can give to an agent. Second is about chat as a medium which everyone went to because ChatGPT became a thing.

I see the value in HUDs, but only when you can be sure output is correct. If that number is only 80% or so, copilots work better so that humans in the loop can review and course correct - the pair programmer/worker. This is not to say we need ai to get to higher levels of correctness inherently, just that systems deployed need to do so before they display some information on HUD.

psychoslave•4h ago

This is missing the addictive/engaging part of a conversational interface for most people out there. Which is in line with the critics highlighted in the fine article.

Just because most people are fond of it doesn't actually mean it improves their life, goals and productivity.

makaking•3h ago

> "non-copilot form factors that more directly extend the human mind."

I think Cursor's tab completion and next edit prediction roughly fits the pattern, you don't chat, you don't ask or explain, you just do... And the more in coherent your actions are the more useful the HUB becomes.

ravila4•3h ago

I think one key reason HUDs haven’t taken off more broadly is the fundamental limitation of our current display medium - computer screens and mobile devices are terrible at providing ambient, peripheral information without being intrusive. When I launch an AI agent to fix a bug or handle a complex task, there’s this awkward wait time where it takes too long for me to sit there staring at the screen waiting for output, but it’s too short for me to disengage and do something else meaningful. A HUD approach would give me a much shorter feedback loop. I could see what the AI is doing in my peripheral vision and decide moment-to-moment whether to jump in and take over the coding myself, or let the agent continue while I work on something else. Instead of being locked into either “full attention on the agent” or “completely disengaged,” I’d have that ambient awareness that lets me dynamically choose my level of involvement. This makes me think VR/AR could be the killer application for AI HUDs. Spatial computing gives us the display paradigm where AI assistance can be truly ambient rather than demanding your full visual attention on a 2D screen. I picture that this would be especially helpful for help on more physical tasks, such as cooking, or fixing a bike.

elliotec•3h ago

You just described what I do with my ultrawide monitor and laptop screen.

I can be fully immersed in a game or anything and keep Claude in a corner of a tmux window next to a browser on the other monitor and jump in whenever I see it get to the next step or whatever.

ravila4•2h ago

It’s a similar idea, but imagine you could fire off a task, and go for a run, or do the dishes. Then be notified when it completes, and have the option to review the changes, or see a summary of tests that are failing, without having to be at your workstation.

bigyabai•2h ago

I kinda do this today, with Alpaca[0]'s sandboxed terminal runner and GSConnect[1] syncing the response notifications to my phone over LAN.

[0] https://jeffser.com/alpaca/

[1] https://github.com/GSConnect/gnome-shell-extension-gsconnect

darkwater•33m ago

And, out of curiosity, what are the outputs of this agentic work?

Der_Einzige•2h ago

Actual HUDs, like the ones for cars - along with vector displays, and laser display tech in general, is criminally undervalued and underutilized.

radres•2h ago

One of the simplest and best working applications of AI is the gpts living in your clipboard. Any prompt/workflow you have is assignable to a shortcut on demand and it pops up a chat window with response on demand. It's been game changing honestly.

furyofantares•2h ago

Can you elaborate on this?

furyofantares•2h ago

I'm very curious if a toggle would be useful that would display a heatmap of a source file showing how surprising each token is to the model. Red tokens are more likely to be errors, bad names, or wrong comments.

digdugdirk•2h ago

Interesting! I've often felt that we aren't fully utilizing the "low hanging fruit" from the early days of the LLM craze. This seems like one of those ideas.

jama211•2h ago

That’s actually fantastic as an idea

dclowd9901•2h ago

That's a really cool idea. Also the inverse, where suggestions from the AI were similarly heat mapped for confidence would be extremely useful.

WithinReason•2h ago

previously undefined variable and function names would be red as well

ijk•1h ago

I want that in an editor. It's also a good way to check if your writing is too predictable or cliche.

The perplexity calculation isn't difficult; just need to incorporate it into the editor interface.

nextaccountic•1h ago

Even if something is surprising just because it's a novel algorithm, it warrants better documentation - but commenting the code explaining how it works will make the code itself less surprising!

In short, it's probably possible (and it's maybe a good engineering practice) to structure the source such as no specific part is really surprising

It reminds me how LLMs finally made people to care about having good documentation - if not for other people, for the AIs to read and understand the system

Kichererbsen•1h ago

I often find myself leaving review comments on pull requests where I was surprised. I'll state as much: This surprised me - I was expecting XYZ at this point. Or I wasn't expecting X to be in charge of Y.

teoremma•29m ago

We explored this exact idea in our recent paper https://arxiv.org/abs/2505.22906

Turns out this kind of UI is not only useful to spot bugs, but also allows users to discover implementation choices and design decisions that are obscured by traditional assistant interfaces.

Very exciting research direction!

yurimo•1h ago

I might be wrong but isn't the HUD the author suggesting for coding is basically AREPL? For debugging I can see it work, but chatbox and inline q&a I feel has awider application.

On a wider note, I buy the argument for alternative interfaces other than chat, but chat permeates our lives every day, smartphone is full of chat interfaces. HUD might be good for AR glasses though, literal HUD.

aantix•1h ago

I would love to see a HUD that allows me to see the change that corresponds to Claude Code's TODO item.

I don't want inline comments as those accumulate, don't get cleaned up appropriately by the LLM.

kn81198•1h ago

About a decade back Bret Victor [1] talked about how his principle in life is to reduce the delay in feedback, and having faster iteration cycles not just helps in doing things (coding) better but also contributes to new creative insights. He had a bunch of examples built to showcase alternative ways of coding, which is very close to being HUDs - one example shown in the OP is very similar to the one he presents to "step through time to figure out the working of the code".

[1]: https://www.youtube.com/watch?v=PUv66718DII

piker•1h ago

Absolutely agree, and spellchecker is a great analogy.

I've recently been snoozing co-pilot for hours at a time in VS Code because it’s adding a ton of latency to my keystrokes. Instead, it turns out that `rust_analyzer` is actually all that I need. Go-to definition and hover-over give me exactly what the article describes: extra senses.

Rust is straightforward, but the tricky part may be figuring out what additional “senses” are helpful in each domain. In that way, it seems like adding value with AI comes full circle to being a software design problem.

ChatGPT and Claude are great as assistants for strategizing problems, but even the typeahead value seems to me negligible in a large enough project. My experience with them as "coding agents" is generally that they fail miserably or are regurgitating some existing code base on a well known problem. But they are great at helping config things and as teachers in (the Socratic sense) to help you get up-to-speed with some technical issue.

The heads-up display is the thesis for Tritium[1], going back to its founding. Lawyers' time and attention (like fighter pilots') is critical but they're still required in the cockpit. And there's some argument they always will be.

[1] https://news.ycombinator.com/item?id=44256765 ("an all-in-one drafting cockpit")

Animats•1h ago

Nobody mentioned Manna [1] yet? That suggests a mostly audio headset giving orders. There is a real-world version using AR glasses.[2]

[1] https://marshallbrain.com/manna1

[2] https://www.six-15.com/vision-picking

utf_8x•1h ago

You know that feature in JetBrains (and possibly other) IDEs that highlights non-errors, like code that could be optimized for speed or readability (inverting ifs, using LINQ instead of a foreach, and so on)? As far as I can tell, these are just heuristics, and it feels like the perfect place for an “AI HUD.”

I don’t use Copilot or other coding AIs directly in the IDE because, most of the time, they just get in the way. I mainly use ChatGPT as a more powerful search engine, and this feels like exactly the kind of IDE integration that would fit well with my workflow.

obscure-enigma•1h ago

> design the cockpit so that the human pilot is naturally aware of their surroundings.

This is a design interface problem. Self-driving cars can easily ingest this HUD. This is the reason what makes Apple's AI different from other microservice-like AI. The spell checker, rewrite, proofread are naturally integrated into the UI to the extent it doesn't feel like AI powered operations.

iddan•1h ago

This is what we are going for in Closer - an AI native experience that extends the mind of salespeople. https://closer.so

Oras•1h ago

Great post, informative and precise.

I think the challenge is primarily the context and intent.

The spellchecker knows my context easily, and there is a setting to choose from (American English, British English, etc.), as well as the paragraphs I'm writing. The intent is easy to recognise. While in a codebase, the context is longer and vaguer, the assistant would hardly know why I'm changing a function and how that impacts the rest of the codebase.

However, as the article mentions, it may not be a universal solution, but it's a perspective to consider when designing AI systems.

thinkingemote•1h ago

We also need what goes along with HUDs : switches, nubs, switches, dials. Actual controls.

Although we are talking HUDs, I'm not really talking about UI widgets having the good old skew-morphism or better buttons. In the cockpit the pilot doesn't have his controls on a touch screen, he has an array of buttons and dials and switches all around him. It's these controls that are used in response to what the pilot sees on the HUD and it's these controls that change the aircraft according to the pilots will, which in turn change what the HUD shows.

TimCTRL•57m ago

makes sense, i guess tools like cluely are HUDs

barrenko•1m ago

Yup, not affiliated, but this is what cluely kinda does -

https://nitter.poast.org/im_roy_lee/status/19387190060029217...

_samjarman•54m ago

I'd love to mount a top-down webcam above me when I work on electronics and have a real time feed into ChatGPT or similar for Q&A, for a real time coach that can see. One day... :)

precompute•48m ago

HUDs are primarily of use to people that are used to parsing dense visual information. Wanting HUDs on a platform that promises to do your work for you is quite pointless.

afro88•48m ago

A HUD is an even more "confident" display of data than text though. What do you do with a HUD that hallucinates? Is there a button on each element that shows you sources?

Dilettante_•43m ago

I could imagine a system in which the model only chooses which data points to show at what time, but the actual passing is still handled by good old deterministic programming.

alwa•8m ago

Night-vision optics come to mind: prone to noise and visual artifacts, and especially strange under certain edge conditions—but an operator learns to intuit which aspects to trust and which to double-check. The fact that it’s an “extra sense” can outweigh the fact that it’s not a perfect source of truth, right? Trust the tech where it proves useful to you, and find ways to compensate (or outright don’t use it) where it’s not.

simianwords•22m ago

I believe this is the framework vs library debate but in a different context. HUD is like me using a library of AI enabled tools. The human agency is given more importance here - they decide how to use the tools to suit their purpose.

Copilot is more like a framework where an AI system exists which tells me what to do (a bit like the inverse of a library).

amelius•13m ago

I want an AI HUD that blocks ads.

Enough AI copilots, we need AI HUDs

Big agriculture mislead the public about the benefits of biofuels

Performance and telemetry analysis of Trae IDE, ByteDance's VSCode fork

Dumb Pipe

Blender: Beyond Mouse and Keyboard

SIMD Within a Register: How I Doubled Hash Table Lookup Performance

The Meeting Culture

How I fixed my blog's performance issues by writing a new Jekyll plugin

Multiplex: Command-Line Process Mutliplexer

I hacked my washing machine

Software Development at 800 Words per Minute

Making Postgres slower

EU age verification app to ban any Android system not licensed by Google

Claude Code Router

ZUSE – The Modern IRC Chat for the Terminal Made in Go/Bubbletea

Formal specs as sets of behaviors

Ask HN: What are you working on? (July 2025)

Solid protocol restores digital agency

VPN use surges in UK as new online safety rules kick in

Why I write recursive descent parsers, despite their issues (2020)

The JJ VCS workshop: A zero-to-hero speedrun

“Tivoization” and your right to install under Copyleft and GPL (2021)

IBM Keyboard Patents

Digitising CDs (a.k.a. using your phone as an image scanner)

Fourble turns lists of MP3 files hosted anywhere into podcasts

Designing a flatpack bed

How big can I print my image?

Bits 0x02: switching to orion as a browser

Tom Lehrer has died

Why does a fire truck cost $2m

Enough AI copilots, we need AI HUDs

Big agriculture mislead the public about the benefits of biofuels

Performance and telemetry analysis of Trae IDE, ByteDance's VSCode fork

Dumb Pipe

Blender: Beyond Mouse and Keyboard

SIMD Within a Register: How I Doubled Hash Table Lookup Performance

The Meeting Culture

How I fixed my blog's performance issues by writing a new Jekyll plugin

Multiplex: Command-Line Process Mutliplexer

I hacked my washing machine

Software Development at 800 Words per Minute

Making Postgres slower

EU age verification app to ban any Android system not licensed by Google

Claude Code Router

ZUSE – The Modern IRC Chat for the Terminal Made in Go/Bubbletea

Formal specs as sets of behaviors

Ask HN: What are you working on? (July 2025)

Solid protocol restores digital agency

VPN use surges in UK as new online safety rules kick in

Why I write recursive descent parsers, despite their issues (2020)

The JJ VCS workshop: A zero-to-hero speedrun

“Tivoization” and your right to install under Copyleft and GPL (2021)

IBM Keyboard Patents

Digitising CDs (a.k.a. using your phone as an image scanner)

Fourble turns lists of MP3 files hosted anywhere into podcasts

Designing a flatpack bed

How big can I print my image?

Bits 0x02: switching to orion as a browser

Tom Lehrer has died

Why does a fire truck cost $2m

Enough AI copilots, we need AI HUDs

Comments