Rethinking CLI interfaces for AI

https://www.notcheckmark.com/2025/07/rethinking-cli-interfaces-for-ai/
97•Bogdanp•5h ago

Comments

procone•4h ago
Why rethink tools that have existed since the 70s and function predictably for a landscape that drastically shifts every two months? Seems shortsighted to me.
withzombies•4h ago
Many of the changes that would work for LLMs would also be beneficial to users.
procone•4h ago
I'm sorry, but no. The tools work. I don't need "more context" from my `less` or `more` commands. The LLM can train on the man pages just as a human can read the man pages.
esafak•3h ago
What man page? I have never worked on a product with one. We're not teaching the LLM how to use `ls`; we are talking about the code being written today.

edit: Mea culpa.

lucianbr•2h ago
> I think watching the agents use our existing command line utilities get confused and lost is a strong indicator that the information architecture of our command line utilities is inadequate.

Seems pretty clear the article is talking about teaching LLMs how to use 'ls'.

withzombies•1h ago
> The agents may benefit from some training on tools available within their agents. This will certainly help with the majority of general CLI tools, there are bespoke tools that could benefit from adapting to LLMs.

Definitely not just `ls`.

skydhash•3h ago
Not at all. The shell already provides us ways to get contextual information (PS1, ...). And the commands generally provide error messages or error codes.

In one of the examples provided:

  $ sdfsdf
  zsh: command not found: 'sdfsdf'
  zsh: current directory is /Users/ryan
  zsh: Perhaps you meant to run: cd agent_directory; sdfsdf
You could just use `pwd`, much like most people put the current directory in their $PS1, to make sure that the agent stays in the correct directory.
jacobr1•3h ago
Yeah, this example isn't great - you can just tell the llm to run pwd more frequently or something.

But for the `$command | head -100` example, the usage is a bit different. I run into this myself on the CLI, and often end up using `less` in a similar context.

Two cases

1) sometimes I use head to short circuit a long running, but streaming output, command so I just assess if it is starting to do the right thing but not bear the time/computational cost of full processing

2) sometimes the timing doesn't matter but the content is too verbose, need to see some subset of the data. But here head is too limited. I need something like wc & head and maybe grep in one command line with context. Maybe something like

$command | contextual-filter -grepn 5 -grep error -head 10

some data ... the first 10 lines ... an error message with 5 lines of context before and after

Summary: 100000 total lines, 15 printed, exited with code 0

You can do all that already with grep and others, but you need to run multiple commands to get all the context
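To make the idea concrete, here is a minimal Python sketch of such a filter. The name `contextual_filter` and its parameters are hypothetical, not an existing tool. One assumption worth flagging: a filter at the end of a pipe cannot see the upstream command's exit status, so this sketch takes `exit_code` as an argument that some wrapper would have to supply.

```python
import re
from typing import Iterable, List


def contextual_filter(lines: Iterable[str], pattern: str,
                      head: int = 10, context: int = 5,
                      exit_code: int = 0) -> List[str]:
    """Return the first `head` lines, every match of `pattern` with
    `context` lines around it, and a one-line summary."""
    all_lines = [line.rstrip("\n") for line in lines]
    regex = re.compile(pattern)

    # Indices to keep: the head, plus a window around each match.
    keep = set(range(min(head, len(all_lines))))
    for i, line in enumerate(all_lines):
        if regex.search(line):
            lo = max(0, i - context)
            hi = min(len(all_lines), i + context + 1)
            keep.update(range(lo, hi))

    out: List[str] = []
    previous = -1
    for i in sorted(keep):
        if i != previous + 1:
            out.append("...")  # mark a gap of skipped lines
        out.append(all_lines[i])
        previous = i
    if keep and previous != len(all_lines) - 1:
        out.append("...")  # lines were skipped after the last kept line
    out.append(f"Summary: {len(all_lines)} total lines, {len(keep)} printed, "
               f"exited with code {exit_code}")
    return out
```

Something like `print("\n".join(contextual_filter(sys.stdin, "error")))` would turn it into a pipe stage; overlapping head and match windows are only printed once because the kept indices live in a set.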

skydhash•2h ago
1) That's why some tools have a simulate option, or you can just do a `kill -9` on the processes you've just launched. Just make sure you've captured their output in a file

2) Again logs, if action needs to be taken after the command has stopped. For immediate action, you can use `tee`.

Managing context isn't hard. I see more issues with ensuring the right command.

psifertex•4h ago
IDEs have changed a lot in the last 50 years. Just like we shouldn't advocate for hand writing assembly for all code, we shouldn't be stuck using CLI tooling the same way.

I share your apprehension regarding the current AI landscape changing so quickly it causes whiplash but I don't think a mindset of "it's been fine for 50 years" is going to survive the pace of development possible by better LLM integration.

skydhash•4h ago
The reason that tools have not changed that much is that our needs haven't changed that much either. Even something like `find` or `ffmpeg`, while complex, is not that complicated to use. They just require you to have a clear idea of what you want. And the latter is what most people advocating for LLMs want to avoid.

IDEs have not changed that much. They've always been an editor supercharged with tools that all share the same context of a "project". And for development, it's always been about navigation (search and goto), compile/build, and run.

ivape•4h ago
I also feel like command line agents are pretty simple. It's tailor made for tool-use.

while(true):

>> User requests something

<< The LLM picks a cli tool from an index

<< LLM grabs the manual for the tool to get the list of commands

<< Attempts to fulfill the request

I would not be shocked if engineers have already managed to overcomplicate these agents.
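For illustration only, one pass of that loop might look like the Python sketch below. Everything here is an assumption: `TOOL_INDEX` is a made-up two-entry index, and `keyword_picker` is a trivial stand-in for the model, not how any real agent chooses tools. The sketch returns the chosen tool and its manual rather than executing anything.

```python
from typing import Callable, Dict, List

# Hypothetical tool index: tool name -> manual text.
TOOL_INDEX: Dict[str, str] = {
    "grep": "grep PATTERN FILE: print lines matching PATTERN",
    "wc": "wc -l FILE: count lines in FILE",
}


def keyword_picker(request: str, tools: List[str]) -> str:
    """Stand-in for the LLM: pick the first tool named in the request."""
    words = request.split()
    for name in tools:
        if name in words:
            return name
    raise LookupError(f"no tool in index matches request: {request!r}")


def agent_step(request: str,
               pick_tool: Callable[[str, List[str]], str] = keyword_picker
               ) -> Dict[str, str]:
    """One pass of the loop: pick a tool, then fetch its manual."""
    tool = pick_tool(request, sorted(TOOL_INDEX))
    manual = TOOL_INDEX[tool]
    # A real agent would hand `manual` back to the model and run the
    # resulting command; this sketch stays side-effect free.
    return {"tool": tool, "manual": manual}
```

Swapping `pick_tool` for a model call is where the non-determinism (and most of the complexity) would enter.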

kjkjadksj•3h ago
You can pretty much obviate that with an alias that catches the user requesting something then operates deterministically. What is nice about aliases is you don’t need to learn other people’s semantic patterns: you craft ones that make sense to you and your use cases, then they always work and consume virtually no resources.
swax•4h ago
Throwing this out there, I have a command line driver for LLMs. Lots of little tricks in there to adapt the CLI to make it amenable to LLMs. Like interrupting a long running process periodically and asking the LLM if it wants to kill it or continue waiting. Also allowing the LLM to use and understand apps that use the alternate screen buffer (to some degree).

Overall I try to keep it as thin a wrapper as I can. The better the model, the less wrapper is needed. It's a good way to measure model competence. The code is here https://github.com/swax/NAISYS and context logs here for examples - https://test.naisys.org/logs/

I have agents built with it that do research on the web for content, run python scripts, update the database, maintain a website, etc.. all running through the CLI, if it calls APIs then it does it with curl. Example agent instructions here: https://github.com/swax/NAISYS/tree/main/agents/scdb/subagen...

skydhash•3h ago
The one thing that I always wonder is how varied those interactions with an agent are. My workflow is enough of a routine that I just write scripts and create functions and aliases to improve ergonomics. Anything that has to do with interacting with the computer can be automated.
swax•1h ago
Yea a lot of this is experimental, I basically have plain text instructions per agent all talking to each other, coordinating and running an entire pipeline to do what would typically be hard coded. There’s definite pros and cons, a lot of unpredictability of course, but also resilience and flexibility in the ways they can work around unexpected errors.
malcolmgreaves•4h ago
> We need to augment our command line tools and design APIs so they can be better used by LLM Agents.

lol no. The right way to get a program to interact with another program is through an API

vidarh•4h ago
The command line tools are also APIs.

We don't necessarily need to replace the versions humans use - though some of the changes might well make tools better for humans too - but e.g. most of the tools I add for my coding agent are attempts at coaxing it to avoid doing things like e.g. the "head" example in the article.

malcolmgreaves•3h ago
That’s just evidence that these sophisticated next token predictors are not good enough yet. The world should not bend over backwards to accommodate a new tool. The new tool needs to adapt to the world. Or only be used in the situations where it is appropriate. This is one of the problems of calling LLMs AI: a language model lacks understanding.
troupo•4h ago
The answer isn't "let's tear up and redo our tools for the hope that it will benefit black boxes with non-deterministic output".

The answer is:

- make non-deterministic black boxes more deterministic and less black boxes

- improve tools for humans

dkdcio•4h ago
Fortunately improving tools for humans tends to improve them for the non-deterministic black boxes too
troupo•3h ago
It's a feeling that really hasn't been tested or verified (like any other feeling when it comes to LLMs)
jerpint•4h ago
I’ve been building a context-engineering tool for collaborating with LLMs. The CLI is for the human and the MCP is for the LLM, but they all map to the same core commands

https://github.com/jerpint/context-llemur

I’ve actually bootstrapped ctx with ctx and found it very useful !

It basically stops me from having to repeat myself over and over to different agents

chubot•4h ago
I do think it’s interesting how Claude Code makes shell and dev automation more important – it also makes testing and code review more important

So there is probably some room for innovation here

But most of these seem like problems with Claude (and maybe fundamental problems with LLMs), not problems with the CLI interface:

> This started a game of whack-a-mole where the LLM would also attempt to change the pre-commit hooks! I had to fix it by denying Edit(.git/hooks/pre-commit) to my project’s .claude/settings.json. I look forward to its next lazy innovation.

> If you watch Claude Code, you’ll see that it often uses head -n100 to limit the results apriori. It also gets lost about which directory it’s in, and it will frustratingly flail around trying to run commands in different directories until it finds the right one.

kordlessagain•3h ago
Agree with the whack-a-mole effect, where it goes from nailing the problem or bug to absolutely destroying the code. I would offer some of these MCP tools I wrote/had written to solve the problem: https://github.com/kordless/gnosis-evolve. Tools are in contrib-tools.

It has helped tremendously having a dedicated build service that CC can control through MCP vs running Docker itself because it can then restart the container and test. And, the fuzzy search tool and diff editor seem to perform better than the replacement strategy Claude Code uses, most of the time. I continue to work on the editor when I run into issues with it, so happy to help anyone interested in implementing their own file editing (and search) strategy.

You will need to convert these to Claude Code format, but all you need to do is ask CC to do it for you...

pbronez•2h ago
Your license is interesting. To meet your intent, I suggest you revisit this definition:

> “Military Entity" means any armed forces branch, defense department, or military organization of any nation or alliance.

As written, this only applies to nation states. It excludes many kinds of human organizations that use force to impose their will on others. The word for this is “Terrorist.”

While that term has been applied to many groups for many reasons, it technically means “the use of violence against non-combatants to achieve political or ideological aims.”

If you add Terrorism + Nation-State Militaries, that should cover most everyone you intend here, including organized crime and private military contractors. You could add “financial gain” to the definition if you want to ensure those last two are captured.

com2kid•3h ago
I'd argue that many CLI tools output too much log spew by default and rely on making humans take up the burden of parsing through masses of output to find the one useful line.

For another example of where this is a problem, look at any large company that pays to keep logs in kibana, the amount of over logging paid for is insane.

Approximately 1/3rd of my Claude code tokens are spent parsing CLI output, that is insane! Often Claude doesn't catch what it needs in the massive log outputs and I have to go through the logs myself to find the problem and point it out to Claude.

pjmlp•4h ago
I'd rather see improvements in voice control and handwriting as means of communication.
kjkjadksj•4h ago
“Rethinking command line interface interfaces with AI” I would have expected nothing less from an airticle.
tempodox•3h ago
Language is just too hard a challenge.
ascorbic•3h ago
I just built a library designed to help with part of this: detecting if a tool is being run in one of these environments. That would allow it to, for example, run in non-interactive mode or give extra context in logs.

https://github.com/ascorbic/am-i-vibing

yoavm•3h ago
Or you can give your AI agent access to your terminal. I've been using https://github.com/hiraishikentaro/wezterm-mcp/ with gemini-cli and it generally allows it to use the terminal like I would, so stuff like scrolling inside interactive TUIs etc more-or-less just works.
j45•3h ago
Appreciate the share!

I might give access to a terminal in a locked down VM, I don't know about a shell.

SoftTalker•3h ago
At least give it its own login (and no sudo privileges).
badlibrarian•3h ago
Step 1: a no-excuses, never-fails undo.
esafak•3h ago
Cursor has checkpoints. Good idea.

https://docs.cursor.com/agent/chat/checkpoints

theodric•3h ago
You can give AI access to your terminal, dude. I'm fine over here, thanks.
Msurrow•3h ago
> This started a game of whack-a-mole where the LLM would also attempt to change the pre-commit hooks! I had to fix it by denying […]

When will people acknowledge that LLMs are stochastic text generators?

This whole blog reads like trying to fit a square into a round hole. And frankly most of the comments in this thread are jumping right on the wagon “what water?”-style [1]

By all means use LLMs for what they can be useful for but god damnit when they are not useful please acknowledge this and stop trying to make everything a nail for the LLM-hammer.

LLMs are. not. intelligent. They don’t have a work ethic that says “oh maybe skipping tests is bad”. If they generate output that skips tests it’s because a high enough part of the training data contained that text sentence.

[1] fish joke

lucianbr•2h ago
The whack-a-mole thing is a huge "this thing is not useful" indicator to me, and I am really confused how other people don't see it. Ok, there's an agent and the agent is able to figure out stuff and do stuff on its own. Great. But it's trying to cheat and instead of doing what I'm asking it just tries to go the easiest fastest way to claim "job done". How is that useful? If I had an intern do this I would seriously consider getting rid of them.

This is elementary school stuff. Do the assignment, don't cheat. Does useful software get written by people who don't understand this basic fact?

AchintyaAshok•3h ago
I think part of this is that we're in a transition phase. The shell cmds we have built (for example) were built for human consumption (ex. manpages). They were built around the expectation that we learn how to use them through experimentation or are taught by more knowledgeable peers. In the AI world, we basically need to assume that role of the guide / sherpa for the LLM.

Another idea that I've been thinking about is context hierarchy:

Low -> High Utility

Base (AI reads tool desc/manpage,etc.) > General human advice (typically use grep this way, etc.) > Specific advice (for this project / impl this is how you use the tool).

Currently the best interface to provide our insights is via MCPs. At https://toolprint.ai/ we're building a human (or machine) driven way to supplement that knowledge around tool-use to Claude/Cursor, etc.

A practical way in which we dogfood our own product is with the Linear MCP. If you connect that and ask an agent to create a new issue, it predictably fails because there are no instructions on which Linear project to select or the correct way to provide a description around Linear's quirks. When we connect the Linear MCP via the toolprint MCP, it gets pre-primed context around these edge cases to improve tool use.

skydhash•1h ago
The shell is an interface. The computer is the tool. Then we find that we have workflows that are actually routine. And we create scripts to handle them. Then we find that they are contextual, and we create task runners to provide the context. Then our mental capacity is freed, while the computer takes care of the menial stuff. And everything is good.

That is generally how it goes for power users. And people that take the time to RTFM.

But now, I see people that don't want to determine their workflows. It's just ad-hoc use, spending their time on local decisions that don't matter that much instead of grasping the big picture and then solving it. Maybe it helps them look busy.

So I don't want an agent for Linear. What I want is maybe a bash alias "isc" (for "issue create"), that pops up nano where I write the issue in git commit format (title + blank line + description). Upon saving, the rest is done automatically, because it can determine the project based on some .trackerrc I put in the root of the project. Or maybe a "linear-issues" emacs command and a transient interface (again, the correct project can be determined automatically).
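As a rough sketch of the deterministic parts of such an "isc" helper, in Python: `parse_issue` splits the saved text into title and description, and `find_trackerrc` walks up from the working directory to locate the config. Both function names are made up for illustration, the contents of `.trackerrc` are not specified above so the file is only located, not parsed, and the editor launch and the actual call to the issue tracker are left out.

```python
from pathlib import Path
from typing import Optional, Tuple


def parse_issue(text: str) -> Tuple[str, str]:
    """Split git-commit-style text into (title, description)."""
    title, _, rest = text.strip().partition("\n")
    if not title.strip():
        raise ValueError("issue needs a title on the first line")
    return title.strip(), rest.strip()


def find_trackerrc(start: Path) -> Optional[Path]:
    """Walk up from `start` until a .trackerrc file is found."""
    current = start.resolve()
    for directory in (current, *current.parents):
        candidate = directory / ".trackerrc"
        if candidate.is_file():
            return candidate
    return None
```

Because the nearest `.trackerrc` wins, a nested project can override the one above it, the same way nested `.gitignore` or editor config files usually behave.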

jonplackett•3h ago
Will this just be solved by agents being multimodal and using a computer in a more human way - context is a solved UI problem, by the GUI. The GUI just lacks power - but an AI could just have access to both.
anthk•3h ago
Emacs is the CLI/TUI rethought.
duncanfwalker•3h ago
The --no-verify example is interesting because I can imagine the same hint being useful for junior engineers. In general it's hard to give the right level of advice in CLI docs because you don't always know who the consumer will be and so what knowledge can be assumed. The thing that makes LLMs different is that there's no problem with being verbose in the docs because you're not wasting any human's time. It would be cool if you could have docs that provide extra advice like in the example, with the interface then adapting to the user's context: for LLMs, provide everything; for human users, learn what they know and give them just the right level of advice.
jasonriddle•3h ago
This is a great post, thank you for sharing. I like the idea of giving hints to the LLMs.

To clarify, the example that was provided using `command_not_found_handler`, is that possible to implement in bash? Or perhaps you were saying this would be a nice to have if this functionality existed?

withzombies•3h ago
The `command_not_found_handler` can be added to your .zshrc as is. For .bashrc, the equivalent hook is named `command_not_found_handle` (no trailing "r").
esafak•3h ago
The complicated GUI is simply a visualized version of CLI utilities of the day, which were no less complicated.

I was thinking about this just the other day, and there was one from the late 80s that had scores of parameters, but I could not remember its name. I think it was an `ls` type utility.

kristopolous•3h ago
Just yesterday I updated a tool I made in 2020 for parsing and snipping sections of manpages, adding an LLM ingestion feature for fitting partial manpages into tight context windows (https://github.com/day50-dev/Mansnip).

There may be something more generalizable here.

tomrod•3h ago
That's pretty cool, man (pun intended).
layer8•3h ago
If I can learn how to use the Bulk Rename Utility (it’s actually quite useful once you get to grips with it), then AI should be able to as well. ;)

There’s the saying that computers should adapt to us, rather than the other way around, but now this makes me wonder which side LLMs are on in that picture.

BrianCripe•2h ago
Agree 100% that CLI interface design needs to be altered to include AI Agents as a new type of user persona, but I don't think it's as drastic of a change as one might expect.

We designed Desktop GUI & Web Browsers on top of the terminal to allow a type of user to interact without speaking "lower level" commands, but we've also created abstractions to hide complexity for ourselves at this layer. We just so happen to call them CLI Apps, Scripts, Makefile targets, Taskfile tasks, Justfile recipes, unix tools, etc. It consists of a pseudo-natural language short-code name combined with schema-validated options and some context around what each option does (via the --help view). The trick is how do we optimize for both human developers and AI Agents to have access to the same tools but in the optimized interface for each.

In an experiment to let my agents share the exact same 'tools' that I do for developing in a repository, I gave it direct access to load and self-modify the local project Justfile via MCP: https://github.com/toolprint/just-mcp

Just as (pun intended) I create tools for myself to repeat common tasks with sane defaults and some parameters, my agents immediately gain the same access and I can restrict permissions to use these instead of ANY bash command (IE: "Bash(just:*)"). The agent can also assist in creating tools for me or itself to use that would save on time and token usage. I'd love to see the paradigm evolve to the point it feels more like warp.dev where you don't have to switch between two text boxes to choose whether you're talking in natural language or instructing to run a known 'tool'.

skydhash•2h ago
Interfaces and tools are orthogonal. It's like a hammer. The head is what is used on the nail, while the handle is shaped to fit the human hand. We can modify one without modifying the other. Another good example is Magit (or Lazygit) and git. Magit is designed to be used interactively, while git is more about the domain of version control.

Workflows are human processes; what we do is name them and identify their parameters. The actual tools to implement those workflows don't matter that much at a human scale other than cognitive load. So I don't care much about gcc's various options. What I want is `make debug` or `make release` (or just `make`). And cognitive load is lowered because I only have these to remember and they are deterministic.

An agent is not a good bridge between humans and tools, because it increases cognitive load, while all the interfaces have been about lowering it. There's no "make test" that gives a nice output of all the lines that have been flagged (with some integration like Vim's quickfix, which can quickly bring you to each line). Instead it's typing a lot and praying that it actually does something good.

BrianCripe•45m ago
I don't think I disagree with you here, but I'm not sure I fully understand your position.

I agree that if the human is "driving", they should be able to use the Tool directly (IE: make test). If you put an agent in the middle and ask it "please run make test" that's just silly and costs extra for no benefit.

Where you get benefit is if you design tools like "just test" as an MCP tool called "mcp__just-mcp__test" and give a fully-autonomous agent instructions like: "Whenever you feel like you've completed a task, run mcp__just-mcp__test and fix errors and warnings until it passes, then you may commit changes locally". LLMs have 'cognitive load' as well, so why not offload the deterministic logic to Tools in the same way we do?

graphememes•2h ago
Ironically, I really like bulk rename utility, it's quite nice
gtirloni•2h ago
Somehow a whole industry is now fine with Heisenbugs being a regular part of the dev workflow.
fullstackwife•2h ago
the salary raise and promo project industry within large corps is fine with that

there is everyone else who is supposed to deliver software that works, like always, and they are not fine with built-in flakiness

Zezima•56m ago
command line interface interface
elitan•56m ago
I had the same thought when I was dogfooding a CLI tool I've been vibe coding. It's a CLI for deploying Docker apps on any server. And here is the exact PR "I" did.

https://github.com/elitan/lightform/pull/35

One of the advantages of vibe coding CLI tools like this is that it's easy for the AI to debug itself, and it's easy to see where it gets stuck.

And it usually gets stuck because of:

1. Errors

2. Don't know what command to run

So:

1. Errors must be clear and *actionable*, like: `app_port` ("8080") is a string, expected a number.

2. The command should have simple, actionable and complete help (`--help`) sections for all commands.
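A small Python sketch of what "actionable" could mean in practice, assuming a config dict with an `app_port` key. This is an illustration, not the code from the PR above, and `validate_config` is a hypothetical name. Each message names the field, shows the offending value, states the expected type, and suggests a fix where one is obvious.

```python
from typing import Any, Dict, List


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return actionable error messages; an empty list means valid."""
    errors: List[str] = []
    port = config.get("app_port")
    if port is None:
        errors.append(
            "`app_port` is missing, expected a number (e.g. app_port: 8080)"
        )
    elif isinstance(port, bool) or not isinstance(port, int):
        # bool is a subclass of int in Python, so reject it explicitly.
        hint = ""
        if isinstance(port, str) and port.isdigit():
            hint = f"; remove the quotes: app_port: {port}"
        errors.append(
            f"`app_port` ({port!r}) is a {type(port).__name__}, "
            f"expected a number{hint}"
        )
    return errors
```

Returning a list instead of raising on the first problem lets the caller, human or agent, see every issue in one run rather than fixing them one retry at a time.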

jiehong•27m ago
Sounds like it applies to humans using CLIs as well.
sly010•35m ago
Sorry for the snark, but we couldn't even do this for humans, but let's do it for poor poor LLMs? It's kind of ironic that NOW is the time we worry about usability. What happened to RTFM?
