Overall I try to keep it as thin a wrapper as I can. The better the model, the less wrapper is needed; it's a good way to measure model competence. The code is here: https://github.com/swax/NAISYS and example context logs are here: https://test.naisys.org/logs/
I have agents built with it that do research on the web for content, run Python scripts, update the database, maintain a website, etc. - all running through the CLI; if it calls APIs, it does so with curl. Example agent instructions here: https://github.com/swax/NAISYS/tree/main/agents/scdb/subagen...
lol no. The right way to get a program to interact with another program is through an API
We don't necessarily need to replace the versions humans use - though some of the changes might well make tools better for humans too - but most of the tools I add for my coding agent are attempts at coaxing it to avoid things like the "head" example in the article.
The answer is:
- make non-deterministic black boxes more deterministic and less opaque
- improve tools for humans
https://github.com/jerpint/context-llemur
I’ve actually bootstrapped ctx with ctx and found it very useful!
It basically stops me from having to repeat myself over and over to different agents
So there is probably some room for innovation here
But most of these seem like problems with Claude (and maybe fundamental problems with LLMs), not problems with the CLI interface:
This started a game of whack-a-mole where the LLM would also attempt to change the pre-commit hooks! I had to fix it by adding a deny rule for Edit(.git/hooks/pre-commit) to my project’s .claude/settings.json. I look forward to its next lazy innovation.
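(For reference, the deny rule lives under the "permissions" key of .claude/settings.json - trimmed here to roughly the relevant part:)

    {
      "permissions": {
        "deny": ["Edit(.git/hooks/pre-commit)"]
      }
    }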
If you watch Claude Code, you’ll see that it often uses head -n100 to limit the results a priori. It also gets lost about which directory it’s in, and it will frustratingly flail around trying to run commands in different directories until it finds the right one.
It has helped tremendously to have a dedicated build service that CC can control through MCP, versus running Docker itself, because it can then restart the container and test. Also, the fuzzy search tool and diff editor seem to perform better than the replacement strategy Claude Code uses, most of the time. I continue to work on the editor when I run into issues with it, so I'm happy to help anyone interested in implementing their own file editing (and search) strategy.
You will need to convert these to Claude Code format, but all you need to do is ask CC to do it for you...
> “Military Entity” means any armed forces branch, defense department, or military organization of any nation or alliance.
As written, this only applies to nation states. It excludes many kinds of human organizations that use force to impose their will on others. The word for this is “Terrorist.”
While that term has been applied to many groups for many reasons, it technically means “the use of violence against non-combatants to achieve political or ideological aims.”
Adding Terrorism + Nation-State Militaries should cover nearly everyone you intend here, including organized crime and private military contractors. You could add “financial gain” to the definition if you want to ensure those last two are captured.
For another example of where this is a problem, look at any large company that pays to keep logs in Kibana; the amount of over-logging they pay for is insane.
Approximately a third of my Claude Code tokens are spent parsing CLI output - that is insane! Often Claude doesn't catch what it needs in the massive log output, and I have to go through the logs myself to find the problem and point it out to Claude.
I might give access to a terminal in a locked down VM, I don't know about a shell.
When will people acknowledge that LLMs are stochastic text generators?
This whole blog reads like trying to fit a square peg into a round hole. And frankly, most of the comments in this thread are jumping right on the bandwagon, “what water?”-style [1].
By all means use LLMs for what they can be useful for, but goddammit, when they are not useful, please acknowledge this and stop trying to make everything a nail for the LLM hammer.
LLMs are. not. intelligent. They don’t have a work ethic that says “oh, maybe skipping tests is bad”. If they generate output that skips tests, it’s because a large enough part of the training data contained that kind of text.
[1] fish joke
This is elementary school stuff. Do the assignment, don't cheat. Does useful software get written by people who don't understand this basic fact?
Another idea that I've been thinking about is context hierarchy:
Low -> high utility:
Base (the AI reads the tool description/manpage, etc.) > general human advice ("typically use grep this way", etc.) > specific advice ("for this project/implementation, this is how you use the tool").
Currently the best interface to provide our insights is via MCPs. At https://toolprint.ai/ we're building a human- (or machine-) driven way to supplement Claude/Cursor, etc. with that knowledge around tool use.
A practical way in which we dogfood our own product is with the Linear MCP. If you connect that and ask an agent to create a new issue, it predictably fails because there are no instructions on which Linear project to select or on the correct way to provide a description given Linear's quirks. When we connect the Linear MCP via the Toolprint MCP, it gets pre-primed context around these edge cases to improve tool use.
That is generally how it goes for power users. And people who take the time to RTFM.
But now I see people who don't want to determine their workflows. It's just ad-hoc use, spending their time on local decisions that don't matter that much instead of grasping the big picture and then solving it. Maybe it helps them look busy.
So I don't want an agent for Linear. What I want is maybe a bash alias "isc" (for "issue create"), that pops up nano where I write the issue in git commit format (title + blank line + description). Upon saving, the rest is done automatically, because it can determines the project based on some .trackerrc I put in the root of the project. Or maybe a "linear-issues" emacs command and a transient interface (again, the correct project can be determined automatically).
To clarify, the example that was provided using `command_not_found_handler` - is that possible to implement in bash? Or were you saying this would be nice to have if the functionality existed?
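(For what it's worth, bash does have an equivalent hook; it's spelled `command_not_found_handle` - zsh is the shell that adds the trailing "r". A minimal sketch, with a crude prefix-based suggestion heuristic:)

    # bash invokes this when a command isn't found; 127 is the
    # conventional "command not found" exit status
    command_not_found_handle() {
      echo "command not found: $1" >&2
      echo "similar commands: $(compgen -c "${1:0:3}" | sort -u | head -5 | tr '\n' ' ')" >&2
      return 127
    }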
I was thinking about this just the other day, and there was one from the late 80s that had scores of parameters, but I could not remember its name. I think it was an `ls` type utility.
There may be something more generalizable here.
There’s the saying that computers should adapt to us, rather than the other way around, but now this makes me wonder which side LLMs are on in that picture.
We designed desktop GUIs & web browsers on top of the terminal to allow a type of user to interact without speaking "lower level" commands, but we've also created abstractions to hide complexity for ourselves at this layer. We just so happen to call them CLI apps, scripts, Makefile targets, Taskfile tasks, Justfile recipes, unix tools, etc. Each consists of a pseudo-natural-language short-code name combined with schema-validated options and some context around what each option does (via the --help view). The trick is optimizing so that both human developers and AI agents have access to the same tools, each through the interface that suits it best.
In an experiment to let my agents share the exact same 'tools' that I use for developing in a repository, I gave them direct access to load and self-modify the local project Justfile via MCP: https://github.com/toolprint/just-mcp
Just as (pun intended) I create tools for myself to repeat common tasks with sane defaults and some parameters, my agents immediately gain the same access, and I can restrict permissions to use these instead of ANY bash command (i.e., "Bash(just:*)"). The agent can also assist in creating tools for me or itself to use that would save on time and token usage. I'd love to see the paradigm evolve to the point where it feels more like warp.dev, where you don't have to switch between two text boxes to choose whether you're talking in natural language or instructing to run a known 'tool'.
Workflows are human processes; what we do is name them and identify their parameters. The actual tools that implement those workflows don't matter that much at a human scale, other than for cognitive load. So I don't care much about gcc's various options. What I want is `make debug` or `make release` (or just `make`). And cognitive load is lowered because those are all I have to remember, and they are deterministic.
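A minimal sketch of that idea, assuming a single-file C project (names and flags are illustrative): the gcc options live in one place, and the only names to remember are `make debug` and `make release`.

    # Makefile (recipe lines are tab-indented)
    CFLAGS_DEBUG   = -g -O0 -Wall
    CFLAGS_RELEASE = -O2 -DNDEBUG

    debug:
    	gcc $(CFLAGS_DEBUG) -o app main.c

    release:
    	gcc $(CFLAGS_RELEASE) -o app main.c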
An agent is not a good bridge between humans and tools, because it increases cognitive load, while interfaces have always been about lowering it. There's no "make test" that gives you a nice output of all the lines that were flagged (with integrations like Vim's quickfix, which can quickly bring you to each line). Instead it's typing a lot and praying that it actually does something good.
I agree that if the human is "driving", they should be able to use the tool directly (i.e., make test). If you put an agent in the middle and ask it "please run make test", that's just silly and costs extra for no benefit.
Where you get benefit is if you design tools like "just test" as an MCP tool called "mcp__just-mcp__test" and give a fully-autonomous agent instructions like: "Whenever you feel like you've completed a task, run mcp__just-mcp__test and fix errors and warnings until it passes, then you may commit changes locally". LLMs have 'cognitive load' as well, so why not offload the deterministic logic to tools, the same way we do?
there is everyone else who is supposed to deliver software that works, like always, and they are not fine with built-in flakiness
https://github.com/elitan/lightform/pull/35
One of the advantages of vibe coding CLI tools like this is that it's easy for the AI to debug itself, and it's easy to see where it gets stuck.
And it usually gets stuck because of:
1. Errors
2. Not knowing what command to run
So:
1. Errors must be clear and *actionable*, like: `app_port` ("8080") is a string, expected a number.
2. The command should have simple, actionable, and complete help (`--help`) sections for all commands.
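As a hypothetical transcript of what point 1 looks like in practice (tool and file names are made up):

    $ mytool deploy
    Error: `app_port` ("8080") is a string, expected a number
    Hint: in config.yml, change app_port: "8080" to app_port: 8080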
esafak•3h ago
edit: Mea culpa.
lucianbr•2h ago
Seems pretty clear the article is talking about teaching LLMs how to use 'ls'.
withzombies•1h ago
Definitely not just `ls`.
skydhash•3h ago
In one of the examples provided:
You could just use `pwd`, like most people who put the current directory in their $PS1, to make sure that the agent stays in the correct directory.
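A one-liner version of that, for the record (the exact prompt format is a matter of taste):

    # show the working directory in every prompt so it lands in the transcript
    export PS1='\w \$ '

jacobr1•3h ago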
But for the `$command | head -100` example, the usage is a bit different. I run into this myself on the CLI, and I often end up using `less` in similar contexts.
Two cases:
1) Sometimes I use head to short-circuit a long-running command with streaming output, so I can assess whether it is starting to do the right thing without bearing the time/computational cost of full processing.
2) Sometimes the timing doesn't matter, but the content is too verbose and I need to see some subset of the data. But here head is too limited. I need something like wc & head and maybe grep in one command line, with context. Maybe something like:
$command | contextual-filter -grepn 5 -grep error -head 10
some data ... the first 10 lines ... an error message with 5 lines of surrounding context before and after
Summary: 100000 total lines, 15 printed, exited with code 0
You can do all that already with grep and others, but you need to run multiple commands to get all the context
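A rough approximation of that hypothetical contextual-filter with standard tools, capturing the output once so it can be sliced repeatedly (the error pattern is illustrative):

    out=$(command)                  # capture once instead of re-running
    wc -l <<<"$out"                 # total line count
    head -10 <<<"$out"              # first 10 lines
    grep -n -C5 'error' <<<"$out"   # matches with 5 lines of context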
skydhash•2h ago
2) Again, logs - if action needs to be taken after the command has stopped. For immediate action, you can use `tee`.
Managing context isn't hard. I see more issues with ensuring the right command is run.
psifertex•4h ago
I share your apprehension about the current AI landscape changing so quickly that it causes whiplash, but I don't think a mindset of "it's been fine for 50 years" is going to survive the pace of development made possible by better LLM integration.
skydhash•4h ago
IDEs have not changed that much. They've always been an editor supercharged with tools that all share the same context of a "project". And for development, it's always been about navigation (search and goto), compile/build, and run.
ivape•4h ago
while (true):
  >> User requests something
  << The LLM picks a CLI tool from an index
  << The LLM grabs the manual for the tool to get the list of commands
  << It attempts to fulfill the request
I would not be shocked if engineers have already managed to overcomplicate these agents.