This post has an unusually large number of code blocks without syntax highlighting since they're copy-pasted outputs from the debug tool which isn't in any formal syntax.
Since you released version 0.26 alpha, I’ve been trying to create a plugin to interact with an MCP server, but it’s a bit too challenging for me. So far, I’ve managed to connect and dynamically retrieve and use tools, but I’m not yet able to pass parameters.
I'm a heavy user of the llm tool, so as soon as I saw your post, I started tinkering with MCP.
I’ve just published an alpha version that works with stdio-based MCP servers (tested with @modelcontextprotocol/server-filesystem) - https://github.com/Virtuslab/llm-tools-mcp. Very early stage, so please make sure to use it with the --ta option (manually approve every tool execution).
The code is still messy and there are a couple of TODOs in the README.md, but I plan to work on it full-time until the end of the week.
Some questions:
Where do you think mcp.json should be stored? Also, it might be a bit inconvenient to specify tools one by one with -T. Do you think adding a --all-tools flag or supporting glob patterns like -T name-prefix* in llm would be a good idea?
You're using function-based tools at the moment, which is why you have to register each one individually.
The alternative to doing that is to use what I call a "toolbox", described here: https://llm.datasette.io/en/stable/python-api.html#python-ap...
Those get you two things you need:
1. A single class can have multiple tool methods in it, and you only have to specify it once
2. Toolboxes can take configuration
With a Toolbox, your plugin could work like this:
llm -T 'MCP("path/to/mcp.json")' ...
You might even be able to design it such that you don't need an mcp.json at all, and everything gets passed to that constructor. There's one catch: currently you would have to dynamically create the class with methods for each tool, which is possible in Python but a bit messy. I have an open issue to make that better here: https://github.com/simonw/llm/issues/1111
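For illustration, a rough sketch of that dynamic class creation - fetch_mcp_tools and call_mcp_tool are hypothetical stand-ins for real MCP discovery and invocation, not part of any existing library:

```python
import llm

# Hypothetical stand-ins for real MCP discovery and invocation.
def fetch_mcp_tools(mcp_json_path):
    return [{"name": "read_file"}, {"name": "list_directory"}]

def call_mcp_tool(mcp_json_path, tool_name, argument):
    return f"(would call MCP tool {tool_name!r} with {argument!r})"

def make_mcp_toolbox(mcp_json_path: str) -> type:
    """Build a Toolbox subclass at runtime, one method per discovered MCP tool."""
    def make_method(tool_name):
        def method(self, argument: str) -> str:
            return call_mcp_tool(mcp_json_path, tool_name, argument)
        method.__name__ = tool_name
        method.__doc__ = f"Call the MCP tool {tool_name!r}."
        return method

    methods = {t["name"]: make_method(t["name"]) for t in fetch_mcp_tools(mcp_json_path)}
    # type() assembles the class dynamically - the "possible but a bit messy" part.
    return type("MCP", (llm.Toolbox,), methods)
```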
Ah, I saw "llm.Toolbox" but I thought it was just for plugin developer convenience.
I'll take a look at the issue you posted (#1111). Maybe I can contribute somehow :).
I have an idea to fix that by writing a 'plugins.txt' file somewhere with all of your installed plugins and then re-installing any that go missing - issue for that is here: https://github.com/simonw/llm/issues/575
uv tool install llm --upgrade --with llm-openrouter --with llm-cmd ...
llm install -U llm
instead of
uv tool upgrade llm
(the latter of which is recommended by simonw in the original post)
I'll switch to o4-mini when I'm writing code, but otherwise 4.1-mini usually does a great job.
Fun example from earlier today:
llm -f https://raw.githubusercontent.com/BenjaminAster/CSS-Minecraft/refs/heads/main/main.css \
-s 'explain all the tricks used by this CSS'
That's piping the CSS from that incredible CSS Minecraft demo - https://news.ycombinator.com/item?id=44100148 - into GPT-4.1 mini and asking it for an explanation. The code is clearly written but entirely uncommented: https://github.com/BenjaminAster/CSS-Minecraft/blob/main/mai...
GPT-4.1 mini's explanation is genuinely excellent: https://gist.github.com/simonw/cafd612b3982e3ad463788dd50287... - it correctly identifies "This CSS uses modern CSS features at an expert level to create a 3D interactive voxel-style UI while minimizing or eliminating JavaScript" and explains a bunch of tricks I hadn't figured out.
And it used 3,813 input tokens and 1,291 output tokens - https://www.llm-prices.com/#it=3813&ot=1291&ic=0.4&oc=1.6 - that's 0.3591 cents (around a third of a cent).
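A quick back-of-the-envelope check of that figure, using the per-million-token prices from the linked calculator:

```python
# GPT-4.1 mini: $0.40 per million input tokens, $1.60 per million output tokens.
input_tokens, output_tokens = 3813, 1291
cost_usd = input_tokens / 1_000_000 * 0.40 + output_tokens / 1_000_000 * 1.60
print(f"${cost_usd:.6f} = {cost_usd * 100:.4f} cents")  # $0.003591 = 0.3591 cents
```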
How come it doesn't know for sure?
Though it's worth noting that CSS Minecraft was first released three years ago, so there's a chance it has hints about it in the training data already. This is not a meticulous experiment.
(I've had a search around though and the most detailed explanation I could find of how that code works is the one I posted on my blog yesterday - my hunch is that it figured it out from the code alone.)
Are you aware of any user interfaces that expose some limited ChatGPT-style functionality and internally use llm? This is for my non-techie wife.
I've been meaning to put together a web UI for ages, I think that's the next big project now that tools is out.
It's not using LLM, but right now one of the best UI options out there is https://openwebui.com/ - it works really well with Ollama (and any other OpenAI-compatible endpoint).
The doc [1] warns about prompt injection, but I think a more likely scenario is self-inflicted harm. For instance, you give a tool access to your brokerage account to automate trading. Even without prompt injection, there's nothing preventing the bot from making stupid trades.
https://news.ycombinator.com/item?id=44073456
https://news.ycombinator.com/item?id=44073413
https://news.ycombinator.com/item?id=44070923
https://news.ycombinator.com/item?id=44070514
https://news.ycombinator.com/item?id=44010921
https://news.ycombinator.com/item?id=43970274
If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.
Yeah, it really does.
There are so many ways things can go wrong once you start plugging tools into an LLM, especially if those tool calls are authenticated and can take actions on your behalf.
The MCP world is speed-running this right now, see the GitHub MCP story from yesterday: https://news.ycombinator.com/item?id=44097390
I stuck a big warning in the documentation and I've been careful not to release any initial tool plugins that can cause any damage - hence my QuickJS sandbox one and SQLite plugin being read-only - but it's a dangerous space to be exploring.
(Super fun and fascinating though.)
This is absolutely going to happen at a large scale and then we'll have "cautionary tales" and a lot of "compliance" rules.
Letting the LLM run the tool unsupervised is another thing entirely. We do not understand the choices the machines are making. They are unpredictable and you can't root-cause their decisions.
LLM tool use is a new thing we haven't had before, which means tool misuse is a whole new class of FUBAR waiting to happen.
Let's say you are making an AI-controlled radiation therapy machine. You prompt and train and eval the system very carefully, and you are quite sure it won't overdose any patients. Well, that's not really good enough, it can still screw up. But did you do anything wrong? Not really, you followed best practices and didn't make any mistakes. The LLM just sometimes kills people. You didn't intend that at all.
I make this point because this is already how these systems work today. But instead of giving you a lethal dose of radiation, it uses slurs or promotes genocide or something else. The builders of those bots didn't intend that, and in all likelihood tried very hard to prevent it. It's not very fair to blame them.
Even a year ago I was letting LLMs execute local commands on my laptop. I think it is somewhat risky, but nothing harmful has happened. You also have to consider what you are prompting: when I prompt 'find out where I am and what the weather is going to be', it is possible that it will execute rm -rf /, but very unlikely.
However, speaking of letting an LLM trade stocks without understanding how the LLM will come to a decision... too risky for my taste ;-)
Overall, I found tool use extremely hit-and-miss, to the point where I'm sure I'm doing something wrong (I'm using the OpenAI Agents SDK, FWIW).
Anthropic's system prompt just for their "web_search" tool is over 6,000 tokens long! https://simonwillison.net/2025/May/25/claude-4-system-prompt...
And, this is why I'm very excited about this addition to the llm tool, because it feels like it moves the tool closer to the user and reduces the likelihood of the problem I'm describing.
See also my multi-year obsession with prompt injection and LLM security, which still isn't close to being a solved problem: https://simonwillison.net/tags/prompt-injection/
Yet somehow I can't tear myself away from them. The fact that we can use computers to mostly understand human language (and vision problems as well) is irresistible to me.
I agree that'd be amazing if they did that, but they most certainly do not. I think this is the core of my disagreement here: that you believe this and let it guide you. They don't understand anything; they are matching and synthesizing patterns. I can see how that's enthralling, like watching a Rube Goldberg machine go through its paces, but there is no there there. The idea that there is an emergent something there is at best an unproven theory, is documented as being an illusion, and at worst has become an unfounded messianic belief.
I know they're just statistical models, and that having conversations with them is like having a conversation with a stack of dice.
But if the simulation is good enough to be useful, the fact that they don't genuinely "understand" doesn't really matter to me.
I've had tens of thousands of "conversations" with these things now (I know because I log them all). Whether or not they understand anything they're still providing a ton of value back to me.
More background: https://github.com/simonw/llm/issues/12
(Also check out https://github.com/day50-dev/llmehelp which features a tmux tool I built on top of Simon's llm. I use it every day. Really. It's become indispensable)
I think I want a plugin hook that lets plugins take over the display of content by the tool.
Just filed an issue: https://github.com/simonw/llm/issues/1112
Would love to get your feedback on it, I included a few design options but none of them feel 100% right to me yet.
We have cost, latency, context window and model routing but I haven't seen anything semantic yet. Someone's going to do it, might as well be me.
That's why everybody else either rerenders (such as rich) or relies on the whole buffer (such as glow).
I didn't write Streamdown for fun - there are genuinely no suitable tools that did what I needed.
Also various models have various ideas of what markdown should be and coding against CommonMark doesn't get you there.
Then there are other things. You have to check individual character width and the language family type to do proper word wrap. I've seen a number of interesting tmux and alacritty bugs while doing multi-language support.
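For example, a width-aware wrapper has to count terminal cells rather than characters - a minimal sketch using Python's unicodedata, not Streamdown's actual code:

```python
# Terminal word wrap by display cells, not characters - a rough sketch.
import unicodedata

def cell_width(ch: str) -> int:
    # East Asian wide ("W") and fullwidth ("F") characters occupy two cells.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def wrap(text: str, width: int) -> list:
    lines, line, used = [], "", 0
    for word in text.split():
        w = sum(cell_width(c) for c in word)
        if line and used + 1 + w > width:
            lines.append(line)
            line, used = word, w
        elif line:
            line, used = line + " " + word, used + 1 + w
        else:
            line, used = word, w
    if line:
        lines.append(line)
    return lines

print("\n".join(wrap("Mixed 漢字 and English text wraps by cell width", 16)))
```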
The only real break I do is I render h6 (######) as muted grey.
Compare:
for i in $(seq 1 6); do
  printf "%${i}sh${i}\n\n-----\n" | tr " " "#"
done | pv -bqL 30 | sd -w 30
to swapping out `sd` with `glow`. You'll see glow's lag - waiting for that EOF is annoying. Also try sd -b 0.4 or even -b 0.7,0.8,0.8 for a nice blue. It's a bit easier to configure than the usual catalog of themes that requires a compilation after modification, like with pygments.
| bat --language=markdown --force-colorization ?
simple and works well.
echo "$@" | llm "Provide a brief response to the question, if the question is related to command provide the command and short description" | bat --plain -l md
Launch as: llmquick "why is the sky blue?"
https://github.com/day50-dev/llmehelp/blob/main/Snoopers/wtf
I've thought about redoing it because my needs are things like
$ ls | wtf which endpoints do these things talk to, give me a map and line numbers.
What this will eventually be is "ai-grep" built transparently on https://ast-grep.github.io/ where the llm writes the complicated query (these coding agents all seem to use ripgrep, but this works better). Conceptual grep is what I've wanted my whole life.
Semantic routing, which I alluded to above, could get this to work progressively so you quickly get adequate results which then pareto their way up as the token count increases.
Really you'd like some tempering, like a coreutils timeout(1) but for simplex optimization.
Lmao. Does it work? I hate that it needs to be repeated (in general). ChatGPT couldn't care less about following my instructions; through the API it probably would?
This one is a ZSH plugin that uses zle to translate your English to shell commands with a keystroke.
https://github.com/day50-dev/Zummoner
It's been life changing for me. Here's one I wrote today:
$ git find out if abcdefg is a descendant of hijklmnop
In fact I used it in one of these comments. This:

$ for i in $(seq 1 6); do
  printf "%${i}sh${i}\n\n-----\n" | tr " " "#"
done | pv -bqL 30

was originally:

$ for i in $(seq 1 6); do
  printf "(# $i times)\n\n-----\n"
done | pv (30 bps and quietly)
I did my trusty ctrl-x x and the buffer got sent off through openrouter and got swapped out with the proper syntax in under a second. It's also intelligent about inferring leading zeros without needing to be told with options, e.g. {001..995}.
Looks from the demo like mine's a little less automatic and more iterative than yours.
The conversational context is nice. The ongoing command building is convenient and the # syntax carryover makes a lot of sense!
My next step is recursion and composability. I want to be able to do things contextualized. Stuff like this:
$ echo PUBLIC_KEY=(( get the users public key pertaining to the private key for this repo )) >> .env
or some other contextually complex thing that is actually fairly simple, just tedious to code. Then I want that <as the code> so people collectively program and revise stuff <at that level as the language>. Then you can do this through composability like so:
with ((find the variable store for this repo by looking in the .gitignore)) as m:
  ((write in the format of m))
  SSH_PUBLICKEY=(( get the users public key pertaining to the private key for this repo ))
or even recursively:

((
  ((
    ((rsync, rclone, or similar)) with compression
  ))
  $HOME exclude ((find directories with secrets))
  ((read the backup.md and find the server))
  ((make sure it goes to the right path))
));
it's not a fully formed syntax yet, but then people will be able to do something like:

$ llm-compile --format terraform --context my_infra script.llm > some_code.tf
and compile publicly shared snippets as specific to their context, and you get abstract infra management at a fraction of the complexity. It's basically GCC's RTL but for LLMs.
The point of this approach is that your building blocks remain fairly atomic, simple, dumb things that even a 1B model can reliably handle - kinda like the guarantee of the RTL.
Then if you want to move from terraform to opentofu or whatever, who cares ... your stuff is in the llm metalanguage ... it's just a different compile target.
It's kinda like PHP. You just go along like normal and occasionally break form for the special metalanguage whenever you hit a point of contextual variance.
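To make it concrete, a toy sketch of a single compile pass - the regex, the prompt wording, and the use of the llm Python API are all my assumptions, not a spec for llm-compile:

```python
# Toy "llm-compile" pass: replace each (( ... )) span with an LLM completion.
# Prompt wording, model choice and context handling are illustrative only.
import re
import llm

SPAN = re.compile(r"\(\(([^()]+?)\)\)")  # innermost (( ... )) spans first

def compile_spans(source: str, context: str, model_id: str = "gpt-4.1-mini") -> str:
    model = llm.get_model(model_id)
    while True:
        match = SPAN.search(source)
        if match is None:
            return source
        instruction = match.group(1).strip()
        replacement = model.prompt(
            f"Context:\n{context}\n\nWrite only the code/text for: {instruction}"
        ).text().strip()
        source = source[:match.start()] + replacement + source[match.end():]

script = 'echo PUBLIC_KEY=(( get the users public key for this repo )) >> .env'
print(compile_spans(script, context="my_infra"))
```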
I use fish, but the language change is straightforward https://github.com/viktomas/dotfiles/blob/master/fish/.confi...
I'll use this daily
I am keenly aware this is a major footgun, but it seems that a terminal tool + llm would be a perfect lightweight solution.
Is there a way to have llm get permission for each tool call, the way other "agents" do? ("llm would like to call `rm -rf ./*`, press Y to confirm...")
Would be a decent way to prevent letting an llm run wild on my terminal and still provide some measure of protection.
We've seen problems in the past where plugins with expensive imports (like torch) slow everything down a lot: https://github.com/simonw/llm/issues/949
I'm interested in tracking down the worst offenders and encouraging them to move to lazy imports instead.
sudo uvx py-spy record -o /tmp/profile.svg -- llm --help
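The fix on the plugin side is usually just deferring the heavy import into the code path that needs it - a rough sketch of the pattern (the torch dependency and the model class are purely illustrative):

```python
# Sketch of a lazy import inside an llm model plugin. A module-level
# `import torch` would slow down every `llm` invocation; importing it
# inside execute() defers the cost until the model is actually used.
import llm

class MyLocalModel(llm.Model):
    model_id = "my-local-model"  # illustrative

    def execute(self, prompt, stream, response, conversation):
        import torch  # deferred heavy import, only paid on actual use
        yield f"(would run {prompt.prompt!r} locally with torch {torch.__version__})"

@llm.hookimpl
def register_models(register):
    register(MyLocalModel())
```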
Fortunately this gets me 90% of the way there:
llm -f README.md -f llm.plugin.zsh -f completions/_llm -f https://simonwillison.net/2025/May/27/llm-tools/ "implement tab completions for the new tool plugins feature"
My repo is here:
https://github.com/eliyastein/llm-zsh-plugin
And again, it's a bit of a mess, because I'm trying to get as many options and their flags as I can. I wouldn't mind if anyone has any feedback for me.
It's like if you use English every day but don't bother to learn the language because you have Google Translate (and now AI).
I put a lot of effort into it - it integrates with `llm` command line tool and with your desktop, via a tray icon and nice chat window.
I recently released 3.0.0 with packages for all three major desktop operating systems.
The Unix shell is good at being the glue between programs. We've increased the dimensionality with LLMs.
Some kind of ports based system like named pipes with consumers and producers.
Maybe something like gRPC or NATS (https://github.com/nats-io). MQTT might also work. Network transparent would be great.
At this point I would have expected something MCP- or OpenAPI-based, but it's probably simpler and more flexible this way. Implementing it as a plugin shouldn't be hard, I think.
... OK, I got the second one working!
brew install llama.cpp
llama-server --jinja -hf unsloth/gemma-3-4b-it-GGUF:Q4_K_XL
Then in another window: llm install llm-llama-server
llm -m llama-server-tools -T llm_time 'what time is it?' --td
Wrote it up here: https://simonwillison.net/2025/May/28/llama-server-tools/

Is there a blog post / article that addresses this?
If you're interested in what I recommend generally that changes a lot, but my most recent piece about that is here: https://simonwillison.net/2025/May/15/building-on-llms/
EDIT: I think I just found what I want. There is no need for the plugin, extra-openai-models.yaml just needs "supports_tools: true" and "can_stream: false".
The key elements I had to write:
- The system prompt
- Tools to pull external data
- Tools to do some calculations
Your library made the core functionality very easy.
Most of the effort for the demo was to get the plumbing working (a nice-looking web UI for the chatbot that would persist the conversation, update nicely if the user refreshed their browser due to a connection issue, and allow the user to start a new chat session).
I didn't know about `after_call=print`. So I'm glad I read this blog post!
Can we stop already? Stop following webdevs' practices.
It's been a very useful tool to test out and prototype various LLM features like multimodal input, schema output, and now tools as well! I specifically like that I can just write a Python function with type annotations and plug it into the LLM.
Things have now come full circle :D
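For anyone curious, that pattern looks roughly like this with the Python API (the model name and example function are mine, not from the comment above):

```python
# A typed Python function registered as a tool.
import llm

def multiply(x: int, y: int) -> int:
    """Multiply two numbers."""
    return x * y

model = llm.get_model("gpt-4.1-mini")
response = model.chain("What is 123 * 456?", tools=[multiply])
print(response.text())
```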
I've wondered how exactly, say, Claude Code knows about and uses tools. Obviously, an LLM can be "told" about tools and how to use them, and the harness can kind of manage that. But I assumed Claude Code has a very specific expectation around the tool call "API" that the harness uses, probably reinforced very heavily by some post-training / fine tuning.
Do you think your 3rd party tool-calling framework using Claude is at any disadvantage to Anthropic's own framework because of this?
Separately, on that other HN post about the GitHub MCP "attack", I made the point that LLMs can be tricked into using up to the full potential of the credential. GitHub has fine-grained auth credentials, and my own company does as well. I would love for someone to take a stab at a credential protocol that the harness can use to generate fine-grained credentials to hand to the LLM. I'm envisioning something where the application (e.g. your `llm` CLI tool) is given a more powerful credential, and the underlying LLM is taught how to "ask for permission" for certain actions/resources, which the user can grant. When that happens the framework gets the scoped credential from the service, which the LLM can then use in tool calls.
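To make that idea concrete, here is one possible shape for the flow - every name here (CredentialBroker, request_scoped_token, the token-exchange step) is hypothetical; no such protocol exists in llm or GitHub today:

```python
# Hypothetical scoped-credential flow: the harness holds the powerful token,
# the model can only *request* a narrower one, and the user approves the scope.
from dataclasses import dataclass

@dataclass
class ScopedToken:
    scopes: tuple
    token: str

class CredentialBroker:
    """Held by the harness; the LLM never sees the powerful credential."""

    def __init__(self, powerful_token, ask_user):
        self._powerful_token = powerful_token
        self._ask_user = ask_user  # callback: scopes -> bool

    def request_scoped_token(self, *scopes):
        """Exposed to the model as a tool: ask for a narrowly scoped credential."""
        if not self._ask_user(scopes):
            raise PermissionError(f"User declined scopes: {scopes}")
        # A real implementation would call the service's token-exchange API
        # with the powerful token; here we just fabricate a placeholder.
        return ScopedToken(scopes=scopes, token="scoped::" + ":".join(scopes))

# The harness registers request_scoped_token as a tool and substitutes the
# returned ScopedToken into later tool calls in place of the real credential.
broker = CredentialBroker("ghp_example", ask_user=lambda s: input(f"Grant {s}? [y/N] ") == "y")
```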
> We evaluate 12 popular LLMs that claim to support contexts of at least 128K tokens. While they perform well in short contexts (<1K), performance degrades significantly as context length increases. At 32K, for instance, 10 models drop below 50% of their strong short-length baselines. Even GPT-4o, one of the top-performing exceptions, experiences a reduction from an almost-perfect baseline of 99.3% to 69.7%.
simonw•1d ago
The ability to pipe files and other program outputs into an LLM is wildly useful. A few examples:
It can process images too! https://simonwillison.net/2024/Oct/29/llm-multi-modal/

LLM plugins can be a lot of fun. One of my favorites is llm-cmd, which adds the ability to do things like this: it proposes a command to run, you hit enter to run it. I use it for ffmpeg and similar tools all the time now. https://simonwillison.net/2024/Mar/26/llm-cmd/

I'm getting a whole lot of coding done with LLM now too. Here's how I wrote one of my recent plugins:
I wrote about that one here: https://simonwillison.net/2025/Apr/20/llm-fragments-github/

LLM was also used recently in that "How I used o3 to find CVE-2025-37899, a remote zeroday vulnerability in the Linux kernel’s SMB implementation" story - to help automate running 100s of prompts: https://sean.heelan.io/2025/05/22/how-i-used-o3-to-find-cve-...
th0ma5•1d ago
> had I used o3 to find and fix the original vulnerability I would have, in theory [...]
They ran a scenario that they thought could have led to finding it, which is pretty much not what you said. We don't know how much their foreshadowing crept into their LLM context, and even the article says it was also sort of chance. Please be more precise and don't give in to these false beliefs of productivity. Not yet at least.
simonw•1d ago
- The official docs: https://llm.datasette.io/
- The workshop I gave at PyCon a few weeks ago: https://building-with-llms-pycon-2025.readthedocs.io/
- The "New releases of LLM" series on my blog: https://simonwillison.net/series/llm-releases/
- My "llm" tag, which has 195 posts now! https://simonwillison.net/tags/llm/
setheron•1d ago
```
# AI cli
(unstable.python3.withPackages ( ps: with ps; [ llm llm-gemini llm-cmd ] ))
```
looks like most of the plugins are models and most of the functionality you demo'd in the parent comment is baked into the tool itself.
Yea, a live document might be cool -- part of the interesting bit was seeing the "real" types of use cases you use it for.
Anyways, will give it a spin.
furyofantares•1d ago
Most recently I wanted a script that could produce word lists from a dictionary of 180k words given a query, like "is this an animal?" The script breaks the dictionary up into chunks of size N (asking "which of these words is an animal? respond with just the list of words that match, or NONE if none, and nothing else"), makes M parallel "think" queries, and aggregates the results in an output text file.
I had Claude Code do it, and even though I'm _already_ talking to an LLM, it's not a task that I trust an LLM to do without breaking the word list up into much smaller chunks and making loads of requests.
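A minimal sketch of that chunk-and-filter approach using the llm Python library (chunk size, worker count, model and file names are my assumptions):

```python
# Sketch: filter a large word list with chunked, parallel LLM calls.
from concurrent.futures import ThreadPoolExecutor
import llm

CHUNK_SIZE = 500   # N: words per request
MAX_WORKERS = 8    # M: parallel requests

def filter_chunk(model, question, words):
    prompt = (
        f"{question} Respond with just the list of words that match, "
        "or NONE if none, and nothing else.\n\n" + "\n".join(words)
    )
    text = model.prompt(prompt).text().strip()
    return [] if text == "NONE" else text.split()

def filter_wordlist(words, question, model_id="gpt-4.1-mini"):
    model = llm.get_model(model_id)
    chunks = [words[i:i + CHUNK_SIZE] for i in range(0, len(words), CHUNK_SIZE)]
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        results = pool.map(lambda chunk: filter_chunk(model, question, chunk), chunks)
    return sorted({w for chunk in results for w in chunk})

if __name__ == "__main__":
    words = open("dictionary.txt").read().split()        # assumed input file
    matches = filter_wordlist(words, "Which of these words is an animal?")
    open("animals.txt", "w").write("\n".join(matches))   # assumed output file
```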