Hey Max, do you use a custom wrapper to interface with the API or is there some already established client you like to use?
If anyone else has a suggestion please let me know too.
Nowadays, for code that interfaces with LLMs, I don't use client SDKs unless required; I just hit the HTTP endpoints with libraries such as requests and httpx. It also makes it easier to upgrade to async if needed.
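If it helps anyone, the no-SDK approach is only a few lines. Here's a minimal sketch against the OpenAI chat completions endpoint using httpx; the endpoint and model name are just examples, swap in whatever provider you actually use:

    import os
    import httpx

    API_URL = "https://api.openai.com/v1/chat/completions"

    def ask(prompt: str, model: str = "gpt-4o-mini") -> str:
        # Plain HTTP call instead of a client SDK
        resp = httpx.post(
            API_URL,
            headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
            json={"model": model, "messages": [{"role": "user", "content": prompt}]},
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

Upgrading to async is then mostly mechanical: httpx.AsyncClient and await client.post(...).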
As an alternative you could always use OpenWebUI
llm -m o4-mini -f github:simonw/llm-hacker-news -s 'write a new plugin called llm_video_frames.py which takes video:path-to-video.mp4 and creates a temporary directory which it then populates with one frame per second of that video using ffmpeg - then it returns a list of [llm.Attachment(path="path-to-frame1.jpg"), ...] - it should also support passing video:video.mp4?fps=2 to increase to two frames per second, and if you pass ?timestamps=1 or &timestamps=1 then it should add a text timestamp to the bottom right corner of each image with the mm:ss timestamp of that frame (or hh:mm:ss if more than one hour in) and the filename of the video without the path as well.' -o reasoning_effort high
Any time I use it like that the prompt and response are logged to a local SQLite database.
More on that example here: https://simonwillison.net/2025/May/5/llm-video-frames/#how-i...
It uses OpenRouter for the API layer to simplify use of APIs from multiple providers, though I'm also working on direct integration of model provider API keys—should release it this week.
'Use React, typescript, materialUi, prefer functions over const, don't use unnecessary semicolons, 4 spaces for tabs, build me a UI that looks like this sketch'
And it'll do all that.
For anyone trying to return consistent JSON, check out structured outputs, where you define a JSON schema with required fields and get the same structure back every time.
I have tested it with high success using GPT-4o-mini.
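A rough sketch of what that looks like over plain HTTP (the response_format payload shape is from memory, so double-check it against the current structured outputs docs; note that strict mode wants every property listed in required and additionalProperties set to false):

    import os
    import requests

    schema = {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
        },
        "required": ["title", "sentiment"],
        "additionalProperties": False,
    }

    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": "Summarize this review: ..."}],
            "response_format": {
                "type": "json_schema",
                "json_schema": {"name": "summary", "strict": True, "schema": schema},
            },
        },
    )
    # The message content comes back as a JSON string matching the schema above
    print(resp.json()["choices"][0]["message"]["content"])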
Yes, I also often use the "studio" of each LLM for better results, because in my experience OpenAI "nerfs" models in the ChatGPT UI: models keep forgetting things (probably a limited context length set by OpenAI to reduce costs), the model is generally less chatty (again, probably to reduce their costs), etc. But I've noticed Gemini 2.5 Pro is the same in the studio and in the Gemini app.
> Any modern LLM interface that does not let you explicitly set a system prompt is most likely using their own system prompt which you can’t control: for example, when ChatGPT.com had an issue where...
ChatGPT does have system prompts but Claude doesn't (one of its many, many UI shortcomings which Anthropic never addressed).
That said, I've found system prompts less and less useful with newer models. I can simply preface my own prompt with the instructions and the model follows them very well.
> Specifying specific constraints for the generated text such as “keep it to no more than 30 words” or “never use the word ‘delve’” tends to be more effective in the system prompt than putting them in the user prompt as you would with ChatGPT.com.
I get that LLMs have only a vague idea of how many words 30 words is, but they never do a good job on these tasks for me.
[0] https://softwaremisadventures.com/p/simon-willison-llm-weird...
[1] https://youtu.be/6U_Zk_PZ6Kg?feature=shared&t=56m29 (the exact moment)
[2] https://docs.anthropic.com/en/docs/build-with-claude/prompt-...
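For reference, putting a constraint like that in the system prompt via the API looks roughly like this. I'm writing against the Anthropic Messages API shape from memory (see [2]), and the model alias is just an example:

    import os
    import httpx

    resp = httpx.post(
        "https://api.anthropic.com/v1/messages",
        headers={
            "x-api-key": os.environ["ANTHROPIC_API_KEY"],
            "anthropic-version": "2023-06-01",
        },
        json={
            "model": "claude-3-7-sonnet-latest",  # example alias; use whatever you have access to
            "max_tokens": 256,
            "system": "Keep responses to no more than 30 words. Never use the word 'delve'.",
            "messages": [{"role": "user", "content": "Summarize this article: ..."}],
        },
    )
    print(resp.json()["content"][0]["text"])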
They're pretty long and extensive, and honestly could use some cleaning up and refactoring at this point, but they are being used heavily in production and work quite well, which took a fairly extreme amount of trial-and-error to achieve.
Here's an example: what's the best prompt to use to summarize an article?
That feels like such an obvious thing, and yet I haven't even seen that being well explored.
It's actually a surprisingly deep topic. I like using tricks like "directly quote the sentences that best illustrate the overall themes" and "identify the most surprising ideas", but I'd love to see a thorough breakdown of all the tricks I haven't seen yet.
https://news.ycombinator.com/item?id=43897666
Maybe you should ask him. lol
Honestly true and I’m sick of it.
A very vocal group of people are convinced AI is a scheme by the evil capitalists to make you train your own replacement. The discussion gets very emotional very quickly because they feel personally threatened by the possibility that AI is actually useful.
I read this as framing it as though it were irrational. However, history is littered with examples of capitalists replacing labour with automation and using the productivity gains of new technology to push salaries lower.
Of course people who see this playing out again feel personally threatened. If you aren't feeling personally threatened, you are either part of the wealthy class or you think this time will somehow be different.
You may be thinking "Even if I lose my job to automation, there will be other work to do, like piloting the LLMs", but you should know that the goal is to eventually pay LLM operators peanuts compared to what you currently make in whatever role you do.
Nonsense. We make far, far more than people did in the past entirely because of the productivity gains from automation.
The industrial revolution led to the biggest increase in quality of life in history, not in spite of but because it automated 90% of jobs. Without it we'd all still be subsistence farmers.
"We" were never subsistence farmers, our ancestors were
I'm talking about real changes that have happened in our actual lifetimes.
In our actual lifetimes we have watched wages stagnate for decades; our purchasing power is dramatically lower than our parents'. In order to afford anything even remotely close to the same standard of living our parents had, we have to go into much larger amounts of debt.
We have watched jobs move overseas as automation lowered the skill requirements so that anyone could perform them, and so we sought out the cheapest possible labour to do them.
We have watched wealthy countries shift from higher paying production economies into lower paying service economies
We have watched the wealth gap widen as the rich get richer the poor get poorer and the middle class shrinks. Some of the shrinking middle class moved up, but most moved down
The fact is that automation is disruptive, which is great for markets but bad for people who are relying on consistency. Which is most people
This graph is going up.
https://fred.stlouisfed.org/series/MEPAINUSA672N
The reason you believe otherwise is that people on social media think they're only allowed to say untrue negative things about the economy, because if they ever say anything positive it'd be disrespectful to poor people.
> We have watched wealthy countries shift from higher paying production economies into lower paying service economies
Service work is higher paying, which is why factory workers try to get their children educated enough to do it.
> We have watched the wealth gap widen as the rich get richer the poor get poorer and the middle class shrinks. Some of the shrinking middle class moved up, but most moved down
They mostly moved up. But notice this graph is positive for every group.
https://realtimeinequality.org/?id=wealth&wealthend=03012023...
You think service workers at a fast food restaurant or working the till at Walmart are higher paid than factory workers?
> They mostly moved up. But notice this graph is positive for every group.
The reason it is positive for (almost) every group is because it isn't measuring anything meaningful
Salaries may have nominally gone up but this is clearly not weighing the cost of living into the equation
Fast food workers aren't service-economy workers, they're making burgers back there.
More importantly, factory work destroys your body and email jobs don't, so whether or not it's high earning at the start… it isn't forever.
> Salaries may have nominally gone up but this is clearly not weighing the cost of living into the equation
That's not a salary chart. The income chart I did post is adjusted for cost of living (that's what "real" means).
Also see https://www.epi.org/blog/wage-growth-since-1979-has-not-been...
Also, “this time it’s different” depends on the framing. A cynical programmer who has seen new programming tools hyped too many times would make a different argument: At the beginning of the dot-com era, you could get a job writing HTML pages. That’s been automated away, so you need more skill now. It hasn’t resulted in fewer software-engineering jobs so far.
But that’s not entirely convincing either. Predicting the future is difficult. Sometimes the future is different. Making someone else’s scenario sound foolish won’t actually rule anything out.
I'm earning a salary so I can have a nice life
Anything that threatens my ability to earn a salary so I can have a nice life is absolutely my fucking enemy
I have no interest in automating myself out of existence, and I am deeply resentful of the fact that so many people are gleefully trying to do so.
And of course it will only become more powerful. It's a dangerous game.
These are not mutually exclusive. LLMs will train people's replacements while those same people pay for the privilege of training those replacements. LLMs also let me auto-complete a huge volume of boilerplate that would otherwise take me several hours. They also help people get past writer's block and generate a first draft of a prototype/MVP/PoC quickly without wasting long hours bikeshedding. They also helped my previously super-confident cousin, who blamed me for killing his dreams of the next Airbnb for dogs, Uber for groceries, and Instagram for cats by selfishly hoarding my privileges and knowledge, to finally build those ideas and kill his own dreams himself; he's definitely been ignoring/avoiding me these days.
LLMs are the same as knives: crimes will happen with them, but they're also necessary in the kitchen and in industry.
Although pandas is the standard for manipulating tabular data in Python and has been around since 2008, I've been using the relatively new polars library exclusively, and I've noticed that LLMs tend to hallucinate polars functions as if they were pandas functions, which requires documentation deep dives to confirm and gets annoying.
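To give a concrete (made-up) flavour of the mix-up: the hallucinations are usually pandas idioms applied to a polars DataFrame, which either don't exist or were dropped in newer polars versions.

    import polars as pl

    df = pl.DataFrame({"city": ["a", "a", "b"], "temp": [1.0, 2.0, 5.0]})

    # pandas-style calls an LLM tends to produce -- neither works on a polars DataFrame:
    #   df.groupby("city")["temp"].mean()
    #   df["temp_f"] = df["temp"] * 9 / 5 + 32

    # polars equivalents:
    means = df.group_by("city").agg(pl.col("temp").mean())
    df = df.with_columns((pl.col("temp") * 9 / 5 + 32).alias("temp_f"))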
The post does later touch on coding agents (Max doesn't use them because "they're distracting", which, as a person who can't even stand autocomplete, is a position I'm sympathetic to), but still: coding agents solve the core problem he just described. "Raw" LLMs set loose on coding tasks throwing code onto a blank page hallucinate stuff. But agenty LLM configurations aren't just the LLM; they're also code that structures the LLM interactions. When the LLM behind a coding agent hallucinates a function, the program doesn't compile, the agent notices it, and the LLM iterates. You don't even notice it's happening unless you're watching very carefully.
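The loop itself isn't magic. Here's a stripped-down sketch of what a coding agent does on your behalf, with a hypothetical ask_llm() standing in for the actual model call and ruff used as one example of a checker; real agents layer tool use, diffs, and test runs on top of the same idea.

    import subprocess

    def ask_llm(prompt: str) -> str:
        # Hypothetical stand-in for whatever model/API you actually use
        raise NotImplementedError

    def write_and_check(task: str, path: str, max_iters: int = 5) -> bool:
        prompt = task
        for _ in range(max_iters):
            with open(path, "w") as f:
                f.write(ask_llm(prompt))
            # Hallucinated imports, unknown names, and syntax errors surface here,
            # not in front of the user.
            check = subprocess.run(["ruff", "check", path], capture_output=True, text=True)
            if check.returncode == 0:
                return True
            # Feed the diagnostics back and let the model iterate
            prompt = f"{task}\n\nYour last attempt failed these checks:\n{check.stdout}\nFix it."
        return False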
I see people getting LLMs to generate code in isolation and like pasting it into a text editor and trying it, and then getting frustrated, and it's like, that's not how you're supposed to be doing it anymore. That's 2024 praxis.
Easy to script/autogenerate code and build out pipelines this way.
Mostly joke, but also not joke: https://news.smol.ai/
Similarly, Claude 3.5 was stuck on TensorRT 8, and not even pointing it at the documentation for the updated TensorRT 10 APIs via RAG could ever get it to correctly use the new APIs (not that they were very complex: bind tensors, execute, retrieve results). The whole concept of the self-reinforcing agent loop is more of a fantasy. I think someone else likened it to a lawnmower that will rampage over your flower bed at the first hiccup.
I don't know about "self-reinforcing". I'm just saying: coding agents compile and lint the code they're running, and when they hallucinate interfaces, they notice. The same way any developer who has ever used ChatGPT knows that you can paste most errors into the web page and it will often (maybe even usually) come up with an apposite fix. I don't understand how anybody expects to convince LLM users this doesn't work; it obviously does work.
This is really one of the hugest divides I've seen in the discourse about this: anti-LLM people saying very obviously untrue things, which is uh, kind of hilarious in a meta way.
https://bsky.app/profile/caseynewton.bsky.social/post/3lo4td... is an instance of this from a few days ago.
I am still trying to sort out why experiences are so divergent. I've had much more positive LLM experiences while coding than many other people seem to, even as someone who's deeply skeptical of what's being promised about them. I don't know how to reconcile the two.
That might sound snarky, but it probably works out for people in 99% of cases. AI and LLMs are advancing at a pace that is so different from any other technology that people aren't yet trained to re-evaluate their assumptions at the high rate necessary to form accurate new opinions. There are too many tools coming (and going, to be fair).
HN (and certain parts of other social media) is a bubble of early adopters. We're on the front lines seeing the war in realtime and shaking our heads at what's being reported in the papers back home.
But at the same time, the pace of advancement is very fast, and so not having recently re-evaluated things is significantly more likely while also being more charitable, I think.
If someone is embracing uncertainty or expending the time/energy/money to reevaluate then they don't post such confidently wrong ideas on social media.
As with many topics, I feel like you can divide people into a couple of groups. You have people who try it, have their mind blown by it, and so over-hype it. Then the polar opposite: people who are overly dismissive and cement themselves into a really defensive position. Both groups are relatively annoying, inaccurate, and too extremist. Then another group might try it out, find some value, integrate it somewhat, maybe get a little productivity boost, and move on with their day. Then a bunch of other groupings in between.
Problem is that the people in the middle tend to not make a lot of noise about it, and the extremists (on both ends) tend to be very vocal about their preference, in their ways. So you end up perceiving something as very polarizing. There are many accurate and true drawbacks with LLMs as well, but it also ends up poisoning the entire concept/conversation/ecosystem for some people, and they tend to be noisy as well.
Then the whole experience depends a lot on your setup, how you use it, what you expect, what you've learned, and so much more, and some folks are very quick to judge a whole ecosystem without giving parts of it an honest try. It took me a long time to try Aider, Cursor, and others, and even now, after I've tried them out, I feel like there are probably better ways to use this new category of tooling we have available.
In the end I think reality is a bit less black/white; for most folks, the common sentiment I see and hear is that LLMs are probably not hellfire ending humanity, nor are they digital Jesus coming to save us all.
This is probably a big chunk of it. I was pretty anti-LLM until recently, when I joked that I wanted to become an informed hater, so I spent some more time with things. It's put me significantly more in the middle than either extremely pro or extremely anti. It's also hard to talk about anything that's not purely anti in the spaces I seemingly run in, so that also contributes to my relative quiet about it. I'm sure others are in a similar boat.
> for most folks, the common sentiment I see and hear is that LLMs are probably not hellfire ending humanity, nor are they digital Jesus coming to save us all.
Especially around non-programmers, this is the vibe I get as well. They also tend to see the inaccuracies as much less significant than programmers seem to, that is, they assume they're checking the output already, or see it as a starting point, or that humans also make mistakes, and so don't get so immediately "this is useless" about it.
Bro I've been using LLMs for search since before it even had search capabilities...
"LLMs not being for search" has been an argument from the naysayers for a while now, but very often when I use an LLM I am looking for the answer to something - if that isn't [information] search, then what is?
Whether they hallucinate or outright bullshit sometimes is immaterial. For many information retrieval tasks they are infinitely better than Google and have been since GPT3.
> https://bsky.app/profile/caseynewton.bsky.social/post/3lo4td... is an instance of this from a few days ago.
Not sure why this is so surprising? ChatGPT search was only released in November last year, was a different mode, and it sucked. Search in o3 and o4-mini came out like three weeks ago. Otherwise you were using completely different products from Perplexity or Kagi, which aren't widespread yet.
Casey Newton even half acknowledges that timing ("But it has had integrated web search since last year"...even while in the next comment criticising criticisms using the things "you half-remember from when ChatGPT launched in 2022").
If you give the original poster the benefit of the doubt, you can sort of see what they're saying, too. An LLM, on its own, is not a search engine and cannot scan the web for information. The information encoded in it might be OK, but it is not complete, and does not encompass the full body of the published human thought it was trained on. Trusting an offline LLM with an informational search is sometimes a really bad idea ("who are all the presidents that did X").
The fact that they're incorrect when they say that LLMs can't trigger search doesn't seem that "hilarious" to me, at least. The OP post maybe should have been less strident, but it also seems like a really bad idea to gatekeep anybody wanting to weigh in on something if their knowledge of product roadmaps is more than six months out of date (which I guarantee is all of us for at least some subject we are invested in).
It is entirely possible that I simply got involved at a particular moment that was crazy lucky: it's only been a couple of weeks. I don't closely keep up with when things are released, I had just asked ChatGPT something where it did a web search, and then immediately read a "it cannot do search" claim right after.
> An LLM, on its own, is not a search engine and can not scan the web for information.
In a narrow sense, this is true, but that's not the claim: the claim is "You cannot use it as a search engine, or as a substitute for searching." That is pretty demonstrably incorrect, given that many people use it as such.
> Trusting an offline LLM with an informational search is sometimes a really bad idea ("who are all the presidents that did X").
I fully agree with this, but it's also the case with search engines. They also do not always "encompass the full body of the published human thought", or always provide answers that are comprehensible.
I recently was looking for examples of accomplishing things with a certain software architecture. I did a bunch of searches, which led me to a bunch of StackOverflow and blog posts. Virtually all of those posts gave vague examples which did not really answer my question with anything other than platitudes. I decided to ask ChatGPT about it instead. It was able to not only answer my question in depth, but provide specific examples, tailored to my questions, which the previous hours of reading search results had not afforded me. I was further able to interrogate it about various tradeoffs. It was legitimately more useful than a search engine.
Of course, sometimes it is not that good, and a web search wins. That's fine too. But suggesting that it's never useful for a task is just contrary to my actual experience.
> The fact that they're incorrect when they say that LLM's can't trigger search doesn't seem that "hilarious" to me, at least.
It's not them, it's the overall state of the discourse. I find it ironic that the fallibility of LLMs is used to suggest they're worthless compared to a human, when humans are also fallible. OP did not directly say this, but others often do, and it's the combination that's amusing to me.
It's also frustrating to me, because it feels impossible to have reasonable discussions about this topic. It's full of enthusiastic cheerleaders that misrepresent what these things can do, and enthusiastic haters that misrepresent what these things can do. My own feelings are all over the map here, but it feels impossible to have reasonable discussions about it due to the polarization, and I find that frustrating.
tptacek shifted the goalposts from "correct a hallucination" to "solve a copy-pasted error" (very different things!) and just a comment later there's someone assassinating me as an "anti-LLM person" saying "very obviously untrue things", "kind of hilarious". And you call yourself "charitable". It's a joke.
> there's someone assassinating me as an "anti-LLM person"
Is this not true? That's the vibe the comment gives off. I'm happy to not say that in the future if that's not correct, and if so, additionally, I apologize.
I myself was pretty anti-LLM until the last month or so. My opinions have shifted recently, and I've been trying to sort through my feelings about it. I'm not entirely enthusiastically pro, and have some pretty big reservations myself, but I'm more in the middle than where I was previously, which was firmly anti.
> "very obviously untrue things"
At the time I saw the post, I had just tabbed away from a ChatGPT session where it had relied on searching the web for some info, so the contrast was very stark.
> "kind of hilarious"
I do think it's kind of funny when people say that LLMs occasionally hallucinate things and are therefore worthless, while themselves making false claims about them for the purpose of suggesting we shouldn't use them. You didn't directly say this in your post, only handwaved towards it, but I'm talking about the discourse in general, not you specifically.
> And you call yourself "charitable"
I am trying to be charitable. A lot of people reached for some variant of "this person is stupid," and I do not think that's the case, or the good way to understand what people mean when they say things. A mistake is a mistake. I am actively not trying to simply dismiss arguments on either side of here, but take them seriously.
I googled "how do I use ai with VS: Code" and it pointed me at Cline. I've then swapped between their various backends, and just played around with it. I'm still far too new to this to have strong options about LLM/agent pairs, or even largely between which LLMs, other than "the free ChatGPT agent was far worse than the $20/month one at the task I threw it at." As in, choosing worse algorithms that are less idiomatic for the exact same task.
Finally, if it really were true that some people know the special sauce of how to use LLMs to make a massive difference in productivity while many people don't, then you could make millions or tens of millions per year as a consultant training everyone at big companies. In other words, if you really believed what you were saying, you should pick up the money on the ground.
This might be a good explanation for the disconnect!
> I would point that LLM boosters have been saying the same thing
I certainly 100% agree that lots of LLM boosters are way over-selling what they can accomplish as well.
> In other words if you really believed what you were saying you should pick up the money on the ground.
I mean, I'm doing that in the sense that I am using them. I'm also not saying that I "know the special sauce of how to use LLMs to make a massive difference in productivity," but what I will say is: my productivity is genuinely higher with LLM assistance than without. I don't necessarily believe that means it's replicable. One of the things I'm curious about is: "is it something special about my setup, or what I'm doing, or the technologies I'm using, or anything else, that makes me have a good time with this stuff when other smart people seem to only have a bad time?" Because I don't think the detractors are just lying. But there is a clear disconnect, and I don't know why.
The tools will get better, but from what I see happening with people who are good at using them (and from my own code, even with my degraded LLM usage), we already have an existence proof of the value of the tools.
This reminds me of a scene from the recent animated movie "Wallace and Gromit: Vengeance Most Fowl" where Wallace actually uses a robot (Norbot) to do gardening tasks, and it rampages over Gromit's flower bed.
I’m getting massive productivity gains with Cursor and Gemini 2.5 or Claude 3.7.
One-shotting whole features into my rust codebase.
And when that happens I review the code and if it is bad then I "git revert". And if it is 90% of the way there I fix it up and move on.
The question shouldn't be "are they infallible tools of perfection?" It should be "do I get value equal to or greater than the time/money I spend?" And if you use git appropriately, you lose at most five minutes on an agent looping. And that happens a couple of times a week.
And be honest with yourself: isn't getting stuck in a loop fighting a compiler, type checker, or linter something you also experienced in your pre-LLM days?
The issue you are addressing refers specifically to Python, which is not compiled... Are you referring to this workflow in another language, or by "compile" do you mean something else, such as using static checkers or tests?
Also, what tooling do you use to implement this workflow? Cursor, aider, something else?
In the context we are talking about (hallucinating polars methods), if I'm not mistaken the compilation step won't catch that; Python will actually throw the error at runtime, post-compilation.
So my question still stands on what OP means by "won't compile".
Python is AOT compiled to bytecode, but if a compiled version of a module is not available when needed it will be compiled and the compiled version saved for next use. In the normal usage pattern, this is mostly invisible to the user except in first vs. subsequent run startup speed, unless you check the file system and see all the .pyc compilation artifacts.
You can do AOT compilation to bytecode outside of a compile-as-needed-then-execute cycle, but there is rarely a good reason to do so explicitly for the average user (the main use case is on package installation, but that's usually handled by package manager settings).
But, relevant to the specific issue here, (edit: calling) a hallucinated function would lead to a runtime failure not a compilation failure, since function calls aren't resolved at compile time, but by lookup by name at runtime.
(Edit: A sibling comment points out that importing a hallucinated function would cause a compilation failure, and that's a good point.)
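A quick way to see that distinction concretely, using the earlier pandas/polars example (method lookup happens by name at runtime, so nothing fails until the call actually runs; the exact error message may vary):

    import polars as pl

    df = pl.DataFrame({"a": [1, 2, 3]})

    # Python happily compiles this to bytecode; names aren't resolved yet
    code = compile("df.iloc[0]", "<llm-output>", "eval")  # .iloc is pandas, not polars

    # The hallucination only surfaces when the code actually runs
    try:
        eval(code)
    except AttributeError as err:
        print(err)  # something like: 'DataFrame' object has no attribute 'iloc'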
It's not a compilation error but it does feel like one, somewhat. It happens at more or less the same time.
> On paper, coding agents should be able to address my complaints with LLM-generated code reliability since it inherently double-checks itself and it’s able to incorporate the context of an entire code project. However, I have also heard the horror stories of people spending hundreds of dollars by accident and not get anything that solves their coding problems. There’s a fine line between experimenting with code generation and gambling with code generation.
Specifically, I was researching a lesser-known Kafka-MQTT connector: https://docs.lenses.io/latest/connectors/kafka-connectors/si..., and o1 was hallucinating the configuration needed to support dynamic topics. The docs said one thing, and I even mentioned to o1 that the docs contradicted it, but it would stick to its guns. If I mentioned that the code wouldn't compile, it would start suggesting very implausible scenarios -- did you spell this correctly? Responses like that indicate you've reached a dead end. I'm curious how/if the "structured LLM interactions" you mention overcome this.
It's nice when the LLM outputs bullshit, which is frequent.
A stage every developer goes through early in their development.
It sucks, but the trick is to always restart the conversation/chat with a new message. I never go beyond one reply, and I also copy-paste a bunch. I got tired of copy-pasting, so I wrote something like a prompting manager (https://github.com/victorb/prompta) to make it easier and to avoid having to neatly format code blocks and so on.
Basically, make one message; if the reply is wrong, iterate on the prompt itself and start fresh, always. Don't try to correct it by adding another message; update the initial prompt to make it clearer and steer it more.
But I've noticed that every model degrades really quickly past the initial reply, no matter the length of each individual message. The companies keep increasing the theoretical and practical context limits, but quality degrades much faster than that, even well within the limits, and they don't seem to be trying to address it (nor do they have a way of measuring it).
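In code terms the habit is roughly this, with ask_llm as the same kind of hypothetical one-shot helper as in the earlier sketches: rebuild a single fresh message rather than appending corrections to a growing conversation.

    def ask_llm(messages: list[dict]) -> str:
        # Hypothetical single-turn model call
        raise NotImplementedError

    def retry_with_fresh_prompt(base_prompt: str, corrections: list[str]) -> str:
        # Fold every correction back into one new first message instead of
        # adding "no, actually..." turns to an ever-longer conversation
        prompt = base_prompt + "".join(f"\n- {c}" for c in corrections)
        return ask_llm([{"role": "user", "content": prompt}])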
I have to chuckle at that because it reminds me of a typical response on technical forums long before LLMs were invented.
Maybe the LLM has actually learned from those responses and is imitating them.
The more prominent and widely used a language/library/framework, and the more "common" what you are attempting, the more accurate LLMs tend to be. The more you deviate from mainstream paths, the more you will hit such problems.
Which is why I find them most useful for helping me build things when I am very familiar with the subject matter, because at that point I can quickly spot misconceptions, errors, bugs, etc.
That's when they hit the sweet spot of being a productivity tool, really improving the speed with which I write code (and sometimes improving the quality of what I write, by incorporating good practices I was unaware of).
One very interesting variant of this: I've been experimenting with LLMs in a react-router based project. There's an interesting bit of development history where there was another project called Remix, and later versions of react-router effectively ate it; that is, as of December of last year, react-router 7 is effectively also Remix v3: https://remix.run/blog/merging-remix-and-react-router
Sometimes, the LLM will be like "oh, I didn't realize you were using remix" and start importing from it, when I in fact want the same imports, but from react-router.
All of this happened so recently, it doesn't surprise me that it's a bit wonky at this, but it's also kind of amusing.
For example, I don't like ORMs. There are reasons which aren't super important but I tend to prefer SQL directly or a simple query builder pattern. But I did a chain of messages with LLMs asking which would be better for LLM based development. The LLM made a compelling case as to why an ORM with a schema that generated a typed client would be better if I expected LLM coding agents to write a significant amount of the business logic that accessed the DB.
My dislike of ORMs is something I hold lightly. If I was writing 100% of the code myself then I would have breezed past that decision. But with the agentic code assistants as my partners, I can make decisions that make their job easier from their point of view.
I think if you are going to claim you have an opinion based on experience you should probably, at the least, experience the thing you are trying to state your opinion on. It's probably not enough to imagine the experience you would have and then go with that.
Part of the reason I've been blogging about LLMs for so long is that a lot of it is counterintuitive (which I find interesting!) and there's a lot of misinformation and suboptimal workflows that results from it.
One example: “normal-person frontends” immediately makes the statement a judgement about people. You could have said regular, typical, or normal instead of “normal-person”.
Saying your coworkers often come to you to fix problems and your solutions almost always work can come off as saying you’re more intelligent than your coworkers.
The only context your readers have are the words you write. This makes communication a damned nuisance because nobody knows who you are and they only know about you from what they read.
Most experienced LLM users already know about temperature controls and API access - that's not some secret knowledge. Many use both the public vanilla frontends and specialized interfaces (various HF workflows, custom setups, sillytavern, oobabooga (rip), ollama, lmstudio, etc) depending on the task.
Your dismissal of LLMs for writing comes across as someone who scratched the surface and gave up. There's an entire ecosystem of techniques for effectively using LLMs to assist writing without replacing it - from ideation to restructuring to getting unstuck on specific sections.
Throughout the article, you seem to dismiss tools and approaches after only minimal exploration. The depth and nuance that would be evident to anyone who's been integrating these tools into their workflow for the past couple years is missing.
Being honest about your experiences is valuable, but framing basic observations as contrarian insights isn't counterintuitive - it's just incomplete.
Why (rip) here?
This is unfortunate, though I don't blame you. Tech shouldn't be about blind faith in any particular orthodoxy.
It feels weird to write something positive here...given the context...but this is a great idea. ;)
The other thing I find LLMs most useful for is work that is simply unbearably tedious. Literature reviews are the perfect example of this - Sure, I could go read 30-50 journal articles, some of which are relevant, and form an opinion. But my confidence level in letting the AI do it in 90 seconds is reasonable-ish (~60%+) and 60% confidence in 90 seconds is infinitely better than 0% confidence because I just didn't bother.
A lot of the other highly hyped uses for LLMs I personally don't find that compelling - my favorite uses are mostly like a notebook that actually talks back, like the Young Lady's Illustrated Primer from Diamond Age.
So you got the 30 to 50 articles summarized by the LLM; now how do you know which 60% you can trust and what's hallucinated without reading them? It's hard for it to be usable at all unless you already know what is real and what is not.
That's changed for me in the past couple of months. I've been using the ChatGPT interface to o3 and o4-mini for a bunch of code questions against more recent libraries and finding that they're surprisingly good at using their search tool to look up new details. Best version of that so far:
"This code needs to be upgraded to the new recommended JavaScript library from Google. Figure out what that is and then look up enough documentation to port this code to it."
This actually worked! https://simonwillison.net/2025/Apr/21/ai-assisted-search/#la...
The other trick I've been using a lot is pasting the documentation or even the entire codebase of a new library directly into a long-context model as part of my prompt. This works great for any library under about 50,000 tokens total; more than that and you usually have to manually select the most relevant pieces, though Gemini 2.5 Pro can crunch through hundreds of thousands of tokens pretty well without getting distracted.
Here's an example of that from yesterday: https://simonwillison.net/2025/May/5/llm-video-frames/#how-i...
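The mechanical part of that trick is trivial; here's a sketch that packs a library's source into one prompt under a rough token budget (the 4-characters-per-token figure is a crude heuristic, not a real tokenizer):

    from pathlib import Path

    def pack_repo(root: str, budget_tokens: int = 50_000) -> str:
        chunks, used = [], 0
        for path in sorted(Path(root).rglob("*.py")):
            text = path.read_text(errors="ignore")
            tokens = len(text) // 4  # rough estimate; use tiktoken if you need accuracy
            if used + tokens > budget_tokens:
                break
            chunks.append(f"### {path}\n{text}")
            used += tokens
        return "\n\n".join(chunks)

    # prompt = pack_repo("path/to/library") + "\n\nUsing only the API above, write ..."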
So it is capable of integrating new API usage; it just isn't part of the default "memory" of the LLM. Given how quickly JS libraries tend to change (even on the API side), that isn't ideal. And given that the typical JS server project has dozens of libs, including the most recent documentation for each is not really feasible. So for now, I am just looking out for runtime deprecation errors.
But I give the LLM some slack here, because even if I were programming myself, using a library I've used in the past, I'd likely make the same mistake.
How do you do this? Do you have to be on a paid plan for this?
This is independent of ChatGPT+. You do need to have a credit card attached but you only pay for your usage.
What resonated most was the distinction between knowing when to force the square peg through the round hole vs. when precision matters. I've found LLMs incredibly useful for generating regex (who hasn't?) and solving specific coding problems with unusual constraints, but nearly useless for my data visualization work.
The part about using Claude to generate simulated HN criticism of drafts is brilliant - getting perspective without the usual "this is amazing!" LLM nonsense. That's the kind of creative tool use that actually leverages what these models are good at.
I'm skeptical about the author's optimism regarding open-source models though. While Qwen3 and DeepSeek are impressive, the infrastructure costs for running these at scale remain prohibitive for most use cases. The economics still don't work.
What's refreshing is how the author avoids both the "AGI will replace us all" hysteria and the "LLMs are useless toys" dismissiveness. They're just tools - sometimes useful, sometimes not, always imperfect.
Over a few years, we went from literally impossible to being able to run a 72B model locally on a laptop. Give it 5-10 years and we might not need any infrastructure at all, with everything served locally by switchable (and differently sized) open-source models.
>Baited into clicking
>Article about generative LLMs
>It's a buzzfeed employee
llm -m claude-3.7-sonnet "prompt"
llm logs -c | pbcopy
Then paste into a Gist. Gets me things like this: https://gist.github.com/simonw/0a5337d1de7f77b36d488fdd7651b...
I then feed this into an LLM with the following prompt:
It's basically Grammarly on steroids and works very well.