what am I missing here?
edit: lol I "love" that I got downvoted for asking a simple question that might have an open answer. "be curious" says the rules. stay classy HN
Hype. There's nothing wrong with using, e.g., full-text search for RAG.
But realistically, lots of RAG systems have LLM calls interleaved for various reasons, so what they probably mean is not doing the usual chunking + embeddings thing.
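To make that concrete, here's a minimal sketch of full-text-search RAG using SQLite's built-in FTS5 (the corpus and query are invented for illustration):

    import sqlite3

    # Tiny corpus indexed with SQLite's built-in FTS5 full-text engine.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE docs USING fts5(content)")
    db.executemany(
        "INSERT INTO docs (content) VALUES (?)",
        [("The retry logic lives in client.py",),
         ("Deployment is handled by the release workflow",)],
    )

    def retrieve(query: str, k: int = 3) -> list[str]:
        # BM25-ranked keyword search in place of chunking + embeddings.
        rows = db.execute(
            "SELECT content FROM docs WHERE docs MATCH ? ORDER BY rank LIMIT ?",
            (query, k),
        ).fetchall()
        return [r[0] for r in rows]

    # The hits get pasted into the prompt as context, same as with embeddings.
    print(retrieve("retry"))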
I really like a lot of what Google produces, but they can't seem to keep products alive, and they can be pretty ham-fisted, both with corporate control (Chrome and corrupt practices) and censorship.
Nothing in the world is simply outright garbage. Even the seemingly worst products exist for a reason and are used for a variety of use cases.
So take a step back and reevaluate whether your reply could have been better. Because as it stands, it simply "just sucks".
For the command line tools (Claude Code vs. Gemini CLI)? It isn't even close. Gemini CLI was useless; Claude Code was mostly just slow.
I think Claude is much more predictable and follows instructions better; the todo list it manages seems very helpful in this respect.
My tactic is to work with Gemini to build a dense summary of the project and a high-level plan of action. I then take that to GPT-5, have it try to improve the plan, and convert it into a hyper-detailed workflow XML document laying out every step needed to implement the plan, which I finally hand to Claude.
This avoids pretty much all of Claude's unplanned bumbling.
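The workflow document itself is nothing fancy; here's a trimmed, hypothetical example (the tag names are my own invention, not any standard):

    <workflow project="invoice-parser">
      <step id="1">
        <goal>Add a Money value type; never store amounts as floats</goal>
        <files>src/models.py</files>
        <done-when>mypy and existing tests pass</done-when>
      </step>
      <step id="2" depends-on="1">
        <goal>Migrate InvoiceLine.total to Money</goal>
        <files>src/models.py, src/parser.py</files>
        <done-when>new unit test for rounding passes</done-when>
      </step>
    </workflow>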
That's been my experience, anyway. Maybe it hates me? I sure hate it.
It’s not consistent, though. I haven’t figured out what they are but it feels like there are circumstances where it’s more prone to doing ugly hacky things.
Either I'm worse than them at programming, to the point that I find an LLM useful and they don't, or they don't know how to use LLMs for coding.
I guess most people are not paying and therefore can't use the project space (one of the best features), which unleashes its full magic.
Even though I'm currently without a job, I'm still paying, because it helps me.
(Not disagreeing, but most of these comments -- on both sides -- are pretty vague.)
And you start from scratch all the time, so you can generate all the documentation before you ever start to generate code. And when the LLM slop becomes overwhelming, you just drop it and go check the next idea.
If people are getting faster responses than this regularly, it could account for a large amount of the difference in experiences.
Despite the persistent memes here and elsewhere, it doesn't depend very much on the particular tool you use (with the exception of model choice), how you hold it, or your experience prompting (beyond a bare minimum of competence). People who jump into any conversation with "use tool X" or "you just don't understand how to prompt" are the noise floor of any conversation about AI-assisted coding. Folks might as well be talking about Santeria.
Even for projects that I initiate with LLM support, I find that the usefulness of the tool declines quickly as the codebase increases in size. The iron law of the context window rules everything.
Edit: one thing I'll add, which I only recently realized exists (perhaps stupidly) is that there is a population of people who are willing to prompt expensive LLMs dozens of times to get a single working output. This approach seems to me to be roughly equivalent to pulling the lever on a slot machine, or blindly copy-pasting from Stack Overflow, and is not what I am talking about. I am talking about the tradeoffs involved in using LLMs as an assistant for human-guided programming.
(Though now that I think of it, I might start interrupting people with “SUMMARIZING CONVERSATION HISTORY!” whenever they begin to bore me. Then I can change the subject.)
There are various hacks these tools take to cram more crap into a fixed-size bucket, but it’s still fundamentally different than how a person thinks.
Do you understand what you just said? A file is, by definition, a way to organize data in a computer's memory. When you write instructions for an LLM, they persistently modify your prompts, making the LLM "remember" certain things like coding conventions or explanations of your architectural choices.
> particularly if I have to do it
You have to communicate with the LLM about the code. You either do it persistently (it must remember) or contextually (it should know it only within the current session). So the word "particularly" is out of place here. You choose one way or the other instead of being able to just say that some information is important or unimportant long-term. This communication would happen with humans too. LLMs have a different, more explicit interface for it (giving the perception of more effort, when it is in fact the same; and let's not forget that an LLM is able to decide on its own whether to remember something or not).
> and in any case, it consumes context
So what? Generalization is an effective way to compress information. Because of it, persistent instructions consume only a tiny fraction of the context, but they reduce the need for the LLM to do a full analysis of your code.
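For example, a few generalized lines in a persistent memory file stand in for hundreds of lines the model would otherwise have to re-derive from the code (contents invented for illustration):

    # CLAUDE.md (project memory)
    - All database access goes through repository classes in app/repos/;
      never call the ORM directly from handlers.
    - Errors are returned as Result values, not raised, except at the API boundary.
    - Tests use the factory fixtures in tests/factories.py; don't hand-build models.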
> but it’s still fundamentally different than how a person thinks.
Again, so what? Nobody can keep an entire code base in short-term memory. Having that ability should not be the expectation, nor should lacking it be considered a major disadvantage. Yes, we use our "context windows" differently in the thinking process. What matters is what information we pack in there and what we make of it.
I have yet to see "forgets everything" be a limiting factor. In fact, when using Aider, I aggressively make it forget everything several times per session.
To me, it's a feature, not a drawback.
I've certainly had coworkers I've had to tell: "Look, will you forget about X? That use case, while it looks similar, is actually quite different in its assumptions. Stop invoking your experience there!"
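With Aider that reset is literally one command:

    /clear   # wipe the chat history; files added to the session stay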
A lot of programmers work on maintaining huge monolithic codebases, built on top of 10-year-old tech with obscure proprietary dependencies. Usually they don't have most of the code to begin with, and the APIs are often not well documented.
I know, because I recently learned jj, with a lot of struggle.
If a human struggles learning it, I wouldn't expect LLMs to be much better.
If I only ever wrote small Python scripts, did small to medium JavaScript front end or full stack websites, or a number of other generic tasks where LLMs are well trained I’d probably have a different opinion.
Drop into one of my non-generic Rust codebases that does something complex, and I could spend hours trying to keep the LLM moving in the right direction and away from all of the dead ends and thought loops.
It really depends on what you’re using them for.
That said, there are a lot of commenters who haven’t spent more than a few hours playing with LLMs and see every LLM misstep as confirmation of their preconceived ideas that they’re entirely useless.
First of all, keep in mind that research has shown that people generally overestimate the productivity gains of LLM coding assistance. Even when using a coding assistant makes them less productive, they feel like they are more productive.
Second, yeah, experience matters, both with programming and with LLM coding assistants. The better you are, the less helpful the coding assistant will be; it can take less work to just write what you want than to convince an LLM to do it.
Third, some people are more sensitive to the kinds of errors or style that LLMs tend to produce. I frequently can't stand the output of LLMs, even if it technically works; it doesn't live up to my personal standards.
I've noticed the stronger my opinions are about how code should be written or structured, the less productive LLMs feel to me. Then I'm just fighting them at every step to do things "my way."
If I don't really have an opinion about what's going on, LLMs churning out hundreds of lines of mostly-working code is a huge boon. After all, I'd rather not spend the energy thinking through code I don't care about.
I don’t think this research is fully baked. I don’t see a story in these results that aligns with my experience and makes me think “yeah, that actually is what I’m doing”. I get that at this point I’m supposed to go “the effect is so subtle that even I don’t notice it!” But experience tells me that’s not normally how this kind of thing works.
Perhaps we’re still figuring out how to describe the positive effects of these tools or what axes we should really be measuring on, but the idea that there’s some sort of placebo effect going on here doesn’t pass muster.
Every time I tried LLMs, I had the feeling of talking with an ignoramus trying to sound VERY CLEVER: terrible mistakes on every line, surrounded by punchlines, rocket emojis, and tons of bullshit. (I'm partly kidding.)
Maybe there are situations where LLMs are useful, e.g. if you can properly delimit and isolate your problem; but when you have to write code that is meant to mess with the internals of some piece of software, they don't do well.
It would be nice to hear from both the "happy users" and the "unhappy users" of LLMs about the contexts in which they experimented with them, to be better informed on this question.
But the situation is very different if you’re coding slop in the first place (front end stuff, small repo simple code). The LLMs can churn that slop out at a rapid clip.
If you use GitHub Copilot - which has its own system level prompts - you can hotswap between models, and Claude outperforms OpenAI’s and Google’s models by such a large margin that the others are functionally useless in comparison.
With a subscription plan, Anthropic is highly incentivized to be efficient in their loops beyond just making it a better experience for users.
Error: kill EPERM
    at process.kill (node:internal/process/per_thread:226:13)
    at Ba2 (file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19791)
    at file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19664
    at Array.forEach (<anonymous>)
    at file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19635
    at Array.forEach (<anonymous>)
    at Aa2 (file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19607)
    at file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19538
    at ChildProcess.W (file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:20023)
    at ChildProcess.emit (node:events:519:28) {
  errno: -1,
  code: 'EPERM',
  syscall: 'kill'
}
I'm guessing one of the scripts it runs kills Node.js processes, and that inadvertently kills Claude as well. Or maybe it feels bad that it can't solve my problem and commits suicide. In any case, I wish it would stay alive and help me lol.
For those who’ve built coding agents: do you think LLMs are better suited for generating structured config vs. raw code?
My theory is that agents producing valid YAML/JSON schemas could be more reliable than code generation. The output is constrained, easier to validate, and when it breaks, you can actually debug it.
I keep seeing people create apps with vibe-coding tools but then get stuck when they need to modify the generated code.
Curious if others think config-based approaches are more practical for AI-assisted development.
Then add a grader step to your agentic loop that is triggered after the files are modified. Give feedback to the model if there are any errors and it will fix them.
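A minimal grader for the YAML case might look like this (the surrounding agent-loop wiring is assumed, not shown):

    import yaml  # pip install pyyaml

    def grade_config(path: str) -> str | None:
        """Return an error message to feed back to the model, or None if valid."""
        try:
            with open(path) as f:
                yaml.safe_load(f)
        except yaml.YAMLError as e:
            return f"{path} is not valid YAML: {e}"
        return None

    # In the agent loop, after the model writes files:
    #   error = grade_config("deploy.yaml")
    #   if error:
    #       messages.append({"role": "user", "content": error})  # model fixes, retries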
Config files should be written in mature programming languages, not YAML/JSON.
Might sound crazy, but we built full web apps in just YAML. We've been doing this for about 5 years now, and it helps us scale to build many web apps, fast, that are easy to maintain. We at Resonancy[1] have found many benefits in doing so. I should write more about this.
[1] - https://resonancy.io
This is essential to productivity for humans and LLMs alike. The more reliable your edit/test loop, the better your results will be. It doesn't matter if it's compiling code, validating yaml, or anything else.
To your broader question: people have been trying to crack the low-code nut for ages. I don't think it's solvable. Either you make something overly restrictive, or you invent a very bad programming language, which is doomed to fail because professional coders will never use it.
It dumps out a JSON file as well as a very nicely formatted HTML file that shows you every single tool and all the prompts that were used for a session.
You can see the system prompts too.
It's all how the base model has been trained to break tasks into discrete steps and work through them patiently, with some robustness to failure cases.
That repository does not contain the code. It's just used for the issue tracker and some example hooks.
[1]: https://github.com/badlogic/lemmy/tree/main/apps/claude-brid...
I know, thus the :trollface:
> Happen to know where I can find a fork?
I don't know where you can find a fork, but even if there is a fork somewhere that's still alive, which is unlikely, it would be for a really old version of Claude Code. You would probably be better off reverse engineering the minified JavaScript or whatever that ships with the latest Claude Code.
I had similar problems until I saw the advice: "Don't say what it shouldn't do; focus on what it should."
i.e. make sure when it reaches for the 'thing', it has the alternative in context.
Haven't had those problems since then.
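Concretely, the difference is something like this (wording invented for illustration):

    Instead of:  "Do not use the requests library."
    Write:       "Use httpx for all HTTP calls; it is already a project dependency."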
I'm in the middle of some refactoring/bug fixing/optimization, but it's constantly running into issues, making half-baked changes, unable to fix regressions, etc. Still trying to figure out how to make it do a better job. Might have to break the work into smaller chunks or something. It's been a pretty frustrating couple of weeks.
If anyone has pointers, I’m all ears!!
Give programming a try, you might like it.
Next…
Why????????????
Why do you want devs to lose cognizance of their own "work" to the point that they have "existential worry"?
Why are people like you trying to drown us all in slop? I bet you could replace your slop pile with a tenth of the lines of clean code, and chances are it'd be less work than you think.
Is it because you're lazy?
Actually, no. When LLMs produce good, working code, it also tends to be efficient (in terms of lines, etc).
May vary with language and domain, though.
It may be the size of the changes you're asking for. I tend to micromanage it. I don't know your algorithm, but if it's complex enough, I may have done 4 separate prompts - one for each step.
Let the LLM do the boring stuff, and focus on writing the fun stuff.
Also, setting up logging in Python is never fun.
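It's pure boilerplate, which is exactly why it's worth delegating; the standard incantation, for reference:

    import logging

    # Root logger at INFO with a timestamped format.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    )
    logger = logging.getLogger(__name__)
    logger.info("logging configured")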
That it authored in the first place?
Also, if the task runs out of context, it will get progressively worse rather than refreshing its own context from time to time.
A few takeaways for me from this:
(1) Long prompts are good; don't forget basic things like explaining in the prompt what the tool is, how it should help the user, etc.
(2) Tool calling is basic af; you need more context (when to use, when not to use, etc.).
(3) Using messages as the memory/state of the system is OK. I've thought about fancier approaches (persisting dataframes, passing variables between steps, etc.), but as context windows grow, messages should be fine.
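A rough sketch of the shape I mean, where call_model and run_tool are stand-ins for whatever API and tooling you actually use:

    # call_model and run_tool are hypothetical stand-ins, not a real SDK.
    messages = [{"role": "system", "content": "You are a data assistant..."}]  # (1) the long prompt

    def step(user_input: str) -> str:
        messages.append({"role": "user", "content": user_input})
        reply = call_model(messages)           # model sees the full history
        while reply.tool_call:                 # (2) tool description says when (not) to use it
            result = run_tool(reply.tool_call)
            messages.append({"role": "tool", "content": result})
            reply = call_model(messages)
        messages.append({"role": "assistant", "content": reply.text})
        return reply.text                      # (3) the message list *is* the memory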
The idea here is: if you have a substantive point, make it thoughtfully. If not, please don't comment until you do.
https://news.ycombinator.com/newsguidelines.html
Is this why HN is so dang pro-AI? The negative comments, even small ones, get moderated away? Explains a lot, TBH.
Edit: bonus points if this gets me banned.