As an aside, it's amusing that we simultaneously have this article and [Generative AI coding tools and agents do not work for me](https://news.ycombinator.com/item?id=44294633) on the front page. LLMs are really dividing the community at the moment, and it's exhausting to keep up with what I (as a dev) should be doing to stay sharp.
That, for me, is the biggest thing I feel about LLMs at the moment: things are moving so quickly, but to what end? I know this industry is constantly evolving, and in some ways that is very exciting, but it also feels like an exponential runaway that demands deliberate attention to the bleeding edge just to stay relevant, and a lot of my time in my day job doesn't facilitate that (something I've identified and acted on: I'm changing companies in a month).
My own two cents on LLMs (as a junior / low-mid-level early-career software engineer): they work best as a better version of Google for any well-explored issue, and being able to talk through problems in a conversational manner has been a game changer. But I do sometimes fear that I am not gaining the same knowledge I would have before LLMs became mainstream. It's a shortcut that, in the long run, I fear will reduce the average problem-solving ability and original / novel thinking of software engineers (whether that is even a requirement in most SWE jobs is up for debate).
You can run local models, but it is like playing with matchbox cars in your backyard and imagining you will be an F1 driver some day.
The big guys have APIs you pay for to do serious work; that's all you need to know.
Try to build the same thing as a startup or a company and you will most likely be out of business in 3 to 6 months, because the big guys will have everything faster and better in no time. The 80% price drop on o3 most likely made running a 3060 Ti more expensive, if you check your energy bill.
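Whether that actually holds depends on your electricity price, your local throughput, and which API tier you compare against. Here is a rough sketch of the arithmetic, with every number being an assumption for illustration rather than a measurement:

```typescript
// Back-of-envelope energy cost of local inference on a 3060 Ti.
// All figures below are illustrative assumptions, not benchmarks.
const gpuWatts = 200;        // assumed sustained draw of the card
const tokensPerSecond = 25;  // assumed throughput of a small quantized local model
const pricePerKwh = 0.30;    // assumed electricity price in USD

const hoursPerMillionTokens = 1_000_000 / tokensPerSecond / 3600; // ≈ 11 hours
const energyCost = (gpuWatts / 1000) * hoursPerMillionTokens * pricePerKwh;

console.log(`~$${energyCost.toFixed(2)} of electricity per million tokens generated locally`);
// Compare that figure (plus hardware, your time, and the quality gap) against whatever
// the hosted API charges per million tokens for a model you would actually use.
```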
Not looking to do that though! You can call it toying around if you want, but I think you're really limiting your perspective by dismissing smaller models.
I have observed the JavaScript ecosystem producing one new framework after another. I decided to wait for the dust to settle. Turns out vanilla.js is still fine for the things I need to do.
Is a random number generator intelligent? I don't think people perceive or understand intelligence equally. I don't think we have an answer to what exactly is intelligence or how to create it.
> LLMs are really dividing the community at the moment and it's exhausting to keep up with what I (as a dev) should be doing to stay sharp
You could try it at your own comfortable pace. I only started using agents very recently. The dangerous thing is going to extremes (all in on AI, or refusing the tech completely).
In a more general interface they're also nice for getting a bird's-eye view of a topic you're unfamiliar with.
However, just as a counterexample of how dumb they really are: I asked both Gemini 2.5 Pro and Opus 4 if there were any extra settings for VSCode's UI density and without hesitation both of them made up a bunch of 'window.density' settings.
If they can't even get something so extremely basic and well-documented right, how are you going to trust them to give you flawless C or TypeScript?
There's also zero-shot response quality as one axis along which LLMs get measured. But excelling at zero-shot is not a requirement for making LLMs useful.
The market is pointing the way: agents increase iteration, and iteration increases usefulness. Reasoning models/architectures are another example of iteration driving advances - the LLM iterates "in-band" and self-evaluates so that there's a better chance of a correct outcome.
All that in a mere 3.5 years since launch. To call it autocomplete is very short-sighted. Even if we have reached the LLMs' ceiling, the choice of AI-oriented workflows (TTS, TDD, YOLO...), tooling, protocols and additional architecture adjustments (gigantic context windows, instant adaptors, speed, etc.) will make up for any lack of precision, the same way we work around human flaws to help us succeed in most tasks.
Aren’t we humans doing just that too? And if so, then what?
Maybe there’s something for LLMs in reflection and self-reference that has to be “taught” to them (or has to be not blocked from them if it’s already achieved somehow), and once it becomes a thing they will be “cognizant” in the way humans feel about their own cognition. Or maybe the technology, the way we wire LLMs now simply doesn’t allow that. Who knows.
Of course humans are wired differently, but the point I’m trying to make is that it’s pattern recognition all the way down both for humans and LLMs and whatnot.
- “LLMs don’t have real intelligence” - We as a society don’t have a rigorous+falsifiable consensus on what “intelligence” is to begin with. Also many things that we all agree are not intelligent (cars, CPUs, egg timers, etc.) are still useful.
- “But people are claiming they’re intelligent and that they’re AGI” - OK, well what if those people are wrong but LLMs are still useful for many things? Not all LLM users are AGI believers, many aren’t.
- “But people are forcing me to use them.” - They shouldn’t do that, that’s bad. It doesn’t mean LLMs are bad.
- “They’re just pattern-matchers, stochastic parrots, they can’t generalize outside their training data.” - All the academic arguments I’ve seen about this become irrelevant when I ask an LLM to write me code in a really esoteric programming language and it succeeds. I personally don’t think this is true, but if in fact they are categorically no more than pattern-matchers, then Pattern Matching Is All You Need to do many many jobs.
- “I have an argument why they are categorically useless for all tasks” - the existence of smart people using these things of their own accord, observing the results and continuing to use them should put a serious dent in this theory.
- “They can’t do my whole job” - OK, what if they can help you with part of your job?
- “I’m a programmer. If I use an AI Assistant, but still have to review its code, I haven’t saved any time.” - This can’t be categorically disproven, but also isn’t totally true, and in the gaps in this argument lie amazing things if you’re willing to keep an open mind.
- “They can’t do arithmetic, how can they be expected to do everyday tasks.” - I’ll admit that it’s weird that LLMs are useful despite failing at arithmetic, but they are. Rain Man had trouble with everyday tasks, how could he be expected to do arithmetic? The world is counterintuitive sometimes.
- “They can’t help me with any of my job, I do surgery all day” - Thank you and my condolences. Please be aware though that many jobs out there aren’t surgery.
- “The people who promote them are annoying. I call them ‘influencers’ to signal that they are not hackers like us.” - Many good things have annoying fans, if you follow this logic to its conclusion you will miss out on many good things.
- “I’ve tried them, I’ve tried them in a variety of ways, they’re just really not for me.” - That’s fine. I’d still recommend checking in on the field later on, but I can totally admit that these things can take some finagling to get right, and not everyone has time. They will get easier to use in the future.
- “No they won’t, we’ve hit a plateau! Attention isn’t all you need!” - If all LLM development were to stop today, all AI cloud services shut down and only the open weights LLMs were left, I predict we’d still be finding novel usage patterns for them for the next 3-5 years.
Intelligence is really just a measure of one's ability to accurately filter and iterate over the search space.
Evolution is one extreme, where the heuristic is poor, so it must iterate over a huge number of bad solutions to find reasonably good ones. At the other extreme you have expert systems, which are great at refining the search space to always deliver quality answers, but which filter too much and are therefore too narrow, lacking the creativity and nuance of real intelligence.
LLMs provide good heuristics, and agents with verifiable goals allow for iteration. This combination results in a system that is significantly more intelligent than either of its parts.
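To make that combination concrete, here is a minimal sketch of the loop (the function shapes are made up for illustration, not any particular framework's API): the heuristic proposes candidates, and a verifier with a checkable goal decides whether to stop or feed the failure back in.

```typescript
// Minimal "good heuristic + verifiable goal" loop.
// `propose` stands in for an LLM call; `verify` stands in for tests, a compiler,
// or any other ground-truth check. Both are hypothetical and supplied by the caller.
type Verdict = { ok: boolean; feedback?: string };

async function solve<C>(
  propose: (history: string[]) => Promise<C>,  // heuristic: narrows the search space
  verify: (candidate: C) => Promise<Verdict>,  // verifiable goal: rejects bad candidates
  maxIterations = 5,
): Promise<C | null> {
  const history: string[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const candidate = await propose(history);
    const verdict = await verify(candidate);
    if (verdict.ok) return candidate;                   // goal reached, stop iterating
    history.push(verdict.feedback ?? "attempt failed"); // feed the failure back in
  }
  return null; // the heuristic alone wasn't enough within the iteration budget
}
```

Neither half is impressive on its own; the claim above is that the loop is where the apparent intelligence comes from.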
To that I add this:
Every single LLM user is a hyperintelligent ultraproductive centaur if I understand correctly, so how is it possible that I, as a made-of-meat individual, am kicking the ass of several whole world-class teams of these LLM-using centaur-y juggernauts? It shouldn't be possible, right?
But I'm human, so it is.
AFAIK this is not possible, as LLMs have linear conversations.
I would say that's how some devs operate too. Instead of waiting for the product/customer to come back, let's predict how they might think and make a couple of possible solutions and iterate over them. Some might be dead ends, we can effectively prune them, some might lead to more forks, some might lead down linear paths. But we can essentially get more coverage before really needing some input.
We might argue that it already does that in its chain-of-thought, or agent mode, but having a dedicated "forked" checkpoint lets us humans then check and rewind time in that sense.
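As a purely hypothetical sketch of what such forked checkpoints could look like (no real chat API is involved, and the class and method names are made up): each checkpoint snapshots the message history, branches are explored independently, and dead ends can be pruned while the tree keeps every path around for rewinding.

```typescript
// A conversation as a tree of checkpoints rather than a single linear history.
type Message = { role: "user" | "assistant"; content: string };

class Checkpoint {
  constructor(
    readonly messages: Message[],
    readonly children: Checkpoint[] = [],
  ) {}

  // Fork: copy the history so far and explore a new direction from this point.
  fork(userPrompt: string, assistantReply: string): Checkpoint {
    const child = new Checkpoint([
      ...this.messages,
      { role: "user", content: userPrompt },
      { role: "assistant", content: assistantReply },
    ]);
    this.children.push(child);
    return child;
  }
}

// Branch two candidate solutions from the same starting point; the root stays intact
// for rewinding, and either branch can be pruned without losing the other.
const root = new Checkpoint([{ role: "user", content: "Design the caching layer" }]);
const branchA = root.fork("Try an LRU cache", "...LRU sketch...");
const branchB = root.fork("Try a write-through cache", "...write-through sketch...");
console.log(branchA.messages.length, branchB.messages.length); // two independent histories
```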
Anyone claiming AI is a black box no one understands is a marketing-level drone trying to sell something that THEY don't understand.
[1] https://explainextended.com/2023/12/31/happy-new-year-15/
We probably need a New Kind of Soft Science™ to fill this gap.
[1] https://simonwillison.net/2025/May/25/claude-4-system-prompt...
Or simply use LLMs that struggle at writing good code (GPT, Gemini Pro, etc).
You need to put yourself in the shoes of a product owner, able to express your requirements clearly and drive the LLM in your direction, and this requires learning new skills (just as kids learn how to use search engines).
I love how one side of this debate seems to have embraced "No True Scotsman" as the preferred argument strategy. Anyone who points out that these things have practical limitations gets a litany of "oh you aren't using it right" or "oh, you just aren't using the cool model" in response. It reminds me of the hipsters in SF who always felt your music was a little too last week.
As someone who is currently using these every day, Gemini Pro is right up there with the very best models for writing code -- and "GPT" is not a single thing -- so I have no idea what you're talking about. These things have practical limitations.
GPT-4o as a neutral arbiter:
> Timr is more right.
> His points are grounded in observable reality:
> * GPT is not a single model, and quality varies across versions.
> * All LLMs have limitations in practical use.
> * "You’re using it wrong" is often a deflection, not an argument.
> Rvnx oversimplifies:
> * Claims GPT and Gemini Pro are "very bad" at coding, which is false—GPT-4 and Gemini 1.5 Pro are among the top performers.
> * Frames the issue as user incompetence or bad model choice, ignoring legitimate model constraints.
> Timr's stance reflects actual usage experience. Rvnx is posturing.
> Rvnx makes a categorical claim that doesn't hold up against performance benchmarks or developer experience. Timr critiques the rhetoric, adds context, and avoids binary thinking. That’s the stronger position.
Like even when I tested it on a clean assessment (albeit with Cursor in this case) - https://jamesmcm.github.io/blog/claude-data-engineer/ - it did very well in agent mode, but the questions it got wrong were worrying because they're the sort of things that a human might not notice either.
That said, I do think you could get a lot more accuracy by having the agent check and run its own answers, and then also sending its diff to a very strong LLM like o3 or Gemini 2.5 Pro to review it - it's just a bit expensive to do that at the moment.
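For what it's worth, that two-tier idea could be as simple as the sketch below. `callModel` and `runTests` are hypothetical stand-ins for whatever API client and test runner you actually use, and the model names and prompt format are assumptions, not a real interface.

```typescript
// A worker model drafts a diff and verifies it by running the tests; only a passing
// diff is escalated to a stronger (and more expensive) reviewer model.
async function reviewedChange(
  task: string,
  callModel: (model: string, prompt: string) => Promise<string>,
  runTests: () => Promise<{ passed: boolean; output: string }>,
): Promise<{ diff: string; review: string } | null> {
  // 1. The cheaper agent drafts the change (the edit loop itself is elided here).
  const diff = await callModel("worker-model", `Implement: ${task}\nReturn a unified diff.`);

  // 2. The agent checks its own work by running the test suite.
  const tests = await runTests();
  if (!tests.passed) return null; // don't pay the expensive reviewer for a broken diff

  // 3. Only a passing diff goes to the strong reviewer model.
  const review = await callModel(
    "reviewer-model",
    `Review this diff for correctness and subtle bugs:\n${diff}\n\nTest output:\n${tests.output}`,
  );
  return { diff, review };
}
```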
The main issue on real projects is that just having enough context to even approach problems, and to build and run tests, is very difficult when you have 100k+ lines of code and a clean build plus test run takes 15 minutes. And it feels like we're still years away from having all of the above, plus a large enough context window that this is a non-issue, at a reasonable price.
Still needs a lot of handholding. I do not (yet) think big upfront plans will suddenly start working in the enterprise world. Let it write a failing test first.
CEO speeches and pro-LLM blogs come to mind.
Again, there is a vague focus on "updating dependencies" where allegedly some time was saved. Take that to the extreme and we don't need any new software. Freeze Linux and Windows, do only security updates and fire everyone. Because the ultimate goal of LLM shills or self-hating programmers appears to be to eliminate all redundant work.
Be careful what you wish for. They won't reward you for shilling or automating, they'll just fire you.
edit: or "Why Claude Code feels like magic" without the ?.
What’s the use case?
(I tried some things, and it blew up. Thus far that's been my experience with agents in general.)
I asked it to add Google Play subscription support to my application and it did, requiring only minimal tweaking.
I asked it to add a screen for requesting location permissions from the user and it did it perfectly. No adjustment.
I also asked it to add a query parameter to my API (GoLang), which required a subtle change several layers deep, and it had no problems with that.
None of this is rocket science, and I think the key is that it's all been done and documented a million times on the Internet. At this point, Claude Code is at least as effective as a junior developer.
Yes, I understand that this is a Faustian bargain.
sylware•4h ago
For instance, can you ask it for a vector-based quicksort? Say, with a "vector size unit" of a "standard" cache line, namely 512 bits / 64 bytes (rv22+ profile).
Veen•4h ago
https://claude.ai/public/artifacts/5f4cb680-9a99-4781-8803-9...
(No idea how good that is. I just gave it your comment)
rahoulb•3h ago
I already do TDD a lot of the time, and this way I can be sure that the actual requirements are covered by the tests. Whereas asking it to add tests to existing code often gets over-elaborate in areas that aren't important and misses cases for the vital stuff.
Sometimes, when asking Claude to pass existing tests, it comes up with better implementations than I would have done. Other times the implementation is awful, but I know I can use it, because the tests prove it works. And then I (or Claude) can refactor with confidence later.
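As a tiny illustration of that test-first flow (`parseDuration` is a made-up example function and the syntax is Jest-style, so treat this as a sketch rather than anyone's actual setup): the requirement is pinned down in tests before any generated code exists, and the prompt to the agent is simply to make them pass.

```typescript
// The human writes the failing tests first; the agent's only job is to make them pass.
import { parseDuration } from "./parseDuration"; // doesn't exist yet, so these tests fail first

describe("parseDuration", () => {
  test("parses simple units into seconds", () => {
    expect(parseDuration("90s")).toBe(90);
    expect(parseDuration("2m")).toBe(120);
  });

  test("rejects garbage instead of guessing", () => {
    expect(() => parseDuration("soon")).toThrow();
  });
});

// Prompt to the agent: "Make ./parseDuration pass these tests."
// The tests, not the generated implementation, are the part the human has to get right.
```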