
Just Buy Nothing: A fake online store to combat shopping addiction

https://justbuynothing.com/
140•Improvement•2h ago•30 comments

How I code with AI on a budget/free

https://wuu73.org/blog/aiguide1.html
108•indigodaddy•4h ago•34 comments

Abusing Entra OAuth for fun and access to internal Microsoft applications

https://research.eye.security/consent-and-compromise/
99•the1bernard•5h ago•25 comments

Show HN: The current sky at your approximate location, as a CSS gradient

https://sky.dlazaro.ca
563•dlazaro•13h ago•115 comments

Don't “let it crash”, let it heal

https://www.zachdaniel.dev/p/elixir-misconceptions-1
34•ahamez•3d ago•2 comments

My Lethal Trifecta talk at the Bay Area AI Security Meetup

https://simonwillison.net/2025/Aug/9/bay-area-ai/
285•vismit2000•12h ago•84 comments

A CT scanner reveals surprises inside the 386 processor's ceramic package

https://www.righto.com/2025/08/intel-386-package-ct-scan.html
179•robin_reala•9h ago•55 comments

OpenFreeMap survived 100k requests per second

https://blog.hyperknot.com/p/openfreemap-survived-100000-requests
385•hyperknot•13h ago•77 comments

R0ML's Ratio

https://blog.glyph.im/2025/08/r0mls-ratio.html
46•zdw•14h ago•6 comments

Debian 13 "Trixie"

https://www.debian.org/News/2025/20250809
556•ducktective•8h ago•201 comments

GPT-5: Overdue, overhyped and underwhelming. And that's not the worst of it

https://garymarcus.substack.com/p/gpt-5-overdue-overhyped-and-underwhelming
211•kgwgk•3h ago•172 comments

Ch.at – a lightweight LLM chat service accessible through HTTP, SSH, DNS and API

https://ch.at/
97•ownlife•8h ago•32 comments

People returned to live in Pompeii's ruins, archaeologists say

https://www.bbc.com/news/articles/c62wx23y2v1o
39•bookofjoe•2d ago•8 comments

Who got arrested in the raid on the XSS crime forum?

https://krebsonsecurity.com/2025/08/who-got-arrested-in-the-raid-on-the-xss-crime-forum/
65•todsacerdoti•3d ago•1 comment

Quickshell – building blocks for your desktop

https://quickshell.org/
256•abhinavk•4d ago•31 comments

Long-term exposure to outdoor air pollution linked to increased risk of dementia

https://www.cam.ac.uk/research/news/long-term-exposure-to-outdoor-air-pollution-linked-to-increased-risk-of-dementia
254•hhs•14h ago•78 comments

A Simple CPU on the Game of Life (2021)

https://nicholas.carlini.com/writing/2021/unlimited-register-machine-game-of-life.html
40•jxmorris12•3d ago•6 comments

How I use Tailscale

https://chameth.com/how-i-use-tailscale/
201•aquariusDue•3d ago•38 comments

Consistency over Availability: How rqlite Handles the CAP theorem

https://philipotoole.com/consistency-over-availability-how-rqlite-handles-the-cap-theorem/
27•otoolep•3d ago•1 comment

Suzhou Imperial Kiln Ruins Park and Museum of Imperial Kiln Brick (2018)

https://www.theplan.it/eng/award-2018-Culture/suzhou-imperial-kiln-ruins-park-museum-of-imperial-kiln-brick-1
4•mooreds•3d ago•0 comments

Did California's fast food minimum wage reduce employment?

https://www.nber.org/papers/w34033
88•lxm•17h ago•181 comments

An AI-first program synthesis framework built around a new programming language

https://queue.acm.org/detail.cfm?id=3746223
69•tosh•11h ago•3 comments

An engineer's perspective on hiring

https://jyn.dev/an-engineers-perspective-on-hiring
70•pabs3•17h ago•84 comments

Stanford to continue legacy admissions and withdraw from Cal Grants

https://www.forbes.com/sites/michaeltnietzel/2025/08/08/stanford-to-continue-legacy-admissions-and-withdraw-from-cal-grants/
177•hhs•14h ago•352 comments

Curious about the training data of OpenAI's new GPT-OSS models? I was too

https://twitter.com/jxmnop/status/1953899426075816164
3•flabber•5h ago•2 comments

ESP32 Bus Pirate 0.5 – A hardware hacking tool that speaks every protocol

https://github.com/geo-tp/ESP32-Bus-Pirate
111•geo-tp•12h ago•23 comments

Testing Bitchat at the music festival

https://primal.net/saunter/testing-bitchat-at-the-music-festival
82•alexcos•3d ago•40 comments

MCP overlooks hard-won lessons from distributed systems

https://julsimon.medium.com/why-mcps-disregard-for-40-years-of-rpc-best-practices-will-burn-enterprises-8ef85ce5bc9b
276•yodon•12h ago•156 comments

Ratfactor's illustrated guide to folding fitted sheets

https://ratfactor.com/cards/fitted-sheets
143•zdw•15h ago•19 comments

Isle FPGA Computer: creating a simple, open, modern computer

https://projectf.io/isle/fpga-computer.html
44•pabs3•3d ago•5 comments

GPTs and Feeling Left Behind

https://whynothugo.nl/journal/2025/08/06/gpts-and-feeling-left-behind/
71•Bogdanp•3h ago

Comments

PaulHoule•2h ago
I think what people are missing is that they work sometimes and sometimes they don't work.

People think "Oh, it works better when somebody else does it" or "There must be some model that does better than the one I am using" or "If I knew how to prompt better I'd get better results" or "There must be some other agentic IDE which is better than the one I am using."

All those things might be true but they just change the odds, they don't change the fact that it works sometimes and fails other times.

For instance, I asked an agent to write me a screen to display some well-typed data. It came up with something great right away that was missing some fields and had some inconsistent formatting, but it fixed all those problems when I mentioned them -- all while speaking the language of product managers and end users. The code quality was just great, as good as if I had written it myself, maybe better.

Plenty of times it doesn't work out like that.

I was working on some code where I didn't really understand the TypeScript types, and I fed it the crazy error messages I was getting. It took a stab at understanding them and didn't really succeed. I used it as a "rubber duck" over the course of a day or two, and working with it I eventually came to understand what was wrong and how to fix it. I've now got the code into a place that I like: when there is an error, I can understand it and it can understand it too.

Sometimes it writes something that doesn't typecheck, and I tell it to run tsc and fix the errors. Sometimes it does a job I am proud of; other times it adds lame typeguards like

   if (x && typeof x === "object") x.someMethod()
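To make the complaint concrete, here is a sketch (not from the comment; `someMethod` and `HasSomeMethod` are just illustrative names) of the user-defined type guard a reviewer would usually want instead, since it actually narrows the type rather than merely checking for a non-null object:

```typescript
// The "lame" guard only proves x is a non-null object; it says nothing
// about someMethod, so calling it still needs a cast or `any`.
// A user-defined type guard states the assumption explicitly and lets
// the compiler narrow the type.
interface HasSomeMethod {
  someMethod(): string;
}

function hasSomeMethod(x: unknown): x is HasSomeMethod {
  return (
    typeof x === "object" &&
    x !== null &&
    typeof (x as HasSomeMethod).someMethod === "function"
  );
}

const candidate: unknown = { someMethod: () => "ok" };
if (hasSomeMethod(candidate)) {
  // Inside this branch, candidate is HasSomeMethod: no cast needed.
  console.log(candidate.someMethod()); // logs "ok"
}
```

The difference is that the guard documents exactly what is being assumed about `x`, instead of silently swallowing the case where the method is missing.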
Give it essentially the same problem, say writing tests in Java, and it might take very different approaches. One time it will use the same dependency injection framework used in other tests to inject mocks into private fields; other times it will write a helper method that injects the mocks into private fields with introspection directly.

You might be able to somewhat tame this randomness with better techniques but sometimes it works and sometimes it doesn't and if I just told you about the good times or just told you about the bad times it would be a very different story.

leptons•1h ago
>I was working on some code where I didn't really understand the TypeScript types, and I fed it the crazy error messages I was getting. It took a stab at understanding them and didn't really succeed. I used it as a "rubber duck" over the course of a day or two, and working with it I eventually came to understand what was wrong and how to fix it. I've now got the code into a place that I like: when there is an error, I can understand it and it can understand it too.

I have to wonder whether a simple Google search and a read through some docs wouldn't have gotten you there quicker than coaxing a result out of the LLM.

solarkraft•1h ago
My (not GP) intuitive answer would be: hell no. TypeScript error messages are pretty hard to google, and even hard to parse manually, and an LLM suggesting multiple approaches and ways to think about the problem does seem useful. It sometimes uncovers unknown unknowns you might never find otherwise.

I have had cases in which a web search and some good old fashioned thinking have yielded better results than using an LLM, but on average I’m pretty sure the LLM has the edge.

PaulHoule•20m ago
Personally I think Stack Overflow and Google are 95% trash for that kind of problem.

The answers are in (i) the Typescript documentation and (ii) the documentation of libraries that I'm using. I could get lucky with a Google search and it could be worth trying, but I wouldn't expect it to work. Personally my preference is to have a language and libraries with great documentation (Python, Java, Typescript isn't too bad [1]) and really know that documentation like the back of my hand.

If I hadn't had the LLM, I would probably have figured it out the same way, by doing experiments; I might have asked my other "rubber duck":

https://mastodon.social/@UP8/113935901671533690

A tactic I didn't use, which helps in "legacy" systems where I am stuck, is to start a fresh project in the IDE and try either reproducing the problem or building a tiny system that is problem-free.

I'm hesitant to say what speedup I got out of figuring out the types together with the LLM, but emotionally I felt supported, and in the process I wrote a whole lot, like I was keeping track of the process in a notebook. I feel that a lot of times when I have good LLM conversations, I wind up writing better code than I would otherwise, not necessarily writing it faster -- it's like pair programming.

[1] The typescript docs are great for the typescript stuff, MDN is good for Javascript and Javascript's stdlib

calrain•2h ago
Well, it takes a while to learn Vim and then get value from it.

It also takes a while to learn using an LLM and get value from it.

The keys are how to build prompts, ways of working, and guidelines that help the AI stay focused.

You end up spending much more time guiding and coaching rather than coding, which can take a while to get used to.

Eventually though, you will master it and be able to write secure, fast code far beyond what you could have done by yourself.

Note: Also, prep yourself for incoming hate every time you make claims like that! If you write bad code, it's your fault. If your LLM writes bad code, you're a moron! hah

throwawa14223•1h ago
So you're taking an easy task, formal logic, and replacing it with a more difficult and time-consuming task, babysitting a random number generator. How is that a net positive?
calrain•52m ago
I get your position, and I don't want to sound dismissive, but when you really learn how to manage an LLM for a complex piece of software far beyond what you have time for, you see the benefits.

Try

dumbmrblah•2h ago
I really wish posts like this included the parameters that they were using. What model? What was the question? How many shots? Etc etc

You’re going to get vastly different responses if you’re using Opus versus 4o.

siscia•2h ago
Do you really?

Frontier models seem remarkably similar in performance.

Yeah some nuances for sure, but the whole article could apply to every model.

arthur-st•1h ago
4o on ChatGPT.com vs. Opus in an IDE is like cooking food without kitchen tools vs. using them. 4o is neither a coding-optimized model nor a reasoning model in general.
dnh44•1h ago
You're not pushing them hard enough if you're not seeing a vast difference between 4o and Opus. Or possibly they're equivalent in the field you're working in but I suspect it's the former.
1over137•1h ago
I’d like to know which programming language.
dnh44•4m ago
I'm of the same opinion as the OP, and I'm programming in Rust (API) and Swift (client) right now.

I entered a vibe coded game hackathon a few months back and in a little over a week I was at 25k lines of code across both the client and server. It all worked (multiplayer) even though the game sucked.

storus•2h ago
The worst thing is when LLMs introduce subtle bugs into code and one just can't spot them quickly. I was recently doing some Langfuse integration and used Cursor to generate skeleton code for pushing some traces/scores quickly. The generated code included one parameter, "score_id", that was undocumented in Langfuse but was somehow accepted and messed up the whole tracking. Even after multiple passes of debugging I couldn't figure out what the issue with tracking was, until I asked another LLM to find any possible issues with the code, which promptly flagged those score_id lines.
siscia•2h ago
I find myself on both sides actually.

I did have some great luck producing quite useful and impactful code. But also lost time chasing tiny changes.

jackdawed•2h ago
One blogpost I found on HN completely leveled up how I use LLMs for coding: https://harper.blog/2025/02/16/my-llm-codegen-workflow-atm/

Having the AI ask me questions and think about the PRD/spec ultimately made me a better system designer.

8organicbits•1h ago
> This is working well NOW, it will probably not work in 2 weeks, or it will work twice as well. ¯\_(ツ)_/¯

This all feels like spinning the roulette wheel. I sometimes wonder if AI proponents are just gamblers who had the unfortunate luck of winning the first few prompts.

dbalatero•1h ago
I've seen comparisons to gambling before (activating reward centers, sometimes it pays out big, etc), but couldn't find the article when I searched.
PessimalDecimal•29m ago
A comparison I've seen isn't to roulette but to a slot machine. Anthropic itself encourages its employees to treat its use for refactors as a slot machine. [1]

It seems like an idea worth exploring formally, but I haven't seen that done anywhere. Is this a case of "perception of winning" while one is actually losing? Or is it that the winning is in aggregate, and people who like LLM-based coding are just more tolerant of the volatility to get there?

The only study I've seen testing the actual observable impact on velocity showed a modest decrease in output for experienced engineers who were using LLMs for coding.

[1] https://www-cdn.anthropic.com/58284b19e702b49db9302d5b6f135a...

avalys•2h ago
I have a degree in CS from MIT and did professional software engineering from 2004 - 2020.

I recently started a company in another field and haven’t done any real development for about 4 years.

Earlier this summer I took a vacation and decided to start a small software hobby project specific to my industry. I decided to try out Cursor for the first time.

I found it incredibly helpful at saving time implementing all the bullshit involved in starting a new code base - setting up a build system, looking up libraries and APIs, implementing a framework for configuration and I/O, etc.

Yes, I still had to do some of the hard parts myself, and (probably most relevant) I still had to understand the code it was writing and correct it when it went down the wrong direction. I literally just told Cursor “No, why do it that way when you could do it much simpler by X”, and usually it fixed it.

A few times, after writing a bunch of code myself, I compiled the project for the first time in a while and (as one does) ran into a forest of inscrutable C++ template errors. Rather than spend my time scrolling through all of them I just told cursor “fix the compile errors”, and sure enough, it did it.

Another example - you can tell it things like “implement comparison operators for this class”, and it’s done in 5 seconds.

As the project got more complicated, I found it super useful to write tests for behaviors I wanted, and just tell it “make this test pass”. It really does a decent job of understanding the codebase and adding onto it like a junior developer would.

Using an IDE that gives it access to your whole codebase (including build system and tests) is key. Using ChatGPT standalone and pasting stuff in is not where the value is.

It’s nowhere near able to do the entire project from scratch, but it saved me from a bunch of tedious work that I don’t enjoy anyway.

Seems valuable enough to me!

dnh44•1h ago
Last summer I came back to software after about 12 years away, and I had a pretty much identical experience to yours using AI as a helper to get back up to speed. I've now spent the last 6 months coding as much as I can in between consulting gigs. I'm not sure I would have been able to catch up so quickly without AI.

I haven't had this much fun programming since I was at university hacking away on sun workstations, but admittedly I only write about 10% of the code myself these days.

I'm currently getting Claude Code to pair program with GPT-5 and they delegate the file edits to Gemini Flash. It's pretty cool.

zhivota•46m ago
> I'm currently getting Claude Code to pair program with GPT-5 and they delegate the file edits to Gemini Flash. It's pretty cool.

This sounds cool, any more details or any write up on how to do something like this?

dnh44•34m ago
I use a program called RepoPrompt to do it. The dev has a video here:

https://www.youtube.com/watch?v=JzVnXzmZweg&t

8organicbits•57m ago
> all the bullshit involved in starting a new code base

Have you looked at cookiecutter or other template repos? That's my go-to for small projects, and it works pretty well. I'd worry the LLM would add bugs that a template repo wouldn't, as the latter is usually heavily reviewed, human-written code.

CityOfThrowaway•1h ago
I have a feeling this person is using far-from-frontier models, totally disconnected from the development environment.

Using, like, gpt-4o is extremely not useful for programming. But using Claude Code in your actual repo is insanely useful.

Gotta use the right tool + model.

tyfighter•1h ago
How is anyone just supposed to know that? It's not hard to find vim, but no one says, "You need to be running this extra special vim development branch where people are pushing vim to the limits!" Yes, it's fragmented, and changing fast, but it's not reasonable to expect people just wanting a tool to be following the cutting edge.
dbalatero•58m ago
I agree with your comment, but I also chuckled a bit, because Neovim _is_ a fast changing ecosystem with plugins coming out to replace previous plugins all the time, and tons of config tweakers pushing things to the limit. That said… one does not have to replace their working Neovim setup just because new stuff came out. (And of course, minimalist vim users don't use any plugins!)
solarkraft•1h ago
> Using, like, gpt-4o is extremely not useful for programming

I disagree! It can produce great results for well defined tasks. And I love the “I like this idea, now implement it in VSCode” flow ChatGPT desktop provides on macOS.

neom•1h ago
I think back to the first version of ChatGPT: I would pick it up once in a while, ask it something or chat with it, and then be like... this is cool, but I don't know wtf I would use it for. Now I use a GPT at least a couple of times a day. Granted, the LLMs have obviously become considerably more capable, but I do believe part of it is that I've also learned how to use them and what to use them for. I'm at the point now where I can generally predict the output of what I'm asking for -- I don't know if that's the norm, but it mostly gives me exactly what I want -- and how I use them today versus how I used them when they first came out is quite different. I guess all that is to say: imo, how you prompt them really matters, and that takes time to learn.
Workaccount2•1h ago
At what point does clinging to driving a manual transmission round the track go from a sign of practical skill to a sign of stubborn arrogance?
SoftTalker•1h ago
Haven’t even really tried them. The sand is shifting way too fast. Once things stabilize and other people figure out how to really use them I’ll probably start but for now it just feels like effort that will have been wasted.
neom•1h ago
All the models feel a bit different to use, and part of being good with LLMs (I suspect) is being able to assess a model before you really start using it, and, learning the nuances in the models that you will use, for that alone I think it's worth spending time with them.
moritzwarhier•46m ago
"All the screwdrivers feel a bit different to use, and part of being good with screwdrivers (I suspect) is being able to assess a screwdriver before you really start using it, and, learning the nuances in the screwdrivers that you will use, for that alone I think it's worth spending time with them"

Sounds dubious to me

"All the SSRIs feel a bit different, and part of being good with SSRIs (I suspect) is being able to assess an SSRI before you really start using it, and, learning the nuances in the SSRIs that you will use, for that alone I think it's worth spending time with them"

Hm, that comparison sounds off, but not as much to me as to many other people.

"All the IDEs and text editors feel a bit different to use, and part of being good with IDEs (I suspect) is being able to assess an IDE before you really start using it, and, learning the nuances in the editors that you will use, for that alone I think it's worth spending time with them"

Sounds reasonable.

Substituting the subject back to AI coding agents, I'm struggling to make out your argument. What method of assessment would you recommend other than _starting to use_ a coding assistant model?

I guess you were referring to anecdata and reviews/posts, or were you referring to specific objective properties like context size, RAG capabilities, etc.?

Groxx•1h ago
yeah, tbh I think that even if they are the cat's pajamas and they end up taking over absolutely all text-based work everywhere and literally everyone agrees they're better at it than humans...

... the current state-of-the-art won't be what we use, and the prompts people are spending tons of time crafting now will be useless.

so I don't think there's all that much FOMO to F over. either the hype bubble pops or literally everyone in those trades will be starting over with brand new skills based on whatever was developed in the past 6 months. people who rode the wave will have something like 6 months of advantage...

... and their advantage will quickly be put into GPTs and new users won't need to learn that either ("you are a seasoned GPT user writing a prompt..."). unless you worry endlessly about Roko's Basilisk, it's kinda ignorable I think. either way you still need to develop non-GPT skills to be able to judge the output, so you might as well focus on that.

allenu•1h ago
I kept hearing about Claude Code for a while and never really tried it until a week ago. I used it to prototype some Mac app ideas and I quickly realized how useful it was at getting prototypes up and running very, very quickly, like within minutes. It saves so much time with boilerplate code that I would've had to type out by hand and have done hundreds of times before.

Given my experience, I wonder what tasks the author of this blog post tried, as that might explain why they couldn't get much use out of it. Maybe other posters can chime in on how big a difference programming language and project size make. I did find that it was able to glean how I had architected an app and give feedback on potential refactors, although I didn't ask it to go that far.

Prior to trying out Claude Code, I had only used ChatGPT and DeepSeek to post general questions on how to use APIs and frameworks and asking for short snippets of code like functions to do text parsing with regexes, so to be honest I was very surprised at what the state of the art could actually do, at least for my projects.

martinald•1h ago
I'm equally lost, in the other direction.

I've gone through multiple phases of LLM usage for development.

GPT-3.5 era: wow, this is amazing... oh, everything is hallucinated. Not actually as useful as I first thought.

GPT-4 era: very helpful as Stack Overflow on steroids.

Claude 3.5 Sonnet: had it open pretty much all the time, constantly asking questions and getting it to generate simple code (in the web UI); when it goes down, actually googling stuff feels very old school. Tried a lot of in-IDE AI "chat" tools but was hugely underwhelmed.

Now: I rarely open the IDE, as I can do (nearly) absolutely everything in Claude Code. I do have to refactor stuff every so often "manually", but this is more for my sanity and understanding of the codebase.

To give an example of a task I got Claude Code to do today in a few minutes that would have taken me hours: I had a janky-looking old admin panel in Bootstrap styles that I wanted to make look nice. I told Claude Code to fetch the marketing site for the project, got it to pull CSS, logos, and fonts from there using curl, and had it apply similar styling to the admin panel project. Within 10 minutes it was looking far, far better than I would ever have got it looking (at least without a designer's help). Then I got it to go through the entire project (dozens of screens) and update the "explanation" copy -- most of which was TODO placeholders -- to explain what everything did properly. I then got it to add an e2e test suite to the core flows.

This took less than an hour while I was watching TV. I would almost certainly _never_ have got around to this before. I'd been meaning to do all of it, and I always sigh when I go into that panel at how clunky it all is and how hard it is to explain to people.

kaashif•1h ago
Yeah, as a primarily backend engineer dealing with either weird technical problems Claude can't quite get right, or esoteric business-domain problems Claude has no idea about (and indeed, maybe only a few people at one company could help with), Claude isn't that useful.

But random stuff like make a web app that automates this thing or make an admin panel with auto complete on these fields and caching data pulled from this table.

It is like infinity times faster on this tedious boilerplate because some of this stuff I'd just have never done before.

Or I'd have needed to get some headcount in some web dev team to do it, but I just don't need to. Not that I'd have ever actually bothered to do that anyway...

martinald•1h ago
One thing I'd recommend as a starter for weird business-domain projects is getting it to create a "wiki" of markdown files covering all the logic (I suspect this may have been on your to-do list anyway!). You may be pleasantly surprised at how well it does, and then you can update your claude.md file to point to them (or even put it all in there, but that is maybe overkill).
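A minimal sketch of what that claude.md pointer could look like (the file names and topics here are purely illustrative assumptions, not from the comment):

```markdown
## Domain logic wiki
Before changing business rules, read the relevant page first:

- docs/wiki/pricing-rules.md — how discounts and tiers interact
- docs/wiki/order-lifecycle.md — the state machine for orders
- docs/wiki/glossary.md — domain terms and their exact meanings

Update the relevant page whenever the logic changes.
```

The point of keeping the wiki in separate files is that the agent only loads the pages relevant to the task, instead of bloating every prompt with the whole domain description.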
PaulHoule•16m ago
Asking for documentation is good. I had a data structure in Java that was a bunch of nested Map<String, Object>(s) that I partially understood and asked my agent to write me some sample JSON documents and... it just did it and it was a big help for manual coding against that structure.

When I can't figure out something about a library, I have often loaded the git repository into my IDE and checked out the version I was using, then used the IDE to study the code. Now I do that and ask the agent questions about the code, like "How do I do X?" (often it sees the part in the documentation that I missed) or "Where is the code that does Y?" It greatly accelerates the process of understanding code.

throwawaysleep•1h ago
> while I was watching TV

This to me is one of the real benefits. I can vibe code watching TV. I can vibe code in bed. I can vibe code on the plane waiting for takeoff with GitHub Copilot Agents.

VladVladikoff•1h ago
To me this is sad. I love programming. It was always a fun job. I don’t want to watch TV instead of programming.
johnfn•50m ago
Then don't. You'll likely outperform those who do. Not making any value judgements; both have their place. Sometimes I want to be locked in and sometimes I want to let Claude spin while I take a walk.
throwawaysleep•34m ago
Do you love finding every weird CSS !important hack, every RBAC refactor, or marching through the codebase to put a key wherever a piece of text is when i18n comes along?

Lots of coding work is interesting, but plenty is just tedious.

jvanderbot•50m ago
I'm convinced the vast difference in outcomes with LLM use is a product of the vast difference in jobs. For front-end work it's just amazing: it spits out boilerplate and makes alterations without any need of help. For domain-specific backend work, robotics for example, it's bad: it tries to puke out bespoke A*, or invents libraries and functions. I'm way better off hand-coding those things.

The problem is that this is classic Gell-Mann amnesia. I can have it restyle my website with zero work, even adding StarCraft 2 or NBA Jam themes, but ask it to work on a planning or estimation problem and I'm annoyed by its quality. It's probably bad at both, but I only notice in my own domain. If an app requires 10 specializations, I'm only mad about 10%. If I want to make an app entirely outside my domain, yeah, sure, it's the best ever.

Ezhik•1h ago
In the end, the greatest use I get from coding agents and the like is hijacking the Stack Overflow principle: it's much easier to trick myself into correcting the poor code Claude generates than it is to start writing code from a blank slate.
tyfighter•1h ago
You're (they're?) not alone. This mirrors every experience I've had trying to give them a chance. I worry that I'm just speaking another language at this point.

EDIT: Just to add context seeing other comments, I almost exclusively work in C++ on GPU drivers.

almostgotcaught•55m ago
Same - I work on a C++ GPU compiler. All the LLMs are worthless for it. Ironically, the compiler I work on is used heavily for LLM workloads.
dmezzetti•1h ago
A lot of this is classic gaslighting. Gaslighting is the manipulation of someone into questioning their perception of reality (per Wikipedia).

Basically, a lot of people who are experts are being told this story and they think they are the only one who doesn't get it.

There are plenty of gains to be had with AI/LLMs but just not in the way it's typically marketed.

throwawa14223•1h ago
This exactly mirrors my experience. I can't see the whole LLM/GPT thing as anything but another blockchain-level scam. It isn't zero value; it's actually negative value, since the time it takes is an opportunity cost.
Barrin92•1h ago
I spend a fair amount of time on open source, and one thing I've noticed is that in real pieces of software, all these 10x and 100x AI engineers are nowhere to be found.

VLC has something like 4,000 open issues. Why aren't the AI geniuses fixing those? Nobody ever has actual code to show, and when they do it's "here's an LED that blinks every time my dog farts, I could never have done it on my own!". I feel like Charlie in that episode of It's Always Sunny with his conspiracy dashboard: all these productivity gurus don't actually exist in the real world.

Can anybody show me their coding-agent workflow on a 50k-LOC C codebase instead of throwaway gimmick examples? As far as I'm concerned, these things can't even understand pointers.

channel_t•47m ago
It is somewhat surprising to me that a huge number of smart working professionals in the industry still haven't figured out how to make use of generative AI in ways that upgrade the quality of their day-to-day lives. Don't get me wrong, LLMs are highly flawed and there's an overwhelming amount of dishonest magical-BS narrative around them, but it seems like the people who aren't getting much out of them at this point either aren't trying hard enough or are looking in all the wrong places.

It definitely can take a hefty amount of research and experimentation in context engineering to find recipes that work for the specific problems one is trying to solve. The whole process seems exactly like the kind of thing that would normally send software-development-minded folks down obsessive rabbit holes leading to fun and interesting solutions, but for some reason generative AI just isn't igniting that spark for a lot of people.

zmmmmm•47m ago
One thing to openly recognise is that FOMO is one of the core marketing strategies used in any hype bubble to get people on board. There seem to be multiple blog posts a day on HN that are thinly veiled marketing for AI, and most follow a predictable pattern: (a) start by implying a common baseline that is deliberately just beyond where your target market sits (example: "how I optimised my Claude workflow"), and (b) describe the solution just well enough to hint that there's an answer, but not well enough to let people generalise. By doing this you strongly imply that people should just buy into whatever the author is selling rather than build fundamental knowledge themselves.

Putting aside the FOMO, the essential, time-tested strategy is simply to not care and follow what interests you. The progress in AI is astonishing and inherently interesting; this shouldn't be hard. Don't go into it with the expectation that unless it vibe-coded an entire working application for you, it's a failure. Play with it. Poke it, prod it. Then try to resolve the quirks and problems that pop up. Why did it do that? Don't expect an outcome. Just let it happen. The people who do this now will be the ones who come through the hype bubble at the end with actual practical understanding and deployable skills.

cwyers•42m ago
The author calling them "GPTs" suggests to me that maybe they're not keeping up with the state of the art.