Ask HN: Are the newest LLMs better than you at programming?

3•par1970•2h ago

I've been programming for 10+ years. From my usage of ChatGPT 5.4, it seems to me that it's better than me at programming. I never thought this about any of the ChatGPT 4.* models that I tried.

How do your abilities compare to the newest models?

edit: Please specify which model and version you are talking about.

Comments

drrob•1h ago

They're still pretty dreadful. They're better than I was at 21, so I'd say they're good for graduate level, but nothing beyond that.

par1970•1h ago

Which models + versions are you using? Can you give a specific problem that you found them to be bad at?

drrob•1h ago

The most recent thing I tried getting it to code for me was a set of recursive C# functions that navigate a node map in reverse (a Microsoft Project plan with various feeding chains), calculate all possible paths, and return them as a list of objects.

It kept producing code that looked to the eye like it might work, but each time I ran it, it would just throw schoolboy exceptions. I got tired of telling it to correct the things it kept forgetting to check for (nulls, path starts, empty lists), and just coded it from scratch myself.

I find ChatGPT is like pair-programming with a junior, except I'm not getting paid to coach them like I would if it were an actual graduate hire.
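
For reference, here is a minimal sketch of the kind of reverse path enumeration described in this comment, written in Python rather than the C# used by the commenter. It is not the commenter's code. The function name `all_paths_to`, the `predecessors` mapping, and the task ids are all hypothetical, and the plan is assumed to be a plain dictionary from each task to the tasks that feed it. The sketch covers the three checks mentioned above (nulls, path starts, empty lists) and adds a cycle guard as an extra assumption.

```python
from typing import Dict, List, Optional


def all_paths_to(
    target: Optional[str],
    predecessors: Dict[str, List[Optional[str]]],
    _suffix: Optional[List[str]] = None,
) -> List[List[str]]:
    """Return every path that ends at `target`, each listed start-first.

    `predecessors` (hypothetical structure) maps a task id to the ids of the
    tasks feeding into it. A task with no entry, or an empty entry, is
    treated as a path start.
    """
    # Null check: a missing target yields no paths at all.
    if target is None:
        return []

    # The path walked so far, from the current node forward to the end node.
    suffix = [target] + (_suffix or [])

    # Tolerate missing keys, None lists, and None entries. Feeders already on
    # this path are skipped so a cyclic plan cannot recurse forever; a node
    # whose only feeders are on the path therefore ends the path there.
    feeders = [
        p
        for p in (predecessors.get(target) or [])
        if p is not None and p not in suffix
    ]

    # Path start: nothing feeds this node, so the walked path is complete.
    if not feeders:
        return [suffix]

    paths: List[List[str]] = []
    for feeder in feeders:
        paths.extend(all_paths_to(feeder, predecessors, suffix))
    return paths


# Hypothetical plan: A feeds B and C, which both feed D.
plan = {"D": ["B", "C"], "B": ["A"], "C": ["A"]}
paths = all_paths_to("D", plan)
```

With this sample plan, `paths` holds the two chains A, B, D and A, C, D. Returning plain lists of ids keeps the sketch short; a real version would presumably return richer objects, as the comment describes.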

chiengineer•1h ago

Your prompts are zero-out-of-ten quality.

Learn how to prompt better and you'll be fine.

drrob•1h ago

I think I'm doing just fine, thanks for your concern.

multidude•1h ago

Keeping context is a thing that they are bad at. For now, I admit, but they are.

Given a long-haul goal with instructions and everything, they will reinvent the wheel four times, and one of those times you will get a square. It reminds me of that monkey's paw wish thing. You look at your finished app: it looks beautiful, but its inner workings are a ball of confusion.

multidude•1h ago

I don't use ChatGPT, but I've been using an agent with Claude Sonnet 4. My answer may not be useful to you, but I'll talk about my experience with that and hope it may help you.

So this AI agent... It is much faster at writing code when given specific instructions. But it keeps losing context on architecture, and I can't really let it build complex things with interdependencies that build on each other. At times it feels like pair programming with a guy who is so crazy fast that I'm left behind with my head spinning, wondering how we just jumped from a hello world to a working thing that would have taken me ten iterations. And I get a bad feeling when I then wonder: how is this app doing what it does? My agent can't explain it, and I would be stupid to believe what it hallucinated, because it sounds really solid until you scratch the construction.

At the beginning I was almost euphoric about my new friend; now I'm sometimes disappointed, sometimes confused, but I am learning to give better, more concise instructions and to take smaller development jumps. It is tempting to set a long-haul goal and let it run. But I think that for now, even if it is much faster at the small things, it would also be faster at building a catastrophic spaghetti-code nightmare if not used with great care.

par1970•1h ago

> I don't use ChatGPT, but I've been using an agent with Claude Sonnet 4.

Are you using Sonnet 4.6?

> So this AI agent... It is much faster at writing code when given specific instructions. But it keeps losing context on architecture, and I can't really let it build complex things with interdependencies that build on each other.

I've only built small things (< 1000 lines) with the systems, so I might be missing this problem.

Is it better than you at building small self-contained things?

> And I get a bad feeling when I then wonder: how is this app doing what it does? My agent can't explain it, and I would be stupid to believe what it hallucinated, because it sounds really solid until you scratch the construction.

Do you ask it to generate test suites for the things that it builds?

> it would also be faster at building a catastrophic spaghetti-code nightmare if not used with great care.

Noted.

multidude•1h ago

I started working with this two weeks ago, so I'm learning as I go (or should I say stumble and fall). Weird as it may sound, I found it so trustworthy at the beginning: it sounded so rational and logical, as if it really knew better, and I liked letting it get on with things. Obviously it did not go so well, and I had to correct a lot. But I am learning, what can I say? And yes, I gave it many commandments like "thou shalt always test before releasing", and it sounded so convincing when it confirmed what an excellent idea that was that I was surprised, to say the least (imagine that), when something did not go as planned on prod because of, well, you know...

par1970•33m ago

Did you tell it that it should test, or did you have it generate actual tests that you could run if you wanted to?

beanjuiceII•1h ago

I am not sure better is the metric; that answer is definitely mostly no. But faster? Absolutely.

codingdave•1h ago

It is better at syntax and boilerplate. It writes cleaner code than I would have. But it is absolute shit at actually designing systems, in particular if you are integrating multiple platforms and stacks.

par1970•1h ago

What models + versions are you using?

Is it bad at designing systems that don't have a bunch of integrations?

valentinconan•1h ago

In my opinion, AI agents are currently just as capable as novice developers. Their main advantage is that they’re much faster than we are when the task involves generating a lot of code.

If the task is simple, I spend more time telling it what to do than doing it myself. But if the task is complex, I use certain skills/commands and create intermediate files (more than necessary) between each step (analysis, planning, design, workflow, and implementation) and clear the context between each of them. The result is fairly accurate, but not perfect.

My take is, we remain the architects of our code, and AI agents are an excellent tool that we need to master.

par1970•1h ago

If your project requires the solution of a tricky algorithmic issue, then is the AI system able to solve that part, or do you have to give it the solution?

valentinconan•26m ago

I haven't yet tried to solve truly complex algorithmic problems.

Generally speaking, if the problem is common, the model has likely already been trained to solve it.

If it's truly complex and/or specific to my needs, I can try using a reasoning model to think through a solution before moving on to implementation.

I use the agent to conduct research, find resources to understand the complexity, best practices, feedback, etc., and to write a Markdown analysis file on the topic.

Then I can use this file as a basis to precisely define what I want to do and brainstorm with the agent in thinking mode. The more the task is described and defined, the more accurate the result will be.

yibers•50m ago

Opus 4.6 (high) is doing things for me that I don't know how to do myself. Moreover, I don't fully understand what it did after it has done it. But it works. The domain is automated debugging and RE.

par1970•34m ago

How much domain experience do you have? Is it helping you solve problems for paying customers?

