It’s kind of crazy to me how the cool kid take on software development, as recently as 3 years ago, was: strictly-typed everything, ‘real men’ don’t use garbage collection, everything must be optimized to death even when it isn’t really necessary, etc., and now it seems to be ‘you don’t seriously expect me to look at every single line of code I submit, do you?’
What’s changed isn’t that the same engineers did a 180 on principles, it’s that the discourse got hijacked by a new set of people who think shipping fast with AI is cooler than sweating over type systems. The obsession with performance purity was always more of a niche cultural flex than a universal law, and now the flex du jour is “look how much I can outsource to the machine.”
Your read on the situation concurs with mine. Cheers.
I'm using TypeScript and Rust and I think it's critical to use strict typing with LLMs to catch simple bugs.
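For instance, a minimal TypeScript sketch of the kind of simple bug strict typing surfaces (the `User` type and field names are made up, and `"strict": true` in tsconfig.json is assumed):

```typescript
// Hypothetical domain type; TypeScript strict mode assumed.
interface User {
  id: string;
  createdAt: Date;
}

function daysSinceSignup(user: User): number {
  // A generated line that guesses at a snake_case field never reaches review:
  // return (Date.now() - user.created_at.getTime()) / 86_400_000;
  //                           ^ error TS2339: Property 'created_at' does not exist on type 'User'.
  return (Date.now() - user.createdAt.getTime()) / 86_400_000;
}
```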
I've worked at Uber as an infra engineer and at Gem as an engineering manager so I do consider myself an "actual professional developer". The critical bit is the context of the project I'm working on. If I were at a tech company building software, I'd be much more reticent to ship AI generated PRs whole cloth.
Having to instead express all that (including the business-related part, since the agent has no context of that) in a verbose language (English) feels counter-productive, and is counter-productive in my experience.
I've successfully one-shotted easy self-contained, throwaway tasks ("make me a program that fills Redis with random keys and values" - Claude will one-shot that) but when it comes to working with complex existing codebases I've never seen the benefits - having to explain all the context to the agent and correcting its mistakes takes longer than just doing it myself (worse, it's unpredictable - I know roughly how long something will take, but it's impossible to tell in advance whether an agent will one-shot it successfully or require longer babysitting than just doing it manually from the beginning).
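For concreteness, a minimal sketch of that kind of throwaway one-shot script (assuming the node-redis client and a local Redis instance; the key count and value sizes are arbitrary):

```typescript
// fill-redis.ts — throwaway script: fill Redis with random keys and values.
// Assumes `npm i redis` (node-redis v4+) and a local Redis on the default port.
import { createClient } from "redis";
import { randomBytes } from "node:crypto";

async function main(): Promise<void> {
  const client = createClient(); // defaults to redis://localhost:6379
  await client.connect();

  for (let i = 0; i < 100_000; i++) {
    const key = `key:${randomBytes(8).toString("hex")}`;
    const value = randomBytes(32).toString("hex");
    await client.set(key, value);
  }

  await client.quit();
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```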
IME it's faster to not try to edit the same code in parallel because of the cost of merging.
The check-ins are much more frequent and the instructions much lower level than what you’d give to a team if you were running it.
Do you have an example of a large application you’ve released with this methodology that has real paying users that isn’t in the AI space?
I have tried READMEs scattered through the codebase but I still have trouble keeping the agent aware of the overall architecture we built.
The disk in question was an HDD and the problem disappeared (or is better hidden) after symlinking the log dir to an SSD.
As for code itself, I've never had an issue with slowness. If anything it's the verbosity of wanting to explain itself and excess logging in the code it creates.
Currently they're better at locating problems than fixing them without direction. Gemini seems smarter and better at architecture and best practices. Claude seems dumber but is more focused on getting things done.
The right solution is going to be a variety of tools and LLMs interacting with each other. But it's going to take real humans having real experience with LLMs to get there. It's not something that you can just dream up on paper and have it work out well since it depends so much on the details of the current models.
> very good at writing design docs
sorry if this is a newbie-ish question, but where does the information for design docs come from? From 'product briefs' or something else?
Initially I would barely read any of the code generated and as my project has grown in size, I have approached the limits of that approach.
Often because Claude Code makes very poor architectural choices.
I call this the "Goldilocks" problem. The task has to be large enough that it outweighs the time necessary to write out a sufficiently detailed specification AND to review and fix the output. It has to be small enough that Claude doesn't get overwhelmed.
The issue with this is, writing a "sufficiently detailed specification" is task dependent. Sometimes a single sentence is enough, other times a paragraph or two, sometimes a couple of pages is necessary. And the "review and fix" phase again is totally dependent and completely unknown. I can usually estimate the spec time but the review and fix phase is a dice roll dependent on the output of the agent.
And the "overwhelming" metric is again not clear. Sometimes Claude Code can crush significant tasks in one shot. Other times it can get stuck or lost. I haven't fully developed an intuition for this yet, how to differentiate these.
What I can say, this is an entirely new skill. It isn't like architecting large systems for human development. It isn't like programming. It is its own thing.
You articulated what I was wrestling with in the post perfectly.
The big issue is that, even though there is a logical side to it, part of it is adapting to a closed system that can change under your feet. New model, new prompt, there goes your practice.
Absolutely. And what I find fascinating is that this experience is highly personal. I read probably 876 different “How I code with LLMs” posts and I can honestly say not a single thing I read and tried (and I tried A LOT) “worked” for me…
There is maybe some truth to the LLM vibe coding and there maybe is some truth to the “old guard” saying “this is shit”, because this might be shit for very good reasons.
There seem to be two camps:

- those fighting HARD to tell you at the top of their lungs “oh this is sh*t, I tried it and it is baaaad”
- those going “hmmm let me see how I can learn etc to get to the point where I am also a lot more productive, if ____ and ____ can learn it so can I…”
You always want to be in the second camp…
EDIT: typo
I do this in a task_description.md file and I include the clarifications in their own section (the files follow a task.template.md format).
I think it's undeniable that in narrow well controlled use cases the AI does give you a bump. Once you move beyond that though the time you have to spend on cleanup starts to seriously eat into any efficiency gains.
And if you're in a domain you know very little about, I think any use case beyond helping you learn a little quicker is a net negative.
It's management!
I find myself asking very similar questions to you: how much detail is too much? How likely is this to succeed without my assistance? If it does succeed, will I need to refactor? Am I wasting my time delegating or should I just do it?
It's almost identical to when I delegate a task to a junior... only the feedback cycle of "did I guess correctly here" is a lot faster... and unlike a junior, the AI will never get better from the experience.
In my experience, the real "pain" of programming lies in forcing yourself to absorb a flood of information and connecting the dots. Writing code is, in many ways, like taking a walk: you engage in a cognitively light activity that lets ideas shuffle, settle, and mature in the background.
When LLMs write all the code for you, you lose that essential mental rest: the quiet moments where you internalize concepts, spot hidden bugs, and develop a mental map of the system.
If you have a really detailed, well thought out spec, you do TDD and you have regular code review and refactor loops, agentic coding stays manageable.
There is way too much babysitting with these things.
I’m sure somehow somebody makes it work but I’m incredibly skeptical that you can let an LLM run unsupervised and only review its output as a PR.
> The amount of time writing that spec can take more time than just doing it by hand.
one thing about doing it by hand is you also notice holes/deficiencies in the spec and can go back and update it, make the product better, but just throwing it to an llm 'til it's perfect-to-spec probably means it's just going to be average quality at best... tho tbh most software isn't really 'stunning' imo so maybe that's fine as far as most businesses are concerned... (sad face)
But AI reviewers can do little beyond checking coding standards.
Writing code is my favorite part of the job, why would I outsource it so I can spend even more time reading and QAing?
People who vibe code don't care about the code, but about producing something that delivers value, whatever that may be. Code is just an intermediate artifact to achieve that goal. ML tools are great for this.
People who program care about the code. They want to understand how it works, what it does, in addition to whether it achieves what they need. They may also care about its quality, efficiency, maintainability, and other criteria. ML tools can be helpful for programming, but they're not a panacea. There is no shortcut for building robust, high quality software. A human still needs to understand whatever the tool produces, and ensure that it meets their quality criteria. Maybe this will change, and future generations of this tech will produce high quality software without hand-holding, but frankly, I wouldn't bet on the current approaches to get us there.
When I write a piece of code that is elegant, efficient, and -- "right" -- I get a dopamine rush, like I finished a difficult crossword puzzle. Seems like that joy is going to go away, replaced by something more akin to developing a good relationship with a slightly quirky colleague who happens to be real good (and fast) at some things -- especially things management likes, like N LOC per week -- but this colleague sucks up to everyone, always thinks they have the right answer, often seems to understand things on a superficial level, and oh -- works for $200 / month...
Shades of outsourcing to other continents...
I have an emerging workflow orchestrated by Claude Code custom commands and subagents that turns even an informal description of a feature into a full-fledged PRD, then an "architect" command researches and produces a well thought out and documented technical design. I can review that design document and then give it to the "planner" command, which breaks it down into Phases and Tasks. Then I have a "developer" command iterate through and implement the Phases one by one. After each phase it runs a detailed code review using my "review" subagent.
Since I've started using this document-driven, guided workflow I've seen quality of the output noticeably improve.
My pattern with claude code is to let stuff simmer in the background with a detailed PRD, and just turn the screws with progressively more testing and type checking. I'll use repomix to put my entire codebase into gemini 2.5 pro, chat with it for a bit and then ask it to generate a highly detailed work plan for claude code to make the codebase more production hardened/launch ready. If I don't burn my plan tokens first, that gemini prompt can keep claude running for like ~3 hours usually. If you repeat this gemini plan -> claude implement step a few times gemini will eventually start to tell you to stop being a chicken and launch your great app.
My hunch is that good automated testing is an enormous factor with respect to how productive you can get with coding agent tools.
Thorough tests? Just like working without LLMs you can confidently make changes without fear of breaking other parts of the application.
No tests at all? Any change you make is a roll of the dice with respect to how it affects the rest of your existing code.
I don’t find the issue to be breaking other parts of the app, more so that new features don’t work as advertised by Claude.
One of my takeaways here is that I should give Claude an integration test harness and tell it that it must finish running that successfully before committing any code.
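Something along these lines, as a minimal sketch (assuming Node 18+ and the built-in node:test runner; the endpoint and response shape are hypothetical):

```typescript
// integration.test.ts — the kind of gate the agent would have to pass before committing.
import { test } from "node:test";
import assert from "node:assert/strict";

test("health endpoint answers before any commit is allowed", async () => {
  // Hypothetical local server started by the harness beforehand; global fetch needs Node 18+.
  const res = await fetch("http://localhost:3000/health");
  assert.equal(res.status, 200);

  const body = (await res.json()) as { status: string };
  assert.equal(body.status, "ok");
});
```

The rule itself ("run the integration suite and only commit on a clean pass") would then live in CLAUDE.md or the task prompt.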
I now think TDD can play a big part. I don’t have much of a background in unit testing. For a recent TypeScript utility mini project, I took an outside-in approach using mocks where necessary. This started as a prototyping and modelling phase, getting the design right before committing to implementation code. This was about refining the types and function signatures, and mocking the components that didn’t exist at that point. The LLM didn’t have involvement at this stage, as it was about the problem domain, the shape and flow of the data. Moving on from there, I was able to save a lot of time because SuperMaven in Cursor had enough context and understanding at that point to make very precise guesses about what I wanted, so I could tab autocomplete through a reasonable amount of boilerplate implementation code. I was also able to get away with writing a couple of happy path tests for most components, and get the agentic LLM to generate sad path tests. Most of which I kept, including one that smoked out a flaw in my design.
That’s essentially the process I’m gravitating towards. Human begins the process, models the design, sets the constraints, and then the LLM saves time in a limited and supervised way whilst being kept on a short leash.
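As a rough illustration of that outside-in shape (all names here are hypothetical), the dependency starts life as a hand-rolled fake until the real component exists:

```typescript
import { test } from "node:test";
import assert from "node:assert/strict";

// Contract settled during the modelling phase; no real implementation yet.
interface RateSource {
  rateFor(currency: string): Promise<number>;
}

// Function under design: signature and data flow fixed before implementation.
async function convert(amount: number, currency: string, rates: RateSource): Promise<number> {
  return amount * (await rates.rateFor(currency));
}

test("happy path: converts using the supplied rate", async () => {
  const fakeRates: RateSource = { rateFor: async () => 2 };
  assert.equal(await convert(100, "EUR", fakeRates), 200);
});
```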
I do not enjoy spelling out tasks in English and checking that they are done correctly.
Waiting for an AI to complete its task isn't a fun thing at all, and I'd choose the fast 70% correct response any day over the slow 90% correct one. Because by the time the slow one gives you its first attempt, you'd have clarified your need and fixed the output from the fast one.
Sure if we get to the point where the slow system is 100% right, then it's no big deal if it's slow, but we're still far from that point.
Nonetheless its ability to produce code that works is impressive; it's useful for learning, for generating throwaway code...
For example I can ask for a piece of code generating stats from logs. The code is not meant to last and will have few users (the devs), so maintainability is not an issue.
But the insights gleaned from that battle are (for Claude) lost forever as soon as I start on a new task.
The way LLMs (fail to) handle memory and in-situ learning (beyond prompt engineering and working within the context window) is just clearly deficient compared to how human minds work.
I dunno.
However AIs are great for quickly learning how to use external tools/libraries (like JasperReports) and for quickly writing parser functions.
It is like any other tool: good for some things bad for others.
falcor84•5mo ago
I don't get this, how many git hooks do you need to identify that Claude had hallucinated a library feature? Wouldn't a single hook running your tests identify that?
AstroBen•5mo ago
Works every time
manmal•5mo ago
• Good news! The code is compiling successfully (the errors shown are related to an existing macro issue, not our new code).
When in fact, it managed to insert 10 compilation errors that were not at all related to any macros.
jmvldz•5mo ago
My workflow is often to plan with ChatGPT and what I was getting at here is ChatGPT can often hallucinate features of 3rd party libraries. I usually dump the plan from ChatGPT straight into Claude Code and only look at the details when I'm testing.
That said, I've become more careful in auditing the plans so I don't run into issues like this.