Another conclusion is that we could benefit from benchmarks for architectural quality.
Once you’ve got the general gist of a solution, you can try coding it. Coding with no plan is generally a recipe for disaster (i.e., can you clearly answer “what am I trying to do?”).
Instant answers, correct or not.
Cheaper per answer by magnitudes.
Solutions provided with extensive documentation.
Solutions provided with extensive _made up_ documentation.
I'm still learning though
Ask it to write out the approach in a series of extensive markdown files that you will use to guide the build-out. Tell it to use checklists. Once you're happy with the full proposal, use @file mentions to keep the files in context as you prompt it through the steps. Works wonders.
It's important to have this self awareness. Don't let AI trick you into thinking it can build anything good. When starting a project like in the article, your time is probably better spent taking a step back, learning the finer points of the new language (like, from a book or proper training course) and going from there. Otherwise, you're going to be spending even more time debugging code you don't understand.
It's the same thing with a crappy consultant. It seems great to have someone build something for you, but you need to make preparations for when something breaks after their contract is terminated.
Overall, it makes you think: what is the point? We can usually find useful crowd-sourced code snippets online, on Stack Exchange, etc. We have to treat them the same way, but they're basically free compared to AI, and keeping the crowd-sourced aspect alive makes sure there's always documentation for future devs.
Seriously, though, within the context of software development, these are all issues I've encountered as well, and I don't know how to program: sweeping solutions, inability to resolve errors, breaking down all components to base levels to isolate problems.
But, again, I don't know how to program. For me, any consultant is better than no consultant. And like the author, I've learned a ton on how to ask for what I want out of Cursor.
Just this morning my CTO was crowing about how he was able to use Claude to modify the UI of one of our internal dev tools. He absolutely cannot wait to start replacing devs with AI.
Nobody wanted to hear it back when software development was easy street, but maybe we should have unionized after all...
> Nobody wanted to hear it back when software development was easy street, but maybe we should have unionized after all...
Thanks rugged libertarian individualists!
Software engineering is full of dumb people who think they're sooooo clever.
Then they put my code into ChatGPT or whatever they use and ask it to adapt it to their code.
After a while we (almost) all realized that was just producing a huge clusterfuck.
BTW, I think it would have been much better to start from scratch with their own implementation, given we're analyzing different datasets, and it might not make sense to try to convert code written for one dataset structure to another. A colleague didn't manage to draw a heatmap with my code and a simple CSV for God knows what reason. And I think just asking for a plot from scratch from a CSV would be quite easy for an LLM.
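Something like this minimal sketch is all that prompt would need to produce (the file name and column layout here are invented for illustration):

```python
# Rough sketch of "plot a heatmap from a CSV" -- the kind of thing an LLM can
# generate from scratch easily. File name and column layout are invented.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("measurements.csv", index_col=0)  # hypothetical numeric grid with row labels

fig, ax = plt.subplots()
im = ax.imshow(df.values, cmap="viridis", aspect="auto")
ax.set_xticks(range(len(df.columns)), labels=list(df.columns), rotation=90)
ax.set_yticks(range(len(df.index)), labels=list(df.index))
fig.colorbar(im, ax=ax, label="value")
fig.tight_layout()
plt.show()
```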
The amount of time I save just by not having to write tests or jsdocs anymore is amazing. Refactoring is amazing.
And that's just the code - I also use AI for video production, 3d model production, generating art and more.
Would you mind sharing more about your workflow with aider? Have you tried the `--watch-files` option? [0] What makes the architect mode [1] way better in your experience?
[0] https://aider.chat/docs/usage/watch.html
[1] https://aider.chat/docs/usage/modes.html#architect-mode-and-...
For most of the day I use Gemini Pro 2.5 in non-architect mode (or Sonnet when Gemini is too slow) and never really run into the issue of it making the wrong changes.
I suspect the biggest trick I know is being completely on top of the context for the LLM. I am frequently using /reset after a change and re-adding only relevant files, or allowing it to suggest relevant files using the repo-map. After each successful change if I'm working on a different area of the app I then /reset. This also purges the current chat history so the LLM doesn't have all kinds of unrelated context.
Edit: Also, the Aider leaderboards show the success rate for diff adherence separately; it's quite useful [1]
More reports of 'vibe-coding' causing chaos because one trusted what the LLM did and it 'checked' that the code was correct. [0] As always with vibe-coding:
Zero tests whatsoever. It's no wonder you see LLMs unable to understand the code they themselves wrote! (LLMs cannot reason.)
Vibe coding is not software engineering.
My own anecdote, with a codebase I'm familiar with, is that it is indeed a terrible architect, as the article mentions. The problem I was solving ultimately called for a different data structure, but it never had that realization, instead trying to fit the problem shape into an existing, suboptimal way to represent the data.
When I mentioned that this part of the code was memory-sensitive, it indeed wrote good code! ...for the bad data structure. It even included some nice tests that I decided to keep, including memory benchmarks. But the code was ultimately really bad for the problem.
This is related to the sycophancy problem. AI coding assistants bias towards assuming the code they're working with is correct, and that the person using them is also correct. But often neither is ideal! And you can absolutely have a model second-guess your own code and assumptions, but it takes a lot of persistent work because these damn things just want to be "helpful" all the time.
I say all of this as a believer in this paradigm and one who uses these tools every day.
> Do not assume the person writing the code knows what they are doing. Also do not assume the code base follows best practices or sensible defaults. Always check for better solutions/optimizations where it makes sense and check the validity of data structures.
Just a quick draft. It would probably need waaaaaay more refinement. But might this help at least mitigate the issue a bit?
I always think of AI as an overeager junior dev. So I tend to treat it that way when giving instructions, but even then...
... well, let's say the results are sometimes interesting.
No, this is way more fundamental than the sycophancy. It's related to the difficulty older AI had understanding "no".
Unless it has seen people recommending that code like yours be changed into a different version, it has no way to understand that the better code is equivalent.
That's why you should write tests before you write the code, so that you know what you expect the code under test to do, i.e. test-driven development.
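A minimal sketch of what I mean, in Python with pytest (the function and its behaviour are made up): the tests pin down the expected behaviour before any implementation exists.

```python
# Test-driven sketch with invented names: the tests are written first and
# define what the code under test is supposed to do.
import pytest

def test_slugify_lowercases_and_replaces_spaces():
    assert slugify("Hello World") == "hello-world"

def test_slugify_rejects_empty_input():
    with pytest.raises(ValueError):
        slugify("")

# Only after the tests exist does the implementation get written (by you or
# by an LLM), shaped to satisfy them.
def slugify(text: str) -> str:
    if not text:
        raise ValueError("empty input")
    return "-".join(text.lower().split())
```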
> And you can absolutely have a model second-guess your own code and assumptions, but it takes a lot of persistent work because these damn things just want to be "helpful" all the time.
No. Please do not do this. These LLMs have zero understanding / reasoning about the code they are outputting.
Recent example from [0]:
>> Yesterday I wanted to move 40GB of images from my QR menu site qrmenucreator . com from my VPS to R2
>> I asked gemini-2.5-pro-max to write a script to move the files
>> I even asked it to check everything was correct
>> Turns out for some reason the filenames got shortened somehow, which is a disaster because the QR site is quite basic and the image paths are written in the markdown of the menus
>> Of course the script already deleted 40GB of images from the VPS
>> But lesson learnt: be very careful with AI code, it made a mistake, couldn't even find the mistake when I asked it to double check the code, and because the ENDs of the filenames looked same I didn't notice it cut the beginnings off
>> And in this case AI can't even find its own mistakes
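The guardrail missing from that story is cheap to sketch: verify that every local file exists remotely, with the same name and checksum, before deleting anything. This is only an illustration — the paths are invented and the remote listing is left as a stub, not an actual R2 API call:

```python
# Sketch of a "verify before delete" pass for a migration like the one quoted
# above. Paths are invented; remote_checksums() is a stub you'd implement
# against your own storage (e.g. an R2/S3 listing), not a real API.
import hashlib
from pathlib import Path

LOCAL_DIR = Path("/var/www/qrmenu/images")  # hypothetical source directory

def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def remote_checksums() -> dict[str, str]:
    """Hypothetical stub: return {filename: sha256} for objects already uploaded."""
    raise NotImplementedError

def verify_all() -> list[Path]:
    remote = remote_checksums()
    local = [p for p in LOCAL_DIR.iterdir() if p.is_file()]
    missing = [p.name for p in local if remote.get(p.name) != sha256(p)]
    if missing:
        raise SystemExit(f"ABORT: {len(missing)} files not verified remotely, e.g. {missing[:5]}")
    return local  # only now is local deletion even considered

if __name__ == "__main__":
    print(f"{len(verify_all())} files verified; deleting locally would be safe.")
```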
Just like the 2010s, when dynamically typed languages crept into the backend along with low-quality code, we will now have vibe-coded low-quality software causing destruction, because its authors do not know what their code does and have not bothered to test it, or even know what to test for.
I've tried this too. They find ways to cheat the tests, sometimes throwing in special cases that match the specific test cases. It's easy to catch at the small scale, but not in a larger coding session.
> No. Please do not do this. These LLMs have zero understanding / reasoning about the code they are outputting.
This is incorrect. LLMs do have the ability to reason, but it's not the same reasoning that you or I do. They are actually quite good at checking for a variety of problems, like if the code you're writing is sensitive to memory pressure and you want to account for it. Asking them to examine the code with several constraints in mind often does give reasonable advice and suggestions to change. But this requires you to understand those changes to be effective.
When you hire a team of consultants, it is typically the case that you are doing so because you have an incomplete view of the problem and are expecting them to fill in the gaps for you.
The problem is that human consultants can be made to suffer certain penalties if they don't provide reasonable advice. A transformer model run in-house cannot experience this. You cannot sue yourself for fucking up your own codebase.
It would be interesting to see a LLM trained in a completely different way. There's got to be some tradeoff between how coherent the generations are and how interesting they are.
I also use it as a cognitive assistant. I've always found that talking about a design with a colleague helped me think more clearly and better organize my ideas, even with very little insight from the other side. In this case the assistant is often a bit lacking on the skepticism side, but it does not matter too much.
But back on point, I found AI works best when given a full set of guardrails around what it should do. The other day I put it to work generating copy for my website. Typically it will go off the deep end if you try to make it generate entire paragraphs but for small pieces of text (id say up to 3 sentences) it does surprisingly well and because it's outputting such small amounts of text you can quickly make edits to remove places where it made a bad word choice or didn't describe something quite right.
But I would say I only got ChatGPT to do this after uploading 3-4 large documents that outline my product in excruciating detail.
As for coding tasks, again it works great when given max guardrails. I had several pages that pulled strings from an object, and I wanted those strings put back in the code and taken out of the object. This object has ~500 lines in it, so it would have taken all day, but I ended up doing it in about an hour by having AI do most of the work and just going in after the fact and verifying. This worked really well, but I would caution folks that this was a very very specific use case. I've tried vibe coding once for shits and giggles and I got annoyed and stopped after about 10 minutes. IMHO, if you're a developer at the "Senior" level, dealing with AI output is more cumbersome than just writing the damn code yourself.
They emerge from the simple assumptions that:
- LLMs fundamentally pattern match bytes. It's stored bytes + user query = generated bytes.
- We have common biases and instinctively use heuristics. And we are aware of some of them. Like confirmation bias or anthropomorphism.
Some tricks:
1. Ask for alternate solutions or let them reword their answers. Make them generate lists of options.
2. When getting an answer that seems right, query for a counterexample or ask it to make the opposite case. This can sometimes help one remember that we're really just dealing with clever text generation. In other cases it can create tension (I need to research this more deeply or ask an actual expert). Sometimes it will solidify one of the two answers.
3. Write in a consistent and simple style when using code assistants. They are the most productive and reliable when used as super-auto-complete. They only see the bytes, they can't reason about what you're trying to achieve and they certainly can't read your mind.
4. Let them summarize previous conversations or a code module from time to time. Correct them and add direction whenever they are "off", either with prompts or by adding comments. They simply needed more bytes to look at to produce the right ones at the end.
5. Try to get wrong solutions. Make them fail from time to time, or ask too much of them. This develops an intuition for when these tools work well and when they don't.
6. This is the most important and reflected in the article: Never ask them to make decisions, for the simple fact that they can't do it. They are fundamentally about _generating information_. Prompt them to provide information in the form of text and code so you can make the decisions. Always use them with this mindset.
Also, there's a lot of value already in a crappy but fast and cheap consultant.
If I understand the problem well enough, and have a really good description of what I want, like I'm explaining it to a junior engineer, then they do an OK job at it.
At my last job, we had a coding "challenge" as part of the interview process, and there was a really good readme that described the problem, the task, and the goal, which we gave the candidate at the start of the session. I copy/pasted that readme into copilot, and it did as good a job as any candidate we'd ever interviewed, and it only took a few minutes.
But whenever there are any unknowns or vagaries in the task, or I'm exploring a new concept, I find the AIs to be more of a hindrance. They can sometimes get something working, but not very well, or the code they generate is misleading or takes me down a blind path.
The thing for me, though, is I find writing a task for a junior engineer to understand to be harder than just doing the task myself. That's not the point of that exercise, though, since my goal is to onboard and teach the engineer how to do it, so they can accomplish it with less hand-holding in the future, and eventually become a productive member of the team. That temporary increase in my work is worth it for the future.
With the AI, though, it's not going to learn to be better; I'm not teaching it anything. Every time I want to leverage it, I have to go through the harder task of clearly defining the problem and the goal for it.
I have been thinking about this angle a lot lately and realizing how much of a skill it is to write up the components and description of a task thoroughly and accurately. I’m thinking people who struggle with this skill are having a tougher time using LLMs effectively.
echelon•3h ago
AI code completion is god mode. While I seldom prompt for new code, AI code autocompletion during refactoring is 1000x faster than plumbing fields manually. I can do extremely complicated and big refactors with ease, and that's coming from someone who made big use of static typing, IDEs, and AST-based refactoring. It's legitimately faster than thought.
And finally, it's really nice to ask about new APIs, or pose questions you would normally pore over docs for, or Google and find answers to on Stack Overflow. It's so much better and faster.
We're watching the world change in the biggest way since smartphones and the internet.
AI isn't a crappy consultant. It's an expansion of the mind.
skydhash•3h ago
Unless you know Vim!
bayindirh•3h ago
or the IDE (or text editor, for that matter) well. People don't want to spend time understanding, appreciating, and learning the tools they use, and then call them useless...
bayindirh•2h ago
I don't bend the tool, even. It's what it's designed to do.
bayindirh•3h ago
Tech is useful, how it's built is very unethical, and how it's worshiped is sad.
bayindirh•2h ago
As I said, the network doesn't carry citation/source information. IOW, when it doesn't use a tool, it can't know where it ingested that particular piece of information.
This is a big no no, and it's the same reason they hallucinate and they'll continue doing that.
As a tangent, I see AI agents hit my digital garden for technical notes, and I'll probably add Anubis in front of that link in short order.
[0]: https://news.ycombinator.com/item?id=43972807
ninetyninenine•2h ago
On the opposite end of the spectrum of worshippers there are naysayers and deniers. It’s easy to see why there are delusional people at both ends of the spectrum.
The reason is that the promise of AI both heralds an amazing future of machines and a horrible future where machines surpass humanity.
bayindirh•2h ago
For the third time [1] [2], I'll draw the line between the core network and the tools the core network uses. Agents are nothing new, and they expand the capabilities of LLMs, yes, that's true. But they still can't answer the question "how did you generate this code, and which source repositories did you use?" when the LLM didn't use any tools.
The core network doesn't store citation/source information. It's not designed and trained in a way to do that.
geez.
[0]: https://notes.bayindirh.io/notes/Lists/Discussions+about+Art...
[1]: https://news.ycombinator.com/item?id=43972807
[2]: https://news.ycombinator.com/item?id=43972892
ninetyninenine•28m ago
Second, the question you brought up can't be answered even by a human. It's a stupid question, right? You blindfold a human, prevent him from using any tools, and then ask him what tools he used? What do you expect will happen? Either the human will lie to you about what he did or tell you what he didn't do. No different from an LLM.
The core network doesn't store anything except a generalization curve. Similar to your brain. You didn't store those references in your brain, right? You looked that shit up. The agentic LLM will do the same, and the UI literally tells you it's doing a search across websites.
Geeze.
skydhash•2h ago
Even on a greenfield project, I rarely spend more than a day setting up the scaffolding and that’s for something I’ve not touched before. And for refactoring and tests, this is where Vim/Emacs comes in.
cess11•2h ago
I've been surprised by this for a long time, having seen coworkers spend days typing in things manually that they could have put there with IDE capabilities, search-replace, find -exec or a five minute script.
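For example, the kind of five-minute script I mean, sketched in Python (the identifier being renamed and the file glob are invented):

```python
# Five-minute bulk search-and-replace across a source tree. The old/new names
# and the glob are invented for illustration.
import re
from pathlib import Path

PATTERN = re.compile(r"\bgetUserById\b")  # hypothetical old name
REPLACEMENT = "fetch_user_by_id"          # hypothetical new name

changed = 0
for path in Path("src").rglob("*.py"):
    text = path.read_text(encoding="utf-8")
    new_text = PATTERN.sub(REPLACEMENT, text)
    if new_text != text:
        path.write_text(new_text, encoding="utf-8")
        changed += 1

print(f"rewrote {changed} files")
```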
unyttigfjelltol•3h ago
AI accelerates complex search 10x or maybe 100x, but still will occasionally respond to recipe requests by telling you to just substitute some anti-matter for extra calories.
bayindirh•2h ago
or emit (or spew) pages of training data or unrelated output when you ask it to "please change all headers to green", which I experienced recently.
ninetyninenine•2h ago
What you’re referring to is popular opinion. AI has become so pervasive in our lives that we are used to it, and the magnitude of the achievement has been lost on us. The fact that it went from "stochastic parrot" to "idiot savant" to "crappy consultant" comes from people being in denial about reality and then slowly coming to terms with it.
In the beginning literally everyone on HN called it a stochastic parrot with the authority of an expert. Clearly they were all wrong.
daveguy•1h ago
* where "polly-want-a-cracker" is some form of existing, common fizz-buzz-ish code.