My own experience using Claude Code and similar tools tells me that "hidden requirements" could include:
* Make sure DESIGN.md is up to date
* Write/update tests after changing source, and make sure they pass
* Add integration test, not only unit tests that mock everything
* Don't refactor code that is unrelated to the current task
...
These are not even project/language specific instructions. They are usually considered common sense/good practice in software engineering, yet I sometimes had to almost beg coding agents to follow them. (You want to know how many times I have to emphasize don't use "any" in a TypeScript codebase?)
People should just admit it's a limitation of these coding tools, and we can still have a meaningful discussion.
Humans would execute that code and validate it. From plausible it'd becomes hey, it does this and this is what I want. LLMs skip that part, they really have no understanding other than the statistical patterns they infer from their training and they really don't need any for what they are.
What a shame your human reasoning and "true understanding" led you astray here.
They are buying a service. As long as the service 'works' they do not care about the other stuff. But they will hold you liable when things go wrong.
The only caveat is highly regulated stuff, where they actually care very much.
Just copy and paste from an open source relational db repo
Easy. And more accurate!
Cherry picked AI fail for upvotes. Which you’ll get plenty of here an on Reddit from those too lazy to go and take a look for themselves.
Using Codex or Claude to write and optimize high performance code is a game changer. Try optimizing cuda using nsys, for example. It’ll blow your lazy little brain.
A few tips for a quickstart:
Give yourself permission to play.
Understand basic concepts like context window, compaction, tokens, chain of thought and reasoning, and so on. Use AI to teach you this stuff, and read every blog post OpenAI and Anthropic put out and research what you don't understand.
Pick a hard coding problem in Python or Typescript and take a leap of faith and ask the agent to code it for you.
My favorite phrase when planning is: "Don't change anything. Just tell me.". Save this as a tmux shortcut and use it at the end of every prompt when planning something out.
Use markdown .md docs to create a planning doc and keep chatting to the agent about it and have it update the plan until you're super happy, always using the magic phrase "Don't change anything. Just tell me." (I should get myself a patent on that little number. Best trick I know)
Every time you see an anti-AI post, just move on. It's lazy people making lazy assumptions. Approach agentic coding with a sense of love, excitement, optimism, and take massive leaps of faith and you'll be very very surprised at what you find.
Best of luck Serious Angel.
Your answer is to play with it. Cool. But why cant you and others put together a proper guide lol? It cant be that hard.
Go ahead and do it - it'll challenge the Anti-AI posters you are referencing. I and others want to see that debate.
One of the rare resources I found recently was the OpenClaw guys interview on Lex. He drops a few bangers that are really valuable and will save you having to spend a long time figuring it out.
Also there's a very strong disincentive for anyone to write right now because we're competing against the noise and the slop in the space. So best to just shut the fuck up and create as fast as we can, and let the outcome speak for itself. You're going to see a lot more products like OpenClaw where the pace of innovation is rapid, and the author freely admits that they're coding agentically and not writing a single line.
I think the advantage that Peter has (openclaw author) is that he has enough money and success to not give a fuck about what people say re him writing purely agentically, so he's been very open about it which has been great for others who are considering doing the same.
But if you have a software engineering career or are a public figure with something to lose, you tend to STFU if you're doing pure agentic coding on a project.
But that'll change. Probably over the next few months. OpenClaw broke the ice.
Electronic synthesisers went from "it's a piano, but expensive and sounds worse" to every weird preset creating a whole new genre of electronic music.
So it seems plausible, like Claude's code, that our complaints about unmaintainable code are from trying to use it like a piano, and the rave kids will find a better use for it.
This series of articles is gold.
Unsurprisingly, writing good software with AI follows the same principles as writing it without AI. Keep scopes small. Ship, refactor, optimize, and write tests as you go.
You're glossing over so much stuff. Moreover, how does the Junior grow and become the senior with those characteristics, if their starting point is LLMs?
Related:
- <http://archive.today/2026.03.07-020941/https://lr0.org/blog/...> (I'm not consulting an LLM...)
- <https://web.archive.org/web/20241021113145/https://slopwatch...>
If they implement something with a not-so-great approach, they'll keep adding workarounds or redundant code every time they run into limitations later.
If you tell them the code is slow, they'll try to add optimized fast paths (more code), specialized routines (more code), custom data structures (even more code). And then add fractally more code to patch up all the problems that code has created.
If you complain it's buggy, you can have 10 bespoke tests for every bug. Plus a new mocking framework created every time the last one turns out to be unfit for purpose.
If you ask to unify the duplication, it'll say "No problem, here's a brand new metamock abstract adapter framework that has a superset of all feature sets, plus two new metamock drivers for the older and the newer code! Let me know if you want me to write tests for the new adapters."
Nevermind the fact that it only migrated 3 out of 5 duplicated sections, and hasn’t deleted any now-dead code.
You need to do this when coding manually as well, but the speed at which AI tools can output bad code means it's so much more important.
I think programming is giving people a false impression on how intelligent the models are, programmers are meant to be smart right so being able to code means the AI must be super smart. But programmers also put a huge amount of their output online for free, unlike most disciplines, and it's all text based. When it comes to problem solving I still see them regularly confused by simple stuff, having to reset context to try and straighten it out. It's not a general purpose human replacement just yet.
Are you using plan mode? I used to experience the do a poor approach and dig issue, but with planning that seems to have gone away?
It's a tool. It's a wildly effective and capable tool. I don't know how or why I have such a wildly different experience than so many that describe their experiences in a similar manner... but... nearly every time I come to the same conclusion that the input determines the output.
> If they implement something with a not-so-great approach, they'll keep adding workarounds or redundant code every time they run into limitations later.
Yes, when the prompt/instructions are overly broad and there's no set of guardrails or guidelines that indicate how things should be done... this will happen. If you're not using planning mode, skill issue. You have to get all this stuff wrapped up and sorted before the implementation begins. If the implementation ends up being done in a "not-so-great" approach - that's on you.
> If you tell them the code is slow
Whew. Ok. You don't tell it the code is slow. Do you tell your coworker "Hey, your code is slow" and expect great results? You ask it to benchmark the code and then you ask it how it might be optimized. Then you discuss those options with it (this is where you do the part from the previous paragraph, where you direct the approach so it doesn't do "no-so-great approach") until you get to a point where you like the approach and the model has shown it understands what's going on.
Then you accept the plan and let the model start work. At this point you should have essentially directed the approach and ensured that it's not doing anything stupid. It will then just execute, it'll stay within the parameters/bounds of the plan you established (unless you take it off the rails with a bunch of open ended feedback like telling it that it's buggy instead of being specific about bugs and how you expect them to be resolved).
> you can have 10 bespoke tests for every bug. Plus a new mocking framework created every time the last one turns out to be unfit for purpose.
This is an area I will agree that the models are wildly inept. Someone needs to study what it is about tests and testing environments and mocking things that just makes these things go off the rails. The solution to this is the same as the solution to the issue of it keeping digging or chasing it's tail in circles... Early in the prompt/conversation/message that sets the approach/intent/task you state your expectations for the final result. Define the output early, then describe/provide context/etc. The earlier in the prompt/conversation the "requirements" are set the more sticky they'll be.
And this is exactly the same for the tests. Either write your own tests and have the models build the feature from the test or have the model build the tests first as part of the planned output and then fill in the functionality from the pre-defined test. Be very specific about how your testing system/environment is setup and any time you run into an issue testing related have the model make a note about that and the solution in a TESTING.md document. In your AGENTS.md or CLAUDE.md or whatever indicate that if the model is working with tests it should refer to the TESTING.md document for notes about the testing setup.
Personally, I focus on the functionality, get things integrated and working to the point I'm ready to push it to a staging or production (yolo) environment and _then_ have the model analyze that working system/solution/feature/whatever and write tests. Generally my notes on the testing environment to the model are something along the lines of a paragraph describing the basic testing flow/process/framework in use and how I'd like things to work.
The more you stick to convention the better off you'll be. And use planning mode.
It's probably a good idea to improve your test suite first, to preserve correctness.
Anything they happen to get "correct" is the result of probability applied to their large training database.
Being wrong will always be not only possible but also likely any time you ask for something that is not well represented in it's training data. The user has no way to know if this is the case so they are basically flying blind and hoping for the best.
Relying on an LLM for anything "serious" is a liability issue waiting to happen.
If you've made a significant investment in human capital, you're even more likely to protect it now and prevent posting valuable stuff on the web.
This means an LLM can autogenerated millions of code problem prompts, attempt millions of solutions (both working and non-working), and from the working solutions, penalize answers that have poor performance. The resulting synthetic dataset can then be used as a finetuning dataset.
There are now reinforcement finetuning techniques that have not been incorporated into the existing slate of LLMs that will enable finetuning them for both plausibility AND performance with a lot of gray area (like readability, conciseness, etc) in between.
What we are observing now is just the tip of a very large iceberg.
If Im the govt, Id be foaming at the mouth - those projects that used to require enormous funding now will supposedly require much less.
Hmmm, what to do? Oh I know. Lets invest in Digital ID-like projects. Fun.
https://news.ycombinator.com/item?id=47280645
It is more about LLMs helping me understand the problem than giving me over engineered cookie cutter solutions.
idk what to say, just because it's rust doesn't mean it's performant, or that you asked for it to be performant.
yes, llms can produce bad code, they can also produce good code, just like people
I don't always write correct code, either. My code sure as hell is plausible but it might still contain subtle bugs every now and then.
In other words: 100% correctness was never the bar LLMs need to pass. They just need to come close enough.
Write a lambda that takes an S3 PUT event and inserts the rows of a comma separated file into a Postgres database.
Naive implementation: download the file from s3 and do a bulk insert - it would have taken 20 minutes and what Claude did at first.
I had to tell it to use the AWS sql extension to Postgres that will load a file directly from S3 into a table. It took 20 seconds.
I treat coding agents like junior developers.
They just write code that is (semantically) similar to code (clusters) seen in its training data, and which haven't been fenced off by RLHF / RLVR.
This isn't that hard to remember, and is a correct enough simplification of what generative LLMs actually do, without resorting to simplistic or incorrect metaphors.
marginalia_nu•2h ago
No exaggeration it floundered for an hour before it started to look right.
It's really not good at tasks it has not seen before.
tartoran•2h ago
vdfs•2h ago
mekael•2h ago
tartoran•2h ago
marginalia_nu•2h ago
Interesting shortcoming, really shows how weak the reasoning is.
cat_plus_plus•2h ago
comex•2h ago
Opus would probably do better though.
tartoran•2h ago
msephton•2h ago
comex•2h ago
Whatever the cause, LLMs have gotten significantly better over time at generating SVGs of pelicans riding bicycles:
https://simonwillison.net/tags/pelican-riding-a-bicycle/
But they're still not very good.
tartoran•1h ago
boxedemp•37m ago
tempest_•2h ago
astrange•2h ago
jshmrsn•2h ago
Given a harness that allows the model to validate the result of its program visually, and given the models are capable of using this harness to self correct (which isn't yet consistently true), then you're in a situation where in that hour you are free to do some other work.
A dishwasher might take 3 hours to do for what a human could do in 30 minutes, but they're still very useful because the machine's labor is cheaper than human labor.
marginalia_nu•2h ago
TBH I would have just rendered a font glyph, or failing that, grabbed an image.
Drawing it with vector graphics programmatically is very hard, but a decent programmer would and should push back on that.
zeroxfe•2h ago
If an LLM did that, people would be all up in arms about it cheating. :-)
For all its flaws, we seem to hold LLMs up to an unreasonably high bar.
marginalia_nu•1h ago
Just about anyone can eventually come up with a hideously convoluted HeraldicImageryEngineImplFactory<FleurDeLis>.
ehnto•2h ago
I think some industries with mostly proprietary code will be a bit disappointing to use AI within.
internet2000•1h ago
It basically just re-created the wikipedia article fleur-de-lis, which I'm not sure proves anything beyond "you have to know how to use LLMs"
robertcope•33m ago