Compared to the original simple HTML site, it’s really surprising to see this from the grugbrain.dev author!
it is using astro, we are scaling down the use of tailwind (I wanted to give it a try, but didn't really click with it.)
I don't mind someone doing something kind of fun with the website and trying something new out, I know some people don't like it but some people do. All good.
clear text with minimal markup has many desirable properties IMHO
It gets pretty far toward the solution on its own and quickly, but then you spend time adjacent to the problem, building out its cage while iterating through the remainder of the solution.
LLM, being a tiresome little helper, will gladly output hundreds of lines, hacks, and what have you.
I don’t think any amount of tests, prompts, harnesses and other “my shaman is a better shaman” will help it to acquire this trait. Some other AI architecture someday maybe — just not today.
And that’s why it is good at what it is and really bad at stuff like code “design” (unless it is a well-known solution being baked in the training set)
We actually have pretty good models for how long it takes to forget things. It's the same basic math that powers Anki. To oversimplify, if you force yourself to remember something right before you would have otherwise forgotten it, you will remember it roughly 2.5 times as long before forgetting it again. (This changes at both the shortest time intervals and the longer ones, so treat it as a rough rule of thumb, not an exact formula.)
But this provides a handy bound! If you've been doing something professionally for 20 years, you should expect to remember it for another 50. At which point you're likely well into old age, and memory performance may decrease for other reasons.
Where AI kills you is actually at the other end: initial learning. You are much less likely to need to recall something after 1 day, 2.5 days, 6.25 days, etc. And thanks to the lack of the "testing effect", memory formation will be much weaker.
In other words, I would naively expect AI to make long-used skills a bit rusty, but to drastically impede formation of new skills and knowledge.
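The 2.5x rule of thumb from the comment above can be sketched in a few lines. This is only an illustration of the arithmetic, not Anki's actual scheduler; `MULTIPLIER`, `review_intervals`, and `expected_retention_years` are hypothetical names invented for the example.

```python
# Illustrative sketch of the "roughly 2.5x per successful recall" rule of thumb.
# Assumption: a fixed multiplier, which the comment itself says is an oversimplification.
MULTIPLIER = 2.5


def review_intervals(first_interval_days: float, reviews: int) -> list[float]:
    """Return the gap in days before each of the next `reviews` recalls."""
    intervals = []
    gap = first_interval_days
    for _ in range(reviews):
        intervals.append(gap)
        gap *= MULTIPLIER  # each successful recall stretches the next gap
    return intervals


def expected_retention_years(years_practiced: float) -> float:
    """Rough bound: time you should expect to keep remembering a long-used skill."""
    return years_practiced * MULTIPLIER


print(review_intervals(1.0, 4))      # [1.0, 2.5, 6.25, 15.625]
print(expected_retention_years(20))  # 50.0
```

The early gaps (1, 2.5, 6.25 days) are the ones where a forced recall is least likely to happen if a tool answers for you, which is the point being made about initial learning.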
It’s not human, of course, and I think this problem actually relates to the fact that LLMs don’t have a world model. They don’t study and think through a design in the way that humans do. They don’t form a mental model of how everything fits together and how that design can be tweaked to most elegantly support a change.
I suspect that this is a fundamental limitation of LLMs, and that design will remain a weak point until some sort of bespoke design AI is bolted onto the side. In the meantime, we’ve got a lot of people producing a lot of code very quickly, and I think the debt in that code is going to be a millstone around our necks for a long time to come.
I disagree. Have a conversation with it about your problem and work through design decisions with it. When I do that, I find it gives me a lot of good ideas.
Disclaimer: I'm not working on anything groundbreaking (like most people)
Nobody knows everything, so of course LLMs can be useful sometimes. More useful than plain old search, books, or even discussion with real humans? Maybe.
Search can offer a much broader context than an LLM hyperfocused on just generating text. Books may lead you to realize you were asking the wrong questions. Discussions will provide an overall "vibe" of the topic.
These are not competing options. We can and should be using all of them when possible.
Many developer criticisms of AI coders could easily be directed at 95%+ of human developers. Much coding is monkey see, monkey do, and keep trying until it does the things we want it to do. AI can certainly do that cheaper and faster, and really this is why automated testing became such an important software discipline, with or without AI.
The second issue is: what was the tooling and the prompt approach?
(To be clear, I have no problem with the premise of the write-up. But without some details like this, it's sort of like saying "I had a bad board on my deck, and my tape measure wasn't able to help me remove the nails. What a bad tape measure.")
The series of prompts wasn't particularly interesting or innovative on my part: a paste-in of the user report, then a few back-and-forths on fixing it, with me reviewing the changes and coming up with the final answer.
Shameless plug: https://open.substack.com/pub/deimos28/p/the-friction-collap...
i tried it before with sonnet and the results weren't very good
went back to react
This made me chuckle. I will steal this from you.
I've used with great success prompts like "when implementing this feature, did you encounter sections of code that were needlessly complex, that were making it hard for you to work? what would you change in the design/architecture to make it leaner?"
I have found being Socratic in my questions, and trying to get the AI to arrive at my intended design via such conversations supplies the right level of context for properly solving the problem. It’s token intensive, without a doubt, but I find the result is the AI tends to be better equipped to handle the many micro decisions that need to be made along the way.
The contrast to this is I give it a detailed prompt where it then asks questions of me, which also generally works but I find the AI tends to not be as well equipped for decisions it needs to make mid implementation.
It’s not perfect, and maybe not even a good fit for some. I also never know what to think when people tell me their idiosyncratic ways of using AI. Ultimately I think the most effective way is whatever lets you translate the vision in your head into the end result.
But the problem is that when you ask AI to solve a problem on its own, its default plan can suck. You can mitigate that with research and context, but it doesn't mean the initial problem is solved. Even that mitigation requires skill and human judgement (whether through AI conversation or traditional research), and a lot of people want to skip that entirely.
This article will be part of the next model's training set, and it will probably be able to solve it despite not understanding anything about the world and not studying or thinking.
Planner / executor separation can make a huge difference in performance. LLMs are fantastic at coming up with a lot of elaborate narratives regarding what should be done. They are terrible at doing that prescribed work all at once. This impedance mismatch is best resolved with a simple role separation. Placing a shared collection of tasks between these roles is how you can decouple them. The executors need significantly more tokens than your planners to get the job done. It's probably in the range of 10-100x more for really complicated jobs with a lot of iterations through compiler feedback, SQL provider errors, etc. This is why you can't do both things in the same context very well.
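A minimal sketch of the role separation described above, with a shared task collection sitting between the two roles. Everything here is a stand-in: `plan`, `execute`, `run`, and the `Task` fields are hypothetical names, no real LLM is called, and the "feedback loop" is faked by pretending the second attempt succeeds.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    done: bool = False
    attempts: int = 0


def plan(goal: str) -> list[Task]:
    """Stand-in planner: a real one would ask an LLM, in its own small context,
    to break the goal into tasks. Here it just emits three numbered steps."""
    return [Task(f"{goal}: step {i}") for i in range(1, 4)]


def execute(task: Task, max_attempts: int = 3) -> Task:
    """Stand-in executor: a real one would start from a fresh context per task
    and iterate on compiler or database errors, which is where the tokens go."""
    while not task.done and task.attempts < max_attempts:
        task.attempts += 1
        # Hypothetical feedback check: pretend the second attempt passes.
        task.done = task.attempts >= 2
    return task


def run(goal: str) -> list[Task]:
    backlog = deque(plan(goal))  # the shared task collection between the roles
    finished = []
    while backlog:
        finished.append(execute(backlog.popleft()))
    return finished


results = run("add login endpoint")
for task in results:
    print(task.description, task.done, task.attempts)
```

The only thing the planner and executor share is the backlog, so the planner's narrative never has to live in the same context as the executor's retries.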
They turned the english language into enterprise java and my train of thought is now a series of NullPointerExceptions
- Start in ask mode - "I'm planning on doing X to achieve Y; are there any alternative approaches? What problems might I run into?"
- Chat for a bit and get the high level approach, switch to plan mode and ask for a nicely formatted plan
- What's kicked out is already in the rough shape of the discussion so far, so it's a case of following a nicely formatted doc through and highlighting sections of text and asking for clarification or changes
- Hitting "build" and then reviewing what's been done
For a new service I might spend an hour in ask/plan mode - but then it gets 95% of the build itself right first time.
Do you do the same with different results, or is there a different stack/methodology you go through?
I suspect there's also a strong sociological bias at play: LLMs are being made by people who are familiar with coding but aren't software engineers. So they design their RL policies around the idea that the LLM must learn how to code, not that it must learn to design a maintainable piece of software.
Part of that is critical thinking and projecting forward / simulating potential issues, and part of that is that memory which in humans we probably would see as "wisdom".
I don't know if that's a fundamental limitation of LLMs, or, rather, that this can be solved moving forward with better memory systems, harnesses, and context windows.
Instead of asking it generically to analyze and do X, you can use brainstorming skills like those from superpowers [1].
This makes it approach the problem better and keeps you in the loop.
Another step is then to have its plans reviewed by another LLM acting as an adversarial reviewer. I have a claude skill [2] that calls codex to do it, and they chat with each other.
It's a tremendous boost in design quality.
[1] https://github.com/obra/Superpowers
[2] https://gist.github.com/enricopolanski/6c5038a8e20cc4098cd99...
recursivedoubts•17h ago
it was a rather mundane bug, but i thought the interaction was interesting and worth analyzing to show where AI is very strong and where it is not as strong
hugeBirb•11h ago
AloysB•10h ago
The example is mundane but to the point, and I very much enjoyed this article. It's a concrete example, which is rare to read when it comes to using LLMs.
At the risk of being told that we "hold it wrong", it resonates with my experience of using LLMs.