I've been using Grok 4 to write 6502 assembly language and it's been a bit of a slog but honestly the issues I've encountered are due mostly my to naivety. If I'm disciplined and make sure it has all of the relevant information and I'm (very) incremental, I've had some success writing game logic. You can't just tell it to build an entire game in a prompt, but if you're gradual about it you can go places with it.
Like any tool, if you understand its idiosyncrasies you can cater for them, and be productive with it. If you're not then yeah, it's not going to go well.
I think it's a good reality check for the claims of impending AGI. The models still depend heavily on being able to transform other people's work.
It's my understanding that LLMs change the code to meet a goal, and if you prompt them with vague instructions such as "make tests pass" or "fix tests", LLMs in general apply the minimum necessary and sufficient changes to any code that allows their goal to be met. If you don't explicitly instruct them, they can't and won't tell apart project code from test code. So they will change your project code to make tests work.
This is not a bug. Changing project code to make tests pass is a fundamental approach to refactoring projects, and the whole basis of TDD. If that's not what you want, you need to prompt them accordingly.
I assume in this case you mean a broader conventional application, of which an LLM algorithm is a smaller-but-notable piece?
LLMs themselves have no goals beyond predicting new words for a document that "fit" the older words. It may turn 2+2 into 2+2=4, but it's not actually doing math with the goal of making both sides equal.
The models don’t have a model of the world. Hence they cannot reason about the world.
Last night I tried to build a super basic “barely above hello world” project in Zig (a language where IDK the syntax), and it took me trying a few different LLMs to find one that could actually write anything that would compile (Gemini w/ search enabled). I really wasn’t expecting it considering how good my experience has been on mainstream languages.
Also, I think OP did rather well considering BASIC is hardly used anymore.
I was actually pretty impressed that it did as well as it did in a largely forgotten language and outdated platform. Looks like a vibe coding win to me.
I have a web site that is sort of a cms. I wanted users to be able to add a list of external links to their items. When a user adds a link to an entry, the web site should go out and fetch a cached copy of the site. If there are errors, it should retry a few times. It should also capture an mhtml single file as well as a full page screenshot. The user should be able to refresh the cache, and the site should keep all past versions. The cached copy should be viewable in a modal. The task also involves creating database entities, DTOs, CQRS handlers, etc.
I asked Claude to implement the feature, went and took a shower, and when I came out it was done.
What settings are you using to get it to just do all of that without your feedback or approval?
Are you also running it inside a container, or setting some sort of command restrictions, or just yoloing it on a regular shell?
There is a lot of nuance in how X is said.
As the context length increases, undesirable things happen.
It uses some built inet ftp tooling thats terrible and barely works, even internally anymore.
We are replacing it with a winscp implementation since winscp can talk over a COM object.
unsuprisingly the COM object in basic works great - the problem is that I have no idea what I am doing. I spent hours doing something like
WINSCP_SESSION'OPEN(WINSCP_SESSION_OPTIONS)
when i needed
WINSCP_SESSION'OPEN(*WINSCP_SESSION_OPTIONS)
It was obvious after because it was a pointer type of setup, but i didnt find it until pages and pages deep into old PDF manuals.
However the vibecode of all the agents did not understand the syntax of the system, it did help me analyse the old code, format it, and at least throw some stuff at the wall.
I finished it up friday, hopefully i deploy monday.
It's believable that we might either see an increase in the number of new programming languages since making new languages is becoming more accessible, or we could see fewer new languages as the problems of the existing ones are worked around more reliably with LLMs.
Yet, what happens to adoption? Perhaps getting people to adopt new languages will be harder as generations come to expect LLM support. Would you almost need to use LLMs to synthesize tons of code examples that convert into the new language to prime the inputs?
Once conversational intelligence machines reach a sort of godlike generality, then maybe they could very quickly adapt languages from much fewer examples. That still might not help much with the gotchas of any tooling or other quirks.
So maybe we'll all snap to a new LLM super-language in 20 years, or we could be concreting ourselves into the most popular languages of today for the next 50 years.
However I will just mention a few things. When you make an article like this please take note of the particular language model used and acknowledge that they aren't all the same.
Also realize that the context window is pretty large and you can help it by giving it information from manuals etc. so you don't need to rely on the intrinsic knowledge entirely.
If they used o3 or o3 Pro and gave it a few sections of the manual it might have gotten farther. Also if someone finds a way to connect an agent to a retro computer, like an Atari BASIC MCP that can enter text and take screenshots, "vibe coding" can work better as an agent that can see errors and self-correct.
It's the absolute proof that they are still dumb prediction machines, fully relying on the type of content they've been trained on. They can't generalize (yet) and if you want to use them for novel things, they'll fail miserably.
I run HomeAssistant, I don't get to play/use it every day. Here, LLM's excel at filling in the (legion) of blanks in both the manual and end user devices. There is a large body of work for it to summarize and work against.
I also play with SBC's. Many of these are "fringe" at best. LLM's are as you say "not fit for purpose".
What kind of development you are using LLM's for will determine your experience with them. The tool may or may not live up to the hype depending how "common", well documented and "frequent" your issue is. Once you start hitting these "walls" you realize that no, real reason, leaps of inference and intelligence are still far away.
If I would program Atari Basic, after finishing my Atari Emulator on my C64, I would learn the environment and test my assumptions. Single shot LLMs questions won't do it. A strong agent loop could probably.
I believe that LLMs are yanking the needle to 80%. This level is easy achievable for professionals of the trade and this level is beyond the ability of beginners. LLMs are really powerful tools here. But if you are trying for 90% LLMs are always trying to keep you down.
And if you are trying for 100%, new, fringe or exotic LLMs are a disaster because they do not learn and do not understand, even while being inside the token window.
We learn that knowledge, (power) and language proficiency are an indicator for crystalline but not fluid intelligence
80 percent of what, exactly? A software developer's job isn't to write code, it's understanding poorly-specified requirements. LLMs do nothing for that unless your requirements are already public on Stackoverflow and Github. (And in that case, do you really need an LLM to copy-paste for you?)
This comment is detached from reality. LLMs in general have been proven to be effective at even creating complete, fully working and fully featured projects from scratch. You need to provide the necessary context and use popular technologies with enough corpus to allow the LLM to know what to do. If one-shot approaches fail, a few iterations are all it takes to bridge the gap. I know that to be a fact because I do it on a daily basis.
Cool. How many "complete, fully working" products have you released?
Must be in the hundreds now, right?
Jokes aside, they are pretty different languages. I imagine you'd have much better luck going from .Net to Java.
Especially when asking the LLM to create a drawing program and a game the author would have probably received working code if he supplied the ai with documentation to the graphics function and sprite rendering using ATARI BASIC.
It confirm a bias for some, it triggers others who might have the opposite position (and maybe have a bias too on the other end).
Perfect combo for successful social media posts... literally all about "attention" from start to finish.
For example "Prompt: Write me an Atari BASIC program that draws a blue circle in graphics mode 7."
You need to know that there are various graphics modes and that mode 7 is the best for your use-case. Without that preexisting knowledge, you get stuck very quickly.
I started another project recently basically vibe coding in PHP. Instead of a single page app like I made before, it's just page by page single loading. Which means the AI also only needs to keep a few functions and the database in its head, not constantly work on some crazy ui management framework (what that's called).
It's made in a few days what would have taken me weeks as an amateur. Yet I know enough to catch a few 'mistakes' and remind it to do it better.
I'm happy enough.
I still agree with you for large applications but for these simple examples anyone with a basic understanding of vibe coding could wing it.
I believe many in this debate are conflating tools and magic wands.
For those that don't know. x87 was the FPU for 32-bit x86 architectures. It's not terribly complicated, but it uses stack-based register addressing with a fixed size (eight entry) stack.
All operations work on the top-of-stack register and one other register operand, and push the result onto the top of the stack (optionally popping the previous top of stack before the push).
It's hard but not horribly so for humans to write.. more a case of annoyingly slow and having to be methodical, because you have to reason about the state of the stack at every step.
I'd be very curious as to whether a token-prediction machine can get anywhere with this kind of task, as it requires a strong mental model of what's actually happening, or at least the ability to consistently simulate one as intermediate tokens/words.
In the error feedback cycle, it kept blaming Go, not itself. A bit eye opening.
When I struggle to write Go ASM, I also blame Go and not myself.
A man visits his friend's house. There is a dog in the house. The friend says that the dog can play poker. The man is incredulous, but they sit at a table and have a game of poker; the dog actually can play!
The man says: "Wow! Your dog is incredibly, fantastically smart!"
The friend answers: "Oh, well, no, he's a naïve fool. Every time he gets a good hand, he starts wagging his tail."
Whether you see LLMs impressively smart or annoyingly foolish depends on your expectations. Currently they are very smart talking dogs.
"I taught my dog to whistle!"
"Really? I don't hear him whistling."
"I said I taught him, not that he learnt it."
For example, if the llm had a compile tool it would likely have been able to correct syntax errors.
Similarly, visual errors may also have been caught if it were able to run the program and capture screens.
firesteelrain•5h ago
Try a local LLM then train it
ofrzeta•4h ago
How do you do this?
sixothree•3h ago
firesteelrain•5m ago
oharapj•2h ago
firesteelrain•4m ago
1. Gather training data
2. Format it into JSONL or Hugging Face Dataset format
3. Use Axolotl or Hugging Face peft to fine-tune
4. Export model to GGUF or HF format
5. Serve via Ollama or llama.cpp