At the end of the day, programming languages are an abstraction to make development easier for humans. Using an LLM to write code is like having someone else chew your food and spit it into your mouth--it gets the job done, but it would be faster to just cut out the middleman.
To that end, I'm using a diffusion model to generate images for a von Neumann VM. The model treats the VM's state as an image (literally saveable as a .bmp), with pixels representing bits. To reduce nondeterminism and noise, it makes several direct x0 predictions, averages their logits, and thresholds the result into binary pixels.
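As a rough sketch of that averaging step, not the actual implementation: the function names and the tiny 2x2 grid are made up for illustration, and cutting at logit 0 (probability 0.5) is an assumption.

```python
from typing import List

Grid = List[List[float]]


def average_logits(predictions: List[Grid]) -> Grid:
    """Average per-pixel logits across several independent x0 predictions."""
    if not predictions:
        raise ValueError("need at least one prediction")
    n = len(predictions)
    height, width = len(predictions[0]), len(predictions[0][0])
    return [
        [sum(p[r][c] for p in predictions) / n for c in range(width)]
        for r in range(height)
    ]


def threshold_to_bits(logits: Grid, cutoff: float = 0.0) -> List[List[int]]:
    """Turn averaged logits into binary pixels.

    Assumption: a logit above 0 (probability above 0.5) means the bit is set.
    """
    return [[1 if value > cutoff else 0 for value in row] for row in logits]


# Three hypothetical x0 predictions for a 2x2 patch of the state image.
predictions = [
    [[2.0, -1.5], [0.5, -3.0]],
    [[1.0, -0.5], [-1.0, -2.0]],
    [[3.0, 0.5], [1.5, -1.0]],
]
bits = threshold_to_bits(average_logits(predictions))
print(bits)  # [[1, 0], [1, 0]]
```

Note that the bottom-left pixel comes out as 1 even though one of the three predictions voted against it. Averaging before thresholding lets confident predictions outweigh a weak dissenting one.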
Because the diffusion model may still make occasional pixel-level errors, the image stores important logical bits multiple times, and the decoded value is chosen by majority vote.
It's not perfect (right now it can only do basic arithmetic), but the result is an executable image, generated in one shot and stored in a small 72x72px .bmp.
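To show how a grid of bits can become a .bmp, here is a sketch that packs a 72x72 bit grid into a 1-bit-per-pixel bitmap using only the standard library. The 1-bit encoding, the black/white palette, and the `bits_to_bmp` name are all assumptions for illustration. The real file might just as well use 8 or 24 bits per pixel.

```python
import struct
from typing import List


def bits_to_bmp(bits: List[List[int]]) -> bytes:
    """Encode a grid of 0/1 values as a 1-bit-per-pixel BMP (assumed format)."""
    height, width = len(bits), len(bits[0])
    row_bytes = ((width + 31) // 32) * 4  # BMP rows are padded to 4 bytes

    pixel_data = bytearray()
    for row in reversed(bits):  # BMP stores rows bottom-up
        packed = bytearray(row_bytes)
        for x, bit in enumerate(row):
            if bit:
                packed[x // 8] |= 0x80 >> (x % 8)  # leftmost pixel is the high bit
        pixel_data += packed

    # Two palette entries (blue, green, red, reserved): 0 = black, 1 = white.
    palette = bytes([0, 0, 0, 0, 255, 255, 255, 0])
    offset = 14 + 40 + len(palette)  # file header + info header + palette

    file_header = struct.pack(
        "<2sIHHI", b"BM", offset + len(pixel_data), 0, 0, offset
    )
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40, width, height,    # header size, image dimensions
        1, 1,                 # colour planes, bits per pixel
        0, len(pixel_data),   # no compression, raw data size
        2835, 2835,           # about 72 DPI, in pixels per metre
        2, 2,                 # palette size, important colours
    )
    return file_header + info_header + palette + bytes(pixel_data)


# A placeholder 72x72 checkerboard standing in for a real machine state.
state = [[(x + y) % 2 for x in range(72)] for y in range(72)]
bmp = bits_to_bmp(state)
print(len(bmp))  # 926
```

Under these assumptions the whole file is 926 bytes: 62 bytes of headers and palette plus 72 rows of 12 bytes each. Writing `bmp` to a file opened in binary mode should give an image that an ordinary viewer can open.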
I'd love to hear some thoughts on this!