That's what working with GPT-5-Codex on actual code also feels like.
I wonder if there is a consistent way to force structural revisions. I have found Nano Banana particularly terrible at revisions; even for something as simple as "change the image dimensions to...", it will confidently claim success while doing nothing.
I think the problem here is that SVG is structured information while an image is an unstructured blob, and translating between them requires planning and understanding. Maybe treating an SVG like a raster image in the prompt is the wrong approach. I think that prompting for the image as code (which SVG basically is) would result in better outputs, something like the sketch below.
This is just my uninformed opinion.
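To make that concrete, here's a minimal sketch of a code-style prompt, assuming the OpenAI Python SDK; the model name and prompt wording are illustrative, not something any of the models above are confirmed to respond better to:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set

    svg_source = '<svg viewBox="0 0 100 100"><circle cx="50" cy="50" r="40"/></svg>'

    # Present the SVG as source code to be refactored, not as an image to
    # be repainted, in the hope that the model stays in structured-editing
    # mode instead of hallucinating a successful raster edit.
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative choice
        messages=[{
            "role": "user",
            "content": (
                "Here is an SVG source file:\n\n" + svg_source + "\n\n"
                "Treat this as code, not as a picture. Change the viewBox "
                "to 0 0 200 100, scale the contents to match, and return "
                "only the complete revised SVG source."
            ),
        }],
    )
    print(response.choices[0].message.content)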
And as you say, they cheerfully assert that they've done the job, for real this time, every time.
The naive approach, which gets you results like ChatGPT's, is to produce output tokens based on the prompt and generate a new image from those tokens. It is really difficult to maintain details from the input image with this approach.
A more advanced approach is to generate a stream of "edits" to the input image instead. You see this with Gemini, which sometimes maintains original image details to a fault; e.g. it will preserve human faces at all costs, probably as a result of training.
I think the round-trip through SVG is an extreme challenge to train through and essentially forces the LLM to progressively edit the SVG source, which can result in something like the Gemini approach above.
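A rough sketch of what an edit stream over SVG source could look like; the op format here is made up for illustration, not anything these models actually emit:

    import xml.etree.ElementTree as ET

    SVG = (
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
        '<circle id="sun" cx="50" cy="20" r="10" fill="gray"/>'
        '<rect id="ground" x="0" y="80" width="100" height="20"/>'
        '</svg>'
    )

    # Hypothetical edit stream: rather than regenerating the whole image,
    # the model emits small targeted operations against the existing
    # source, so untouched elements survive the revision byte-for-byte.
    edits = [
        {"id": "sun", "set": {"fill": "yellow"}},
        {"id": "sun", "set": {"r": "14"}},
    ]

    ET.register_namespace("", "http://www.w3.org/2000/svg")
    root = ET.fromstring(SVG)
    for edit in edits:
        for el in root.iter():
            if el.get("id") == edit["id"]:
                for attr, value in edit["set"].items():
                    el.set(attr, value)

    print(ET.tostring(root, encoding="unicode"))

The contrast with the naive approach: the "ground" rect never appears in the edit stream at all, so there is nothing for the model to drift on.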
This wasn’t just “add more details”—it was “make this mechanically coherent.”
The overall text doesn’t appear to be AI written, making this all the more confusing. Is AI making people write this way now on their own? Or is it actually written by an LLM and just doesn’t look like it?

This is a better version of what I tried but suffers from the same problem - the models seem to stick close to their original shapes and add new details rather than creating an image from scratch that's a significantly better variant of what they tried originally.
For the SVG generation, it would be an interesting experiment to seed it with increasingly poor initial images and see at what point, if any, the models stop anchoring on the initial image and just try something else. A sketch of the loop is below.
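Something like this, where both the degradation step and the model call are stand-ins:

    import random

    def degrade(svg_source: str, level: int) -> str:
        # Stand-in degradation: randomly drop drawing elements so higher
        # levels hand the model a progressively worse starting point.
        rng = random.Random(level)
        kept = [
            line for line in svg_source.splitlines()
            if "<svg" in line or "</svg>" in line or rng.random() > level / 10
        ]
        return "\n".join(kept)

    def revise_svg(seed_svg: str) -> str:
        # Hypothetical model call; wire this up to whatever API you test.
        raise NotImplementedError

    BASE = (
        '<svg viewBox="0 0 100 100">\n'
        '<circle cx="30" cy="40" r="10"/>\n'
        '<circle cx="70" cy="40" r="10"/>\n'
        '<rect x="20" y="60" width="60" height="8"/>\n'
        '</svg>'
    )

    for level in range(10):
        seed = degrade(BASE, level)
        # revised = revise_svg(seed)
        # Compare revised against seed (e.g., count of shared elements):
        # if overlap stays high even at high degradation levels, the model
        # is anchoring on the seed rather than redrawing from scratch.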
If the first output is crappy, the next three iterations will just improve the same crap.
This was not a good test.