When you paste a screenshot of a broken UI and it immediately spots the misaligned div or padding issue—is it actually doing visual analysis, or just pattern-matching against common UI bugs from training data?
The speed feels almost too fast for real vision processing. And it seems to understand spatial relationships and layout in a way that feels different from just describing an image.
Are these tools using standard vision models, or is there preprocessing? And how much comes from the image itself vs. the surrounding code context?
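For reference, my rough mental model of the usual multimodal pipeline is below. This is a sketch only; the patch size, dimensions, and names are my own assumptions, not any particular vendor's stack. The idea: the screenshot is cut into fixed-size patches, each patch is projected into the LLM's embedding space, and those "vision tokens" sit in the same context window as the embedded code.

    import numpy as np

    PATCH, D_MODEL = 14, 4096   # ViT-style patch size, LLM hidden width (assumed)

    def encode_screenshot(img, w_proj):
        # Cut the image into fixed-size patches and project each one into
        # the LLM's embedding space, so the screenshot becomes just another
        # run of "tokens" in the context window.
        h, w, c = img.shape
        h, w = h - h % PATCH, w - w % PATCH          # drop ragged edges
        patches = (img[:h, :w]
                   .reshape(h // PATCH, PATCH, w // PATCH, PATCH, c)
                   .transpose(0, 2, 1, 3, 4)
                   .reshape(-1, PATCH * PATCH * c))
        return patches @ w_proj                      # (n_patches, D_MODEL)

    img = np.random.rand(224, 224, 3)                # stand-in screenshot
    w_proj = np.random.randn(PATCH * PATCH * 3, D_MODEL) * 0.01
    vision_tokens = encode_screenshot(img, w_proj)
    code_tokens = np.random.randn(512, D_MODEL)      # embedded HTML/CSS context
    sequence = np.vstack([vision_tokens, code_tokens])
    print(sequence.shape)    # the transformer attends over both jointly

If that's roughly right, the "spatial understanding" would come from attention over patch positions rather than anything like a layout engine, which might also explain the speed.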
Anyone know the technical details of what's actually happening under the hood?
yawpitch•1h ago
That said, one can reasonably infer that an LLM-based system isn't doing any form of visual processing at all… it's just looking at your HTML and CSS and flagging where it diverges from the statistical mean of all such structures in the training data (modulo some stochastic wandering, and the fact that it may, somehow, have mixed some measure of Rick Astley or Goatse into its multidimensional lookup table).
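To make that concrete, here's a toy, literal version of the "flag whatever diverges from the statistical mean" idea. Purely illustrative; the corpus and threshold are made up, and no real model works by counting declarations like this.

    from collections import Counter

    # Pretend training corpus of CSS declarations (made up, obviously)
    corpus = [
        "display: flex", "margin: 0 auto", "padding: 16px",
        "display: flex", "padding: 16px", "margin: 0 auto",
    ]
    freq = Counter(corpus)
    total = sum(freq.values())

    def flag_unusual(declarations, threshold=0.05):
        # Flag anything rarer than `threshold` of the corpus, i.e.
        # whatever diverges from the statistical mean.
        return [d for d in declarations if freq[d] / total < threshold]

    print(flag_unusual(["display: flex", "padding: -16px"]))
    # ['padding: -16px']

The actual mechanism is a learned distribution rather than a frequency table, but the flavor of the objection is the same: common patterns look "correct", rare ones look "broken".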