When you paste a screenshot of a broken UI and it immediately spots the misaligned div or padding issue—is it actually doing visual analysis, or just pattern-matching against common UI bugs from training data?
The speed feels almost too fast for real vision processing. And it seems to understand spatial relationships and layout in a way that feels different from just describing an image.
Are these tools using standard vision models or is there preprocessing? How much comes from the image vs. surrounding code context?
Anyone know the technical details of what's actually happening under the hood?
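For what it's worth, the image-vs-context split seems testable with a crude ablation: give a vision-capable model just the screenshot, just the markup/CSS, and then both, and compare the diagnoses. Here's a rough sketch using the OpenAI Python SDK directly (the model name and file names are placeholders, and the coding assistants themselves may do preprocessing I can't see):

    import base64
    from openai import OpenAI

    client = OpenAI()

    def ask(prompt, image_path=None):
        # Build a multimodal message: a text part plus an optional inline image.
        content = [{"type": "text", "text": prompt}]
        if image_path:
            with open(image_path, "rb") as f:
                b64 = base64.b64encode(f.read()).decode()
            content.append({
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{b64}"},
            })
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder: any vision-capable model
            messages=[{"role": "user", "content": content}],
        )
        return resp.choices[0].message.content

    css = open("broken.css").read()  # hypothetical file names
    print(ask("What looks visually wrong in this UI?", "screenshot.png"))
    print(ask("What looks wrong in this CSS?\n" + css))
    print(ask("Screenshot attached. Here is the CSS:\n" + css +
              "\nWhat is causing the misalignment?", "screenshot.png"))

If the screenshot-only run still spots the misaligned div, the vision path is doing at least some real work rather than riding entirely on the surrounding code context.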
yawpitch•4mo ago
That said, one can reasonably infer that an LLM-based system isn’t doing any form of visual processing at all… it’s just looking at your HTML and CSS and flagging where it diverges from the statistical mean of all such structures in the training data (modulo some stochastic wandering and that it may, somehow, have mixed some measure of Rick Astley or Goatse into its multidimensional lookup table).
chistev•4mo ago
Please explain
yawpitch•4mo ago
You know none of the variables. None of the constants. None of the exponents. No one does, really, but even if you did it wouldn’t help, because no one bothered to write down the operators and the parentheses are randomly shifted around every time the equation is resolved.
All you know is that if you ask it for tea, it will always, invariably, and forever, give you back something that is almost, but not quite entirely, unlike tea. Sometimes it might be more unlike coffee, sometimes more unlike vodka and cow urine.
What you’ll never, ever, ever reliably know is what’s in the cup.
That’s about the best way I know to explain black box abstractions. In a few decades we might have a workable theory as to why these things function, to the degree that they do, though I’ll bet a rather large amount of money that we won’t.