E.g. syllogistic arguments based on linguistic semantics can lead you deeply astray if you those arguments don't properly measure and quantify at each step.
I ran into this in a somewhat trivial case recently, trying to get ChatGPT to tell me if washing mushrooms ever really actually matters practically in cooking (anyone who cooks and has tested knows, in fact, a quick wash has basically no impact ever for any conceivable cooking method, except if you wash e.g. after cutting and are immediately serving them raw).
Until I forced it to cite respectable sources, it just repeated the usual (false) advice about not washing (i.e. most of the training data is wrong and repeats a myth), and it even gave absolute nonsense arguments about water percentages and thermal energy required for evaporating even small amounts of surface water as pushback (i.e. using theory that just isn't relevant when you actually properly quantify), and only after a lot of prompts and demands to only make claims supported by reputable sources, did it finally find McGee's and Kenji Lopez's actual empirical tests showing that it just doesn't matter practically.
D-Machine•1h ago
It is somewhat complicated by the fact LLMs (and VLMs) are also trained in some cases on more than simple language found on the internet (e.g. code, math, images / videos), but the same insight remains true. The interesting question is to just see how far we can get with (2) anyway.