behnamoh•34m ago
I'm tired of these posts; LLMs are good for happy-path demos, that's it. And even then, their success rate depends on the prompter already knowing the answer!
Literally any out-of-distribution project in which I used LLMs led to catastrophic failure. The models can't "see" anything outside their training data.
semiquaver•24m ago
I legitimately can’t tell if you’re being serious. It kind of seems like you might be parodying LLM detractors who will never admit that the models are useful. If you’re serious, why say so on this post, of all places, which includes hard evidence that you’re wrong?
behnamoh•19m ago
> which includes hard evidence that you’re wrong?
You should already know what to ask to extract the answer OpenAI claims gpt-5.2-pro gave them.
Then you have to be lucky enough to get an answer that makes sense.
Then you should already know how to verify the model's response.
Only after all these steps should you cherry-pick the one-in-a-million successful response to feature on your website.
And finally, you should prove that the answer didn't already exist in the training data. It's highly likely that the problem was solved before and the model picked that up. I have yet to see a genuinely novel discovery these models can produce.
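To make that last point concrete: "prove it wasn't in the training data" would mean something like an n-gram overlap scan against the corpus, in the spirit of the 13-gram contamination check from the GPT-3 paper. A minimal sketch of what I mean, with hypothetical file paths, since only the lab that holds the corpus could actually run it:

    def ngrams(tokens, n=13):
        # All contiguous n-token windows, as a set for fast membership tests.
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def overlap_fraction(candidate: str, corpus: str, n: int = 13) -> float:
        """Fraction of the candidate's n-grams that appear verbatim in the corpus."""
        cand = ngrams(candidate.split(), n)
        corp = ngrams(corpus.split(), n)
        return len(cand & corp) / max(len(cand), 1)

    # Both paths are hypothetical; nobody outside the lab has training_corpus.txt.
    answer = open("model_answer.txt").read()
    corpus = open("training_corpus.txt").read()
    if overlap_fraction(answer, corpus) > 0:
        print("Some 13-gram of the 'novel' answer already appears in the training data.")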
* I'm an LLM researcher, but that doesn't mean I should close my eyes to the unjustified hype around language models.
MajimasEyepatch•7m ago
According to the post, this result was first derived for gluons in a previous paper. That paper was provided to the model as context, and then the model was asked to derive an analogous result for gravitons, which presumably has not been done before. The authors claim it would have taken "considerable time" for human experts to derive the graviton result.
I don't see any reason to believe that this exact problem was solved before in the training data, but it's definitely an incremental result based on a very similar problem that the model had seen before.
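For what it's worth, the workflow the post describes is roughly this shape. A sketch only: the model id is just the name used in this thread, and the file name is invented.

    from openai import OpenAI

    client = OpenAI()

    # Hypothetical file: the previously published gluon paper used as context.
    gluon_paper = open("gluon_paper.tex").read()

    resp = client.chat.completions.create(
        model="gpt-5.2-pro",  # name from this thread; the real API id is an assumption
        messages=[{
            "role": "user",
            "content": gluon_paper + "\n\nDerive the analogous result for gravitons.",
        }],
    )
    print(resp.choices[0].message.content)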