Is this 4o-mini, 4o, o4-mini, o4-mini-high, o3 (etc.)?
However strange their naming, they all vary dramatically on problems like the one presented in the article.
Which one the experience was with is critical to drawing any kind of conclusions.
All those hints we receive from LLM use bring us back to the same point: "but can it do procedural reasoning? Is it reliable?"
Moving in the direction of "being more and more convincing" is a bad direction: detrimental, not progressive, when what counts is "actually having the juice".
> Is this ...? ... they all vary dramatically in problems
If the above were achieved, it would be an architectural revolution, and we would have been informed. If it is "more of the same, but more advanced", then the submission shows a structural problem.
o3, for example, nailed it on the first try: https://chatgpt.com/share/68231cfc-d258-8013-aad2-5115eba880...
paul7986•4h ago
Now it can't do that, but it does some cool things depending on what you try with it. Recently, in an Icelandic restaurant, I took a picture of the menu, uploaded it to GPT and asked it to create a mirror image of the menu, but in English and with prices in US dollars instead of Icelandic krónur. That was very handy, since I then shared it with my travel friends in the restaurant and those back at the hotel.
Overall, I love hearing how people are using it in unique ways too! I used it to count my calories for a year: I eat out daily at healthy chains (Cava and others), and GPT can easily pull the calorie figures from their sites and add them up.