So what is the software development task that this plane excels at? Other than bullshitting one's manager.
It is not supposed to find an answer that matches my persistence; it's supposed to tell the truth or admit that it does not know. And even if there is an alabamer in the training set, that is either something else, not a US state, or a misspelling. In neither case should it end up on the list.
You just like the title?
But that doesn't mean that it is not extremely useful. It only means I shouldn't ask it to spell stuff.
If a human is unable to count the n's in 'banana' we expect them to be barely functional. Articles like this one try to draw the same inference about the LLM: it can't count 'n's, so it must not be able to do anything else either.
But it's a bad argument, and I'm tired of hearing it.
Your overall conclusion though seems a little free of context. Average people (i.e. my mom googling something) absolutely do not have the wherewithal to keep track of the various pros and cons of the underlying system that generates the magical giant blue box at the top of their search that has all the answers. They are being deliberately duped by the salesmen-in-chief of these giant companies, as are all of their investors.
LLMs are also bad at many things that humans don't notice immediately.
That is a problem because it leads humans to trust LLMs with tasks at which LLMs currently are bad, such as picking stocks, screening job applicants, providing life advice...
[0] e.g., by promoting AIs as having capacities equivalent to those of humans at various education levels, because they could pass tests that are part of the standards for people with that educational background and that, in humans, correlate with other abilities.
OpenAI even claims "reasoning" is available.
> Built-in agents – deep research, ChatGPT agent, and Codex can reason across your documents, tools, and codebases to save you hours
I have been waiting for GPT 5 to hit my account and kept asking it which model it was. It was 4o until this morning.
Then this morning it said it was GPT 5 and asked whether I would like to code and design a stress test for it to compete against 4o. It kept insisting this was something I should do even though I didn't ask, and then kept skirting around it when I told it to do it, before it realised it couldn't.
eurekin•6mo ago