It works well and can be used for a lot of things, but still.
In contrast, human thinking doesn’t involve picking a word at a time based on the words that came before. The mechanics of language can work that way at times - we select common phrasings because we know they work grammatically and are understood by others, and it’s easy. But we do our thinking in a pre-language space and then search for the words that express our thoughts.
I think kids in school ought to be made to use small, primitive LLMs so they can form an accurate mental model of what the tech does. Big frontier models do exactly the same thing, only more convincingly.
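A toy version of that really does fit in a few lines. Here is a sketch (the corpus and seed word are made up) of a word-level bigram model, which is next-word prediction with everything else stripped away:

    import random
    from collections import defaultdict, Counter

    corpus = "the cat sat on the mat and the dog sat on the rug".split()

    # "Training": count which word follows which in the corpus.
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1

    # "Inference": repeatedly sample the next word from those counts.
    def generate(word, n=8):
        out = [word]
        for _ in range(n):
            followers = counts[out[-1]]
            if not followers:
                break
            words, freqs = zip(*followers.items())
            out.append(random.choices(words, weights=freqs)[0])
        return " ".join(out)

    print(generate("the"))   # e.g. "the cat sat on the rug and the dog sat"

Ten minutes with something like this gives you the right mental model; the big models run the same loop with an enormously better scoring function.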
1. Model architecture. Calculation of outputs from inputs.
2. Training algorithm. Alters parameters in the architecture based on training data, often inputs and outputs compared against targets, but it can be more complex than that.
3. The class of problem being solved, e.g. approximation, prediction, etc.
4. The actual instance of the problem being solved, e.g. approximation of chemical reaction completion vs. temperature, or prediction of textual responses.
5. The embodiment of the problem, i.e. the actual data. How much, how complex, how general, how noisy, how accurate, how variable, how biased, ...?
6. The algorithm that is actually learned from (5), in the form of (3), in order to perform (4). There is no limit to its complexity, or to the sub-problems that must be solved within it, for results to be successful (a toy example of how (1), (2), (5) and (6) relate follows this list).
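To make that split concrete, a deliberately trivial sketch (the data and numbers are invented): (1) is a one-parameter model, (2) is plain gradient descent, (5) is fifty points on a line, and (6) is whatever rule the parameter ends up encoding.

    import numpy as np

    def model(w, x):                        # (1) architecture: how outputs are computed from inputs
        return w * x

    def train(xs, ys, steps=500, lr=0.1):   # (2) training algorithm: adjust the parameter to reduce error
        w = 0.0
        for _ in range(steps):
            grad = np.mean(2 * (model(w, xs) - ys) * xs)
            w -= lr * grad
        return w

    xs = np.linspace(0, 1, 50)              # (5) the data: here, a trivially linear relationship
    ys = 3.0 * xs
    w = train(xs, ys)
    print(w)                                # (6) the learned rule: w converges to about 3

(1) and (2) stay about this simple no matter how complicated the rule in (6) has to become to fit the data in (5).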
Data can be unbounded in complexity. Therefore, actual (successful) solutions are necessarily unbounded in complexity.
The "no limit, unbounded" part of (6) is missed by many people. To perform accurate predictions, of say the whole stock market, would require a model to learn everything from economic theory, geopolitics, human psychology, natural resources and their extraction, crime, electronic information systems and their optimizations, game theory, ...
That isn't a model I would call "just a stock price predictor".
The misconception that training a model to predict creates something that "just" predicts is prevalent, but ... well, I struggle for words to describe how deeply ignorant, wrong, and category-violating that misconception is.
Human language is an artifact created by complex beings. A high level of understanding of how those complex beings operate in conversation, writing, speeches, legal theory, .... on and on ... their knowledge, their assumptions, their modeling of each other in their interactions, ... on and on ... becomes necessary to mimic general written artifacts between people even a little bit.
LLMs, at the point of being useful, were never "just" prediction machines.
I am astonished there were technical people still saying such a thing.
And also, they are still "just predicting the next word", literally in terms of how they function and are trained. And there are still cases where it's useful to remember this.
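For anyone who hasn't looked at what that means mechanically, here is a minimal sketch of the generation loop, assuming the Hugging Face transformers library with GPT-2 as a stand-in model: score every token in the vocabulary, append the most likely one, repeat.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("The capital of France is", return_tensors="pt").input_ids
    for _ in range(10):
        logits = model(ids).logits            # a score for every token in the vocabulary
        next_id = logits[0, -1].argmax()      # greedy: take the single most likely token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
    print(tok.decode(ids[0]))

That's literally all that's executing at generation time.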
I'm thinking specifically of chat psychosis, where people go down a rabbit hole with these things, thinking they're gaining deep insights because they don't understand the nature of the thing they're interacting with.
They're interacting with something that does really good - but fallible - autocomplete based on three major inputs:
1) They are predicting the next word based on the pre-training data (internet data), which makes them fairly useful for general knowledge.
2) They are predicting the next word based on RL training data, which is what lets them produce conversational responses rather than autocomplete-style responses: they are autocompleting conversational data. This also makes them extremely obsequious and agreeable, inclined to go along with whatever you give them and to mimic it.
3) They are autocompleting the conversation based on your own inputs and the entire history of the conversation. This, combined with 2), means you are, to a large extent, talking to yourself, or rather to something that is very adept at mimicking and going along with your inputs.
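To make 3) concrete: the whole conversation reaches the model as one flat sequence of text to be continued. The template below is invented (real chat models each define their own format), but the shape is the same.

    # Flatten the system prompt and the full back-and-forth into one string;
    # the model's only job is to continue it from the end.
    def build_prompt(system, turns):
        text = "[SYSTEM] " + system + "\n"
        for role, message in turns:
            text += "[" + role.upper() + "] " + message + "\n"
        return text + "[ASSISTANT] "   # generation ("autocomplete") starts here

    history = [
        ("user", "I think my neighbor is studying me."),
        ("assistant", "That sounds stressful. What have you noticed?"),
        ("user", "Small things. But they add up, right?"),
    ]
    print(build_prompt("You are a helpful assistant.", history))

Every word you typed is part of the sequence being continued.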
Who, or what, are you talking to when you interact with these? Something that predicts the next word, with varying accuracy, based on a corpus of general knowledge plus a corpus of agreeable question/answer format plus yourself. The general knowledge is great as long as it's fairly accurate; the sycophantic mirror of yourself sucks.