I don't know how you get here from "predict the next word."

https://www.grumpy-economist.com/p/refine

100•qsi•2h ago

Comments

pushedx•1h ago

Yes, most people (including myself) do not understand how modern LLMs work (especially if we consider the most recent architectural and training improvements).

There's the 3b1b video series which does a pretty good job, but now we are interfacing with models that probably have parameter counts in each layer larger than the first models that we interacted with.

The novel insights that these models can produce are truly shocking, I would guess even for someone who does understand the latest techniques.

measurablefunc•1h ago

What's the latest novel insight you have encountered?

brookst•57m ago

Not the person you asked, and “novel” is a minefield. What’s the last novel anything, in the sense you can’t trace a precursor or reference?

But... I recently had an LLM suggest an approach to negative mold-making that was novel to me. Long story, but basically isolating the gross geometry and using NURBS booleans for that, plus mesh addition/subtraction for details.

I’m sure there’s prior art out there, but that’s true for pretty much everything.

measurablefunc•50m ago

I don't know, that's why I asked b/c I always see a lot of empty platitudes when it comes to LLM praise so I'm curious to see if people can actually back up their claims.

I haven't done any 3D modeling so I'll take your word for it but I can tell you that I am working on a very simple interpreter & bytecode compiler for a subset of Erlang & I have yet to see anything novel or even useful from any of the coding assistants. One might naively think that there is enough literature on interpreters & compilers for coding agents to pretty much accomplish the task in one go but that's not what happens in practice.

pushedx•41m ago

Which agents are you using, and are you using them in an agent mode (Codex, Claude Code etc.)?

The difference in quality of output between Claude Sonnet and Claude Opus is around an order of magnitude.

The results that you can get from agent mode vs using a chat bot are around two orders of magnitude.

measurablefunc•21m ago

The workflow is not the issue. You are welcome to try the same challenge yourself if you want. Extra test cases (https://drive.proton.me/urls/6Z6557R2WG#n83c6DP6mDfc) & specification (https://claude.ai/public/artifacts/5581b499-a471-4d58-8e05-1...).

brookst•26m ago

It’s taken me a while to get good at using them.

My advice: ask for more than what you think it can do. #1 mistake is failing to give enough context about goals, constraints, priorities.

Don’t ask “complete this one small task”, ask “hey I’m working on this big project, docs are here, source is there, I’m not sure how to do that, come up with a plan”

measurablefunc•11m ago

The specification is linked below & you can decide whether it is ambitious enough or not but what I can tell you is that none of the existing coding agents can complete the task even w/ all the details. If you do try it you will eventually get something that will mostly work on simple tests but fail miserably on slightly more complicated test cases.

auraham•1h ago

I highly recommend Build a Large Language Model (From Scratch) [1] by Sebastian Raschka. It provides a clear explanation of the building blocks used in the first versions of ChatGPT (GPT-2 if I recall correctly). The output of the model is a huge vector of n elements, where n is the number of tokens in the vocabulary. After a softmax, that vector is a probability distribution from which we sample the next token given an input sequence (i.e., a prompt). Under the hood, the model has several building blocks like tokenization, skip connections, self attention, masking, etc. The author does a great job explaining all the concepts. It is very useful for understanding how LLMs work.

[1] https://www.manning.com/books/build-a-large-language-model-f...
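
A minimal sketch of that last step, for anyone who wants to poke at it. The five-token vocabulary and the logit values are made up for illustration and are not taken from the book; a real model would produce the logits from the prompt.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities that sum to 1."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, rng, temperature=1.0):
    """Sample one token according to the softmax probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical vocabulary and logits (one score per vocabulary entry).
vocab = ["the", "cat", "sat", "mat", "."]
logits = [2.0, 0.5, 1.0, 0.1, -1.0]

probs = softmax(logits)
rng = random.Random(0)  # fixed seed so the draw is repeatable
token = sample_next_token(vocab, logits, rng)
print(dict(zip(vocab, (round(p, 3) for p in probs))), "->", token)
```

Lower temperatures sharpen the distribution toward the highest-scoring token, and reusing the same seed reproduces the same draw.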

phreeza•55m ago

But this is missing exactly the gap which OP seems to have, which is going from a next token predictor (a language model in the classical sense) to an instruction finetuned, RLHF-ed and "harnessed" tool?

belZaah•1h ago

It’s called emergent behavior. We understand how an llm works, but do not have even a theory about how the behavior emerges from among the math. We understand ants pretty well, but how exactly does anthill behavior come from ant behavior? It’s a tricky problem in system engineering where predicting emergent behavior (such as emergencies) would be lovely.

devmor•1h ago

The good news is that despite being incredibly complex, it’s still a lot simpler than ants because it is at least all statistical linguistics (as far as LLMs are concerned anyways).

themafia•1h ago

> but do not have even a theory about how the behavior emerges

We fully do. There is a significant quality difference between English language output and other languages which lends a huge hint as to what is actually happening behind the scenes.

> but how exactly does anthill behavior come from ant behavior?

You can't smell what ants can. If you did I'm sure it would be evident.

kristiandupont•1h ago

I am very curious about this significant hint, could you point me to some material?

spiralcoaster•1h ago

Two very big revelations here that I would love to know more about:

1. Can you reveal "what's actually happening behind the scenes" beyond the hint you gave? I can't figure it out.

2. Can you explain how an ant's sense of smell leads to anthills?

jen729w•1h ago

> 2. Can you explain how an ants sense of smell leads to anthills?

Ant 0: doesn’t seem to be dangerous here. I’ll drop a scent.

Ant 1: oh cool, a safe place. And I didn’t die either. I’ll reinforce that.

Ant 142,857,098,277: cool anthill.

fc417fc802•34m ago

The dynamics of ant nest creation are way more complicated than that. The evolved biological parallel of a procedural generation algorithm. In addition, the completed structure has to be compatible with the various programmed behaviors of the workers.

canjobear•1h ago

> There is a significant quality difference between English language output and other languages

floren•1h ago

They're saying LLMs do better when outputting English than other languages, an assertion I'm not really able to test but have heard elsewhere.

bryanrasmussen•1h ago

and this is somehow not related to the size and availability of corpora in English?

floren•46m ago

No, I'm quite sure that's why it's better.

bryanrasmussen•33m ago

OK, but then that goes back to their other assertion that it gives a huge hint at what is going on behind the scenes. Is that huge hint just "more data gives better results!"? If so, that doesn't seem at all important, since that is the absolutely central idea of an LLM. That is not behind the scenes at all, that is the introduction to the play as written by the author.

Not your fault obviously, but they have not yet described what that huge hint is, and I'm just at the edge of my seat with anticipation here.

fc417fc802•1h ago

> but do not have even a theory about how the behavior emerges from among the math

Actually we have an awful lot of those.

I'm not sure if emergent is quite the right term here. We carefully craft a scenario to produce a usable gradient for a black box optimizer. We fully expect nontrivial predictions of future state to result in increasingly rich world models out of necessity.

It gets back to the age old observation about any sufficiently accurate model being of equal complexity as the system it models. "Predict the next word" is but a single example of the general principle at play.

hnfong•53m ago

> black box optimizer

This is admission we don't know how it emerges.

Sure, we expect the behavior to emerge, but we don't know how.

fc417fc802•27m ago

No, as I said, we have _lots_ of theories about exactly that at various levels of detail. The theories vary based on (at least) the specifics of the loss function being employed to construct the gradient. Giving an overview of that is far beyond the scope of this comment section (but it's well trodden ground so you can just go ask an LLM).

The "black box" bit refers to a generic, interchangeable optimization algorithm that simply makes the number go down (or up or whatever).

There are certainly various details about the internal workings of models that we don't properly understand but a blanket claim about the whole is erroneous.

netfortius•1h ago

I'd rather go the route of bats [1]

[1] https://en.wikipedia.org/wiki/What_Is_It_Like_to_Be_a_Bat%3F

WD-42•1h ago

This is really hard to judge because by the looks of it, finance papers mostly consist of gobbledygook and extensive filler to begin with.

sp4cemoneky•1h ago

This. Verbalism lands really well to verbalism.

cyanydeez•1h ago

Economics is the attempt to take sociology and add numbers to make it look like a hard science. The fintechbros then seem to think that because they can make numbers go up, this is proof it's a hard science.

Tarq0n•1h ago

That's entirely missing the point. "All models are wrong, but some are useful". You can test hypotheses and learn things even about chaotic or emergent systems.

friendzis•29m ago

> You can test hypotheses and learn things even about chaotic or emergent systems.

Ah yes, the famous "Cut GDP in half, abolish public schooling and use that as a control" experiment. Majority of economic "models" are entirely correlational without any mechanistic explanation whatsoever or an explanation so superficial that it contradicts either itself or observed reality.

If you look deeper and read explanatory notes of economic laws, the model may refer to some publications, but then the actual figures plugged into the model are explained as "these values have been observed to lead to the desired outcomes, therefore are set without any modeling or validation, hope for the best, lesssgoooo".

tolerance•1h ago

It’s interesting to read about the use and leverage of LLMs outside of programming.

I’m not too familiar with the history, but the import of this article is brushing up on my nose hairs in a way that makes me think a sort of neo-Sophistry is on the horizon.

themafia•1h ago

> The comments it offered were on the par of the best comments I’ve received on a paper in my entire academic career.

Sort of the lowest hanging fruit imaginable. Just because it became "fundamental" to the process doesn't mean it gained any quality.

libraryofbabel•1h ago

I have come to think “predict the next token” is not a useful way to explain how LLMs work to people unfamiliar with LLM training and internals. It’s technically correct, but at this point saying that and not talking about things like RLVR training and mechanistic interpretability is about as useful as framing talking with a person as “engaging with a human brain generating tokens” and ignoring psychology.

At least AI-haters don’t seem to be talking about “stochastic parrots” quite so much now. Maybe they finally got the memo.

qsera•1h ago

>“predict the next token” is not a useful way

That is the exact thing to say because that is exactly what it does, despite how it does so.

It is not useful to say it if you are an AI-shill though. You brought up AI-hater, so I think I am entitled to bring up AI-shills.

vasco•46m ago

My neurons are also just passing electric signals back and forward and exchanging water and salts with the rest of my body.

qsera•22m ago

> just passing electric signals back and forward

Ok, feel free to call yourself a toaster, I don't mind!

dylan604•1h ago

I think talking to people unfamiliar with LLM training using words like "RLVR training and mechanistic interpretability" is about as useful as a grave robber in a crematorium.

libraryofbabel•41m ago

Obviously you don’t just say those words and leave it at that. Both those things can be explained in understandable terms. And even having a superficial sense of what they are gives people a better picture of what modern LLMs are all about than tired tropes from three years ago like “they’re just trained to predict the next token in the training data, therefore…”

stephenr•1h ago

> stochastic parrots

I prefer to use the term "spicy autocomplete" myself.

measurablefunc•57m ago

Sampling over a probability distribution is not as catchy as "stochastic parrot" but I have personally stopped telling believers that their imagined event horizon of transistor scale is not going to deliver them to their wished for automated utopia b/c one can not reason w/ people who did not reach their conclusions by reasoning.

goatlover•53m ago

Must one be an "AI-hater" to use the term "stochastic parrot"? Which is probably in response to all the emergent AGI claims and pointless discussions about LLMs being conscious.

imiric•52m ago

Technical concepts can be broken down into ideas anyone can understand if they're interested. Token prediction is at the core of what these tools do, and is a good starting point for more complex topics.

On the other hand, calling these tools "intelligent", capable of "reasoning" and "thought", is not only more confusing and can never be simplified, but dishonest and borderline gaslighting.

Alex_L_Wood•26m ago

“Stochastic parrots” only stopped because AI fanboys stopped screaming “AGI” and “it will replace everyone”. Maybe they finally got the memo?

wavemode•1h ago

> the kind of analysis the program is able to do is past the point where technology looks like magic. I don’t know how you get here from “predict the next word.”

You're implicitly assuming that what you asked the LLM to do is unrepresented in the training data. That assumption is usually faulty - very few of the ideas and concepts we come up with in our everyday lives are truly new.

All that being said, the refine.ink tool certainly has an interesting approach, which I'm not sure I've seen before. They review a single piece of writing, and it takes up to an hour, and it costs $50. They are probably running the LLM very painstakingly and repeatedly over combinations of sections of your text, allowing it to reason about the things you've written in a lot more detail than you get with a plain run of a long-context model (due to the limitations of sparse attention).

It's neat. I wonder about what other kinds of tasks we could improve AI performance at by scaling time and money (which, in the grand scheme, is usually still a bargain compared to a human worker).

selridge•1h ago

>You're implicitly assuming that what you asked the LLM to do is unrepresented in the training data.

This is just as stuck in a moment in time as "they only do next word prediction". What does this even mean anymore? Are we supposed to believe that a review of this paper, which wasn't written when that model (it's putatively not an "LLM", but IDK enough about it to be pushy there) was trained, is in the training data? Does that even make sense? We're not in the regime of regurgitating training data (if we really ever were). We need to let go of these frames which were barely true when they took hold. Some new shit is afoot.

wavemode•59m ago

Statistical models generalize. If you train a model that f(x) = 5 and f(x+1) = 6, the number 7 doesn't have to exist in the training data for the model to give you a correct answer for f(x+2)

Similarly, if there are millions of academic papers and thousands of peer reviews in the training data, a review of this exact paper doesn't need to be in there for the LLM to write something convincing. (I say "convincing" rather than "correct" since the author himself admits that he doesn't agree with all the LLM's comments.)

I tend to recommend people learn these things from first principles (e.g. build a small neural network, explore deep learning, build a language model) to gain a better intuition. There's really no "magic" at work here.

selridge•48m ago

Ok cool cool. Instead of pretending you need to teach me, you could engage with what I'm saying or even the OP!

"I don't know how you get here from "predict the next word"" is not really so much a statement of ignorance where someone needs you to step in but a reflection that perhaps the tech is not so easily explained as that. No magic needs to be present for that to be the case.

wavemode•11m ago

If you disagree with someone on the internet, you can just say "I disagree, and here's why". You don't have to aggressively accuse them of "not engaging" with the text.

I engaged. You just don't like what I wrote. That's okay.

c22•44m ago

> If you train a model that f(x) = 5 and f(x+1) = 6, the number 7 doesn't have to exist in the training data for the model to give you a correct answer for f(x+2)

This is an interesting claim to me. Are there any models that exist that have been trained with a (single digit) number omitted from the training data?

If such a model does exist, how does it represent the answer? (What symbol does it use for the '7'?)

wavemode•40m ago

When I say "model" here I'm referring to any statistical model (in this example, probably linear regression). Not specifically large language models / neural networks.
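
A minimal sketch of that example as ordinary least-squares regression, assuming x = 0 for concreteness (the original example leaves x unspecified). The training data contains only the outputs 5 and 6; the fitted line still produces 7 for the unseen input.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Training data: f(x) = 5 and f(x + 1) = 6, with x assumed to be 0.
xs = [0.0, 1.0]
ys = [5.0, 6.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 2.0 + intercept  # f(x + 2), never seen in training
print(slope, intercept, prediction)
```

With two points the fit is exact, so this is also small enough to work through with pen and paper. It shows generalization by a simple statistical model only; it says nothing by itself about how far that carries over to LLMs.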

c22•35m ago

Gotcha, I don't think I know enough about it. What constitutes training data for a (non neural network) statistical model? Is this something I could play around with myself with pen and paper?

red75prime•42m ago

I think the relevant question is: can a statistical model (or a transformer, in particular) generalize to general reasoning ability?

kristiandupont•40m ago

I had Claude help me get a program written for Linux to compile on macOS. The program is written in a programming language the author invented for the project, a pretty unusual one (for example, it allows spaces in variable names).

Claude figured out how the language worked and debugged segfaults until the compiler compiled, and then until the program did. That might not be magic, but it shows a level of sophistication where referring to “statistics” is about as meaningful as describing a person as the statistics of electrical impulses between neurons.

compass_copium•32m ago

But the programming language has explicitly laid out rules. It was not trained on those sets of rules, but it was trained on many trillions of lines of code. It has a map of how programs work, and an explanation of this new language. It's using training data and data it's fed to generate that result.

selridge•25m ago

What doesn't that explain tho?

What behavior would you need to see for that explanation to no longer hold? Because it seems like it explains too much.

Kim_Bruning•6m ago

If you run an LLM in an autoregressive loop you can get it to emulate a turing machine though. That sort of changes the complexity class of the system just a touch. 'Just predicts the next word' hits different when the loop is doing general computation.

Took me a bit of messing around, but try to write out each state sequentially, with a check step between each.
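
A rough sketch of the loop structure only, under loud assumptions: the per-step function below is a hand-written lookup table standing in for a model call, not an LLM, and the bit-flipping machine and its rule table are invented for illustration. The point is that each step's output becomes the next step's input, with a halt check between steps.

```python
# Hypothetical rule table for a tiny machine that inverts a binary string.
# (state, symbol read) -> (symbol to write, head movement, next state)
RULES = {
    ("flip", "0"): ("1", 1, "flip"),
    ("flip", "1"): ("0", 1, "flip"),
    ("flip", "_"): ("_", 0, "halt"),  # "_" means the head ran off the tape
}

def predict_next_configuration(config):
    """Stand-in for one model call: read a configuration, emit the next one."""
    tape, head, state = config
    symbol = tape[head] if head < len(tape) else "_"
    write, move, next_state = RULES[(state, symbol)]
    cells = list(tape)
    if head < len(cells):
        cells[head] = write
    return "".join(cells), head + move, next_state

def run(tape, max_steps=100):
    """Feed each output back in as the next input, checking for halt each step."""
    config = (tape, 0, "flip")
    trace = [config]
    for _ in range(max_steps):
        if config[2] == "halt":
            break
        config = predict_next_configuration(config)
        trace.append(config)
    return trace

trace = run("1011")
final_tape = trace[-1][0]
for step in trace:
    print(step)
```

Whether a real LLM follows such a rule table reliably over many steps is a separate, empirical question that this sketch does not address.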

anon7725•51m ago

“Represented in the training data” does not mean “represented as a whole in the training data”. If A and B are separately in the training data, the model can provide a result when A and B occur in the input because the model has made a connection between A and B in the latent space.

selridge•41m ago

Yes. I’m saying that “it’s just in the training data” is a cognitive containment of these models which is incomplete. You can insist that’s what’s happening, but you’ll be left unable to explain what’s going on beyond truisms.

jjmarr•29m ago

I created a code review pipeline at work with a similar tradeoff and we found the cost is worth it. Time is a non-issue.

We could run Claude on our code and call it a day, but we have hundreds of style, safety, etc rules on a very large C++ codebase with intricate behaviour (cooperative multitasking be fun).

So we run dozens of parallel CLI agents that can review the code in excruciating detail. This has completely replaced human code review for anything that isn't functional correctness but is near the same order of magnitude of price. Much better than humans and beats every commercial tool.

"Scaling time", on the other hand, is useless. You can just divide the problem among subagents until it finishes within a few minutes, which also increases quality due to less context/more focus.

mnewme•1h ago

Is this an ad? Seems like it. The text is not really what the headline suggests.

pianom4n•42m ago

Do you think the submitter intended this as an ad? His post history doesn't seem suspicious.

Or do you think the article's author wrote this as an ad? He's a reputable academic who seems impressed with an AI tool he used and is honestly sharing his thoughts.

For reference he published the 80 page inflation mini-book 2 weeks ago asking for feedback: https://www.grumpy-economist.com/p/inflation

callmeal•1h ago

"Predict the next word" is to a current LLM what a "transistor" (or gate) is to a modern CPU. I don't understand LLMs enough to expand on that comparison, but I can see how having layers above that feed the layers below to "predict the next word", and use the output to modify the input, leads to what we see today. It is turtles all the way down.

brookst•1h ago

It’s a good comparison. It’s about abstraction and layers. Modern LLMs aren’t just models, they’re all the infrastructure around prompting and context management and mixtures of experts.

The next-word bit may be slightly higher than an individual transistor, possibly functional units.

echelon•55m ago

Humans are future predictors. Our vision systems, our mental models of our careers. People that predict the future tend to do well financially.

Now the machines are getting better than we are. It's exciting and a little bit terrifying.

We were polymers that evolved intelligence. Now the sand is becoming smart.

qsera•44m ago

>Now the machines are getting better than we are

Then AI companies should stop looking for investors and instead play the stock markets with all that predictive power!

ejolto•53m ago

There is a big difference, because I understand how those transistors produce a picture on a screen, I don’t understand how LLMs do what they do. The difference is so big that the comparison is useless.

visarga•1h ago

> Nothing you write will matter if it is not quickly adopted to the training dataset.

That is my take too, I was surprised to see how many people object to their works being trained on. It's how you can leave your mark, opening access for AI, and in the last 25 years opening to people (no restrictions on access, being indexed in Google).

mbgerring•1h ago

People who produced the works LLMs are trained on are not compensated for the value they are now producing, and their skills are increasingly less valued in a world with LLMs. The value the LLMs are producing is being captured by employees of AI companies who are driving up rent in the Bay Area, and driving up the cost of electricity and water everywhere else.

Your surprise at people’s objections makes sense if you can’t count.

chii•45m ago

> People who produced the works LLMs are trained on are not compensated for the value they are now producing

the value being extracted via LLM techniques is new value, which did not previously exist. The producer(s) of the old data had an asking price, which was taken by the LLM trainers. They cannot make the argument that since the LLM is producing new value, they should retroactively update their old asking price for their works.

They could update their asking price for any new works they produce. They also have the right to ask their works not be used for training, etc. But they cannot ask their old works to be paid for by the new uses in LLM in a retroactive way.

GolfPopper•29m ago

>The producer(s) of the old data had an asking price, which was taken by the LLM trainers.

This is... blatantly untrue?

https://arstechnica.com/tech-policy/2026/02/microsoft-remove...

https://www.theatlantic.com/technology/archive/2025/03/libge...

heavyset_go•48m ago

Most people value their time and work and don't want to give it away for free to some billionaire so they can reproduce it as slop for their own private profit.

That's to say, most people recognize when they're getting fucked over and are correct to object to it.

retrac•58m ago

I know this sounds insane but I've been dwelling on it. Language models are digital Ouija boards. I like the metaphor because it offers multiple conflicting interpretations. How does a Ouija board work? The words appear. Where do they come from? It can be explained in physical terms. Or in metaphysical terms. Collective summing of psychomotor activity. Conduits to a non-corporeal facet of existence. Many caution against the Ouija board as a path to self-inflicted madness, others caution against the Ouija board as a vehicle to bring poorly understood inhuman forces into the world.

brookst•56m ago

Ouija boards are just collective negotiation among people.

nekusar•37m ago

There's 2 completely different ways to understand how a Ouija board works. Occult, and Scientific.

Scientific: It's a combined response, an unconscious blend of everyone participating. In other words, it's a probabilistic result of an "answer" to the question everyone hears.

Occult: If an entity is present, it's basically the unshielded response of that entity by collectively moving everyone's body the same way, as a form of a mild channel. Since Ouija doesn't specify making a circle and requesting the presence of a specific entity, there's a good chance of some being hostile. Or, you all get nothing at all, and basically garbage as part of the divination/communication.

But comparing Ouija to LLMs? The LLM, with the same weights, the same hyperparameters (sampling seed included), and the same questions, will give the same answers. That is deterministic, at least in that narrow sense. A Ouija board is not deterministic, and cannot be tested in any meaningful scientific sense.

ChaitanyaSai•58m ago

The whole next word thing is interesting, isn't it? I like to see it with Dennett's "Competence and comprehension" lens. You can predict the next word competently with shallow understanding. But you could also do it well with understanding or comprehension of the full picture. A mental model that allows you to predict better. Are the AIs stumbling into these mental models? Seems like it. However, because these are such black boxes, we do not know how they are stringing these mental models together. Is it a random pick from 10 models built up inside the weights? Is there any system-wide cohesive understanding, whatever that means? Exploring what a model can articulate using self-reflection would be interesting. Can it point to internal cognitive dissonance because it has been fed both evolution and intelligent design, for example? Or do these exist as separate models to invoke depending on the prompt context, because all that matters is being rewarded by the current user?

halyconWays•43m ago

Searle's Chinese Room experiment but without knowing what's in the room, and when you try to peek in you just see a cloud of fog and are left to wonder if it's just a guy with that really big dictionary or something more intelligent.

selridge•26m ago

It's an octopus, perhaps: https://aclanthology.org/2020.acl-main.463.pdf

There's also this blog post: https://julianmichael.org/blog/2020/07/23/to-dissect-an-octo... (which IMO is better to read than the paper)

grey-area•41m ago

Given their failure on novel logic problems, generation of meaningless text, tendency to do things like delete tests and incompetence at simple mathematics, it seems very unlikely they have built any sort of world model. It’s remarkable how competent they are given the way they work.

Predict the next word is a terrible summary of what these machines do though, they certainly do more than that, but there are significant limitations.

‘Reasoning’ etc are marketing terms and we should not trust the claims made by companies who make these models.

The Turing test had too much confidence in humans it seems.

shakna•14m ago

Probably worth remembering that ELIZA passed Turing tests, and was the definition of shallow prediction.

steve1977•8m ago

> Predict the next word is a terrible summary of what these machines do though, they certainly do more than that

What would that be?

basch•22m ago

It's honestly disheartening and a bit shocking how everyone has started repeating the predict the next syllable criticism.

The language model predicts the next syllable by FIRST arriving at a point in space that represents UNDERSTANDING of the input language. This was true all the way back in 2017 at the time of Attention Is All You Need. Google had a beautiful explainer page of how transformers worked, which I am struggling to find. Found it. https://research.google/blog/transformer-a-novel-neural-netw...

The example was and is simple and perfect. The word bank exists. You can tell what bank means by its proximity to words, such as river or vault. If you go from left to right, checking each word against each other word (in a sentence, paragraph, or paper,) you can give each word a meaning. You then add all the meanings together. Language models are making a frequency association of every word to every other word, and then summing it to create understanding of complex ideas, even if it doesn't understand what it is understanding and has never seen it before.
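
A toy sketch of the "bank" example, heavily simplified and with everything made up: the 2-d vectors are hand-picked (dimension 0 is roughly "water", dimension 1 is roughly "money"), and there are no learned query/key/value projections, no scaling, and no multiple heads as a real transformer would have. It only shows each word being scored against every other word and then blended with its context.

```python
import math

# Hypothetical hand-picked embeddings: [water-ness, money-ness].
EMBED = {
    "river": [2.0, 0.0],
    "vault": [0.0, 2.0],
    "bank": [1.0, 1.0],  # ambiguous on its own
}

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(words):
    """Toy self-attention: score each word against every word, then mix vectors."""
    vectors = [EMBED[w] for w in words]
    outputs = []
    for query in vectors:
        scores = [sum(a * b for a, b in zip(query, key)) for key in vectors]
        weights = softmax(scores)
        mixed = [
            sum(w * vec[d] for w, vec in zip(weights, vectors))
            for d in range(len(query))
        ]
        outputs.append(mixed)
    return outputs

bank_near_river = attend(["river", "bank"])[1]
bank_near_vault = attend(["bank", "vault"])[0]
print(bank_near_river, bank_near_vault)
```

The same word "bank" ends up with a water-leaning vector next to "river" and a money-leaning vector next to "vault", which is the disambiguation the explainer page illustrates.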

That all happens BEFORE "autocompleting the next syllable."

The magic part of LLMs is understanding the input. Being able to use that to make an educated guess of what comes next is really a lucky side effect. The fact that you can chain that together indefinitely with some random number generator thrown in and keep saying new things is pretty nifty, but a bit of a show stealer.

What really amazes me about transformers is that they completely ignored prescriptive linguistic trees and grammar rules and let the process decode the semantic structure fluidly and on the fly. This lets people create crazy run-on sentences that break every rule of English (or your favorite language) but are still parsable as instructions.

It is really helpful to remember that transformers' origins are in language translation. They are designed to take text and apply a modification to it, while keeping the meaning static. They accomplish this by first decoding meaning. The fact that they then pivoted from translation to autocomplete is a useful thing to remember when talking to them. A task a language model excels at is taking text, reducing it to meaning, and applying a template. So a good test might be "take Frankenstein, and turn it into a magic school bus episode." Frankenstein is reduced to meaning, the Magic School Bus format is the template, the meaning is output in the form of the template. This is a translation, although from English to English, represented as two completely different forms. Saying "find all the Wild Rice recipes you can, normalize their ingredients to 2 cups of broth, and create a table with ingredient ranges (min-max) for each ingredient option" is closer to a translation than it is to "autocomplete." Input -> Meaning -> Template -> Output. With my last example the template itself is also generated from its own meaning calculation.

A lot has changed since 2017, but the interpreter being the real technical achievement still holds true imho. I am more impressed with AI's ability to parse what I am saying than I am by its output (image models notwithstanding).

qsera•16m ago

>represents UNDERSTANDING of the input language.

It does not have an understanding, it pattern matches the "idea shape" of words in the "idea space" of training data and calculates the "idea shape" that is likely to follow considering all the "idea shape" patterns in its training data.

It mimics understanding. It feels mysterious to us because we cannot imagine the mapping of a corpus of text to this "idea space".

It is quite similar to how mysterious a computer playing a movie can appear if you are not aware of the mapping of a movie to a set of pictures, pictures to pixels, and pixels to coordinates and color codes.

mzhaase•13m ago

It always occurred to me that LLMs may be like the language center of the brain. And there should be a "whole damn rest of the brain" behind it to steer it.

LLMs miss very important concepts, like the concept of a fact. There is no "true", just consensus text on the internet given a certain context. Like that study recently where LLMs gave wrong info if there was the biography of a poor person in the context.

steve1977•6m ago

I think much along the same lines. LLMs are probably even just a part of the language center.

And of course they also miss things like embodiment, mirror neurons etc.

If an LLM makes a mistake, it will tell you it is sorry. But does it really feel sorry?

intended•48m ago

The article talks about LLMs reviewing Econ papers.

I’m hesitant to call this an outright win, though.

Perhaps the review service the author is using is really good.

Almost certainly the taste, expertise and experience of the author are doing unseen heavy lifting.

I found that using prompts to do submission reviews for conferences tended to make my output worse, not better.

Letting the LLM analyze submissions resulted in me disconnecting from the content. To the point I would forget submissions after I closed the tab.

I ended up going back to doing things manually, using them as a sanity check.

On the flip side, weaker submissions using generative tools became a nightmare, because you had to wade through paragraphs of fluff to realize there was no substantive point.

It’s to the point that I dread reviewing.

I am going to guess that this is more useful for experts, who will submit stronger submissions, than for novices and journeymen, who will still make foundational errors.

ruhith•46m ago

'Predict the next token' is true but not explanatory. It's like saying humans 'fire neurons.' Technically correct, explains nothing useful about the behavior you're actually observing. The debate isn't whether the description is accurate - it's whether it's at the right level of abstraction.

pharrington•42m ago

Why do the deliverables always take about 1 hour? Is this fully automated?

gammalost•31m ago

It is really interesting how great and also how terrible LLMs can be at the same time. For example, I had a really annoying bug yesterday, I missed one character, "_". Asking ChatGPT for help led to a lot of feedback that was arguably okay but not currently relevant (because there was a fatal flaw in the code).

Remade the conversation with personal information stripped here https://chatgpt.com/share/699fef77-b530-8007-a4ed-c3dda9461d...

GodelNumbering•27m ago

It is probably the first-time aha moment the author is talking about. But under the hood, it is probably not as magical as it appears to be.

Suppose you prompted the underlying LLM with "You are an expert reviewer in..." and a bunch of instructions followed by the paper. The LLM knows from training that 'expert reviewer' is an important term (skipping over and oversimplifying here) and that its response should be framed as what an expert reviewer would write. LLMs are good at picking up (or copying) the patterns of response, but the underlying layer that evaluates things against a structural and logical understanding is missing. So, in corner cases, you get responses that are framed impressively but do not contain any meaningful input. This trait makes LLMs great at demos but weak at consistently finding novel, interesting things.

If the above is true, the author will find after several reviews that the agent they use keeps picking up on the same or similar things (collapsed behavior that makes it good at coding-type tasks) and is blind to some other obvious things it should have picked up on. This is not a criticism; many humans are often just as collapsed in their 'reasoning'.

LLMs are good at 8 out of 10 tasks, but you don't know which 8.

modeless•26m ago

It's clear that in the general case "predict the next word" requires arbitrarily good understanding of everything that can be described with language. That shouldn't be mysterious. What's mysterious is how a simple training procedure with that objective can in practice achieve that understanding. But then again, does it? The base model you get after that simple training procedure is not capable of doing the things described in the article. It is only useful as a starting point for a much more complex reinforcement learning procedure that teaches the skills an agent needs to achieve goals.

RL is where the magic comes from, and RL is more than just "predict the next word". It has agents and environments and actions and rewards.

sasjaws•22m ago

A while ago I did the nanoGPT tutorial. I went through some of the math with pen and paper and noticed that the loss function for 'predict the next token' and 'predict the next 2 tokens' (or n tokens) is identical.

That was a bit of a shock to me, so I wanted to share this thought. Basically, I think it's not unreasonable to say LLMs are trained to predict the next book instead of a single token.

Hope this is useful to someone.
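
A rough sketch of why that can hold, assuming an autoregressive model trained with teacher forcing: the probability of the next two tokens factors as p(x, y | ctx) = p(x | ctx) * p(y | ctx, x), so the two-token log-loss is just the sum of two one-token log-losses. The toy `next_token_probs` rule below is made up purely for illustration and is not taken from nanoGPT.

```python
import math

VOCAB = ["a", "b"]


def next_token_probs(context):
    """Hypothetical toy model: p("a") grows with the number of "a"s seen so far."""
    n_a = context.count("a")
    p_a = (1 + n_a) / (2 + len(context))
    return {"a": p_a, "b": 1.0 - p_a}


def one_token_loss(context, token):
    """Cross-entropy (negative log-likelihood) of a single next token."""
    return -math.log(next_token_probs(context)[token])


def pair_probs(context):
    """Distribution over the next TWO tokens, built from one-token conditionals."""
    probs = {}
    for x in VOCAB:
        p_x = next_token_probs(context)[x]
        for y in VOCAB:
            # Chain rule: p(x, y | ctx) = p(x | ctx) * p(y | ctx, x)
            probs[(x, y)] = p_x * next_token_probs(context + (x,))[y]
    return probs


def two_token_loss(context, pair):
    """Cross-entropy of predicting the next two tokens jointly."""
    return -math.log(pair_probs(context)[pair])


if __name__ == "__main__":
    ctx = ()
    joint = two_token_loss(ctx, ("a", "b"))
    summed = one_token_loss(ctx, "a") + one_token_loss(ctx + ("a",), "b")
    print(joint, summed)  # the two numbers agree up to floating-point rounding
```

The same argument extends to n tokens by repeating the chain rule, which is one way to read the "predict the next book" framing.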

sputknick•18m ago

I'd like to explore this idea. Did you make a blog post about it? Is it simple enough to post in a reply?

Alex_L_Wood•22m ago

Unless proven otherwise, assume everything coming from the AI industry is an ad, a pitch to investors to raise money or a straight-up lie. AI is useful in some instances, but there is so much money riding on it that there are forces way bigger than us propping it all up.

And this is an ad, I assume.

tsunamifury•7m ago

I think it’s funny that at Google I invented and productized the next-word (and next-action) predictor in Gmail and Hangouts chat, and I’ve never had a single person come to me and ask how this all works.

To me LLMs are incredibly simple. Next word, next sentence, next paragraph and next answer are stacked attention layers which identify manifolds and run in reverse to then keep the attention head on track for the next token. It’s pretty straightforward math, and you can sit down and make a tiny LLM pretty easily on your home computer with a good-sized bag of words and context.

To me it’s baffling that everyone goes around constantly saying that not even Nobel Prize winners know how this works and that it’s a huge mystery.

Has anyone thought to ask the actual people like me and others who invented this?
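
For anyone who wants to see what one attention layer computes, here is a minimal single-head causal self-attention sketch in plain Python. It is an illustrative toy, not Gmail's or any production implementation: there are no learned projection matrices, and the function name `causal_attention` and the idea of passing the same vectors as queries, keys and values are assumptions made to keep it short.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Scaled dot-product attention where position t only sees positions <= t."""
    d = len(queries[0])
    outputs = []
    for t, q in enumerate(queries):
        # Similarity of the current query with every key up to and including t.
        scores = [
            sum(qi * ki for qi, ki in zip(q, keys[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Output is a weighted average of the visible value vectors.
        outputs.append([
            sum(w * values[s][j] for s, w in enumerate(weights))
            for j in range(len(values[0]))
        ])
    return outputs


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token vectors
    for row in causal_attention(x, x, x):
        print(row)
```

A real model stacks many such layers with learned weights and feeds the final vector into a softmax over the vocabulary to get next-token probabilities.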
