We fully do. There is a significant quality difference between English language output and other languages which lends a huge hint as to what is actually happening behind the scenes.
> but how exactly does anthill behavior come from ant behavior?
You can't smell what ants can. If you could, I'm sure it would be evident.
1. Can you reveal "what's actually happening behind the scenes" beyond the hint you gave? I can't figure it out.
2. Can you explain how an ant's sense of smell leads to anthills?
Ant 0: doesn’t seem to be dangerous here. I’ll drop a scent.
Ant 1: oh cool, a safe place. And I didn’t die either. I’ll reinforce that.
Ant 142,857,098,277: cool anthill.
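To make the scent-reinforcement story above concrete, here is a toy sketch. It is not a model of real ant biology; the site count, deposit amount, and the rule that only survivors on a single "safe" site leave scent are all assumptions made for illustration.

```python
import random

def simulate_ants(n_ants=2000, n_sites=5, safe_site=2, deposit=1.0, seed=0):
    """Toy reinforcement loop: each ant picks a site with probability
    proportional to its current scent; an ant that lands on the
    (hypothetical) safe site survives and adds more scent there."""
    rng = random.Random(seed)
    scent = [1.0] * n_sites  # uniform start: no ant knows anything yet
    for _ in range(n_ants):
        site = rng.choices(range(n_sites), weights=scent)[0]
        if site == safe_site:
            scent[site] += deposit  # survivor reinforces the trail
    return scent

scent = simulate_ants()
best = max(range(len(scent)), key=scent.__getitem__)
print(best, scent)
```

No individual ant "decides" where the hill goes; the preference for one site only shows up in the accumulated scent levels.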
?
Actually we have an awful lot of those.
I'm not sure if emergent is quite the right term here. We carefully craft a scenario to produce a usable gradient for a black box optimizer. We fully expect nontrivial predictions of future state to result in increasingly rich world models out of necessity.
It gets back to the age-old observation about any sufficiently accurate model being of equal complexity to the system it models. "Predict the next word" is but a single example of the general principle at play.
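For anyone wondering what a "usable gradient" looks like for next-word prediction, a minimal sketch follows. It assumes the standard softmax cross-entropy loss over a made-up three-word vocabulary; the names `next_word_loss_and_grad` and `vocab` are illustrative, not taken from any particular library.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_word_loss_and_grad(logits, target):
    """Cross-entropy loss for the true next word, plus its gradient
    with respect to the logits (softmax output minus one-hot target)."""
    probs = softmax(logits)
    loss = -math.log(probs[target])
    grad = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    return loss, grad

vocab = ["anthill", "scent", "parrot"]  # hypothetical toy vocabulary
logits = [0.0, 0.0, 0.0]                # an untrained model: no preference
loss, grad = next_word_loss_and_grad(logits, vocab.index("scent"))
print(loss, grad)
```

The gradient is negative only for the word that actually came next, so a descent step raises that word's score and lowers the rest. Everything the optimizer ever sees is this signal, repeated over the training text.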
[1] https://en.wikipedia.org/wiki/What_Is_It_Like_to_Be_a_Bat%3F
I’m not too familiar with the history, but the import of this article is brushing up on my nose hairs in a way that makes me think a sort of neo-Sophistry is on the horizon.
Sort of the lowest hanging fruit imaginable. Just because it became "fundamental" to the process doesn't mean it gained any quality.
At least AI-haters don’t seem to be talking about “stochastic parrots” quite so much now. Maybe they finally got the memo.
That is the exact thing to say because that is exactly what it does, despite how it does so.
It is not useful to say it if you are an AI-shill though. You brought up AI-hater, so I think I am entitled to bring up AI-shills.
I prefer to use the term "spicy autocomplete" myself.
You're implicitly assuming that what you asked the LLM to do is unrepresented in the training data. That assumption is usually faulty - very few of the ideas and concepts we come up with in our everyday lives are truly new.
All that being said, the refine.ink tool certainly has an interesting approach, which I'm not sure I've seen before. They review a single piece of writing, and it takes up to an hour, and it costs $50. They are probably running the LLM very painstakingly and repeatedly over combinations of sections of your text, allowing it to reason about the things you've written in a lot more detail than you get with a plain run of a long-context model (due to the limitations of sparse attention).
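A rough sketch of what "running the LLM over combinations of sections" could look like is below. This is a guess at the general shape, not refine.ink's actual pipeline; `review_fn` is a hypothetical stand-in for a model call, and splitting on blank lines is an assumption.

```python
from itertools import combinations

def section_pairs(text):
    """Split a document on blank lines and return every pair of sections."""
    sections = [s.strip() for s in text.split("\n\n") if s.strip()]
    return list(combinations(sections, 2))

def review_document(text, review_fn):
    """Call review_fn (hypothetical model call) once per pair of sections."""
    return [review_fn(a, b) for a, b in section_pairs(text)]

doc = "Intro claims X.\n\nMethods assume Y.\n\nResults show Z."
pairs = section_pairs(doc)
notes = review_document(doc, lambda a, b: f"compare: {a!r} vs {b!r}")
print(len(pairs), notes[0])
```

The number of pairs grows quadratically with the number of sections, which would be consistent with a review taking up to an hour and costing real money.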
It's neat. I wonder about what other kinds of tasks we could improve AI performance at by scaling time and money (which, in the grand scheme, is usually still a bargain compared to a human worker).
This is just as stuck in a moment in time as "they only do next word prediction". What does this even mean anymore? Are we supposed to believe that a review of this paper, not yet written when that model (it's putatively not an "LLM", but IDK enough about it to be pushy there) was trained, was in its training data? Does that even make sense? We're not in the regime of regurgitating training data (if we ever really were). We need to let go of these frames, which were barely true when they took hold. Some new shit is afoot.
Similarly, if there are millions of academic papers and thousands of peer reviews in the training data, a review of this exact paper doesn't need to be in there for the LLM to write something convincing. (I say "convincing" rather than "correct" since the author himself admits that he doesn't agree with all the LLM's comments.)
I tend to recommend people learn these things from first principles (e.g. build a small neural network, explore deep learning, build a language model) to gain a better intuition. There's really no "magic" at work here.
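As a starting point for the "build a language model" step, here is about the smallest thing that qualifies: a bigram model that counts which word follows which. It is a sketch on a made-up corpus, not a neural network, and the function names are illustrative.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, how often each other word follows it."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    following = counts.get(word)
    if not following:
        return None
    return following.most_common(1)[0][0]

corpus = "the ant drops a scent and the ant follows a scent and the hill grows"
model = train_bigram(corpus)
print(predict_next(model, "the"))    # "ant" follows "the" twice, "hill" once
print(predict_next(model, "grows"))  # None: nothing ever follows "grows"
```

Swapping the count table for a learned function of the preceding context is, loosely, the step from this toy to a neural language model.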
The next-word bit may be slightly higher than an individual transistor, possibly functional units.
That is my take too. I was surprised to see how many people object to their works being trained on. It's how you can leave your mark: opening access to AI, the same way that over the last 25 years we opened access to people (no restrictions on access, being indexed in Google).
Your surprise at people’s objections makes sense if you can’t count.
pushedx•50m ago
There's the 3b1b video series which does a pretty good job, but now we are interfacing with models that probably have parameter counts in each layer larger than the first models that we interacted with.
The novel insights that these models can produce are truly shocking, I would guess even for someone who does understand the latest techniques.
measurablefunc•41m ago
brookst•2m ago
But... I recently had an LLM suggest an approach to negative mold-making that was novel to me. Long story, but basically isolating the gross geometry and using NURBS booleans for that, plus mesh addition/subtraction for details.
I’m sure there’s prior art out there, but that’s true for pretty much everything.
auraham•30m ago
[1] https://www.manning.com/books/build-a-large-language-model-f...