Anyway, this looks like a case of a human trying to understand an article without reading it.
I’m guessing the iterative approach burns a lot of tokens, though that may not matter too much with an 8B Llama as the LLM.
Interesting, but title is definitely clickbait.
And if you think that is amazing, my bi-metallic strip thermostat "feels" the temperature and then modifies the environment because it "knows" if it's hot to turn on the A/C, and if it's cold to turn on the heat - no training or code!
All of this AI stuff is just unbelievably incredible - what a brave new world (of word games)!
Anyone who works with deep architectures and momentum-based optimizers knows that the first few updates alone provide large improvements in loss. In this paper the breakthrough is that computing these first few updates at test time enables one to describe the algorithm as "without training" and therefore attract hype.
But they aren't updating the model weights. They're iteratively updating the prompt. It's automating the process that humans use with generative models.
Agreed that it's conceptually equivalent though.
I’ll bite and say this is actually interesting — and the paper title is misleading.
What they’ve done here is hook up a text-only LLM to multimodal critics, give it (mostly) an image diffusion generation task, and ask it to improve its prompting of the multimodal generator based on the set of scores it gets back.
This definitely works, based on their outputs. Which is to say, LLMs can, zero-shot, iteratively improve their prompting using nothing but outside tool feedback.
Why is this interesting? Well, this did not work in the GPT-3 era; it seems to do so now. I see this as an interesting line to be added in the ‘model capabilities’ box as our models get larger and more sophisticated — the LLMs can perform some sort of internally guided search against a black box generator and use a black box scorer to improve at inference time.
That’s pretty cool. It’s also generalizable, and I think it’s worth keeping on the stack of possible approaches for, say, agentic coding: you can use a critic not just to ‘improve’ generated output, but most likely to do some guided search through output space.
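For anyone skimming, here's a minimal sketch of the loop as I understand it from the description above — not the paper's actual code; `llm_complete`, `generate_image`, and `score_image` are hypothetical stand-ins for whatever LLM, diffusion model, and multimodal critics you plug in:

```python
def refine_prompt(task: str, llm_complete, generate_image, score_image,
                  rounds: int = 5) -> str:
    """Iteratively rewrite a generation prompt using only black-box critic feedback."""
    prompt = task                             # start from the raw task description
    best_prompt, best_score = prompt, float("-inf")
    for _ in range(rounds):
        image = generate_image(prompt)              # black-box generator
        score, critique = score_image(image, task)  # black-box multimodal critic
        if score > best_score:
            best_prompt, best_score = prompt, score
        # The text-only LLM never sees the image, only the scores/critique text.
        prompt = llm_complete(
            f"Task: {task}\n"
            f"Current prompt: {prompt}\n"
            f"Critic score: {score}\n"
            f"Critique: {critique}\n"
            "Rewrite the prompt so the next image scores higher:"
        )
    return best_prompt
```

No weights are updated anywhere; the only state that changes across iterations is the prompt string, which is why "without training" is technically accurate even if it reads as clickbait.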
The paper has ablations (I didn’t read that section), so you could check where they say the effectiveness comes from. My bet, though, is that it’s emergent from a bunch of different places.
FWIW, I don’t think LLMs will solve all our problems, so I too am skeptical of that claim. I’m not skeptical of the slightly weaker claim that larger models have emergent capabilities and we are probably not done finding them as we scale up.
100% agree. I'd classify the current phase as identifying the limits of what they can functionally do, though, and it's a lot!
The one issue I keep finding with these approaches is that good tools for the problem already exist, yet we keep searching for wasteful “natural language” approaches for things humans are not going to interact with without a good deal of training anyway.
I do understand the hope of getting LLMs to do the bulk of the work and then, after an audit, fixing the errors. But both the audit and the fixing will require the same mental energy as writing the code in the first place, and possibly more time.
Specialist tools are always more expressive and offer more control than general-purpose tools. Most agentic coding approaches offer a general interface instead of a specialized one, but redirect you to a bespoke and badly designed specialized interface whenever you want to do anything useful.
I think of this less as audit misses, and more as developing a permanently useful tool. For open model weights, humanity will not (unless we’re talking real zombie apocalypse scenarios) lose these weights. They are an incredible global asset, so making them more generally useful and figuring out how to use them is super helpful.
This was accurate. But mostly humans gained from books. I think we will develop the social technology to use these tools over time; giving some things up and gaining others.
If we don’t, the Amish can just take over and be like “Stupid English, using the devil’s weights.” :)
I really wish we would find a different term for this.
Doing something always takes at least one attempt, i.e. "one shotting". "Zero shotting" is an oxymoron, which makes it a term that only creates more confusion rather than succinctly conveying something.
TCGs have a related "zero turn win" concept, where the opponent goes first and you win without getting a turn due to the set of cards you randomly drew and being able to activate them on the opponent's turn.
And if I hear someone say "banger", "cooking", "insane", or "crazy" one more time, I'm going to sledgehammer my computer. Can't someone under 40 please pick up a book and read? Yesterday Sam Altman tried to coin "skillsmaxxing" in a tweet. I threw my coffee cup at my laptop.
Through the story and experience of a blind man, they end up getting into the question of what it means to see.
The podcast is pretty straightforward, but it does end up showing that defining “seeing” is a philosophical question rather than one with a simple, obvious answer.
This is the wrong approach to take. At minimum you have to say things like "well, yes, we're always on the lookout for this kind of thing." With him? Not a care in the world.