There Are No New Ideas in AI Only New Datasets

https://blog.jxmo.io/p/there-are-no-new-ideas-in-ai-only

76•bilsbie•3h ago

Comments

ctoth•1h ago

Reinforcement learning from self-play/AlphaWhatever? Nah must just be datasets. :)

grumpopotamus•1h ago

https://en.wikipedia.org/wiki/TD-Gammon

Y_Y•58m ago

You raise a really interesting point. I'm sure it's just missed my notice, but I'm not familiar with any projects from antediluvian AI that have been resurrected to run on modern hardware and see where they'd really asymptote if they'd had the compute they deserved.

FeepingCreature•20m ago

To be fair, usually those projects would need considerable work to be ported to modern multicore machines, let alone GPUs.

NitpickLawyer•1h ago

And architecture stuff like actually useful long context. Whatever they did with gemini 2.5 is miles ahead in long context useful results compared to the previous models. I'd be very surprised if gemini 2.5 is "just" gemini 1 w/ better data.

nyrikki•38m ago

Big difference between a perfect information, completely specified zero sum game and the real world.

As a simple analogy, read out the following sentence multiple times, stressing a different word each time.

"I never said she stole my money"

Note how the meaning changes and is often unique?

That is a lens I to the frame problem and it's inverse, the specification problem.

The above problem quickly becomes tower-complete, and recent studies suggest that RL is reinforcing or increasing the weight of existing patterns.

As the open domain frame problem and similar challenges are equivalent to HALT, finding new ways to extract useful information will be important for generalization IMHO.

Synthetic data is useful, but not a complete solution, especially for tower problems.

kogus•1h ago

To be fair, if you imagine a system that successfully reproduced human intelligence, then 'changing datasets' would probably be a fair summary of what it would take to have different models. After all, our own memories, training, education, background, etc are a very large component of our own problem solving abilities.

jschveibinz•1h ago

I will respectfully disagree. All "new" ideas come from old ideas. AI is a tool to access old ideas with speed and with new perspectives that hasn't been available up until now.

Innovation is in the cracks: recognition of holes, intersections, tangents, etc. on old ideas. It has bent said that innovation is done on the shoulders of giants.

So AI can be an express elevator up to an army of giant's shoulders? It all depends on how you use the tools.

gametorch•1h ago

Exactly!

Can you imagine if we applied the same gatekeeping logic to science?

Imagine you weren't allowed to use someone else's scientific work or any derivative of it.

We would make no progress.

The only legitimate defense I have ever seen here revolves around IP and copyright infringement, which I couldn't care less about.

alfalfasprout•1h ago

Access old ideas? Yes. With new perspectives? Not necessarily. An LLM may be able to assist in interpreting data with new perspectives but in practice they're still fairly bad at greenfield work.

As with most things, the truth lies somewhere in the middle. LLMs can be helpful as a way of accelerating certain kinds and certain aspects of research but not others.

bcrosby95•40m ago

The article is discussing working in AI innovation vs focusing on getting more and better data. And while there have been key breakthroughs in new ideas, one of the best ways to increase the performance of these systems is getting more and better data. And how many people think data is the primary avenue to improvement.

It reminds me of an AI talk a few decades ago, about how the cycle goes: more data -> more layers -> repeat...

Anyways, I'm not sure how your comment relates to these two avenues of improvement.

jjtheblunt•31m ago

> I will respectfully disagree. All "new" ideas come from old ideas.

The insight into the structure of the benzene ring famously came in a dream, hadn't been seen before, but was imagined as a snake bitings its own tail.

tippytippytango•1h ago

Sometimes we get confused by the difference between technological and scientific progress. When science makes progress it unlocks new S-curves that progress at an incredible pace until you get into the diminishing returns region. People complain of slowing progress but it was always slow, you just didn’t notice that nothing new was happening during the exponential take off of the S-curve, just furious optimization.

ks2048•57m ago

The latest LLMs are simply multiplying and adding various numbers together... Babylonians were doing that 4000 years ago.

bobson381•55m ago

You are just a lot of interactions of waves. All meaning is assigned. I prefer to think of this like the Goedel generator that found new formal expressions for the Principia - because we have a way of indexing concept-space, there's no telling what we might find in the gaps.

voxleone•50m ago

I'd say with confidence: we're living in the early days. AI has made jaw-dropping progress in two major domains: language and vision. With large language models (LLMs) like GPT-4 and Claude, and vision models like CLIP and DALL·E, we've seen machines that can generate poetry, write code, describe photos, and even hold eerily humanlike conversations.

But as impressive as this is, it’s easy to lose sight of the bigger picture: we’ve only scratched the surface of what artificial intelligence could be — because we’ve only scaled two modalities: text and images.

That’s like saying we’ve modeled human intelligence by mastering reading and eyesight, while ignoring touch, taste, smell, motion, memory, emotion, and everything else that makes our cognition rich, embodied, and contextual.

Human intelligence is multimodal. We make sense of the world through:

Touch (the texture of a surface, the feedback of pressure, the warmth of skin0; Smell and taste (deeply tied to memory, danger, pleasure, and even creativity); Proprioception (the sense of where your body is in space — how you move and balance); Emotional and internal states (hunger, pain, comfort, fear, motivation).

None of these are captured by current LLMs or vision transformers. Not even close. And yet, our cognitive lives depend on them.

Language and vision are just the beginning — the parts we were able to digitize first - not necessarily the most central to intelligence.

The real frontier of AI lies in the messy, rich, sensory world where people live. We’ll need new hardware (sensors), new data representations (beyond tokens), and new ways to train models that grow understanding from experience, not just patterns.

Swizec•43m ago

> The real frontier of AI lies in the messy, rich, sensory world where people live. We’ll need new hardware (sensors), new data representations (beyond tokens), and new ways to train models that grow understanding from experience, not just patterns.

Like Dr. Who said: DALEKs aren't brains in a machine, they are the machine!

Same is true for humans. We really are the whole body, we're not just driving it around.

skydhash•33m ago

Yeah, but are there new ideas or only wishes?

anon291•42m ago

I mean there's no new ideas for saas but just new applications and that worked out pretty well

luppy47474•38m ago

Hmmm

rar00•32m ago

disagree, there are a few organisations exploring novel paths. It's just that throwing new data at an "old" algorithm is much easier and has been a winning strategy. And, also, there's no incentive for a private org to advertise a new idea that seems to be working (mine's a notable exception :D).

tantalor•23m ago

> If data is the only thing that matters, why are 95% of people working on new methods?

Because new methods unlock access to new datasets.

Edit: Oh I see this was a rhetorical question answered in the next paragraph. D'oh

piinbinary•21m ago

AI training is currently a process of making the AI remember the dataset. It doesn't involve the AI thinking about the dataset and drawing (and remembering) conclusions.

It can probably remember more facts about a topic than a PhD in that topic, but the PhD will be better at thinking about that topic.

jayd16•10m ago

Its a bit more complex than that. Its more about baking out the dataset into heuristics that a machine can use to match a satisfying result to an input. Sometimes these heuristics are surprising to a human and can solve a problem in a novel way.

"Thinking" is too broad a term to apply usefully but I would say its pretty clear we are not close to AGI.

nkrisc•6m ago

> It can probably remember more facts about a topic than a PhD in that topic

So can a notebook.

EternalFury•19m ago

What John Carmack is exploring is pretty revealing. Train models to play 2D video games to a superhuman level, then ask them to play a level they have not seen before or another 2D video game they have not seen before. The transfer function is negative. So, in my definition, no intelligence has been developed, only expertise in a narrow set of tasks.

It’s apparently much easier to scare the masses with visions of ASI, than to build a general intelligence that can pick up a new 2D video game faster than a human being.

Kapura•18m ago

Here's an idea: make the AIs consistent at doing things computers are good at. Here's an anecdote from a friend who's living in Japan:

> i used chatgpt for the first time today and have some lite rage if you wanna hear it. tldr it wasnt correct. i thought of one simple task that it should be good at and it couldnt do that.

> (The kangxi radicals are neatly in order in unicode so you can just ++ thru em. The cjks are not. I couldnt see any clear mapping so i asked gpt to do it. Big mess i had to untangle manually anyway it woulda been faster to look them up by hand (theres 214))

> The big kicker was like, it gave me 213. And i was like, "why is one missing?" Then i put it back in and said count how many numbers are here and it said 214, and there just werent. Like come on you SHOULD be able to count.

If you can make the language models actually interface with what we've been able to do with computers for decades, i imagine many paths open up.

b0a04gl•11m ago

if datasets are the new codebases ,then the real IP can be dataset version control. how you fork ,diff ,merge and audit datasets like code. every team says 'we trained on 10B tokens' but what if we can answer 'which 5M tokens made reasoning better', 'which 100k made it worse'. then we can start being targeted leverage

krunck•7m ago

Until these "AI" systems become always-on, always-thinking, always-processing, progress is stuck. The current push button AI - meaning it only processes when we prompt it - is not how the kind of AI that everyone is dreaming of needs to function.

Show HN: Snippet Curator, Evernote alternative and SingleFile viewer for notes

Kuo: Apple to release cheaper MacBook powered by iPhone processor

Show HN: We're two coffee nerds who built an AI app to track beans and recipes

Pets Allowed (2014)

JetBrains announces price hike for YouTrack

Show HN: Chrome DevTools MCP Server

The Anatomy of Quick Wins

Workflows 1.0: A Lightweight Framework for Agentic systems

The JTAG in your Qualcomm/Snapdragon device's USB port

Mechanistic Interpretability of Emotion Inference in Large Language Models

Why are AI agents not allowed to roam freely on the internet?

Datadog's $65M/year customer mystery solved

OpenTelemetry Is Great, but Who the Hell Is Going to Pay for It?

How Long Contexts Fail

Brain rot isn't new – but now we're all talking about it

AI Economy and Mass Unemployment

Meta Joins Kotlin Foundation

Ask HN: How have you shared computers with your young child (~3 to 5)

Don't Be Ashamed to Say "I Don't Know"

I'm a software engineer – what next?

Rubik's Cube Solver

Show HN: Validated Daily SaaS Ideas

What's the difference between bio and non bio detergent? (2022)

Amateurs Talk Algorithms, Professionals Talk Data Cleaning

Show HN: BugBlaze – a CLI tool that explains your code errors using AI

Ask HN: What's the 2025 stack for a self-hosted photo library with local AI?

The Bear Manifesto – Herman's Blog

Alex Trebek Stamps, Sheet of 20

Show HN: Kichan.ai a free Chrome extension to generate and inject scripts

Show HN: Attach Gateway – one-command OIDC/DID auth for local LLMs