
Meta Superintelligence's surprising first paper

https://paddedinputs.substack.com/p/meta-superintelligences-surprising
120•skadamat•3h ago
https://arxiv.org/abs/2509.01092

Comments

bigyabai•2h ago
> Long awaited first paper from Meta Superintelligence Labs is not a model layer innovation. What does this mean?

It means you're reading into it too much and need to be let down, gently, from the hype train.

nine_k•2h ago
A great post; it starts with this:

TL;DR

• MSI’s first paper, REFRAG, is about a new way to do RAG.

• This slightly modified LLM converts most retrieved document chunks into compact, LLM-aligned chunk embeddings that the LLM can consume directly.

• A lightweight policy (trained with RL) decides which chunk embeddings should be expanded back into full tokens under a budget; the LLM runs normally on this mixed input.

• The net effect is far less KV cache and attention cost, much faster first-byte latency and higher throughput, while preserving perplexity and task accuracy in benchmarks.

I wish more long posts followed this model of a scientific paper.
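
A minimal sketch of the mixed-input idea the TL;DR describes (Python; encode_chunk, embed_tokens, policy_score, and TOKEN_BUDGET are hypothetical stand-ins for the paper's components, not its actual interfaces):

    import numpy as np

    TOKEN_BUDGET = 512  # hypothetical expansion budget, in tokens

    def build_mixed_input(chunks, encode_chunk, embed_tokens, policy_score):
        """Compress each retrieved chunk to one embedding, then expand only
        the highest-scoring chunks back into full token embeddings, as the
        TL;DR describes (a learned policy spending a fixed token budget)."""
        compressed = [encode_chunk(c) for c in chunks]   # one vector per chunk
        scores = [policy_score(e) for e in compressed]   # which chunks to expand
        order = np.argsort(scores)[::-1]                 # highest-scoring first

        expand, spent = set(), 0
        for i in order:                                  # greedily spend the budget
            cost = len(embed_tokens(chunks[i]))
            if spent + cost <= TOKEN_BUDGET:
                expand.add(int(i))
                spent += cost

        mixed = []
        for i, chunk in enumerate(chunks):               # preserve document order
            if i in expand:
                mixed.extend(embed_tokens(chunk))        # full per-token embeddings
            else:
                mixed.append(compressed[i])              # single chunk embedding
        return mixed

The KV-cache and attention savings follow directly: the LLM attends over len(mixed) positions instead of the full token count of every retrieved chunk.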

jongjong•2h ago
Interesting. All developers I know who have tinkered with embeddings and vector similarity scoring were instantly hooked. The efficiency of computing the embeddings once and then reusing them as many times as needed, comparing the vectors with a cheap <30-line function, is extremely appealing. Not to mention the indexing capabilities that make it work at scale.

IMO vector embedding is the most important innovation in computing of the last decade. There's something magical about it. These people deserve some kind of prize. The idea that you can reduce almost any intricate concept including whole paragraphs to a fixed-size vector which encapsulates its meaning and proximity to other concepts across a large number of dimensions is pure genius.
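
The "<30-line function" is presumably cosine similarity or something close to it; a minimal version (Python; the embed function in the comments is a hypothetical placeholder):

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        """Score two embedding vectors by the cosine of the angle between
        them: 1.0 = same direction, 0.0 = orthogonal (unrelated)."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Embed once, reuse many times:
    # corpus_vecs = [embed(doc) for doc in docs]  # computed once, cached
    # q = embed(query)                            # one embedding per query
    # best = max(range(len(docs)),
    #            key=lambda i: cosine_similarity(q, corpus_vecs[i]))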

_jayhack_•2h ago
Vector embedding is not an invention of the last decade. Featurization in ML goes back to the 60s; even deep learning-based featurization is decades old at a minimum. Like everything else in ML, this became much more useful with data and compute scale.
senderista•2h ago
Yup, when I was at MSFT 20 years ago they were already productizing vector embedding of documents and queries (LSI).
ekidd•2h ago
Vector embeddings are slightly interesting because they come pre-trained with large amounts of data.

But similar ways to reduce huge numbers of dimensions to a much smaller set of "interesting" dimensions have been known for a long time.

Examples include principal component analysis/singular value decomposition, which was the first big breakthrough in face recognition (in the early 90s), and was also used in latent semantic indexing, the Netflix prize, and a large pile of other things. And the underlying technique was invented in 1901.

Dimensionality reduction is cool, and vector embedding is definitely an interesting way to do it (at significant computational cost).
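
For comparison, classic PCA via SVD takes a few lines of numpy (a sketch of the 1901-era technique the comment refers to, nothing REFRAG-specific):

    import numpy as np

    def pca_reduce(X: np.ndarray, k: int) -> np.ndarray:
        """Project n samples of dimension d down to k dimensions by keeping
        the top-k principal components (PCA computed via SVD)."""
        Xc = X - X.mean(axis=0)                  # center the data
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        return Xc @ Vt[:k].T                     # coordinates in top-k components

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 50))               # 100 samples, 50 dims
    print(pca_reduce(X, 5).shape)                # (100, 5)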

liampulles•1h ago
If you take the embedding for king, subtract the embedding for male, add the embedding for female, and look up the closest embedding, you get queen.

The fact that simple vector addition and subtraction can encode concepts like royalty and gender (among all sorts of others) is kind of magic to me.
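
That lookup is just vector arithmetic plus a nearest-neighbor search; a toy sketch (Python, with hand-picked 2-d vectors standing in for real trained embeddings):

    import numpy as np

    def analogy(emb: dict, a: str, b: str, c: str) -> str:
        """Return the word closest (by cosine) to emb[a] - emb[b] + emb[c],
        excluding the three input words themselves."""
        def cos(u, v):
            return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
        target = emb[a] - emb[b] + emb[c]
        return max((w for w in emb if w not in {a, b, c}),
                   key=lambda w: cos(emb[w], target))

    # Hand-picked toy vectors so the arithmetic works out exactly; real
    # embeddings (word2vec, GloVe) learn this structure from data.
    emb = {
        "king":   np.array([1.0,  1.0]),
        "queen":  np.array([1.0, -1.0]),
        "male":   np.array([0.0,  1.0]),
        "female": np.array([0.0, -1.0]),
    }
    print(analogy(emb, "king", "male", "female"))  # -> queen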

mountainriver•2h ago
This was a very obvious next step, I played around with implementing something similar at one point.

In general we need to make it simpler for LLMs to take in different forms of embeddings, or at least build frameworks that simplify it.

cm2012•2h ago
At first I thought the super intelligence wrote a novel scientific paper
Imnimo•1h ago
I'm curious whether this is work that was specifically begun under the "superintelligence" umbrella, or if it's just that the people who were working on it had been shifted to the Superintelligence team by the time they wrote the paper. I would guess the former?
naasking•1h ago
> the core insight here is actually: if embeddings are generated by layers within the LLM, it makes no sense to convert them back to natural language, just for another LLM to compress those tokens back to embeddings.

Doesn't this tie the two layers together in a way that they can't evolve separately?

xvector•1h ago
Working in big tech it's pretty wild to see how integral AI has become to our work internally, vs the public perception of it. People are NOT prepared.
fishmicrowaver•1h ago
Not prepared for what? Seems like the rest of the world is desperate to be shown the way to unlock something of value?
Workaccount2•1h ago
I think at this point it's software devs looking for the value unlock.

Non-software devs are actually making functional programs for themselves for the first time ever. The value is crazy.

ceejayoz•1h ago
It’s not the first time ever. People did the same with Access and HyperCard in the 90s.
fishmicrowaver•1h ago
Sure, but in the real world, do you think businesses are going to deploy piles of code generated this way into production? No; non-technical people will continue to whip up MS PowerApps. AI-generated code has no value to many businesses.
terminalshort•1h ago
1. Hyperbolic statement about LLM capabilities with no concrete examples

2. Wild claim that the companies that sell LLMs are actually downplaying their capabilities instead of hyping them

danielmarkbruce•1h ago
Yup, he's totally lying. Not happening. Just carry on.
BoorishBears•58m ago
Agreed, but why are they lying?
crorella•35m ago
Personal experience here in a FAANG: there has been a considerable increase in:

1. Teams exploring how to leverage LLMs for coding.

2. Teams/orgs that have already standardized some of the processes for working with LLMs (MCP servers, standardized creation of the agents.md files, etc.)

3. Teams actively using it for coding new features, documenting code, increasing test coverage, code reviews, etc.

Again, personal experience, but in my team ~40-50% of the PRs are generated by Codex.

incompatible•33m ago
I've heard of one study that said AI slows developers down, even when they think it's helping.

https://www.infoworld.com/article/4061078/the-productivity-p...

godelski•1h ago
It's kinda funny, Meta has long had some of the best in the field, but left them untapped. I really think if they just took a step back, stopped being so metric-focused, and let their people freely explore, they'd be winning the AI race. But with this new team, I feel like Meta mostly hired the people who are really good at gaming the system. The people that care more about the money than the research.

A bit of this is true at every major lab. There's tons of untapped potential. But these organizations are very risk averse. I mean, why not continue with the strategy that got us to the point we're at in the first place? Labs used to hire researchers and give them a lot of free rein. But those times ended and AI progress also slowed down. Maybe if you want to get ahead you gotta stop thinking like everyone else.

Well Meta... you can "hold me hostage" for a lot cheaper than those guys. I'm sure this is true for hundreds of passionate ML researchers. I'd take a huge pay cut to have autonomy and resources. I know for a fact there are many working at Meta right now who would do the same. So maybe if you're going to throw money at the problem, diversify a bit and look back at what made SV what it is today and what made AI take leaps forward.

bobxmax•40m ago
I thought Alex Wang was a very curious choice. There are so many foundational AI labs with interesting CEOs... I get that Wang is remarkable in his own right, but he basically just built MTurk and timed the bubble.

Doesn't really scream CEO of AGI to me.

thereitgoes456•8m ago
The reporting at the time said that he was Mark's 5th choice or so. It is fairly clear he would have preferred Ilya, Murati, Mark Chen, and perhaps others, but they said no, and Alex Wang was the first one to say yes.
godelski•8m ago
A lot of people also don't know that many of the well-known papers are just variations on small-time papers with a fuck-ton more compute thrown at the problem. Probably the feature that correlates most strongly with being a successful researcher is compute. Many have taken this to claim that the GPU-poor can't contribute, but that ignores so many other valid explanations... and we wonder why innovation has slowed...
didip•9m ago
I always wonder about that. Those $100m mathematicians... how can they have room to think under Meta's crushing IMPACT pressure?
ipsum2•1h ago
This has nothing to do with superintelligence; it's just that the people who were working on the paper prior to the re-org happened to publish after the name change.

Though it is notable that, contrary to predictions from many (on HN and Twitter) that Meta would stop publishing papers and become like other AI labs (e.g. OpenAI), they've continued their rapid pace of releasing papers AND open source models.

bigcat12345678•1h ago
https://docs.lamini.ai/memory_rag/ Similar approaches have been tried before already
pppoe•1h ago
I find it absurd that, compared to the past, large companies now have higher stock prices and more cash than ever before, yet nearly every AI lab inside these companies is under greater pressure than ever to generate short-term profits. In the midst of AI's unprecedented boom, the research environment and atmosphere in industry seem to have worsened.
signatoremo•1h ago
Is Meta's lab being pressured to generate short-term profits?

Which other labs under pressure are you talking about?

foldl2022•1h ago
So, show me the model weights, please.
yalogin•1h ago
I am not surprised, because the culture at Meta is not at all, even in the slightest, to focus on science for the sake of it. It's actively purged out of you. The focus is on metrics and how the bottom line is impacted. So this is in line with that.
rhetocj23•36m ago
Yeah, and this problem is near impossible to fix once it has infested the culture of the firm.
DangitBobby•14m ago
It's not always a bad thing, though; in this case they looked for a practical win and found one, because impractical wins can't make them money.
CShorten•37m ago
Here is a video I made diving into the paper, hopefully helpful!

https://www.youtube.com/watch?v=Ek0tZootK00

nmca•22m ago
This is not work by any of the high profile new hires, in case folks are confused.
elyobo•21m ago
Can we have a more informative, less clickbaity title?


Vancouver Stock Exchange: Scam capital of the world (1989) [pdf]

https://scamcouver.wordpress.com/wp-content/uploads/2012/04/scam-capital.pdf
31•thomassmith65•2h ago•10 comments

Is Odin just a more boring C?

https://dayvster.com/blog/is-odin-just-a-more-boring-c/
28•birdculture•7h ago•9 comments

China's New Rare Earth and Magnet Restrictions Threaten US Defense Supply Chains

https://www.csis.org/analysis/chinas-new-rare-earth-and-magnet-restrictions-threaten-us-defense-s...
32•stopbulying•57m ago•19 comments

LineageOS 23

https://lineageos.org/Changelog-30/
84•cdesai•2h ago•28 comments

Google blocks Android hack that let Pixel users enable VoLTE anywhere

https://www.androidauthority.com/pixel-ims-broken-october-update-3606444/
60•josephcsible•2h ago•18 comments

Microsoft only lets you opt out of AI photo scanning 3x a year

https://hardware.slashdot.org/story/25/10/11/0238213/microsofts-onedrive-begins-testing-face-reco...
428•dmitrygr•7h ago•147 comments

My First Murder

https://www.texasmonthly.com/true-crime/skip-hollandsworth-new-book-she-kills/
26•speckx•5d ago•2 comments

How Apple designs a virtual knob (2012)

https://jherrm.github.io/knobs/
113•gregsadetsky•4d ago•75 comments

Testing two 18 TB white label SATA hard drives from datablocks.dev

https://ounapuu.ee/posts/2025/10/06/datablocks-white-label-drives/
148•thomasjb•5d ago•88 comments

Show HN: rift – a tiling window manager for macOS

https://github.com/acsandmann/rift
16•atticus_•2h ago•6 comments

Rating 26 years of Java changes

https://neilmadden.blog/2025/09/12/rating-26-years-of-java-changes/
160•PaulHoule•8h ago•175 comments

The World Trade Center under construction through photos, 1966-1979

https://rarehistoricalphotos.com/twin-towers-construction-photographs/
194•kinderjaje•5d ago•98 comments

The <output> Tag

https://denodell.com/blog/html-best-kept-secret-output-tag
716•todsacerdoti•18h ago•162 comments

Windows Subsystem for FreeBSD

https://github.com/BalajeS/WSL-For-FreeBSD
222•rguiscard•19h ago•85 comments

People regret buying Amazon smart displays after being bombarded with ads

https://arstechnica.com/gadgets/2025/10/people-regret-buying-amazon-smart-displays-after-being-bo...
204•croes•8h ago•100 comments

Superpowers: How I'm using coding agents in October 2025

https://blog.fsck.com/2025/10/09/superpowers/
295•Ch00k•19h ago•163 comments

GNU Health

https://www.gnuhealth.org/about-us.html
342•smartmic•10h ago•97 comments

Paper2Video: Automatic Video Generation from Scientific Papers

https://arxiv.org/abs/2510.05096
8•jinqueeny•3h ago•0 comments

Vibing a non-trivial Ghostty feature

https://mitchellh.com/writing/non-trivial-vibing
222•skevy•12h ago•110 comments

The story of X-Copy on the Amiga

https://spillhistorie.no/2025/10/10/the-story-of-x-copy-on-the-amiga/
11•onename•4h ago•1 comments

A Guide for WireGuard VPN Setup with Pi-Hole Adblock and Unbound DNS

https://psyonik.tech/posts/a-guide-for-wireguard-vpn-setup-with-pi-hole-adblock-and-unbound-dns/
27•pSYoniK•6h ago•6 comments

Japan's summers have lengthened by 3 weeks over 42 years, say researchers

https://english.kyodonews.net/articles/-/62626
97•anigbrowl•5h ago•17 comments

All-New Next Gen of UniFi Storage

https://blog.ui.com/article/all-new-next-gen-of-unifi-storage
56•ycombinete•3d ago•33 comments

Show HN: Solving the cluster 1 problem with vCluster standalone

https://www.vcluster.com/blog/vcluster-standalone-multi-tenancy-kubernetes
8•saiyampathak•3d ago•0 comments

Ask HN: Abandoned/dead projects you think died before their time and why?

26•ofalkaed•4h ago•60 comments

A quiet change to RSA

https://www.johndcook.com/blog/2025/10/06/a-quiet-change-to-rsa/
99•ibobev•5d ago•33 comments

Immutable Value

https://kevlinhenney.medium.com/immutable-value-d5d73dc3252f
5•mooreds•5d ago•1 comments

Indonesia says 22 plants in industrial zone contaminated by caesium 137

https://www.reuters.com/sustainability/boards-policy-regulation/indonesia-says-22-plants-industri...
89•geox•6h ago•23 comments

Beyond indexes: How open table formats optimize query performance

https://jack-vanlightly.com/blog/2025/10/8/beyond-indexes-how-open-table-formats-optimize-query-p...
23•jandrewrogers•3d ago•0 comments