We reverse-engineered Flash Attention 4

https://modal.com/blog/reverse-engineer-flash-attention-4

134•birdculture•4mo ago

Comments

petters•4mo ago

Is reading the source code reverse engineering?

charles_irl•4mo ago

Hey, one of the authors here!

Reductively, software engineering means taking an idea and mapping it into code. So one form of "reverse" engineering would be taking the code and extracting the ideas. That's what we did here.

Because the source is public, there's quite a lot to work with from the start -- the warp specializations are named and there are helpful comments in many places.

But for many components, we didn't have much. Maybe the clearest case of "reverse engineering" explained in the post is the cubic approximation for the fractional part of the exponentiation. That required staring at some inline assembly and doing math.
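For readers who want to see the shape of that trick, here is a minimal Python sketch of computing 2^x by splitting x into an integer part and a fractional part, then approximating the fractional piece with a cubic polynomial. The function name `exp2_cubic` is hypothetical, and the coefficients are plain Taylor terms chosen for illustration; they are assumptions, not the tuned constants from the FA4 kernel's inline assembly.

```python
import math

LN2 = math.log(2.0)

# Taylor coefficients of 2**f around f = 0.
# Assumption for illustration only: these are NOT the constants used in FA4.
C1 = LN2
C2 = LN2 ** 2 / 2.0
C3 = LN2 ** 3 / 6.0


def exp2_cubic(x: float) -> float:
    """Approximate 2**x using range reduction plus a cubic polynomial."""
    n = round(x)   # nearest integer, handled exactly by scaling below
    f = x - n      # remaining fractional part, in [-0.5, 0.5]
    # Cubic in f, evaluated with Horner's rule.
    p = 1.0 + f * (C1 + f * (C2 + f * C3))
    # Multiply by 2**n exactly by adjusting the floating-point exponent.
    return math.ldexp(p, n)


if __name__ == "__main__":
    for x in (-3.7, 0.25, 1.49, 10.3):
        print(x, exp2_cubic(x), 2.0 ** x)
```

Because the integer part is rounded to nearest, the polynomial only ever sees inputs in [-0.5, 0.5], where this Taylor cubic stays within about 0.1% of the true value. A tuned minimax fit would likely do better with the same number of multiply-adds.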

metadat•4mo ago

I've never heard of this definition of reverse engineering -- when one has the unobfuscated actual source code I'd usually call it: reading the code, or something like summarization.

Not trying to be uncharitable, I found your article informative. Reverse engineering has historically been reserved for cases where there is an adversarial aspect, as in binaries or server APIs. Anyhow, cheers and thank you, sincerely.

Zacharias030•4mo ago

That time when I reverse engineered JRR Tolkien's Lord of the Rings from symbols engraved on dead trees. Took me three summers…

pests•4mo ago

Having the source code and understanding how it works are two different things, especially when running on state-of-the-art hardware. If I had just read the source I would not have gained as much knowledge as this article taught me. Where did this extra info come from? They read the source too, but then they did something more. I wouldn’t call it summarization either, as again any summary I wrote about the code would pale in comparison.

VBprogrammer•4mo ago

I think "explained" is a reasonable term for this. If I remember correctly, there were books of the form "The Linux Source Code Explained".

Certainly I can't get on board with reverse engineered.

heavyset_go•4mo ago

You've never had to reverse engineer the thinking and ideas that went behind code written by someone else/you a year ago?

greatgib•4mo ago

No, because so far you "engineered" nothing. You just studied it, tried to understand it, and explain or teach it.

If you had reverse engineered it, you would have tried to "recreate something" that does not exist to do the same.

So, if you have a binary code, you recreate the source code that in theory could allow you to recreate the binary.

If you have the source code, I guess that would be when you are missing pieces of info that allows you to run this code like it is done by others...

hackinthebochs•4mo ago

You guys are being obtuse. Engineering is turning a spec into a more technical artifact, whether that's source code, machine code, physical hardware or something else. Reverse engineering is then reversing the process of engineering, recovering the semantic artifact from the engineering artifact. That the OP is using the term in the sense of recovering the semantic insights from the CUDA kernels is a fine application of the concept.

heavyset_go•4mo ago

Disagree that reverse engineering necessarily requires something to be recreated.

For example, simple hardware reversing can just be learning what, how and why something works, you don't need to "recreate" anything other than ideas.

unnah•4mo ago

That is the traditional explanation of why it is called reverse engineering. The term originated in hardware engineering. When it was originally applied to software, it was common to create requirements documents and design documents before coding, even if the actual process did not strictly follow the "waterfall" idea.

Thus it was natural to call the process of producing design documents from undocumented software "reverse engineering". These days coding without any formal design documents is so common that it seems the original meaning of reverse engineering has become obscured.

knome•4mo ago

In what time period and area did you come across this usage? As far as I ever saw it used, 'reverse engineering' generally referred to creating docs from executables or from watching network protocols rather than from source.

unnah•4mo ago

Back in the 1990's. As an example, back then the Rational Rose design software had a feature to generate UML diagrams from existing source code, and it was called "reverse engineering".

https://en.wikipedia.org/wiki/IBM_Rational_Rose

cmrx64•4mo ago

it’s more properly just software archaeology; recovering design intent from artifacts https://en.m.wikipedia.org/wiki/Software_archaeology

varispeed•4mo ago

I reverse engineered above comment by reading it and extracting the idea.

billy99k•4mo ago

It's the 'hacker' argument all over again.

saagarjha•4mo ago

I have to say this is kind of funny given that you also had this in the blog post:

> cudnn kernels are closed source, so Jensen only knows what’s going on in there.

edunteman•4mo ago

I'd argue that understanding disassembled assembly could be considered reverse engineering, which would logically extend to source code unless we draw the line at compilation

taneq•4mo ago

We kinda... do? Draw the line there, I mean. Reverse engineering, as I've always heard the term used, is taking the final artifact and working backwards to infer the original design, and ideally some of the reasons for the decisions made. If you take a shipped binary, disassemble/decompile it, figure out what the variables mean and how it all works, that's reverse engineering. It's the equivalent of taking a mechanism, pulling it apart, and figuring out the cause and effect of how it works, to the extent that you can duplicate it and even modify the functionality.

Starting from high level source code is like starting from engineering drawings or the CAD model. You've already been handed most or all of the info that reverse engineering is attempting to recover.

hackinthebochs•4mo ago

Source code doesn't inherently contain the "why" of the operations. Code itself is an engineering artifact, so recovering the why is a kind of reverse engineering.

bthornbury•4mo ago

I'm pretty sure it's called "reading the code". That said, it is difficult enough in its own right.

LoganDark•4mo ago

Reading the source code is one thing, understanding it is another. Reverse engineering source code can be as simple as figuring out the original meaning/intent behind the code when it isn't immediately obvious or documented.

lelanthran•4mo ago

> Reading the source code is one thing, understanding it is another. Reverse engineering source code can be as simple as figuring out the original meaning/intent behind the code when it isn't immediately obvious or documented.

I would get some pretty weird looks if I changed my CV to replace "maintained legacy application that I did not write" with "reverse engineering".

Similarly, I would get instant hoots of laughter if I told my dev managers over the last 28 years that I reverse engineered the legacy application I was hired to work on.

I mean, I get what you're saying, but when you use the term "reverse engineering" in the context of software, you're just going to confuse everyone who already knows what it means.

magicalhippo•4mo ago

So, what would you call studying the code for the fast inverse square root[1] in the Quake source code so you truly understand it, to the point you can explain what it does to someone else without invoking words like "magic" or similar?

Because I'm pretty sure most devs would not just read the code and go "ah yes, of course".

[1]: https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overv...
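As a concrete reference for the routine being discussed, here is a Python transcription of the widely published fast inverse square root: reinterpret the float's bits as an integer, subtract half of that integer from the constant 0x5F3759DF, reinterpret the result as a float, then apply one Newton-Raphson step. The name `fast_inv_sqrt` and the use of `struct` for the bit casts are choices made for this sketch. The original is C operating on 32-bit floats, so results may differ in the last digits.

```python
import struct


def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for positive x using the bit-level trick."""
    # Reinterpret the 32-bit float's bits as an unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Halving the integer roughly halves the exponent (a crude log2 trick);
    # subtracting from the constant negates it and corrects the bias.
    i = 0x5F3759DF - (i >> 1)
    # Reinterpret the adjusted bits as a float: the initial guess.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson iteration on f(y) = 1/y**2 - x refines the guess.
    return y * (1.5 - 0.5 * x * y * y)


if __name__ == "__main__":
    for x in (0.25, 1.0, 2.0, 4.0, 100.0):
        print(x, fast_inv_sqrt(x), x ** -0.5)
```

Reading the code gives you the steps; explaining why the shifted integer behaves like a logarithm, and why that particular constant works, is the part that takes the extra study the comment above is asking about.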

bongodongobob•4mo ago

Reverse engineering has always been without code. I didn't reverse engineer a project at work by reading the code. I literally just read the code. How is reading code you didn't write reverse engineering?

baq•4mo ago

Code doesn’t tell you why it is like it is. Code is just the what. Engineering is the why and why not.

bongodongobob•4mo ago

That's just learning the codebase. Reverse engineering is absolutely not that. Why is this even in question.

quotemstr•4mo ago

Learning codebases involves a significant amount of reverse engineering! You have to get into the head of the authors and make guesses about why things work the way they do.

bongodongobob•4mo ago

Yeah, that's just understanding the code. Reverse engineering is figuring out the code when you don't have it.

quotemstr•4mo ago

What does it mean to "have" "it"? You might have assembly. That's an "it" that you can "have". Plenty of people derive meaning from it. Some people, like the retro gaming people, even use the assembly as the "form preferred for modification" if they don't have the original source. How is inferring the intent of, say, dense uncommented C or dense CUDA much different?

bongodongobob•4mo ago

This is just stupid semantic arguing. In the situations where you have assembly, it's from getting it in some arcane way that is not supposed to happen. Building something to rip Nintendo ROMs, for example. Looking at a codebase isn't reverse engineering.

vasachi•4mo ago

Are… Are you pulling out the “It depends on what the meaning of the word ‘is’ is.” trick?

LoganDark•4mo ago

Reverse engineering is not only deriving source code from an executable. Reverse engineering is figuring out what resulted in any given solution. This could be the source code that resulted in a given executable, or it could be the design decisions, considerations, and reasoning behind some given source code. You can go even further and reverse engineer those requirements to guess at the problems they were meant to solve, and so on and so forth. Reverse engineering is literally just going backwards: from machine code to source code, from source code to ideas and thoughts, from ideas and thoughts to the inciting problems or even more fundamental things. You can also reverse from further in the other direction, i.e. reverse a binary from a desired output (superoptimization!), reverse the desired output from the result of a calculation involving it (hello password cracking), etc.

Though password cracking is not necessarily the best example, some (very bad!) hashing algorithms can actually be reversed that way. Figuring out the reverse is reverse engineering. You would reverse engineer the algorithm to figure out how to create a collision that way. In the same way, superoptimizers sort of reverse engineer the behavior you want in order to come up with a very efficient implementation. I'm using the term reverse engineer a bit loosely there, but you get the point. It has nothing to do with source code really; you can just as easily reverse engineer physical objects. Or artwork. Or the psyche.

So yes, you can reverse engineer source code to understand on a deeper level how it works. Sometimes reading it over once or twice is enough for this, sometimes even reading the API documentation or observing behavior is enough, but sometimes you have to do a bit of thinking and/or testing to fully understand it.

hackinthebochs•4mo ago

We need a name for the phenomenon where a popularization of a term is more narrow than its original usage and then people who only encountered the popularized word insist that the narrow application is its only meaning.

baq•4mo ago

'ignorance'?

msl•4mo ago

You keep on asserting that, but what are you basing it on?

According to Wikipedia[1], "In 1990, the Institute of Electrical and Electronics Engineers (IEEE) defined (software) reverse engineering (SRE) as "the process of analyzing a subject system to identify the system's components and their interrelationships and to create representations of the system in another form or at a higher level of abstraction" in which the "subject system" is the end product of software development." It goes on to clarify that "Reverse engineering can be performed from any stage of the product cycle, not necessarily from the functional end product."

Further, "There are two components in reverse engineering: redocumentation and design recovery."

Are you arguing that the work here does not fit the definition or that the definition is wrong? In the latter case, could you please share your definition, and maybe even explain why it is superior to IEEE's?

[1] https://en.wikipedia.org/wiki/Reverse_engineering#Software

magicalhippo•4mo ago

Just reading non-trivial code often does not give you any insight into why the code does what it does, or why it doesn't do something else, or even sometimes what it really does.

If reverse engineering is reserved for cases without source code, which I assume also means no decompilation which often is an option, then what do we call figuring out what some piece of code does and why it does what it does? And why is it sufficiently different from reverse engineering to warrant a separate term?

WithinReason•4mo ago

analyzing

SteveJS•4mo ago

The content is good. I’m glad I ignored a similar negative reaction to the reverse engineering framing.

refibrillator•4mo ago

Great exposition, loved the touch of humor. Please do the backward pass when it’s published.

As a fellow Tri Dao groupie and lucky duck who gets to build on Hopper/Blackwell clusters, I find it amazing how difficult it is becoming to write kernels that saturate GPU hardware.

When I squint, there appears to be a trend emerging across work like FA4, monolithic (mega) kernels, etc. Namely, a subversion of the classic CUDA programming model in the form of fine grained task based parallelism, managed entirely in “user space”.

Not exactly sure what’s ahead but I’m strapping in for a wild ride…

charles_irl•4mo ago

Thanks! I think computers are fun and I want reading about them to be fun too.

I was also reminded of HazyResearch's MegaKernels. Didn't want to distract from the main thrust of the post, but definitely think that's a promising approach.

emaadm•4mo ago

There's some interesting work in NeurIPS this year on fused kernels for MoE too: https://flash-moe.github.io/

kweezar•4mo ago

Any great beginner-friendly learning resources for GPU programming?

arthurcolle•4mo ago

Modal's CUDA Book is cool

patrulek•4mo ago

It seems that in the spec-driven development era, "reverse engineering" is going to change its meaning...

askl•4mo ago

Quite confusing name. I was hoping this was something about Adobe flash.

greatgib•4mo ago

Looking at the title of this post, when you do PR reviews, you are "reverse engineers"...

This question aside, I'm not a fan of this blog post's content at all. It might be me being too stupid, but I don't think it is easy to understand: very little concrete info and a lot of digressions, like the constant references to research articles on related topics. It reads like a low-value research paper trying to show that the work was done by piling on references.

quatonion•4mo ago

This is really interesting. I always wondered how it works.

A couple of years ago I did some experiments with a surrogate for attention, using a feed-forward network (MLP) to avoid the quadratic explosion.

It worked but had problems at the time, and my mind wasn't really in it.

This has dug it back out again with the benefit of time and additional insights.

So now I'm thinking, you can use a lot of the insights in the work here, but also shoot for a full linear scaling surrogate.

The trick is to use the surrogate as a discriminator under an RL regime during training.

Instead of just applying better/faster math and optimizations alone, have the model learn to work with a fundamentally better inference approach during training.

If you do that, you can turn the approximation error present in the FFN surrogate inference method into a recovery signal encoded into the model itself.

I haven't tried it, but don't see a reason it shouldn't work. Will give it a go on a GPT-2 model ASAP.

Thanks again for the awesome article.

