As a fellow Tri Dao groupie and lucky duck who gets to build on Hopper/Blackwell clusters, I find it amazing how difficult it is becoming to write kernels that saturate GPU hardware.
When I squint, there appears to be a trend emerging across work like FA4, monolithic (mega) kernels, etc. Namely, a subversion of the classic CUDA programming model in the form of fine-grained, task-based parallelism, managed entirely in “user space”.
Not exactly sure what’s ahead but I’m strapping in for a wild ride…
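To make "task-based parallelism managed in user space" a bit more concrete, here is a toy CPU-side model. Instead of one kernel launch per operation, a fixed pool of persistent workers (standing in for thread blocks pinned to SMs) pulls ready tasks from a shared queue, and the code itself decrements dependency counters to release downstream work. The function name, the task graph, and the round-based scheduling are my own assumptions for illustration. This is a sketch of the general idea, not how FA4 or any particular megakernel is actually implemented.

```python
from __future__ import annotations

from collections import deque


def run_persistent_workers(deps: dict[str, list[str]], num_workers: int) -> list[tuple[int, str]]:
    """Toy model of user-space task scheduling in a persistent kernel (hypothetical helper).

    deps maps each task to the tasks it depends on (every dependency must itself be a key).
    Returns (worker_id, task) pairs in the order tasks were executed.
    """
    # Per-task count of unfinished dependencies, plus reverse edges for notification.
    remaining = {task: len(ds) for task, ds in deps.items()}
    dependents: dict[str, list[str]] = {task: [] for task in deps}
    for task, ds in deps.items():
        for d in ds:
            dependents[d].append(task)

    # Shared ready queue; on a GPU this might be a ring buffer indexed with atomics.
    ready = deque(task for task, n in remaining.items() if n == 0)
    trace: list[tuple[int, str]] = []
    while ready:
        # One scheduling round: each worker grabs at most one ready task.
        batch = [ready.popleft() for _ in range(min(num_workers, len(ready)))]
        trace.extend(enumerate(batch))
        # "Finishing" a task decrements its dependents' counters and may release them.
        for task in batch:
            for nxt in dependents[task]:
                remaining[nxt] -= 1
                if remaining[nxt] == 0:
                    ready.append(nxt)

    if len(trace) != len(deps):
        raise ValueError("dependency cycle: some tasks never became ready")
    return trace


if __name__ == "__main__":
    # Hypothetical tiny attention-block graph: two attention tiles can run concurrently.
    deps = {
        "qkv": [],
        "attn_tile0": ["qkv"],
        "attn_tile1": ["qkv"],
        "out_proj": ["attn_tile0", "attn_tile1"],
    }
    print(run_persistent_workers(deps, num_workers=2))
```

The point of the design is that the dependency tracking and dispatch live in ordinary code running on the workers, rather than being expressed as kernel launch boundaries.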
I was also reminded of HazyResearch's MegaKernels. Didn't want to distract from the main thrust of the post, but definitely think that's a promising approach.
That question aside, I'm not a fan of this blog post's content at all. It might be me being too stupid, but I don't find it easy to understand: very little concrete information and a lot of digressions, like the constant references to research articles or related topics. It reads like a low-value research paper trying to show that you did your homework with lots of references.
A couple of years ago I did some experiments using a feed-forward network (MLP) as a surrogate for attention, to avoid the quadratic explosion.
It worked but had problems at the time, and my mind wasn't really in it.
This has dug it back out again with the benefit of time and additional insights.
So now I'm thinking, you can use a lot of the insights in the work here, but also shoot for a full linear scaling surrogate.
The trick is to use the surrogate as a discriminator under an RL regime during training.
Instead of relying on better/faster math and optimizations alone, have the model learn to work with a fundamentally better inference approach during training.
If you do that, you can turn the approximation error present in the FFN surrogate inference method into a recovery signal encoded into the model itself.
I haven't tried it, but don't see a reason it shouldn't work. Will give it a go on a GPT-2 model ASAP.
Thanks again for the awesome article.
petters•4mo ago
charles_irl•4mo ago
Reductively, software engineering means taking an idea and mapping it into code. So one form of "reverse" engineering would be taking the code and extracting the ideas. That's what we did here.
Because the source is public, there's quite a lot to work with from the start -- the warp specializations are named and there are helpful comments in many places.
But for many components, we didn't have much. Maybe the clearest case of "reverse engineering" explained in the post is the cubic approximation for the fractional part of the exponentiation. That required staring at some inline assembly and doing math.
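For anyone who hasn't seen this trick, here is a minimal Python sketch of the general technique. Write x = n + f with n = floor(x) and f in [0, 1), so 2^x = 2^n · 2^f. The 2^f factor is approximated with a cubic polynomial, and 2^n is applied exactly by adding n to the float32 exponent bits. The coefficients below are a plain Taylor expansion that I chose for illustration. They are not the tuned coefficients FA4 uses, and the struct-based bit casting only stands in for what a kernel would do with register reinterpretation.

```python
import math
import struct

# Taylor coefficients of 2**f = exp(f * ln 2) around f = 0.
# Assumption for illustration; a real kernel would use tuned (e.g. minimax) coefficients.
LN2 = math.log(2.0)
C1 = LN2
C2 = LN2 ** 2 / 2.0
C3 = LN2 ** 3 / 6.0


def exp2_cubic(x: float) -> float:
    """Approximate 2**x; assumes -126 <= floor(x) <= 127 so the result is a normal float32."""
    n = math.floor(x)  # integer part, applied exactly via the exponent bits
    f = x - n          # fractional part in [0, 1), also for negative x
    # Cubic for 2**f in Horner form (three multiply-adds); result lies in [1, 2).
    p = 1.0 + f * (C1 + f * (C2 + f * C3))
    # Reinterpret p as float32 bits and add n to the exponent field (bits 23..30).
    bits = struct.unpack("<I", struct.pack("<f", p))[0]
    bits += n << 23
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    for x in (-2.3, -0.5, 0.25, 3.7):
        print(x, exp2_cubic(x), 2.0 ** x)
```

With these Taylor coefficients the relative error stays below about 0.6% (worst as f approaches 1). In softmax the inputs are usually ≤ 0 after subtracting the row max, which is why the floor/fraction split has to handle negative x correctly.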
metadat•4mo ago
Not trying to be uncharitable; I found your article informative. Reverse engineering has historically been reserved for cases where there is an adversarial aspect, as with binaries or server APIs. Anyhow, cheers and thank you, sincerely.
Zacharias030•4mo ago
pests•4mo ago
VBprogrammer•4mo ago
Certainly I can't get on board with reverse engineered.
heavyset_go•4mo ago
greatgib•4mo ago
If you had reverse engineered it, you would have tried to "recreate something" that doesn't exist, in order to do the same thing.
So, if you have a binary code, you recreate the source code that in theory could allow you to recreate the binary.
If you have the source code, I guess it would be when you are missing pieces of information that would let you run the code the way others do...
hackinthebochs•4mo ago
heavyset_go•4mo ago
For example, simple hardware reversing can just be learning what something does, and how and why it works; you don't need to "recreate" anything other than ideas.
unnah•4mo ago
Thus it was natural to call the process of producing design documents from undocumented software "reverse engineering". These days coding without any formal design documents is so common that it seems the original meaning of reverse engineering has become obscured.
knome•4mo ago
unnah•4mo ago
https://en.wikipedia.org/wiki/IBM_Rational_Rose
cmrx64•4mo ago
varispeed•4mo ago
billy99k•4mo ago
saagarjha•4mo ago
> cudnn kernels are closed source, so Jensen only knows what’s going on in there.
edunteman•4mo ago
taneq•4mo ago
Starting from high level source code is like starting from engineering drawings or the CAD model. You've already been handed most or all of the info that reverse engineering is attempting to recover.
hackinthebochs•4mo ago
bthornbury•4mo ago
LoganDark•4mo ago
lelanthran•4mo ago
I would get some pretty weird looks if I changed my CV to replace "maintained legacy application that I did not write" with "reverse engineering".
Similarly, I would get instant hoots of laughter if I told my dev managers over the last 28 years that I reverse engineered the legacy application I was hired to work on.
I mean, I get what you're saying, but when you use the term "reverse engineering" in the context of software, you're just going to confuse everyone who already knows what it means.
magicalhippo•4mo ago
Because I'm pretty sure most devs would not just read the code and go "ah yes, of course".
[1]: https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overv...
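For readers who haven't seen it, here is a Python transcription of the well-known routine behind [1]. It reinterprets the float's bits as an integer, subtracts half of that integer from the magic constant 0x5F3759DF to get a rough first guess at 1/sqrt(x), then refines the guess with one Newton step. The struct-based bit casting is my stand-in for the pointer cast in the original C; the function name is just for illustration.

```python
import struct


def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for positive, normal float32 inputs."""
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Halving the bits roughly halves (and, via subtraction, negates) the exponent:
    # a crude log-domain estimate of x ** -0.5, corrected by the magic constant.
    i = 0x5F3759DF - (i >> 1)
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson step for f(y) = 1/y**2 - x.
    return y * (1.5 - 0.5 * x * y * y)


if __name__ == "__main__":
    for x in (0.25, 2.0, 10.0):
        print(x, fast_inv_sqrt(x), x ** -0.5)
```

After the single Newton step the relative error is under about 0.2%, and none of that is obvious from reading the five lines of code.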
bongodongobob•4mo ago
baq•4mo ago
bongodongobob•4mo ago
quotemstr•4mo ago
bongodongobob•4mo ago
quotemstr•4mo ago
bongodongobob•4mo ago
vasachi•4mo ago
LoganDark•4mo ago
Though password cracking is not necessarily the best example, some (very bad!) hashing algorithms can actually be reversed that way. Figuring out the reverse is reverse engineering: you would reverse engineer the algorithm to figure out how to create a collision. In the same way, superoptimizers sort of reverse engineer the behavior you want in order to come up with a very efficient implementation. I'm using the term a bit loosely there, but you get the point. It has nothing to do with source code, really; you can just as easily reverse engineer physical objects. Or artwork. Or the psyche.
So yes, you can reverse engineer source code to understand on a deeper level how it works. Sometimes reading it over once or twice is enough for this, sometimes even reading the API documentation or observing behavior is enough, but sometimes you have to do a bit of thinking and/or testing to fully understand it.
hackinthebochs•4mo ago
baq•4mo ago
msl•4mo ago
According to Wikipedia[1], "In 1990, the Institute of Electrical and Electronics Engineers (IEEE) defined (software) reverse engineering (SRE) as "the process of analyzing a subject system to identify the system's components and their interrelationships and to create representations of the system in another form or at a higher level of abstraction" in which the "subject system" is the end product of software development." It goes on to clarify that "Reverse engineering can be performed from any stage of the product cycle, not necessarily from the functional end product."
Further, "There are two components in reverse engineering: redocumentation and design recovery."
Are you arguing that the work here does not fit the definition or that the definition is wrong? In the latter case, could you please share your definition, and maybe even explain why it is superior to IEEE's?
[1] https://en.wikipedia.org/wiki/Reverse_engineering#Software
magicalhippo•4mo ago
If reverse engineering is reserved for cases without source code, which I assume also means no decompilation which often is an option, then what do we call figuring out what some piece of code does and why it does what it does? And why is it sufficiently different from reverse engineering to warrant a separate term?
WithinReason•4mo ago
SteveJS•4mo ago