If only there were some way to release the source code for your userland programs so that the computing public could look at the code, then offer a fix for a bug such as this.
Unfortunately, so far as I'm aware, there is no way to do this and having a few people who are working against what has to be a large number of deadlines look at extremely low-level code for very sophisticated software is the only way forward for these things.
"No way to prevent this" says proprietary codebases where this always happens
"No way to prevent this" say programmers of only languages where this regularly happens.
This only happens if you have the worst version of Tony's Billion Dollar Mistake. So C, C++, Zig, Odin and so on but not Rust.
It's a use-after-free, a category of mistake that's impossible in true GC languages, and also impossible in safe Rust. We have known, for many years, how to not have this problem, but some people who ought to know better insist they can't stop it, exactly like America's gun violence.
What is my_ptr->member but unwrapping an optionally null pointer.
It’s the language’s whole “make it easy to write good code, not impossible to write incorrect code” philosophy.
> This includes memory allocations of type NV01_MEMORY_DEVICELESS which are not associated with any device and therefore have the pGpu field of their corresponding MEMORY_DESCRIPTOR structure set to null
This does look like the type of null deref that Zig does prevent.
Looking at the second issue in the chain, I believe standard Zig would have prevented that as well.
The C code had an error that caused the call to free to be skipped:
threadStateInit(&threadState, THREAD_STATE_FLAGS_NONE);
status = rmapiMapWithSecInfo(/*…*/); // null deref here
threadStateFree(&threadState, THREAD_STATE_FLAGS_NONE);
Zig’s use of ‘defer’ would ensure that free is called even if an error occurred:
threadStateInit(&threadState, THREAD_STATE_FLAGS_NONE);
defer threadStateFree(&threadState, THREAD_STATE_FLAGS_NONE);
status = try rmapiMapWithSecInfo(/*…*/); // null deref here
Followed by never touching the variable ever again.
"'No Way to Prevent This,' Says Only Nation Where This Regularly Happens"
If your bar for mistakes is “what if you forget to add literally the next line of code in the incredibly common pattern”, I don’t really care to have a discussion about programming languages anymore.
You can forget to increment a loop counter and have your program never terminate, so why don’t you program in a language of exclusively primitive recursive functions?
Note that the mention of Zig that I responded to was in reference to Tony Hoare's "billion dollar mistake", which was making null a valid value of a pointer type, not use after free, which is a quite different issue. As I noted, the mistake doesn't occur in Zig because null is not a valid value for a pointer, only for an optional pointer, which must be unwrapped with an explicit null test.
I do think it's a bit too easy to forget a deferred free, although it's possible for tools to detect them. Unfortunately Andrew Kelley is prone to being extremely opinionated about language design (GingerBill is another of that sort) and so macros are forever banned from Zig, but macros are the only mechanism for encapsulating a scoped feature like defer.
In practice, however, Zig's "defer" is irrelevant to this nVidia bug, which is why nVidia's "fix" doesn't attempt the most similar C equivalent strategy and instead now performs a heap allocation (and thus a free) on the happy path.
There's a kernel Oops, likely in someone else's code. When that happens our stack goes away. In Rust they can (I don't happen to know if they do in Rust for Linux but it is commonly used in some types of application) recover from disasters and unwind the stack before it's gone, such as removing the threadState from that global state. In Zig that's prohibited by the language design, all panics are fatal.
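A minimal userland sketch of the "unwind the stack and remove the entry from global state" idea described above. Everything here is hypothetical: `LIVE_STATES`, `ThreadStateGuard`, and `map_memory` are illustrative names, not anything from the NVIDIA driver or Rust for Linux, and a Rust panic in an ordinary process is only a stand-in for the failure being discussed, not a model of a kernel Oops.

```rust
use std::panic;
use std::sync::{Mutex, MutexGuard};

/// Hypothetical stand-in for a global table of live thread states.
static LIVE_STATES: Mutex<Vec<u32>> = Mutex::new(Vec::new());

/// Locks the table, recovering the data even if the mutex was poisoned.
fn lock_states() -> MutexGuard<'static, Vec<u32>> {
    LIVE_STATES.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Registers an id on creation and removes it again when dropped.
struct ThreadStateGuard {
    id: u32,
}

impl ThreadStateGuard {
    fn new(id: u32) -> Self {
        lock_states().push(id);
        ThreadStateGuard { id }
    }
}

impl Drop for ThreadStateGuard {
    fn drop(&mut self) {
        // Runs on normal return and also while a panic unwinds the stack.
        lock_states().retain(|&live| live != self.id);
    }
}

/// Hypothetical operation that may fail partway through by panicking.
fn map_memory(id: u32, fail: bool) -> u32 {
    let _guard = ThreadStateGuard::new(id);
    if fail {
        panic!("simulated fault while mapping");
    }
    id * 2
}

/// Runs `map_memory`, catches any unwind, and reports whether it succeeded
/// and how many entries are still registered afterwards.
fn run_and_count_leftovers(id: u32, fail: bool) -> (bool, usize) {
    let result = panic::catch_unwind(|| map_memory(id, fail));
    let leftovers = lock_states().len();
    (result.is_ok(), leftovers)
}
```

This relies on the default unwinding panic strategy; built with `panic = "abort"`, the destructor never runs, which is closer to the all-panics-are-fatal model.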
A kernel oops isn’t a panic, at least not as Zig or Rust defines one, so whatever Zig says about panics doesn’t apply here.
Rust fails here in exactly the same way if drop semantics aren’t upheld (they aren’t, afaik). Also, Rust’s soundness goes out the window as soon as UB happens in unsafe code. So the moment a kernel Oops happens, safety is a moot point.
I’m not sure if Zig has a clean way to kill a thread, unwind the stack, and run deferred code. Zig is a pre-1.0 language after all so it’s allowed to be missing features.
Zig deliberately only has fatal panics. This isn't a "missing feature"; it's intentional.
Yeah not really sure why I bother. I think I just get bothered that Rust gets touted everywhere as a silver bullet.
> Tony Hoare's "billion dollar mistake", which was making null a valid value of a pointer type
It’s funny how we got stuck with his biggest mistake for decades and his (probably not entirely his) algebraic types / tagged unions have just started to get first class support now.
Still, there are languages with guardrails, and then there are languages with guardrails, and the order for memory safety is probably something like C < C++ < Zig < Rust < managed (GC) languages.
It's a dereference of a pointer that might be null and thereby yield undefined behavior; there's no required unwrapping under an explicit test for null, as is required by Zig. In Zig, my_ptr cannot be null in my_ptr.member -- null is not a valid pointer value. If my_ptr is an optionally null pointer then the pointer value must be unwrapped by first checking whether it is null ... the dereference can only occur in the test branch where the pointer isn't null.
Note that the mention of Zig that I responded to was in reference to Tony Hoare's "billion dollar mistake", which was making null a valid value of a pointer type. As I noted, the mistake doesn't occur in Zig because null is not a valid value for a pointer, only an optional pointer, which must be unwrapped with an explicit null test.
If you had no idea what I was referring to, you might have asked. Rather, you asked a rhetorical question with no question mark, implying the falsehood that my_ptr->member is "unwrapping an optionally null pointer" when it's nothing of the sort.
The billion dollar mistake is making NULL a valid pointer value, not use after free--which has nothing to do with null pointers and which I didn't comment on, as the comment I responded to only mentioned Zig in regard to the billion dollar mistake. The billion dollar mistake doesn't occur in Zig because null is not a valid value for a pointer, only an optional pointer, which must be unwrapped with an explicit null test.
The approach taken is the same as in virtually every other language that has avoided the billion dollar mistake -- null is not a valid pointer value, and instead there's an additional union type (called Optional, Maybe, etc.) that can hold Some(pointer) or None. Zig, like some other languages, extends this union beyond pointers to other types.
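To make the Some(pointer)/None description concrete, here is a small sketch in Rust, one of the languages that takes this approach. The `Gpu` and `MemoryDescriptor` types are hypothetical stand-ins loosely named after the structures in the quoted write-up; they are not the driver's real definitions.

```rust
/// Hypothetical stand-in for a GPU object.
struct Gpu {
    id: u32,
}

/// Hypothetical stand-in for a memory descriptor.
struct MemoryDescriptor<'a> {
    /// `None` models a deviceless allocation with no associated GPU.
    gpu: Option<&'a Gpu>,
}

/// Returns the GPU id, or `None` for a deviceless allocation.
fn gpu_id(desc: &MemoryDescriptor<'_>) -> Option<u32> {
    // Writing `desc.gpu.id` does not compile: the optional reference has to
    // be unwrapped first, and the field is reachable only in the `Some` arm.
    match desc.gpu {
        Some(gpu) => Some(gpu.id),
        None => None,
    }
}
```

A plain `&Gpu` can never be null, so code holding one needs no check; only the `Option<&Gpu>` form forces the test, and the compiler rejects the access if the test is missing.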
These bugs are in the already open-sourced kernel modules; the userland components are largely irrelevant as long as an attacker can just invoke the affected ioctl directly.
See Spectre and Meltdown - if it was easy to exploit then we would all be pwned unpatched just by running the Windows installer - just like how Windows XP machines used to do that back in the day....
If your exploit requires lots of disassembling, decrypting random ad-hoc custom crypto, and even finding what you're looking for in some random 100MB .dll, that just isn't very likely to be found except by the nationstate guys. The signal-to-noise ratio is a wonderful thing. It's much easier to hide something amongst very mundane things (most secrets are boring) than to heavily guard something and advertise "SECRETS ARE HERE". There's quite a few examples of this in various programs and web services, you obviously don't know because you didn't find it!
My point is that I suspect that the Nvidia driver is a decades-long project, and dropping everything and rewriting in Rust isn't really realistic.
mustache_kimono•3mo ago
> 2025-08-11 NVIDIA reiterated the request to postpone disclosure until mid-January 2026.
> 2025-08-12 Quarkslab replied that the bugs were first reported in June 18th and mid-January was well past the standard 90 day normally agreed for coordinated disclosure and that we did not see a rationale for postponing publication by, at a minimum, 3 months. Therefore Quarkslab continued with the publication deadline set to September 23rd 2025 and offered to extend the deadline an additional 30 days provided NVIDIA gave us some insights about the full scope of affected products and if the fixes are to be released as a stand alone security fix, as opposed to rolled into a version bump that includes other code changes.
Richest corporation in the world needs 7 months to remedy? Why not 4 years?
zetanor•3mo ago
Microsoft might hold a patent on this.
themafia•3mo ago
At least until the SEC starts punishing revenue inflation through self-dealing.