Interestingly enough, Rust does not allow you to read uninitialized memory, not even if you do not care about the value stored there. People have proposed a `freeze` operation that turns uninitialized memory into arbitrary but initialized data (i.e. a no-op in assembly).
But there is tension about this: Not allowing access to uninitialized memory, ever, means that you get more guarantees about what foreign (safe) Rust can do, for instance.
thrance•1d ago
True of safe Rust only. You can always fall back to unsafe Rust, allocate a chunk of bytes, and read from and write to it as you wish.
stouset•1d ago
Even in unsafe Rust, this is undefined behavior.
LoganDark•1d ago
You're allowed to trigger as much undefined behavior as you wish. It makes the program meaningless of course, but it's not like it stops you.
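For reference, a minimal sketch of the pattern that stays within the rules discussed above: reserve storage with `MaybeUninit`, write every byte, and only then treat it as initialized. The function name `init_buffer` and the buffer size are made up for illustration.

```rust
use std::mem::MaybeUninit;

/// Fills a fixed-size buffer without ever reading uninitialized bytes.
/// (`init_buffer` is a hypothetical helper, not a standard API.)
fn init_buffer(fill: u8) -> [u8; 16] {
    // Reserve storage. Nothing in it may be read yet.
    let mut buf: MaybeUninit<[u8; 16]> = MaybeUninit::uninit();

    // Writing through a raw pointer is fine: a write never inspects
    // the previous contents.
    unsafe {
        buf.as_mut_ptr().cast::<u8>().write_bytes(fill, 16);
    }

    // SAFETY: all 16 bytes were written above, so the array is initialized.
    // Calling `assume_init` before the write would be undefined behavior,
    // even though every bit pattern is a valid `u8`.
    unsafe { buf.assume_init() }
}

fn main() {
    let buf = init_buffer(0xAB);
    println!("{:02x?}", buf);
}
```

The point of the sketch is the ordering: the `unsafe` block does not make a read of uninitialized bytes legal, it only lets you promise that the write already happened.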
kmeisthax•1d ago
My impression was that there was some kind of optimization in LLVM that relied on being able to assume values were never undef[0], which is why undefined memory access was always illegal in Rust[1].
Putting that aside, a deliberate "read uninitialized memory with bounded UB" primitive like freeze would only work for types where all possible bit patterns are valid. So no freezing chars[2], references, or sum types. And any transparent wrapper type that has invariants - like, say, slices, vecs, strs, and/or range-restricted integer types - would see them utterly broken when frozen. I suppose you could define some operation to "validate" the underlying bit pattern, but I'm not sure if that would defeat the point of reading uninitialized memory.
[0] LLVM concept that represents uninitialized memory, among other things.
[1] I believe a few other unsafe Rust concepts are actually leaky abstractions around LLVM things
[2] Rust's char must hold valid UTF-8 and will UB if you stick surrogates in there
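To make the "validate the underlying bit pattern" idea concrete, here is a rough sketch using checks the standard library already provides for a few types with invalid bit patterns. The `validate_*` names are hypothetical wrappers; the inputs are ordinary initialized integers, so this only illustrates the validation step and says nothing about how a frozen value would be obtained.

```rust
use std::num::NonZeroU8;

/// Checks a raw 32-bit pattern before treating it as a `char`.
fn validate_char(bits: u32) -> Option<char> {
    // `char::from_u32` rejects surrogates (0xD800..=0xDFFF) and
    // anything above 0x10FFFF.
    char::from_u32(bits)
}

/// Checks a raw byte before treating it as a `bool`.
fn validate_bool(bits: u8) -> Option<bool> {
    match bits {
        0 => Some(false),
        1 => Some(true),
        _ => None, // any other byte is not a valid `bool`
    }
}

/// Checks a raw byte before treating it as a `NonZeroU8`.
fn validate_nonzero(bits: u8) -> Option<NonZeroU8> {
    NonZeroU8::new(bits) // `None` for zero
}

fn main() {
    for bits in [0x41u32, 0xD800, 0x11_0000] {
        println!("{:#x} -> {:?}", bits, validate_char(bits));
    }
    println!("{:?} {:?}", validate_bool(1), validate_bool(2));
    println!("{:?} {:?}", validate_nonzero(7), validate_nonzero(0));
}
```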
NobodyNada•1d ago
> there was some kind of optimization in LLVM that relied on being able to assume values were never undef
It's true that LLVM has restrictions on what you can do with undef/poison memory, but LLVM also supports the "freeze" operation that comes up in the Rust discussions (which transforms an undefined value into an arbitrary, well-defined value). It would certainly need to be unsafe to avoid violating invariants like you mentioned, but "LLVM" isn't the blocker to supporting this.
Rather, there are more subtle problems with reading from uninitialized memory -- for example on Linux, a heap allocator might use MADV_FREE on free memory, which hints to the kernel that a page contains freed memory and the operating system is not required to preserve its contents until the application writes to it again. This means the following sequence of events is possible:
- An application frees some memory, and the heap allocator invokes madvise(MADV_FREE) on the address range.
- The application makes a heap allocation, obtaining a pointer to the free'd memory.
- The application freezes the uninitialized memory and reads from it.
- Due to memory pressure, the kernel decides to reclaim the free'd memory. It unmaps it from the process and uses it somewhere else.
- The application accesses the first allocation again, and sees that its value has now changed to all-zeroes.
Thus, we can see that "freezing" arbitrary memory can't actually be implemented on real-world systems -- the contents of uninitialized memory really can change out from under you until you write to that memory.
It would be possible to implement a "by-reference freeze" that copies a MaybeUninit<T> to a new location, but introducing this functionality still has the downside that you can write a Heartbleed bug without invoking undefined behavior, which is what makes it controversial.
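The sequence of events above can be imitated with a toy model. This is only a simulation under simplifying assumptions: the `Page` type, its methods, and the 4-byte "page" are invented for illustration, and nothing here calls into the kernel or the real madvise.

```rust
/// Toy model of one page of memory. It imitates the behavior described
/// in the comment above; it does not talk to the operating system.
struct Page {
    bytes: [u8; 4],
    /// Set by the simulated madvise(MADV_FREE); cleared by the next write.
    lazily_freed: bool,
}

impl Page {
    fn new(bytes: [u8; 4]) -> Self {
        Page { bytes, lazily_freed: false }
    }

    /// The allocator marks the page as free. The old contents stay visible for now.
    fn madvise_free(&mut self) {
        self.lazily_freed = true;
    }

    /// A read does not cancel the hint.
    fn read(&self) -> [u8; 4] {
        self.bytes
    }

    /// A write cancels the hint, so the contents must be preserved from now on.
    fn write(&mut self, bytes: [u8; 4]) {
        self.bytes = bytes;
        self.lazily_freed = false;
    }

    /// Under memory pressure, a page still marked free may be reclaimed;
    /// the next access then sees zeroes.
    fn memory_pressure(&mut self) {
        if self.lazily_freed {
            self.bytes = [0; 4];
            self.lazily_freed = false;
        }
    }
}

/// Reads the page before and after simulated memory pressure.
fn scenario(write_before_pressure: bool) -> ([u8; 4], [u8; 4]) {
    let mut page = Page::new([1, 2, 3, 4]);
    page.madvise_free(); // the allocator frees the memory lazily
    let first = page.read(); // the application "freezes" and reads
    if write_before_pressure {
        page.write(first); // an actual write pins the contents
    }
    page.memory_pressure(); // the kernel reclaims if still marked free
    (first, page.read())
}

fn main() {
    println!("read only:   {:?}", scenario(false));
    println!("write first: {:?}", scenario(true));
}
```

In the model, the two reads disagree unless a write happens in between, which is the reason an in-place freeze that compiles to a no-op cannot give a stable value.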
dooglius•1d ago
Seems like the heap allocator has a bug if it doesn't invalidate the free hint before it returns the memory to the application. This does raise the question of why MADV_FREE works on the basis of writes rather than accesses -- there are PTE bits for both cases, right, and it would have been just as easy to have any access cancel the free hint? (I am assuming x86 here.)
dzaima•18h ago
That could be classified as a bug if it was decided that the allocator must guarantee that uninitialized memory is readable as a consistent value. Otherwise, making the allocator clear the hint is just unnecessary work.
Clearing the hint on read would probably be more sane, but would mean many more potential situations of unnecessarily losing it (GC doing unnecessary scanning, doing a heap dump, debuggers trying to read it, other sorts of memory scanning).
joseluis•1d ago
Just a nitpick. Rust's char is really a 21-bit Unicode scalar value (a code point without surrogates) using a 32-bit representation, and indeed there are a lot of invalid char values in a 32-bit space. UTF-8 is a different thing: a variable-width encoding for code points (1-4 bytes each).
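A small sketch of the distinction, using only standard library calls. The helper `utf8_widths` is a hypothetical name; the sample characters are written as escapes so the source stays ASCII.

```rust
use std::mem::size_of;

/// Returns how many bytes each character of `s` needs when encoded as UTF-8.
/// (`utf8_widths` is a made-up helper for this illustration.)
fn utf8_widths(s: &str) -> Vec<usize> {
    s.chars().map(char::len_utf8).collect()
}

fn main() {
    // In memory, a `char` always occupies four bytes.
    println!("size_of::<char>() = {}", size_of::<char>());

    // The largest scalar value is U+10FFFF, which fits in 21 bits.
    println!("char::MAX = U+{:X}", char::MAX as u32);

    // Surrogates are code points but not scalar values, so they are rejected.
    println!("from_u32(0xD800) = {:?}", char::from_u32(0xD800));

    // Encoded as UTF-8, the same characters take 1 to 4 bytes each:
    // U+0041, U+00E9, U+20AC, U+1F980.
    println!("{:?}", utf8_widths("A\u{e9}\u{20ac}\u{1f980}"));
}
```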
Sesse__•1d ago
An elegant optimization, but how would you intersect two of these efficiently? It sounds like you'd need to iterate over the entire dense vector and do a sparse-vector check for each and every value (O(m) with a very high constant factor). Either that, or sort both sparse vectors (O(n log n)).
dooglius•1d ago
Why would iterating over the dense vector be O(m) rather than O(n)?
Sesse__•19h ago
Sorry, I meant iterating over the sparse vector, not the dense vector (I find the nomenclature in the article somehow inverted).
dzaima•18h ago
Wouldn't it be just a trivial O(n) of a loop of "for (x in dense vector of one set) { if (is x a member of the other set) add to result; }"?
Sesse__•16h ago
True. The constant factor is nasty, though, compared to the 256-bits-per-instruction of normal bit sets.
dzaima•14h ago
Right; the constant factors of this approach are generally horrible, though. I can't think of any situation where it'd be worth it on systems with, well, a cache (or a TLB for that matter, which is even worse off with the sparse memory usage).
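The intersection loop discussed above can be sketched as follows. This assumes the usual dense/sparse two-array layout; the `SparseSet` type and its method names are illustrative, and because safe Rust cannot leave the sparse array uninitialized, the sketch zero-fills it (an O(m) setup cost that the uninitialized-memory trick is meant to avoid).

```rust
/// Sparse set over the universe 0..capacity, using a dense member list
/// plus a sparse index array.
struct SparseSet {
    dense: Vec<usize>,  // the members, in insertion order
    sparse: Vec<usize>, // sparse[x] = position of x in `dense`, if x is a member
}

impl SparseSet {
    fn new(capacity: usize) -> Self {
        // Zero-filled here. The membership test below would be just as
        // correct for arbitrary contents, which is the point of the trick.
        SparseSet { dense: Vec::new(), sparse: vec![0; capacity] }
    }

    /// O(1): x is a member only if sparse[x] points at a dense slot holding x.
    fn contains(&self, x: usize) -> bool {
        x < self.sparse.len()
            && self.sparse[x] < self.dense.len()
            && self.dense[self.sparse[x]] == x
    }

    /// O(1) insertion; duplicates are ignored.
    fn insert(&mut self, x: usize) {
        assert!(x < self.sparse.len(), "value outside the universe");
        if !self.contains(x) {
            self.sparse[x] = self.dense.len();
            self.dense.push(x);
        }
    }

    /// O(n) in the number of members of `self`: walk our dense list and
    /// probe the other set for each member.
    fn intersection(&self, other: &SparseSet) -> Vec<usize> {
        self.dense.iter().copied().filter(|&x| other.contains(x)).collect()
    }
}

fn main() {
    let mut a = SparseSet::new(256);
    let mut b = SparseSet::new(256);
    for x in [1, 5, 9, 200] {
        a.insert(x);
    }
    for x in [9, 3, 1] {
        b.insert(x);
    }
    println!("{:?}", a.intersection(&b));
}
```

Each probe touches two unrelated memory locations, which is where the constant factor mentioned in the thread comes from.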
dooglius•1d ago
One thing worth pointing out is that Linux makes it pretty difficult for userspace to access uninitialized memory; the kernel has to be specifically configured to honor the MAP_UNINITIALIZED flag for mmap, and generally isn't, so the memory does get zeroed at some point. The best you can hope for is that your heap allocator re-uses some un-munmapped memory. The kernel will zero pages on demand, which helps, but you're still paying a cost for that zeroing.