Using uninitialized memory for fun and profit (2008)

https://research.swtch.com/sparse

34•AminZamani•4d ago

Comments

jojomodding•1d ago

Interestingly enough, Rust does not allow you to access undefined memory, not even if you do not care about the value stored there. People have been proposing a `freeze` operation that replaces uninitialized memory with garbage but initialized data (i.e. a no-op in assembly).

But there is tension about this: Not allowing access to uninitialized memory, ever, means that you get more guarantees about what foreign (safe) Rust can do, for instance.

thrance•1d ago

True of safe rust only. You can always fall back to unsafe rust, allocate a chunk of bytes and write/read to it as you wish.

stouset•1d ago

Even in unsafe Rust, this is undefined behavior.
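For contrast, here is a minimal sketch of the sanctioned unsafe route in Rust. It allocates a buffer, writes every slot through `MaybeUninit`, and only then declares the slots initialized. The function name `squares` is hypothetical. `Vec::spare_capacity_mut` and `Vec::set_len` are standard-library methods, and the sketch assumes Rust 1.60 or later. Nothing here reads a slot before writing it, and that is exactly the line the comment above draws.

```rust
/// Builds a vector of the first `n` squares without zero-filling the
/// allocation first. Each slot is written before `set_len` exposes it.
fn squares(n: usize) -> Vec<u32> {
    let mut v: Vec<u32> = Vec::with_capacity(n);
    // The spare capacity is uninitialized memory, typed as &mut [MaybeUninit<u32>].
    for (i, slot) in v.spare_capacity_mut().iter_mut().take(n).enumerate() {
        // Writing through MaybeUninit is fine; reading the slot first would be UB.
        slot.write((i * i) as u32);
    }
    // SAFETY: the first `n` elements were all written in the loop above.
    unsafe { v.set_len(n) };
    v
}
```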

LoganDark•1d ago

You're allowed to trigger as much undefined behavior as you wish. It makes the program meaningless of course, but it's not like it stops you.

kmeisthax•1d ago

My impression was that there was some kind of optimization in LLVM that relied on being able to assume values were never undef[0], which is why undefined memory access was always illegal in Rust[1].

Putting that aside, a deliberate "read uninitialized memory with bounded UB" primitive like freeze would only work for types where all possible bit patterns are valid. So no freezing chars[2], references, or sum types. And any transparent wrapper type that has invariants - like, say, slices, vecs, strs, and/or range-restricted integer types - would see them utterly broken when frozen. I suppose you could define some operation to "validate" the underlying bit pattern, but I'm not sure if that would defeat the point of reading uninitialized memory.

[0] LLVM concept that represents uninitialized memory, among other things.

[1] I believe a few other unsafe Rust concepts are actually leaky abstractions over LLVM concepts.

[2] Rust's char must hold valid UTF-8 and will UB if you stick surrogates in there
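The "validate the bit pattern" idea can be sketched without any `freeze` primitive. A byte buffer stands in for frozen memory, and it becomes a type with invariants only through a checked constructor. `char::from_u32` is the standard checked conversion. It returns `None` for surrogates (0xD800..=0xDFFF) and for values above 0x10FFFF. The helper names and the little-endian layout are assumptions made for illustration.

```rust
/// Hypothetical validate-after-freeze step for `char`: reinterpret each 4-byte
/// chunk (little-endian, an assumption) as a u32, then accept it only if it
/// is a valid Unicode scalar value. Trailing bytes that don't fill a chunk are ignored.
fn chars_from_bytes(bytes: &[u8]) -> Vec<Option<char>> {
    bytes
        .chunks_exact(4)
        .map(|chunk| {
            let bits = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
            char::from_u32(bits)
        })
        .collect()
}

/// Same idea for `bool`, whose only valid bit patterns are 0 and 1.
fn bool_from_byte(b: u8) -> Option<bool> {
    match b {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}
```

A cheap local check like this only exists for plain value types. For references, or for wrappers such as `Vec` whose invariants span a pointer, a length and a capacity, no comparable check is available.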

NobodyNada•1d ago

> there was some kind of optimization in LLVM that relied on being able to assume values were never undef

It's true that LLVM has restrictions on what you can do with undef/poison memory, but LLVM also supports the "freeze" operation that comes up in the Rust discussions (which transforms an undefined value into an arbitrary, well-defined value). It would certainly need to be unsafe to avoid violating invariants like you mentioned, but "LLVM" isn't the blocker to supporting this.

Rather, there are more subtle problems with reading from uninitialized memory. For example, on Linux a heap allocator might call madvise(MADV_FREE) on freed memory. This tells the kernel that the pages contain freed memory and that it is not required to preserve their contents until the application writes to them again. This means the following sequence of events is possible:

- An application frees some memory, and the heap allocator invokes madvise(MADV_FREE) on the address range.

- The application makes a heap allocation, obtaining a pointer to the freed memory.

- The application freezes the uninitialized memory and reads from it.

- Due to memory pressure, the kernel decides to reclaim the freed memory. It unmaps it from the process and uses it somewhere else.

- The application accesses the same allocation again and sees that its value has now changed to all zeroes.

Thus, we can see that "freezing" arbitrary memory can't actually be implemented on real-world systems -- the contents of uninitialized memory really can change out from under you until you write to that memory.

It would be possible to implement a "by-reference freeze" that copies a MaybeUninit<T> to a new location, but introducing this functionality still has the downside that you can write a Heartbleed bug without invoking undefined behavior, which is what makes it controversial.

dooglius•1d ago

It seems like the heap allocator has a bug if it doesn't clear the free hint before returning the memory to the application. This does raise the question of why MADV_FREE works on the basis of writes rather than accesses. There are PTE bits for both cases, right? It would have been just as easy to have any access cancel the free hint. (I am assuming x86 here.)

dzaima•18h ago

That could be classified as a bug if it was decided that the allocator must guarantee that uninitialized memory is readable as a consistent value. Otherwise, making the allocator clear the hint is just unnecessary work.

Clearing the hint on read would probably be saner, but it would mean many more situations where the hint is lost unnecessarily (a GC doing unnecessary scanning, a heap dump, debuggers trying to read the memory, other sorts of memory scanning).

joseluis•1d ago

Just a nitpick: Rust's char is really a 21-bit Unicode scalar value (a code point that is not a surrogate) stored in a 32-bit representation, so indeed many 32-bit values are invalid chars. UTF-8 is a different thing: a variable-width encoding of code points (1-4 bytes each).
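A small sketch of the distinction: every `char` occupies 4 bytes in memory regardless of its value, while its UTF-8 encoding takes 1 to 4 bytes depending on the scalar value. The helper names are hypothetical. `char::len_utf8` and `std::mem::size_of` are standard.

```rust
use std::mem::size_of;

/// Pairs each char's scalar value with the length of its UTF-8 encoding.
fn scalar_and_utf8_len(s: &str) -> Vec<(u32, usize)> {
    s.chars().map(|c| (c as u32, c.len_utf8())).collect()
}

/// In-memory size of a `char`: always 4 bytes (a 32-bit representation).
fn char_size() -> usize {
    size_of::<char>()
}
```

For example, U+20AC (the euro sign) is stored as the 4-byte `char` value 0x20AC but encodes to 3 bytes of UTF-8.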

Sesse__•1d ago

An elegant optimization, but how would you intersect two of these efficiently? It sounds like you'd need to iterate over the entire dense vector and do a sparse-vector check for each and every value (O(m) with a very high constant factor). Either that, or sort both sparse vectors (O(n log n)).

dooglius•1d ago

Why would iterating over the dense vector be O(m) rather than O(n)?

Sesse__•19h ago

Sorry, I meant iterating over the sparse vector, not the dense vector (I find the nomenclature in the article somehow inverted).

dzaima•18h ago

Wouldn't it just be a trivial O(n) loop like "for (x in dense vector of one set) { if (is x a member of the other set) add to result; }"?
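Here is a minimal Rust sketch of that loop. It assumes the sparse-set layout from the linked article: a `dense` array of members plus a `sparse` index array over the universe. Safe Rust will not let you read uninitialized memory, so this version zero-fills `sparse` once at construction. The membership test still never trusts a `sparse` entry without checking it against `dense`, so it would also be correct if `sparse` held garbage. The type and method names are hypothetical.

```rust
/// Sparse set over the universe 0..universe, after the dense/sparse layout in
/// the linked article. Elements must be < universe (indexing panics otherwise).
struct SparseSet {
    dense: Vec<usize>,  // the members, in insertion order
    sparse: Vec<usize>, // sparse[x] = position of x in `dense`, if x is a member
}

impl SparseSet {
    fn new(universe: usize) -> Self {
        // Safe Rust requires initialized memory, so `sparse` is zero-filled once.
        SparseSet { dense: Vec::with_capacity(universe), sparse: vec![0; universe] }
    }

    fn contains(&self, x: usize) -> bool {
        // Never trust sparse[x] alone: it may be stale from an earlier clear().
        let i = self.sparse[x];
        i < self.dense.len() && self.dense[i] == x
    }

    fn insert(&mut self, x: usize) {
        if !self.contains(x) {
            self.sparse[x] = self.dense.len();
            self.dense.push(x);
        }
    }

    /// O(1): stale `sparse` entries are rejected by `contains`.
    fn clear(&mut self) {
        self.dense.clear();
    }

    fn members(&self) -> &[usize] {
        &self.dense
    }

    /// Writes the intersection of `self` and `other` into `out` (all three must
    /// share a universe). Walks the smaller set's members and probes the larger.
    fn intersect_into(&self, other: &SparseSet, out: &mut SparseSet) {
        let (small, big) = if self.dense.len() <= other.dense.len() {
            (self, other)
        } else {
            (other, self)
        };
        out.clear();
        for &x in &small.dense {
            if big.contains(x) {
                out.insert(x);
            }
        }
    }
}
```

Iterating over the smaller operand keeps the work at O(min(n1, n2)) probes. Reusing `out` through the O(1) `clear` avoids paying the O(universe) zero-fill on every intersection. Each probe is still two dependent loads that may miss the cache, which is the constant factor the thread goes on to discuss.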

Sesse__•16h ago

True. The constant factor is nasty, though, compared to the 256-bits-per-instruction of normal bit sets.
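For comparison, here is a sketch of the conventional bitset the comment refers to. It uses 64-bit words, which is an assumption. The 256-bit figure corresponds to SIMD registers such as AVX2, and a compiler may or may not auto-vectorize this loop into them. Intersection is a word-wise AND, so its cost scales with the universe size divided by the word width, however few members the sets hold. The names are hypothetical.

```rust
/// Plain bitset over 0..universe, one bit per possible element.
struct BitSet {
    words: Vec<u64>,
}

impl BitSet {
    fn new(universe: usize) -> Self {
        BitSet { words: vec![0; (universe + 63) / 64] }
    }

    fn insert(&mut self, x: usize) {
        self.words[x / 64] |= 1u64 << (x % 64);
    }

    fn contains(&self, x: usize) -> bool {
        ((self.words[x / 64] >> (x % 64)) & 1) == 1
    }

    /// Word-wise AND: 64 elements per operation; both sets must share a universe.
    fn intersect(&self, other: &BitSet) -> BitSet {
        let words = self.words.iter().zip(&other.words).map(|(a, b)| a & b).collect();
        BitSet { words }
    }

    /// Members in ascending order.
    fn members(&self) -> Vec<usize> {
        let mut out = Vec::new();
        for (i, &word) in self.words.iter().enumerate() {
            let mut w = word;
            while w != 0 {
                out.push(i * 64 + w.trailing_zeros() as usize);
                w &= w - 1; // clear the lowest set bit
            }
        }
        out
    }
}
```

The trade-off is the reverse of the sparse set's. Clearing or intersecting costs O(universe / 64) even when the sets are nearly empty, but the memory is compact and accessed sequentially.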

dzaima•14h ago

Right; the constant factors of this approach are generally horrible, though. I can't think of any situation where it'd be worth it on systems with, well, a cache (or a TLB, for that matter, which fares even worse with the sparse memory usage).

dooglius•1d ago

One thing worth pointing out is that Linux makes it pretty difficult for userspace to access uninitialized memory. The MAP_UNINITIALIZED flag for mmap only works if the kernel is specifically configured to allow it, which it generally isn't, so the memory does get zeroed at some point. The best you can hope for is that your heap allocator reuses some memory it hasn't munmapped yet. The kernel zeroes pages on demand, which helps, but you still pay for that zeroing.
