Iirc the work on safe transmute also involves a sort of “any bit pattern” trait?
I’ve also dealt with pain implementing similar interfaces in Rust, and it really feels like you end up jumping through a ton of hoops (and in some of my cases, hurting performance) all to satisfy the abstract machine, at no benefit to programmer or application. It’s really a case where the abstract machine cart is leading the horse
I totally understand not wanting to promise things get zeroed, but I don't really understand why full UB, instead of just "they have whatever value is initially in memory / the register / the compiler chose" is so much better.
Has anyone ever done a performance comparison between UB and freezing I wonder? I can't find one.
Also, an uninitialized value might be in a memory page that gets reclaimed and then mapped in again, in which case (because it hasn’t been written to) the OS doesn’t guarantee it will have the same value the second time. There was recently a bug discovered in one of the few algorithms that uses uninitialized values, because of this effect.
I would imagine there aren't that many cases where we are reading uninitialised memory and counting on that read not having a fixed value. It would happen when reading in 8-byte blocks for alignment, but does it happen that much elsewhere?
The only way to really know is to test this. Compilers and their optimizations depend on a lot of things. Even the order and layout of instructions can matter due to the instruction cache. You can always go and make the guarantee later on, but undoing it would be impossible.
it pretty much requires the compiler to initialize all values when they first "appear"
except that this is impossible and outright hazardous if pointers are involved
But doable for a small subset like e.g.
- stack values (but would inhibit optimizations, potentially pretty badly)
- some allocations, e.g. I/O buffers (except C's alloc has no idea that you are allocating an I/O buffer)
Can you provide (on say x86_64) an example of this, other than the case where the compiler prunes cases based on characterizing certain paths as UB? In other words, a case where "an uninitialized value is well-defined but can be different on each read" allows more performance optimization than "the value will be the same on each read".
> Also, an uninitialized value might be in a memory page that gets reclaimed and then mapped in again, in which case (because it hasn’t been written to) the OS doesn’t guarantee it will have the same value the second time. There was recently a bug discovered in one of the few algorithms that uses uninitialized values, because of this effect.
This does not sound correct to me, at least for Linux (assuming one isn't directly requesting such behavior with madvise or something). Do you have more information?
And it definitely does allow some optimisation. But probably nothing significant on modern out-of-order machines.
what is out there is still pretty wild
just slightly less
> probably nothing significant on modern out-of-order machines.
having no UB at all would kill a lot of optimizations still relevant today (and wouldn't match the hardware anymore, as some UB exists at the hardware level)
out-of-order machines aren't magically fixing that; they just make some less-optimized code run better, but not all of it
and a lot of low-energy/cheap hardware has no or very, very limited out-of-order capability, so it's still very relevant and likely will stay very relevant for a very long time
and realize sometimes the UB is even in the hardware registers
and that the same logical memory address might have 5 different values in hardware at the same time without you having a bug
and other fun like that
so the insanity is reality not the compiler
(though IMHO in C and especially C++ the insanity is how easily you might accidentally run into UB without doing any fancy trickery, just dumb, not-hot, everyday code)
EDIT: one could see an "apparent" violation of memory consistency if, say, the cache subsystem or memory controller were misconfigured; however, this would require both (1) that you are running in kernel mode, not user space, and (2) that you have a bug, so GP's claim that bug-free code could encounter such a state is not supported.
This seems very sensitive to specific definitions that others might not share. DRAM is provided with a spec sheet that defines its behavior (if you write to an address, you’ll read back the same value from the same address in the future) under certain conditions. If you violate those conditions, the behavior is… undefined. If you operate DRAM with the wrong refresh timing, or temperature, or voltage, or ionizing radiation level, you may see strange behavior. Even non-local behavior, where the value read from one cell depends on other cells (RowHammer). How is this not UB?
I'm not exaggerating by a legalistic interpretation, and I'm only slightly exaggerating in practice. UB can do some really weird, unintuitive stuff in practice on real hardware:
https://mohitmv.github.io/blog/Shocking-Undefined-Behaviour-...
The point is that this extreme UB should never happen. It was a choice of the compiler implementors, and rather than fix this, they allowed the escape hatch of UB in the spec. It would be more sensible for the compiler to say that, e.g., accessing uninitialized memory results in a nonspecified value, or even possibly multiple different nonspecified values if accessed from different threads. That captures what we expect to happen, but would be (according to C language spec lawyers) defined behavior.
In practice, it would mean that compliant compilers will ensure that any situation in which uninitialized memory could be accessed would not result in weird edge case page faults on certain architectures or whatever that could in fact lead to wacky UB situations.
This is not an unreasonable ask.
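To make the distinction concrete, here is an illustrative Rust sketch (not from the linked post) of what full UB permits compared with an "unspecified value":

    use std::mem::MaybeUninit;

    fn weird() {
        // UB today: a bool must be 0 or 1, and this one was never written.
        let b: bool = unsafe { MaybeUninit::<bool>::uninit().assume_init() };
        // Under "unspecified value" semantics you'd expect exactly one of these
        // to print; under UB, the compiler may make both print, neither print,
        // or delete the surrounding code entirely.
        if b {
            println!("b was true");
        }
        if !b {
            println!("b was false");
        }
    }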
But isn't this exactly parallel to the Rowhammer case in DRAM? When operating at the edge of the spec, the behavior of DRAM becomes undefined. (And, of course, one challenge with Rowhammer was about /which/ edge of the spec this happened on.) In this case, writing one physical address altered the contents of other physical addresses. This is "really weird, unintuitive stuff … on real hardware." And of course we can (and do) ask DRAM vendors to not take advantage of this undefined behavior; but they do so as an optimization, allowing slightly smaller and more closely spaced DRAM cells, and thus higher density DRAM dice for the same price. Just like it's possible to work with a language with fully-defined semantics at the cost of performance, it's possible to buy DRAM with much wider specifications and more clearly defined behavior up to the edges of those specifications… at the cost of performance.
Extreme UB in both hardware and software is a choice of priorities. You may favor giving up performance capabilities to achieve a more understandable system (so do I! I do most of my work on in-order lockstep processors with on-die ECC SRAM to maximize understandability of failure modes), but the market as a whole clearly does not, in both hardware and software.
I'm not saying that a computer architecture should be UB-free. That would be awesome if it could be done, but in practice probably a bridge too far. But a compiler should map high-level directives into low-level implementations on a specific architecture using constructs that do not result in UB. This is not too much to ask.
A compiler can't reasonably protect you from rowhammer attacks. But it should guarantee that, barring hardware errors, accessing uninitialized memory has no effect other than something sensible like returning unspecified contents, or causing a memory access exception, or whatever. It should be defined up front what the behavior is, even if some of the runtime values are unpredictable.
As a more concrete example, most languages these days clearly define what happens on signed integer overflow: the thing you expect to happen on any two's-complement machine, e.g. (char)127 + (char)1 == -128. C treats this as undefined behavior, and as mentioned in my link above that can cause what should be a finite loop (with or without overflow) to compile as an infinite loop. This "optimization" step by the compiler should never have happened. C resists changing this because C compilers exist for non-two's-complement architectures where the behavior would be different. IMHO the correct approach would be to require THOSE weird esoteric architectures to compile in extra overflow checks (possibly disabled with opt-in compiler flags that explicitly violate the standard), rather than burden every C developer everywhere with the mess that is signed arithmetic overflow UB.
It's a matter of expectations. Any but the most junior programmers expect that signed integer overflow will not be portable. That's fine. But they expect a sensible result (e.g. wrap around on two's complement machines), even if it is non-portable. They don't expect the compiler to silently and sneakily change the logic of their program to something fundamentally different, because UB means the compiler can do whatever tf it wants.
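For contrast, a small sketch of the expectation described above in a language that defines overflow (Rust shown here, as its rules happen to match what the comment asks for):

    fn main() {
        let x: i8 = 127;
        // Guaranteed two's-complement wraparound, never UB.
        assert_eq!(x.wrapping_add(1), -128);
        // A plain `x + 1` is also never UB: it panics in debug builds
        // (overflow checks on) and wraps in release builds.
    }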
But that's exactly the point. The "bug" of RowHammer was that it occurred slightly on the "allowed" side of the envelope, at acceptably-low refresh rates. The "UB" of RowHammer and a hundred other observable effects is that, on the "disallowed" side of the envelope, the behavior is undefined. The system designer gets to choose at what probability they are on each side of the envelope, and the trade-offs are very much optimization opportunities.
Writing software in C that may exhibit undefined behavior is exactly this -- it's choosing, as a software engineer, to be on the far side of the specification envelope. In exchange, you get access to several powerful optimizations, some at the compiler level, and some at the career level (if you think that not needing to learn to understand your language properly is a time optimization, at least).
There most definitely is.
In the ARM documentation this is referred to as “UNPREDICTABLE”. The outcome is not defined. It may work. It may not. It may put garbage data in a register.
https://mohitmv.github.io/blog/Shocking-Undefined-Behaviour-...
UB is about declaring programs invalid with a snide "don't do that", not about incorrect execution due to an incorrect specification. E.g. speculative execution running in privileged mode due to a prior syscall is just a plain hardware bug. It's not undefined behavior. In fact, the bug in question is extremely well defined.
The closest thing to reading undefined behavior is reading a "don't care" or VHDL's 'U' value from a std_ulogic and even those are well defined in simulation, just not in physical hardware, but even there they can only ever be as bad as reading a garbage value. Since a lot of the hardware design is non-programmable, there is also usually no way to exploit it.
In short, the moment you enable integer-to-pointer conversions (assuming your target has a flat address space), you create pointer provenance problems whose only resolution is that some things have to be UB.
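A minimal sketch of the provenance problem being referred to (illustrative; the classic formulations are in C, shown here in Rust terms):

    fn provenance_puzzle() {
        let x = [0u8; 4];
        let y = [0u8; 4];
        // A one-past-the-end pointer for `x` may have the same *address* as `y`.
        let p = x.as_ptr().wrapping_add(4);
        let q = y.as_ptr();
        if p as usize == q as usize {
            // Even though the integer addresses compare equal, `p`'s provenance
            // only covers `x`'s allocation, so accessing `y` through `p` is UB.
            // Optimizers rely on exactly this: they want to fold the address
            // comparison and assume `p` can never alias `y`, and once
            // integer-to-pointer casts exist, keeping those assumptions
            // consistent forces some operations to be declared UB.
        }
    }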
It may seem nitpicky, but the downside of relying on implementation defined or unspecified behavior is largely boxed and contained. E.g you might get a memory access error. UB is, in principle, completely unlimited in downside. And because of that, it often interacts badly with optimization passes, resulting in very strange bugs.
> It may seem nitpicky, but the downside of relying on implementation defined or unspecified behavior is largely boxed and contained.
... that is incoherent precisely because the interactions of provenance with optimizations aren't boxed and contained.
It wasn't really until people started putting the model into formal semantics that they realized "hey, wait a second, this means either most of our optimizations are wrong or our semantics are wrong", and the consensus is that it was the semantics that were broken.
it kills _a lot_ of optimizations, leading to problematic perf. degradation
TL;DR: always freezing I/O buffers => no issues (in general); freezing all primitives => perf problem
(at least in practice; in theory many might still be possible, but with a way higher analysis compute cost (like exponentially higher) and potentially needing more high-level information (so bad luck, C)).
Still, for I/O buffers of primitive enough types `frozen` is basically always just fine (I also vaguely remember some discussion about people more involved in Rust core development probably wanting to add some functionality like that, so it might still happen).
To illustrate why frozen I/O buffers are just fine: some systems already always (zero or rand) initialize all their I/O buffers. And a lot of systems reuse I/O buffers; they init them once on startup and then just continuously re-use them. And some OS setups do (zero or rand) initialize all OS memory allocations (though that is for the OS granting more memory to your in-process memory allocator, not for every lang-specific alloc call, and it doesn't remove UB for stack or register values at all (nor for various situations related to heap values either)).
So doing much more "costly" things than just freezing them is pretty much normal for I/O buffers.
Though, as mentioned, sometimes things are undefined (not frozen) on a hardware level (things like every read might return different values). It's a bit of a niche issue you probably won't run into wrt. I/O buffers, and I'm not sure how common it is on modern hardware, but still a thing.
But freezing primitives which majorly affect control flow both makes some optimizations impossible and makes others much harder to compute/check/find, potentially to a point where they're not viable anymore.
This can involve (as in freezing can prevent) some forms of dead code elimination, some forms of inlining+unrolling+const propagation etc.. This is mostly (but not exclusively) for micro optimizations but micro optimizations which sum up and accumulate leading to (potentially but not always) major performance regressions. Frozen also has some subtle interactions with floats and their different NaN values (can be a problem especially wrt. signaling NaNs).
Though I'm wondering if a different C/C++ where arrays of primitives are always treated as frozen (and no signaling NaNs) would have worked just fine without any noticeable perf. drawback. And if so, whether Rust should adopt this...
I've implemented what TFA calls the "double cursor" design for buffers at $dayjob, i.e. an underlying (ref-counted) [MaybeUninit<u8>] with two indices to track the filled, initialized and unfilled regions, plus API to split the buffer into two non-overlapping handles, etc. It certainly required wrangling with UnsafeCell in non-trivial ways to make miri happy, but it performs no worse than the equivalent C code that just dealt with uint8_t* would have.
EDIT: adding init_len() isn't good enough, we'd need an ensure_init() method too that returns a &mut [u8], and with those we could impl Buffer for BorrowedCursor, but if Read::read_buf() took this modified Buffer trait the default impl would be very inefficient when using a &mut [MaybeUninit<u8>] and that becomes a performance footgun.
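For readers who haven't seen TFA, a minimal sketch of the "double cursor" idea described above (names and details are illustrative, not the poster's actual code; std's unstable BorrowedBuf is shaped similarly):

    use std::mem::MaybeUninit;

    struct DoubleCursorBuf {
        data: Box<[MaybeUninit<u8>]>,
        filled: usize, // prefix containing valid data
        init: usize,   // prefix written at least once (filled <= init)
    }

    impl DoubleCursorBuf {
        fn with_capacity(cap: usize) -> Self {
            Self {
                data: vec![MaybeUninit::uninit(); cap].into_boxed_slice(),
                filled: 0,
                init: 0,
            }
        }

        /// The unfilled region that e.g. a read syscall may write into.
        fn unfilled(&mut self) -> &mut [MaybeUninit<u8>] {
            &mut self.data[self.filled..]
        }

        /// Caller promises that `n` bytes after `filled` were actually written.
        unsafe fn advance(&mut self, n: usize) {
            self.filled += n;
            self.init = self.init.max(self.filled);
        }

        /// The filled prefix as an ordinary byte slice.
        fn filled(&self) -> &[u8] {
            // SAFETY: every byte below `filled` was initialized via `advance`.
            unsafe { std::slice::from_raw_parts(self.data.as_ptr().cast::<u8>(), self.filled) }
        }
    }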
Some people simply aren't comfortable with it.
Currently, sound Rust code does not depend on the value of uninitialized memory whatsoever. Adding `freeze` means that it can. A vulnerability similar to Heartbleed, exposing secrets from freed memory, is impossible in sound Rust code without `freeze`, but theoretically possible with `freeze`.
Whether you consider this a realistic issue or not likely determines your stance on `freeze`. I personally don't think it's a big deal and have several algorithms which are fundamentally being slowed down by the lack of `freeze`, so I'd love it if we added it.
but I guess this isn't just about I/O buffers ;)
Arguably, the existence of "asm!() freeze" has already broken this idea. Of course, you nominally don't get any guarantees about the stability of data that asm!() code reads from uninitialized bytes, yet you can do it nonetheless.
And it's not like it's practical to say "asm!() code is always unsound if it uses uninitialized bytes like they're numbers!", since lots of it does useful stuff like communicating with the kernel with structs that get serialized, and it can also open up interfaces like mmap() which translate possibly-uninitialized virtual-memory bytes into definite kernel bytes.
Not to mention /proc/self/mem and similar kernel-provided debugging utilities that can peek into memory as serialized data.
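For reference, the "asm!() freeze" trick being alluded to looks roughly like this (an illustrative sketch; whether this actually grants frozen semantics under the formal model is exactly what's contested):

    use std::arch::asm;
    use std::mem::MaybeUninit;

    unsafe fn freeze_via_asm(buf: &mut [MaybeUninit<u8>]) -> &mut [u8] {
        let mut ptr = buf.as_mut_ptr();
        // An opaque asm block: the compiler must assume it may read and write
        // the pointed-to memory, so afterwards it can no longer reason about
        // the bytes as uninitialized at the codegen level.
        asm!("", inout(reg) ptr);
        std::slice::from_raw_parts_mut(ptr.cast::<u8>(), buf.len())
    }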
Some people--especially when those people are closest to the workings of the operational semantics--not being comfortable is a sign that it is in fact harder than it looks.
The problems with "freeze" are in the same vein as integer-to-pointer semantics: it's a change which turns out to have implications for things not closely related to the operation itself, giving it a spooky-action-at-a-distance effect that is hard to tame.
The deeper issue is that, while there is clearly a use for some sort of "garbage value" semantics in a high-level language (that supports things like uninitialized I/O buffers, excessive reads for vectorization, padding bytes within structures), it's not clear which of the subtly different variants of garbage value works the best for all of the use cases.
Abstractions like ReadBuf allow safe code to efficiently work with uninitialized buffers without risking exposure of random memory contents.
Go has similar characteristics.
I thought Rust was supposed to be a systems language?
In Rust you can mostly just wrap the C library as a crate and the end developer doesn't need to care very much. If you want to use their system headers then you need to let them know, but that's not an issue if the headers are bundled. Cargo knows how to absorb any C-built object files into the final binary and it just works.
In Go, in every scenario, the end developer needs to go through the effort of managing all the necessary cgo calls when any of their dependencies use C libraries, the whole way up the dependency tree. It's really, really not supported.
Safe abstractions for dealing with uninitialized memory efficiently are important in very niche scenarios to get optimal code out of the compiler and for reducing the ability to make a mistake when writing such code.
Reaching for C to do this is an emotional overreaction instead of calmly dealing with a small corner case that already has workarounds even if it does involve using unsafe
$ cat bigbuf.c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define DEFINITELY_BIG_ENOUGH 2U*1024*1024*1024
int main(int nargs, char **args)
{
    char *definitely_big_enough = malloc(DEFINITELY_BIG_ENOUGH);
    if (nargs > 1)
    {
        memset(definitely_big_enough, 0, DEFINITELY_BIG_ENOUGH);
    }
    sprintf(definitely_big_enough, "%s", args[0]);
    return 0;
}
$ /usr/bin/time ./bigbuf
0.00user 0.00system 0:00.00elapsed 100%CPU (0avgtext+0avgdata 1356maxresident)k
0inputs+0outputs (0major+80minor)pagefaults 0swaps
$ /usr/bin/time ./bigbuf 1
0.05user 1.25system 0:01.31elapsed 99%CPU (0avgtext+0avgdata 2098468maxresident)k
0inputs+0outputs (0major+524369minor)pagefaults 0swaps
YMMV on different operating systems. Of course this is a program only an idiot would write, but things like caches are often significantly bigger than the median case, especially on Linux where you know there is overcommit.

People should really read more on safety semantics in Rust before making comments like this; it's quite annoying to bump into surface-level misunderstandings every time Rust is mentioned somewhere.
Safe abstractions are created out of unsafe operations by making sure it's impossible to violate the operations' preconditions. Either by checking at runtime and returning an error or aborting the program if they're violated, or by using the type system and borrow checker to verify them at compile time (writing the code such that any program that could violate the preconditions must have a type error.)
If Rust didn't have unsafe, the only way to access the underlying unsafe operations would be dropping down to C/C++/Assembly, or hardcoding them in the compiler. This is what other languages do, and it's ergonomically worse because the barrier to adding a whole new language and build system to your project is quite high.
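A minimal sketch of that first point, assuming nothing beyond std: the runtime check upholds the unsafe operation's precondition, so no caller of the safe function can trigger UB through it.

    /// Returns element `i`, or `None` if `i` is out of bounds.
    fn checked_get(xs: &[u8], i: usize) -> Option<u8> {
        if i < xs.len() {
            // SAFETY: we just verified that `i` is in bounds.
            Some(unsafe { *xs.get_unchecked(i) })
        } else {
            None
        }
    }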
Rust is being used and is designed to be able to be used everywhere from top of the line PCs, to servers to microcontrollers to virtual machines in the browser.
Not all tradeoffs are acceptable to everyone all of the time
int *ptr = malloc(size);
if (ptr[offset] == 0) { }
The code was assuming that the value in an allocated buffer did not change.
However, it was pointed out in review that it could change with these steps:
1) The malloc allocates from a new memory page. This page is often not mapped to a physical page until written to.
2) The reads just return the default (often 0 value) as the page is not mapped.
3) Another allocation is made that is written to the same page. This maps the page to physical memory which then changes the value of the original allocation.
What could happen is that the UB in that code could result in it being compiled in a way that makes the comparison non-deterministic.
(*): ... or alternatively, we're not talking about a regular userspace program but a higher-privilege layer that is doing direct unpaged access, but I assume that's not the case since you're talking about malloc.
The closest thing to "conditionally returned to the kernel" is if the page had been given to madvise(MADV_FREE), but that would still not have the behavior they're talking about. Reading and writing would still produce the same content, either the original page content because the kernel hasn't released the page yet, or zero because the kernel has already released the page. Even if the order of operations is read -> kernel frees -> write, then that still doesn't match their story, because the read will produce the original page content, not zero.
That said, the code they're talking about is different from yours in that their code is specifically doing an out-of-bounds read. (They said "If you happen to allocate a string that's 128 bytes, and malloc happens to return an address to you that's 128 bytes away from the end of the page, you'll write the 128 bytes and the null terminator will be the first byte on the next page." So they're very clearly talking about the \0 being outside the allocation.)
So it is absolutely possible to have this setup: the string's allocation happens to be followed by a different allocation that is currently 0 -> the `data[size()] != '\0'` check is performed and succeeds -> `data` is returned to the caller -> whoever owns that following allocation writes a non-zero value to the first byte -> whoever called `c_str()` will now run off the end of the 128B string. This doesn't have anything to do with pages; it can happen within the bounds of a single page. It is also such an obvious out-of-bounds bug that it boggles my mind that it passed any sort of code review and required some sort of graybeard to point out.
He explicitly states a 128-byte filename allocates 129 bytes. https://www.youtube.com/watch?v=kPR8h4-qZdk&t=1417s
Some people suggest that maybe Facebook runs with MAP_UNINITIALIZED
Is this so inefficient? If your code is very sensitive to IO throughput, then it seems preferable to re-use buffers and pay the initialization once at startup.
Some years ago, I needed a buffer like this and one didn't exist, so I wrote one: https://crates.io/crates/fixed-buffer . I like that it's a plain struct with no type parameters.
It can be. If you have large buffers (tuned for throughput) that end up fulfilling lots of small requests for whatever reason, for example. And there's always the occasional article when someone rediscovers that replacing malloc + memset with calloc can have massive performance savings, thanks to zeroing by the OS only occurring on the first page fault (if it ever occurs), instead of an O(N) operation on the whole buffer up front.
Which, if in the wrong loop, can quickly balloon from O(N) to O(scary).
https://github.com/PSeitz/lz4_flex/issues/147
https://github.com/rust-lang/rust/issues/117545
If I'm reading that log-log plot right, that looks like a significantly worse than 100x slowdown on 1GB data sets. Avoiding init isn't the only solution, of course, but it was a solution.
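For what it's worth, the calloc-style path is also what a plain zeroed allocation takes in Rust (a sketch; the exact lowering is an implementation detail of std and the allocator):

    // `vec![0u8; n]` goes through `alloc_zeroed` rather than alloc + memset.
    // For large n the allocator typically hands back freshly mapped pages the
    // kernel already guarantees are zero, so no O(n) clear runs up front and
    // pages are faulted in lazily on first touch, like the calloc case above.
    fn zeroed_buffer(n: usize) -> Vec<u8> {
        vec![0u8; n]
    }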
> then it seems preferable to re-use buffers
Buffer reuse may be an option, but in code with complicated buffer ownership (e.g. transfering between threads, with the thread of origination not necessarily sticking around, etc.), one of the sanest methods of re-use may be to return said buffer to the allocator, or even OS.
> and pay the initialization once at startup.
Possibly a great option for long lived processes, possibly a terrible one for something you spawn via xargs.
• Uninitialized bytes are not just some garbage random values, they're a safety risk. Heartbleed merely exposed uninitialized buffers. Uninit buffers can contain secrets, keys, and pointers that help defeat ASLR and other mitigations. As usual, Rust sets the bar higher than "just be careful not to have this bug", and therefore the safe Rust subset requires making uninit impossible to read.
• Rust-the-language can already use uninitialized buffers efficiently. The main issue here is that the Rust standard library doesn't have APIs for I/O using custom uninitialized buffers (only for the built-in Vec, in a limited way). These are just musings how to design APIs for custom buffers to make them the most useful, ergonomic, and interoperable. It's a debate, because it could be done in several ways, with or without additions to the language.
Only when read. Writing to "uninitialized" memory[1] and reading it back is provably secure[2], but doesn't work in safe Rust as it stands. The linked article is a proposal to address that via some extra complexity that I guess sounds worth it.
[1] e.g. using it as the target of a read() syscall
[2] Because it's obviously isomorphic to "initialization"
There are fun edge cases here. Writing to memory through `&mut T` makes it initialized for T, but its padding bytes become de-initialized (that's because the write can be a memcpy that also copies the padding bytes from a source that never initialized them).
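A sketch of that padding edge case (illustrative types; the exact layout depends on the target):

    #[repr(C)]
    struct Padded {
        a: u8,
        // 3 bytes of padding here on typical targets
        b: u32,
    }

    fn overwrite(dst: &mut Padded, src: Padded) {
        // This assignment may be compiled as a memcpy of all of `Padded`'s
        // bytes, padding included, and `src`'s padding was never initialized.
        // Afterwards `dst` is fully initialized *as a Padded*, but its padding
        // bytes are once again considered uninitialized, so viewing `dst` as
        // raw bytes and reading them is not allowed.
        *dst = src;
    }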
90s_dev•1mo ago
> Another thing is the difficulty of using uninitialized data in Rust. I do understand that this involves an attribute in clang which can then perform quite drastic optimizations based on it, but this makes my life as a programmer kind of difficult at times. When it comes to `MaybeUninit`, or the previous `mem::uninit()`, I feel like the complexity of compiler engineering is leaking into the programming language itself and I'd like to be shielded from that if possible. At the end of the day, what I'd love to do is declare an array in Rust, assign it no value, `read()` into it, and magically reading from said array is safe. That's roughly how it works in C, and I know that it's also UB there if you do it wrong, but one thing is different: It doesn't really ever occupy my mind as a problem. In Rust it does. [https://news.ycombinator.com/item?id=44036021]
electrograv•1mo ago
UB doesn’t occupy the author’s mind when writing C, when it really should. This kind of lazy attitude to memory safety is precisely why so much C code is notoriously riddled with memory bugs and security vulnerabilities.
mk12•1mo ago
Arnavion•1mo ago
It's not as painless as it could be though, because many of the MaybeUninit<T> -> T conversion fns are unstable. E.g. the code in TFA needs `&mut [MaybeUninit<T>] -> &mut [T]` but `[T]::assume_init_mut()` is unstable. But reimplementing them is just a matter of copying the libstd impl, which in turn is usually just a straightforward reinterpret-cast one-liner.
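The kind of one-liner in question looks roughly like this (a sketch mirroring the shape of the unstable std function; the caller must guarantee every element really has been initialized):

    use std::mem::MaybeUninit;

    unsafe fn slice_assume_init_mut<T>(s: &mut [MaybeUninit<T>]) -> &mut [T] {
        // SAFETY (of the cast): MaybeUninit<T> has the same layout as T, and
        // the caller promises all elements have been initialized.
        &mut *(s as *mut [MaybeUninit<T>] as *mut [T])
    }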
nemothekid•1mo ago
vgatherps•1mo ago
It’s strictly more complicated and slower than the obvious thing to do and only exists to satisfy the abstract machine.
vlovich123•1mo ago
UB is weird and valgrind is not a tool for detecting UB. For that you want Miri or UBSAN. Valgrind’s equivalent is ASAN and MSAN which catch UB issues incidentally in some rare cases and not necessarily where the UB actually happened.
ironhaven•1mo ago
NobodyNada•1mo ago
Currently, the team is leaning in the direction of not requiring recursive validity for references. This would mean your code is not language UB as long as you can assume `set_len` and `copy_to_slice` never read from `data`. However, it's still considered library UB, as this assumption is not documented or specified anywhere and is not guaranteed -- changes to safe code in your program or in the standard library can turn this into language UB, so by doing something like this you're writing fragile code that gives up a lot of Rust's safety by design.
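Concretely, the pattern under discussion looks something like this (an illustrative reconstruction, not the original poster's code):

    use std::io::Read;

    fn read_n(src: &mut impl Read, n: usize) -> std::io::Result<Vec<u8>> {
        let mut data: Vec<u8> = Vec::with_capacity(n);
        // This slice points at uninitialized memory. Whether handing it out as
        // `&mut [u8]` is already language UB, or "only" library UB as described
        // above, is the open question.
        let spare: &mut [u8] =
            unsafe { std::slice::from_raw_parts_mut(data.as_mut_ptr(), n) };
        src.read_exact(spare)?;
        // SAFETY: read_exact wrote all `n` bytes (assuming it never *read* them).
        unsafe { data.set_len(n) };
        Ok(data)
    }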
bombela•1mo ago
And from the doc:
> This implementation is specialized for slice iterators, where it uses copy_from_slice to append the entire slice at once.
Of course this trivial example could also be written as:
uecker•1mo ago
codeflo•1mo ago
There are two actual differences in this regard: C pointers are more ergonomic than Rust pointers. And Rust has an additional feature called references, which enable a lot more aggressive compiler optimizations, but which have the restriction that you can’t have a reference to uninitialized memory.
mk12•1mo ago
o11c•1mo ago
Most other UB relates to datums that you think you can do something with.
usefulcat•1mo ago
It sounds like the more difficult problem here has to do with explaining to the compiler that read() is not being used unsafely.
lhecker•1mo ago
If I write the equivalent code in Rust I may write
The problem is now obvious to me, but at least my intention is clear: "Come here! Give me your uninitialized arrays! I don't care!". But this is not the end of the problem, because writing this code is theoretically unsafe. If you have a `[u8]` slice for `out` you have to convert it to `[MaybeUninit<u8>]`, but then the function could theoretically write uninitialized data, and that's UB, isn't it? So now I have to think about this problem and write this instead: ...and that will also be unsafe, because now I have to convert my actual `[MaybeUninit<u8>]` buffer (for file writes) to `[u8]` for calls to this API.

Long story short, this is a problem that occupies my mind when writing in Rust, but not in C. That doesn't mean that C's many unsafeties don't worry me, it just means that this _particular_ problem type described above doesn't come up as an issue in C code that I write.
Edit: Also, what usefulcat said.
ninkendo•1mo ago
Something like:
(Honest question, actually… because the above may be impossible to write and I'm on my phone and can't try it.)

Edit: it works: https://play.rust-lang.org/?version=stable&mode=debug&editio...
lhecker•1mo ago
Edit: Also, I believe your code would fail my second section, as the `convert` function would have difficulty accepting a `[u8]` slice. Converting `[u8]` to `[MaybeUninit<u8>]` is not safe per se.
ninkendo•1mo ago
But I don’t think this is really a shortcoming, so much as a simple consequence of strong typing. If you want take “whatever” as a parameter, you have to spell out the types that satisfy it, whether it’s via a trait, or an enum with specific variants, etc. You don’t get to just cast things to void and hope for the best, and still call the result safe.
ii41•1mo ago
IIRC it's not that hard to convince the compiler to give you a safe buffer from a MaybeUninit. However, this type has really lengthy docs and makes you question everything you do with it. Thinking through all this is painful, but it's not like you don't have to do it with C.
jcranmer•1mo ago
And the most aggravating part of all of this is that the most common use case for uninitialized memory (the scenario being talked about both in the article here and the discussion you quote) is actually pretty easy to have a reasonable, safe abstraction for, so the fact that the current options requires both use of unsafe code and also potentially faulty duplication of value calculations doesn't make for a fun experience. (Also, the I/O traits predate MaybeUninit, which means the most common place to want to work with uninitialized memory is one where you can't do it properly.)
90s_dev•1mo ago
lhecker•1mo ago
Personally, I also like the simpler approach overall, compared to the `BorrowedBuf` trait, for the same reasons outlined in the article.
While this possibly solves parts of pain points that I had, what I meant to write is that in an ideal world I could write Rust while mostly not thinking about this issue much, if at all. Even with this approach, I'd still need to decide whether my API needs to take a `[u8]` or a `Buffer`, just in the mere off-chance that a caller may want to pass an uninitialized array further up in the call chain. This then requires making the call path generic for the buffer parameter which may end up duplicating any of the functions along the path, even though that's not really my intention by marking it as `Buffer`.
I think if there was a way to modify Rust so we can boldly state in writing "You may cast a `[MaybeUninit<T>]` into a `[T]` and pass it into a call _if_ you're absolutely certain that nothing reads from the slice", it would already go a long way. It may not make this more comfortable yet, but it would definitely take off a large part of my worries when writing such unsafe casts. That's basically what I meant with "occupy my mind": It's not that I wouldn't think about it at all, rather it just wouldn't be a larger concern for me anymore, for code where I know for sure that this requirement is fulfilled (i.e. similar to how I know it when writing equivalent C code).
Edit: jcranmer's suggestion of write-only references would solve this, I think? https://news.ycombinator.com/item?id=44048450
[^1]: This is of course not a problem for a simple `read` syscall, but may be an issue for more complex functions, e.g. the UTF8 <> UTF16 converter API I suggested elsewhere in this thread, particularly if it's accelerated, the way simdutf is.