A simpler alternative is to compile the program with LTO. We confirmed that LLVM's inter-procedural analyses can propagate both alignment and dereferenceability information for this function, which allows the LTO build to recover the lost performance.
"can" is doing a lot of heavy-lifting here. Guaranteeing expected optimizations "will" be applied are hard-enough, without leaving it entirely to an easily-derailed indirect side-effect.Relying on LTO to "discover" this annotation through interprocedural analysis -- based on my experience of looking at LTO in practice -- will not be as comprehensive, and even when it works it accomplishes its task in an achingly-slow and expensive way.
This is a real devil-is-in-the-details case.
int f() {
int arr[3], i = 0;
arr[3] = 5;
return i;
}
Optimizing this to "return 0" is relying on UB, because it's assuming that i wasn't laid out directly after arr in the stack frame. I believe this is what the paper calls "non-guardable UB".
I don't agree with the claim in the paper that their semantics offers a "flat memory model". A flat memory model would rule out the optimization above. Rather, the memory model still has the notion of object bounds; it's just simplified in some ways.
int g(int n) {
int arr[3], i = 0;
arr[n] = 5;
return i;
}
Without "exploiting UB" it's incorrect to optimize this to "return 0", because of the possibility that i was allocated right after arr and n == 3.If you could somehow "stop exploiting UB" they'd just be angry either that you're still exploiting an actual language requirement they don't like and so have decided ought to be excluded or that you followed the rules too literally and obviously the thing they meant ought to happen even though that's not what they actually wrote. It's lose-lose for compiler vendors.
I agree that some of us are unreasonable, but I do recognize that DWIM is not feasible.
I just want compilers to treat UB the same as unspecified behavior, which cannot be assumed away.
Rust's safe subset doesn't have any UB, but unsafe Rust can of course cause UB very easily, because Rust's rules are extremely strict and only safe Rust gets the compiler's guarantee that it doesn't break them. So it seems weird for people who work on the compiler guts to be "surprised".
You may say I'm "exploiting ub" when making these deductions but I would disagree. My argument is not of the form "x is ub so I can do whatever I want".
To elaborate on the example, if arr was volatile then I would expect the write to always go through. And if i was volatile then I would expect i to always be read. However it's still not guaranteed that i is stored immediately after arr, as the compiler has some discretion about where to place variables afaik. But if i is indeed put immediately after, then the function should indeed return 5 for n=3. For n>3 it should either return 0 (if writing to unused stack memory), page fault (for small n outside of the valid stack space), or stomp on random memory (for unlucky and large n). For negative n, many bad things are likely to happen.
Edit: I think I mixed up which way the stack grows but yeah.
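To make the volatile variant concrete, here's a sketch (hypothetical; the actual behavior still depends on the layout the compiler happens to pick):

int g(int n) {
    volatile int arr[3];
    volatile int i = 0;
    arr[n] = 5;   // volatile: the store must actually be performed
    return i;     // volatile: i must actually be re-read from memory
}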
That's not how compilers work. The optimization changing `return i;` into `return 0;` happens long before the compiler determines the stack layout.
In this case, because `return i;` was the only use of `i`, the optimization allows deleting the variable `i` altogether, so it doesn't end up anywhere on the stack. This creates a situation where the optimization only looks valid in the simple "flat memory model" because it was performed; if the variable `i` hadn't been optimized out, it would have been placed directly after `arr` (at least in this case: https://godbolt.org/z/df4dhzT5a), so the optimization would have been invalid.
There's no infrastructure in any compiler that I know of that would track "an optimization assumed arr[3] does not alias i, so a later stage must take care not to place i at that specific point on the stack". Indeed, if the array index were a runtime value, the compiler would be prevented from ever spilling to the stack any variable that was involved in any optimization.
So I think your general idea -- "the allowable behaviors of an out-of-bounds write are specified by the possible actual behaviors in a simple flat memory model, over the various possible stack layouts" -- could work as a mathematical alternative to UB-based specifications. But it would end up unworkable for actual optimizing compiler implementations: unless the compiler could guarantee that a variable can always stay in a register and will never be spilled (how would it do that across function calls?), it would have to treat essentially all variables as potentially modified by basically any store-via-pointer, which would disable nearly all optimizations.
The compiler isn't "assuming" that so much as choosing not to put i in the stack frame at all. And I don't think it's correct to view the lack of a register spill as an "optimization" per se. It does remain true that code writing past the end of an array will be UB in typical scenarios (i.e. when not using asan/valgrind).
(Now, if the compiler also removed the store, that could legitimately be called an optimization based on assuming no-UB)
At -O0, clang compiles the f example above to:
push rbp
mov rbp, rsp
mov dword ptr [rbp - 4], 0
mov dword ptr [rbp - 4], 5
mov eax, dword ptr [rbp - 4]
pop rbp
ret
Which indeed returns 5. But at -O2 clang optimizes it to this:
xor eax, eax
ret
Which returns 0. So the simple semantics produces one result, and the complex semantics produces another. Hence, it's exploiting undefined behavior.
printf("A");
bool x;
if ( x ) {printf("B");} else {printf("C");}
printf("\n");
If at -O0 "AB" is printed and at -O2 "AC" is printed (due to the vagaries of whatever was left on the stack), then that would meet your definition, but I would not regard that as "exploiting undefined behavior", merely as the manifestation of the inherent unpredictability of UB. If the compiler didn't print anything (i.e. the whole block was removed due to UB detection) then that _would_ be a case of exploiting undefined behavior.Here the compiler "register allocates" i for some reads but not for others.
i gets stack allocated, but some uses of it act as though they were register allocated.
In your provided snippet, the correctness argument for the assembly in a non-UB-reasoning universe goes like this: at first, i is stored privately on the stack with value zero, and so as an optimization we can assume that value is still zero without rereading it. Only later, when &i is taken, is that memory made public, and the compiler has to worry about something altering it. In actual execution, the problem is that the write alters compiler-private memory (and note again, being private is a property of the underlying address, not of the fact that it's accessed via an out-of-bounds array index), and this is UB, so the program breaks. But the compiler didn't need to make _assumptions_ around UB.
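A sketch of that private-vs-public distinction (opaque is a hypothetical function whose body the compiler can't see):

void opaque(int *p);  // defined in another translation unit

int example(void) {
    int i = 0;   // private: no pointer to i has escaped, so the compiler
                 // may assume later reads of i still see 0 without rereading
    opaque(&i);  // &i escapes: the memory is now public, so from here on
                 // the compiler must reload i before using it
    return i;
}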
As you note, it doesn't remove the more core UB behaviors in LLVM, in particular LLVM's reliance on pointer provenance.
So I think this option very roughly approximates a kind of "no provenance, but with address non-determinism" model, which still permits optimizations like SROA on non-escaping objects.
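For instance, a rough sketch of the kind of code that stays optimizable under such a model (the shape of the reasoning, not LLVM's actual pass):

struct Pair { int a; int b; };

int sum(void) {
    struct Pair p = {1, 2};  // p's address never escapes...
    return p.a + p.b;        // ...so SROA can split it into two scalars and
                             // fold to 3: no stack slot, no fixed address
}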
Also, hi, didn't know you commented on this site.
Another interesting thing is that there is clearly synergy between different UB. For the LTO results, disabling each individual UB seems to be either neutral or an improvement, but if you disable all of them at once, then you get a significant regression.
jonstewart•13h ago
_THANK YOU._
ryao•13h ago
These are almost never used by software.
mgaunard•13h ago
jeffbee•12h ago
com2kid•12h ago
The issue with early PGO implementations was getting a really good profile, as you had to have automation capable of fully exercising code paths that you knew would be hot in actual usage, and you needed good instrumentation to know what code paths those were!
The same problem exists nowadays, but programs are instrumented to hell and back to collect usage data.
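The basic instrumented-profile loop looks like this (a sketch using Clang's instrumentation-based PGO; app.c and the workload are hypothetical):

clang -O2 -fprofile-instr-generate -o app app.c
./app                 # run a representative workload; writes default.profraw
llvm-profdata merge -output=app.profdata default.profraw
clang -O2 -fprofile-instr-use=app.profdata -o app app.c

The hard part is the second step: the profile is only as good as the workload you can automate.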
jeffbee•12h ago
loeg•12h ago
jeffbee•12h ago
I think there is a valley in terms of organization size where you have tons of engineers but not enough to accomplish peak optimization of C++ projects. These are the orgs that are spending millions to operate, for example, the VERY non-optimized packages of PostgreSQL from Ubuntu in AWS.
spookie•4h ago
Hell, their latest upgrade broke one of their flavours. Not to mention how fragile their installer is.
UncleMeat•11h ago
jeffbee•11h ago
UncleMeat•9h ago
As a user, building with thin-lto vs full-lto generally produces pretty similar performance in no small part because a huge amount of effort has gone into making the summaries as effective as possible for key performance needs.
As a compiler developer, especially when developing static analysis warnings rather than optimization passes, the number of cases where I've run into "this would be viable if we had full-lto" has been pretty high.
pcwalton•11h ago
mgaunard•10h ago
Not exactly a problem for LTO, since any reasonable build machine will have 128 GB of RAM.
astrange•7h ago
CPUs are so dynamic anyway that there often isn't a way to pass down the information you'd get from the profile. E.g., I don't think Intel actually recommends any way of hinting branch directions.
jeffbee•7h ago
saagarjha•6h ago
atq2119•6h ago
jeffbee•5h ago
"Branches that do not have a history in the BTB ... are predicted using a static prediction algorithm: Predict forward conditional branches to be NOT taken. Predict backward conditional branches to be taken."
It then goes on to recommend exactly what every optimizing compiler and post-link optimizers like BOLT do:
"Arrange code to be consistent with the static branch prediction algorithm: make the fall-through code following a conditional branch be the likely target for a branch with a forward target, and make the fall-through code following a conditional branch be the unlikely target for a branch with a backward target."
This is why a reduction in taken forward branches is one of the key statistics that BOLT reports.
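That layout rule is also why source-level likelihood hints matter: they let the compiler decide which side falls through. A small sketch using the GCC/Clang builtin (process and handle_error are hypothetical):

void process(int x);
void handle_error(int x);

void step(int x) {
    if (__builtin_expect(x < 0, 0)) {
        handle_error(x);  // cold: becomes a forward, normally-not-taken branch
    } else {
        process(x);       // hot: laid out as the fall-through path
    }
}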
tialaramex•12h ago
UB is a runtime phenomenon: it happens, or it doesn't, and we may be able to ensure the case where it happens doesn't occur with ordinary human controls.
But IFNDR is a property of the compiled program: if you have IFNDR (by some estimates that's most C++ programs), your program has no defined behaviour and never did, so there is no possible countermeasure. Too bad, game over.
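The canonical instance is an ODR violation; a sketch across two hypothetical translation units:

// a.cpp
inline int answer() { return 42; }

// b.cpp -- same inline function, different definition: an ODR violation.
// No diagnostic is required, and the whole program has no defined behaviour.
inline int answer() { return 43; }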
jandrewrogers•12h ago
yxhuvud•12h ago
jorvi•11h ago
Compilation happens once and then runs on hundreds of thousands up to billions of devices. Respect your users.
astrange•7h ago
I would recommend only doing things that fit within the 'build > test > fix' loop.
Sharlin•11h ago
pcwalton•11h ago
KerrAvon•10h ago
scott_s•3h ago
steveklabnik•11h ago
alpaca128•10h ago
steveklabnik•9h ago
vlovich123•10h ago
steveklabnik•9h ago
> false: Performs “thin local LTO” which performs “thin” LTO on the local crate only across its codegen units.
I think this is kind of confusing but whatever. I should have been more clear.
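For anyone following along, a sketch of how those settings look in Cargo.toml (values as described in the Cargo docs quoted above):

[profile.release]
# lto = false    # the default: "thin local LTO" across the local crate's codegen units
# lto = "thin"   # ThinLTO across the full crate graph
lto = true       # full "fat" LTO across the full crate graph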
LegionMammal978•4h ago
Rusky•11h ago
They are not removing UB around things like out-of-bounds or use-after-free, which would likely be more expensive.
jonstewart•7h ago
saagarjha•6h ago
atq2119•6h ago
Also, commenting on downvotes is generally frowned upon.