Sometimes it's because C is the lingua franca of low-level programming.
Some noalias optimizations Rust relied on didn't work in LLVM, because nobody had bothered to use them from C before.
This goes even further, down to the hardware. SIMD search over a C-like null-terminated string is faster than over a saner (pointer + len) string view. So it's faster to append a null to the end of the string view and run the null-terminated SIMD search than to invoke the SIMD search on the string slice directly.
C standards with the 'restrict' keyword to allow aliasing optimisations have existed for longer than Rust. LLVM just never bothered, despite the early claims that the intermediate language and the "more modern" compiler architecture would enable more optimisation potential. LLVM is still slower than GCC, even in plain C.
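To make concrete what restrict is supposed to buy, here is a minimal sketch (my own example, not from any benchmark): the qualifier promises the compiler that the two pointers never overlap, so it can keep loads in registers and vectorise without alias checks.

    #include <stddef.h>

    /* Sketch: with restrict, the compiler may assume dst and src never
     * overlap, so it can hold src values in registers and vectorise the
     * loop without runtime alias checks. */
    void scale(float *restrict dst, const float *restrict src, float k, size_t n) {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] * k;
    }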
What is the problem with quickly null-terminating a pointer+len string and then using the quick null-terminated SIMD search? It should only be one move slower than the native C. "Space after that string" shouldn't be a problem, since in higher-level languages the allocator can make enough room, it is higher-level after all (and you need null termination for syscalls anyway).
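Something like this, roughly (a sketch, names made up; it assumes the allocation really does leave a writable byte right after the view):

    #include <stddef.h>
    #include <string.h>

    /* Temporarily null-terminate a (pointer, length) view, run the C-style
     * scan, then restore the byte. Assumes buf[len] is writable (allocator
     * slack) and c != 0. */
    char *find_byte(char *buf, size_t len, char c) {
        char saved = buf[len];      /* the extra move (plus the restore) */
        buf[len] = '\0';
        char *hit = strchr(buf, c);
        buf[len] = saved;
        return hit;                 /* NULL if c is not in the view */
    }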
It is always the same story. Compiler and programming language people make claims about future optimisations but never ever follow through. It's always just theory, never implementation.
They did bother but no one seemed to be using it, so a bug snuck in. It took Rust exercising that corner of LLVM to find the bug.
> LLVM is still slower than GCC, even in plain C.
Citation needed. Last time I checked LLVM beat GCC on -O3. Unless you mean compilation performance.
> What is the problem with quickly null-terminating a pointer+len string and then using the quick null-terminated SIMD search?
Why should two nearly identical operations have such wildly different performance? And why isn't the safer/saner interface more performant?
My point is that if compilers and hardware are optimizing for C, it's no surprise that no one can approach its speed.
It's like asking why when everyone is optimizing for Chrome no other web browser can approach its speed.
> Citation needed. Last time I checked LLVM beat GCC on -O3.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
https://www.phoronix.com/review/gcc-clang-eoy2023/8
There are tons more; just ask your favourite internet search engine.
> Why should two nearly identical operations have such wildly different performance? And why isn't the safer/saner interface more performant?
Because in hardware, checking for zero in a register is very cheap: a big NOR over all the bits of a sub-register (lane), then a big OR over those per-lane flags for the whole SIMD register. Very short path, very few transistors.
Checking a length is expensive: you increment a length register through an addition, including carry propagation, which is a long path. Then you compare it with another register to find out whether you are at the last iteration. The compare is usually also expensive, since internally it is a subtract with carry, even though you could get away with xor + zero flag. But you can't get around the length-register addition.
There is a possible optimisation, because you have to increment the address you fetch from in any case. But in most modern CPUs that happens in special pointer registers internally; if you do additional arithmetic on those, it gets expensive again. x86 actually has more complex addressing modes that would handle this, but compilers never really used them, so they were dropped for SIMD.
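Put side by side as scalar code (my own sketch; the SIMD variants just widen the loads), the two loop shapes look like this:

    #include <stddef.h>

    /* Terminates on the zero byte: the exit test is a cheap zero check. */
    const char *scan_nul(const char *p, char c) {
        while (*p != '\0' && *p != c)
            p++;
        return *p == c ? p : NULL;
    }

    /* Terminates on the length: every iteration maintains a counter and
     * compares it against len. */
    const char *scan_len(const char *p, size_t len, char c) {
        for (size_t i = 0; i < len; i++)
            if (p[i] == c)
                return p + i;
        return NULL;
    }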
Then Rust made it trivial to use correctly, and found that what LLVM did have was quite buggy, because it hadn't been exercised. These were bugs that could in theory have been found in C/C++, but in practice never would have been. And so for quite some years Rust would toggle the noalias optimisations on and off, depending on whether there were any known bad-codegen issues at the time. I think the last toggle was a few years ago by now; this stuff is finally, actually stable and appreciated.
My recollection is that the figures for the Rust compiler are in the vicinity of a 5% improvement from emitting noalias in the LLVM IR.
And it’s likely that quite a bit more is possible.
And here we are, 20 years later.
Profile-guided optimization especially was hailed as the next big thing: only JIT-ed languages were going to be fast, because after some warmup time they would adapt to your data and hardware. Java is still dog-slow...
And in particular that garbage collection can be faster than malloc/free.
This is technically true in the case where you have a naive C program. There are tons of C programs out there, and some are slow. (But there are more slow Java programs out there.)
But it's 100% clear now that GC has a significant cost, and while you can optimize a bad C program, it can be very hard in many cases to optimize the cost of GC away.
C offers more freedom and thus more footguns, but the freedom also means you don't really have a performance cap.
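As a rough illustration (my own sketch, not a claim about any particular program): the "naive C program" pattern that a GC's bump allocator can beat is one malloc per object, and the usual C answer is to batch the allocation so there is almost nothing left to beat.

    #include <stdlib.h>

    typedef struct node { int value; struct node *next; } node;

    /* Naive: one malloc per element, one free per element later. */
    node *push_naive(node *head, int v) {
        node *n = malloc(sizeof *n);
        if (!n) return head;
        n->value = v;
        n->next = head;
        return n;
    }

    /* Optimized: carve nodes out of one preallocated block and free the
     * whole block at once. Caller guarantees the block has capacity. */
    node *push_arena(node *block, size_t *used, node *head, int v) {
        node *n = &block[(*used)++];
        n->value = v;
        n->next = head;
        return n;
    }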
And I prefer GC -- most programs I write will have GC; I just recognize the performance cap, and design accordingly. (In fact I wrote a GC for a subset of C++ for https://oils.pub -- I actually think this is a language with nicer performance properties than Java, mainly due to being AOT)
To this day, Java applications are the slowest and most memory-hungry long-running server applications by far. Only some scripting languages are worse, and only in performance, almost never in memory.
On the other hand, the results of this true fact are already seen in practice in real systems. Rust's iterator adapters, for example, allow high-level "functional style" with closures to be compiled to code that is exactly the same as hand-rolled C loops without unamortized bounds checks - despite guaranteeing the absence of out-of-bounds accesses even in the face of programmer error - because the constraints of these interfaces and the design of the language enable tons of optimizations by LLVM. This has been true since Rust 1.0 more than a decade ago; it's not a promise of the future.
https://www.w3.org/DesignIssues/Principles.html#PLP
By restricting the power of a language, you enable it to be used in more ways than just “execute it and see what the result is”.
That it's supposedly easier might be true. But that usually doesn't lead to anyone actually doing it.
It is the most easily optimized language I know. If you write a working program in it, you know you're doing it in the most optimal way for that language already.
But easier to optimize doesn't necessarily mean that they can be optimized to be more performant than less constrained languages.