[0] tldr: "I think that there are so many variables that it is difficult to draw generalized conclusions."
also, if performance is critical to you, profile stuff and compare outputted assembly, more often than not you'll find that llvm just outputs the same thing in both cases
It's part of the ABI spec. It's true that C evolved in an ad hoc way and so the formal rigor got spread around to a bunch of different stakeholders. It's not true that C is a lawless wasteland where all behavior is subject to capricious and random whims, which is an attitude I see a lot in some communities.
People write low level software to deal with memory layout and alignment every day in C, have for forty years, and aren't stopping any time soon.
[1] https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf section 6.7.3.2, paragraph 17.
See "6.7.3.2 Structure and union specifiers", paragraph 16 & 17:
> Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.
> Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared.
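To make the two quoted paragraphs concrete, here's a minimal sketch in Rust, which can opt into the platform's C layout with `#[repr(C)]`. The `Header` struct and `field_offsets` helper are made up for illustration. It only checks what the quoted text promises (members in increasing address order, each suitably aligned); the exact offsets and padding are implementation-defined, so it prints them rather than assuming them.

```rust
use std::mem::{align_of, size_of};

// Hypothetical struct for illustration. `#[repr(C)]` asks Rust to follow the
// platform's C layout rules instead of its own unspecified field ordering.
#[repr(C)]
#[derive(Default)]
struct Header {
    tag: u8,
    length: u32,
    flags: u16,
}

/// Byte offsets of each field from the start of the struct, in declaration order.
fn field_offsets() -> (usize, usize, usize) {
    let h = Header::default();
    let base = &h as *const Header as usize;
    (
        &h.tag as *const u8 as usize - base,
        &h.length as *const u32 as usize - base,
        &h.flags as *const u16 as usize - base,
    )
}

fn main() {
    let (tag, length, flags) = field_offsets();
    // The concrete numbers depend on the target ABI.
    println!("offsets: tag={tag} length={length} flags={flags}");
    println!("size={} align={}", size_of::<Header>(), align_of::<Header>());
}
```

On a typical target there will be padding after `tag` so that `length` lands on its own alignment boundary, but that part is the ABI's decision, not the language standard's.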
... well, that's what I get for reading an article with a silly title.
This is because C does so little for you -- bounds checking must be done explicitly for instance, like you mention in the article, so C is "faster" unless you work around rust's bounds checking. It reminds me of some West Virginia residents I know who are very proud of how low their taxes are -- the roads are falling apart, but the taxes are very low! C is this way too.
C is pretty optimally fast in the trivial case, but once you add bounds checking and error handling and memory management its edge is much much smaller (for Rust and Zig and other lowish-level languages)
In C the caller typically isn't choosing. The author of some library or API decides this for you.
This turns out to be fairly significant in something like an embedded context where function pointers kill icache and rob cycles jumping through hoops. Say you want to bit bang a bus protocol using GPIO, in C with function pointers this adds maybe non trivial overhead and your abstraction is no longer (never was) free. Traits let the caller decide to monomorphize that code and get effectively register reads and writes inlined while still having an abstract interface to GPIO. This is excellent!
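A rough sketch of what that looks like, assuming a made-up `OutputPin` trait and a fake `RecordingPin` that just logs writes instead of touching a real register. The same bit-banging loop is written twice: once generic (monomorphized per pin type, so `set` is a direct call the optimizer can inline) and once through `dyn` (one copy of the code, every `set` goes through a vtable, much like a C function pointer).

```rust
// Hypothetical GPIO abstraction; trait and type names are invented for this sketch.
trait OutputPin {
    fn set(&mut self, high: bool);
}

// Fake pin that records every write, standing in for a memory-mapped register.
struct RecordingPin {
    levels: Vec<bool>,
}

impl OutputPin for RecordingPin {
    fn set(&mut self, high: bool) {
        self.levels.push(high);
    }
}

// Static dispatch: a separate copy is generated for each concrete `P`.
fn send_byte_static<P: OutputPin>(pin: &mut P, byte: u8) {
    for bit in (0..8).rev() {
        pin.set((byte >> bit) & 1 == 1); // most significant bit first
    }
}

// Dynamic dispatch: a single copy, each `set` is an indirect call.
fn send_byte_dyn(pin: &mut dyn OutputPin, byte: u8) {
    for bit in (0..8).rev() {
        pin.set((byte >> bit) & 1 == 1);
    }
}

fn main() {
    let mut a = RecordingPin { levels: Vec::new() };
    send_byte_static(&mut a, 0xA1);

    let mut b = RecordingPin { levels: Vec::new() };
    send_byte_dyn(&mut b, 0xA1);

    // Both paths produce the same bit sequence; only the call mechanism differs.
    assert_eq!(a.levels, b.levels);
    println!("{:?}", a.levels);
}
```

Whether the static version actually ends up as inlined register writes depends on the optimizer and build settings, so treat this as the shape of the idea and check the generated assembly if it matters.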
Tbf this applies to Rust too. If the author writes
fn foo(bar: Box<dyn BarTrait>)
they have forced the caller into dynamic dispatch. Had they written
fn foo(bar: impl BarTrait)
the choice would've remained open to the caller
and AFAIK it isn't possible to write that in C (though C++ does allow this kind of thing).
In the extreme, you surely wouldn't accept a 1 day or even 1 week build time, for example? A 1 week build seems possible and not just hypothetical, since a system could fuzz over candidate compilations, run load tests, do PGO, and deliver something better. But even if runtime performance was so important that you had such a system, it's obvious you wouldn't ever have developer cycles that take a week to compile.
Build time does matter even for release: if you have a critical bug in production and need to ship the fix, a 1 hour build time can still lose you a lot here. Release build time doesn't matter until it does.
Folks have worked tirelessly to improve the speed of the Rust compiler, and it's gotten significantly faster over time. However, there are also language-level reasons why it can take longer to compile than other languages, though the initial guess of "because of the safety checks" is not one of them, those are quite fast.
> How slow are we talking here?
It really depends on a large number of factors. I think saying "roughly like C++" isn't totally unfair, though again, it really depends.
Note that C++ also has almost as large a problem with compile times, with large build fanouts including on templates, and it's not always realistic for incremental builds to solve it either, especially the time burnt on linking. E.g. I believe Chromium development often uses a mode with dynamic linking (.dlls) instead of the fully static linking they release with, exactly to speed up incremental development. The "fast" case is C, not C++.
Someone down the line might be wondering why suddenly their Rust builds take 4x the time after merging something, and just maybe remembering this offhand comment will make them find the issue faster :)
Rust does make it a lot easier to use generics which is likely why using more traits appears to be the cause of longer build times. I think it's just more that the more traits you have, the more likely you are to stumble over some generic code which ultimately generates more code.
I'd usually rather have a nice language-level interface for customizing implementation, but ELF and Linux scripting is typically good enough. Binary patching is in a much easier to use place these days with good free tooling and plenty of (admittedly exploit-oriented) tutorials to extrapolate from as examples.
I'd say most people use this definition, with the caveat that there's no official "average programmer", and everyone has different standards.
If you do hand optimize your code, all bets are off. With both languages. But I think the notion that the Rust compiler has more context for optimizing than the C compiler is maybe not as controversial as the notion that language X is better/faster than language Y. Ultimately, producing fast/optimal code in C kind of is the whole point of C. And there aren't really any hacks you can do in C that you can't do in Rust, or vice versa. So, it would be hard to make the case that Rust is slower than C or the other way around.
However, there have been a few rewrites of popular unix tools in Rust that benchmark a bit faster than their C equivalents. Could those be optimized in C? Probably, but they just haven't been. But there is a case there for arguing that maybe Rust code is a bit easier to make fast than C code.
Where C application code often suffers, but by no means always, is the use of memory for data structures. A nice big chunk of static memory will make a function fast, but I’ve seen many C routines malloc memory, do a strcpy, compute a bit, and free it at the end, over and over, because there’s no convenient place to retain the state. There are no vectors, no hash maps, no crates.io and cargo to add a well-optimized data structure library.
It is for this reason I believe that Rust, and C++, have an advantage over C when it comes to writing fast code, because it’s much easier to drop in a good data structure. To a certain extent I think C++ has an advantage over Rust due to easier and better control over layout.
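As a small illustration of the malloc/strcpy/free pattern versus retained state, here's a sketch with two made-up functions. `shout_alloc` allocates a fresh string on every call; `shout_into` reuses a caller-owned buffer, so once the buffer has grown large enough, repeated calls don't need to hit the allocator.

```rust
// Allocates a fresh String on every call, roughly malloc + strcpy + free.
fn shout_alloc(input: &str) -> String {
    let mut s = String::from(input);
    s.make_ascii_uppercase();
    s
}

// Reuses a caller-owned buffer. `clear` drops the contents but keeps the
// capacity, so inputs that fit need no new allocation.
fn shout_into(input: &str, buf: &mut String) {
    buf.clear();
    buf.push_str(input);
    buf.make_ascii_uppercase();
}

fn main() {
    let words = ["alpha", "beta", "gamma"];

    // One allocation per word.
    for w in words {
        println!("{}", shout_alloc(w));
    }

    // One buffer for the whole loop.
    let mut buf = String::with_capacity(16);
    for w in words {
        shout_into(w, &mut buf);
        println!("{buf}");
    }
}
```

You can of course do the same in C by passing a buffer and its size around. The point is only that the standard `String`/`Vec` types make the retained-buffer version about as easy to write as the allocating one.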
1. What costs does the language actively inject into a program?
2. What optimizations does the language facilitate?
Most of the time, it's sufficient to just think about the first point. C and Rust are faster than Python and Javascript because the dynamic nature of the latter two requires implementations to inject runtime checks all over the place to enable that dynamism. Rust and C simply inject essentially zero active runtime checks, so membership in this club is easy to verify.
The second one is where we get bogged down, because drawing clean conclusions is complicated by the (possibly theoretical) existence of optimizing compilers that can leverage the optimizability inherent to the language, as well as the inherent fragility of such optimizations in practice. This is where we find ourselves saying things like "well Rust could have an advantage over C, since it frequently has more precise and comprehensive aliasing information to pass to the optimizer", though measuring this benefit is nontrivial and it's unclear how thoroughly LLVM is utilizing this information at present. At the same time, the enormous observed gulf between Rust in release mode (where it's as fast as C) and Rust in debug mode (where it's as slow as Ruby) shows how important this consideration is; Rust would not have achieved C speeds if it did not carefully pick abstractions that were amenable to optimization.
Speed is also not the only metric, Rust and C enable much better control over memory usage. In general, it is easier to write a memory-efficient program in Rust or C than it is in JS.
It's also interesting to think about this in terms of the "zero cost abstractions"/"zero overhead abstractions" idea, which Stroustrup wrote as "What you don't use, you don't pay for. What you do use, you couldn't hand code any better". The first sentence is about 1, and the second one is about what you're able to do with 2.
That is, most of the time, most of the users aren't thinking about how to squeeze the last tenth of a percent of speed out of it. They aren't thinking about speed at all. They're thinking about writing code that works at all, and that hopefully doesn't crash too often. How fast is the language for them? Does it nudge them toward faster code, or slower? Are the default, idiomatic ways of writing things the fast way, or the slow way?
That is a damn good reason to choose Rust over C++, even if the Rust implementation of the "same" thing should be a bit slower.
Rust does have some interesting features, which restrict what you are allowed to do and thus make some things impossible but in turn make other things easier. It is highly likely that those restrictions are part of what made this possible. Given infinite resources (which you never have) a C++ implementation could be faster because it has better shared data concepts - but those same shared data concepts make it extremely hard to reason about multi-threaded code and so humanly you might not be able to make it work.
In short, the previous two attempts were done by completely different groups of different people, a few years apart. Your direct question about if direct wisdom from these two attempts was shared, either between them, or used by Stylo, isn't specifically discussed though.
> a C++ implementation could be faster because it has better shared data concepts
What concepts are those?
I don't think a language should count as "fast" if it takes an expert or an inordinate amount of time to get good performance, because most code won't have that.
So on those grounds I would say Rust probably is faster than C, because it makes it much much easier to use multithreading and more optimised libraries. For example a lot of C code uses linked lists because they're easy to write in C, even when a vector would be faster and more appropriate. Multithreading can just be a one line change in Rust.
Let's say they only need 2 hours to get the <X> to work, and can use the remaining 6 hours for optimizing. Can 6 hours of optimizing a Python program make it faster than the assembly program?
The answer isn't obvious, and certainly depends on the specific <X>. I can imagine various <X> where even unlimited time spent optimizing Python code won't produce faster results than the assembly code, unless you drop into C/C++/Zig/Rust/D and write a native Python extension (and of course, at that point, you're not comparing against Python but that native language).
However, in the spirit of the question: someone mentioned the stricter aliasing rules, that one does come to mind on Rust's side over C/C++. On the other hand, signed integer overflow being UB would count for C/C++ (in general: all the UB in C/C++ not present in Rust is there for performance reasons).
Another thing I thought of in Rust's and C++'s favor is generics. For instance, in C, qsort() takes a function pointer for the comparison function; in Rust and C++, the standard library sorting functions are templated on the comparison function. This means it's much easier for the compiler to specialize the sorting function, inline the comparisons and optimize around it. I don't know if C compilers specialize qsort() based on comparison function this way. They might, but it's certainly a lot more to ask of the compiler, and I would argue there are probably many cases like this where C++ and Rust can outperform C because of their much more powerful facilities for specialization.
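Here's a small sketch of the difference, using invented wrapper names. `sort_with_fn_ptr` mimics the qsort() shape by taking a plain function pointer, so the sort only sees an address to call. `sort_generic` is generic over the comparator, so each closure gets its own specialized copy of the sort in which the comparison can be inlined.

```rust
use std::cmp::Ordering;

// qsort-style: the comparator is an opaque function pointer.
fn sort_with_fn_ptr(v: &mut [i32], cmp: fn(&i32, &i32) -> Ordering) {
    v.sort_by(cmp);
}

// Generic: monomorphized separately for every distinct comparator type.
fn sort_generic<F>(v: &mut [i32], cmp: F)
where
    F: FnMut(&i32, &i32) -> Ordering,
{
    v.sort_by(cmp);
}

// A named comparator that can be passed as a function pointer.
fn descending(a: &i32, b: &i32) -> Ordering {
    b.cmp(a)
}

fn main() {
    let mut a = vec![3, 1, 4, 1, 5, 9, 2, 6];
    let mut b = a.clone();

    sort_with_fn_ptr(&mut a, descending);
    sort_generic(&mut b, |x, y| y.cmp(x));

    // Same result either way; the difference is what the optimizer gets to see.
    assert_eq!(a, b);
    println!("{a:?}");
}
```

An optimizer may still see through the function pointer when everything is visible in one place, so this shows the mechanism rather than a guaranteed speedup.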
Then, I raise you to Zig which has unsigned integer overflow being UB.
Anyway that's a long way of saying that you're right, integer overflow is illegal behavior, I just think it's interesting.
https://doc.rust-lang.org/std/intrinsics/fn.unchecked_add.ht...
C and C++ don't actually have an advantage here because this is only limited to signed integers unless you use compiler-specific intrinsics. Rust's standard library allows you to make overflow on any specific arithmetic operation UB on both signed and unsigned integers.
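For reference, a sketch of what picking the overflow behavior per operation looks like on an unsigned type. The wrapper function names are made up. The `unchecked_add` call assumes a toolchain recent enough to have it as a stable inherent method on integers (1.79 or later, as far as I know); on older toolchains only the intrinsic linked above exists.

```rust
// Overflow is reported as None.
fn add_checked(a: u32, b: u32) -> Option<u32> {
    a.checked_add(b)
}

// Overflow wraps around modulo 2^32.
fn add_wrapping(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

/// Overflow is undefined behavior, even though the type is unsigned.
///
/// # Safety
/// The caller must guarantee that `a + b` does not overflow `u32`.
unsafe fn add_unchecked(a: u32, b: u32) -> u32 {
    // SAFETY: the no-overflow requirement is forwarded to our caller.
    unsafe { a.unchecked_add(b) }
}

fn main() {
    assert_eq!(add_checked(u32::MAX, 1), None);
    assert_eq!(add_wrapping(u32::MAX, 1), 0);

    // SAFETY: 2 + 3 clearly fits in a u32.
    let five = unsafe { add_unchecked(2, 3) };
    assert_eq!(five, 5);
    println!("ok");
}
```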
"Culturally", C/C++ has opted for "unsafe-but-high-perf" everywhere, and Rust has "safe-but-slightly-lower-perf" everywhere, and you have to go out of your way to do it differently. Similarly with Zig and memory allocators: sure, you can do "dynamically dispatched stateful allocators that you pass to every function that allocates" in C, but do you? No, you probably don't, you probably just use malloc().
On the other hand: the author's point that the "culture of safety" and the borrow checker in Rust frees your hand to try some things in Rust which you might not in C/C++, and that leads to higher perf. I think that's very true in many cases.
Again, the answer is more or less "basically no, all these languages are as fast as each other", but the interesting nuance is in what is natural to do as an experienced programmer in them.
Now: the languages may expose patterns that a compiler can make use of to improve optimizations. That IS interesting, but it is not a question of speed. It is a question of expressability.
I'm very happy to see the nuanced take in this article, slowly deconstructing the implicit assumptions proposed by the person asking this question, to arrive at the same conclusion that I long have. I hope this post reaches the right people.
A particular language doesn't have a "speed", a particular implementation may have, and the language may have properties that make it difficult to make a fast implementation (of those specific properties/features) given the constraints of our current computer architectures. Even then, there's usually too many variables to make a generalized statement, and the question often presumes that performance is measured as total cpu time.
The big one is multi-threading. In Rust, whether you use threads or not, all globals must be thread-safe, and the borrow checker requires memory access to be shared XOR mutable. When writing single-threaded code takes 90% of the effort of writing multi-threaded code, Rust programmers may as well sprinkle threads all over the place, regardless of whether that's a 16x improvement or a 1.5x improvement. In C, the cost/benefit analysis is different. Even just spawning a thread is going to make somebody complain that they can't build the code on their platform due to C11/pthread/openmp. The risk of having to debug heisenbugs means that code typically won't be made multi-threaded unless really necessary, and even then preferably kept to simple cases or very coarse-grained splits.
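To give a feel for how small the step is, here's a std-only sketch with invented function names: a sequential sum and a parallel one using scoped threads. The worker threads borrow slices of `data` directly, and the borrow checker is what verifies that nothing mutates the data while they run.

```rust
use std::thread;

fn sum_sequential(data: &[u64]) -> u64 {
    data.iter().sum()
}

// Splits `data` into roughly `workers` chunks and sums each on its own thread.
fn sum_parallel(data: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Round up so that every element lands in some chunk; never zero.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        // Spawn all workers first; each only reads its own shared slice.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();

        // Then collect the partial sums.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .sum::<u64>()
    })
}

fn main() {
    let data: Vec<u64> = (1..=1000).collect();
    assert_eq!(sum_sequential(&data), sum_parallel(&data, 4));
    println!("{}", sum_parallel(&data, 4));
}
```

For a toy sum like this, the threads will likely cost more than they save. It's the "may as well try it" ergonomics that matter, and whether it's a 16x or a 1.5x win is something you'd have to measure.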
I agree that it has no meaning. Speed(language) is undefined, therefore there is no faster language.
I get this often because python is referred to as a slow language, but since a python programmer can write more features than a C programmer in the same time, at least in my space, it causes faster programs in python, because some of those features are optimizations.
Now speed(program(language,programmer)) is defined, and you could do an experiment by having programmers of different languages write the same program and compare its execution times.
If you don't know at compile time then you should probably just have the check anyway (in the C code also).
gignico•1h ago
bluGill•1h ago
pornel•51m ago
steveklabnik•50m ago
The first is, we do have some amount of empirical evidence here: Rust had to turn its aliasing optimizations on and off again a few times due to bugs in LLVM. A comment from 2021: https://github.com/rust-lang/rust/issues/54878#issuecomment-...
> When noalias annotations were first disabled in 2015 it resulted in between 0-5% increased runtime in various benchmarks.
This leaves us with a few relevant questions:
Were those benchmarks representative of real world code? (They're not linked, so we cannot know. The author is reliable, as far as I'm concerned, but we have no way to verify this off-hand comment directly, I link to it specifically because I'd take the author at their word. They do not make any claim about this, specifically.)
Those benchmarks are for Rust code with optimizations turned off and back on again, not Rust code vs C code. Does that make this a good benchmark of the question, or a bad one?
These were llvm's 'noalias' markers, which were written for `restrict` in C. Do those semantics actually take full advantage of Rust's aliasing model, or not? Could a compiler which implements these optimizations in a different way do better? (I'm actually not fully sure of the latest here, and I suspect some corners would be relying on the stacked borrows vs tree borrows stuff being finalized)
Karliss•38m ago
There are 2 main differences between the versions with and without strict aliasing. Without strict aliasing the compiler can't assume that the result accumulator doesn't change during the loop, so it has to repeatedly read/write it each iteration. With strict aliasing it can just read it into a register, do the looping, and write the result back once at the end. The second effect is that with strict aliasing enabled the compiler can vectorize the loop, processing 4 floats at the same time; most likely the same uncertainty about the counter prevents vectorization without strict aliasing.
If you want a slightly simpler example you can disable vectorization by adding '-fno-tree-vectorize'. With it disabled there is still a difference in the handling of the counter.
Using restrict pointers and multiple same type input arrays it would probably be possible to make something closer to real world example.
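For comparison, a rough Rust analogue of the accumulator case (my own sketch, not the code from the example above). In Rust the guarantee comes from the reference types rather than from type-based strict aliasing or `restrict`: `acc` is an exclusive borrow and `data` a shared one, so the two can't overlap, even though both involve `f32`.

```rust
// `acc` is `&mut` (exclusive) and `data` is `&` (shared), so the compiler is
// allowed to assume that writes through `acc` never change `data`.
fn accumulate(acc: &mut f32, data: &[f32]) {
    for x in data {
        *acc += *x;
    }
}

fn main() {
    let mut total = 0.0_f32;
    accumulate(&mut total, &[1.0, 2.0, 3.5]);
    println!("{total}");

    // An overlapping call is rejected at compile time, e.g.:
    //     let mut v = vec![1.0_f32, 2.0];
    //     accumulate(&mut v[0], &v); // error: cannot borrow `v` as immutable
}
```

Whether the optimizer actually keeps `*acc` in a register for the whole loop is up to LLVM and the optimization level, so compare the assembly output before drawing conclusions.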
steveklabnik•34m ago
Also note that C++ does not have restrict, formally speaking, though it is a common compiler extension. It's a C feature only!