Is Rust faster than C?

https://steveklabnik.com/writing/is-rust-faster-than-c/

317•vincentchau•1mo ago

Comments

gignico•1mo ago

The article does not mention the possible additional optimisation opportunities that arise in Rust code due to stricter aliasing rules of references. But I don’t have an example in mind. Does anyone know of an example of it happening in real code?

bluGill•1mo ago

Many C programs are valid C++ and are faster when compiled with a C++ compiler because of those stricter aliasing and type rules. Like you, though, I have no examples.

stickynotememo•1mo ago

That seems very odd - if it's possible to make those optimisations without any additional type data then why wouldn't GCC do that anyway? The benefit of stricter type rules is that more information is available to the compiler. Using a different compiler doesn't inherently increase the amount of type information.

tcfhgj•1mo ago

theoretically the C++ compiler needs to consider things like exceptions which don't exist in C, so I'd even tend to the opposite

aw1621107•1mo ago

I believe the claim is more precisely stated as "Many C programs are valid C++ and are faster when compiled as C++" - i.e., even though the text of the program didn't change, the rules for interpreting that text changed, and it's that difference in interpretation that permits better optimizations.

teo_zero•1mo ago

Interesting concept! Any examples?

pornel•1mo ago

When the optimizer knows writes can't change the reads, it can reorder and coalesce them. The main benefit of that is enabling autovectorization in more cases. Otherwise it saves a few loads here and there.
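A minimal sketch of the kind of code this applies to, written for this thread rather than taken from a real project. The names `add_twice` and `add_twice_raw` are made up for illustration. With references, `acc` and `x` cannot overlap, so the compiler is allowed to load `*x` once; with raw pointers they may alias, so it has to re-read. Whether a given rustc/LLVM version actually emits fewer loads is something to check in the generated assembly, not something this sketch proves.

```rust
/// `acc` is an exclusive reference and `x` is a shared one, so they cannot
/// refer to the same memory. The optimizer is therefore allowed to load
/// `*x` once and reuse it for both additions.
pub fn add_twice(acc: &mut i32, x: &i32) {
    *acc = acc.wrapping_add(*x);
    *acc = acc.wrapping_add(*x);
}

/// With raw pointers there is no such guarantee: `acc` and `x` may point
/// to the same `i32`, so `*x` has to be read again after the first write.
///
/// # Safety
/// Both pointers must be valid, aligned, and point to initialized `i32`s.
pub unsafe fn add_twice_raw(acc: *mut i32, x: *const i32) {
    unsafe {
        *acc = (*acc).wrapping_add(*x);
        *acc = (*acc).wrapping_add(*x);
    }
}
```

Calling `add_twice_raw(p, p)` on a value of 1 leaves 4 behind (1 + 1, then 2 + 2), whereas a "load `*x` once" version would produce 3. That difference is exactly why the optimizer cannot coalesce the reads when aliasing is possible.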

steveklabnik•1mo ago

In the spirit of the article... there's a few ways in which this could go :)

The first is, we do have some amount of empirical evidence here: Rust had to turn its aliasing optimizations on and off again a few times due to bugs in LLVM. A comment from 2021: https://github.com/rust-lang/rust/issues/54878#issuecomment-...

> When noalias annotations were first disabled in 2015 it resulted in between 0-5% increased runtime in various benchmarks.

This leaves us with a few relevant questions:

Were those benchmarks representative of real world code? (They're not linked, so we cannot know. The author is reliable, as far as I'm concerned, but we have no way to verify this off-hand comment directly; I link to it specifically because I'd take the author at their word. They do not make any claim about this, specifically.)

Those benchmarks are for Rust code with optimizations turned off and back on again, not Rust code vs C code. Does that make this a good benchmark of the question, or a bad one?

These were llvm's 'noalias' markers, which were written for `restrict` in C. Do those semantics actually take full advantage of Rust's aliasing model, or not? Could a compiler which implements these optimizations in a different way do better? (I'm actually not fully sure of the latest here, and I suspect some corners would be relying on the stacked borrows vs tree borrows stuff being finalized)

Measter•1mo ago

Another issue we have to consider here for the measurements taken then is that it was miscompiling, which, to me, calls into question how much we can trust that performance change.

Additionally, it was 10 years ago and LLVM has changed. It could be that LLVM does better now, or it could do worse. I would actually be interested in seeing some benchmarks with modern rustc.

Karliss•1mo ago

Not exactly real world, but real code example demonstrating strict aliasing rule in action for C++. https://godbolt.org/z/WvMb34Kea Rust should have even more opportunities of this due to restrictions it has for writable references.

There are 2 main differences between the versions with and without strict aliasing. Without strict aliasing the compiler can't assume that the result accumulator doesn't change during the loop, so it has to read and write it on every iteration. With strict aliasing it can read it into a register once, run the loop, and write the result back once at the end. The second effect is that with strict aliasing enabled the compiler can vectorize the loop, processing 4 floats at a time; most likely the same uncertainty about the counter prevents vectorization without strict aliasing.

If you want a slightly simpler example you can disable vectorization by adding '-fno-tree-vectorize'. With it disabled there is still a difference in the handling of the counter.

Using restrict pointers and multiple same type input arrays it would probably be possible to make something closer to real world example.
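For comparison, a rough Rust analogue of the accumulator loop described above. This is not the code behind the godbolt link, and `accumulate` is a hypothetical name. Here the guarantee comes from `&mut` exclusivity, not from type-based strict aliasing, so it holds even though the accumulator and the elements have the same type.

```rust
/// Adds every element of `data` into `*total`.
///
/// Because `total` is `&mut`, no element of `data` can overlap it, so the
/// compiler is free to keep the running sum in a register and store it
/// back once after the loop. `wrapping_add` avoids overflow panics in
/// debug builds so the loop body stays a plain add.
pub fn accumulate(total: &mut u32, data: &[u32]) {
    for &x in data {
        *total = total.wrapping_add(x);
    }
}
```

Whether the loop is also vectorized depends on the optimization level and target; that part is worth confirming on a compiler explorer before relying on it.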

steveklabnik•1mo ago

Note that Rust does not do strict aliasing, its model is different.

Also note that C++ does not have restrict, formally speaking, though it is a common compiler extension. It's a C feature only!

Tuna-Fish•1mo ago

I believe this advantage is currently mostly theoretical, as the code ultimately gets compiled with LLVM which does not fully utilize all the additional optimization opportunities.

adgjlsfhk1•1mo ago

LLVM doesn't fully utilize all the power, but it does use an increasing amount every year. Flang and Rust have both given LLVM plenty of example code and a fair number of contributors who want to make LLVM work better for them.

senko•1mo ago

Interesting post, but read it for the journey, not the destination[0].

[0] tldr: "I think that there are so many variables that it is difficult to draw generalized conclusions."

taminka•1mo ago

struct field alignment/padding isn't part of the C spec iirc (at least not in the way mentioned in the article), but it's almost always done that way, which is important for having a stable abi

also, if performance is critical to you, profile stuff and compare outputted assembly, more often than not you'll find that llvm just outputs the same thing in both cases

ajross•1mo ago

> struct field alignment/padding isn't part of the C spec iirc

It's part of the ABI spec. It's true that C evolved in an ad hoc way and so the formal rigor got spread around to a bunch of different stakeholders. It's not true that C is a lawless wasteland where all behavior is subject to capricious and random whims, which is an attitude I see a lot in some communities.

People write low level software to deal with memory layout and alignment every day in C, have for forty years, and aren't stopping any time soon.

cyco130•1mo ago

It is indeed part of the standard. It says "Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared"[1] which doesn't allow implementations to reorder fields, at least according to my understanding.

[1] https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf section 6.7.3.2, paragraph 17.

taminka•1mo ago

i was talking abt padding/alignment, not ordering, that's indeed not allowed you're right

steveklabnik•1mo ago

Here's the draft of C23: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf

See "6.7.3.2 Structure and union specifiers", paragraph 16 & 17:

> Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.

> Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared.

taminka•1mo ago

so they're ordered, which i didn't dispute, but alignment is implementation defined, so it could be aligned to the biggest field (like in the article), or packed in whatever (sequential) order the particular platform demands, which was my initial point

steveklabnik•1mo ago

Ah, sorry, you're right I forgot about alignment. Yes, alignment is implementation defined, paragraph 16:

> Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.

But, I still don't think that what you've said is true. This is because alignment isn't decided per-object, but per type. That bit is covered more fully in 6.2.8 Alignment of objects.

You also have to be able to take a pointer to a (non-bitfield) member, and those pointers must be aligned. This is also why __attribute__((packed)) and such are non-standard extensions.

Then again: I have not passed the C specification lawyer bar, so it is possible that I am wrong here. I'm just an armchair lawyer. :)

(but for padding, yes, that's correct.)
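To make the ordering/alignment/padding distinction concrete from the Rust side, here is a small sketch. The struct and function names are invented for illustration. `#[repr(C)]` requests the C-style layout discussed above (declaration order, padding for alignment), while the default Rust representation leaves the compiler free to reorder fields. Exact sizes are platform-dependent, so the sketch only reports them.

```rust
use std::mem::{align_of, size_of};

/// `repr(C)`: fields stay in declaration order, with padding inserted so
/// that each field meets the alignment of its type.
#[repr(C)]
#[derive(Default)]
pub struct CLayout {
    a: u8,
    b: u32,
    c: u8,
}

/// Same fields with the default Rust representation, where the compiler
/// is allowed to reorder them (for example to reduce padding).
#[derive(Default)]
pub struct RustLayout {
    a: u8,
    b: u32,
    c: u8,
}

/// Byte offsets of `a`, `b` and `c` inside a `CLayout` value.
pub fn c_offsets() -> (usize, usize, usize) {
    let v = CLayout::default();
    let base = &v as *const CLayout as usize;
    (
        &v.a as *const u8 as usize - base,
        &v.b as *const u32 as usize - base,
        &v.c as *const u8 as usize - base,
    )
}

/// True if `b` sits at an offset that is a multiple of `u32`'s alignment.
pub fn b_is_aligned() -> bool {
    c_offsets().1 % align_of::<u32>() == 0
}

/// Sizes of the two structs as `(repr_c, repr_rust)`.
pub fn sizes() -> (usize, usize) {
    (size_of::<CLayout>(), size_of::<RustLayout>())
}
```

On common 64-bit targets the `repr(C)` version typically ends up larger than the default one because of the padding around `b`, but that is an observation about current compilers, not a language guarantee.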

einpoklum•1mo ago

tl;dr: Rust officially allows you to write inline assembly so it's fast, but in C it's not officially specified as part of the language. Plus more points which do not actually indicate Rust is faster than C.

... well, that's what I get for reading an article with a silly title.

steveklabnik•1mo ago

That’s not how I would summarize what I wrote, for what it’s worth. My summary would be “the question is malformed, you need to first state what the boundaries are for comparison before you can make any conclusions.” I think this is an interesting thing to discuss because many people assume that the answer to “is x faster than C?” to be “no” for all values of X.

bigfishrunning•1mo ago

> many people assume that the answer to “is x faster than C?” to be “no” for all values of X.

This is because C does so little for you -- bounds checking must be done explicitly for instance, like you mention in the article, so C is "faster" unless you work around rust's bounds checking. It reminds me of some West Virginia residents I know who are very proud of how low their taxes are -- the roads are falling apart, but the taxes are very low! C is this way too.

C is pretty optimally fast in the trivial case, but once you add bounds checking and error handling and memory management its edge is much much smaller (for Rust and Zig and other lowish-level languages)
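A sketch of what "working around" the bounds checks can look like in practice. The function names are hypothetical. The first version checks every index (unless the optimizer proves the check redundant), the second pays for one check when slicing, and the third opts out explicitly, which is roughly what plain C does by default.

```rust
/// Indexing: each `v[i]` is checked against `v.len()` and panics when out
/// of range, unless the optimizer can prove the check is redundant.
pub fn sum_first_n_indexed(v: &[u64], n: usize) -> u64 {
    let mut total = 0u64;
    for i in 0..n {
        total = total.wrapping_add(v[i]);
    }
    total
}

/// Slicing once up front: a single range check, then iteration with no
/// per-element check.
pub fn sum_first_n_sliced(v: &[u64], n: usize) -> u64 {
    v[..n].iter().fold(0u64, |acc, &x| acc.wrapping_add(x))
}

/// Explicit opt-out: no checks at all.
///
/// # Safety
/// The caller must guarantee `n <= v.len()`.
pub unsafe fn sum_first_n_unchecked(v: &[u64], n: usize) -> u64 {
    let mut total = 0u64;
    for i in 0..n {
        total = total.wrapping_add(unsafe { *v.get_unchecked(i) });
    }
    total
}
```

All three return the same result for valid input; which one is measurably faster is a question for a profiler, since the optimizer often removes the checks in the first version on its own.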

bluGill•1mo ago

In the real world the difference is rarely significant assuming great programmers implement great algorithms. However those two assumptions are rarely true.

sevensor•1mo ago

I read the post to see how you would answer, not because I was unclear about what the answer would be, because the only possible answer here is “sometimes.” I especially like the point that Rust can be faster because it enables you to write different things. As I never tire of getting downvoted for saying, I’ve improved the speed of a program by replacing C with Python, because nobody could figure out how to write the right thing in C. If even Python can do this, it must apply to just about every pair of languages.

anonnon•1mo ago

The article felt fairly dispassionate and even-handed to me, and I say this as someone who dislikes Klabnik very much and also dislikes the Rust community (especially its insidious, forced MIT rewrites of popular GPL software, with which they also break backwards compatibility). It is worth mentioning that there are certain things about Rust that conceivably could make it faster, e.g., const by default (theoretically facilitating certain optimizations), but in practice, thus far, do not.

avadodin•1mo ago

> especially its insidious, forced MIT rewrites of popular GPL software

Is this some sort of movement?

I was aware that some Rust software had been released under permissive licenses but I didn't know it was activism besides the obvious C-is-obsolete angle.

steveklabnik•1mo ago

It’s not deliberate activism. It’s two things:

Monomorphization makes the GPL weird.

Rust is dual licensed under Apache/MIT, and so most people choose the same as a default if they don’t feel strongly about licensing.

voidUpdate•1mo ago

Depends what you're doing with it... You can make any language you want slower than another language by using it badly

philipallstar•1mo ago

This is like saying that no car is faster than any other because it depends what gear you drive it in

voidUpdate•1mo ago

That's true as well. Language speed depends on how you use it

philipallstar•1mo ago

It's true but incomplete.

AlanLan•1mo ago

It’s a bit more fundamental than just "using it badly." The real tension lies in whether a language's safety invariants force a memory layout that is inherently at odds with the CPU cache hierarchy.

In low-latency systems, the true "tax" is often the loss of determinism. If I have to sacrifice a cache-friendly structure or introduce indirection just to satisfy a borrow checker's static analysis, the performance game is already lost, regardless of how "well" I use the language.

To give a concrete example: I previously built a high-frequency bridge for MT4 using a strict Modern C++ stack. I observed that after the initial warm-up, the working set actually settled from 13.6MB down to a stable 11.0MB and stayed there for a 7-day continuous stress test.

This 2.6MB drop was simply the OS reclaiming initialization overhead—a result of manual memory management (via custom pool allocators) preventing heap fragmentation from "pinning" that memory. You don't achieve that level of long-term residency stability by just "using a language well"; you get it by using a toolchain that allows you to treat the hardware as the ultimate source of truth.

bell-cot•1mo ago

True, but also a tautology.

Instead, I'd say that Rust & C are close enough, speed-wise, that (1) which one is faster will depend on small details of the particular use case, or (2) the speed difference will matter less than other language considerations.

bfrog•1mo ago

One example where Rust enables better and faster abstractions is traits. In C you can do this with some ugly methods like macros and such, but in Rust it’s not the implementer’s choice, it’s the caller’s choice whether to use dynamic dispatch (function pointer table in C) or static dispatch (direct function calls!)

In c the caller isn’t choosing typically. The author of some library or api decides this for you.

This turns out to be fairly significant in something like an embedded context where function pointers kill icache and rob cycles jumping through hoops. Say you want to bit bang a bus protocol using GPIO, in C with function pointers this adds maybe non trivial overhead and your abstraction is no longer (never was) free. Traits let the caller decide to monomorphize that code and get effectively register reads and writes inlined while still having an abstract interface to GPIO. This is excellent!
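A minimal sketch of the bit-banging case, assuming a made-up `OutputPin` trait and a `RecordingPin` test double (neither is from a real HAL crate). The `?Sized` bound is what leaves the choice with the caller: passing `&mut RecordingPin` gets a monomorphized copy with direct calls, passing `&mut dyn OutputPin` goes through a vtable.

```rust
/// Hypothetical GPIO output abstraction.
pub trait OutputPin {
    fn set(&mut self, high: bool);
}

/// Test double that records every level written to it.
#[derive(Default)]
pub struct RecordingPin {
    pub levels: Vec<bool>,
}

impl OutputPin for RecordingPin {
    fn set(&mut self, high: bool) {
        self.levels.push(high);
    }
}

/// Shifts `byte` out on `pin`, most significant bit first.
///
/// `P` may be a concrete pin type (static dispatch, calls can be inlined)
/// or `dyn OutputPin` (dynamic dispatch); the caller decides.
pub fn shift_out<P: OutputPin + ?Sized>(pin: &mut P, byte: u8) {
    for bit in (0..8).rev() {
        pin.set((byte >> bit) & 1 == 1);
    }
}
```

With a real memory-mapped pin type, the static version can in principle inline down to register writes, but how far the optimizer goes depends on the target and build settings.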

K0nserv•1mo ago

> In c the caller isn’t choosing typically. The author of some library or api decides this for you.

Tbf this applies to Rust too. If the author writes

   fn foo(bar: Box<dyn BarTrait>)

they have forced the caller into dynamic dispatch.

Had they written

   fn foo(bar: impl BarTrait)

the choice would've remained open to the caller

nicoburns•1mo ago

Right, but almost all APIs in Rust use something like

    fn foo(bar: impl BarTrait)

and AFAIK it isn't possible to write that in C (though C++ does allow this kind of thing).

bfrog•1mo ago

C++ you either use templates or classes and virtuals. In either case the caller doesn't get to decide.

Seattle3503•1mo ago

Interesting, there isn't some way to have a template that is polymorphic over virtuals?

Maxatar•1mo ago

In C++ you do it the other way around, have a single class that is polymorphic over templates. The name of this technique within C++ is type-erasure (that term means something else outside of C++).

Examples of type erasure in C++ are classes like std::function and std::any, and normally you need to implement the type erasure manually, but there are some libraries that can automate it to a degree, such as [1], but it's fairly clumsy.

[1] https://www.boost.org/doc/libs/latest/doc/html/boost_typeera...

bfrog•1mo ago

It's neat this is a thing I guess, but I agree it looks fairly clumsy compared to the Rust answer.

bsaul•1mo ago

how do apis typically manage to actually « use » the « bar » of your example, such as storing it somewhere, without enforcing some kind of constraints ?

steveklabnik•1mo ago

"BarTrait" is the constraint.

This is monomorphized for every type you pass in, in short.

Maxatar•1mo ago

If you need to store the value then you have no choice but to take in a dyn trait.

steveklabnik•1mo ago

Depending on exactly what you mean, this isn't correct. This syntax is the same as <T: BarTrait>, and you can store that T in any other generic struct that's parametrized by BarTrait, for example.
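A small sketch of storing a generically-typed value without `dyn`. This is an illustration written for this thread, not the contents of the playground link further down; `Widget`, `Holder` and `describe` are invented names.

```rust
pub trait BarTrait {
    fn name(&self) -> String;
}

pub struct Widget;

impl BarTrait for Widget {
    fn name(&self) -> String {
        "widget".to_string()
    }
}

/// Generic holder: stores the value inline, no boxing and no vtable.
pub struct Holder<T: BarTrait> {
    bar: T,
}

impl<T: BarTrait> Holder<T> {
    pub fn describe(&self) -> String {
        format!("holding {}", self.bar.name())
    }
}

/// Written with an explicit type parameter so the return type can name
/// `T`; as an argument, `bar: impl BarTrait` means the same thing.
pub fn foo<T: BarTrait>(bar: T) -> Holder<T> {
    Holder { bar }
}
```

Each distinct `T` gets its own `Holder<T>` type, so this only covers the case where one holder stores one concrete type.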

marcosdumay•1mo ago

> you can store that T in any other generic struct that's parametrized by BarTrait, for example

Not really. You can store it on any struct that specializes to the same type of the value you received. If you get a pre-built struct from somewhere and try to store it there, your code won't compile.

steveklabnik•1mo ago

Can you show me what you’re talking about? I don’t understand what you mean. I’ll add a code example of what I mean in a bit.

steveklabnik•1mo ago

Here's what I'm talking about: https://play.rust-lang.org/?version=stable&mode=debug&editio...

tcfhgj•1mo ago

sure about that?

the struct in which it is stored, could be generic as well

Maxatar•1mo ago

I'm addressing the intent of the original question.

No one would ask this question in the case where the struct is generic over a type parameter bounded by the trait, since such a design can only store a homogeneous collection of values of a single concrete type implementing the trait; the question doesn't even make sense in that situation.

The question only arises for a struct that must store a heterogeneous collection of values with different concrete types implementing the trait, in which case a trait object (dyn Trait) is required.
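The homogeneous/heterogeneous split can be sketched like this, with made-up types `Alpha`, `Beta`, `Uniform` and `Mixed`. A generic container is fixed to one concrete type per instance, while a container of trait objects can mix them at the cost of boxing and dynamic dispatch.

```rust
pub trait BarTrait {
    fn name(&self) -> &'static str;
}

pub struct Alpha;
pub struct Beta;

impl BarTrait for Alpha {
    fn name(&self) -> &'static str {
        "alpha"
    }
}

impl BarTrait for Beta {
    fn name(&self) -> &'static str {
        "beta"
    }
}

/// Homogeneous: every element has the same concrete type `T`.
pub struct Uniform<T: BarTrait> {
    pub items: Vec<T>,
}

/// Heterogeneous: elements may have different concrete types, so they
/// are stored as boxed trait objects.
pub struct Mixed {
    pub items: Vec<Box<dyn BarTrait>>,
}

impl Mixed {
    pub fn names(&self) -> Vec<&'static str> {
        self.items.iter().map(|b| b.name()).collect()
    }
}
```

A `Uniform<Alpha>` cannot hold a `Beta`; a `Mixed` can hold both.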

embedding-shape•1mo ago

It's a tradeoff though, as I think traits make Rust build times grow really quickly. I don't know the exact characteristics of it, and I think they've sped it up compared to how it used to be, but I do remember that you'll get noticeable build slowdowns the more you use traits, especially "complicated" ones.

treyd•1mo ago

Code is typically run many more times than it's compiled, so this is a perfectly good tradeoff to make.

cardanome•1mo ago

For release builds yes. For debug builds slow compile times kill productivity.

greener_grass•1mo ago

If you are not willing to make this trade then how much of a priority was run-time performance, really?

esrauch•1mo ago

It's never the case that only one thing is important.

In the extreme, you surely wouldn't accept a 1 day or even 1 week build time for example? It seems like that could be possible and not hypothetical for a 1 week build since a system could fuzz over candidate compilation, and run load tests and do PGO and deliver something better. But even if runtime performance was so important that you had such a system, it's obvious you wouldn't ever have developer cycles that take a week to compile.

Build time also even does matter for release: if you have a critical bug in production and need to ship the fix, a 1 hour build time can still lose you a lot here. Release build time doesn't matter until it does.

kace91•1mo ago

Doesn’t rust have incremental builds to speed up debug compilation? How slow are we talking here?

steveklabnik•1mo ago

Rust does have incremental rebuilds, yes.

Folks have worked tirelessly to improve the speed of the Rust compiler, and it's gotten significantly faster over time. However, there are also language-level reasons why it can take longer to compile than other languages, though the initial guess of "because of the safety checks" is not one of them, those are quite fast.

> How slow are we talking here?

It really depends on a large number of factors. I think saying "roughly like C++" isn't totally unfair, though again, it really depends.

sfink•1mo ago

My initial guess would be "because of the zero-cost abstractions", since I read "zero-cost" as "zero runtime cost" which implies shifting cost from runtime to compile time—as would happen with eg generics or any sort of global properties.

(Uh oh, there's an em-dash, I must be an AI. I don't think I am, but that's what an AI would think.)

steveklabnik•1mo ago

I used em dashes before AI, and won't stop now :)

That's sort of part of it, but it's also specific language design choices that if they were decided differently, might make things faster.

esrauch•1mo ago

People do have cold Rust compiles that can push up into being measured in hours. Large crates often make design choices that give them a more compile-time-friendly shape.

Note that C++ also has almost as large problem with compile times with large build fanouts including on templates, and it's not always realistic for incremental builds to solve either especially time burnt on linking, e.g. I believe Chromium development often uses a mode with .dlls dynamic linking instead of what they release which is all static linked exactly to speed up incremental development. The "fast" case is C not C++.

embedding-shape•1mo ago

> I believe Chromium development often uses a mode with .dlls dynamic linking instead of what they release which is all static linked exactly to speed up incremental development. The "fast" case is C not C++.

Bevy, a Rust ECS framework for building games (among other things), has a similar solution by offering a build/rust "feature" that enables dynamic linking (called "dynamic_linking"). https://bevy.org/learn/quick-start/getting-started/setup/#dy...

kibwen•1mo ago

There's no Rust codebase that takes hours to compile cold unless 1) you're compiling a massive codebase in release mode with LTO enabled, in which case, you've asked for it, 2) you've ported Doom to the type system, or 3) you're compiling on a netbook.

dwattttt•1mo ago

I'm curious if this is tracked or observed somewhere; crater runs are a huge source of information, metrics about the compilation time of crates would be quite interesting.

estebank•1mo ago

I know some large orgs have this data for internal projects.

This page gives a very loose idea of how we're doing over time: https://perf.rust-lang.org/dashboard.html

esrauch•1mo ago

Down and to the right is good, but the claim here is the average full release build is only 2 seconds?

kibwen•1mo ago

Those are graphs of averages from across the benchmarking suite, which you can read much more information about here: https://kobzol.github.io/rust/rustc/2023/08/18/rustc-benchma...

torginus•1mo ago

A lot of C++ devs advocate for simple replacements for the STL that do not rely too much on zero-cost abstractions. That way you can have small binaries, fast compiles, and make a fast-debug kinda build where you only turn on a few optimizations.

That way you can get most of the speed of the Release version, with a fairly good chance of getting usable debug info.

A huge issue with C++ debug builds is the resulting executables are unusably slow, because the zero-cost abstractions are not zero cost in debug builds.

pjmlp•1mo ago

Unless one uses VC++, which can debug release builds.

Similar capabilities could be made available in other compilers.

torginus•1mo ago

It's not just the compiler - MSVC like all others has a tendency to mangle code in release builds to such an extent that the debug info is next to useless (which to be fair is what I asked it to do, not that I fault it).

Now to hate a bit on MSVC - its Edit & Continue functionality makes debug builds unbearably slow, but at least it doesn't work, so my first thing is to turn that thing off.

pjmlp•1mo ago

Which is why recent versions have dynamic debugging mode.

cyberax•1mo ago

You can debug release builds with gcc/clang just fine. They don't generate debug information by default, but you can always request it ("-O3 -g" is a perfectly fine combination of flags).

pjmlp•1mo ago

Not really, because some optimizations get the step through and such rather confusing.

VC++ dynamic debugging pretends the code motion, inlining and similar optimizations aren't there and maps back to the original code as written.

Unless this has been improved for gdb,lldb.

cyberax•1mo ago

Ah, I see what you mean.

GCC can now emit information that can be used to reconstruct the frame pointers for inlined functions: https://lwn.net/Articles/940686/ It's now filtering through various projects: https://sourceware.org/binutils/wiki/sframe

It will not undo _all_ the transformations, but it will help a lot. I used it for backtraces, and it fixed the missing frame issues for me.

This was possible with the earlier DWARF format (it's Turing-complete), and I think this is how VCC does it. Although I have not checked it.

arw0n•1mo ago

I think this also massively depends on your domain, familiarity with the code base and style of programming.

I've changed my approach significantly over time on how I debug (probably in part due to Rusts slower compile times), and usually get away with 2-3 compiles to fix a bug, but spend more time reasoning about the code.

embedding-shape•1mo ago

Absolutely, was not trying to claim otherwise. But since we're engineers (at least I like to see myself as one), it's worth always keeping in mind that almost everything comes with tradeoffs, even traits :)

Someone down the line might be wondering why suddenly their Rust builds take 4x the time after merging something, and just maybe remembering this offhand comment will make them find the issue faster :)

cogman10•1mo ago

AFAIK, it's not the traits that does it but rather the generics.

Rust does make it a lot easier to use generics which is likely why using more traits appears to be the cause of longer build times. I think it's just more that the more traits you have, the more likely you are to stumble over some generic code which ultimately generates more code.
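One common way to limit how much code generics generate is to keep the generic part thin and push the real work into a non-generic inner function. A sketch under that assumption; `extension_of` is a hypothetical helper, not something from the thread.

```rust
use std::path::Path;

/// Generic shell: instantiated once per caller type (`&str`, `String`,
/// `PathBuf`, ...), but each instantiation is only a conversion plus a call.
pub fn extension_of<P: AsRef<Path>>(path: P) -> Option<String> {
    // Non-generic inner function: compiled once, no matter how many
    // different `P` types the callers use.
    fn inner(path: &Path) -> Option<String> {
        path.extension()
            .and_then(|ext| ext.to_str())
            .map(|ext| ext.to_ascii_lowercase())
    }
    inner(path.as_ref())
}
```

How much build time this saves in a given codebase is something to measure; the sketch only shows the shape of the technique.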

embedding-shape•1mo ago

> AFAIK, it's not the traits that does it but rather the generics.

Aah, yes, that sounds more correct, the end result is the same, I failed to remember the correct mechanism that led to it. Thank you for the correction!

emidln•1mo ago

I probably enjoy ELF hacking more than most, but patching an ELF binary via LD_PRELOAD, linker hacks, or even manual or assisted relinking tricks are just tools in the bag of performant C/C++ (and probably Rust too, but I don't get paid to make that fast). If you care about perf and for whatever reason are using someone else's code, you should be intimately familiar with your linker, binary format, ABI, and OS in addition to your hardware. It's all bytes in the end, and these abstractions are pliable with standard tooling.

I'd usually rather have a nice language-level interface for customizing implementation, but ELF and Linux scripting is typically good enough. Binary patching is in a much easier to use place these days with good free tooling and plenty of (admittedly exploit-oriented) tutorials to extrapolate from as examples.

pmarin•1mo ago

The C way is to avoid abstractions in first place.

pjmlp•1mo ago

Yet it is one in itself, otherwise UNIX would still be written in Assembly.

seanw444•1mo ago

Assembly itself is an abstraction. UNIX should have been written in machine code.

pjmlp•1mo ago

It was until version 4, when the C rewrite took place as it wasn't fun to keep writing it in Assembly.

seanw444•1mo ago

Assembly code != machine code

pjmlp•1mo ago

Depends if you have an Assembler at hand, or a plain hexdump monitor, hopefully with a checksum entry on each row.

pron•1mo ago

The question is what do we mean by "a fast language"? We could mean it to be how fast the fastest code that a performance expert in that language, with no resource constraints, could write. Or, we can restrict it to "idiomatic" code. Or we can say that a fast language is the one where an average programmer is most likely to produce fast code with a given budget (in which case probably none of the languages mentioned here are among the fastest).

justin66•1mo ago

These are the languages an "average programmer" would use. What language are you thinking of?

pron•1mo ago

I may be biased, but I think that if you have a budget that's reasonable in the industry for some project size and includes not only the initial development but also maintenance and evolution over the software's lifetime, especially when it's not small (say over 200KLOC), and you want to choose the language that would give you the fastest outcome, you will not get a faster program than if you chose Java. To get a faster program in any language, if possible, would require a significantly higher budget (especially for the maintenance and evolution).

cdelsolar•1mo ago

Go?

pron•1mo ago

I don't think so, but it may not be far behind. More importantly, though, I'm fairly confident it won't be Assembly, or C, or C++, or Rust, or Zig, but also not Python, or TS/JS. The candidates would most likely include Java, C#, and Go.

xnorswap•1mo ago

Do you think C# / .NET doesn't stack up in terms of budget, or not stack up in terms of runtime speed?

pron•1mo ago

It's probably in the same ballpark. To me, the contenders for "the fastest language" include Java, C#, and Go and not many more.

justin66•1mo ago

Ah thanks. That clarifies things.

swiftcoder•1mo ago

Purely by the numbers, an "average programmer" is much more likely to use Javascript, Python, or Java. The native languages have been a bit of a niche field since the late 90's (i.e. heavily slanted towards OS, embedded, and gamedev folks)

DoctorOW•1mo ago

> we can say that a fast language is the one where an average programmer is most likely to produce fast code with a given budget

I'd say most people use this definition, with the caveat that there's no official "average programmer", and everyone has different standards.

pron•1mo ago

Right, but if we assume that programmers' compensation is statistically correlated with their skill, then we can drop "average" and just talk about budget.

Avicebron•1mo ago

That seems like a wild assumption to make.

gf000•1mo ago

Statistically? I don't think it's that wild.

If you prefer it, salaries correlate with years of experience, and the latter surely correlates with skills, right?

(No, this doesn't mean that every 10 years XP dev is better than a 3 years XP one, but it's definitely a strong correlation)

jillesvangurp•1mo ago

It's compilers and compiler optimizations that make code run fast. The real question is whether the Rust language and its richer memory semantics give the Rust compiler a bit more context for optimizing, context that the C compiler wouldn't have unless you hand optimize your code.

If you do hand optimize your code, all bets are off. With both languages. But I think the notion that the Rust compiler has more context for optimizing than the C compiler is maybe not as controversial as the notion that language X is better/faster than language Y. Ultimately, producing fast/optimal code in C kind of is the whole point of C. And there aren't really any hacks you can do in C that you can't do in Rust, or vice versa. So, it would be hard to make the case that Rust is slower than C or the other way around.

However, there have been a few rewrites of popular unix tools in Rust that benchmark a bit faster than their C equivalents. Could those be optimized in C? Probably, but they just haven't been. Still, there is a case for arguing that maybe Rust code is a bit easier to make fast than C code.

gf000•1mo ago

> It's compilers and compiler optimizations that make code run fast

Well, then in many cases we are talking about LLVM vs LLVM.

> Ultimately, producing fast/optimal code in C kind of is the whole point of C

Mostly a nitpick, but I'm not convinced that's true. The performance queen has been traditionally C++. In C projects it's not rare to see very suboptimal design choices mandated by the language's very low expressivity (e.g. no multi-threading, sticking to an easier data structure, etc).

jillesvangurp•1mo ago

The compiler backend yes. But there probably is a lot of work happening elsewhere in the compiler tools.

adgjlsfhk1•1mo ago

Compilers are only as good as the semantics you give them. C and C++ both have some pretty bad semantics in many places that heavily encourage inefficient coding patterns.

pron•1mo ago

> It's compilers and compiler optimizations that make code run fast.

Compiler optimisations certainly play a large role, but they're not the only thing. Tracing-moving garbage collectors can trade off CPU usage for memory footprint and allow you to shift costs between them, so depending on the relative cost of CPU and RAM, you could gain speed (throughput) in exchange for RAM at a favourable price.

Arenas also offer a similar tradeoff knob, but they come with a higher development/evolution price tag.

dwattttt•1mo ago

It might be a minute or two before we get to see the words "favourable price" anywhere near the word RAM again.

torginus•1mo ago

I think when designing a language, and a set of libraries for it, the designer has an idea of how code for said language should be written, what 'idiomatic' code looks like.

In that context, the designer can reason about how code written that way should perform.

So I think this is a meaningful question for a language designer, which makes it a meaningful question for the users as well, when phrased like this:

'How does idiomatic code (as imagined by the language creators) perform in language X vs Y?'

hobofan•1mo ago

Off-topic: Is it just me, or have there been a disproportionally high number of ~mid 2025 posts that have been reposted the last few days?

tycoon666•1mo ago

No.

steveklabnik•1mo ago

I love Betteridge's Law, and so one small thing I was trying to do here was subvert it a bit. Instead of "no," in this case, the answer is "the question is malformed."

otikik•1mo ago

I know which one is faster to produce an unintended segfault.

jonstewart•1mo ago

“It’s the memory, stupid!” So wrote Richard Sites, lead designer of the famous DEC Alpha chip, in 1996 (http://cva.stanford.edu/classes/cs99s/papers/architects_look...). It’s rung true for 30 years.

Where C application code often suffers, but by no means always, is the use of memory for data structures. A nice big chunk of static memory will make a function fast, but I’ve seen many C routines malloc memory, do a strcpy, compute a bit, and free it at the end, over and over, because there’s no convenient place to retain the state. There are no vectors, no hash maps, no crates.io and cargo to add a well-optimized data structure library.

It is for this reason I believe that Rust, and C++, have an advantage over C when it comes to writing fast code, because it’s much easier to drop in a good data structure. To a certain extent I think C++ has an advantage over Rust due to easier and better control over layout.

JacoboJacobi•1mo ago

I'd certainly agree that malloc is the Achilles heel of any real world C. Overall though C++ was not a particularly good solution to memory efficiency since having OO available made the situation look like a fast sprint to the cake shop.

jonstewart•1mo ago

Heavy smalltalk-style OOP in C++ has kind of died out, especially with data structures. So with any templated data structure you’re reducing indirection from vtables and you have the opportunity to allocate however you want, often in contiguous slabs to ease memory transfer and caching.

kibwen•1mo ago

I like to say that there are two primary factors when we talk about how "fast" a language is:

1. What costs does the language actively inject into a program?

2. What optimizations does the language facilitate?

Most of the time, it's sufficient to just think about the first point. C and Rust are faster than Python and Javascript because the dynamic nature of the latter two requires implementations to inject runtime checks all over the place to enable that dynamism. Rust and C simply inject essentially zero active runtime checks, so membership in this club is easy to verify.

The second one is where we get bogged down, because drawing clean conclusions is complicated by the (possibly theoretical) existence of optimizing compilers that can leverage the optimizability inherent to the language, as well as the inherent fragility of such optimizations in practice. This is where we find ourselves saying things like "well Rust could have an advantage over C, since it frequently has more precise and comprehensive aliasing information to pass to the optimizer", though measuring this benefit is nontrivial and it's unclear how well LLVM is thoroughly utilizing this information at present. At the same time, the enormous observed gulf between Rust in release mode (where it's as fast as C) and Rust in debug mode (when it's as slow as Ruby) shows how important this consideration is; Rust would not have achieved C speeds if it did not carefully pick abstractions that were amenable to optimization.

bluGill•1mo ago

Is Javascript significantly slower? It is extremely common in the real world and so a lot of effort has gone into optimizing it - v8 is very good. Yes C and Rust enable more optimizations: they will be slightly faster, but javascript has had a lot of effort put into making it run fast.

sgeisenh•1mo ago

Yes, for most real-world examples JavaScript is significantly slower; JIT isn’t free and can be very sensitive to small code changes, you also have to consider the garbage collector.

Speed is also not the only metric, Rust and C enable much better control over memory usage. In general, it is easier to write a memory-efficient program in Rust or C than it is in JS.

kibwen•1mo ago

Yes. V8 (and other Javascript JIT engines) are very good, with a lot of effort put into them by talented engineers. But there's a floor on performance imposed by the language's own semantics. Of course, if your program is I/O bound rather than CPU bound (especially at network-scale latencies), this may never be noticeable. But a Javascript program will use significantly more CPU, significantly more memory, and both CPU and memory usage will be significantly more variable and less predictable than a program written in C or Rust.

sfink•1mo ago

It's complicated, though mostly that complication doesn't change the overall conclusion.

Much of the language's semantics can be boiled away before JIT compilation, because that flexibility isn't in use at that time, which can be proven by a quick check before entering the hot code. (Or in the extreme, the JIT code doesn't check it at all, and the runtime invalidates that code lazily when an operation is performed that violates those preconditions.) Which thwarts people who do simple-minded comparisons of "what language is fastest at `for (i = 0; i < 10000000; i++) x += 7`?", because the runtime is entirely dominated by the hot loop, and the machine code for the hot loop is identical across all languages tested.

Still: you have to spend time JIT compiling. You have to do some dynamic checks in all but the innermost hot code. You have to materialize data in memory, even if just as a fallback, and you have to garbage collect periodically.

So I agree with your conclusion, except for perhaps un-nuanced use of the term "performance floor" -- there's really no elevated JS floor, at least not a global one; simple JS can generate the same or nearly the same machine code as equivalent C/C++/Rust, will use no more memory, and will never GC. But that floor only applies to a small subset of code (which can be the bulk of the runtime!), and the higher floor does kick in for everything else. So generally speaking, JS can only "be as fast" as non-managed languages for simple programs.

(I'll ignore the situations where the JIT can depend on stricter constraints at runtime than AOT-compiled languages, because I've never seen a real-world situation where it helps enough to counterbalance everything else.)

steveklabnik•1mo ago

I like this framing a lot.

It's also interesting to think about this in terms of the "zero cost abstractions"/"zero overhead abstractions" idea, which Stroustrup wrote as "What you don't use, you don't pay for. What you do use, you couldn't hand code any better". The first sentence is about 1, and the second one is about what you're able to do with 2.

AnimalMuppet•1mo ago

I think there's a third question, but I don't know quite how to phrase it. Maybe "how real-world fast is the language?" or "how fast is the language in the hands of someone who isn't obsessively thinking about speed?"

That is, most of the time, most of the users aren't thinking about how to squeeze the last tenth of a percent of speed out of it. They aren't thinking about speed at all. They're thinking about writing code that works at all, and that hopefully doesn't crash too often. How fast is the language for them? Does it nudge them toward faster code, or slower? Are the default, idiomatic ways of writing things the fast way, or the slow way?

morshu9001•1mo ago

A lot of people think of static types as a safety feature, but the origin is performance. The assembly needs to know struct sizes ahead of time.
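To illustrate the point about sizes being known ahead of time, here is a minimal Rust sketch. The `Header` struct is a made-up example, and the sizes assume a mainstream target where `u32` is 4-byte aligned.

```rust
use std::mem::size_of;

// Hypothetical struct, laid out in declaration order via repr(C):
// 1 byte tag, 3 bytes padding, 4 bytes len.
#[allow(dead_code)]
#[repr(C)]
struct Header {
    tag: u8,
    len: u32,
}

// The size is a compile-time constant, so the generated code can use fixed
// offsets and strides instead of looking anything up at runtime.
const HEADER_SIZE: usize = size_of::<Header>();

fn main() {
    println!("one header: {} bytes", HEADER_SIZE);
    println!("array of 4: {} bytes", size_of::<[Header; 4]>());
}
```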

classified•1mo ago

> Mozilla tried to parallelize Firefox’s style layout twice in C++, and both times the project failed. The multithreading was too tricky to get right.

That is a damn good reason to choose Rust over C++, even if the Rust implementation of the "same" thing should be a bit slower.

bluGill•1mo ago

Only if it is repeatable. We have no information on what they learned in the two failed attempts - it is likely that they learned from the failures and started other architectural changes that enabled the final one to work. As such we cannot say anything about this.

Rust does have some interesting features, which restrict what you are allowed to do and thus make some things impossible but in turn make other things easier. It is highly likely that those restrictions are part of what made this possible. Given infinite resources (which you never have) a C++ implementation could be faster because it has better shared data concepts - but those same shared data concepts make it extremely hard to reason about multi-threaded code and so humanly you might not be able to make it work.

steveklabnik•1mo ago

We do have some information: https://youtu.be/Y6SSTRr2mFU?t=361 (linked with the specific timestamp)

In short, the previous two attempts were done by completely different groups of different people, a few years apart. Your direct question about if direct wisdom from these two attempts was shared, either between them, or used by Stylo, isn't specifically discussed though.

> a C++ implementation could be faster because it has better shared data concepts

What concepts are those?

bluGill•1mo ago

> What concepts are those?

Data can be modified by any thread that wants to. It is up to you to ensure that modifications work correctly without race conditions. In rust you can't do this (unsafe aside): the borrow checker rejects data access patterns that can't be proved correct.

Again let me be clear: the things rust doesn't allow are hard to get correct.

steveklabnik•1mo ago

I mean, data races are undefined behavior in C++ the same way that they are in unsafe Rust. The languages are equivalent there.

bluGill•1mo ago

Only if there is a data race - if there is no data race C++ lets you do it. Rust doesn't let you do things that don't have a race but cannot be proven within the context of rust to not have a data race.

steveklabnik•1mo ago

In safe Rust, yes, you must prove it. But in unsafe Rust, it's up to you. It's the exact same thing.
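As a rough sketch of what "proving it" can look like in safe Rust: the function below mutates one buffer from several threads at once. It only compiles because `chunks_mut` hands out non-overlapping sub-slices, which the borrow checker can verify. The function name and worker count are illustrative, not taken from Stylo or any real code base.

```rust
use std::thread;

/// Doubles every element, splitting the slice across up to `workers` threads.
fn double_in_parallel(data: &mut [u64], workers: usize) {
    let workers = workers.max(1);
    // Round up so every element lands in some chunk; chunk size must be > 0.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);

    // Scoped threads may borrow `data` because they are joined before
    // `scope` returns.
    thread::scope(|s| {
        // Each chunk is a disjoint `&mut [u64]`, so no two threads can
        // touch the same element.
        for chunk in data.chunks_mut(chunk_len) {
            s.spawn(move || {
                for x in chunk.iter_mut() {
                    *x *= 2;
                }
            });
        }
    });
}

fn main() {
    let mut v: Vec<u64> = (1..=8).collect();
    double_in_parallel(&mut v, 4);
    println!("{:?}", v);
}
```

An access pattern that is race-free but not expressible as disjoint borrows (say, threads writing to interleaved indices) would need `unsafe` or a different abstraction, which is the gap being argued about here.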

sfink•1mo ago

It's a good reason to choose Rust over C++ for that application, and others that share its characteristics. (Or, more to the point of the article, it's a good reason to declare that Rust is faster than C++ for that application.)

It doesn't provide a lot of evidence in either direction for the rest of the vast space of potential programs.

(Knowing C++ fairly well and Rust not very well, I have Opinions, but they are not very well-informed opinions. They roughly boil down to: Rust is generally better for most programs, largely due to cargo not Rust, but C++ is better for more exploratory programming where you're going to be frequently reworking things as you go. Small changes ripple out across the codebase much more with Rust than C++ in my [limited] experience, and as a result the percentage of programming time spent fixing things up is substantially higher with Rust.)

IshKebab•1mo ago

I think the only reasonable way to interpret this question is "is Rust written by a reasonably competent Rust developer spending a reasonable amount of time faster/slower than C written by an equally competent C developer spending the same amount of time".

I don't think a language should count as "fast" if it takes an expert or an inordinate amount of time to get good performance, because most code won't have that.

So on those grounds I would say Rust probably is faster than C, because it makes it much much easier to use multithreading and more optimised libraries. For example a lot of C code uses linked lists because they're easy to write in C, even when a vector would be faster and more appropriate. Multithreading can just be a one line change in Rust.

oguz-ismail2•1mo ago

So assembly is the slowest language?

hmry•1mo ago

Depends. If it takes an assembly programmer 8 hours to implement <X>, can an equally proficient Python programmer spending 8 hours to implement <X> create a faster program?

Let's say they only need 2 hours to get the <X> to work, and can use the remaining 6 hours for optimizing. Can 6 hours of optimizing a Python program make it faster than the assembly program?

The answer isn't obvious, and certainly depends on the specific <X>. I can imagine various <X> where even unlimited time spent optimizing Python code won't produce faster results than the assembly code, unless you drop into C/C++/Zig/Rust/D and write a native Python extension (and of course, at that point you're not comparing against Python, but that native language).

IshKebab•1mo ago

Maybe it's best to think of it as an effort-performance graph. For a given amount of effort what performance do you get?

Assembly is going to give you pretty great performance generally, but the line only starts when you get to "ridiculous effort"!

kstrauser•1mo ago

Or honestly, anything involving a hashmap. Of course you can write those in C, but it’s enough friction that most people won’t for minor things. In Rust, it’s trivial, so people are more likely to use them.
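For a sense of how little friction that is, here is a minimal word-count sketch using only the standard library. The function name is made up for illustration.

```rust
use std::collections::HashMap;

/// Counts how often each whitespace-separated word appears in `text`.
fn word_counts(text: &str) -> HashMap<&str, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        // Insert 0 on first sight, then bump the counter.
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = word_counts("the quick fox jumps over the lazy dog");
    println!("{:?}", counts.get("the"));
}
```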

OskarS•1mo ago

I think personally the answer is "basically no", Rust, C and C++ are all the same kind of low-level languages with the same kind of compiler backends and optimizations, any performance thing you could do in one you can basically do in the other two.

However, in the spirit of the question: someone mentioned the stricter aliasing rules, that one does come to mind on Rust's side over C/C++. On the other hand, signed integer overflow being UB would count for C/C++ (in general: all the UB in C/C++ not present in Rust is there for performance reasons).

Another thing I thought of in Rust and C++s favor is generics. For instance, in C, qsort() takes a function pointer for the comparison function, in Rust and C++, the standard library sorting functions are templated on the comparison function. This means it's much easier for the compiler to specialize the sorting function, inline the comparisons and optimize around it. I don't know if C compilers specialize qsort() based on comparison function this way. They might, but it's certainly a lot more to ask of the compiler, and I would argue there are probably many cases like this where C++ and Rust can outperform C because of their much more powerful facilities for specialization.
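A small Rust sketch of the two styles, for comparison. `sort_generic` and `sort_fn_ptr` are hypothetical wrappers written for this comment. Whether the function-pointer version ends up slower depends on what the optimizer can see, so treat this as an illustration of the mechanism rather than a benchmark.

```rust
use std::cmp::Ordering;

// Generic over the comparator type F: every distinct closure type gets its
// own monomorphized copy, so the comparison is a direct call that is easy
// to inline.
fn sort_generic<T, F: FnMut(&T, &T) -> Ordering>(items: &mut [T], cmp: F) {
    items.sort_by(cmp);
}

// qsort-style: the comparator is a plain function pointer, so all callers
// share one instantiation per element type and the call is indirect unless
// the optimizer can prove which function is passed.
fn sort_fn_ptr<T>(items: &mut [T], cmp: fn(&T, &T) -> Ordering) {
    items.sort_by(cmp);
}

fn descending(a: &i32, b: &i32) -> Ordering {
    b.cmp(a)
}

fn main() {
    let mut a = vec![3, 1, 2];
    sort_generic(&mut a, |x, y| x.cmp(y));
    println!("{:?}", a);

    let mut b = vec![3, 1, 2];
    sort_fn_ptr(&mut b, descending);
    println!("{:?}", b);
}
```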

renox•1mo ago

>signed integer overflow being UB would count for C/C++

Then, I raise you to Zig which has unsigned integer overflow being UB.

steveklabnik•1mo ago

Interestingly enough, Zig does not use the same terminology as C/C++/Rust do here. Zig has "illegal behavior," which is either "safety checked" or "unchecked." Unchecked illegal behavior is like undefined behavior. Compiler flags and in-source annotations can change the semantics from checked to unchecked or vice versa.

Anyway that's a long way of saying that you're right, integer overflow is illegal behavior, I just think it's interesting.

ladyanita22•1mo ago

Rust has UB overflow as well, just unsafe.

https://doc.rust-lang.org/std/intrinsics/fn.unchecked_add.ht...

dana321•1mo ago

Rust has linker optimizations that can make it faster in some cases

quotemstr•1mo ago

Huh? Both have LTO. There are no linker optimizations available to Rust and not to C and C++. They all use the same God damn linker.

josephg•1mo ago

A few years ago I pulled a rust library into a swift app on ios via static linking & C FFI. And I had a tiny bit of C code bridge the languages together.

When I compiled the final binary, I ran llvm LTO across all 3 languages. That was incredibly cool.

Measter•1mo ago

> On the other hand, signed integer overflow being UB would count for C/C++

C and C++ don't actually have an advantage here because this is only limited to signed integers unless you use compiler-specific intrinsics. Rust's standard library allows you to make overflow on any specific arithmetic operation UB on both signed and unsigned integers.

OskarS•1mo ago

It's interesting, because it's a "cultural" thing like the author discusses, it's a very good point. Sure, you can do unsafe integer arithmetic in Rust. And you can do safe integer arithmetic with overflow in C/C++. But in both cases, do you? Probably you don't in either case.

"Culturally", C/C++ has opted for "unsafe-but-high-perf" everywhere, and Rust has "safe-but-slightly-lower-perf" everywhere, and you have to go out of your way to do it differently. Similarly with Zig and memory allocators: sure, you can do "dynamically dispatched stateful allocators that you pass to every function that allocates" in C, but do you? No, you probably don't, you probably just use malloc().

On the other hand: the author's point that the "culture of safety" and the borrow checker in Rust frees your hand to try some things in Rust which you might not in C/C++, and that leads to higher perf. I think that's very true in many cases.

Again, the answer is more or less "basically no, all these languages are as fast as each other", but the interesting nuance is in what is natural to do as an experienced programmer in them.

Xirdus•1mo ago

C++ isn't always "unsafe-but-high-perf". Move semantics are a good example. The spec goes to great lengths to ensure safety in a huge number of scenarios, at the cost of performance. Mostly shows up in two ways: one, unnecessary destructor calls on moved out objects, and two, allowing throwing exceptions in move constructors which prevents most optimizations that would be enabled by having move constructors in the first place (there was an article here recently on this topic).

Another one is std::shared_ptr. It always uses atomic operations for reference counting and there's no way to disable that behavior or any alternative to use when you don't need thread safety. On the other hand, Rust has both non-atomic Rc and atomic Arc.
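A minimal sketch of the Rust side of that comparison. The helper names are made up; the point is only that the choice between the non-atomic and atomic count is explicit in the type.

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

// Rc: plain (non-atomic) reference count. Rc is not Send, so the compiler
// refuses to let it cross a thread boundary.
fn rc_count_after_clone() -> usize {
    let local = Rc::new(vec![1, 2, 3]);
    let _second_handle = Rc::clone(&local);
    Rc::strong_count(&local)
}

// Arc: atomic reference count, so a clone may be moved to another thread.
fn sum_on_other_thread(data: &Arc<Vec<i32>>) -> i32 {
    let data = Arc::clone(data);
    thread::spawn(move || data.iter().sum::<i32>())
        .join()
        .expect("worker thread panicked")
}

fn main() {
    println!("{}", rc_count_after_clone());
    let shared = Arc::new(vec![1, 2, 3]);
    println!("{}", sum_on_other_thread(&shared));
}
```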

josefx•1mo ago

> one, unnecessary destructor calls on moved out objects

That issue predates move semantics by ages. The language always had very simple object life times, if you create Foo foo; it will call foo.~Foo() for you, even if you called ~Foo() before. Anything with more complex lifetimes either uses new or placement new behind the scenes.

> Another one is std::shared_ptr.

From what I understand shared_ptr doesn't care that much about performance because anyone using it to manage individual allocations already decided to take performance behind the shed to be shot, so they focused more on making it flexible.

Xirdus•1mo ago

C++11 totally could have started skipping destructors for moved out values only. They chose not to, and part of the reason was safety.

I don't agree with you about shared_ptr (it's very common to use it for a small number of large/collective allocations), but even if what you say is true, it's still a part of C++ that focuses on safety and ignores performance.

Bottom line - C++ isn't always "unsafe-but-high-perf".

the8472•1mo ago

The rust standard library does make targeted use of unchecked arithmetic when the containing type can ensure that that overflow never happens and benchmarks have shown that it benefits performance. E.g. in various iterator implementations. Which means the unsafe code has to be written and encapsulated once, users can now use safe for loops and still get that performance benefit.
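The pattern looks roughly like the sketch below. This is not code from the standard library; `sum_small` and its 1024-element limit are invented for illustration, and it assumes a toolchain where `unchecked_add` is stable on integer types (Rust 1.79 or later, as far as I know).

```rust
/// Sums up to 1024 bytes. Safe to call: the length check guarantees the
/// unchecked additions below can never overflow.
fn sum_small(values: &[u8]) -> u32 {
    assert!(values.len() <= 1024, "too many values");
    let mut total: u32 = 0;
    for &v in values {
        // SAFETY: at most 1024 additions of at most 255 each, so
        // total <= 261_120, far below u32::MAX.
        total = unsafe { total.unchecked_add(v as u32) };
    }
    total
}

fn main() {
    println!("{}", sum_small(&[200, 200, 200]));
}
```

The `unsafe` block lives in one place, behind an invariant the function checks itself, and callers only ever see a safe signature.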

toodlemcnoodle•1mo ago

I agree with this whole-heartedly. Rust is a LANGUAGE and C is a LANGUAGE. They are used to describe behaviours. When you COMPILE and then RUN them you can measure speed, but that's dependent on two additional bits that are not intrinsically part of the languages themselves.

Now: the languages may expose patterns that a compiler can make use of to improve optimizations. That IS interesting, but it is not a question of speed. It is a question of expressability.

pessimizer•1mo ago

No. As you've made clear, it's a question of being able to express things in a way that gives more information to a compiler, allowing it to create executables that run faster.

Saying that a language is about "expressability" is obvious. A language is nothing other than a form of expression; no more, no less.

toodlemcnoodle•1mo ago

Yes. But the speed is dependent on whether or not the compiler makes use of that information and the machine architecture the compiler is running it on.

Speed is a function of all three -- not just the language.

Optimizations for one architecture can lead to perverse behaviours on another (think cache misses and memory layout -- even PROGRAM layout can affect speed).

These things are out of scope of the language and as engineers I think we ought to aim to be a bit more precise. At a coarse level I can understand and even would agree with something like "Python is slower than C", but the same argument applies there as well.

But at some point objectivity ought to enter the playing field.

irishcoffee•1mo ago

> ... it's a question of being able to express things in a way that gives more information to a compiler, allowing it to create executables that run faster.

There is expressing an idea via code, and there is optimization of code. They are different. Writing what one may think is "fully optimized code" the first time is a mistake, every time, and usually not possible for a codebase of any significant size unless you're a one-in-a-billion savant.

Programming languages, like all languages, are expressive, but only as expressive as the author wants to be, or knows how to be. Rarely does one write code and think "if I'm not expressive enough in a way the compiler understands, my code might be slightly slower! Can't have that!"

No, people write code that they think is correct, compile it, and run it. If your goal is to make the most perfect code you possibly can, instead of the 95% solution that is robust, reliable, maintainable, and testable, you're doing it wrong.

Rust is starting to take up the same mental headspace as LLMs: they're both neat tools. That's it. I don't even mind people being excited about neat tools, because they're neat. The blinders about LLMs/Rust being silver bullets for the software industry need to go. They're just tools.

foldr•1mo ago

>in Rust and C++, the standard library sorting functions are templated on the comparison function. This means it's much easier for the compiler to specialize the sorting function, inline the comparisons and optimize around it.

I think this is something of a myth. Typically, a C compiler can't inline the comparison function passed to qsort because libc is dynamically linked (so the code for qsort isn't available). But if you statically link libc and have LTO, or if you just paste the implementation of qsort into your module, then a compiler can inline qsort's comparison function just as easily as a C++ compiler can inline the comparator passed to std::sort. As for type-specific optimizations, these can generally be done just as well for a (void *) that's been cast to a T as they can be for a T (though you do miss out on the possibility of passing by value).

That said, I think there is an indirect connection between a templated sort function and the ability to inline: it forces a compiler/linker architecture where the source code of the sort function is available to the compiler when it's generating code for calls to that function.

OskarS•1mo ago

qsort is obviously just an example, this situation applies to anything that takes a callback: in C++/Rust, that's almost always generic and the compiler will monomorphize the function and optimize around it, and in C it's almost always a function pointer and a userData argument for state passed on the stack. (and, of course, it applies not just to callbacks, but more broadly to anything templated).

I'm actually very curious about how good C compilers are at specializing situations like this, I don't actually know. In the vast majority cases, the C compiler will not have access to the code (either because of dynamic linking like in this example, or because the definition is in another translation unit), but what if it does? Either with static linking and LTO, or because the function is marked "inline" in a header? Will C compilers specialize as aggressively as Rust and C++ are forced to do?

If anyone has any resources that have looked into this, I would be curious to hear about it.

foldr•1mo ago

My point is that the real issue is just whether or not the function call is compiled as part of the same unit as the function. If it is, then, certainly, modern C compilers can inline functions called via function pointers. The inlining itself is not made easier via the template magic.

Your C comparator function is already “monomorphized” - it’s just not type safe.

Maxatar•1mo ago

Dynamic linking will inhibit inlining entirely, and so yes qsort does not in practice get inlined if libc is dynamically linked. However, compilers can inline definitions across translation units without much of any issue if whole program optimization is enabled.

The use of function pointers doesn't have much of an impact on inlining. If the argument supplied as a parameter is known at compile time then the compiler has no issue performing the direct substitution whether it's a function pointer or otherwise.

1718627440•1mo ago

If you choose to put a boundary in your code that makes it span over several binaries, so that they can be swapped out at runtime, no compiler in any language can optimize that away, because that would be against the interface you explicitly chose. That's what dynamic linking aka. runtime linking is in C.

This is not an issue for libc, because the behaviour of that is not specified by the code itself, but by the spec, which is why C compilers can and do completely remove or change calls to libc, much to the distress of someone expecting a portable assembler.

jarjoura•1mo ago

Wouldn't C++ and Rust eventually call down into those same libc functions?

I guess for your example, qsort(), it is optional, and you can choose another implementation of that. Though I tend to find that both standard libraries tend to just delegate those lowest level calls to the posix API.

steveklabnik•1mo ago

Rust doesn't call into libc for sort, it has its own implementation in the standard library.

oguz-ismail2•1mo ago

Obviously. How about more complex things like multi-threading APIs though? Can the Rust compiler determine that the subject program doesn't need TLS and produce a binary that doesn't set it up at all, for example?

dwattttt•1mo ago

Optimising out TLS isn't going to be a good example of compiler capability. Whether another thread exists is a global property of a process, and beyond that the system that process operates in.

The compiler isn't going to know for instance that an LD_PRELOAD variable won't be set that would create a thread.

oguz-ismail2•1mo ago

> Whether another thread exists is a global property of a process, and beyond that the system that process operates in.

TLS is a language feature. Whether another thread exists doesn't mean it has to use the same facilities as the main program.

> The compiler isn't going to know for instance that an LD_PRELOAD variable won't be set that would create a thread.

Say the program is not dynamically linked. Still no?

dwattttt•1mo ago

> Say the program is not dynamically linked. Still no?

Whether the program has dynamic dependencies does not dictate whether a thread can be created, that's a property of the OS. Windows has CreateRemoteThread, and I'd be shocked if similar capabilities didn't exist elsewhere.

If I mark something as thread-local, I want it to be thread-local.

steveklabnik•1mo ago

I mean, it’s not that obvious, your parent asked about it directly, and you could easily imagine calling into libc for this.

I believe the answer to your question is “yes” because no-std binaries can be mere bytes in size, but I suspect that more complex programs will almost always have some dependency somewhere (possibly even the standard library, but I don’t know offhand) that uses TLS somewhere in it.

adgjlsfhk1•1mo ago

Many of the libc functions are bad apis with traditionally bad implementations.

jandrewrogers•1mo ago

The main performance difference between Rust, C, and C++ is the level of effort required to achieve it. Differences in level of effort between these languages will vary with both the type of code and the context.

It is an argument about economics. I can write C that is as fast as C++. This requires many times more code that takes longer to write and longer to debug. While the results may be the same, I get far better performance from C++ per unit cost. Budgets of time and money ultimately determine the relative performance of software that actually ships, not the choice of language per se.

I've done parallel C++ and Rust implementations of code. At least for the kind of performance-engineered software I write, the "unit cost of performance" in Rust is much better than C but still worse than C++. These relative costs depend on the kind of software you write.

gf000•1mo ago

> I can write C that is as fast as C++

I generally agree with your take, but I don't think C is in the same league as Rust or C++. C has absolutely terrible expressivity, you can't even have proper generic data structures. And something like small string optimization that is in standard C++ is basically impossible in C - it's not an effort question, it's a question of "are you even writing code, or assembly".

jandrewrogers•1mo ago

Yes, it is the difference between "in theory" and "in practice". In practice, almost no one would write the C required to keep up with the expressiveness of modern C++. The difference in effort is too large to be worth even considering. It is why I stopped using C for most things.

There is a similar argument around using "unsafe" in Rust. You need to use a lot of it in some cases to maintain performance parity with C++. Achievable in theory but a code base written in this way is probably going to be a poor experience for maintainers.

Each of these languages has a "happy path" of applications where differences in expressivity will not have a material impact on the software produced. C has a tiny "happy path" compared to the other two.

pjmlp•1mo ago

Also in theory, one could be using a static analyser all the time as a C or C++ build step.

Lint is part of UNIX toolset since 1979, and we have modern versions freely available like clang tidy.

In practice, many devs keep thinking they know better.

pjmlp•1mo ago

> I can write C that is as fast as C++.

Only if ignoring the C++ compile-time execution capabilities.

pmarin•1mo ago

C++ compile time execution is just a gimmicky code generator, you can do it in any language.

pjmlp•1mo ago

Yeah, I could also be writing in a macro assembler for some Lisp-inspired ideas and optimal performance.

jandrewrogers•1mo ago

Any code that can be generated at compile-time can be written the old fashioned way.

pjmlp•1mo ago

Including using a macro assembler with a bunch of clever MASM/TASM-like macros.

throwaway2037•1mo ago

I like this post. It is well-balanced. Unfortunately, we don't see enough of this in discussions of Rust vs $lang. Can you share a specific example of where the "unit cost of performance" in Rust is worse than C++?

trueismywork•1mo ago

Strict aliasing analysis in Rust will enable some fundamentally better optimizations than in C.
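
A minimal sketch of the kind of code this claim is about. The function name `add_twice` is hypothetical and not from the thread, and whether a given rustc/LLVM version actually applies the optimization is not guaranteed; the point is only that the reference rules permit it.

```rust
// Hypothetical illustration, not code from the thread.
// `total` is an exclusive (`&mut`) reference, so it cannot alias `step`.
// The compiler is therefore allowed to load `*step` once and reuse it,
// rather than reloading it after the first write to `*total`.
fn add_twice(total: &mut i32, step: &i32) {
    *total += *step;
    *total += *step;
}
```

In C, the equivalent function taking two `int *` parameters has to allow for both pointers referring to the same object, unless the programmer adds `restrict`.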

jarjoura•1mo ago

At that point the real question should be restated. Does the LLVM IR that is generated from clang and rustc differ in a meaningful way?

marcosdumay•1mo ago

> On the other hand, signed integer overflow being UB would count for C/C++

Rust defaults to the platform treatment of overflows. So it should only make any difference if the compiler is using it to optimize your code, which will most likely lead to unintended behavior.

astrange•1mo ago

Writing a function with UB for overflows doesn't cause unintended behavior if you're doing it to signal there aren't any overflows. And it's very important because it's needed to do basically any loop rewriting.

On the other hand, writing a function that recovers from overflows in an incorrect/useless way still isn't helpful if there are overflows.

kibwen•1mo ago

Rust's overflow behavior isn't platform-dependent. By default, Rust panics on overflow when compiled in debug mode and wraps on overflow when compiled in release mode, and either behavior can be selected in either mode by a compiler flag. In neither case does Rust consider it UB for arithmetic operations to wrap.
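
As a small sketch of the defined behaviors described above: the standard integer methods below spell out the overflow policy explicitly, so they behave the same in debug and release builds. The function name `overflow_demo` is made up for illustration.

```rust
// Hypothetical helper that returns the result of each explicit overflow policy
// for 250u8 + 10, which does not fit in a u8.
fn overflow_demo() -> (u8, Option<u8>, (u8, bool), u8) {
    let x: u8 = 250;
    (
        x.wrapping_add(10),    // wraps modulo 256
        x.checked_add(10),     // None on overflow
        x.overflowing_add(10), // wrapped value plus an overflow flag
        x.saturating_add(10),  // clamps at u8::MAX
    )
}
```

The plain `+` operator is the one whose behavior depends on build settings (panic or wrap); the compiler flag in question is `-C overflow-checks`.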

ndesaulniers•1mo ago

> For instance, in C, qsort() takes a function pointer for the comparison function, in Rust and C++, the standard library sorting functions are templated on the comparison function.

That's more of a critique of the standard libraries than the languages themselves.

If someone were writing C and cared, they could provide their own implementation of sort such that the callback could be inlined (LLVM can inline indirect calls when all call sites are known), just as it would be with C++'s std::sort.

Further, if the libc allows for LTO (active area of research with llvm-libc), it should be possible to optimize calls to qsort this way.
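
To make the contrast concrete, here is a rough sketch in Rust of the two API shapes being compared. The wrapper names `sort_generic` and `sort_fn_ptr` are hypothetical; both simply forward to the standard `slice::sort_by`. Whether the function-pointer version gets inlined is up to the optimizer and is not something this sketch demonstrates.

```rust
use std::cmp::Ordering;

// Generic over the comparator type: each distinct closure type gets its own
// monomorphized copy, so the comparison is a direct call the compiler can inline.
fn sort_generic<T, F>(items: &mut [T], compare: F)
where
    F: FnMut(&T, &T) -> Ordering,
{
    items.sort_by(compare);
}

// Takes a plain function pointer, similar in shape to the qsort() callback.
// The comparison is an indirect call unless the optimizer can see through it.
fn sort_fn_ptr<T>(items: &mut [T], compare: fn(&T, &T) -> Ordering) {
    items.sort_by(compare);
}
```

Both produce the same result; the difference is only in how much the compiler knows about the callee at each call site.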

jesse__•1mo ago

"could" and "should" are doing some very theoretical heavy lifting here.

Sure, at the limit, I agree with you, but in reality, relying on the compiler to do any optimization that you care about (such as inlining an indirect function call in a hot loop) is incredibly unwise. Invariably, in some cases it will fail, and it will fail silently. If you're writing performance critical code in any language, you give the compiler no choice in the matter, and do the optimization yourself.

I do generally agree that in the case of qsort, it's an API design flaw.

oguz-ismail2•1mo ago

> qsort, it's an API design flaw

It's just a generic sorting function. If you need more you're supposed to write it yourself. The C standard library exists for convenience not performance.

jesse__•1mo ago

Fair point.

josephg•1mo ago

> That's more of a critique of the standard libraries than the languages themselves.

But we're right to criticise the standard libraries. If the average programmer uses standard libraries, then the average program will be affected (positively and negatively) by its performance and quirks.

tick_tock_tick•1mo ago

Your qsort example is basically the same reason people say C++ is faster than Rust. C++ templates are still a lot more powerful than Rust's system, but that's getting closer and closer every day.

josephg•1mo ago

It is?? Can you give some examples of high performance stuff you can do using C++'s template system that you can't do in rust?

jandrewrogers•1mo ago

They are likely referring to the scope of fine-grained specialization and compile-time codegen that is possible in modern C++ via template metaprogramming. Some types of complex optimizations common in C++ are not really expressible in Rust because the generics and compile-time facilities are significantly more limited.

As with C, there is nothing preventing anyone from writing all of that generated code by hand. It is just far more work and much less maintainable than e.g. using C++20. In practice, few people have the time or patience to generate this code manually so it doesn't get written.

Effective optimization at scale is difficult without strong metaprogramming capabilities. This is an area of real strength for C++ compared to other systems languages.

josephg•1mo ago

Again, can you provide an example or two? It's hard to agree or disagree without an example.

I think all the wild C++ template stuff can be done via proc macros. E.g., in Rust you can add #[derive(Serialize, Deserialize)] to have a highly performant JSON parser & serializer. And that's just lovely. But I might be wrong? And maybe it's ugly? It's hard to tell without real examples.

steveklabnik•1mo ago

Specialization isn’t stable in Rust, but is possible with C++ templates. It’s used in the standard library for performance reasons. But it’s not clear if it’ll ever land for users.

tick_tock_tick•1mo ago

Rust doesn't allow specialization and likely never will because it's unsound. https://www.reddit.com/r/rust/comments/1p346th/specializatio... has a couple of nice comments about it.

But yes it's basically

template <typename T, size_t N> class Example { vector<T> generic; };

template<> class Example<int32_t, 32> { int bitpackinhere; };

dwattttt•1mo ago

> As with C, there is nothing preventing anyone from writing all of that generated code by hand. It is just far more work and much less maintainable than e.g. using C++20.

It's also still less elegant, but compile time codegen for specialisation is part of the language (build system?) with build.rs & macros. serde makes strong use of this to generate its serialisation/deserialisation code.

pjmlp•1mo ago

And compile time execution.

With C you only have macro soup and the hope the compiler might optimise some code during compilation into some kind of constant values.

With C++ and Rust you're sure that happens.
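
A minimal sketch of what "sure that happens" looks like in Rust, assuming a made-up `factorial` function: a `const fn` used to initialize a `const` item has to be evaluated by the compiler, not at run time.

```rust
// Hypothetical example function; `const fn` makes it callable in const contexts.
const fn factorial(n: u64) -> u64 {
    let mut result = 1;
    let mut i = 2;
    while i <= n {
        result *= i;
        i += 1;
    }
    result
}

// Initializing a `const` item forces the call to be evaluated at compile time.
const FACT_10: u64 = factorial(10);
```

The same function can still be called with run-time arguments; only uses in const contexts are guaranteed to be computed during compilation.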

vlovich123•1mo ago

I’m not sure about the other UB opportunities, but in idiomatic rust code this just doesn’t come up.

In C, you frequently write for loops with signed integer counters for the compiler to realize the loop must hit the condition. In Rust you write for..each loops or invoke heavily inlined functional operators. It ends up all lowering to the same assembly. C++ is the worst here because size_t is everywhere in the standard library so you usually end up using size_t for the loop counter, negating the ability for the compiler to exploit UB.
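
For illustration, a sketch of the two loop styles in Rust. The function names are hypothetical, and the claim that both lower to the same assembly is the commenter's; it is not verified here.

```rust
// Index-based loop, closer to how the C version would be written.
fn sum_indexed(values: &[u32]) -> u32 {
    let mut total = 0u32;
    for i in 0..values.len() {
        total = total.wrapping_add(values[i]);
    }
    total
}

// Iterator-based version: no explicit counter, so there is no counter
// overflow for the compiler to reason about in the first place.
fn sum_iter(values: &[u32]) -> u32 {
    values.iter().fold(0u32, |acc, v| acc.wrapping_add(*v))
}
```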

atoav•1mo ago

There was a contest for which language the fastest tokenizer could be written in. I entered my naive 15-minute Rust version and got second place among roughly 30 entries. First place was hand-crafted assembly.

I am not saying Rust is always faster. But it can be a damn performant language even if you don't think about performance too deeply or don't twist yourself into pretzels to write performant code.

And in my book that counts for something. Because yes, I want my code to be performant, but I'd also rather not have it blow up on edge cases, and I want a way to express limitations (like a type system) and have it testable. Rust is pretty good even if you ignore the hype. I write audio DSP code on embedded devices with a strict deadline in C++. I plan to explore Rust for this, especially now that more and more embedded devices have more than one processor core.

beng-nl•1mo ago

This is a tangent, because it clearly didn’t pan out, but I had hope for Rust having an edge when I learned about how all objects are known to be immutable or not. This means all the mutable objects can be held together, as well as the immutable ones, and we’d have more efficient use of the cache: memory writes to mutable objects share the cache with other mutable objects, not immutable objects, and the bandwidth isn’t wasted on writing back bytes of immutable objects that will never change.

As I don’t see any reason Rust would be limited in runtime execution compared to C, I was hoping this would prove to be an edge.

Apparently not as big of an effect as I hoped.

rcxdude•1mo ago

I think it would be quite difficult to actually arrange the memory layout to take advantage of this in a useful way. Mutable/immutable is very context-dependent in Rust.

pornel•1mo ago

Rust doesn't have immutable memory, only access restrictions. An exclusive owner of an object can always mutate it, or can lend temporary read-only access to it. So the same memory may flip between exclusive-write and shared-read back and forth.

It's an interesting optimization, but not something that could be done directly.
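
A small sketch of that flip between exclusive-write and shared-read access on the same memory. The function name `borrow_phases` is hypothetical and only there for illustration.

```rust
// Hypothetical illustration: the same Vec is written, lent out read-only,
// and then written again. Nothing about its memory is permanently immutable.
fn borrow_phases() -> (usize, i32) {
    let mut data = vec![1, 2, 3]; // exclusive owner, free to mutate
    data.push(4);

    let len_seen;
    {
        let shared: &Vec<i32> = &data; // temporary shared, read-only access
        len_seen = shared.len();
    } // the shared borrow ends here

    data[0] = 10; // the owner can mutate again
    (len_seen, data[0])
}
```

Because mutability is a property of the current borrow and not of the allocation, there is no stable "immutable set" of objects that could be grouped together in memory.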

mid-kid•1mo ago

I almost ignored this post because I can't stand this particular war, where examples are cherry picked to prove either answer.

I'm very happy to see the nuanced take in this article, slowly deconstructing the implicit assumptions proposed by the person asking this question, to arrive at the same conclusion that I long have. I hope this post reaches the right people.

A particular language doesn't have a "speed", a particular implementation may have, and the language may have properties that make it difficult to make a fast implementation (of those specific properties/features) given the constraints of our current computer architectures. Even then, there are usually too many variables to make a generalized statement, and the question often presumes that performance is measured as total CPU time.

steveklabnik•1mo ago

I will admit the title was a bit of a gamble, but thank you for taking the time to read it and I'm glad that you enjoyed it in the end.

jibal•1mo ago

We recently had a post here where the claim being refuted was in quotes in the title, but half the comments were as if the article were making the claim, clearly indicating that people didn't read it (and don't understand how quote marks work).

steveklabnik•1mo ago

Yes, this is an age old problem, for sure.

It's a good thing to keep in mind when you read the comments on any article.

aw1621107•1mo ago

Assuming I'm thinking of the same submission as you, the quotes were not present in the original submission [0].

[0]: https://news.ycombinator.com/item?id=46525937

jibal•1mo ago

They were (are) in the article title, so that doesn't change the point that it proves that people didn't read it.

jesse__•1mo ago

From the other side of the table, I love performance comparisons, so I always read these things. I also enjoyed your commentary, thanks for writing it :)

steveklabnik•1mo ago

You’re welcome, glad you liked it.

nixpulvis•1mo ago

I just want to say, I always really appreciate your writing.

steveklabnik•1mo ago

Thank you! I’m gonna end up doing a lot more of it in 2026 than I did in 2025… stay tuned!