(I have spent a good amount of time hacking the LLVM pass pipeline for my personal project, so if there was a significant difference I probably would have seen it by now.)
A nitpick is that benchmarking C/C++ with $MARCH_FLAG -mtune=native and math magic is kinda unfair to Zig/Julia (Nim seems to support those) - unless you are running Gentoo, those flags are unlikely to be used for real applications.
In my opinion, the comparisons could be better if the file I/O and console printing were removed.
- run bytecode
- very high level
- GC memory
But not all have these traits. Not sure.
The one exception is sort of an exception that proves the rule: it's marked "C# (SIMD)", and looks like a native compiler and not a managed one.
Also, winners don’t make excuses.
(Not even being snarky. You have to spiritually accept that as a fact if you are in the PL perf game.)
I did the same sort of thing with the Sieve of Eratosthenes once, on a smaller scale. My Haskell and Python implementations varied by almost a factor of 4 (although you could argue that I changed the algorithm too much in the fastest Python one). OK, yes, all the Haskell ones were faster than the fastest Python one, and the C one was another 4 times faster than the fastest Haskell one... but they were still all over the place.
It's true this is a microbenchmark and not super informative about "Big Problems" (because nothing is). But it absolutely shows up code generation and interpretation performance in an interesting way.
Note in particular the huge delta between rust 1.92 and nightly. I'm gonna guess that's down to the autovectorizer having a hole that the implementation slipped through, and they fixed it.
The benchmark also includes startup time, file I/O, and console printing. There could have been a one-time startup cost somewhere that got removed.
The benchmark is not really testing the Leibniz loop performance for the very fast languages, it's testing startup, I/O, console printing, etc.
https://github.com/niklas-heer/speed-comparison/blob/master/...
https://github.com/niklas-heer/speed-comparison/blob/master/...
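For context, the kernel being timed is tiny. A rough Python sketch of the shape of these programs (not the exact leibniz.py from the links above; the file-argument wrapper and names here are just illustrative):

import sys

def leibniz(rounds: int) -> float:
    # Leibniz series: pi/4 = 1 - 1/3 + 1/5 - 1/7 + ...
    x, pi = 1.0, 1.0
    for i in range(2, rounds + 2):
        x = -x
        pi += x / (2 * i - 1)
    return pi * 4.0

if __name__ == "__main__":
    # Illustrative wrapper: read the iteration count from a file named on
    # the command line, compute once, print once. For the fast languages,
    # process startup plus this I/O can rival the loop itself.
    with open(sys.argv[1]) as f:
        rounds = int(f.read().strip())
    print(leibniz(rounds))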
ᐅ time uv run -p cpython-3.14 leibniz.py
3.1415926525880504
________________________________________________________
Executed in 38.24 secs fish external
usr time 37.91 secs 158.00 micros 37.91 secs
sys time 0.16 secs 724.00 micros 0.16 secs
ᐅ time uv run -p pypy leibniz.py
3.1415926525880504
________________________________________________________
Executed in 1.52 secs fish external
usr time 1.16 secs 0.25 millis 1.16 secs
sys time 0.02 secs 1.29 millis 0.02 secs
It was a free 25x speedup. But these are good benchmark results that demonstrate what performance level you can expect from each language when someone not versed in it does the code porting. Fair play
What do you think they could have done better assuming that the IO is a necessary part of the benchmark?
Also good job to the Rust devs for making the benchmark so much faster in nightly. I wonder what they did.
The differences among the really fast languages probably come down to different startup times, if I had to guess.
When you put these programs into Godbolt to see what's going on with them, so much of the code is just the I/O part that it's annoying to analyze.
> Why do you also count reading a file and printing the output?
> Because I think this is a more realistic scenario to compare speeds.
Which is fine, but should be noted more prominently. The startup time and console printing obviously aren't relevant for something like the Python run, but at the top of the chart where runs are a fraction of a second it probably accounts for a lot of the differences.
Running the inner loop 100 times over would have made the other effects negligible. As written, trying to measure millisecond differences between entire programs isn't really useful unless someone has a highly specific use case where they're re-running a program for fractions of a second instead of using a long-running process.
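To sketch that idea (assuming a Python harness; the repeat and rounds counts below are made up), timing repeated runs of the loop inside the process keeps startup, file reading, and printing out of the measurement:

import time

def leibniz(rounds: int) -> float:
    # Leibniz series: pi/4 = 1 - 1/3 + 1/5 - ...
    x, pi = 1.0, 1.0
    for i in range(2, rounds + 2):
        x = -x
        pi += x / (2 * i - 1)
    return pi * 4.0

REPEATS = 100        # hypothetical repeat count
ROUNDS = 1_000_000   # hypothetical rounds per repeat

start = time.perf_counter()
for _ in range(REPEATS):
    result = leibniz(ROUNDS)
elapsed = time.perf_counter() - start

# Startup, file I/O, and console printing all sit outside the timed
# region, so the per-run number reflects only the inner loop.
print(result)
print(f"{elapsed / REPEATS * 1000:.3f} ms per run")

The same structure works in any of the compiled languages; the point is just to move the timer inside the process.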
Swift: 3.7
Python: that's incorrect!
Swift: yeah, but it's fast!
There is very little superfluous or that cannot be inferred by the compiler here: https://github.com/niklas-heer/speed-comparison/blob/master/...
forgotpwd16•4d ago
- C++ unsurpassable king.
- There's a stark jump of times going from ~200ms to ~900ms. (Rust v1.92.0 being an in-between outlier.)
- C# gets massive boost (990->225ms) when using SIMD.
- But C++ somehow gets slower when using SIMD.
- Zig very fast*!
- Rust got big boost (630ms->230ms) upgrading v1.92.0->1.94.0.
- Nim (which compiles to C, then to native via GCC) somehow faster than GCC-compiled C.
- Julia keeps proving high-level languages can be fast too**.
- Swift gets faster when using SIMD but loses much accuracy.
- Go fastest language with its own compiler (i.e. not dependent on GCC/LLVM).
- V (also compiles to C): expected it to be close to Nim, given how similar they appear.
- Odin (LLVM) & Ada (GCC) surprisingly slow. (Was expecting them to be close to Zig/Fortran.)
- Crystal slowest LLVM-based language.
- Pure CPython unsurpassable turtle.
Curious how D's reference compiler (DMD) compares to the LLVM/GCC front-ends, how LFortran to gfortran, and QBE to GCC/LLVM. Also would like to see Scala Native (Scala currently being inside the 900~1000ms bunch).
* Note that Zig uses `@setFloatMode(.Optimized)`, which according to the docs is equivalent to `-ffast-math`, but only D/Fortran use that flag (C/C++ do not).
** Julia uses `@fastmath` AND `@simd`. The comparison is supposedly of idiomatic code, and for Julia SIMD is just a simple annotation applied to the loop (and Julia may even apply it automatically), but it should still be noted because (as seen in the C# example) the effect can be big.
mrsmrtss•4d ago
neonsunset•4d ago
On M4 Max, Go takes 0.982s to run while C# (non-SIMD) and F# are ~0.51s. Changing it to be closer to Go makes the performance worse in a similar manner.
neonsunset•4d ago
C# is using CoreCLR/NativeAOT, which does not use GCC or LLVM either. Its compiler is more capable than Go's.
Aurornis•12m ago
For the sub-second compiled languages, it's basically a benchmark of startup times, not performance in the hot loop.
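A rough way to check how much of those sub-second totals is startup (a sketch; ./leibniz_cpp and ./noop_cpp are hypothetical binaries, the latter an empty main built with the same toolchain and flags):

import subprocess, time

def median_wall_time(cmd, repeats=20):
    # End-to-end wall-clock time of one command, including process startup.
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]

full = median_wall_time(["./leibniz_cpp"])   # the actual benchmark binary
noop = median_wall_time(["./noop_cpp"])      # does nothing, same toolchain
print(f"total {full * 1000:.1f} ms, "
      f"startup ~= {noop * 1000:.1f} ms, "
      f"loop + I/O ~= {(full - noop) * 1000:.1f} ms")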