
Programming language speed comparison using Leibniz formula for π

https://niklas-heer.github.io/speed-comparison/
33•PKop•4d ago

Comments

forgotpwd16•4d ago
Some observations:

- C++ unsurpassable king.

- There's a stark jump in times, from ~200ms to ~900ms. (Rust v1.92.0 being an in-between outlier.)

- C# gets massive boost (990->225ms) when using SIMD.

- But C++ somehow gets slower when using SIMD.

- Zig very fast*!

- Rust got big boost (630ms->230ms) upgrading v1.92.0->1.94.0.

- Nim (that compiles to C then native via GCC) somehow faster than GCC-compiled C.

- Julia keeps proving high-level languages can be fast too**.

- Swift gets faster when using SIMD but loses much accuracy.

- Go fastest language with own compiler (i.e. not dependent on GCC/LLVM).

- V (also compiles to C): expected it to be close to Nim, given the similar approach.

- Odin (LLVM) & Ada (GCC) surprisingly slow. (Was expecting them to be close to Zig/Fortran.)

- Crystal slowest LLVM-based language.

- Pure CPython unsurpassable turtle.

Curious how D's reference compiler (DMD) compares to the LLVM/GCC front-ends, how LFortran compares to gfortran, and QBE to GCC/LLVM. Also would like to see Scala Native (Scala currently being inside the 900~1000ms bunch).

* Note that Zig uses `@setFloatMode(.Optimized)`, which according to the docs is equivalent to `-ffast-math`, but only D/Fortran use this flag (C/C++ do not).

** Julia uses `@fastmath` AND `@simd`. The comparison is supposedly of performance on idiomatic code, and for Julia SIMD is a simple annotation applied to the loop (Julia may even apply it automatically), but it should still be noted because (as seen in the C# example) the effect can be big.
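For reference, the series being benchmarked is the Leibniz formula π/4 = 1 − 1/3 + 1/5 − 1/7 + …. A minimal Python sketch of the same loop the quoted C# version uses (function name mine):

```python
import math

def leibniz(rounds: int) -> float:
    """Approximate pi with the Leibniz series: pi/4 = 1 - 1/3 + 1/5 - ..."""
    pi = 1.0
    x = 1.0
    for i in range(2, rounds + 2):
        x = -x                 # alternate the sign each term
        pi += x / (2 * i - 1)  # next odd-denominator term
    return pi * 4

print(leibniz(100_000))  # converges slowly: error shrinks roughly as 1/rounds
```

The series converges so slowly that the error after n terms is about 1/(2n), which is why the benchmark needs large round counts to exercise the loop at all.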

mrsmrtss•4d ago
Looking closer at the benchmarks, it seems the C# benchmark is not using AOT, so Go and even Java GraalVM get an unfair advantage here (when looking at the non-SIMD versions). There is a non-trivial startup time for JIT.
mrsmrtss•4d ago
Sorry, I can't seem to edit my answer anymore, but I was mistaken: the C# version is using AOT. But there are other significant differences here:

  > var rounds = int.Parse(File.ReadAllText("rounds.txt"));

  > var pi = 1.0D;
  > var x = 1.0D;

  > for (var i = 2; i < rounds + 2; i++) {
  >     x = -x;
  >     pi += x / (2 * i - 1);
  > }

  > pi *= 4;
  > Console.WriteLine(pi);
For example, if we change the type of the 'rounds' variable here from int to double (as it also is in the Go version), the code runs significantly faster on my machine.
neonsunset•4d ago
Try that on ARM64 and the result will be the opposite :)

On M4 Max, Go takes 0.982s to run while C# (non-SIMD) and F# are ~0.51s. Changing it to be closer to Go makes the performance worse in a similar manner.

neonsunset•4d ago
> Go fastest language with own compiler (ie not dependent to GCC/LLVM).

C# is using CoreCLR/NativeAOT, which also does not use GCC or LLVM. Its compiler is more capable than Go's.

Aurornis•12m ago
Reading the repo, the benchmark includes the entire program execution from startup to reading the file.

For the sub-second compiled languages, it's basically a benchmark of startup times, not performance in the hot loop.

theanonymousone•4d ago
Is there an explanation for why C is slower than C++?
mutkach•1h ago
Probably LLVM runs different sets of optimization passes for C and C++. You'd need to look at the IR or assembly to know exactly what happens.
pizlonator•17m ago
It doesn’t as far as I know.

(I have spent a good amount of time hacking the llvm pass pipeline for my personal project so if there was a significant difference I probably would have seen it by now)

AlotOfReading•58m ago
It's probably down to the measurement noise of benchmarking on GitHub actions.
drob518•39m ago
I suspect this is it. Any benchmark that takes less than a second to run should have its iteration count increased such that it takes at least a second, and preferably 5+ seconds, to run. Otherwise CPU scheduling, network processing, etc. is perturbing everything.
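The fix drob518 describes can be sketched as a harness that grows the repetition count until each sample takes long enough to swamp scheduler and timer noise (illustrative Python only; the repo actually measures whole-process time, and `bench`/`leibniz` are names of mine):

```python
import time

def leibniz(rounds):
    pi, x = 1.0, 1.0
    for i in range(2, rounds + 2):
        x = -x
        pi += x / (2 * i - 1)
    return pi * 4

def bench(fn, *args, min_time=1.0, samples=5):
    # Grow the repetition count until one sample takes at least min_time,
    # so CPU scheduling and timer granularity stay small relative to the work.
    reps = 1
    while True:
        start = time.perf_counter()
        for _ in range(reps):
            fn(*args)
        if time.perf_counter() - start >= min_time:
            break
        reps *= 2
    # Report the best of several samples: the least-disturbed run.
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        for _ in range(reps):
            fn(*args)
        times.append((time.perf_counter() - start) / reps)
    return min(times)

print(f"{bench(leibniz, 100_000, min_time=0.05, samples=3) * 1e3:.3f} ms per call")
```

Taking the minimum rather than the mean is the usual choice here, since noise only ever adds time.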
tliltocatl•58m ago
The code looks 100% identical except for the namespace prefixes. Must be something particular about the GitHub setup, because on mine (gcc 15.2.1/clang 20.1.8/Ryzen 5600X) the run times are indistinguishably close. Interestingly, with default flags plus -O3, clang is 30% slower; with the flags from the script (-s -static -flto $MARCH_FLAG -mtune=native -fomit-frame-pointer -fno-signed-zeros -fno-trapping-math -fassociative-math) clang is a bit faster.

A nitpick: benchmarking C/C++ with $MARCH_FLAG -mtune=native and math magic is kinda unfair to Zig/Julia (Nim seems to support those); unless you are running Gentoo, those flags are unlikely to be used for real applications.

AlotOfReading•40m ago
The actual assembly generated for the hot loop is identical in both C and C++ on Clang, as you'd expect. It's also identical at the IR level.
nmaludy•1h ago
If i'm understanding the repository correctly, it looks like each language reads from a file, does some I/O printing to console, then computes the value, then some more console printing and exits.

In my opinion, the comparisons could be better if the file I/O and console printing were removed.

gavinray•35m ago
I'm fairly sure I can speed the JVM implementations up a significant amount by MMAP'ing the file into memory and ensuring it's aligned.
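The idea gavinray describes, sketched in Python rather than Java (the JVM's analogous facility is java.nio's MappedByteBuffer; the temp-file setup is just to make the sketch self-contained, and alignment handling is omitted):

```python
import mmap
import os
import tempfile

# Stand-in for the benchmark's rounds.txt.
path = os.path.join(tempfile.mkdtemp(), "rounds.txt")
with open(path, "w") as f:
    f.write("1000000")

# Map the file into memory instead of read()-ing it: the kernel pages it
# in directly, avoiding a copy through a userspace buffer.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        rounds = int(mm[:])

print(rounds)
```

For a file this small the win would be negligible in practice; the point is the mechanism.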
Twirrim•8m ago
I'm not sure why the contents of rounds.txt isn't just provided as some kind of argument instead of read in from a file. Given all the other boilerplate involved, I would have expected it to be trivial to add relevant templating.
arohner•1h ago
The Clojure version is not AOT'd, so it's measuring startup + compiler time. When properly compiled it should be comparable to the Java implementation.
klaff•1h ago
I think I get why C++ thru C are all similar (all compile to similar assembly?), but I don't get why Go thru maybe Racket are all in what looks like a pretty narrow clump. Is there a common element there?
f1shy•1h ago
Some features some of those languages have:

- run bytecode

- very high level

- GC memory

But not all have these traits. Not sure.

ajross•50m ago
I think it's SIMD generation. Managed runtimes have a much harder time autovectorizing, because you can't do any static analysis about things like array sizes. Note that the true low-level tools are all clustered around 200-300ms, and that the next level up are all the "managed" runtimes around 1-2s.

The one exception is sort of an exception that proves the rule: it's marked "C# (SIMD)", and looks like a native compiler and not a managed one.

Someone•43m ago
They’re measuring program execution time, including program startup and tear down. Languages with a more complex runtime take longer for the startup, and all seem to have roughly equally optimized that.
andrepd•1h ago
This is meaningless. The benchmarks are (1) run in github actions, (2) include file and console IO, and (3) are compiled with different compiler flags...
tliltocatl•52m ago
It is meaningful as an indication of a realistic developer setup rather than a fine-tuned setup you'll only see in a HPC context.
pizlonator•18m ago
Exactly.

Also, winners don’t make excuses.

(Not even being snarky. You have to spiritually accept that as a fact if you are in the PL perf game.)

Hizonner•1h ago
This sort of thing is pretty meaningless unless the code is all written by people who know how to get performance out of their languages (and they're allowed to do so). Did you use the right representation of the data? Did you use the right library? Did you use the right optimization options? Did you choose the fast compiler or the slow one? Did you know there was a faster or slower one? If you're using fancy stuff, did you use it right?

I did the same sort of thing with the Sieve of Eratosthenes once, on a smaller scale. My Haskell and Python implementations varied by almost a factor of 4 (although you could argue that I changed the algorithm too much on the fastest Python one). OK, yes, all the Haskell ones were faster than the fastest Python one, and the C one was another 4 times faster than the fastest Haskell one... but they were still all over the place.

ajross•54m ago
It's an extremely simple algorithm, just one loop with an iterated expression inside it. You can check the source code at: https://github.com/niklas-heer/speed-comparison/tree/master/...

It's true this is a microbenchmark and not super informative about "Big Problems" (because nothing is). But it absolutely shows up code generation and interpretation performance in an interesting way.

Note in particular the huge delta between rust 1.92 and nightly. I'm gonna guess that's down to the autovectorizer having a hole that the implementation slipped through, and they fixed it.

Aurornis•8m ago
> Note in particular the huge delta between rust 1.92 and nightly. I'm gonna guess that's down to the autovectorizer having a hole that the implementation slipped through, and they fixed it.

The benchmark also includes startup time, file I/O, and console printing. There could have been a one-time startup cost somewhere that got removed.

The benchmark is not really testing the Leibniz loop performance for the very fast languages, it's testing startup, I/O, console printing, etc.

pjscott•5m ago
The delta there is because the Rust 1.92 version uses the straightforward iterative code and the 1.94-nightly version explicitly uses std::simd vectorization. Compare the source code:

https://github.com/niklas-heer/speed-comparison/blob/master/...

https://github.com/niklas-heer/speed-comparison/blob/master/...

drob518•50m ago
Startup time doesn’t seem to be factored in correctly, so any language that uses a bytecode (e.g. Java) or is compiling from source (e.g. Ruby, Python, etc.) will look poor on this. If the kinds of applications that you write are ones that exit after a fraction of a second, then sure, this will tell you something. But if you’re writing server apps that run for days/weeks/months, then this is useless.
vhdd•26m ago
Python took 86 seconds, if I'm reading it correctly. I can see your point holding for a language like Java, but most of Python's time spent cannot have been startup time, but actual execution time.
kstrauser•50m ago
How I love pypy for certain tasks. On my laptop:

  ᐅ time uv run -p cpython-3.14 leibniz.py
  3.1415926525880504
  
  ________________________________________________________
  Executed in   38.24 secs    fish           external
     usr time   37.91 secs  158.00 micros   37.91 secs
     sys time    0.16 secs  724.00 micros    0.16 secs
  
  ᐅ time uv run -p pypy leibniz.py
  3.1415926525880504
  
  ________________________________________________________
  Executed in    1.52 secs    fish           external
     usr time    1.16 secs    0.25 millis    1.16 secs
     sys time    0.02 secs    1.29 millis    0.02 secs
It was a free 25x speedup.
kiriberty•47m ago
And the winner is (Drumroll)... Python - the most popular language in the AI world
empiricus•8m ago
Well, python for AI is just the syntactic sugar to call pytorch cuda code on the gpu.
viktorcode•30m ago
After seeing Swift's result I had to look into the source to confirm that yes, it was not written by someone who knows the language.

But these are good benchmark results in that they demonstrate what performance level you can expect from every language when someone not versed in it does the code porting. Fair play

pizlonator•22m ago
What do you think they could have done better in the Swift code?
viktorcode•53s ago
Using overflow operators instead of the ones that check for overflow on each iteration.
xnacly•23m ago
the rust example is so far off from being useful, and file io seems completely dumb in this context
pizlonator•20m ago
Real programs have to do IO and the C and C++ code runs faster while also doing IO.

What do you think they could have done better assuming that the IO is a necessary part of the benchmark?

Also good job to the Rust devs for making the benchmark so much faster in nightly. I wonder what they did.

Aurornis•10m ago
The file I/O is probably irrelevant, but the startup time is not.

The differences among the really fast languages are probably in different startup times if I had to guess.

henning•19m ago
Some implementations seem vectorization-friendly like the C one that uses a bit-twiddling trick to avoid the `x = -x` line that the Odin implementation and others have.

When you put these programs into Godbolt to see what's going on, so much of the code is just the I/O part that it's annoying to analyze.
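The sign flip can also be removed algebraically rather than by bit-twiddling: pairing consecutive terms gives π/4 = Σ_k [1/(4k+1) − 1/(4k+3)], a loop body with no alternating-sign dependency across iterations, which autovectorizers tend to handle well. A Python sketch of that reformulation (this is not the repo's bit trick, which flips the float's sign bit directly; function name mine):

```python
import math

def leibniz_paired(pairs: int) -> float:
    # pi/4 = (1 - 1/3) + (1/5 - 1/7) + ...
    # Each pair is sign-free, so there is no x = -x carried dependency.
    acc = 0.0
    for k in range(pairs):
        acc += 1.0 / (4 * k + 1) - 1.0 / (4 * k + 3)
    return acc * 4

print(leibniz_paired(500_000))  # same terms as 1,000,000 rounds of the naive loop
```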

Qem•18m ago
It appears the Raku runtime improved a lot. It used to come last in comparisons like this, by a large margin, and is now surpassing Perl and CPython.
systems•14m ago
why is ocaml so low, didn't expect this
Aurornis•13m ago
Reading the fine print, the benchmark is not just the Leibniz formula like it says in the chart title. It also includes file I/O, startup time, and console printing:

> Why do you also count reading a file and printing the output?

> Because I think this is a more realistic scenario to compare speeds.

Which is fine, but should be noted more prominently. The startup time and console printing obviously aren't relevant for something like the Python run, but at the top of the chart where runs are a fraction of a second it probably accounts for a lot of the differences.

Running the inner loop 100 times over would have made the other effects negligible. As written, trying to measure millisecond differences between entire programs isn't really useful unless someone has a highly specific use case where they're re-running a program for fractions of a second instead of using a long-running process.

dvh•11m ago
Python: how much is pi?

Swift: 3.7

Python: that's incorrect!

Swift: yeah, but it's fast!

sph•11m ago
C# wins hands down in the performance / lines of code metric.

There is very little superfluous or that cannot be inferred by the compiler here: https://github.com/niklas-heer/speed-comparison/blob/master/...

amelius•4m ago
That first big jump in the graph, I thought that it must be the divide between auto-gc'd and non auto-gc'd languages. But then I noticed that Rust is on the wrong side of the divide.
