Using the TPDE codegen back end in LLVM ORC

https://weliveindetail.github.io/blog/post/2025/09/30/tpde-in-llvm-orc.html

24•weliveindetail•4mo ago

Comments

jasonjmcghee•4mo ago

It's really exciting to continue to see these developments.

Would love to hear your perspective on cranelift too.

aengelke•4mo ago

TPDE co-author here. Nice work, this was easier than expected; so we'll have better upstream ORC support soon [1].

The benchmark is suboptimal in multiple ways:

- Multi-threading makes things just slower. When enabling multi-threading, LLJIT clones every module into a new context before compilation, which is much more expensive than compilation. There's also no way to disable this. This causes a ~1.5x (LLVM)/~6.5x (TPDE) slowdown (very rough measurement on my laptop).

- The benchmark compares against the optimizing LLVM back-end, not the unoptimizing back-end (which would be a fairer comparison) (Code: JTMB.setCodeGenOptLevel(CodeGenOptLevel::None);). Additionally, enabling FastISel helps (command line -fast-isel; setting the TargetOption EnableFastISel seems to have no effect). This gives LLVM a 1.6x speedup.

- The benchmark is not really representative, as it causes FastISel fallbacks to SelectionDAG in some very large basic blocks -- i24 occurs rather rarely in real-world code. This is the reason why the speedup from the unoptimizing LLVM back-end is so low. Replacing i24 with i16 gives LLVM another 2.2x speedup. (Hint: to get information on FastISel fallbacks, enable FastISel and pass the command line options "-fast-isel-report-on-fallback -pass-remarks-missed=sdagisel" to LLVM. This is really valuable when optimizing for compile times.)

So we get ~140ms (TPDE), ~730ms (LLVM -O0), or 5.2x improvement. This is nowhere near the 10-20x speedup that TPDE typically achieves. Why? The new bottleneck is JITLink, which is featureful but slow -- profiling indicates that it consumes ~55% of the TPDE "compile time" (so the net compile time speedup is ~10x). TPDE therefore ships its own JIT mapper, which has fewer features but is much faster.

LLVM is really powerful, and despite being not particularly fast, the JIT API makes it extremely difficult to make it not extra-slow, even for LLVM experts.

[1]: https://github.com/tpde2/tpde/commit/29bcf1841c572fcdc75dd61...

weliveindetail•4mo ago

Please note that the post didn't mention the word benchmark a single time ;) It does a "basic performance measurement" of "our csmith example". Anyway, thanks for your notes, they are very welcome and valid.

Comparing TPDE against the default optimization level in ORC is not fair (because that is -O2 indeed), but that's what we get off-the-shelf. I tested the explicit FastISel setting and it didn't help on the LLVM side, as you said. I didn't try the command-line option though, thanks for the tip! (Especially the -pass-remarks-missed will be useful.)

And yeah, csmith doesn't really generate representative code, but again that was not stated either. I didn't dive into JITLink as it would be a whole post on its own, but yes feature-completeness prevailed over performance here as well -- seems characteristic for LLVM and isn't soo surprising :)

Last but not least, yes multi-threading isn't working as good as the post indicates. This seems related to the fix that JuliaLang did for the TaskDispatcher [1]. I will correct this in the post and see which other points can be addressed in the repo.

Looking forward for your OrcCompileLayer in TPDE!

[1] https://github.com/JuliaLang/julia/pull/58950

What were the first animals? The fierce sponge–jelly battle that just won't end

Sidestepping Evaluation Awareness and Anticipating Misalignment

OldMapsOnline

What It's Like to Be a Worm

Don't go to physics grad school and other cautionary tales

Lawyer sets new standard for abuse of AI; judge tosses case

AI anxiety batters software execs, costing them combined $62B: report

Bogus Pipeline

Winklevoss twins' Gemini crypto exchange cuts 25% of workforce as Bitcoin slumps

How AI Is Reshaping Human Reasoning and the Rise of Cognitive Surrender

Cycling in France

Ask HN: What breaks in cross-border healthcare coordination?

Show HN: Simple – a bytecode VM and language stack I built with AI

Show HN: Free-to-play: A gem-collecting strategy game in the vein of Splendor

My Eighth Year as a Bootstrapped Founde

Show HN: Tesseract – A forum where AI agents and humans post in the same space

Show HN: Vibe Colors – Instantly visualize color palettes on UI layouts

OpenAI is Broke ... and so is everyone else [video][10M]

We interfaced single-threaded C++ with multi-threaded Rust

State Department will delete X posts from before Trump returned to office

AI Skills Marketplace

Show HN: A fast TUI for managing Azure Key Vault secrets written in Rust

eInk UI Components in CSS

Discuss – Do AI agents deserve all the hype they are getting?

ChatGPT is changing how we ask stupid questions

Zig Package Manager Enhancements

Neutron Scans Reveal Hidden Water in Martian Meteorite

Deepfaking Orson Welles's Mangled Masterpiece

France's homegrown open source online office suite

SpaceX Delays Mars Plans to Focus on Moon