CUDA-oxide: Nvidia's official Rust to CUDA compiler

https://nvlabs.github.io/cuda-oxide/index.html

147•adamnemecek•2h ago

Comments

arpadav•1h ago

This is amazing.. ive been working with custom CUDA kernels and https://crates.io/crates/cudarc for a long time, and this honestly looks like it could be a near drop-in replacement.

im especially curious how build times would compare? Most Rust CUDA crates obv rely on calling CMake or nvcc, which can make compilation painfully slow. coincidentally, just last week i was profiling build times and found that tools like sccache can dramatically reduce rebuild times by caching artifacts - but you still end up paying for expensive custom nvcc invocations (e.g. candle by hugging face calls custom nvcc command in their kernel compilation): https://arpadvoros.com/posts/2026/05/05/speeding-up-rust-whi...

jauntywundrkind•16m ago

Do other people agree cuda-oxide looks like a near dorp in replacement for cudarc?

That would be amazing, but probably not imo complementarily so.

I am curious what distinguished cuda-oxide. Beyond it being totally under nv control.

arpadav•8m ago

perhaps not drop-in, but all my workflows with cudarc have always been "i make cuda kernel, i use cudarc for ffi to said kernels, i call via rust" - which for this case is pretty analogous

briefly looking at the repo, looks like the main workflow is using rustc-codegen-cuda to convert rust -> MIR -> pliron IR -> LLVM IR -> PTX, which is embedded in the host binary, where then cuda-core loads embedded PTX at runtime onto the GPU

but, if you arent directly making cuda kernels and just want cudarc for either calling existing kernels or other cuda driver api access then cudarc is lighter-weight option? or just use one of the sub-crates in this repo like cuda-core for those apis

the__alchemist•7m ago

I am observing the same from the article... is it heavily inspired by Cudarc, i.e. is this intentional, or are we reading too much into this, given Cudarc is a light abstraction over the CUDA api?

the__alchemist•11m ago

Cudarc slaps!

> Most Rust CUDA crates obv rely on calling CMake or nvcc, which can make compilation painfully slow.

I anecdotally haven't hit this; see the `cuda_setup` crate I made to handle the build scripts; it is a simple `build.rs` which only recompiles if the file changes, and it's a tiny compile time (compared to the rust CPU-side code)

arpadav•7m ago

i'll have to check this out, thanks!

rvz•53m ago

This is a bit good for Rust if you want to use the language with CUDA. The problem is, it still doesn't really move the needle if you really don't like running closed source drivers and runtime binaries and care about open source.

Continuing from this discussion [0], this only makes it a Rust or a CUDA problem rather than a Python, CUDA and a PyTorch one if there bug in one of them.

Yet at the end of the day, it still uses Nvidia's closed source CUDA compiler 'nvcc' which they will never open source. A least Mojo promises to open source their own compiler which compiles to different accelerators with multiple backend support.

Unlike this...but uses Rust.

[0] https://news.ycombinator.com/item?id=48067228

bigyabai•46m ago

> it still doesn't really move the needle if you really don't like running closed source drivers and runtime binaries

Those people probably did not buy an Nvidia GPU for themselves. It should be common knowledge that the "Open" Nvidia drivers still run gigantic firmware blobs to dispatch complex workloads. And Nouveau is close to useless for GPGPU compute.

pjmlp•40m ago

Mojo remains to be seen if it isn't another Swift for Tensorflow, apparently 1.0 won't even support Windows properly.

semiinfinitely•38m ago

who the fuck uses windows

bigyabai•32m ago

The majority of computer owners on planet Earth

OtomotO•28m ago

But also the majority of programmers?

bigyabai•20m ago

In AI-focused fields like business analytics and data science, yeah.

beanjuiceII•15m ago

many people

the__alchemist•5m ago

IMO this has nothing to do with open source as an ideology; just a practical (and official?) lib for adding GPU interaction to your rust programs.

rowanG077•42m ago

Personally I really don't want new GPU languages that do not have AD as a first class citizen. I mean rust is an improvement over C++ CUDA but still.

TallGuyShort•40m ago

Sorry, what is AD in this context?

edit: oh, automatic differentiation?

huflungdung•38m ago

Active Directory

vimarsh6739•35m ago

Really hard to find alternatives to Julia for AD as a first class citizen

hellohello2•29m ago

I think the parent is mostly referring to solutions like Slang.D

erk__•29m ago

There is actually work on adding autodiff to Rust, maybe not really first class citizen, but at least build in: https://doc.rust-lang.org/std/autodiff/index.html (it is still at a pre-RFC stage so it is not something that soon will be added)

magnio•17m ago

Incredible, I have never heard of std::autodiff before. Isn't it rare for a programming language to provide AD within the standard library? Even Julia doesn't have it built-in, I wouldn't expect Rust out of all languages to experiment it in std.

mathisfun123•15m ago

every GPU related post has a comment which makes my eyes roll all the way back. this is the one for this post.

cyber_kinetist•38m ago

I'm quite interested in how they dealt with Rust's memory model, which might not neatly map to CUDA's semantics. Curious what the differences are compared to CUDA C++, and if the Rust's type system can actually bring more safety to CUDA (I do think writing GPU kernels is inherently unsafe, it's just too hard to create a safe language because of how the hardware works, and because of the fact that you're hyper-optimizing all the time)

the__alchemist•11m ago

I think it depends on the objective. My pattern-matching brain says there will be interest in addressing this.

From my perspective of someone who writes applications in Rust and sometimes wants to use GPU compute in these applications: I don't care. If we can leverage the memory model or ownership model in a low-friction way, that's fine. If it makes it a high friction experience, I would prefer not to do it that way.

The baseline is IMO how Cudarc currently does it. I don't think there is much memory management involved; it's just imperative syntax wrapping FFI, and some lines in the build script to invoke nvcc if the kernels change.

wrs•9m ago

This is explained in some detail in the docs. There is a safe layer, a mostly safe layer, and an unsafe layer. Some clunkiness is needed for safe-yet-parallel work that they couldn’t easily fit into the Rust Send/Sync model.

whatever1•37m ago

Why do we bother with programming languages today? Why not have the LLMs just write assembly code and skip the human readable part? We are not reviewing it anymore anyway.

vjsrinivas•35m ago

Is this a serious question or are you just trolling?

regenschutz•35m ago

I mean, AI is not good at writing x86-64 assembly code. Last time I tried (with both Claude and ChatGPT), the AI failed to even create basic programs other than Hello World.

bee_rider•34m ago

This is a Rust to CUDA converter so I guess it is for codes where the programmer wants it to function properly (Rust) and have good performance (CUDA).

It’s just a matter of different workflows for different users and application.

hellohello2•31m ago

I get what you mean but I think if anything AI pairs extremely well with strongly typed languages that are at times cumbersome for humans, but decrease the latency at which AI can get feedback on its code. In my (very) limited experience Rust is an excellent target for AI codegen.

strbean•30m ago

A lot of really good reasons:

1) Higher level code is easier for LLMs to review and iterate upon. The more the intent is clear from the code, the easier it is for humans and LLMs to work with.

2) LLMs get stuck or fail to solve a problem sometimes. It is preferable to have artifacts that humans can grok without the massive extra effort of parsing out assembly code.

3) Assembly code varies massively across targets. We want provable, deterministic transformation from the intent (specified in a higher level language) to the target assembly language. LLMs can't reliably output many artifacts for different platforms that behave the same.

4) Hopefully, we are still reviewing the code output by LLMs to some extent.

jcgrillo•17m ago

I'd add to that

1.5) Having a compiler in the loop that does things like enforcing type constraints (and in the case if Rust in particular, therefore memory safety guarantees) is really useful both for humans and LLMs.

OtomotO•23m ago

Because when this idiotic hypemachinery finally dies an agonising, painful death, some of us still want to work with computers

Almondsetat•10m ago

Feel free to post a project of yours where you gave a bunch of prompts to an LLM and it produced a working application written in assembly without you having to check for anything

adamnemecek•23m ago

Here's the repo link https://github.com/NVlabs/cuda-oxide

the__alchemist•14m ago

Hell yea! I have been doing it with Cudarc (Kernels) and FFI (cuFFT). Using manual [de]serialization between byte arrays and rust data structs. I hope this makes it lower friction!

tiffanyh•10m ago

Re: Rust (and "safe" programming languages).

Does anyone have more details on NVIDIAs use of Spark/Ada?

All I can find is what's listed below:

https://www.adacore.com/case-studies/nvidia-adoption-of-spar...

CUDA-oxide: Nvidia's official Rust to CUDA compiler

Nullsoft, 1997-2004 AOL kills off the last maverick tech company (2004)

Ratty – A terminal emulator with inline 3D graphics

Show HN: TikTok but for Scientific Papers

Training an LLM in Swift, Part 1: Taking matrix mult from Gflop/s to Tflop/s

Gmail registration now requires scanning a QR code and sending a text message

Bild AI (YC W25) Is Hiring Founding Product Engineers

AMÁLIA and the future of European Portuguese LLMs

Venom and Hot Peppers Offer a Key to Killing Resistant Bacteria

Should you leave red herrings about yourself online?

Interfaze: A new model architecture built for high accuracy at scale

I'm going back to writing code by hand

Building a web server in aarch64 assembly to give my life (a lack of) meaning

Holding Community Space

Software engineering may no longer be a lifetime career

Running local models on an M4 with 24GB memory

The greatest shot in television: James Burke had one chance to nail this scene (2024)

Students Boo Commencement Speaker After She Calls AI Next Industrial Revolution

Guitar tuner that uses phone accelerometer

Hardware Attestation as Monopoly Enabler

Microsoft Israel chief leaves amid ethical controversy

An AI coding agent, used to write code, needs to reduce your maintenance costs

Why we lose our friends as we age (2023)

Obsidian plugin was abused to deploy a remote access trojan

Local AI needs to be the norm

Scaffold a 1990s Geocities-themed static website

Mythos Finds a Curl Vulnerability

Bliss (Photograph)

Classification of amino acids

A.I. note takers are making lawyers nervous

CUDA-oxide: Nvidia's official Rust to CUDA compiler

Nullsoft, 1997-2004 AOL kills off the last maverick tech company (2004)

Ratty – A terminal emulator with inline 3D graphics

Show HN: TikTok but for Scientific Papers

Training an LLM in Swift, Part 1: Taking matrix mult from Gflop/s to Tflop/s

Gmail registration now requires scanning a QR code and sending a text message

Bild AI (YC W25) Is Hiring Founding Product Engineers

AMÁLIA and the future of European Portuguese LLMs

Venom and Hot Peppers Offer a Key to Killing Resistant Bacteria

Should you leave red herrings about yourself online?

Interfaze: A new model architecture built for high accuracy at scale

I'm going back to writing code by hand

Building a web server in aarch64 assembly to give my life (a lack of) meaning

Holding Community Space

Software engineering may no longer be a lifetime career

Running local models on an M4 with 24GB memory

The greatest shot in television: James Burke had one chance to nail this scene (2024)

Students Boo Commencement Speaker After She Calls AI Next Industrial Revolution

Guitar tuner that uses phone accelerometer

Hardware Attestation as Monopoly Enabler

Microsoft Israel chief leaves amid ethical controversy

An AI coding agent, used to write code, needs to reduce your maintenance costs

Why we lose our friends as we age (2023)

Obsidian plugin was abused to deploy a remote access trojan

Local AI needs to be the norm

Scaffold a 1990s Geocities-themed static website

Mythos Finds a Curl Vulnerability

Bliss (Photograph)

Classification of amino acids

A.I. note takers are making lawyers nervous

CUDA-oxide: Nvidia's official Rust to CUDA compiler

Comments