Why CUDA translation wont unlock AMD

https://eliovp.com/why-cuda-translation-wont-unlock-amds-real-potential/

32•JonChesterfield•1w ago

Comments

pixelpoet•49m ago

Actual article title says "won't"; wont is a word meaning habit or proclivity.

InvisGhost•28m ago

In situations like this, I try to focus on whether the other person understood what was being communicated rather than splitting hairs. In this case, I don't think anyone would be confused.

measurablefunc•47m ago

Why can't it be done w/ AI? Why does it need to be maintained w/ manual programming? Take the ROCm specification, take your CUDA codebase, let one of the agentic AIs translate it all into ROCm or the AMD equivalent.

jsheard•42m ago

The article is literally about how rote translation of CUDA code to AMD hardware will always give sub-par performance. Even if you wrangled an AI into doing the grunt work for you, porting heavily-NV-tuned code to not-NV-hardware would still be losing strategy.

measurablefunc•20m ago

The point of AI is that it is not a rote translation & 1:1 mapping.

cbarrick•41m ago

Has this been done successfully at scale?

There's a lot of handwaving in this "just use AI" approach. You have to figure out a way to guarantee correctness.

measurablefunc•19m ago

There are tons of test suites so if the tests pass then that provides a reasonable guarantee of correctness. Although it would be nice if there was also proof of correctness for the compilation from CUDA to AMD.

bigyabai•29m ago

Because it doesn't work like that. TFA is an explanation of how GPU architecture dictates the featureset that is feasibly attainable at runtime. Throwing more software at the problem would not enable direct competition with CUDA.

measurablefunc•17m ago

I am assuming that is all part of the specification that the agentic AI is working with & since AGI is right around the corner I think this is a simple enough problem that can be solved with AI.

Blackthorn•16m ago

I don't know why you're being downvoted because even if you're Not Even Wrong, that's exactly the sort of thing that has been endlessly presented by people trying to sell AI as something that AI will absolutely do for us.

measurablefunc•10m ago

Let's see who else manages to catch on to the real point I'm making.

j16sdiz•57s ago

The same as "Why just outsourcing it to <some country >"

AI aint magic.

You need more effort to manage, test and validate that.

outside1234•41m ago

Are the hyperscalers really using CUDA? This is what really matters. We know Google isn't. Are AWS and Azure for their hosting of OpenAI models et al?

bigyabai•24m ago

> We know Google isn't.

Google isn't internally, so far as we know. Google's hyperscaler products have long offered CUDA options, since the demand isn't limited to AI/tensor applications that cannibalize TPU's value prop: https://cloud.google.com/nvidia

lvl155•25m ago

Let’s just say what it is: devs are too constrained to jump ship right now. It’s a massive land grab and you are not going to spend time tinkering with CUDA alternatives when even a six-month delay can basically kill your company/organization. Google and Apple are two companies with enough resources to do it. Google isn’t because they’re keeping it proprietary to their cloud. Apple still have their heads stuck in sand barely capable of fixing Siri.

kj4ips•24m ago

I agree pretty strongly. A translation layer like this is making an intentional trade: Giving up performance and HW alignment for less lead time and effort to make a proper port.

martinald•8m ago

Perhaps I'm misunderstanding the market dynamics; but isn't AMDs real opp inference over research?

Training etc still happens on NVDA but inference is somewhat easy to do on vLLM et al with a true ROCm backend with little effort?

mandevil•5m ago

Yeah, ROCm focused code will always beat generic code compiled down. But this is a really difficult game to win.

For example, Deepseek R-1 released optimized for running on Nvidia HW, and needed some adaption to run as well on ROCm. This was for the exact same reasons that ROCm code will beat generic code compiled into ROCm, in the same way. Basically the Deepseek team, for their own purposes, created R-1 to fit Nvidia's way of doing things (because Nvidia is market-dominant) on their own. Once they released, someone like Elio or AMD would have to do the work of adapting the code to run best on ROCm.

For more established players who weren't out-of-left-field surprises like Deepseek, e.g. Meta's Llama series, mostly coordinate with AMD ahead of release day, but I suspect that AMD still has to pay for the engineering work themselves while Meta does the work to make it run on Nvidia themselves. This simple fact, that every researcher makes their stuff work on CUDA, but AMD or Elio has to do the work to move it over to get it to be as performant on AMD, is what keeps people in the CUDA universe.

Loose wire leads to blackout, contact with Francis Scott Key bridge

Verifying your Matrix devices is becoming mandatory

Researchers discover security vulnerability in WhatsApp

Europe is scaling back GDPR and relaxing AI laws

Workday to Acquire Pipedream

Building more with GPT-5.1-Codex-Max

Why CUDA translation wont unlock AMD

Meta Segment Anything Model 3

Jailbreaking AI Models to Phish Elderly Victims

How Slide Rules Work

Roblox Requires Age Checks for Communication, Ushering in New Safety Standard

Static Web Hosting on the Intel N150: FreeBSD, SmartOS, NetBSD, OpenBSD and Linu

Cognitive and mental health correlates of short-form video use

Larry Summers resigns from OpenAI board

Thunderbird adds native Microsoft Exchange email support

Racing karts on a Rust GPU kernel driver

The patent office is about to make bad patents untouchable

Gaming on Linux has never been more approachable

Launch HN: Mosaic (YC W25) – Agentic Video Editing

Vortex: An extensible, state of the art columnar file format

Linux Career Opportunities in 2025: Skills in High Demand

The Death of Arduino?

How to identify a prime number without a computer

Pozsar's Bretton Woods III: The Framework

A $1k AWS mistake

Branching with or Without PII: The Future of Environments

Tailscale Down

What Killed Perl?

Exploring the limits of large language models as quant traders

The Future of Programming (2013) [video]

Loose wire leads to blackout, contact with Francis Scott Key bridge

Verifying your Matrix devices is becoming mandatory

Researchers discover security vulnerability in WhatsApp

Europe is scaling back GDPR and relaxing AI laws

Workday to Acquire Pipedream

Building more with GPT-5.1-Codex-Max

Why CUDA translation wont unlock AMD

Meta Segment Anything Model 3

Jailbreaking AI Models to Phish Elderly Victims

How Slide Rules Work

Roblox Requires Age Checks for Communication, Ushering in New Safety Standard

Static Web Hosting on the Intel N150: FreeBSD, SmartOS, NetBSD, OpenBSD and Linu

Cognitive and mental health correlates of short-form video use

Larry Summers resigns from OpenAI board

Thunderbird adds native Microsoft Exchange email support

Racing karts on a Rust GPU kernel driver

The patent office is about to make bad patents untouchable

Gaming on Linux has never been more approachable

Launch HN: Mosaic (YC W25) – Agentic Video Editing

Vortex: An extensible, state of the art columnar file format

Linux Career Opportunities in 2025: Skills in High Demand

The Death of Arduino?

How to identify a prime number without a computer

Pozsar's Bretton Woods III: The Framework

A $1k AWS mistake

Branching with or Without PII: The Future of Environments

Tailscale Down

What Killed Perl?

Exploring the limits of large language models as quant traders

The Future of Programming (2013) [video]

Why CUDA translation wont unlock AMD

Comments