AMD GPU Debugger

https://thegeeko.me/blog/amd-gpu-debugging/

279•ibobev•2mo ago

Comments

snarfy•2mo ago

Is there not an official tool from AMD?

c2h5oh•2mo ago

GDB supports it https://sourceware.org/gdb/current/onlinedocs/gdb.html/AMD-G...

You also get UMR from AMD https://gitlab.freedesktop.org/tomstdenis/umr

There are also a bunch of other tools: https://gpuopen.com/radeon-gpu-detective/ https://gpuopen.com/news/introducing-radeon-developer-tool-s...

slavik81•2mo ago

It's worth noting that upstream gdb (and clang) are somewhat limited in GPU debugging support because they only use (and emit) standardized DWARF debug information. The DWARF standard will need updates before gdb and clang can reach parity with the AMD forks, rocgdb and amdclang, in terms of debugging support. It's nothing fundamental, but the AMD forks use experimental DWARF features and the upstream projects do not.

It's a little out of date now, but Lance Six had a presentation about the state of AMD GPU debugging in upstream gdb at FOSDEM 2024. https://archive.fosdem.org/2024/events/attachments/fosdem-20...

core-explorer•1mo ago

The extensions were voted into the upcoming DWARF 6 standard, e.g. https://dwarfstd.org/issues/211206.1.html

thegeeko•2mo ago

AMD's gdb is an actual debugger, but it only works with applications that emit DWARF and use the amdkfd KMD, i.e. it doesn't work with graphics. None of the rest are actual debuggers. UMR does support wave stepping, but it doesn't try to be a shader debugger; it's a tool for driver developers. And the AMD tools don't have any debugging capabilities.

almostgotcaught•2mo ago

> After searching for solutions, I came across rocgdb, a debugger for AMD’s ROCm environment.

It's like the 3rd sentence in the blog post.......

djmips•2mo ago

To be fair, it wasn't clear that was an official AMD debugger, and besides, that's only for debugging ROCm applications.

almostgotcaught•2mo ago

This sentence doesn't make any sense: a) ROCm is an AMD product; b) ROCm "applications" are GPU "applications".

fc417fc802•2mo ago

But not all GPU applications are ROCm applications (I would think).

I can certainly understand OP's confusion. Navigating parts of the GPU ecosystem that are new to you can be incredibly confusing.

thegeeko•2mo ago

There are two AMD KMDs (kernel mode drivers) in Linux: amdkfd and amdgpu. Graphics applications use amdgpu, which isn't supported by AMD's gdb. It also has the limitation of requiring DWARF, and the Mesa/AMD UMDs don't generate that.

dman•1mo ago

Do you know which one rocm uses?

thegeeko•1mo ago

amdkfd

dman•1mo ago

Thank you!
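To make the amdkfd/amdgpu split above a bit more concrete, here is a minimal Python sketch (the helper name amd_kmd_interfaces is hypothetical) that checks which interfaces a Linux box exposes. It assumes ROCm compute goes through the amdkfd device node /dev/kfd, while graphics stacks open DRM render nodes under /dev/dri. A render node alone doesn't prove amdgpu is behind it, since other vendors' DRM drivers create them too.

```python
import os

# Hypothetical helper: report which kernel-driver interfaces are present.
# Assumption: ROCm/HSA compute uses the amdkfd node at <root>/dev/kfd, while
# graphics drivers (e.g. Mesa's RADV/radeonsi) open DRM render nodes
# such as <root>/dev/dri/renderD128.
def amd_kmd_interfaces(root: str = "/") -> dict:
    dev = os.path.join(root, "dev")
    dri = os.path.join(dev, "dri")
    render_nodes = []
    if os.path.isdir(dri):
        # card* nodes are the primary (modesetting) nodes; skip them
        render_nodes = sorted(n for n in os.listdir(dri) if n.startswith("renderD"))
    return {
        "kfd": os.path.exists(os.path.join(dev, "kfd")),  # compute path (ROCm, rocgdb)
        "render_nodes": render_nodes,                       # graphics path (DRM)
    }

if __name__ == "__main__":
    print(amd_kmd_interfaces())
```

The root parameter only exists so the check can be pointed at a fake directory tree; on a real machine you would call it with the default.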

shetaye•2mo ago

There also exists cuda-gdb[1], a first-party GDB for NVIDIA's CUDA. I've found it to be pretty good. Since CUDA uses a threading model, it works well with the GDB thread ergonomics (though you can only single-step at the warp granularity IIRC by the nature of SM execution).

[1] https://docs.nvidia.com/cuda/cuda-gdb/index.html

danjl•2mo ago

For NVIDIA cards, you can use NSight. There's also RenderDoc that works on a large number of GPUs.

_zoltan_•2mo ago

nsys and nvtx are awesome.

many don't know but you can use them without GPUs :)

KeplerBoy•1mo ago

That's true. It's pretty good for all kinds of profiling. I especially like the Python GIL profiler of nsys.

ahartmetz•1mo ago

RenderDoc is very cool, but more of a high level debugger, I guess? It's also good to analyze performance issues, e.g. when working with QML and QSG_VISUALIZE=overdraw / batches (both very high level) don't cut it anymore, or to get a different perspective. Watching a scene getting drawn API call by API call is fun.

justsid•1mo ago

RenderDoc is mostly a frame debugger, although it does support stepping through shaders as well which can be super useful. But for real performance analysis I would use PIX if you target D3D12 or RGP and Nsight for Vulkan. I'm at a Vulkan and Metal only shop and I wish I could use PIX for my every day work, since it also has excellent support for Intel GPUs.

whalesalad•2mo ago

Tangent: is anyone using a 7900 XTX for local inference/diffusion? I finally installed Linux on my gaming pc, and about 95% of the time it is just sitting off collecting dust. I would love to put this card to work in some capacity.

qskousen•2mo ago

I've done it with a 6800XT, which should be similar. It's a little trickier than with an Nvidia card (because everything is designed for CUDA) but doable.

FuriouslyAdrift•2mo ago

You'd be much better off with any decent nVidia card than with the 7900 series.

AMD doesn't have a unified architecture across GPU and compute like nVidia.

AMD compute cards are sold under the Instinct line and are vastly more powerful than their GPUs.

Supposedly, they are moving back to a unified architecture in the next generation of GPU cards.

shmerl•2mo ago

tinygrad disagrees.

aystatic•2mo ago

Name 3 things using tinygrad that aren't openpilot.

Joona•2mo ago

I tested some image and text generation models, and generally things just worked after replacing the default torch libraries with AMD's rocm variants.

universa1•2mo ago

try it with ramalama[1]. worked fine here with a 7840u and a 6900xt.

[1] https://ramalama.ai/

Gracana•2mo ago

I bought one when they were pretty new and I had issues with rocm (iirc I was getting kernel oopses due to GPU OOMs) when running LLMs. It worked mostly fine with ComfyUI unless I tried to do especially esoteric stuff. From what I've heard lately though, it should work just fine.

jjmarr•2mo ago

I've been using it for a few years on Gentoo. There were challenges with Python 2 years ago, but over the past year it's stabilized and I can even do img2video which is the most difficult local inference task so far.

Performance-wise, the 7900 xtx is still the most cost effective way of getting 24 gigabytes that isn't a sketchy VRAM mod. And VRAM is the main performance barrier since any LLM is going to barely fit in memory.
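As a back-of-the-envelope check on the VRAM point, here is a rough Python sketch (weight_gib is a hypothetical helper) that estimates the memory needed just to hold model weights at a few quantization levels. It assumes weights dominate and ignores the KV cache, activations, and runtime overhead, so real usage will be higher.

```python
# Rough lower bound on memory needed for model weights alone.
# Assumption: KV cache, activations and runtime overhead are ignored.
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30  # bytes -> GiB

VRAM_GIB = 24  # 7900 XTX

for bits in (16, 8, 4):
    need = weight_gib(24, bits)  # a 24B-parameter model
    print(f"24B @ {bits:>2}-bit: {need:5.1f} GiB, fits in {VRAM_GIB} GiB: {need < VRAM_GIB}")
```

Under those assumptions a 24B-parameter model needs about 44.7 GiB at 16-bit, 22.4 GiB at 8-bit, and 11.2 GiB at 4-bit, so on a 24 GB card the 8-bit version only just fits before counting anything else.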

Highly suggest checking out TheRock. There's been a big rearchitecting of ROCm to improve the UX/quality.

androiddrew•2mo ago

Bought a Radeon r9700. 32GB vram and it does a good job.

veddan•2mo ago

For LLMs, I just pulled the latest llama.cpp and built it. Haven't had any issues with it. This was quite recently, though; things used to be a lot worse, as I understand it.

bialpio•1mo ago

I've only played with using 7900XT for locally hosting LLMs via ollama (this is on Windows, mind you) and things worked fine - e.g. devstral:24b was decently fast. I haven't had time to use it for anything even semi-serious though so cannot comment on how useful it actually is.

mitchellh•2mo ago

Non-AMD, but Metal actually has a [relatively] excellent debugger and general dev tooling. It's why I prefer to do all my GPU work Metal-first and then adapt/port to other systems after that: https://developer.apple.com/documentation/Xcode/Metal-debugg...

I'm not like a AAA game developer or anything so I don't know how it holds up in intense 3D environments, but for my use cases it's been absolutely amazing. To the point where I recommend people who are dabbling in GPU work grab a Mac (Apple Silicon often required) since it's such a better learning and experimentation environment.

I'm sure it's linked somewhere there, but in addition to traditional debugging, you can actually emit formatted log strings from your shaders and they show up interleaved with your app logs. Absolutely bonkers.

The app I develop is GPU-powered on both Metal and OpenGL systems and I haven't been able to find anything that comes near the quality of Metal's tooling in the OpenGL world. A lot of stuff people claim is equivalent but for someone who has actively used both, I strongly feel it doesn't hold a candle to what Apple has done.

mattbee•2mo ago

My initiation into shaders was porting some graphics code from OpenGL on Windows to PS5 and Xbox, and (for your NDA and devkit fees) they give you some very nice debuggers on both platforms.

But yes, when you're stumbling around a black screen, tooling is everything. Porting bits of shader code between syntaxes is the easy bit.

Can you get better tooling on Windows if you stick to DirectX rather than OpenGL?

mitchellh•2mo ago

> Can you get better tooling on Windows if you stick to DirectX rather than OpenGL?

My app doesn't currently support Windows. My plan was to use the full DirectX suite when I get there and go straight to D3D and friends. I lack any experience on Windows, so I'd love it if someone who knows both macOS and Windows could compare GPU debugging!

speps•2mo ago

Windows has PIX for Windows; PIX has been the name of the GPU debugger since the Xbox 360. The Windows version is similar, but it relies on debug layers that need to be GPU specific, which is usually handled automatically. Because of that it's not as deep as the console version, but it lets you get by. Most people use RenderDoc on supported platforms, though (Linux and Windows). It supports most APIs you can find on these platforms.

pjmlp•2mo ago

Pix predates the XBox.

pjmlp•2mo ago

Yes, Pix.

https://devblogs.microsoft.com/pix/

This is yet another problem with Khronos APIs: they expect each vendor to come up with a debugger, and some do, some don't.

At least nowadays there is RenderDoc.

On the Web, after a decade, it is still pixel debugging, or trying to reproduce the bug on a native API, because why bother with such devtools.

slightlygrilled•1mo ago

On web there is SpectorJS https://spector.babylonjs.com/

It offers the basics, but at least it works across devices. You can also trigger the traces from code and save the output, then load it in the extension. Very useful for debugging mobile.

You can just about run Chrome through Nvidia's Nsight (of course you're not debugging WebGL, but whatever it's translated to on the platform), although I recently tried again and it seems to fail...

These were the command-line args I got Nsight to pass to Chrome to make it work:

" --disable-gpu-sandbox --disable-gpu-watchdog --enable-dawn-features=emit_hlsl_debug_symbols,disable_symbol_renaming --no-sandbox --disable-direct-composition --use-angle=vulkan <URL> "

But yeah, I really, really wish the tooling was better, especially for performance tracing; currently it's just disabling and enabling things and guessing...

pjmlp•1mo ago

SpectorJS is kind of abandoned nowadays, it hardly has changed and doesn't support WebGPU.

Running the whole browser rendering stack is a masochistic exercise; I'd rather re-code the algorithm in native code, or go back to pixel debugging.

I would argue that the bad state of tooling, and how browsers blacklist users' systems, is a big reason studios would rather try streaming than rendering in the browser.

slightlygrilled•1mo ago

Yeah... I tried to extend Spector's UI; the code base is "interesting", and simple changes seemed way harder than they should have been. Shame, though, as it's really the only tool like it for the web.

My favourite, though, is Safari: the graphics driver crashes all the time, and the dev tools normally crash as well, so you have zero idea what is happening.

And I've found that when the graphics crash, the whole browser's graphics state becomes unreliable until you force-close Safari and reopen it.

billti•2mo ago

It's a full-featured and beautifully designed experience, and when it works it's amazing. However, it regularly freezes or hangs for me, and I've lost count of the number of times I've had to 'force quit' Xcode or it's just outright crashed. Also, for anything non-trivial it often refuses to profile, and I have to try to write a minimal repro to get it to capture anything.

I am writing compute shaders, though, where one command buffer can run for seconds repeatedly processing over a 1GB buffer, and it seems the tools are heavily geared towards graphics work, where the workload per frame is much lighter. (With all the AI focus, hopefully they'll start addressing this use case more.)

mitchellh•2mo ago

> However, it regularly freezes or hangs for me, and I've lost count of the number of times I've had to 'force quit' Xcode or it's just outright crashed.

This has been my experience too. It isn't often enough to diminish its value for me since I have basically no comparable options on other platforms, but it definitely has some sharp (crashy!) edges.

billti•2mo ago

I didn't even notice who I was replying to at first - so let me start by saying thank you for Ghostty. I spend a great deal of my day in it, and it's a beautifully put together piece of software. I appreciate the work you do and admire your attitude to software and life in general. Enjoy your windfall, ignore the haters, and my best wishes to you and your family with the upcoming addition.

The project I'm mostly working on uses the wgpu crate, https://github.com/gfx-rs/wgpu, which may be of interest if writing cross-platform GPU code. (Though obviously if using Rust, not Zig). With it my project easily runs on Windows (via DX12), Linux (via Vulkan), macOS (via Metal), and directly on the web via Wasm/WebGPU. It is a "lowest common denominator", but good enough for most use-cases.

That said, even with simple shaders I had to implement some workarounds for Xcode issues (e.g. https://github.com/gfx-rs/wgpu/issues/8111). But it's still vastly preferable to other debugging approaches and has been indispensable in tracking down a few bugs.

hoppp•2mo ago

Is your code easy to transfer to other environments? Apple vendor lock-in is not a great place for development if the end product runs on servers, unlike AMD GPUs, which can be found on the backend. The same goes for games: most gamers have either an AMD or an Nvidia graphics card, as playing on a Mac is still rare, so the priority should be supporting those platforms.

It's probably awesome to use Metal and everything, but the vendor lock-in sounds like an issue.

mitchellh•2mo ago

It has been easy. All modern GPU APIs are basically the same now unless you're relying on the most cutting-edge features. I've found converting between MSL, OpenGL (4.3+), and WebGPU to be trivial. Also, LLMs are pretty good at it on the first pass.

hoppp•2mo ago

That's pretty cool then!

nelup20•2mo ago

Yeah, Xcode's Metal debugger is fantastic, and Metal itself is imo a really nice API :]. For whatever reason it clicked much better for me compared to OpenGL.

Have you tried RenderDoc for the OpenGL side? Afaik that's the equivalent of Xcode's debugger for Vulkan/OpenGL.

Archit3ch•2mo ago

Same, Metal is a clean and modern API.

Is anyone here doing Metal compute shaders on iPad? Any tips?

59nadir•1mo ago

> To the point where I recommend people who are dabbling in GPU work grab a Mac (Apple Silicon often required) since it's such a better learning and experimentation environment.

I don't know, buying a ridiculously overpriced computer with the least relevant OS on it just to debug graphics code written in an API not usable anywhere else doesn't seem like a good idea to me.

For anyone who seriously does want to get into this stuff, just go with Windows (or Linux if you're tired of what Microsoft is turning Windows into, you can still write Win32 applications and just use VK for your rendering, or even DX12 but have it be translated, but then you have to debug VK code while using DX12), learn DX12 or Vulkan, use RenderDoc to help you out. It's not nearly as difficult as people make it seem.

If you've got time you can learn OpenGL (4.6) with DSA to get a bit of perspective why people might feel the lower-level APIs are tedious, but if you just want to get into graphics programming just learn DX12/VK and move on. It's a lower-level endeavor and that'll help you out in the long run anyway since you've got more control, better validation, and the drivers have less of a say in how things happen (trust me, you don't want the driver vendors to decide how things happen, especially Intel).

P.S.: I like Metal as an API; I think it's the closest any modern API got to OpenGL while still being acceptable in other ways (I think it has pretty meh API validation, though). The problem is really that they never exported the API so it's useless on the actual relevant platforms for games and real interactive graphics experiences.

omneity•2mo ago

Slightly related, I made a monitor[0] for AMD gpus with a nifty chart. I had many issues with nvtop, it is a bit too strict for some situations and ends up crashing too often.

0: https://github.com/omarkamali/picomon

pjmlp•2mo ago

We surely do: Metal, CUDA, and PIX all have one, and PS/Switch also have their own.

This is exactly yet another reason why researchers prefer CUDA to the alternatives.

https://developer.nvidia.com/nsight-systems
