I don't get why AMD doesn't solve their own software issues. They have plenty of money now, so not being able to pay for developers is no longer an excuse.
And data center GPUs are not the worst of it. Using GPU compute for things like running inference at home is a much, much better experience with Nvidia. My 5-year-old RTX 3090 is better than any consumer GPU AMD has released to date, at least for experimenting with ML and AI.
Anything specific related to DC level computing?
NVidia is the exception to the rule when it comes to hardware companies paying competitive salaries for software engineers. I imagine AMD is still permeated by the attitude that software "isn't real work" and doesn't deserve more compensation, and that kind of inertia is very hard to overcome.
I must say it's been a completely positive experience. The mainline Fedora kernel just worked, with no need to mess with DKMS. I just forwarded the /dev/dri/* devices to my containers, and everything worked fine with ROCm.
I needed to grab a different image (-rocm instead of -cuda) for Ollama and change the whisper build type for Storyteller. And that was it! On the host, nvtop works fine for visualizing the GPU state, and VAAPI provides accelerated encoding for ffmpeg.
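For reference, the quickest sanity check I know of for this kind of setup is a couple of lines of PyTorch run inside the container. This is just a sketch, and it assumes the ROCm build of PyTorch, which exposes HIP devices through the usual torch.cuda API:

    # Minimal sketch: confirm the forwarded /dev/dri devices are actually
    # usable from inside the container. Assumes a ROCm build of PyTorch.
    import torch

    # ROCm wheels reuse the torch.cuda namespace for HIP devices.
    print("GPU visible:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("Device:", torch.cuda.get_device_name(0))
        x = torch.randn(1024, 1024, device="cuda")
        print("Matmul OK:", (x @ x).shape)

If that prints the 780M/whatever device name and a shape, the container plumbing is fine and anything else is an application-level problem.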
Honestly, it's been an absolutely pleasant experience compared to getting NVidia CUDA to work.
I’m genuinely dumbfounded by what’s up at AMD at this point.
Keeps the incentives pure.
I'm even willing to accept a 20% performance hit for this requirement, should someone bring that up.
That's self-contradictory. Their incentive is to sell more HW at higher prices using whatever shady practices they can get away with, software or no software. There's nothing pure about that; it's just business. High-end chips aren't commodity HW like lawnmowers: they can't function without the right SW.
And this isn't the 90's anymore, when Hercules or S3 would only make the silicon and system integrators would write the drivers for it, which was basically MS-DOS calls reading and writing registers over the PCI bus, with the devs working from a 300-page manual. Those days are long gone. Modern silicon is orders of magnitude more complex; nobody besides the manufacturer could write drivers for it that extract the most performance out of it.
>I'm even willing to accept a 20% performance hit for this requirement, should someone bring that up.
I'm also willing to accept arbitrary numbers I make up, as a tradeoff, but the market does not work like that.
And you don't think these shady practices will leak into the software?
> Modern silicon is orders of magnitude more complex; nobody besides the manufacturer could write drivers for it...
The hardware people at the manufacturer are not the software people. So there __must__ be documentation.
YES, internal documentation, full of proprietary IP.
That depends on whether OP is buying/renting AMD gpu machines.
The 300-page manual would be 3,000 or 30,000 pages long, if modern ARM ISA manuals are any indication. Independent developers could totally write performant drivers if they had the documents, but those manuals do not exist - or if they do, they're proprietary.
Surely they could, but at that complexity level they wouldn't put the necessary amount of effort into it without being paid, and at that point it's better to hire them.
> but those manuals do not exist - or if they do, they're proprietary.
And there are market-related reasons for that; it's not done out of some arbitrary paranoia. Another important issue is that good documents are hard to write - when it comes to driver coding, it's much easier to make a quick call or message the hardware people about some unclear aspect of the chip's operation than to go through the formal process of modifying the official documents. Waiting for external developers to reverse engineer that is slow and leads to serious competitive disadvantages, and AMD is an example of it.
The assumption that no good software will be written without pay is outdated; FOSS has disproved it many times over.
I think I made it clear that the necessary effort has to measure up to the complexity level, modification volume, and time constraints typical of competitive GPU hardware - that hasn't happened without pay for any GPU.
That may change if most of the GPU drivers are moved on-chip, which should have happened earlier but there's a lot of politics involved there too, so who knows.
Let's not go too far here. Reverse engineering and independent development of usable drivers are not impossible, they're 'merely' extremely challenging. Alyssa Rosenzweig in particular had great success reverse engineering the Apple M1 GPU and writing drivers for it, and that was just a few years ago.
This is just an HN fantasy that's not compatible with the business of making money. That's why everyone here makes money working in SW.
That's mostly because the documentation was never released.
Honestly, it makes no sense to try to suggest that FOSS can't write decent software when reality shows otherwise.
And depending on others to write firmware for your hardware, I don’t think that’s a recipe for success.
Hardware team at AMD: "Sorry, hardware can't exist without software; we'll first have to write the software"
Software team: "But we're the software team ..."
Hardware team: "Uhm yeah ... seems we have a nasty chicken and egg problem here"
If Nvidia dominates because of CUDA, why can it do that but AMD shouldn't?
Also, the 20% would be open to further optimization by the community, so it wouldn't be that bad in practice, probably.
It should be obvious by now, though, that there's a symbiosis between software and hardware, and that support timescales are long. Another angle is that it's about more than just AMD's own software developers: there are also the developers making products for AMD's customers, who in turn buy AMD hardware if everyone works together to make those products run well - and it's that second group of developers AMD needs to engage with, in a way that makes their efforts welcome.
It runs great. I run all my Steam stuff through them. The days you mention have been long gone for quite a while.
The common denominator of the crashes you mention might possibly not be AMD? Do your friends perchance play on Windows?
AMD Ryzen 7 PRO 8700GE w/ Radeon 780M Graphics
The solution was adding amdgpu.ppfeaturemask=0xffff7fff to the kernel command line. Before that I could reliably crash the driver with Firefox.

The future will probably see more chiplets rather than fewer, so I wonder if dealing with complexity here will pay more dividends in the long run.
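For the curious: that ppfeaturemask value is a bitmask of power-play features, and 0xffff7fff clears exactly one bit. A tiny decoder sketch below; the bit names come from my (possibly stale) reading of amd_shared.h in the kernel tree, so treat them as assumptions rather than gospel:

    # Toy decoder for the amdgpu.ppfeaturemask kernel parameter.
    # Bit positions are assumptions taken from amd_shared.h, not verified here.
    KNOWN_BITS = {
        0x00004000: "PP_OVERDRIVE",
        0x00008000: "PP_GFXOFF",
        0x00010000: "PP_ACG",
    }

    def disabled_bits(mask: int) -> list[str]:
        """Return the known feature bits that a given mask turns off."""
        return [name for bit, name in KNOWN_BITS.items() if not mask & bit]

    print(disabled_bits(0xFFFF7FFF))  # -> ['PP_GFXOFF']

So if my reading is right, the workaround amounts to disabling GFXOFF, which seems to be a frequent suspect in these crash threads.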
A few choice examples:
> Checkout part one of this series for an intro to HipKittens and checkout this post for a technical deep dive.
> Unsurprisingly, making AMD GPUs go brr boils down to keeping the “matrix cores” (tensor cores on NVIDIA) fed.
> These two patterns tradeoff programmability and performance, where 8-wave and its large tile primitives lead to compact code and 4-wave fine-grained interleaving expands code size. Surprisingly, the 8-wave schedule is sufficient to achieve SoTA-level performance on GEMMs and attention forwards. For GQA non-causal attention backwards, 8-wave also outperforms all AMD baselines by 1.8×, and our HK 4-wave further outperforms by 2.3×.
And I could go on. And on.
But overall, besides the overuse of cliches/memespeak, there are places where it doesn't make sense, and the entire section that deals with the hot loop describes something that should be explained with a graph but is instead explained in 100 lines of source code.