I have been trying to use JAX on AMD/ROCm since 2023 and still can't, because of the countless ROCm bugs and the terrible response time of their team.
I wish AMD could succeed, I really do. I'm all for competition and having someone stand up to NVIDIA.
But 3 months ago I found that on an AMD MI250X, linear algebra operations like svd or eigh are 10x to 40x slower than on an NVIDIA A100, a GPU with roughly 20% of the MI250X's stated TFLOPs.
I reported it and got a vague response. 3 months later the bug is still largely there, and AMD still hasn't given a clear answer.
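The gap is easy to see with a few lines of JAX run once on each GPU. Roughly something like this sketch (the matrix size and timing details here are illustrative, not the exact benchmark I reported):

    import time

    import jax
    import jax.numpy as jnp

    # Illustrative numbers only: size, dtype and timing loop are made up
    # here, not the exact ones from my bug report.
    n = 4096
    key = jax.random.PRNGKey(0)
    a = jax.random.normal(key, (n, n), dtype=jnp.float32)
    h = a @ a.T  # symmetric matrix so eigh is applicable

    def timed(f, x):
        # First call includes compilation; time the second one.
        jax.tree_util.tree_map(lambda y: y.block_until_ready(), f(x))
        t0 = time.perf_counter()
        jax.tree_util.tree_map(lambda y: y.block_until_ready(), f(x))
        return time.perf_counter() - t0

    print("svd :", timed(jnp.linalg.svd, a), "s")
    print("eigh:", timed(jnp.linalg.eigh, h), "s")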
I understand AMD is prioritising their ML customers, who only use matrix multiplications, but since AMD somehow managed to convince France to buy HPC supercomputers from them, they should be honest and commit to decent support.
PhilipVinc•2h ago