Built in ~10 hours using dlopen/dlsym for dynamic loading. 100% test pass rate.
The goal: break NVIDIA's CUDA vendor lock-in and make AMD GPUs viable for
existing CUDA workloads without months of porting effort.
bigyabai•2mo ago
> ## First Comment (Expand on technical details)
> Post this as your first comment after submitting:
lmfao
throwaway2027•2mo ago
Holy AI Slop
throwaway2027•2mo ago
[flagged]
tomhow•2mo ago
Please don't give oxygen to trolls. We detached and banned the account. Any time you see this kind of thing, flag the comment, and if you want to be extra-helpful, email us – hn@ycombinator.com.
ArchitectAI•2mo ago
It intercepts CUDA API calls at runtime and translates them to HIP/rocBLAS/MIOpen.
No source code needed. No recompilation. Just:
Currently supports:
- 38 CUDA Runtime functions
- 15+ cuBLAS operations (matrix multiply, etc)
- 8+ cuDNN operations (convolutions, pooling, batch norm)
- PyTorch training and inference
Built in ~10 hours using dlopen/dlsym for dynamic loading. 100% test pass rate.
The goal: break NVIDIA's CUDA vendor lock-in and make AMD GPUs viable for
existing CUDA workloads without months of porting effort.
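For readers unfamiliar with the approach, here is a minimal sketch of what dlopen/dlsym-based interception could look like. It is not the project's actual code. It assumes the shim is injected with something like `LD_PRELOAD`, that the application links the CUDA runtime dynamically, and that the HIP runtime is installed as `libamdhip64.so`. The helper `resolve_symbol`, the error constant, and the file name are hypothetical names chosen for illustration.

```c
/* shim.c: illustrative sketch only, not the project's source.
 * Assumed build line: gcc -shared -fPIC shim.c -o libshim.so -ldl */
#include <dlfcn.h>
#include <stddef.h>

/* cudaError_t and hipError_t are both enums in which 0 means success,
 * so a plain int stands in for them in this sketch. */
typedef int (*hip_malloc_fn)(void **ptr, size_t size);

/* Hypothetical placeholder: any nonzero value signals failure here.
 * It is not meant to match a specific CUDA error code. */
#define SHIM_ERR_NO_BACKEND 1

/* Hypothetical helper: open a shared library and look up one symbol.
 * Passing NULL as the library searches the already-loaded program.
 * Returns NULL if the library or the symbol cannot be found.
 * The handle is deliberately never closed so the symbol stays valid. */
void *resolve_symbol(const char *library, const char *symbol)
{
    void *handle = dlopen(library, RTLD_LAZY | RTLD_GLOBAL);
    if (handle == NULL)
        return NULL;
    return dlsym(handle, symbol);
}

/* Exported under the CUDA name. When the shim is preloaded, the dynamic
 * linker binds the application's cudaMalloc calls to this function,
 * which forwards them to hipMalloc. */
int cudaMalloc(void **dev_ptr, size_t size)
{
    static hip_malloc_fn real_hip_malloc = NULL;

    /* Resolve lazily on first use and cache the function pointer. */
    if (real_hip_malloc == NULL)
        real_hip_malloc =
            (hip_malloc_fn)resolve_symbol("libamdhip64.so", "hipMalloc");

    if (real_hip_malloc == NULL)
        return SHIM_ERR_NO_BACKEND; /* HIP runtime not available */

    return real_hip_malloc(dev_ptr, size);
}
```

Under those assumptions, an unmodified binary would be launched as `LD_PRELOAD=./libshim.so ./cuda_app` (both names hypothetical). A real translation layer would also need to map error codes, streams, and library handles, which this sketch leaves out.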
bigyabai•2mo ago
> Post this as your first comment after submitting:
lmfao