UHOP is an open-source attempt to break that lock-in by introducing a cross-vendor optimization layer. It detects your hardware (CUDA, ROCm, OpenCL, etc.), generates or benchmarks kernels, and caches the best performer. You can wrap your ops with a decorator, let UHOP choose or generate the kernel, and it just runs — wherever.
Features so far:
Hardware detection + backend selection
AI-assisted kernel generation (CUDA / OpenCL / Triton)
Fused op demos (conv2d+ReLU, matmul, etc.)
Kernel benchmarking and caching
CLI + early browser dashboard
There’s a long way to go — distributed tuning, compiler IR passes, better PyTorch/JAX hooks — but it’s open, hackable, and community-driven.
Repo: github.com/sevenloops/uhop
Demo: uhop.dev
Would love feedback from compiler engineers, GPU devs, or anyone who’s ever felt boxed in by vendor APIs.