When I bought the 5080, I realized NVIDIA’s driver stack was silently rejecting the new sm_120 (Blackwell) architecture. PyTorch and CUDA were falling back to sm_89 code paths, throttling tensor dispatch and underutilizing the compute units. Benchmarks looked great on paper, but real workloads (matrix ops, UNet training, fused kernels) were running at maybe 70% of what the silicon could actually do.
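If you want to check whether your own setup is doing the same thing, the symptom shows up with two standard torch.cuda calls, no driver-level poking required:

    import torch

    # Which architectures did this PyTorch build actually ship kernels for?
    built_for = torch.cuda.get_arch_list()              # e.g. ['sm_80', 'sm_86', 'sm_89', 'sm_90']
    # What does the driver report for the card itself?
    major, minor = torch.cuda.get_device_capability(0)  # (12, 0) means sm_120

    print("built for:", built_for)
    print("device is: sm_%d%d" % (major, minor))
    if "sm_%d%d" % (major, minor) not in built_for:
        print("no native kernels for this arch -- you're on a fallback path")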
So I tore apart the driver.
Inside libcuda.so, there’s a function that checks architecture capability and sets a status byte before dispatching. On the 5080, that byte was hard-coded to return “unsupported.” Patch that flag, rebuild, and suddenly the card lights up like a supercomputer.
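The mechanical part of the patch is genuinely boring. Here's the shape of it as a Python sketch; the path, offset, and byte values below are placeholders, not real addresses in libcuda.so (you have to locate the check yourself with a disassembler), and you obviously work on a copy, never the live library:

    import shutil

    SRC = "/usr/lib/x86_64-linux-gnu/libcuda.so.1"  # adjust to whatever your driver install ships
    DST = "libcuda.so.patched"

    # Placeholder values, NOT real offsets -- find the capability check with a
    # disassembler first. The point is only that the edit itself is one byte.
    STATUS_BYTE_OFFSET = 0x0   # hypothetical file offset of the status byte
    UNSUPPORTED = 0x01         # hypothetical "unsupported" value
    SUPPORTED = 0x00           # hypothetical "supported" value

    shutil.copyfile(SRC, DST)
    with open(DST, "r+b") as f:
        f.seek(STATUS_BYTE_OFFSET)
        current = f.read(1)[0]
        if current != UNSUPPORTED:
            print("unexpected byte 0x%02x, leaving the copy alone" % current)
        else:
            f.seek(STATUS_BYTE_OFFSET)
            f.write(bytes([SUPPORTED]))
            print("patched one byte at 0x%x" % STATUS_BYTE_OFFSET)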
I rebuilt PyTorch 2.10 from source with sm_120 in the arch list, and the difference was immediate:
GPU compute: 23,529 (PassMark 11.1)
3D Graphics Mark: 46,214 — 99th percentile globally
Stable 51 °C under 100% utilization
No overclocking
BF16 training speed: equal to or faster than a retail 5090
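For anyone who wants to reproduce the BF16 number: the rebuild is just a source build with sm_120 in the arch list (TORCH_CUDA_ARCH_LIST is the usual knob; for Blackwell it should be something like "12.0"), and the check afterwards can be as dumb as a timed matmul. Exact sizes don't matter, only that you run the same script before and after:

    import time
    import torch

    # Crude BF16 throughput check: time a large matmul and convert to TFLOP/s.
    n = 8192
    a = torch.randn(n, n, device="cuda", dtype=torch.bfloat16)
    b = torch.randn(n, n, device="cuda", dtype=torch.bfloat16)

    for _ in range(3):          # warm-up so kernel selection isn't timed
        a @ b
    torch.cuda.synchronize()

    iters = 20
    t0 = time.perf_counter()
    for _ in range(iters):
        c = a @ b
    torch.cuda.synchronize()
    dt = (time.perf_counter() - t0) / iters

    print("%.1f ms/iter, ~%.1f TFLOP/s BF16" % (dt * 1e3, 2 * n**3 / dt / 1e12))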
I’m currently using it to run GSIN — a global seismic forecasting AI that trains autonomously on cached physics grids and predicts stress-field changes before earthquakes happen. The GPU finally isn’t the bottleneck.
Why this matters: NVIDIA’s driver locks don’t just gate performance; they gate innovation.
Developers and researchers lose months of compute potential because architecture support is gated in software for “product segmentation.” It’s not about safety or stability; it’s market control.
I’m not selling anything. I’m not distributing NVIDIA’s code.
Just proving that the silicon you already paid for is capable of a lot more.
bigyabai•4h ago
AMD called this strategy "FineWine" back in the day.