> To test it, install the extension (no registration/key needed) and navigate to a HF model page. Then click the "VRAM" icon on the top right to open the sidepanel.
You can specify quantization, batch size, sequence length, etc.
Works for inference & fine-tuning.
If the model does not fit on the specified GPUs, it gives you advice on how to still run it (e.g. lowering precision).
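For anyone curious about the kind of arithmetic involved, here is a minimal back-of-the-envelope sketch for inference only. It is not the extension's actual method: the function names, the bytes-per-parameter table, and the simple "weights + KV cache" formula are all assumptions for illustration, and it ignores activations, framework overhead, and fine-tuning state (gradients, optimizer).

```python
# Rough inference VRAM estimate: model weights + KV cache.
# Illustrative only; names and formula are assumptions, not the extension's code.

# Approximate storage cost per parameter for common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def estimate_inference_vram_gib(num_params, precision, num_layers,
                                num_kv_heads, head_dim,
                                batch_size, seq_len, kv_bytes=2.0):
    """Return an approximate VRAM requirement in GiB."""
    weights = num_params * BYTES_PER_PARAM[precision]
    # Factor 2 covers keys and values, cached per layer, per token, per sequence.
    kv_cache = (2 * num_layers * num_kv_heads * head_dim
                * seq_len * batch_size * kv_bytes)
    return (weights + kv_cache) / 2**30


def suggest_precision(gpu_vram_gib, **model):
    """Return the highest precision that fits, or None if nothing fits."""
    for precision in ("fp32", "fp16", "int8", "int4"):
        if estimate_inference_vram_gib(precision=precision, **model) <= gpu_vram_gib:
            return precision
    return None


# Hypothetical 7B-parameter model configuration (made-up example values).
example_model = dict(num_params=7e9, num_layers=32, num_kv_heads=32,
                     head_dim=128, batch_size=1, seq_len=4096)

if __name__ == "__main__":
    print(suggest_precision(24, **example_model))
```

With these example numbers, fp32 does not fit in 24 GiB, so the sketch falls back to the next lower precision, which mirrors the "lower the precision" kind of advice mentioned above.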
It was inspired by my work, where we were constantly exporting metrics from HF to estimate the required hardware. Now it saves our dev team quite a bit of time, and clients can use it, too.
Let me know what you think.
clemnt•1d ago
PieterBecking•12h ago