I kept seeing people ask "Which model can I run on my GPU?" and "Will model X fit on my GPU?" That's why I built a filter on whichllmmodel that lets you search models by what will actually fit on your hardware (8GB, 16GB, 24GB, etc.) at a given quantization level.
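The core "does it fit" check is mostly arithmetic on weight size. Below is a minimal sketch, not how whichllmmodel actually computes it. It assumes weights dominate memory use, that a fixed `overhead_gb` (a hypothetical default of 1 GB) covers runtime buffers, and that KV cache is ignored. Sizes are in decimal GB, so treat results near the boundary as approximate.

```python
# Rough check: do quantized weights fit in a given amount of VRAM?
# Assumptions: weights dominate memory, overhead_gb is a hypothetical
# fixed allowance for runtime buffers, and KV cache is not counted.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in decimal GB: params * bits / 8 bytes."""
    return params_billion * bits_per_weight / 8

def fits(params_billion: float, bits_per_weight: float, vram_gb: float,
         overhead_gb: float = 1.0) -> bool:
    """True if the weights plus a fixed overhead fit within vram_gb."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb

if __name__ == "__main__":
    # Example: a 7B-parameter model at several quantization levels.
    for vram in (8, 16, 24):
        for bits in (4, 8, 16):
            print(f"7B @ {bits}-bit on {vram} GB: {fits(7, bits, vram)}")
```

A 7B model at 4-bit needs about 3.5 GB for weights, while at 16-bit it needs about 14 GB, which is why the quantization level matters as much as the card size.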
Comments
necovek•1h ago
Very broken: the "live minimums" don't let me easily remove the 512-token limit and enter a bigger number.
No unified or shared memory scenarios (like Apple's M platform or AMD's integrated GPU platform).
johng•18m ago
Was going to mention this. I'm on an M1 Max and wanted to see what the site suggested.
CRSilkworth•44m ago
Very nice idea. It would be nice if you could also leave the desired context as a free parameter and have the site tell you the maximum context each model allows.
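Inverting the calculation this way is mostly KV-cache arithmetic: whatever VRAM is left after the weights, divided by the cache cost per token. A minimal sketch, assuming a standard transformer KV cache (keys and values for every layer and KV head) stored at `bytes_per_elem` bytes each, plus a hypothetical fixed `overhead_gb`. The model shape in the usage line (32 layers, 8 KV heads, head dimension 128, 4.5 GB of weights) is purely illustrative.

```python
# Estimate the maximum context length that fits in the VRAM left over
# after loading the weights. Assumptions: standard K/V cache per layer,
# fp16 cache (2 bytes per element) by default, and a hypothetical fixed
# overhead_gb for runtime buffers. Sizes in decimal GB.

def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache per token: factor 2 for keys and values."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

def max_context(vram_gb: float, weights_gb: float, n_layers: int,
                n_kv_heads: int, head_dim: int, bytes_per_elem: int = 2,
                overhead_gb: float = 1.0) -> int:
    """Largest number of tokens whose KV cache fits in the remaining VRAM."""
    free_bytes = (vram_gb - weights_gb - overhead_gb) * 1e9
    if free_bytes <= 0:
        return 0  # the weights alone do not fit
    per_token = kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem)
    return int(free_bytes // per_token)

if __name__ == "__main__":
    # Illustrative model shape on a 24 GB card.
    print(max_context(24, 4.5, n_layers=32, n_kv_heads=8, head_dim=128))
```

In practice the result should also be capped at the context length the model was trained for; the sketch only reports the memory limit.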