- Uses https://huggingface.co/g023/Qwen3-1.77B-g023 as the demonstration model (put the model files in the Qwen3-BEST folder)
santander_cl•1h ago
This is exactly the kind of practical quantization work that makes running longer-context models on consumer GPUs actually feasible. Looking forward to seeing it generalized beyond the one model. Great stuff, g023.