I'm building Qwodel, an open-source pipeline that automates the fragmented mess of LLM quantization.
If you've ever tried to prep a Hugging Face model for edge deployment or cheaper cloud inference, you know the drill: wrestling with llm_compressor for AWQ, writing ctypes bindings against llama.cpp for GGUF, or fighting memory leaks in coremltools for Apple Silicon.
Qwodel acts as a unified orchestration engine. Instead of context-switching between three different ecosystems, you pass in the model; we handle the memory chunking and edge-case graph conversions, and emit production-ready formats (GGUF, AWQ, CoreML).
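The single-entry-point idea can be sketched as a format-to-backend dispatch. Everything below is an illustrative assumption, not Qwodel's actual API: the `quantize` function, the `BACKENDS` registry, and the placeholder lambdas are hypothetical stand-ins for the real backend wrappers.

```python
# Hypothetical sketch of a unified quantization entry point.
# Names (quantize, BACKENDS) are illustrative, not Qwodel's real API.
from typing import Callable, Dict

# Each backend maps a model identifier to a quantized artifact path.
# In a real pipeline these would wrap llama.cpp, llm_compressor,
# and coremltools respectively; here they are stubs.
BACKENDS: Dict[str, Callable[[str], str]] = {
    "gguf": lambda model: f"{model}.gguf",        # llama.cpp conversion
    "awq": lambda model: f"{model}.awq",          # llm_compressor AWQ pass
    "coreml": lambda model: f"{model}.mlpackage", # coremltools export
}

def quantize(model: str, fmt: str) -> str:
    """Route a Hugging Face model to the matching quantization backend."""
    try:
        backend = BACKENDS[fmt.lower()]
    except KeyError:
        raise ValueError(
            f"Unsupported format {fmt!r}; choose from {sorted(BACKENDS)}"
        )
    return backend(model)

print(quantize("Qwen/Qwen2-7B", "gguf"))  # → Qwen/Qwen2-7B.gguf
```

The point of the registry pattern is that adding a new target format is one entry in a table rather than a new tool with its own invocation style.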
We are actively building and updating the package every week to add new model architectures and backend optimizations. You can check out the full reference guide here: docs.qwodel.com.
The project is entirely open-source. We would love for you to test it out, tear the architecture apart, and tell us where it breaks. We welcome pull requests, so feel free to file issues or contribute directly in the repo!