Existing OpenAI-compatible servers often require Docker, complex configuration files, or a GPU.
The gap between "I have a .gguf file" and "I have a working API endpoint" is wider than it should be.
To close that gap, we asked Neo to build gguf-serve: a simple CLI tool that serves GGUF models as an API endpoint.
Point it at any .gguf file, run the server, and immediately get OpenAI-compatible endpoints that work with any client library or tool that speaks the OpenAI API format.
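As a sketch of what that workflow might look like (the command-line invocation, default port, and model name below are assumptions, not documented gguf-serve behavior; the request body follows the standard OpenAI chat-completions format):

```shell
# Hypothetical invocation — the exact flags and default port of
# gguf-serve are assumptions for illustration.
gguf-serve ./model.gguf

# Then query it with the standard OpenAI chat-completions request shape:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "model.gguf",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```

Because the endpoint speaks the OpenAI API format, the same server should also work with any OpenAI client library simply by pointing its base URL at the local server.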