I originally built this because I got tired of constantly SSHing into my server to edit a config just to try out a new model. It's grown a lot since then.
What it does:
- Web UI for creating and managing LLM instances from your browser
- Full llama.cpp model lifecycle: download from HuggingFace, create preset.ini configs with an in-browser editor, load/unload models via router mode
- Automatic idle timeout, LRU eviction, and instance limits
- llama.cpp, mlx_lm, and vllm backends
- OpenAI- and Anthropic-compatible API endpoints (backend-dependent)
- Multi-node support for distributing instances across hosts
- Inference API keys with per-instance access control
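The post doesn't show what a preset.ini looks like, so the fragment below is only a guess at the shape: a named section whose keys mirror common llama-server flags. The section name, path, and values are all invented for illustration.

```ini
; Hypothetical preset.ini sketch — the real key names depend on the project,
; but they would map roughly onto llama-server options.
[qwen-7b]
model = /models/qwen2.5-7b-instruct-q4_k_m.gguf
ctx-size = 8192
n-gpu-layers = 99
```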
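The idle-timeout / LRU-eviction / instance-limit behavior in the list above can be sketched as a small pool keyed by last-use time. This is a minimal illustration of the idea, not the project's actual code; names like `InstancePool`, `max_instances`, and `idle_seconds` are invented for the example.

```python
import time
from collections import OrderedDict

class InstancePool:
    """Toy model of instance management: an LRU-ordered pool with a
    hard instance limit and an idle timeout. All names are hypothetical."""

    def __init__(self, max_instances=2, idle_seconds=300.0):
        self.max_instances = max_instances
        self.idle_seconds = idle_seconds
        self._pool = OrderedDict()  # instance name -> last-used timestamp

    def touch(self, name, now=None):
        """Load an instance (or mark it used); evict the LRU entry if over the limit."""
        now = time.monotonic() if now is None else now
        if name in self._pool:
            self._pool.move_to_end(name)  # most recently used moves to the end
        self._pool[name] = now
        while len(self._pool) > self.max_instances:
            self._pool.popitem(last=False)  # drop the least recently used instance

    def sweep_idle(self, now=None):
        """Unload every instance that has been idle longer than idle_seconds."""
        now = time.monotonic() if now is None else now
        stale = [n for n, t in self._pool.items() if now - t > self.idle_seconds]
        for name in stale:
            del self._pool[name]

    def loaded(self):
        return list(self._pool)
```

A periodic `sweep_idle()` plus `touch()` on every request is enough to get all three behaviors from one data structure.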
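Per-instance access control for inference keys boils down to mapping each key to the set of instances it may call. A hypothetical sketch, assuming a simple in-memory mapping (the real project's storage and key format will differ):

```python
# Hypothetical per-instance API-key check; key names and instance
# names below are invented for the example.
KEY_SCOPES = {
    "sk-team-a": {"qwen-7b", "llama-8b"},  # key restricted to two instances
    "sk-admin": None,                      # None = access to every instance
}

def authorize(api_key, instance):
    """Return True if the key exists and may reach the given instance."""
    if api_key not in KEY_SCOPES:
        return False
    scope = KEY_SCOPES[api_key]
    return scope is None or instance in scope
```

The check runs before a request is routed, so a key scoped to one instance can't probe the others.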