Show HN: Llama.cpp Tutorial 2026: Run GGUF Models Locally on CPU and GPU
9•anju-kushwaha•17h ago
Complete llama.cpp tutorial for 2026. Install, compile with CUDA/Metal, run GGUF models, tune all inference flags, use the API server, speculative decoding, and benchmark your hardware.
CableNinja•2h ago
I've been trying to run locally (I effectively followed this guide before it existed) and have had no success. llama.cpp builds fine, but when I start it up, it just spins its progress bar indefinitely. I let it sit for 3 days and nothing.
Running on an 8-core, 12 GB RAM VM with an AMD RX 5500 XT (8 GB) passed through. ROCm built, llama.cpp built with the correct flags.
What am I missing?
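For context, a typical llama.cpp ROCm build and launch looks roughly like the sketch below. The model path and thread/offload counts are placeholders, not the commenter's actual setup; running with verbose logging (`-v`) is one way to see where the load is hanging instead of watching the progress bar spin.

```shell
# Sketch: build llama.cpp with HIP/ROCm support (flags per the llama.cpp build docs)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_HIP=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j 8

# Run a GGUF model with all layers offloaded to the GPU and verbose logs;
# /path/to/model.gguf is a placeholder
./build/bin/llama-cli -m /path/to/model.gguf -ngl 99 -p "Hello" -v
```

If the verbose log stalls during model load, that points at VRAM/offload problems rather than the build itself; lowering `-ngl` or trying a smaller quantization is a common first diagnostic step.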
washadjeffmad•1h ago