Fix: added a host_ptr field to llama_model_params. CPU tensors now point directly at the mmap region, so only Vulkan tensors get copied.
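For anyone who wants the idea without reading the branch, here is a rough standalone sketch of the zero-copy part. It is not the actual patch: `tensor_view`, `backend_kind`, `map_file`, `bind_tensor` and `free_tensor` are made-up names for illustration, and the real `host_ptr` plumbing inside llama.cpp will look different. The assumption is simply that the caller maps the model file once and hands the base pointer in. CPU tensors alias the mapping, and anything headed to the GPU gets its own copy (a plain `malloc` stands in for a device upload here).

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-ins for illustration; NOT the real llama.cpp types. */
enum backend_kind { BACKEND_CPU, BACKEND_GPU };

struct tensor_view {
    const void *data;  /* where the weights live */
    size_t      size;  /* number of bytes */
    int         owned; /* 1 if data is a private copy, 0 if it aliases the mapping */
};

/* Map a whole file read-only. Returns NULL on failure. */
void *map_file(const char *path, size_t *len_out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}

/* Bind a tensor to bytes at host_ptr + offset.
 * CPU: alias the mapping (zero-copy; pages are faulted in on first touch).
 * GPU: copy out of the mapping (malloc stands in for a device upload). */
struct tensor_view bind_tensor(const void *host_ptr, size_t offset,
                               size_t size, enum backend_kind kind) {
    assert(host_ptr != NULL);

    struct tensor_view t = { NULL, size, 0 };
    const char *src = (const char *)host_ptr + offset;

    if (kind == BACKEND_CPU) {
        t.data = src;
    } else {
        void *buf = malloc(size);
        if (buf != NULL) {
            memcpy(buf, src, size);
            t.data  = buf;
            t.owned = 1;
        }
    }
    return t;
}

/* Release a tensor; only private copies are freed, never the mapping. */
void free_tensor(struct tensor_view *t) {
    if (t->owned) free((void *)t->data);
    t->data  = NULL;
    t->owned = 0;
}
```

The mapping has to outlive every CPU tensor that aliases it, so `munmap` goes last, after all `free_tensor` calls.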
Result on real hardware:
- Peak RAM: 524MB → 142MB (73% reduction)
- First boot: 19s → 11s
- Second boot: ~2.5s (mmap + KV cache)
Code: https://github.com/Perinban/llama.cpp/tree/axon-dev
Write-up with VmRSS proof: https://www.linkedin.com/posts/perinban-parameshwaran_machin...