Ollama is slower, and they started out as a shameless llama.cpp ripoff without giving credit. Now they've "ported" it to Go, which mostly means vibe-code-translating llama.cpp, bugs included.
Hmm, what about the fact that Ollama is open-source, can run in Docker, etc.?
Lately I’ve been playing with Unsloth Studio and think that’s probably a much better “give it to a beginner” default.
What does unsloth-studio bring on top?
brew install llama.cpp
Use the built-in CLI, server, or chat interface, and hook it up to any other app.
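On the "hook it up to any other app" part: the `llama-server` binary that ships with the Homebrew formula exposes an OpenAI-compatible HTTP API. A minimal client sketch, assuming a server started locally with something like `llama-server -m model.gguf --port 8080` (the base URL and the untyped payload are my assumptions; the `/v1/chat/completions` path is the standard OpenAI-style endpoint):

```python
import json
import urllib.request

# Assumed local server address; adjust to wherever llama-server is listening.
BASE_URL = "http://localhost:8080"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion POST for llama-server."""
    payload = {"messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def chat(prompt: str) -> str:
    """Send the request and extract the assistant's reply.
    Only call this with a server actually running."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Inspect the request without sending it:
req = build_chat_request("Say hello in one word.")
print(req.full_url)
```

Because the endpoint speaks the OpenAI wire format, any OpenAI-compatible client library can be pointed at it by swapping the base URL.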
greenstevester•1h ago
It's essentially a model that's learned to do the absolute minimum amount of work while still getting paid. I respect that enormously.
It scores 1441 on Arena Elo, roughly the same as Qwen 3.5 at 397B and Kimi k2.5 at 1100B.
Ollama v0.19 switched to Apple's MLX framework on Apple Silicon; decode is about 93% faster.
They've also improved caching, so your coding agents don't have to re-process the entire prompt every time. About time, I'd say.
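The caching in question is prompt-prefix reuse: when a new request shares a prefix with the previous context (same system prompt, same files, new question), only the tokens past the shared prefix need a fresh forward pass. A toy sketch of the idea, not Ollama's actual code, with whitespace splitting standing in for a real tokenizer:

```python
def common_prefix_len(a: list[str], b: list[str]) -> int:
    """Length of the shared token prefix between two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

class PrefixCache:
    """Remembers the last processed token sequence; a new request only
    'pays' for the tokens past the longest shared prefix."""

    def __init__(self) -> None:
        self.cached: list[str] = []

    def process(self, prompt: str) -> int:
        tokens = prompt.split()  # stand-in for real tokenization
        reused = common_prefix_len(self.cached, tokens)
        self.cached = tokens
        return len(tokens) - reused  # tokens that need actual compute

cache = PrefixCache()
first = cache.process("system prompt plus file A plus question one")
second = cache.process("system prompt plus file A plus question two")
print(first, second)  # → 8 1: the second turn recomputes one token
```

This is why agent loops that keep a stable preamble and only append new turns benefit so much: almost the whole context hits the cache.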
The gist covers the full setup: install, auto-start on boot, keep the model warm in memory.
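For the auto-start-on-boot piece, the usual macOS mechanism is a launchd agent. A sketch along those lines (the label and binary path are illustrative for a Homebrew install; `OLLAMA_KEEP_ALIVE=-1` is what keeps the last-used model resident in memory instead of unloading it after a request):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <!-- Hypothetical label; save as ~/Library/LaunchAgents/local.ollama.plist -->
  <key>Label</key><string>local.ollama</string>
  <key>ProgramArguments</key>
  <array>
    <string>/opt/homebrew/bin/ollama</string>
    <string>serve</string>
  </array>
  <key>RunAtLoad</key><true/>
  <key>KeepAlive</key><true/>
  <key>EnvironmentVariables</key>
  <dict>
    <!-- -1 = never unload the model after it has been loaded -->
    <key>OLLAMA_KEEP_ALIVE</key><string>-1</string>
  </dict>
</dict>
</plist>
```

Load it with `launchctl load ~/Library/LaunchAgents/local.ollama.plist`, or just log out and back in.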
It runs on a 24GB Mac mini, which means the most expensive part of your local AI setup is still the desk you put it on.