Here is the release note from Ollama that made this possible: https://ollama.com/blog/claude
Technically, what I do is pretty straightforward (rough sketch after the list):
- Detect which local models are available in Ollama.
- When internet access is unavailable, the client automatically switches to Ollama-backed local models instead of remote ones.
- From the user’s perspective, it is the same Claude Code flow, just backed by local inference.
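Roughly, the detection-and-fallback logic looks like the sketch below. The only piece taken directly from Ollama is its default local endpoint (http://localhost:11434) and the GET /api/tags model listing; the function names, the connectivity probe, and the preferred-model list are simplified for illustration and are not the literal implementation.

```typescript
const OLLAMA_URL = "http://localhost:11434";
// Models to prefer when running locally; qwen3-coder:30b has worked best so far.
const PREFERRED_LOCAL_MODELS = ["qwen3-coder:30b"];

// 1. Detect which local models are available in Ollama (GET /api/tags).
async function listLocalModels(): Promise<string[]> {
  try {
    const res = await fetch(`${OLLAMA_URL}/api/tags`);
    if (!res.ok) return [];
    const data = (await res.json()) as { models: { name: string }[] };
    return data.models.map((m) => m.name);
  } catch {
    return []; // Ollama is not running locally
  }
}

// 2. Cheap reachability probe for the remote API; we only care whether the
//    network request completes, not what status code comes back.
async function hasInternet(timeoutMs = 2000): Promise<boolean> {
  try {
    await fetch("https://api.anthropic.com", {
      method: "HEAD",
      signal: AbortSignal.timeout(timeoutMs),
    });
    return true;
  } catch {
    return false;
  }
}

// 3. Pick the backend: remote when online, otherwise the best available local model.
async function pickBackend(): Promise<
  { kind: "remote" } | { kind: "local"; model: string }
> {
  if (await hasInternet()) return { kind: "remote" };
  const local = await listLocalModels();
  const model = PREFERRED_LOCAL_MODELS.find((m) => local.includes(m)) ?? local[0];
  if (!model) throw new Error("Offline and no local Ollama models are installed");
  return { kind: "local", model };
}
```

The rest of the client stays unchanged; it just sends requests to whichever backend pickBackend() returns, so the Claude Code flow itself is untouched.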
In practice, the best-performing model so far has been qwen3-coder:30b. I also tested glm-4.7-flash, which was released very recently, but it struggles with reliably following tool-calling instructions, so it is not usable for this workflow yet.
https://github.com/21st-dev/1code