I just found out about this last week, but the good news is that a new PC with a better GPU will arrive in about two weeks, so I’ve decided to install a pair of local LLMs that are similar in competency to the free GPT-5 mini model I usually used in Copilot: qwen2.5-coder:14b for chat and deepseek-coder:3b for autocomplete. I’ll switch to the Claude API for the really tough stuff, which I was doing with Copilot anyway. The Continue plugin for VS Code handles all of this.
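For anyone who wants to poke at the same two-model split outside the editor, here is a minimal Python sketch that picks a model per role and builds a request for Ollama's local REST API. It assumes Ollama's default address (http://localhost:11434) and its /api/generate endpoint. The ROLE_MODELS table and the helper names are made up for illustration, the model tags are simply the ones named above, and none of this is how Continue itself is configured.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical role table, using the model tags mentioned in the comment.
ROLE_MODELS = {
    "chat": "qwen2.5-coder:14b",
    "autocomplete": "deepseek-coder:3b",
}


def build_request(role: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request for a role."""
    payload = {"model": ROLE_MODELS[role], "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(role: str, prompt: str) -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(role, prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Calling ask("chat", "Explain this regex: ^\\d+$") only works once "ollama serve" is running and the model has been pulled; build_request on its own touches no network, so it is safe to inspect.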
roscas•4m ago
Continue on VS Code with Ollama running (start it with "ollama serve") is great. These are some of the offline models I'm using, but don't forget the qwen3.5 coder as well.
"ollama list
NAME                  ID              SIZE      MODIFIED
laguna-xs.2:latest    ba9ecde43b0e    23 GB     12 hours ago
nemotron3:33b         f6d8b7ff496c    27 GB     4 days ago
qwen3.6:latest        07d35212591f    23 GB     6 weeks ago
gemma4:e2b            7fbdbf8f5e45    7.2 GB    7 weeks ago
gemma4:e4b            c6eb396dbd59    9.6 GB    7 weeks ago"
You can download models from Continue, or just run "ollama pull <name>" with a name you pick by searching the models on the ollama.com site. These run mostly on the CPU, since my 3080 can't load anything over 10 GB, but the CPU speed is amazing: it outputs faster than I can read!
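If you want a quick way to see which of your pulled models have a chance of fitting on the GPU, here is a rough Python sketch that parses pasted "ollama list" output and compares each SIZE against a VRAM budget. The function names and the 10 GB default are my own assumptions (the 10 GB comes from the 3080 mentioned above). File size is only a lower bound on memory use, since the context and KV cache need room too, so treat a "fits" as optimistic.

```python
import re

# Matches the SIZE column as printed by `ollama list`, e.g. "23 GB" or "7.2 GB".
SIZE_RE = re.compile(r"\b(\d+(?:\.\d+)?)\s*(MB|GB)\b")

# Sample input: the listing pasted in the comment above.
LISTING = """
NAME ID SIZE MODIFIED
laguna-xs.2:latest ba9ecde43b0e 23 GB 12 hours ago
nemotron3:33b f6d8b7ff496c 27 GB 4 days ago
qwen3.6:latest 07d35212591f 23 GB 6 weeks ago
gemma4:e2b 7fbdbf8f5e45 7.2 GB 7 weeks ago
gemma4:e4b c6eb396dbd59 9.6 GB 7 weeks ago
"""


def parse_models(listing: str) -> list[tuple[str, float]]:
    """Return (name, size_in_gb) pairs from pasted `ollama list` output."""
    models = []
    for line in listing.strip().splitlines():
        if line.startswith("NAME"):
            continue  # skip the header row
        match = SIZE_RE.search(line)
        if not match:
            continue  # skip anything without a recognizable size
        size = float(match.group(1))
        if match.group(2) == "MB":
            size /= 1000  # normalize megabytes to gigabytes
        models.append((line.split()[0], size))
    return models


def fits_in_vram(size_gb: float, vram_gb: float = 10.0) -> bool:
    """Optimistic check: the model file alone is no larger than the VRAM budget."""
    return size_gb <= vram_gb


if __name__ == "__main__":
    for name, size in parse_models(LISTING):
        verdict = "may fit on GPU" if fits_in_vram(size) else "expect CPU offload"
        print(f"{name}: {size} GB, {verdict}")
```

With the listing above, only the two gemma4 tags come in under the 10 GB budget, which matches the experience of the bigger ones running mostly on the CPU.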
cbdevidal•32m ago