I also use self-hosted LLMs. You can make three GTX 1080s run a 7B model competently at limited context through Ollama. Get a little bolder with LM Studio and you can actually get a coherent and somewhat reliable model.
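A minimal sketch of what that multi-GPU Ollama setup could look like. The model tag, context size, and GPU indices are illustrative assumptions, not from the comment:

```shell
# Assumption: Ollama is installed and all three GTX 1080s are visible to CUDA.
CUDA_VISIBLE_DEVICES=0,1,2 ollama serve &

# Cap the context window so the KV cache fits in 3x8GB of VRAM;
# a Modelfile sets num_ctx per model (4096 here is an illustrative value).
cat > Modelfile <<'EOF'
FROM mistral:7b
PARAMETER num_ctx 4096
EOF
ollama create mistral-smallctx -f Modelfile

# Chat against the reduced-context variant.
ollama run mistral-smallctx "Summarize this in one line: ..."
```

Limiting `num_ctx` is usually the lever that makes a 7B model fit on older cards: weights take a fixed amount of VRAM, but the KV cache grows with context length.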
keyle•2h ago
On macOS, if you opted for 32GB you can run a gpt-oss model with LM Studio really easily.
It's "good enough" for a lot of questions and doesn't go up and down like a yo-yo (the OpenAI dashboard lies).
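A sketch of doing this headlessly with LM Studio's `lms` CLI. The exact model identifier is an assumption; in practice you would pick whichever gpt-oss build LM Studio's search surfaces for your machine:

```shell
# Assumption: LM Studio is installed and its `lms` CLI is on PATH.
# Model name below is illustrative; use the identifier LM Studio shows you.
lms get openai/gpt-oss-20b

# Load the model and start the local OpenAI-compatible server.
lms load openai/gpt-oss-20b
lms server start
```

The appeal of the local server is exactly the commenter's point: it's always up, with no dependency on a remote provider's availability.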
chasing0entropy•2h ago