Also, Claude Code tends to make very broad search requests, and I keep getting an error from MCP about exceeding 25,000 characters. It happens quite often.
What would you recommend?
bigyabai•2h ago
Invest in a local inference server and run Qwen3. At this point it will still cost less than two pro accounts.
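For the 25,000 error specifically: if I remember the docs right, that's Claude Code's MAX_MCP_OUTPUT_TOKENS limit on MCP tool responses (counted in tokens, default 25000), and it's an environment variable you can raise. A rough sketch of launching with a bigger budget, assuming that variable name is still current:

    # Start Claude Code with a larger MCP tool-output budget.
    # MAX_MCP_OUTPUT_TOKENS is the env var I believe Claude Code reads
    # (default 25000 tokens); verify the name against the current docs.
    import os, subprocess

    env = dict(os.environ, MAX_MCP_OUTPUT_TOKENS="50000")
    subprocess.run(["claude"], env=env)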
bigyabai•1h ago
Nvidia hardware is cheap as chips right now. If you got 2x 3060 12GB cards (or a 24GB 4090), you'd have 24GB of CUDA-accelerated VRAM to play with for inference and finetuning. That should be plenty to fit the smaller SOTA models like GLM-4.5 Air, Qwen3 30B A3B, and Llama 4 Scout, and definitely enough to start offloading layers of the giant 100B+ parameter options.
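Once a model is loaded, any OpenAI-compatible local server (llama.cpp's llama-server, vLLM, Ollama, etc.) looks the same from client code. A minimal sketch, assuming a server already running on localhost:8000 and the openai Python package:

    # Query a local OpenAI-compatible inference server.
    # The port (8000) and the model name are assumptions; use whatever
    # your server actually loaded.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
    resp = client.chat.completions.create(
        model="Qwen/Qwen3-30B-A3B",
        messages=[{"role": "user", "content": "Explain MoE routing in two sentences."}],
    )
    print(resp.choices[0].message.content)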
That's what I'd get, at least.
vmt-man•1h ago
Are they good enough compared to Sonnet 4?
I’ve also used Gemini 2.5 Pro and Flash, and they’re worse than Sonnet 4, even though they’re much bigger than 30B.
bigyabai•1h ago
You might be able to try out Qwen3 via API to see if it suits your needs. Their 30B MoE is really impressive, and the 480B one can only be better (presumably).
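Something like this against a hosted OpenAI-compatible provider would do for a trial run; the OpenRouter base URL and model id here are assumptions, so check the provider's docs for the exact names:

    # Try Qwen3 through a hosted API before buying hardware.
    # Base URL and model id are assumptions; confirm with the provider.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="YOUR_API_KEY",
    )
    resp = client.chat.completions.create(
        model="qwen/qwen3-30b-a3b",
        messages=[{"role": "user", "content": "Write a function that parses a cron expression."}],
    )
    print(resp.choices[0].message.content)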