Not sure I'd trade more LLM VRAM for that.
For reference I am getting ~40 output tok/s on a 4090 (450W) with Qwen3 32B and a context window of 4096.
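For anyone who wants to reproduce that number, here's a minimal sketch of how I measure it, assuming a local OpenAI-compatible server (llama.cpp or vLLM style); the URL and model name are placeholders for whatever you're running:

```python
# Rough tok/s measurement against a local OpenAI-compatible server.
# Endpoint and model name are assumptions; adjust for your setup.
import time
import requests

URL = "http://localhost:8000/v1/completions"  # hypothetical local endpoint

payload = {
    "model": "Qwen3-32B",  # whatever name your server exposes
    "prompt": "Explain KV caching in one paragraph.",
    "max_tokens": 512,
    "temperature": 0.0,
}

start = time.perf_counter()
resp = requests.post(URL, json=payload, timeout=600).json()
elapsed = time.perf_counter() - start

# Note: elapsed includes prompt processing, so this slightly
# understates pure decode throughput.
out_tokens = resp["usage"]["completion_tokens"]
print(f"{out_tokens} tokens in {elapsed:.1f}s -> {out_tokens / elapsed:.1f} tok/s")
```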
> Ultimately, as the user note *aptly* put it, the decision largely boils down to how much context you anticipate using regularly.

Hah. (emphasis mine)
supermatt•9h ago
It seems all these tests only compare a single prompt at a time, which for the most part is bottlenecked by memory bandwidth (faster on the 3090) and clock speed (faster on the 5060).
The 3090 has almost 3x the cores of a 5060, so I’m guessing it will absolutely wipe the floor with the dual 5060 setup for batched inference - which is increasingly essential for agentic workflows and complex tool use.
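Here's a rough sketch of the kind of comparison I mean, using vLLM's offline API (the model name and batch size are placeholders; a 32B model would need a quantized build to fit in 24 GB):

```python
# Sketch: single-stream vs batched decode throughput with vLLM.
# Model name is a placeholder; pick something that fits your VRAM.
import time
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-32B")
params = SamplingParams(temperature=0.0, max_tokens=256)

def tok_per_sec(prompts):
    start = time.perf_counter()
    outputs = llm.generate(prompts, params)
    elapsed = time.perf_counter() - start
    n_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
    return n_tokens / elapsed

single = tok_per_sec(["Summarize the plot of Hamlet."])
batched = tok_per_sec(["Summarize the plot of Hamlet."] * 32)  # 32-way batch

# Single-stream decode re-reads all the weights per token, so it's
# bandwidth-bound; batching amortizes that read across requests and
# shifts the bottleneck toward compute, where the 3090's extra cores
# should pull ahead.
print(f"single: {single:.0f} tok/s, batch of 32: {batched:.0f} tok/s")
```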