Not sure I'd trade more LLM VRAM for that.
For reference I am getting ~40 output tok/s on a 4090 (450W) with Qwen3 32B and a context window of 4096.
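For anyone who wants to reproduce that number, here's a minimal sketch of how I measure it, assuming a local OpenAI-compatible server (llama.cpp or vLLM style); the URL and model name are placeholders for whatever you're running:

```python
# Rough tok/s measurement against a local OpenAI-compatible server.
# Endpoint and model name are assumptions; adjust for your setup.
import time
import requests

URL = "http://localhost:8000/v1/completions"  # hypothetical local endpoint

payload = {
    "model": "Qwen3-32B",  # whatever name your server exposes
    "prompt": "Explain KV caching in one paragraph.",
    "max_tokens": 512,
    "temperature": 0.0,
}

start = time.perf_counter()
resp = requests.post(URL, json=payload, timeout=600).json()
elapsed = time.perf_counter() - start

# Note: elapsed includes prompt processing, so this slightly
# understates pure decode throughput.
out_tokens = resp["usage"]["completion_tokens"]
print(f"{out_tokens} tokens in {elapsed:.1f}s -> {out_tokens / elapsed:.1f} tok/s")
```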
> Ultimately, as the user note *aptly* put it, the decision largely boils down to how much context you anticipate using regularly.

Hah. (emphasis mine)
supermatt•9h ago
It seems all these tests only compare a single prompt at a time, which for the most part is bottlenecked by memory bandwidth (faster on the 3090) and clock speed (faster on the 5060).
The 3090 has almost 3x the cores of a 5060, so I’m guessing it will absolutely wipe the floor with the dual 5060 setup for batched inference - which is increasingly essential for agentic workflows and complex tool use.
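Here's a rough sketch of the kind of comparison I mean, using vLLM's offline API (the model name and batch size are placeholders; a 32B model would need a quantized build to fit in 24 GB):

```python
# Sketch: single-stream vs batched decode throughput with vLLM.
# Model name is a placeholder; pick something that fits your VRAM.
import time
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-32B")
params = SamplingParams(temperature=0.0, max_tokens=256)

def tok_per_sec(prompts):
    start = time.perf_counter()
    outputs = llm.generate(prompts, params)
    elapsed = time.perf_counter() - start
    n_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
    return n_tokens / elapsed

single = tok_per_sec(["Summarize the plot of Hamlet."])
batched = tok_per_sec(["Summarize the plot of Hamlet."] * 32)  # 32-way batch

# Single-stream decode re-reads all the weights per token, so it's
# bandwidth-bound; batching amortizes that read across requests and
# shifts the bottleneck toward compute, where the 3090's extra cores
# should pull ahead.
print(f"single: {single:.0f} tok/s, batch of 32: {batched:.0f} tok/s")
```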