Was my $48K GPU server worth it?

https://rosmine.ai/2026/05/13/was-my-48k-gpu-worth-it/

45•apwheele•2d ago

Comments

doctorpangloss•37m ago

> Because of this I got a motherboard with slow GPU interconnect. It’s good for running many small experiments in parallel (which is my main use case) but horrible for any models split across gpus.

:( you paid a professional pc builder and you weren't told this?

ginko•33m ago

Don't those Ada 6000 GPUs support NVLink? I think I can even see the cover for the connectors in OP's pic.

edit: Hm, finding mixed information online on whether that's still supported or not. Apparently it was removed in workstation GPUs.

mciancia•28m ago

Nope, they don't support it. And afair even if they did, you would be limited to connecting only in pairs, not all 6 together

CamperBob2•27m ago

Consumer motherboards can still make sense even if you leave some performance on the table. Running an actual 8x GPU server is not something you'd want to do in an apartment. Imagine the old Lucasfilm "THX" trailer where an unearthly-sounding foghorn whine rises to a sweeping crescendo at reference level, only without the decay at the end.

At the time he put this rig together, there weren't a lot of open-weight LLMs that could run well on 6x48=288 GB, so it probably wasn't a huge loss.

Right now I'm in the process of cramming Blackwell cards into an old DDR4-based Milan server, where the important thing is to be able to run large models at all. The GPU fans alone burn over 400 watts at full throttle.

storus•6m ago

Did you think about Max-Q cards? 300W and they aren't that noisy either, 14% lower perf than non-Max-Q card.

mciancia•21m ago

I wonder why using 2 PSUs resulted in having slower interconnect.

There are no specs in this blog post regarding the CPU/motherboard choice, but Threadripper Pro has had 128 PCIe lanes for some time now, so running all GPUs at full speed shouldn't be a problem.
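A quick sanity check of that lane budget, as a minimal sketch. It assumes six GPUs (the count mentioned elsewhere in the thread), each on a full x16 link, and it ignores lanes a real board reserves for NVMe, networking, or the chipset. The function names are made up for illustration.

```python
def lanes_needed(link_widths):
    """Total PCIe lanes consumed by a list of per-device link widths."""
    return sum(link_widths)


def fits(link_widths, cpu_lanes):
    """True if the requested links fit within the CPU's lane count."""
    return lanes_needed(link_widths) <= cpu_lanes


# Assumption: six GPUs, every one on an x16 link.
six_gpus_x16 = [16] * 6

print(lanes_needed(six_gpus_x16))   # 96
print(fits(six_gpus_x16, 128))      # True: 96 lanes fit in a 128-lane budget
```

On paper that leaves 32 lanes for everything else, though how many of them actually reach x16 slots depends on the specific motherboard.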

m-hodges•13m ago

what is a "professional pc builder" in 2026

ok_dad•11m ago

A guy on Facebook with more confidence and better insurance

zozbot234•12m ago

If you split models using pipeline/layer parallelism you don't have to care about a slow interconnect, you're just slowed down a lot when running a single inference at a time as opposed to a fully pipelined minibatch. But tensor parallelism requires much faster interconnects than you could get in your average server, so I'm not sure that a different motherboard would help all that much.
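To put a rough number on "slowed down a lot", here is a sketch using the textbook idealized pipeline schedule: equal-time stages, no communication cost, and a batch split into microbatches that flow through the stages back to back. Under those assumptions the fraction of time each GPU is busy is m / (m + p - 1) for p stages and m microbatches. The six-stage figure is an assumption matching the six GPUs discussed in the thread.

```python
def pipeline_utilization(stages, microbatches):
    """Busy fraction per stage in an idealized pipeline schedule.

    Assumes equal-time stages and zero transfer cost, so the whole
    run takes (microbatches + stages - 1) time slots and each stage
    works during `microbatches` of them.
    """
    if stages < 1 or microbatches < 1:
        raise ValueError("stages and microbatches must be >= 1")
    return microbatches / (microbatches + stages - 1)


# Assumption: one pipeline stage per GPU, six GPUs.
single_request = pipeline_utilization(stages=6, microbatches=1)
full_minibatch = pipeline_utilization(stages=6, microbatches=32)

print(f"single inference: {single_request:.2f}")  # about 0.17
print(f"32 microbatches:  {full_minibatch:.2f}")  # about 0.86
```

With a single request only one GPU works at a time, so five of the six sit idle; the interconnect is not the bottleneck in either case, which is the point being made above.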

gosub100•25m ago

It doesn't cover risk. If one or more gpus dies, who pays for it? If you rent, you are guaranteed to be insulated from this risk. But owning, you might not have the best return policy from the vendor. And if you are actually at fault for breaking it, they have every right to deny a return. Or if your apartment is burglarized or catches fire (possibly from overloading the circuit) you are out the entire investment.

0xbadcafebee•7m ago

[delayed]

tombert•24m ago

I have four old 24gb Nvidia cards. They're not great but they're not useless either. The problem is that I haven't really figured out a good way to actually use them.

Genuine question; would anyone here recommend any specific motherboard to best utilize these cards?

mciancia•17m ago

Depends what you want to do and which cards you have, but usually going with any older (3rd gen+) threadripper pro setup will give you a lot of pcie lanes.

I myself run a Gigabyte TRX40 Aorus Xtreme, but since it's a regular Threadripper (not Pro), with 4 GPUs two of them run at x16 and two at x8.

hasteg•22m ago

Just curious OP (if you're the one posting) -- what do you mean by independent researcher? What are you researching and are you making $$ from it or are you living off previous built up savings? Seems like an interesting path. What research have you looked into so far?

exceptione•13m ago

I am not the author, but he has been training (or tuning?) a model that produces text that mimics the source material in a more natural way, i.e. getting LLMs to produce fewer bland and boring LLMisms, according to the follow-up blog post.

hsuduebc2•9m ago

citing from the article:

"I spent a long time trying high risk/high reward experiments and failing. But now I have something good. I’ve solved a major problem with LLMs. And I’m launching next Monday so we will soon see if it’s actually a breakthrough or just LLM psychosis "

Maybe ai companies today have some bounty program?

daemonologist•8m ago

They have a subsequent post (from Monday) about what (/one thing) they've been working on: https://rosmine.ai/2026/05/18/fixing-llm-writing-with-distri...

freediddy•20m ago

In the last year, I have bought an M3 Ultra Mac Studio with 512 GB, a Macbook Pro M5 MAX with 128 GB and an RTX 6000 Pro. I have spent around $25k so far, not including electricity. I figured worst case scenario I can sell them in the next year and only take a haircut as opposed to losing my entire investment.

Compared to just paying for tokens, the tokens would have been much cheaper and much, much faster. I've been running Gemma4:31b and Qwen3.5 and 3.6, getting local LLMs to solve AMC 8/10 math questions, and it's about 10-100x slower than just doing it online. When I tried it with ChatGPT late last year, it took about one night and $25 to solve about 1000 questions. Using my RTX 6000 and M3 Ultra with Gemma4:31b on both, drawing about 800 watts (600 for the RTX and 200 for the M3 Ultra), it answered about 40 questions in 7 hours, and I haven't checked how good the answers are yet.

At the very least I'm going to try to sell my M3 Ultra if I can find a reliable place to sell it without getting ripped off by scammers.
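Working the numbers from this comment through, as a rough sketch. The run length, question counts, wattage, and the $25 figure are the commenter's; the electricity price is a placeholder assumption, and hardware cost is left out entirely.

```python
# Figures quoted in the comment above.
LOCAL_HOURS = 7
LOCAL_QUESTIONS = 40
LOCAL_WATTS = 800          # 600 W RTX 6000 + 200 W M3 Ultra
API_USD = 25
API_QUESTIONS = 1000

# Assumption: a placeholder residential electricity price.
ASSUMED_USD_PER_KWH = 0.15


def local_minutes_per_question():
    return LOCAL_HOURS * 60 / LOCAL_QUESTIONS


def local_wh_per_question():
    return LOCAL_WATTS * LOCAL_HOURS / LOCAL_QUESTIONS


def local_electricity_usd_per_question(usd_per_kwh=ASSUMED_USD_PER_KWH):
    return local_wh_per_question() / 1000 * usd_per_kwh


def api_usd_per_question():
    return API_USD / API_QUESTIONS


print(f"local: {local_minutes_per_question():.1f} min per question")
print(f"local: {local_wh_per_question():.0f} Wh per question")
print(f"local: ${local_electricity_usd_per_question():.3f} electricity per question")
print(f"API:   ${api_usd_per_question():.3f} per question")
```

That comes to about 10.5 minutes and 140 Wh per question locally against 2.5 cents per question online. At the assumed electricity price, power alone costs roughly as much as the API call, before counting any of the $25k of hardware.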

jon-wood•17m ago

I’m not usually one to ask this because learning to do a thing can be fun, but why exactly have you spent 25 thousand dollars on getting an LLM someone else made to answer maths exam questions?

freediddy•14m ago

It's just a project I'm working on. I'm working on projects where AIs are processing and classifying large amounts of data that would be a lot of work for humans to do.

CamperBob2•17m ago

How do you use the RTX 6000 with the Macs? Exo? I would think that would be pretty snappy if configured properly.

freediddy•13m ago

This is on a separate Windows PC, I don't have it integrated with the Macs.

Aurornis•19m ago

This is a difficult calculation to make because you wouldn't rent time on the exact same system in the cloud. Depending on what you're running, a bigger server with better inter-GPU interconnects in the cloud might complete the task so much faster that the additional per-hour expense is more than covered.
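One way to frame that calculation, as a sketch only. The $48K purchase price comes from the post title; the hourly rental rate and the speedup factor are hypothetical inputs, and electricity, resale value, and idle time are all ignored.

```python
def break_even_local_hours(purchase_usd, cloud_usd_per_hour, cloud_speedup=1.0):
    """Local compute-hours of work needed before owning beats renting.

    cloud_speedup is how many local-hours of work one rented hour
    completes (1.0 means the rented machine is no faster). Doing H
    local-hours of work in the cloud costs H / cloud_speedup * rate,
    and break-even is where that equals the purchase price.
    """
    if cloud_usd_per_hour <= 0 or cloud_speedup <= 0:
        raise ValueError("rate and speedup must be positive")
    return purchase_usd * cloud_speedup / cloud_usd_per_hour


PURCHASE_USD = 48_000          # from the post title
ASSUMED_RATE = 6.0             # hypothetical $/hour for a rented server

same_speed = break_even_local_hours(PURCHASE_USD, ASSUMED_RATE)
twice_as_fast = break_even_local_hours(PURCHASE_USD, ASSUMED_RATE, cloud_speedup=2.0)

print(f"same speed:      {same_speed:.0f} hours")
print(f"cloud 2x faster: {twice_as_fast:.0f} hours")
```

If the rented machine finishes the same job in half the time, the break-even point doubles, which is why comparing hourly rates alone understates the cloud's side of the ledger.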

jmyeet•12m ago

So some things have changed since this rig was first built (2024). The most relevant is that the $6800 RTX 6000 Ada 48GB has arguably been supplanted by the $9500 RTX 6000 Pro 96GB.

The Ada has a memory bandwidth of 960GB/s. The Pro has 1.8TB/s and about 40-50% better performance so is at least equivalent in processing power, much better in memory bandwidth (important for inference) and can hold larger models on a single card.

I've considered buying a rig with 1-2 6000 Pros for similar reasons, but I want to see what happens with this year's Mac Studios and a likely M5 Ultra. Macs have a shared memory architecture, whereas NVidia segments the market based on max memory: the biggest consumer card (RTX 5090) has 32GB of VRAM but still excellent memory bandwidth (1.8TB/s). The conventional wisdom seems to be that an RTX 5090 rig will still trounce a Mac Studio. Despite being able to hold larger models and being able to chain Mac Studios over TB5, their lower memory bandwidth (~900GB/s) and lower overall GFLOPS mean they still come out behind.

That being said, the current Mac Studios are relatively long in the tooth, being released in 2024.

I'm still not sure any of this is really worth it because things are still changing so fast. I think there's a decent chance of a number of large AI companies going bust in the next 2-3 years such that you'll be able to buy enterprise AI hardware at cents on the dollar, a bit like how Google bought data centers in the post-dot-com crash.

But anyway, nowadays I'd be looking at the RTX 6000 Pro as the sweet spot, having anywhere from 1-4 in a single server.

The electrical issues the author mentions are interesting. I hadn't really thought about the max amperage on a residential circuit. In a DC, these would typically operate on three-phase power and much higher overall amperage. I wonder if there's a device you can buy that can combine multiple residential circuits into a single power source for a server this power hungry?
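On the memory-bandwidth point earlier in this comment, a common rule of thumb is that single-stream decoding of a dense model has to read every weight once per generated token, so tokens per second cannot exceed bandwidth divided by model size. The sketch below applies that ceiling to the two bandwidth figures quoted above. The 40 GB model size is a hypothetical example, and the estimate ignores KV-cache traffic, compute limits, and batching.

```python
def max_tokens_per_second(bandwidth_gb_s, weights_gb):
    """Upper bound on single-stream decode speed for a dense model.

    Assumes every weight byte is read from memory once per token and
    nothing else competes for bandwidth.
    """
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


ASSUMED_WEIGHTS_GB = 40        # hypothetical model that fits on either card

ada = max_tokens_per_second(960, ASSUMED_WEIGHTS_GB)     # RTX 6000 Ada, 960 GB/s
pro = max_tokens_per_second(1800, ASSUMED_WEIGHTS_GB)    # RTX 6000 Pro, 1.8 TB/s

print(f"Ada ceiling: {ada:.0f} tokens/s")   # 24
print(f"Pro ceiling: {pro:.0f} tokens/s")   # 45
```

Real throughput lands below these ceilings, but the ratio between the two cards is what the bandwidth comparison is getting at.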

freediddy•7m ago

I have the Macbook M5 MAX with 128 GB of RAM. I put its performance at roughly equivalent to the RTX 5070 Ti. The M3 Ultra 512 GB for me is about half the performance of the RTX 5070 Ti but obviously it has the ability to do more because of the increased memory.

I don't think anything compares to the nVidia chips at all.

nextos•6m ago

I am also considering buying 3-4x RTX 6000 Pro 96GB plus a Ryzen workstation with a grant.

Is this the best general-purpose choice as of 2026 with $50k for training, fine-tuning and running large open models?

trevithick•6m ago

You would install a 240v circuit (in the US) like for an electric clothes dryer.
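For a sense of how much headroom that buys, here is a back-of-the-envelope sketch. The breaker ratings are assumptions (15 A for an ordinary US wall circuit, 30 A for a typical dryer circuit), as is the common practice of keeping continuous loads to 80% of the breaker rating; check the actual panel and local code for real values.

```python
def circuit_watts(volts, breaker_amps, continuous_fraction=0.8):
    """Peak and continuous-load wattage for a single circuit.

    continuous_fraction is the assumed derating for loads that run
    for hours at a time, such as a GPU server under training load.
    """
    peak = volts * breaker_amps
    return peak, peak * continuous_fraction


# Assumed breaker sizes; not taken from the post.
wall_peak, wall_cont = circuit_watts(120, 15)
dryer_peak, dryer_cont = circuit_watts(240, 30)

print(f"120 V / 15 A: {wall_peak} W peak, {wall_cont:.0f} W continuous")
print(f"240 V / 30 A: {dryer_peak} W peak, {dryer_cont:.0f} W continuous")
```

Under those assumptions a dryer-style circuit carries four times the power of a standard outlet circuit, which is why it is the usual answer for a multi-GPU box at home.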

0xbadcafebee•9m ago

[delayed]

jameson•8m ago

The idea is similar to maintaining on-prem vs cloud

Cloud is optimized for development velocity, but because it is a high-margin business, on-prem eventually becomes the more economical option.

It could be too late, but it might be worth looking into tax savings if you have a business. Depreciation of an asset counts as a loss and may be deductible from your income. (I'm NOT a tax expert)
