Story of How I'm Running an Unlimited $6/Month AI Provider on 4x RTX 3090s

3•yolo-auto•1h ago
This submission is a tale about how I launched an unlimited LLM provider to about 60 hyped people on the waitlist, then immediately served them a fully dysfunctional death-loop model. Most people, very reasonably, disappeared, but thanks to a few extremely nice people who stuck around anyway, we kept the project alive. It's still pretty chaotic, but it's gaining traction.

To back up a little bit: I believe that the whole point of AI agents is that they should keep working. They should read files, retry, search, code, summarize, run tools, and loop until the job is done. When your employer is paying for it, who cares about cost? But when it comes to my personal money and hobbies, if every loop feels like a tiny financial event, you start babysitting the agent instead of using it, and it's not fun.

Metered pricing makes me worry about using too much. Usage subscriptions, on the other hand, make me feel like I need to use every last magical % or I'm "wasting it". If only an unlimited provider existed...

Then I joined the AMD developer program. I got some credits to spin up my own MI300x and started tinkering with vLLM/SGLang inference serving on AMD.

After learning about the AMD MI300x, I did some napkin math:

Renting an MI300x at $2.00 an hour = ~$1500 a month. It can probably support about 150 users on a small MoE model like qwen-35b-3a, maybe more.

$1500 / 150 = $10.00 per month, and we all get to play with agents for a small price.

You can oversubscribe a bit, so I landed on $6 per month, per user, for 2x generation slots, 128k context, no token limits, no rate limits.
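
For anyone who wants to poke at the napkin math, here is a rough sketch in Python. The hourly rate, the 150-user guess, and the $6 price come from the numbers above; the 30-day month and the function names are my own assumptions for illustration, and none of this is measured capacity.

```python
import math

# Figures from the napkin math above; these are estimates, not measurements.
HOURLY_RATE_USD = 2.00       # MI300x rental price per hour
HOURS_PER_MONTH = 24 * 30    # assumption: a 30-day month
USERS_PER_GPU = 150          # rough guess for a small MoE model
PRICE_PER_USER_USD = 6.00    # the subscription price


def monthly_gpu_cost(hourly_rate: float, hours: int = HOURS_PER_MONTH) -> float:
    """Cost of renting one GPU around the clock for a month."""
    return hourly_rate * hours


def break_even_users(monthly_cost: float, price_per_user: float) -> int:
    """Smallest number of subscribers whose fees cover the monthly cost."""
    return math.ceil(monthly_cost / price_per_user)


cost = monthly_gpu_cost(HOURLY_RATE_USD)
cost_per_user = cost / USERS_PER_GPU
needed = break_even_users(cost, PRICE_PER_USER_USD)
oversubscription = needed / USERS_PER_GPU

print(f"monthly GPU cost: ${cost:.0f}")
print(f"cost per user at {USERS_PER_GPU} users: ${cost_per_user:.2f}")
print(f"subscribers needed at ${PRICE_PER_USER_USD:.0f}/month: {needed}")
print(f"oversubscription vs. the {USERS_PER_GPU}-user guess: {oversubscription:.1f}x")
```

With a 30-day month the rental comes to $1440, which is where the "~$1500" comes from. The $6 price only covers the card if more people subscribe than the 150-user guess, which is the "oversubscribe a bit" part.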

I built the site and the router, made a waitlist, and then over-optimized the MI300x to the point where vllm bench had like 3k+ output and 40k+ throughput... But I didn't test the final config/serve commands, and that's how I ended up with a disaster launch. You couldn't prompt the thing without it looping or bugging out. It was cursed. And that's where we lost a lot of people.
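
In hindsight, even a crude check on a few real responses would have caught the death loop before launch. Below is a minimal sketch of the kind of heuristic I mean; it is not what we actually ran. The function name, the 8-word window, and the repeat threshold are all hypothetical, and you would feed it the text returned by the final serve config for a handful of test prompts.

```python
from collections import Counter


def looks_like_death_loop(text: str, ngram: int = 8, max_repeats: int = 3) -> bool:
    """Return True if any run of `ngram` consecutive words occurs more than
    `max_repeats` times, a cheap signal that the model is stuck repeating itself.

    The defaults are arbitrary assumptions; tune them on real outputs.
    """
    words = text.split()
    # Count every sliding window of `ngram` words. Short texts yield no windows.
    counts = Counter(
        tuple(words[i:i + ngram]) for i in range(len(words) - ngram + 1)
    )
    return any(count > max_repeats for count in counts.values())


healthy = "The quick brown fox jumps over the lazy dog near the quiet river bank today."
looping = "I will read the file and then retry. " * 10

print(looks_like_death_loop(healthy))
print(looks_like_death_loop(looping))
```

A benchmark can report great throughput while the output is garbage, so a check like this is only useful when it runs against the exact config that goes live.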

Luckily, my buddy had a few 3090s, so he threw me a lifeboat and began hosting qwen for us on 2x 3090s. We finally had an operational model that wasn't costing $2.00 an hour for our whopping 3 users.

We started gaining more users, so we moved up to 4x 3090s, which leaves plenty of room for more. Even so, since then:

- we've configured vllm wrong like 15 times
- a GPU died
- we lost power
- I made a bunch of one-click starts for openclaw, hermes, and pi-mono, and none of them really work right, which probably drives people away. Those are still on our site right now.

...but people who know what they are doing seem to really like the price point. All in all we have like 98% uptime. It's been about a month. We've both learned a ton. Even with backgrounds in SWE/SE/AI, being on the hook for a couple of paying users forced us to really focus on delivering them a good product. And now I think we might be close to paying the power/hosting bill, so we're not operating at a loss (if you include the 3090 capex, we're still at a loss).

Our break-even point is moving to the cloud to max out an MI300x, which is now tuned and ready to go once we get the users.

And I'm finding that in some areas, subscribing to our service is cheaper than running the model locally (but as someone who loves local models, I totally get it).

Since then, I've been working on a desktop agent that actually works with small models like qwen. That's going to replace the broken one-click starts. It's barebones, but it's something that just works out of the box. I made it open source, and you can see what I'm talking about here: https://github.com/yolo-auto-org/yolo-auto-desktop . We're at yolo-auto.com and we have an abysmal free tier to prove it works!

Anyway, hope you got a laugh or found it interesting! Drop a question if you have any.

Comments

b--l•24m ago
`qwen-35b-3a` is really garbage (I know well because it's what I run locally). What quant are you running? Would people really pay $6 a month for it even if unlimited?

That said, nice looking site and wish you the best for getting it off the ground.
