frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Show HN: sllm – Split a GPU node with other developers, unlimited tokens

https://sllm.cloud
32•jrandolf•2h ago
Running DeepSeek V3 (685B) requires 8×H100 GPUs which is about $14k/month. Most developers only need 15-25 tok/s. sllm lets you join a cohort of developers sharing a dedicated node. You reserve a spot with your card, and nobody is charged until the cohort fills. Prices start at $5/mo for smaller models.

The LLMs are completely private (we don't log any traffic).

The API is OpenAI-compatible (we run vLLM), so you just swap the base URL. Currently offering a few models.

Comments

mmargenot•59m ago
This is a great idea! I saw a similar (inverse) idea the other day for pooling compute (https://github.com/michaelneale/mesh-llm). What are you doing for compute in the backend? Are you locked into a cohort from month to month?
vova_hn2•54m ago
1. Is the given tok/s estimate for the total node throughput, or is it what you can realistically expect to get? Or is it the worst case scenario throughput if everyone starts to use it simultaneously?

2. What if I try to hog all resources of a node by running some large data processing and making multiple queries in parallel? What if I try to resell the access by charging per token?

Edit: sorry if this comment sounds overly critical. I think that pooling money with other developers to collectively rent a server for LLM inference is a really cool idea. I also thought about it, but haven't found a satisfactory answer to my question number 2, so I decided that it is infeasible in practice.

jrandolf•43m ago
1. It's an average. 2. We have sophisticated rate limiter.
esafak•53m ago
Like vast.ai and TensorDock, and presumably others.
spuz•51m ago
It seems crazy to me that the "Join" button does not have a price on it and yet clicking it simply forwards you to a Stripe page again with no price information on it. How am I supposed to know how much I'm about to be charged?
jrandolf•46m ago
That was an error on our part lol. We'll update with the price.
peter_d_sherman•50m ago
What a brilliant idea!

Split a "it needs to run in a datacenter because its hardware requirements are so large" AI/LLM across multiple people who each want shared access to that particular model.

Sort of like the Real Estate equivalent of subletting, or splitting a larger space into smaller spaces and subletting each one...

Or, like the Web Host equivalent of splitting a single server into multiple virtual machines for shared hosting by multiple other parties, or what-have-you...

I could definitely see marketplaces similar to this, popping up in the future!

It seems like it should make AI cheaper for everyone... that is, "democratize AI"... in a "more/better/faster/cheaper" way than AI has been democratized to date...

Anyway, it's a brilliant idea!

Wishing you a lot of luck with this endeavor!

kaoD•46m ago
How is the time sharing handled? I assume if I submit a unit of work it will load to VRAM and then run (sharing time? how many work units can run in parallel?)

How large is a full context window in MiB and how long does it take to load the buffer? I.e. how many seconds should I expect my worst case wait time to take until I get my first token?

ninjha•32m ago
> how many work units can run in parallel

not original author but batching is one very important trick to make inference efficient, you can reasonably do tens to low hundreds in parallel (depending on model size and gpu size) with very little performance overhead

jrandolf•28m ago
vLLM handles GPU scheduling, not sllm. The model weights stay resident in VRAM permanently so there's no loading/unloading per request. vLLM uses continuous batching, so incoming requests are dynamically added to the running batch every decode step and the GPU is always working on multiple requests simultaneously. There is no "load to VRAM and run" per request; it's more like joining an already-running batch.

TTFT is under 2 seconds average. Worst case is 10-30s.

spuz•46m ago
Is this not a more restricted version of OpenRouter? With OpenRouter you pay for credits that can be used to run any commercial or open-source model and you only pay for what you use.
jrandolf•40m ago
OpenRouter is a little different. We are trying to experiment with maximizing a single GPU cluster.
singpolyma3•44m ago
25 t/s is barely usable. Maybe for a background runner
lelanthran•21m ago
> 25 t/s is barely usable. Maybe for a background runner

That's over a 1000 words/s if you were typing. If 1000 words/s is too slow for your use-case, then perhaps $5/m is just not for you.

I kinda like the idea of paying $5/m for unlimited usage at the specified speed.

It beats a 10x higher speed that hits daily restrictions in about 2 hours, and weekly restrictions in 3 days.

freedomben•40m ago
This is an excellent idea, but I worry about fairness during resource contention. I don't often need queries, but when I do it's often big and long. I wouldn't want to eat up the whole system when other users need it, but I also would want to have the cluster when I need it. How do you address a case like this?
jrandolf•19m ago
We implement rate-limiting and queuing to ensure fairness, but if there are a massive amount of people with huge and long queries, then there will be waits. The question is whether people will do this and more often than not users will be idle.
freedomben•7m ago
Is there any way to buy into a pool of people with similar usage patterns? Maybe I'm overthinking it, but just wondering
varunr89•30m ago
$40/mo for deepseek r1 seems steep compared to a pro sub on open ai /claude unless you run 24x7. im not sure how sharing is making this affirdable.
lelanthran•16m ago
> $40/mo for deepseek r1 seems steep compared to a pro sub on open ai /claude unless you run 24x7.

"Running 24x7" is what people want to do with openclaw.

Lalabadie•29m ago
This is the most "Prompted ourselves a Shadcn UI" page I've seen in a while lol

I dig the idea! I'm curious where the costs will land with actual use.

jrandolf•27m ago
Thanks lol. I actually like Shadcn's style. It's sad that people view it as AI now.

Show HN: Running local OpenClaw together with remote agents in an open network

https://github.com/hybroai/hybro-hub
1•kevinlu•37s ago•0 comments

Chat Control: The Technical and Legal Case Against Mass Scanning

https://vixen.moe/chat-control-the-technical-and-legal-case-against-mass-scanning/
1•DarkGodErebus•46s ago•0 comments

Floating point from scratch: Hard Mode

https://essenceia.github.io/projects/floating_dragon/
1•random__duck•1m ago•0 comments

Scientists capture how cells trigger inflammation

https://news.stanford.edu/stories/2026/03/immune-response-inside-cells-inflammation-research
1•ohjeez•2m ago•0 comments

Ask HN: Best build in public/regular updates blogs?

2•suralind•3m ago•0 comments

Batteries-included terminal UI framework for Go

https://useglyph.sh/
1•DeveloperOne•3m ago•0 comments

37,000 AI-generated podcasts on Kaggle

https://www.kaggle.com/datasets/listennotes/ai-generated-fake-podcasts-spams
3•wenbin•10m ago•0 comments

Aspire Docs in Your Terminal (and Your AI's Brain)

https://devblogs.microsoft.com/aspire/aspire-docs-in-your-terminal/
1•vyrotek•15m ago•0 comments

Bazaarly – A Thought Exercise

https://blog.sayemahmed.com/p/bazaarly-a-thought-exercise-universe
1•sayembd•16m ago•0 comments

AI Agents to Organise My Secret Society's Dinners

https://chillphysicsenjoyer.substack.com/p/ai-agents-to-organise-my-secret-societys
1•crescit_eundo•19m ago•0 comments

Deafness reversed: One injection restores hearing in just weeks – ScienceDaily

https://www.sciencedaily.com/releases/2026/04/260403044651.htm
3•bilsbie•19m ago•0 comments

Beyond the Verdict: Holding Big Tech Accountable Isn't as Simple as It Seems

https://connectsafely.org/beyond-the-verdict-holding-big-tech-accountable-isnt-as-simple-as-it-se...
1•ohjeez•21m ago•0 comments

Plague Ships

https://www.afloat.com.au/feature/plague-ships/
3•bryanrasmussen•22m ago•0 comments

Mapping AI into Production: A Field Experiment on Firm Performance

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6513481
2•senko•25m ago•0 comments

Artemis II crew snaps portrait of Earth on their way to the moon

https://www.popsci.com/science/earth-photo-artemis-ii/
1•geox•30m ago•0 comments

Across the social sciences, half of research doesn't replicate

https://www.science.org/content/article/across-social-sciences-half-research-doesn-t-replicate
2•XzetaU8•30m ago•0 comments

Polymarket apologizes for allowing wagers on fate of U.S. pilots downed in Iran

https://www.nbcnews.com/news/us-news/polymarket-apologizes-allowing-wagers-fate-us-pilots-downed-...
5•ceejayoz•32m ago•0 comments

Malaysia's age verification rules for social media could be strictest

https://www.biometricupdate.com/202604/malaysias-age-verification-rules-for-social-media-could-be...
1•anonhaven•34m ago•0 comments

IBM 3270 Information Display System: Color and Programmed Symbols (1979) [pdf]

https://bitsavers.org/pdf/ibm/3278/GA33-3056-0_3270_Information_Display_System_Color_and_Programm...
2•hggh•34m ago•1 comments

Not all of this is new

https://www.natemeyvis.com/not-all-of-this-is-new/
1•Brajeshwar•37m ago•0 comments

Artificial Intelligence Will Die – and What Comes After

https://comuniq.xyz/post?t=912
3•01-_-•41m ago•2 comments

Astronomers Find a Third Galaxy Missing Its Dark Matter

https://www.universetoday.com/articles/astronomers-find-a-third-galaxy-missing-its-dark-matter-va...
8•gostsamo•41m ago•1 comments

Token Price Discovery in the AI Diffusion Debate

https://davefriedman.substack.com/p/the-price-discovery-problem-in-the
1•walterbell•45m ago•0 comments

What does Open Source mean?

https://nesbitt.io/2026/04/04/what-does-open-source-mean.html
1•zdw•48m ago•1 comments

Laid Off from Oracle(OCI). Looking for Software Roles (USA)

1•bemindful•48m ago•1 comments

Iran's Network of Cameras Bolsters Air Defenses, Expert Says

https://www.wsj.com/livecoverage/iran-war-news-2026/card/iran-s-network-of-cameras-bolsters-air-d...
19•uxhacker•49m ago•0 comments

Detecting Defects in Software Systems

https://lasse.hels.dk/detecting-defects-in-software-systems/
1•seagrassalert•49m ago•0 comments

Ask HN: Regarding app rejection on 3.1.1 Guidelines

1•binaryvigilante•51m ago•0 comments

The AI-Native Fork

https://www.howardgray.net/the-fork/
3•walterbell•52m ago•0 comments

Sonos Play Review: Performance Meets Convenience

https://www.wired.com/review/sonos-play/
1•joozio•52m ago•0 comments