frontpage.
Time to build a GPU OS? Here is the first step

https://www.notion.so/yifanqiao/Solve-the-GPU-Cost-Crisis-with-kvcached-289da9d1f4d68034b17bf2774201b141
48•Jrxing•4h ago

Comments

CharlesW•1h ago
Actual title: "Solve the GPU Cost Crisis with kvcached: A library to enable virtualized, elastic KV cache for LLM serving on shared GPUs"
jewel•1h ago
I had imagined that the large GPU clusters dynamically allocate whole machines to different tasks depending on load.

So, hypothetically, if the ratio of ChatGPT's peak load to its minimum load were 3×, they'd reallocate 2/3 of their servers to training when it's not peak time.

Doing the same thing inside an individual GPU seems irrelevant to anyone operating at scale when they can approximate the same behavior with entire servers or even entire racks.
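
The 2/3 figure above follows from simple arithmetic. A minimal sketch, assuming the fleet is sized for peak load and that load maps linearly to servers (the function name is made up for illustration):

```python
from fractions import Fraction

def reallocatable_fraction(peak_to_min_ratio):
    """Fraction of a peak-sized fleet left idle at minimum load.

    Assumes servers needed scale linearly with load, so minimum load
    occupies 1/ratio of the fleet and the rest can be reassigned.
    """
    return 1 - Fraction(1, peak_to_min_ratio)

print(reallocatable_fraction(3))  # 2/3
```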

noxa•1h ago
Neat! As someone working in this space who feels like I've been taking crazy pills over how these "duh, CPU solved this 30 years ago" things keep slipping, it's great to see more people bridging the gap! Unfortunately, the CUDA/HIP virtual memory management ops (and the entire stack beneath them) are very expensive host APIs: remapping a big block of pages can be O(n^2) in page count, fully synchronize host and device (forced wait idle), take kernel locks, etc., so it hasn't been viable in all cases. If your workloads are submit/wait with the host in the loop, the VM tricks are OK, but if you are trying to never block the GPU (pipeline depth > 0) you really want to avoid anything that does a page table modification (until we get GPUs that can pipeline those). vkQueueBindSparse is one of the few async APIs I've seen, and CUDA has cuMemMapArrayAsync, but I haven't used it yet (because arrays are annoying and, without being able to inspect the driver, I'm sure it's probably doing the wrong thing).

I've had good luck with indirection tables used during lookup inside the kernels consuming/producing the kvcache data. It's essentially user-mode remapping like they do here: you can publish a buffer offset table, and since threads are uniform they get coalesced reads to the table and can cache the offsets no problem. You have the same memory locality issues as VM (contiguous virtual but potentially random physical), but you are not limited to device page sizes, and since you can update while work is in-flight you can be much more aggressive about reuse and offload (enqueue DMA to cold storage to evict from VRAM, enqueue DMA to copy from cold memory into reused VRAM, enqueue offset table update, enqueue work using them, repeat, all without host synchronization). You can also defrag in-flight if you do want to try to restore the physical locality. It's nothing crazy and fairly normal in CPU land (or even classic virtual texturing), but in ML GPU land I could write a big paper on it and call it SuperDuperFancyAttention4 and publish press releases...
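
The indirection-table idea described above can be sketched on the host side in plain Python. This is only an illustration, not code from kvcached or from the commenter: the `BlockTable` class and its method names are hypothetical, Python lists stand in for VRAM and cold storage, and each method call stands in for what would be an enqueued DMA or table update on a real GPU queue.

```python
class BlockTable:
    """Hypothetical offset table: logical KV-cache block -> physical slot."""

    def __init__(self, num_slots):
        self.pool = [None] * num_slots                   # stands in for VRAM slots
        self.free = list(range(num_slots - 1, -1, -1))   # pop() hands out slot 0 first
        self.table = {}                                  # logical block id -> slot index
        self.cold = {}                                   # evicted blocks (cold storage)

    def write(self, block_id, data):
        slot = self.table.get(block_id)
        if slot is None:
            slot = self.free.pop()       # raises IndexError if the pool is full
            self.table[block_id] = slot
        self.pool[slot] = data

    def read(self, block_id):
        # The lookup a kernel would do: table first, then the data.
        return self.pool[self.table[block_id]]

    def evict(self, block_id):
        # Copy to cold storage, then release the slot for reuse.
        slot = self.table.pop(block_id)
        self.cold[block_id] = self.pool[slot]
        self.pool[slot] = None
        self.free.append(slot)

    def restore(self, block_id):
        # Copy back from cold storage into whatever slot is free.
        self.write(block_id, self.cold.pop(block_id))

    def defrag(self):
        # Repack live blocks into slots 0..n-1 in logical order.
        live = sorted(self.table)
        data = [self.read(b) for b in live]
        self.pool = data + [None] * (len(self.pool) - len(data))
        self.table = {b: i for i, b in enumerate(live)}
        self.free = list(range(len(self.pool) - 1, len(live) - 1, -1))


bt = BlockTable(num_slots=4)
for i in range(4):
    bt.write(i, f"kv{i}")
bt.evict(1)            # slot 1 goes back to the free list
bt.write(7, "kv7")     # new block reuses slot 1
bt.evict(0)
bt.restore(1)          # block 1 comes back, now in slot 0
bt.defrag()            # logical order 1, 2, 3, 7 -> slots 0..3
```

Readers of the table never see physical slots change underneath them as long as the table update is ordered before the work that uses it, which is the property the comment relies on to avoid host synchronization.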

BergAndCo•1h ago
AI-written paper posted by JiaRong Xing

Username is Jrxing

"GPU OS" turns out to be just more LLM spam

Replacing a $3000/mo Heroku bill with a $55/mo server

https://disco.cloud/blog/how-idealistorg-replaced-a-3000mo-heroku-bill-with-a-55-server/
185•jryio•1h ago•104 comments

Doomsday Scoreboard

https://doomsday.march1studios.com/
73•diymaker•1h ago•32 comments

Build Your Own Database

https://www.nan.fyi/database
296•nansdotio•5h ago•57 comments

rlsw – Raylib software OpenGL renderer in less than 5k LOC

https://github.com/raysan5/raylib/blob/master/src/external/rlsw.h
25•fschuett•58m ago•1 comment

LLMs can get "brain rot"

https://llm-brain-rot.github.io/
242•tamnd•7h ago•127 comments

Neural audio codecs: how to get audio into LLMs

https://kyutai.org/next/codec-explainer
305•karimf•9h ago•92 comments

We rewrote OpenFGA in pure Postgres

https://getrover.substack.com/p/how-we-rewrote-openfga-in-pure-postgres
10•wbadart•1h ago•2 comments

Mathematicians have found a hidden 'reset button' for undoing rotation

https://www.newscientist.com/article/2499647-mathematicians-have-found-a-hidden-reset-button-for-...
71•mikhael•5d ago•39 comments

Minds, brains, and programs (1980) [pdf]

https://home.csulb.edu/~cwallis/382/readings/482/searle.minds.brains.programs.bbs.1980.pdf
28•measurablefunc•1w ago•0 comments

NASA chief suggests SpaceX may be booted from moon mission

https://www.cnn.com/2025/10/20/science/nasa-spacex-moon-landing-contract-sean-duffy
151•voxleone•9h ago•458 comments

Lottery-fication of Everything: 0 day options, perps, parlays are now mainstream

https://www.dopaminemarkets.com/p/the-lottery-fication-of-everything
6•_1729•53m ago•0 comments

Wikipedia says traffic is falling due to AI search summaries and social video

https://techcrunch.com/2025/10/18/wikipedia-says-traffic-is-falling-due-to-ai-search-summaries-an...
192•gmays•20h ago•185 comments

Foreign hackers breached a US nuclear weapons plant via SharePoint flaws

https://www.csoonline.com/article/4074962/foreign-hackers-breached-a-us-nuclear-weapons-plant-via...
278•zdw•6h ago•165 comments

The Salt and Pepper Shaker Museum

https://www.thesaltandpeppershakermuseum.com
9•NaOH•1w ago•0 comments

Getting DeepSeek-OCR working on an Nvidia Spark via brute force with Claude Code

https://simonwillison.net/2025/Oct/20/deepseek-ocr-claude-code/
90•simonw•1d ago•5 comments

Flexport Is Hiring SDRs in Chicago

https://job-boards.greenhouse.io/flexport/jobs/5690976?gh_jid=5690976
1•thedogeye•4h ago

Show HN: Katakate – Dozens of VMs per node for safe code exec

https://github.com/Katakate/k7
75•gbxk•6h ago•31 comments

ChatGPT Atlas

https://chatgpt.com/atlas
441•easton•4h ago•435 comments

Diamond Thermal Conductivity: A New Era in Chip Cooling

https://spectrum.ieee.org/diamond-thermal-conductivity
145•rbanffy•10h ago•47 comments

AWS multiple services outage in us-east-1

https://health.aws.amazon.com/health/status?ts=20251020
2210•kondro•1d ago•2000 comments

Ilo – a Forth system running on UEFI

https://asciinema.org/a/Lbxa2w9R5IbaJqW3INqVrbX8E
97•rickcarlino•8h ago•35 comments

The death of thread per core

https://buttondown.com/jaffray/archive/the-death-of-thread-per-core/
55•ibobev•1d ago•13 comments

Show HN: bbcli – A TUI and CLI to browse BBC News like a hacker

https://github.com/hako/bbcli
48•wesleyhill•2d ago•7 comments

Our modular, high-performance Merkle Tree library for Rust

https://github.com/bilinearlabs/rs-merkle-tree
116•bibiver•9h ago•26 comments

What do we do if SETI is successful?

https://www.universetoday.com/articles/what-do-we-do-if-seti-is-successful
95•leephillips•1d ago•120 comments

Binary Retrieval-Augmented Reward Mitigates Hallucinations

https://arxiv.org/abs/2510.17733
30•MarlonPro•5h ago•3 comments

The Programmer Identity Crisis

https://hojberg.xyz/the-programmer-identity-crisis/
154•imasl42•5h ago•150 comments

Apple alerts exploit developer that his iPhone was targeted with gov spyware

https://techcrunch.com/2025/10/21/apple-alerts-exploit-developer-that-his-iphone-was-targeted-wit...
224•speckx•6h ago•110 comments

60k kids have avoided peanut allergies due to 2015 advice, study finds

https://www.cbsnews.com/news/peanut-allergies-60000-kids-avoided-2015-advice/
233•zdw•18h ago•233 comments

The Greatness of Text Adventures

https://entropicthoughts.com/the-greatness-of-text-adventures
87•ibobev•5h ago•62 comments