frontpage.

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
576•klaussilveira•10h ago•167 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
889•xnx•16h ago•540 comments

How we made geo joins 400× faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
91•matheusalmeida•1d ago•20 comments

What Is Ruliology?

https://writings.stephenwolfram.com/2026/01/what-is-ruliology/
18•helloplanets•4d ago•10 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
21•videotopia•4d ago•0 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
197•isitcontent•11h ago•24 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
199•dmpetrov•11h ago•91 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
307•vecti•13h ago•136 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
352•aktau•17h ago•175 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
350•ostacke•17h ago•91 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
453•todsacerdoti•19h ago•228 comments

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
20•romes•4d ago•2 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
79•quibono•4d ago•18 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
52•kmm•4d ago•3 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
253•eljojo•13h ago•153 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
388•lstoll•17h ago•263 comments

Was Benoit Mandelbrot a hedgehog or a fox?

https://arxiv.org/abs/2602.01122
5•bikenaga•3d ago•1 comment

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
231•i5heu•13h ago•175 comments

Zlob.h 100% POSIX and glibc compatible globbing lib that is faster and better

https://github.com/dmtrKovalenko/zlob
12•neogoose•3h ago•7 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
68•phreda4•10h ago•12 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
24•gmays•6h ago•6 comments

Why I Joined OpenAI

https://www.brendangregg.com/blog/2026-02-07/why-i-joined-openai.html
116•SerCe•7h ago•94 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
135•vmatsiiako•16h ago•59 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
43•gfortaine•8h ago•13 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
268•surprisetalk•3d ago•36 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
168•limoce•3d ago•87 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1039•cdrnsf•20h ago•431 comments

FORTH? Really!?

https://rescrv.net/w/2026/02/06/associative
60•rescrv•18h ago•22 comments

Show HN: ARM64 Android Dev Kit

https://github.com/denuoweb/ARM64-ADK
14•denuoweb•1d ago•2 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
88•antves•1d ago•63 comments

Kvcached: Virtualized, elastic KV cache for LLM serving on shared GPUs

https://www.notion.so/yifanqiao/Solve-the-GPU-Cost-Crisis-with-kvcached-289da9d1f4d68034b17bf2774201b141
69•Jrxing•3mo ago
https://github.com/ovg-project/kvcached

Comments

CharlesW•3mo ago
Actual title: "Solve the GPU Cost Crisis with kvcached: A library to enable virtualized, elastic KV cache for LLM serving on shared GPUs"
dang•3mo ago
Yes, we've put that in the title above (shortened to fit HN's 80 char limit). Submitted title was "Time to build a GPU OS? Here is the first step".
jewel•3mo ago
I had imagined that the large GPU clusters dynamically allocate whole machines to different tasks depending on load.

So, hypothetically, if ChatGPT's peak load and their minimum load were a 3× ratio, they'd reallocate 2/3 of their servers to training when it's not peak time.

Doing the same thing inside an individual GPU seems irrelevant to anyone operating at scale when they can approximate the same behavior with entire servers or even entire racks.

Jrxing•3mo ago
Sharing the big GPU cluster with non-latency-critical load is one solution we also explored.

For this work, we are targeting the problem of running smaller models on SOTA GPUs. Distilled/fine-tuned small models have shown comparable performance on vertical tasks.

noxa•3mo ago
Neat! As someone working in this space, feeling like I've been taking crazy pills from how these "duh, CPUs solved this 30 years ago" things keep slipping by, it's great to see more people bridging the gap! Unfortunately the CUDA/HIP virtual memory management ops (and the entire stack beneath them) are very expensive host APIs (remapping a big block of pages can be O(n^2) in page count, fully synchronizes host/device (forced wait-idle), takes kernel locks, etc.), so they haven't been viable in all cases. If your workloads are submit/wait with the host in the loop, the VM tricks are OK, but if you are trying to never block the GPU (pipeline depth > 0) you really want to avoid anything that modifies page tables (until we get GPUs that can pipeline those). vkQueueBindSparse is one of the few async APIs I've seen, and CUDA has cuMemMapArrayAsync, but I haven't used it yet (arrays are annoying, and without being able to inspect the driver I'm sure it's probably doing the wrong thing).

I've had good luck with indirection tables used during lookup inside the kernels consuming/producing the KV cache data. It's essentially user-mode remapping like they do here: you publish a buffer offset table, and threads are uniform, have coalesced reads to the table, and cache the offsets no problem. You have the same memory locality issues as VM (contiguous virtual but potentially random physical), but you are not limited to device page sizes, and since you can update the table while work is in flight you can be much more aggressive about reuse and offload (enqueue DMA to cold storage to evict from VRAM, enqueue DMA to copy from cold memory into reused VRAM, enqueue an offset table update, enqueue work using it, repeat, all without host synchronization). You can also defrag in flight if you want to try to restore physical locality. It's nothing crazy and fairly normal in CPU land (or even classic virtual texturing), but in ML GPU land I could write a big paper on it, call it SuperDuperFancyAttention4, and publish press releases...
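
A minimal CUDA sketch of the indirection-table lookup described above; the kernel name, layout, and page size are illustrative, not taken from kvcached or any particular attention kernel:

    // Gather KV entries through a logical->physical page table that the
    // host (or a copy kernel) can republish while other work is in flight.
    #define PAGE_ELEMS 4096  // illustrative logical page size, in elements

    __global__ void gather_kv(const float* __restrict__ pool,          // one big physical pool
                              const unsigned* __restrict__ page_table, // logical page -> pool page
                              float* __restrict__ out,
                              int n_elems)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n_elems) return;
        int page   = i / PAGE_ELEMS;  // logical page index
        int offset = i % PAGE_ELEMS;  // offset within that page
        // Neighboring threads in a warp hit the same table entry, so the
        // read is broadcast/coalesced and the offset caches well.
        unsigned phys = page_table[page];
        out[i] = pool[(size_t)phys * PAGE_ELEMS + offset];
    }

Evicting or reloading a page is then a DMA plus a one-entry table update, with no driver page-table call on the path.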

ivanium•3mo ago
(Disclaimer: I am one of the authors of the project) Thank you for the thoughtful and insightful comment. I really love the depth of your first paragraph. You highlighted a concern in this space that is often overlooked, and I am glad you raised it. We spent a significant amount of time dealing with the cost of dynamic GPU memory operations.

One useful observation is that LLM inference has almost no host API calls during steady state, since the GPU must stay busy with continuous kernel launches or CUDA graph replay. You are absolutely right that CUDA and HIP virtual memory operations are expensive on the host side and involve heavy driver work. However, they introduce only small stalls in the GPU pipeline, because most of the cost is paid on the host. These operations are also relatively infrequent compared to kernel launches in practice, so we offload them to a background thread to keep them off the critical path. The APIs are not cheap in general, but they happen to fit LLM inference surprisingly well.
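
As a minimal sketch of that offloading pattern, using the CUDA driver virtual-memory APIs (cuMemCreate/cuMemMap/cuMemSetAccess); the request queue, naming, and granularity handling are illustrative, and kvcached's actual implementation may differ:

    // Worker thread that performs the expensive host-side VM calls off
    // the serving path. Requests are assumed to be aligned to the
    // allocation granularity (see cuMemGetAllocationGranularity).
    #include <cuda.h>
    #include <condition_variable>
    #include <mutex>
    #include <queue>

    struct GrowRequest { CUdeviceptr va; size_t bytes; };  // map `bytes` at `va`

    static std::queue<GrowRequest> requests;
    static std::mutex mu;
    static std::condition_variable cv;

    void vm_worker(int device) {
        CUmemAllocationProp prop = {};
        prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
        prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
        prop.location.id = device;
        CUmemAccessDesc access = {};
        access.location = prop.location;
        access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
        for (;;) {
            std::unique_lock<std::mutex> lk(mu);
            cv.wait(lk, [] { return !requests.empty(); });
            GrowRequest r = requests.front(); requests.pop();
            lk.unlock();
            // The heavy driver work happens here, not on the thread that
            // launches inference kernels or replays CUDA graphs.
            CUmemGenericAllocationHandle h;
            cuMemCreate(&h, r.bytes, &prop, 0);
            cuMemMap(r.va, r.bytes, /*offset=*/0, h, 0);
            cuMemSetAccess(r.va, r.bytes, &access, 1);
        }
    }

The serving thread only enqueues a request and signals the worker; it must of course not touch the new range until the mapping is in place.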

On your second point, I believe I follow your idea, though please correct me if I've misunderstood. Virtual memory does open the door to paging and offloading, which is also important for LLM systems, and we are actively working on this direction in kvcached. Your defragmentation point also reminds me of classic techniques such as compaction and garbage collection. They could certainly help, though the trade-off between benefit and complexity would need more careful evaluation.

Thank you again for the thoughtful analysis. It was a pleasure to read. I would be happy to continue the discussion.

BergAndCo•3mo ago
[flagged]
Jrxing•3mo ago
Hi, thanks for digging out who I am. Yes, I am the author of the blog and the project.

We polished the blog for several days. I didn't get how you could conclude that this is AI generated. Is it too good to be human written?

anonymous908213•3mo ago
My impression was that it was likely LLM-written, human-reviewed. Due to a lack of knowledge on the subject/field, I can't comment on the substance of the technical details, which often reveal the shortcomings of LLM blabble, but the writing style certainly comes across as that of an LLM.

Most evident is the incoherent usage of bold text littered throughout the article, together with the infamous and poorly used em-dash spam. This snippet stood out to me particularly badly, as it does not seem like a case where even one of those odd humans who love em-dashes would use one:

"You might have heard that PagedAttention manages the KV cache using memory pages, which significantly improves memory utilization. That’s true—*but only within a single application.*"

Then you get lines like this one, which combine both random bold text and the em-dash with my most-hated LLMism, "it's not just X, but Y":

"The history of CPU systems shows that *efficiency is not just a hardware problem—it’s also a system design problem.*"

The introductory paragraph also has this (yet again, randomly bolded) LLM sensationalization that a human technical writer would be thoroughly embarrassed to have associated with their writing:

"Behind the $300 billion projected spend on GPU hardware in 2025 lies *a dark truth*: much of this expensive hardware sits *vastly underutilized.*"

Not to mention it's repeated...

"Yet behind the headlines of record spending lies a *quieter story*: much of this expensive hardware sits *vastly underutilized.*"

Your response of "is it too good to be human written" certainly doesn't restore confidence, to say nothing of the lack of humility that would be required to say that about what is allegedly your own writing. LLM writing is visible because it is awful, if you have any comprehension of what good writing looks like. The idea that LLM writing could possibly be "too good" is a truly despairing belief for someone to hold, because it means they themselves have so little understanding of good writing that they think an LLM can output good writing.

I almost wanted to give you a pass for having an LLM write an English article for you, since your response hints that English is not your native language ("I didn't get how you could conclude" is a very ESL-like mistaken tense). But you apparently have a Ph.D. and are working as a professor. I'm not familiar with academic standards these days, but is it really accepted to be claiming LLM output as your own writing...?

JonChesterfield•3mo ago
Glossing over "is academia commonly fraudulent" as rather too easy a target, LLMs do tend to write much better than people in languages said people don't really know.

If I wrote the above in Spanish it would be extremely difficult to guess what I'm trying to say. If I asked an LLM to translate it, some of the ideas would get across.

anonymous908213•3mo ago
Sure, but you'd still expect people to know from seeing LLM outputs in their own language that it's not going to be "too good to be human". An LLM will write a better Chinese essay than I could because I don't speak Chinese very well at all, but it's quite a leap to get from "this writes Chinese essays better than myself, a non-speaker" to "wow, this writes Chinese better than any human!".
dang•3mo ago
Please don't cross into personal attack.

https://news.ycombinator.com/newsguidelines.html