We use snapshot-based loading to pull model states from local NVMe RAIDs directly into VRAM. When running benchmarks to compare A100 (PCIe Gen4) vs H100 (PCIe Gen5), we hit a performance cliff on the A100s.
Throughput results (loading 70GB+ snapshots):
| Configuration | A100 (Gen4) | H100 (Gen5) |
|---------------|-------------|-------------|
| 1 GPU Load | ~1.71 GiB/s | ~1.57 GiB/s |
| 2 GPU Load | ~0.22 GiB/s | ~1.33 GiB/s |
| 4 GPU Load | ~0.21 GiB/s | ~2.20 GiB/s |
| 8 GPU Load | ~0.25 GiB/s | ~1.12 GiB/s |
On the A100 setup, as soon as we parallelize the load across 2+ GPUs, random-read throughput collapses to roughly 0.2 GiB/s. The H100 setup degrades far more gracefully: it sustains over 1 GiB/s at every GPU count and peaks at ~2.20 GiB/s with 4 GPUs.
Our working theory is that the PCIe Gen4 links on the A100 host are being saturated by the concurrent DMA traffic and interrupt load from multiple GPUs requesting pages simultaneously. We initially suspected a software lock in our runtime, but the same code scaling cleanly on the H100/Gen5 host suggests a physical bandwidth or interrupt-handling limit rather than a software one.
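One way to take the GPUs out of the picture is to reproduce just the storage side of the load. The sketch below is not our actual loader, just a minimal stand-in: N concurrent readers stream a file and we report aggregate throughput, so you can compare 1/2/4/8 readers on your own Gen4 array against the table above. The demo file size and reader counts are placeholders.

```python
# Minimal stand-in for the multi-GPU snapshot load: N concurrent readers
# streaming the same file, reporting aggregate throughput. NOT our runtime's
# loader -- just a check on whether the storage path alone shows the collapse.
import os
import time
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # 1 MiB reads; file reads release the GIL, so threads suffice


def read_all(path: str) -> int:
    """Sequentially read the whole file; return bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while chunk := f.read(CHUNK):
            total += len(chunk)
    return total


def bench(path: str, readers: int) -> float:
    """Aggregate GiB/s with `readers` concurrent full-file reads."""
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=readers) as ex:
        total = sum(ex.map(read_all, [path] * readers))
    return total / (time.perf_counter() - t0) / (1 << 30)


if __name__ == "__main__":
    # Demo on a small temp file. On a real run, point `path` at a snapshot
    # shard on the NVMe array and drop the page cache first (e.g. via
    # /proc/sys/vm/drop_caches), or the numbers measure RAM, not the disks.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(os.urandom(64 << 20))
        path = f.name
    try:
        for n in (1, 2, 4):
            print(f"{n} readers: {bench(path, n):.2f} GiB/s")
    finally:
        os.unlink(path)
```

If the aggregate number collapses the same way with plain reads, the problem lives in the storage/PCIe path; if it doesn't, the GPU DMA interaction is back on the suspect list.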
Has anyone else building high-density inference rigs seen this specific degradation on Gen4 NVMe arrays?