I benchmarked ROLV against dense cuBLAS on the actual Llama 4 Maverick MoE expert FFN layer (up_proj, 16384×5120, bfloat16) pulled directly from HuggingFace (model-00001-of-00084.safetensors).
Numbers (Batch=512, 1000 iters, NVIDIA B200):
Tokens/s: 369K (cuBLAS) → 7.66M (ROLV) — 20.7x faster
TFLOPS (effective): 62 → 1,285 — 20.7x
Time to First Token: 64.8ms → 0.37ms — 177x faster
Energy: 232J → 43J — 81.5% savings
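For anyone checking the arithmetic, the effective-TFLOPS figures follow directly from the tokens/s numbers and the dense FLOP count of the 16384×5120 up_proj. This is a minimal sketch; the convention of 2·K·N multiply-add FLOPs per token is my assumption about how "effective" was computed, not taken from the benchmark code:

```python
# Effective TFLOPS: count FLOPs as if the full dense multiply were done,
# then divide by wall-clock time (expressed here via tokens/s).
K, N = 5120, 16384            # up_proj: in_features x out_features
FLOPS_PER_TOKEN = 2 * K * N   # one multiply + one add per weight element

def effective_tflops(tokens_per_s: float) -> float:
    return tokens_per_s * FLOPS_PER_TOKEN / 1e12

print(effective_tflops(369e3))   # cuBLAS baseline, ~62
print(effective_tflops(7.66e6))  # ROLV, ~1285
```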
ROLV exploits structured sparsity in MoE expert weights to skip large blocks of computation entirely, while producing canonically equivalent output (hash-verified). The TFLOPS figure is "effective": it counts the FLOPs of the full dense multiply, so the 1,285 TFLOPS doesn't violate hardware peak; it reflects how much work was avoided.
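ROLV's kernel isn't public, so this is purely an illustrative sketch of the general technique (skip any weight block that is entirely zero, and the result is still exactly the dense product). All names and the block size here are hypothetical, and a real kernel would precompute the block mask and run on the GPU:

```python
import numpy as np

def block_sparse_matvec(W, x, block=4):
    """Compute W @ x, skipping weight blocks that are entirely zero.

    Illustrative only: shows how skipped work still yields the
    canonically equivalent dense result.
    """
    n_rows, n_cols = W.shape
    y = np.zeros(n_rows, dtype=W.dtype)
    skipped = 0
    for i in range(0, n_rows, block):
        for j in range(0, n_cols, block):
            blk = W[i:i + block, j:j + block]
            if not blk.any():          # whole block is zero: skip it
                skipped += 1
                continue
            y[i:i + block] += blk @ x[j:j + block]
    return y, skipped

# Toy structured-sparse matrix: half of its 4x4 blocks are zero.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)).astype(np.float32)
W[:4, 4:] = 0.0
W[4:, :4] = 0.0
x = rng.standard_normal(8).astype(np.float32)

y, skipped = block_sparse_matvec(W, x)
assert np.allclose(y, W @ x, atol=1e-5)  # same output, less work done
```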
The TTFT speedup (177x) is especially relevant for interactive inference: MoE models spend a huge fraction of first-token latency in these expert projections, and collapsing that from 65ms to 0.4ms per layer changes what's possible for real-time applications.
Output was verified with norm hashes at both ends (baseline and ROLV) plus a canonical equivalence check. The weights are real, not synthetic.
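For reference, a norm-hash check of the kind described could look like the following. This is a hypothetical sketch (the actual validation kit may implement it differently): hash a rounded L2 norm of each output so that matching results produce matching digests.

```python
import hashlib
import math

def norm_hash(values, sig_digits: int = 6) -> str:
    """SHA-256 digest of the rounded L2 norm: equal norms -> equal digests."""
    norm = math.sqrt(math.fsum(v * v for v in values))
    return hashlib.sha256(f"{norm:.{sig_digits}e}".encode()).hexdigest()

baseline = [0.5, -1.25, 3.0]
candidate = [0.5, -1.25, 3.0]   # e.g. the ROLV output, here identical
assert norm_hash(baseline) == norm_hash(candidate)
```

A norm hash is a cheap necessary condition, not a proof of elementwise equality; that is presumably why a separate canonical check is run as well.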
Setup: PyTorch 2.8.0+cu128, CUDA 12.8, Python 3.12, NVIDIA B200.
Comments
heggenhougen•2h ago
Happy to answer questions. Quick note on methodology: the TFLOPS figure is effective (computed as if doing the full dense multiply) — ROLV doesn't violate hardware peak, it avoids work entirely via structured sparsity. Weights are pulled directly from HuggingFace, output verified with norm hashes and a canonical check. If you want to run a baseline on your own hardware, there's a validation kit at rolv.ai.