Show HN: ROLV – 20x faster MoE FFN inference on Llama 4 Maverick vs. cuBLAS

https://rolv.ai
1•heggenhougen•2h ago
I benchmarked ROLV against dense cuBLAS on the actual Llama 4 Maverick MoE expert FFN layer (up_proj, 16384×5120, bfloat16), pulled directly from HuggingFace (model-00001-of-00084.safetensors).

Numbers (Batch=512, 1000 iters, NVIDIA B200):

Tokens/s: 369K (cuBLAS) → 7.66M (ROLV), 20.7x faster
TFLOPS (effective): 62 → 1,285, 20.7x
Time to First Token: 64.8ms → 0.37ms, 177x faster
Energy: 232J → 43J, 81.5% savings

ROLV exploits structured sparsity in MoE expert weights to skip large blocks of computation entirely, while producing canonically equivalent output (hash-verified). The TFLOPS figure is "effective": it is computed as if doing the full dense multiply, so the 1,285 TFLOPS isn't violating hardware peak; it reflects how much work was avoided.

The TTFT speedup (177x) is especially relevant for interactive inference: MoE models spend a huge fraction of first-token latency in these expert projections, and collapsing that from 65ms to 0.4ms per layer changes what's possible for real-time applications.

Verified with norm hashes at both ends (baseline and ROLV output) and a canonical check. Weights are real, not synthetic.

Setup: PyTorch 2.8.0+cu128, CUDA 12.8, Python 3.12, NVIDIA B200.
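For readers who want to check the "effective TFLOPS" arithmetic, here is a minimal sketch that uses only the figures quoted in the post. It assumes the usual dense-GEMM cost of 2 × rows × cols FLOPs per token (one multiply and one add per weight). The function names are hypothetical and are not part of ROLV or its validation kit.

```python
# Worked arithmetic for the "effective TFLOPS" figure, using only numbers
# quoted in the post. Assumption (not stated in the post): one token through
# up_proj costs 2 * rows * cols FLOPs, i.e. one multiply and one add per weight.

ROWS = 16384  # up_proj dimension quoted in the post
COLS = 5120   # up_proj dimension quoted in the post


def dense_flops_per_token(rows: int = ROWS, cols: int = COLS) -> int:
    """FLOPs for one dense matrix-vector product of the given shape."""
    return 2 * rows * cols


def effective_tflops(tokens_per_second: float) -> float:
    """Throughput expressed as if the full dense multiply had been performed."""
    return tokens_per_second * dense_flops_per_token() / 1e12


def savings_percent(baseline: float, optimized: float) -> float:
    """Relative reduction from baseline to optimized, in percent."""
    return 100.0 * (1.0 - optimized / baseline)


if __name__ == "__main__":
    print(f"cuBLAS: {effective_tflops(369e3):.0f} effective TFLOPS")
    print(f"ROLV:   {effective_tflops(7.66e6):.0f} effective TFLOPS")
    print(f"Energy savings: {savings_percent(232, 43):.1f}%")
```

With the rounded throughput figures from the post, this gives about 62 and 1,285 effective TFLOPS and 81.5% energy savings, matching the quoted values. Ratios recomputed from the rounded inputs can differ slightly from the quoted 20.7x and 177x.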

Comments

heggenhougen•2h ago
Happy to answer questions. Quick note on methodology: the TFLOPS figure is effective (computed as if doing the full dense multiply). ROLV doesn't violate hardware peak; it avoids work entirely via structured sparsity. Weights are pulled directly from HuggingFace, and output is verified with norm hashes and a canonical check. If you want to run a baseline on your own hardware, there's a validation kit at rolv.ai.

The bitter $23M legal battle that ended the Sriracha-pepper partnership

https://nearlyright.com/the-pepper-farmer-the-hot-sauce-king-and-the-23-million-betrayal-that-bro...
1•speckx•1m ago•0 comments

Yakuza creator's new game in doubt as NetEase pulls funding

https://www.polygon.com/gang-of-dragon-toshihiro-nagoshi-studio-netease/
1•sagacity•1m ago•0 comments

Freestiler – PMTiles vector tilesets from R and Python

https://walker-data.com/freestiler/
1•carnevalem•3m ago•0 comments

How not to test LLM models

https://theartificialq.github.io/2026/03/08/how-not-to-test-llm-models.html
1•HonzaT•5m ago•0 comments

Utilization metrics across accelerators (GPUs, TPUs, and so on)

https://github.com/gpusprint/gpusprint
1•heyjupiter•6m ago•0 comments

Behavioral Effects of High Peak Power Microwave Pulses (1992) [pdf]

https://apps.dtic.mil/sti/tr/pdf/ADA258136.pdf
2•anonu•7m ago•0 comments

Microsoft Outlook app now showing paid spam/phishing ads

https://imgur.com/a/O9bjjQQ
1•xvxvx•8m ago•1 comments

Show HN: PDF to JPG converter that runs in the browser (no uploads)

https://privatepdftojpg.com/
1•touchsomegrass•9m ago•0 comments

Show HN: ClarifyDoc – explains contracts in plain English

https://clarifydoc.xyz/
1•tgdaimov•11m ago•0 comments

Small web publishing tools and frameworks

https://codeberg.org/thgie/awesome-small-web-publishing
2•smartmic•11m ago•0 comments

Self-hosted docs platform – 4 PHP files, no database, free GitBook alternative

https://github.com/webstudio-ltd/docs
3•webstudioltd•11m ago•4 comments

Ask HN: What should an international dev do today?

2•jzu•12m ago•1 comments

AI Agent Site Score Scanner

https://prodlint.com/score
1•AMARCOVECCHIO99•13m ago•0 comments

Can the mental health benefits of exercise be bottled?

https://medicalxpress.com/news/2026-02-mental-health-benefits-bottled.html
1•PaulHoule•13m ago•0 comments

Coasts: Localhost service isolation and orchestration for Git worktrees

https://github.com/coast-guard/coasts
1•handfuloflight•14m ago•0 comments

China's AI progress by the numbers: GLM-5 benchmarks, robotaxi, and Huawei chips

https://medium.com/ai-advances/china-winning-ai-race-deepseek-nvidia-ca7de8a727ec
1•Aedelon•15m ago•0 comments

Show HN: VectorLens – See why your RAG hallucinates, no config

1•gustav-proxi•15m ago•0 comments

Agentic Debt

https://neilkakkar.com/agentic-debt.html
2•neilkakkar•15m ago•0 comments

Show HN: Dashboard for monitoring multiple Claude Code sessions

https://github.com/Stargx/claude-code-dashboard
1•Stargx•17m ago•1 comments

Neuroscientists have pinpointed a potential biological signature for psychopathy

https://www.psypost.org/neuroscientists-have-pinpointed-a-potential-biological-signature-for-psyc...
2•amichail•19m ago•0 comments

60 Minutes Havana Syndrome report finds U.S. government tested energy weapon

https://www.cbsnews.com/news/60-minutes-havana-syndrome-report-finds-u-s-government-tested-energy...
6•jonas21•22m ago•1 comments

Flexible feline spines shed light on "falling cat" problem

https://arstechnica.com/science/2026/03/tuck-and-turn-or-bend-and-twist-how-falling-cats-land-on-...
2•Tomte•22m ago•0 comments

Iran Transformed

https://www.nybooks.com/online/2026/03/08/iran-transformed/
1•mitchbob•26m ago•1 comments

Agent Skill to Use a Debugger

https://github.com/AlmogBaku/debug-skill
1•talolard•26m ago•1 comments

EU publishers won a piece of a shrinking pie

https://mediaindustryshift.substack.com/p/eu-publishers-won-a-piece-of-a-shrinking
3•taubek•26m ago•0 comments

Fukushima at 15: Living with radioactive hot spots and stigma

https://thebulletin.org/2026/03/fukushima-at-15-living-with-radioactive-hot-spots-and-stigma/
2•CqtGLRGcukpy•28m ago•0 comments

Show HN: ChopChopGo – Sigma-based threat hunting for Linux forensic artifacts

https://github.com/M00NLIG7/ChopChopGo
1•M00NL1G7•28m ago•1 comments

Animator Pro (Autodesk Animator) Source Code

https://github.com/AnimatorPro/Animator-Pro-C
1•reconnecting•29m ago•1 comments

We strongly oppose the Unified Attestation initiative

https://xcancel.com/i/status/2031041385554386960
5•ledoge•29m ago•2 comments

Oscar Pool Ballot, 98th Academy Awards

http://fxrant.blogspot.com/2026/03/oscar-pool-ballot-98th-academy-awards.html
1•speckx•30m ago•0 comments