ROLV – Beats Vendor Kernels, Cross‑Platform

https://img1.wsimg.com/blobby/go/68eb7843-76aa-4f45-831c-bf0cbe513bde/downloads/52638717-8fe8-4b3b-a46e-4fdd0effb72b/ROLV%20Benchmarks.pdf?ver=1772452185674
1•heggenhougen•1h ago

Comments

heggenhougen•1h ago
rolv.ai is publishing reproducible benchmarks showing that ROLV outperforms vendor dense kernels and vendor sparse libraries across GPUs, CPUs, and TPUs while producing the same hash/artifact on every platform. Highlights: on a commodity Intel Xeon, a Kimi K2.5 expert FFN slice drops from 228.38 ms to 6.36 ms per iteration (35.9×), token throughput rises from 2,240 to 80,500 t/s, and kernel energy falls by 97.8%. Nsight‑validated traces and the full methodology are in the linked PDF; results cover LLM FFNs, recommendation GEMMs, FEM, GNNs, mobile and EV proxies, and production subsamples. We welcome technical critique of the methodology, requests for additional baselines, and requests for the exact scripts to reproduce any specific run.
heggenhougen•1h ago
I’m on the rolv.ai team. Below is a more detailed account of our benchmarks and what to inspect in the PDF. It covers artifact parity, methodology, standout results, and the exact verification checks reviewers can run to validate our claims.

Reproducibility and artifact parity

We publish reproducible artifact hashes and full environment manifests for NVIDIA, AMD, Intel CPU, AMD CPU, Apple M4, and Google TPU. We do not distribute proprietary binaries or IP. Instead, the PDF lists the ROLV artifact hash (identical across platforms), container manifests, and the exact command lines and verification tests you can run to confirm matching outputs, checksums, Nsight/perf traces, and power logs.

What we validated and why it matters

Cross‑platform parity: identical outputs and checksums across vendor GPUs, CPUs, and TPUs, to eliminate measurement drift from build differences.

Vendor comparisons: benchmarks against vendor dense kernels and vendor sparse libraries (cuBLAS/cuSPARSE, ROCm sparse, vendor BLAS on CPUs, TPU sparse primitives where available), with per‑kernel wall time, memory transfer time, and conversion overheads.

Energy and throughput: kernel energy where measurable, end‑to‑end token throughput for LLM slices, and iteration times for non‑LLM workloads; Nsight traces and power logs are referenced.

Standout, independently validated numbers (March 2026)

Kimi K2.5 expert FFN (7168×2048, batch=512, ~87% sparsity) on a commodity Intel Xeon (13 GB usable RAM): dense baseline 228.38 ms → ROLV 6.36 ms per iteration (35.9×); token throughput 2,240 → 80,500 t/s; kernel energy 16,283.97 J → 350.74 J (97.8% saved).

Finite Element Solver (mobile phone chassis drop test): 193.16× speedup; 99.5% energy saved (multi‑CPU).

LLM proxy matrix (4096×5120, 50% sparsity) on NVIDIA B200: 158.72× speedup; 99.37% energy saved; 40.5M t/s with Nsight‑validated tolerance harness.

Large recommendation GEMM (Meta‑style ranking): 98.76× speedup; 99.0% energy saved.
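
For reviewers who want to sanity-check the headline ratios, the quoted Kimi K2.5 figures can be cross-checked against each other with a few lines of Python. This is only an arithmetic consistency check on the numbers quoted above, not a benchmark and not part of the ROLV tooling. The throughput lines assume one token per batch row per iteration, which is an assumption the comment does not state.

```python
# Figures quoted in the comment for the Kimi K2.5 expert FFN run on Intel Xeon.
BATCH = 512
dense_ms, rolv_ms = 228.38, 6.36
dense_joules, rolv_joules = 16283.97, 350.74

# Speedup is the ratio of per-iteration times.
speedup = dense_ms / rolv_ms

# Fraction of kernel energy saved relative to the dense baseline.
energy_saved = 1.0 - rolv_joules / dense_joules

# Assumption (not stated in the source): one token per batch row per iteration.
dense_tps = BATCH / (dense_ms / 1000.0)
rolv_tps = BATCH / (rolv_ms / 1000.0)

print(f"speedup:      {speedup:.1f}x")
print(f"energy saved: {energy_saved * 100:.1f}%")
print(f"throughput:   {dense_tps:.0f} -> {rolv_tps:.0f} t/s")
```

Under that assumption the time, throughput, and energy figures agree with one another to the quoted precision. That shows the numbers are internally consistent; it says nothing about whether the measurements themselves are correct.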

Additional production and research workloads (GNNs, ViT attention, MusicGen, Llama shapes) are listed with per‑run sparsities and exact matrix shapes in the PDF.

Methodology highlights (what to inspect in the PDF)

Exact shapes and sparsities: matrix dimensions, sparsity pattern (random/pruned/structured), and batch sizes.

Baseline definitions: vendor dense kernel and vendor sparse library baselines include conversion costs; we report raw kernel times and end‑to‑end times.

Measurement rig: wall‑clock timing, Nsight kernel timelines, and device power sampling points; CPU runs include perf counters and the exact kernel invocation sequence.

Tolerance and correctness: numerical tolerance checks, output checksums, and unit tests used to validate functional equivalence.

Repro scripts: container manifests and run_benchmark verification commands are referenced so reviewers can run the verification tests and compare hashes and checksums.
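
To make the tolerance and checksum items concrete, here is a minimal sketch of what such a check could look like. It is not the ROLV harness: the function names `within_tolerance` and `output_checksum`, the tolerance values, and the choice to round before hashing are all hypothetical. Rounding is used here because floating‑point results usually differ in the last bits between platforms, so hashing raw bytes would rarely match; the PDF's own procedure may differ.

```python
import hashlib
import math
import struct


def within_tolerance(reference, candidate, rel_tol=1e-5, abs_tol=1e-8):
    """Return True if every candidate element matches the reference within tolerance."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


def output_checksum(values, decimals=4):
    """SHA-256 of the values rounded to `decimals` places, packed as little-endian float64.

    Hypothetical canonicalization: rounding hides last-bit differences, and
    adding 0.0 turns -0.0 into 0.0 so both hash identically.
    """
    rounded = [round(v, decimals) + 0.0 for v in values]
    payload = struct.pack(f"<{len(rounded)}d", *rounded)
    return hashlib.sha256(payload).hexdigest()


# Toy data standing in for a dense baseline output and a sparse-kernel output.
baseline = [1.0, 2.5, -3.25]
candidate = [1.0000001, 2.5, -3.2500002]

print(within_tolerance(baseline, candidate))
print(output_checksum(baseline) == output_checksum(candidate))
```

A rounding‑based checksum can still disagree when a value sits right at a rounding boundary, so the element‑wise tolerance check is the more reliable of the two; the checksum is a convenience for quick comparison across machines.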
