LLM FFN benchmarks on a 4‑core HP All‑in‑One

https://rolv.ai/
1•heggenhougen•1h ago

Comments

heggenhougen•1h ago
I ran four feed‑forward network (FFN) layers from real LLMs on a consumer HP All‑in‑One PC (Intel i7‑1165G7, 4 cores, 64 GB RAM). Each test compares three implementations:

vendor dense baseline

vendor sparse baseline

a custom sparse operator

All three run under the same conditions:

identical inputs

identical weights

identical precision

SHA‑256‑verified outputs

wall‑clock timing

psutil‑based power measurement

All baselines (dense and vendor sparse) run normally on this machine. The custom operator only changes runtime performance; it is not required to execute the models.
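The comparison setup listed above (same inputs and weights, wall‑clock timing, SHA‑256 check of the outputs) can be sketched roughly as follows. This is a toy, standard-library-only illustration, not the benchmark script from the post: the matrix size, the helper names (`dense_matvec`, `to_csr`, `csr_matvec`, `sha256_of`, `best_time`) and the plain CSR kernel are all assumptions made for the example.

```python
import hashlib
import random
import struct
import time


def dense_matvec(weights, x):
    """Reference dense product with a fixed left-to-right accumulation order."""
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            acc += w * xi
        out.append(acc)
    return out


def to_csr(weights):
    """Convert a dense row-major matrix to CSR (values, column indices, row pointers)."""
    values, cols, row_ptr = [], [], [0]
    for row in weights:
        for j, w in enumerate(row):
            if w != 0.0:
                values.append(w)
                cols.append(j)
        row_ptr.append(len(values))
    return values, cols, row_ptr


def csr_matvec(csr, x):
    """Sparse product that skips zeros but keeps the same accumulation order."""
    values, cols, row_ptr = csr
    out = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[cols[k]]
        out.append(acc)
    return out


def sha256_of(vector):
    """Hash the raw little-endian float64 bytes, so any bit difference changes the digest."""
    return hashlib.sha256(struct.pack(f"<{len(vector)}d", *vector)).hexdigest()


def best_time(fn, *args, repeats=5):
    """Smallest wall-clock time over several runs, measured with time.perf_counter()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


rng = random.Random(0)
# Toy 64x32 matrix with roughly 55% of the weights zeroed (sizes are illustrative only).
weights = [[0.0 if rng.random() < 0.55 else rng.uniform(-1, 1) for _ in range(32)]
           for _ in range(64)]
x = [rng.uniform(-1, 1) for _ in range(32)]
csr = to_csr(weights)

dense_digest = sha256_of(dense_matvec(weights, x))
sparse_digest = sha256_of(csr_matvec(csr, x))
print("outputs match:", dense_digest == sparse_digest)
print("dense s:", best_time(dense_matvec, weights, x))
print("sparse s:", best_time(csr_matvec, csr, x))
```

The digests agree here only because both kernels add the nonzero products in the same order. A kernel that reorders or blocks the summation can differ in the last bits of a float, so a bit‑for‑bit hash match is a strict test.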

Below are the raw results and full JSON for reproducibility.

Mistral‑7B Wanda FFN (4096×14336 @ 55% sparsity)

Speedup vs dense: 84.3×

Speedup vs vendor sparse: 163.4×

Energy reduction: 98.8%

Tokens/s: 53,330

Dense TFLOPS: 0.07

Effective TFLOPS: 6.26

TTFT: 0.000590 s

{
  "benchmark": "Mistral-7B Wanda 55% Sparse FFN",
  "sparsity_pct": 55.0,
  "matrix": "4096x14336",
  "speedup_vs_dense_x": 84.3,
  "speedup_vs_sparse_x": 163.4,
  "energy_savings_pct": 98.8,
  "tokens_per_s_rolv": 53330.0,
  "tokens_per_s_dense": 633.0,
  "tokens_per_s_sparse": 326.0,
  "nominal_gflops_per_iter": 0.94,
  "dense_tflops": 0.07,
  "eff_tflops_rolv": 6.26,
  "ttft_s": 0.00059,
  "platform": "Intel Core i7-1165G7",
  "hardware": "4 cores — 63.7 GB RAM"
}

GPT‑J‑6B FFN (4096×16384 @ 40% sparsity)

Speedup vs dense: 90.6×

Speedup vs vendor sparse: 174.8×

Energy reduction: 98.9%

Tokens/s: 38,191

Dense TFLOPS: 0.06

Effective TFLOPS: 5.13

TTFT: 0.000387 s

{
  "benchmark": "GPT-J-6B 40% Sparse FFN",
  "sparsity_pct": 40.0,
  "matrix": "4096x16384",
  "speedup_vs_dense_x": 90.6,
  "speedup_vs_sparse_x": 174.8,
  "energy_savings_pct": 98.9,
  "tokens_per_s_rolv": 38191.0,
  "tokens_per_s_dense": 422.0,
  "tokens_per_s_sparse": 218.0,
  "nominal_gflops_per_iter": 1.074,
  "dense_tflops": 0.06,
  "eff_tflops_rolv": 5.13,
  "ttft_s": 0.000387,
  "platform": "Intel Core i7-1165G7",
  "hardware": "4 cores — 63.7 GB RAM"
}

Llama‑2‑7B FFN (4096×11008 @ 70% sparsity)

Speedup vs dense: 87.4×

Speedup vs vendor sparse: 116.1×

Energy reduction: 98.9%

Tokens/s: 73,916

Dense TFLOPS: 0.08

Effective TFLOPS: 6.67

TTFT: 0.000392 s

{
  "benchmark": "Llama-2-7B 70% Sparse FFN",
  "sparsity_pct": 70.0,
  "matrix": "4096x11008",
  "speedup_vs_dense_x": 87.4,
  "speedup_vs_sparse_x": 116.1,
  "energy_savings_pct": 98.9,
  "tokens_per_s_rolv": 73916.0,
  "tokens_per_s_dense": 845.0,
  "tokens_per_s_sparse": 637.0,
  "nominal_gflops_per_iter": 0.721,
  "dense_tflops": 0.08,
  "eff_tflops_rolv": 6.67,
  "ttft_s": 0.000392,
  "platform": "Intel Core i7-1165G7",
  "hardware": "4 cores — 63.7 GB RAM"
}

BERT‑Base FFN (3072×768, dense)

Speedup vs dense: 4.8×

Speedup vs vendor sparse: 23.9×

Energy reduction: 79.0%

Tokens/s: 104,131

Dense TFLOPS: 0.10

Effective TFLOPS: 0.49

TTFT: 0.000322 s

{
  "benchmark": "BERT-Base Real FFN",
  "sparsity_pct": 0.0,
  "matrix": "3072x768",
  "speedup_vs_dense_x": 4.8,
  "speedup_vs_sparse_x": 23.9,
  "energy_savings_pct": 79.0,
  "tokens_per_s_rolv": 104131.0,
  "tokens_per_s_dense": 21895.0,
  "tokens_per_s_sparse": 4349.0,
  "nominal_gflops_per_iter": 0.038,
  "dense_tflops": 0.10,
  "eff_tflops_rolv": 0.49,
  "ttft_s": 0.000322,
  "platform": "Intel Core i7-1165G7",
  "hardware": "4 cores — 63.7 GB RAM"
}

Notes

All baselines (dense and vendor sparse) run on the same machine.

“Effective TFLOPS” = nominal dense FLOPs ÷ wall‑clock time.

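An effective TFLOPS value above the hardware peak means the operator executed fewer multiply‑accumulate operations than the dense computation; it does not mean the CPU exceeded its peak throughput.

Dense TFLOPS is the throughput the dense baseline actually achieved on this hardware.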

Power readings from psutil; not calibrated against external instrumentation.

All outputs are SHA‑256‑verified to match dense results.
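The "Effective TFLOPS" definition in the notes can be checked against the JSON blocks with a few lines of arithmetic. The sketch below makes two assumptions that the post does not state: that each iteration processes a batch of 8 tokens (the value at which 2 × rows × cols × batch matches every reported `nominal_gflops_per_iter`), and that per-iteration time equals batch ÷ tokens per second. The `RESULTS` table and function names are made up for this example; the numbers are copied from the JSON.

```python
# Values copied from the JSON blocks in the post.
RESULTS = {
    "Mistral-7B": {"rows": 4096, "cols": 14336, "tok_s_rolv": 53330.0, "tok_s_dense": 633.0,
                   "nominal_gflops": 0.94, "dense_tflops": 0.07, "eff_tflops": 6.26},
    "GPT-J-6B": {"rows": 4096, "cols": 16384, "tok_s_rolv": 38191.0, "tok_s_dense": 422.0,
                 "nominal_gflops": 1.074, "dense_tflops": 0.06, "eff_tflops": 5.13},
    "Llama-2-7B": {"rows": 4096, "cols": 11008, "tok_s_rolv": 73916.0, "tok_s_dense": 845.0,
                   "nominal_gflops": 0.721, "dense_tflops": 0.08, "eff_tflops": 6.67},
    "BERT-Base": {"rows": 3072, "cols": 768, "tok_s_rolv": 104131.0, "tok_s_dense": 21895.0,
                  "nominal_gflops": 0.038, "dense_tflops": 0.10, "eff_tflops": 0.49},
}

BATCH = 8  # ASSUMPTION: inferred from nominal_gflops_per_iter, not stated in the post


def nominal_gflops(rows, cols, batch=BATCH):
    """Dense FLOPs per iteration: one multiply and one add per weight per token."""
    return 2 * rows * cols * batch / 1e9


def tflops(rows, cols, tokens_per_s, batch=BATCH):
    """Nominal dense FLOPs divided by wall-clock time, with time = batch / tokens_per_s."""
    seconds_per_iter = batch / tokens_per_s
    return nominal_gflops(rows, cols, batch) / 1e3 / seconds_per_iter


for name, r in RESULTS.items():
    print(
        name,
        "nominal GFLOPs/iter:", round(nominal_gflops(r["rows"], r["cols"]), 3),
        "effective TFLOPS:", round(tflops(r["rows"], r["cols"], r["tok_s_rolv"]), 2),
        "dense TFLOPS:", round(tflops(r["rows"], r["cols"], r["tok_s_dense"]), 2),
    )
```

Under these assumptions the recomputed values agree with the reported ones to rounding. The numerator stays fixed at the dense FLOP count, so "effective TFLOPS" is just the dense TFLOPS scaled by the speedup.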

heggenhougen•1h ago
Methodology and reproducibility details:

All benchmarks were run on the same machine: HP All‑in‑One, Intel i7‑1165G7 (4 cores), 64 GB RAM.

All tests use identical inputs, identical weights, identical precision, and identical batch size.

Dense baseline uses the system BLAS (MKL/oneDNN depending on environment).

Vendor sparse baseline uses standard CSR/COO kernels.

The custom sparse operator runs in the same Python environment and on the same CPU.

All baselines (dense and vendor sparse) run normally on this hardware; the custom operator only changes runtime performance, not model executability.

Wall‑clock time is measured with time.perf_counter() around the matmul call.

Power readings come from psutil.sensors_battery() and psutil.cpu_freq(); these are not calibrated against external instrumentation.

“Effective TFLOPS” = nominal dense FLOPs ÷ wall‑clock time.

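An effective TFLOPS value above the hardware peak means the operator executed fewer multiply‑accumulate operations than the dense computation; it does not mean the CPU exceeded its peak throughput.

Dense TFLOPS is the throughput the dense baseline actually achieved on this hardware.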

“Tokens/s” is computed as 1 ÷ (per‑iteration wall‑clock time).

TTFT is measured as the time from operator invocation to first output.

All outputs are SHA‑256‑verified to match dense results bit‑for‑bit.

No quantization, no weight modification, and no model retraining were used.

All JSON blocks in the post are the raw outputs from the benchmark script.
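One relationship in the raw JSON is worth checking by hand. This is an observation about the numbers, not a claim about how the benchmark script computes them: the reported energy reductions are what you get if both runs are assumed to draw the same average power, so energy scales with run time alone. The `REPORTED` table and the function name are hypothetical; the figures are copied from the JSON blocks.

```python
# (tokens_per_s_rolv, tokens_per_s_dense, energy_savings_pct) copied from the JSON blocks.
REPORTED = {
    "Mistral-7B": (53330.0, 633.0, 98.8),
    "GPT-J-6B": (38191.0, 422.0, 98.9),
    "Llama-2-7B": (73916.0, 845.0, 98.9),
    "BERT-Base": (104131.0, 21895.0, 79.0),
}


def energy_savings_pct_constant_power(tok_s_fast, tok_s_slow):
    """Energy saving if both runs draw the same average power.

    Energy = power * time, and time per token is 1 / tokens_per_s, so the
    saving reduces to 1 - t_fast / t_slow = 1 - tok_s_slow / tok_s_fast.
    """
    return (1.0 - tok_s_slow / tok_s_fast) * 100.0


for name, (fast, slow, reported) in REPORTED.items():
    estimate = energy_savings_pct_constant_power(fast, slow)
    print(f"{name}: constant-power estimate {estimate:.1f}%, reported {reported:.1f}%")
```

If the power readings really are constant across runs, the energy column carries the same information as the speedup column. An external power meter would be needed to measure any difference in power draw between the kernels.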
