Continuous Nvidia CUDA Profiling in Production

https://www.polarsignals.com/blog/posts/2025/10/22/gpu-profiling
98•brancz•3mo ago

Comments

gnurizen•3mo ago
Author here, would be happy to field any questions or feedback!
sirhcm•3mo ago
Does the profiler read any of the GPU's performance counters? Would be super cool to have an open source tool that can capture the same data nsight compute does.
gnurizen•3mo ago
This profiler is focused on kernel execution but we do scrape high level metrics (https://www.polarsignals.com/blog/posts/2025/06/04/latest-in... which is based on https://github.com/polarsignals/gpu-metrics-agent). What performance counters in particular were you interested in?
sirhcm•3mo ago
Cache hit rate is probably the most immediately useful. Although given that this is for always-on profiling maybe this project isn't as geared towards optimizing kernels as I originally thought? In theory reading the counters should be low overhead though.
porridgeraisin•3mo ago
It depends on what counter.

[ All from my experience on home GPUs, and in a lab with 2 nodes with 2 80GB H100s each. Not extensively benchmarked ]

Events like kernel launches, which this profiler reads right now, add very small overhead (1-2%). Kernel-level metrics like DRAM utilisation, cache hit rate, SM occupancy, etc. usually give you a 5-10% overhead. If you want to plot a flame graph at an instruction level (mostly useful for learning purposes) then you go off the rails; I have seen even 25% overhead. And finally, full traces add tons of overhead, but that's pretty much expected, since they produce GBs of profiling data anyway.

sirhcm•3mo ago
Occupancy and RAM utilization are available from static analysis. A sampling profiler would also obviously not be suitable for this always-on profiler case. But reading the counters [0] from the GSP should be cheap.

[0] https://en.wikipedia.org/wiki/Hardware_performance_counter

embedding-shape•3mo ago
This "low-overhead always on GPU profiler" seems really cool and useful, but we're not using Kubernetes for anything, and the instructions for how to use it seem to only cover Kubernetes. Is there a way of running this without Kubernetes?
gnurizen•3mo ago
Yeah the quickstart guide covers docker, k8s and "raw" binary options:

https://www.parca.dev/docs/quickstart/

knlb•3mo ago
Thanks for the post, this is pretty cool!

I feel like I've seen Cupti have fairly high overhead depending on the cuda version, but I'm not very confident -- did you happen to benchmark different workloads with cupti on/off?

---

If you're taking feature requests: a way to subscribe to -- and get tracebacks for -- CUDA context creation would be very useful; I've definitely been surprised by finding processes on the wrong GPU, and being able to easily figure out where they came from would be great.

I did a hack by using LD_PRELOAD to subscribe/publish the event, but never really followed through on getting the python stack trace.
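
For the on/off benchmarking question above, a minimal sketch of how the comparison could be computed. The function name `relative_overhead` and the timings are hypothetical and not from the post; it assumes you have already timed the same workload several times with CUPTI disabled and enabled.

```python
from statistics import median


def relative_overhead(baseline_s, profiled_s):
    """Fractional slowdown of profiled runs relative to baseline runs.

    Both arguments are lists of wall-clock durations in seconds for the
    same workload. Medians are used so one noisy run does not dominate.
    """
    if not baseline_s or not profiled_s:
        raise ValueError("need at least one timing in each list")
    base = median(baseline_s)
    if base <= 0:
        raise ValueError("baseline timings must be positive")
    return (median(profiled_s) - base) / base


# Made-up timings, for illustration only.
cupti_off = [10.0, 10.2, 9.9]
cupti_on = [10.2, 10.4, 10.1]
overhead = relative_overhead(cupti_off, cupti_on)
print(f"overhead: {overhead:.1%}")
```

Running this per workload and per CUDA version would show whether the overhead really varies the way it seemed to.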

gnurizen•3mo ago
CUPTI is kind of a choose-your-own-adventure thing: as you subscribe to more stuff, the overhead goes up. This is kind of a minimalist profiler that just subscribes to the kernel launches and nothing else. Still, to your point, depending on kernel launch frequency/granularity it may be higher overhead than some would want in production. We have plans to address that with some probabilistic sampling instead of profiling everything, but wanted to get this into folks' hands and get some real-world feedback first.
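
One way the probabilistic sampling idea could look, as a rough sketch only. This is not the profiler's actual design; `LaunchSampler` and its methods are hypothetical names, and the loop stands in for a stream of kernel launch events. Each launch is recorded with a fixed probability, and counts are scaled back up by the inverse of that probability.

```python
import random


class LaunchSampler:
    """Bernoulli sampler: record each kernel launch with probability `rate`."""

    def __init__(self, rate, seed=None):
        if not 0.0 < rate <= 1.0:
            raise ValueError("rate must be in (0, 1]")
        self.rate = rate
        self._rng = random.Random(seed)  # private RNG so runs are repeatable
        self.seen = 0      # launches observed
        self.recorded = 0  # launches actually profiled

    def should_record(self):
        """Decide independently for each launch whether to profile it."""
        self.seen += 1
        if self._rng.random() < self.rate:
            self.recorded += 1
            return True
        return False

    def estimated_total(self):
        """Each recorded launch stands in for 1/rate launches."""
        return self.recorded / self.rate


# Simulate 100,000 launches, profiling roughly 5% of them.
sampler = LaunchSampler(rate=0.05, seed=0)
for _ in range(100_000):
    sampler.should_record()
print(sampler.recorded, round(sampler.estimated_total()))
```

The trade-off is the usual one for sampling: per-launch cost drops roughly in proportion to the rate, while rare kernels may be missed entirely in a short window.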