
Faster sorting with SIMD CUDA intrinsics (2024)

https://winwang.blog/posts/bitonic-sort/
92•winwang•10mo ago
Code at https://github.com/wiwa/blog-code/

Comments

ashvardanian•10mo ago
The article covers extremely important CUDA warp-level synchronization/exchange primitives, but that's not what is generally called SIMD in CUDA land.

Most "CUDA SIMD" intrinsics are designed to process a 32-bit data pack containing 2x 16-bit or 4x 8-bit values (<https://docs.nvidia.com/cuda/cuda-math-api/cuda_math_api/gro...>). That significantly shrinks their applicability in most domains outside of video and string processing. I had pretty high hopes for the DPX instructions on Hopper (<https://developer.nvidia.com/blog/boosting-dynamic-programmi...>) and started integrating them into StringZilla last year, but the gains aren't huge.
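
For readers who have not met the packed format, here is a minimal CPU-side sketch of what a "4x 8-bit values in one 32-bit word" addition looks like. It is plain C++, not CUDA; the function name `add_u8x4` is made up for illustration, and it only mirrors the wrap-around per-byte behavior that an intrinsic such as `__vadd4` provides in hardware.

```cpp
#include <cstdint>

// Hypothetical helper: adds four unsigned 8-bit lanes packed into one
// 32-bit word, with each lane wrapping around independently (mod 256).
inline std::uint32_t add_u8x4(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t low7 = 0x7F7F7F7Fu;  // low 7 bits of every lane
    const std::uint32_t high = 0x80808080u;  // top bit of every lane

    // Add the low 7 bits of each lane; the carry can reach bit 7 of the
    // same lane but can never spill into the neighboring lane.
    std::uint32_t sum = (a & low7) + (b & low7);

    // Fold in the top bits with XOR, which is addition without carry-out.
    return sum ^ ((a ^ b) & high);
}
```

The masking is the whole trick: carries are stopped at lane boundaries, so one 32-bit add does the work of four 8-bit adds.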

winwang•10mo ago
Oh wow, TIL, thanks. I usually call stuff like that SWAR, and every now and then I try to think of a way to (fruitfully) use it. The "SIMD" in this case was just an allusion to warp-wide functions looking like how one might use SIMD in CPU code, as opposed to typical SIMT CUDA.

Also, StringZilla looks amazing -- I just became your 1000th Github follower :)

ashvardanian•10mo ago
Thanks, appreciate the gesture :)

Traditional SWAR on GPUs is a fascinating topic. I've begun assembling a set of synthetic benchmarks to compare DP4A vs. DPX (<https://github.com/ashvardanian/less_slow.cpp/pull/35>), but it feels incomplete without SWAR. My working hypothesis is that 64-bit SWAR on properly aligned data could be very useful in GPGPU, though FMA/MIN/MAX operations in that PR might not be the clearest showcase of its strengths. Do you have a better example or use case in mind?

winwang•10mo ago
I don't -- unfortunately not too well-versed in this field! But I was a bit fascinated with SWAR after I randomly thought of how to prefix-sum with int multiplication, later finding out that it is indeed an old trick as I suspected (I'm definitely not on this thread btw): https://mastodon.social/@dougall/109913251096277108

As for 64-bit... well, I mostly avoid using high-end GPUs, but I was under the impression that i64 is just emulated. In fact, I was thinking of using the full warp as a "pipeline" to implement u32 division (mostly as a joke), almost like anti-SWAR. There was some old-ish paper detailing arithmetic latencies in GPUs, and division latency was roughly 32x or more that of multiplication (...or I could be misremembering).
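
The prefix-sum-by-multiplication idea mentioned above can be sketched in a few lines. This is one common form of the trick and may not match the exact variant in the linked thread; the name `prefix_sum_u8x4` is hypothetical, and the result is only meaningful when every running sum fits in 8 bits.

```cpp
#include <cstdint>

// Hypothetical helper: treats x as four unsigned 8-bit lanes (lane 0 is the
// least significant byte) and returns their inclusive prefix sums, lane-wise.
// Assumption: every running sum stays below 256, otherwise carries leak
// into the next lane and corrupt it.
inline std::uint32_t prefix_sum_u8x4(std::uint32_t x) {
    // x * 0x01010101 == x + (x << 8) + (x << 16) + (x << 24)  (mod 2^32),
    // so lane k of the product accumulates lanes 0..k of the input.
    return x * 0x01010101u;
}
```

For example, lanes `1, 2, 3, 4` (the word `0x04030201`) become `1, 3, 6, 10` (the word `0x0A060301`).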

bobmcnamara•10mo ago
Parallel compares: https://graphics.stanford.edu/~seander/bithacks.html#ZeroInW...
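
As a taste of the kind of parallel compare collected on that page, here is a sketch of the classic "does this word contain a zero byte" test. The function name `has_zero_byte` is made up for illustration, and this is the well-known textbook form rather than a quotation of the linked entry.

```cpp
#include <cstdint>

// Hypothetical helper: returns true if any of the four bytes of v is zero,
// testing all four bytes with a handful of word-wide operations.
inline bool has_zero_byte(std::uint32_t v) {
    // Subtracting 1 from a zero byte borrows and sets its top bit.
    // "& ~v" discards bytes whose top bit was already set in the input,
    // and the final mask keeps only the per-byte top bits.
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;
}
```

Comparing every byte against an arbitrary value `c` reduces to this test by XORing the word with `c` repeated in each byte first.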
DennisL123•10mo ago
Interesting stuff. Not sure if I read this right that it's 16- and 32-bit integer values that get sorted. If yes, I'd love to see if the GPU implementation can beat a competitive radix sort implementation on a CPU.
winwang•10mo ago
It's 32 32-bit values which get sorted. I don't think a GPU sort would beat a CPU sort at this scale, even if you don't take kernel launch time into account. CPUs are simply too fast for (super-)small data, especially with AVX-512. But if we're talking about a larger amount of data, that would be a different story, i.e. as part of a normal GPU mergesort.
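
To make "32 values sorted across a warp" concrete, here is a CPU-side sketch of a 32-lane bitonic network. It is not the article's code: the names `Warp`, `kLanes` and `bitonic_sort_32` are hypothetical, and the array read `v[lane ^ j]` stands in for what a real kernel would do with a warp shuffle such as `__shfl_xor_sync`.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

constexpr unsigned kLanes = 32;                   // one value per warp lane
using Warp = std::array<std::uint32_t, kLanes>;   // hypothetical stand-in for a warp

// Sorts 32 values ascending with a bitonic network, emulating lanes on the CPU.
inline void bitonic_sort_32(Warp& v) {
    for (unsigned k = 2; k <= kLanes; k *= 2) {        // size of the sequences being merged
        for (unsigned j = k / 2; j > 0; j /= 2) {      // partner distance within this merge
            Warp next{};
            for (unsigned lane = 0; lane < kLanes; ++lane) {
                std::uint32_t mine = v[lane];
                std::uint32_t other = v[lane ^ j];     // partner lane's value (a shuffle on the GPU)
                bool lower = (lane & j) == 0;          // this lane is the lower one of the pair
                bool ascending = (lane & k) == 0;      // direction of this lane's block
                // The lower lane keeps the minimum in ascending blocks; every
                // other combination follows by symmetry.
                next[lane] = (lower == ascending) ? std::min(mine, other)
                                                  : std::max(mine, other);
            }
            v = next;  // all lanes advance together, like one warp-wide step
        }
    }
}
```

For 32 lanes the two outer loops run 15 compare-exchange steps in total (1 + 2 + 3 + 4 + 5), each of which is a single exchange plus a min/max per lane.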
maeln•10mo ago
It is also useful if your data already lives in GPU memory. For example, when you need to z-sort a bunch of particles in a 3D renderer's particle system.
exDM69•10mo ago
A 32-way GPU sorting algorithm might be just what I need for sorting and deduplicating triangle IDs in a visibility buffer renderer I am working on.

Thanks for sharing.

winwang•10mo ago
As someone who doesn't know very much about graphics (ironically), you're welcome and hope it helps!
fourseventy•10mo ago
What are the biggest use cases of GPU accelerated sorting?