
Faster sorting with SIMD CUDA intrinsics (2024)

https://winwang.blog/posts/bitonic-sort/
92•winwang•1y ago
Code at https://github.com/wiwa/blog-code/

Comments

ashvardanian•1y ago
The article covers extremely important CUDA warp-level synchronization/exchange primitives, but that's not what is generally called SIMD in CUDA land.

Most "CUDA SIMD" intrinsics are designed to process a 32-bit data pack containing 2x 16-bit or 4x 8-bit values (<https://docs.nvidia.com/cuda/cuda-math-api/cuda_math_api/gro...>). That significantly shrinks their applicability in most domains outside of video and string processing. I had pretty high hopes for the DPX instructions on Hopper (<https://developer.nvidia.com/blog/boosting-dynamic-programmi...>) and started integrating them into StringZilla last year, but the gains aren't huge.
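
For readers unfamiliar with the packed style described above, here is a minimal CPU-side sketch in C++ of a lane-wise add on four 8-bit values packed into one 32-bit word. The function name `add4x8` is hypothetical. It is meant to mirror the wrap-around per-byte addition that a CUDA intrinsic such as `__vadd4` provides, and it is not taken from the article or from StringZilla.

```cpp
#include <cstdint>

// Hypothetical helper: adds four packed 8-bit lanes independently,
// with each lane wrapping around modulo 256.
std::uint32_t add4x8(std::uint32_t a, std::uint32_t b) {
    // Add the low 7 bits of every lane; the sum cannot carry into the next lane.
    std::uint32_t low = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    // Fold the top bit of every lane back in without propagating a carry.
    return low ^ ((a ^ b) & 0x80808080u);
}
```

Masking off the top bit of each lane before the add is what keeps a carry from leaking into the neighbouring lane, which is the core of this kind of packed arithmetic.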

winwang•1y ago
Oh wow, TIL, thanks. I usually call stuff like that SWAR, and every now and then I try to think of a way to (fruitfully) use it. The "SIMD" in this case was just an allusion to warp-wide functions looking like how one might use SIMD in CPU code, as opposed to typical SIMT CUDA.

Also, StringZilla looks amazing -- I just became your 1000th GitHub follower :)

ashvardanian•1y ago
Thanks, appreciate the gesture :)

Traditional SWAR on GPUs is a fascinating topic. I've begun assembling a set of synthetic benchmarks to compare DP4A vs. DPX (<https://github.com/ashvardanian/less_slow.cpp/pull/35>), but it feels incomplete without SWAR. My working hypothesis is that 64-bit SWAR on properly aligned data could be very useful in GPGPU, though FMA/MIN/MAX operations in that PR might not be the clearest showcase of its strengths. Do you have a better example or use case in mind?

winwang•1y ago
I don't -- unfortunately not too well-versed in this field! But I was a bit fascinated with SWAR after I randomly thought of how to prefix-sum with int multiplication, later finding out that it is indeed an old trick as I suspected (I'm definitely not on this thread btw): https://mastodon.social/@dougall/109913251096277108

As for 64-bit... well, I mostly avoid using high-end GPUs, but I was under the impression that i64 is just simulated. In fact, I was thinking of using the full warp as a "pipeline" to implement u32 division (mostly as a joke), almost like anti-SWAR. There was some old-ish paper detailing arithmetic latencies in GPUs, and division had roughly 32x or more the latency of multiplication (...or I could be misremembering).
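
The prefix-sum-by-multiplication trick mentioned in this comment can be sketched in a few lines of C++. This is an illustration of the general idea, not code from the linked thread; the name `prefix_sum_bytes` is hypothetical, and the sketch assumes the total of all eight bytes stays below 256 so that no lane carries into the next.

```cpp
#include <cstdint>

// Hypothetical helper: treats x as eight packed 8-bit lanes (lane 0 is the
// least significant byte) and returns their inclusive prefix sums, packed
// the same way. Assumes the sum of all lanes is < 256.
std::uint64_t prefix_sum_bytes(std::uint64_t x) {
    // Multiplying by 0x0101...01 adds x shifted left by 0, 8, ..., 56 bits,
    // so lane i of the product accumulates lanes 0..i of the input.
    return x * 0x0101010101010101ULL;
}
```

For example, the lanes 1, 2, ..., 8 become 1, 3, 6, 10, 15, 21, 28, 36. If the running total could exceed 255, the lanes would need to be widened first.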

bobmcnamara•1y ago
Parallel compares: https://graphics.stanford.edu/~seander/bithacks.html#ZeroInW...
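
As one example of a parallel compare in the SWAR style, here is a sketch of the widely known zero-byte test. Whether this is the exact entry the truncated link points to is an assumption, and the name `has_zero_byte` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: returns true if any of the four bytes in v is zero,
// testing all four lanes with a handful of word-wide operations.
bool has_zero_byte(std::uint32_t v) {
    // Subtracting 1 from each byte borrows into its top bit only when the
    // byte was zero (or already had the top bit set, which ~v filters out).
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;
}
```

The result is exact as a yes/no answer for the whole word, which is what a string-scanning loop usually needs.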

DennisL123•1y ago
Interesting stuff. Not sure if I read this right that it's 16- and 32-bit integer values that get sorted. If so, I'd love to see if the GPU implementation can beat a competitive radix sort implementation on a CPU.

winwang•1y ago
It's 32 values of 32 bits each that get sorted. I don't think a GPU sort would beat a CPU sort at this scale, even if you don't take kernel launch time into account. CPUs are simply too fast for (super-)small data, especially with AVX-512. But if we're talking about a larger amount of data, that would be a different story, e.g. as part of a normal GPU merge sort.
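
To make the "32 values, one per lane" setup concrete, here is a CPU-side C++ sketch that emulates a 32-lane bitonic sorting network. It is not the article's kernel. The names `Warp` and `bitonic_sort32` are hypothetical, and the `lane ^ j` partner selection only stands in for what a warp-level exchange such as `__shfl_xor_sync` would do on a GPU.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

using Warp = std::array<std::uint32_t, 32>;  // one value per emulated lane

// Hypothetical emulation: sorts 32 values ascending with a bitonic network.
Warp bitonic_sort32(Warp v) {
    for (unsigned k = 2; k <= 32; k <<= 1) {          // size of bitonic runs
        for (unsigned j = k >> 1; j > 0; j >>= 1) {   // compare distance
            Warp next = v;  // all lanes read old values, then write together
            for (unsigned lane = 0; lane < 32; ++lane) {
                unsigned partner = lane ^ j;           // lane to exchange with
                bool ascending = (lane & k) == 0;      // direction of this run
                bool lower = (lane & j) == 0;          // lower half of the pair
                bool keep_min = (ascending == lower);
                next[lane] = keep_min ? std::min(v[lane], v[partner])
                                      : std::max(v[lane], v[partner]);
            }
            v = next;
        }
    }
    return v;
}
```

Every lane does the same work at every step and the partner pattern is fixed in advance, which is why this kind of network maps onto a warp without any data-dependent branching.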

maeln•1y ago
It is also useful if your data already lives in GPU memory, for example when you need to z-sort a bunch of particles in a 3D renderer's particle system.

exDM69•1y ago
A 32-way GPU sorting algorithm might be just what I need for sorting and deduplicating triangle IDs in a visibility buffer renderer I am working on.

Thanks for sharing.
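
The deduplication step mentioned here could follow a 32-way sort roughly as sketched below, in C++ on the CPU. This is an assumption about one possible approach, not the commenter's renderer. The name `unique_mask32` is hypothetical, and on a GPU the neighbour comparison would plausibly use a warp shuffle and a ballot rather than a loop.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: given 32 already-sorted IDs, returns a bitmask in
// which bit i is set when lane i holds the first occurrence of its value.
std::uint32_t unique_mask32(const std::array<std::uint32_t, 32>& sorted) {
    std::uint32_t mask = 1u;  // lane 0 is always a first occurrence
    for (unsigned lane = 1; lane < 32; ++lane) {
        // In sorted order, a duplicate always equals its left neighbour.
        if (sorted[lane] != sorted[lane - 1]) mask |= (1u << lane);
    }
    return mask;
}
```

Counting the set bits in the mask gives the number of distinct IDs, and the lanes whose bit is set are the ones that would write their value out.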

winwang•1y ago
As someone who doesn't know very much about graphics (ironically), you're welcome and hope it helps!

fourseventy•1y ago
What are the biggest use cases of GPU accelerated sorting?