Counting Words at SIMD Speed

https://healeycodes.com/counting-words-at-simd-speed
58•healeycodes•5mo ago

Comments

kragen•5mo ago
As it happens, splitting input text into words fast is one of the things I most want to do this week! But maybe that's because it's a distraction from benchmarking hash tables.
davidst•5mo ago
It's a wonderful problem for optimizing code. Michael Abrash hosted a performance contest for word counting back in... 1991? (If my memory serves.) The article and code can be found here:

There Ain’t No Such Thing as the Fastest Code: Lessons Learned in the Pursuit of the Ultimate Word Counter

Article: https://www.phatcode.net/res/224/files/html/ch16/16-01.html

Code: https://www.phatcode.net/res/224/files/html/ch16/16-05.html

kevmo314•5mo ago
I was able to get a little bit faster than the multithreaded version with a single thread using page-aligned reads and Grand Central Dispatch for asynchronous read operations: https://github.com/healeycodes/counting-words-at-simd-speed/...

This was also a nice opportunity for me to learn about Grand Central Dispatch with the help of AI. I already knew the page-alignment and async-read techniques, just not how to use them on OSX.

Edit: actually you can get even faster with mmap https://github.com/healeycodes/counting-words-at-simd-speed/...

kwillets•5mo ago
I think you can just xor the whitespace mask with the shifted one.

Also, when counting 0xFF bytes from a boolean etc., sub the mask; 0xFF == -1.
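
A scalar Python sketch of one reading of both suggestions, using a plain integer as a stand-in for the SIMD whitespace bitmask. The function names are hypothetical and the six-byte whitespace set is an assumption (tab through carriage return, plus space); a real kernel would do this per vector register, not per input.

```python
WHITESPACE = b" \t\n\v\f\r"  # assumed whitespace set: 0x09..0x0D and 0x20


def whitespace_mask(data: bytes) -> int:
    """Bit i is set when data[i] is whitespace (stand-in for compare + movemask)."""
    mask = 0
    for i, b in enumerate(data):
        if b in WHITESPACE:
            mask |= 1 << i
    return mask


def count_words_xor(data: bytes) -> int:
    """Count words by XORing the whitespace mask with itself shifted by one."""
    n = len(data)
    ws = whitespace_mask(data)
    # Shift by one position and pretend the byte before the input was whitespace.
    prev_ws = ((ws << 1) | 1) & ((1 << n) - 1)
    # Every set bit is a whitespace/non-whitespace transition.
    transitions = ws ^ prev_ws
    # Transitions alternate start, end, start, ... so starts = ceil(count / 2).
    return (bin(transitions).count("1") + 1) // 2


def count_true_lanes(lane_masks: list[int]) -> int:
    """Accumulate 0xFF/0x00 compare results in one 8-bit lane by subtracting."""
    acc = 0
    for m in lane_masks:
        # 0xFF is -1 as a signed byte, so subtracting it adds 1 (modulo 256).
        acc = (acc - m) & 0xFF
    return acc
```

Note that the 8-bit accumulator wraps after 255 true results, so a vectorized version would have to flush it into a wider counter before that point.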

re_spond•5mo ago
It would be interesting to see how far this gets with exaloop's Codon compiler.
diath•5mo ago
On a semi-related note: if you're trying to make a Python script run faster and don't have the know-how to rewrite your program in C or to write SIMD (if applicable), you can always try running the script through pypy. Merely replacing python3 with pypy3 in bench.sh, with no other changes, brings the runtime of the first program down from 104s to 9s on my machine:

    Benchmark 1: python3 0_mvp.py bench.txt
      Time (mean ± σ):     104.739 s ±  3.982 s    [User: 104.213 s, System: 0.158 s]
      Range (min … max):   100.303 s … 108.005 s    3 runs
    Benchmark 2: python3 1_c_regex.py bench.txt
      Time (mean ± σ):     14.777 s ±  0.017 s    [User: 14.563 s, System: 0.158 s]
      Range (min … max):   14.759 s … 14.791 s    3 runs
    Benchmark 1: pypy3 0_mvp.py bench.txt
      Time (mean ± σ):      9.381 s ±  0.204 s    [User: 9.110 s, System: 0.234 s]
      Range (min … max):    9.245 s …  9.616 s    3 runs
    Benchmark 2: pypy3 1_c_regex.py bench.txt
      Time (mean ± σ):      4.296 s ±  0.031 s    [User: 4.038 s, System: 0.236 s]
      Range (min … max):    4.262 s …  4.324 s    3 runs
dh2022•5mo ago
Thanks for the advice - I had never heard of pypy. Are there any downsides to making pypy the default Python interpreter? Thanks!
diath•5mo ago
It's not supported by all packages. For instance, C-based packages may not work, or may run slowly through the C API emulation layer. More info about it here: https://pypy.org/posts/2018/09/inside-cpyext-why-emulating-c...

That being said, when it works, it works great, but you have to evaluate whether it's suitable on a per-project/script basis.

healeycodes•5mo ago
Author here, one of the fastest improvements I've seen is @cloud11665's idea here: https://x.com/cloud11665/status/1955958965046595699
perihelions•5mo ago
I don't know ARM, but an alternate approach, if it's available, is to store the query constants as bitmasks in SIMD registers, and use the input bytes as indices into those constants, using a shuffle instruction. Two levels, to pull out a bit from a 256-bit mask: part of an input byte is used to index a byte (SIMD shuffle), and another part indexes a bit within the byte (bit shifts).

Idea being, this is constant in the size of the query set.
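
A scalar Python sketch of that two-level lookup, assuming a 32-byte table as the 256-bit mask. The helper names are hypothetical, and a real kernel would be constrained by how many table bytes a single shuffle can address, so treat this as an illustration of the indexing only.

```python
def build_bitset(members: bytes) -> bytes:
    """Build a 256-bit membership table: 32 bytes, one bit per byte value."""
    table = bytearray(32)
    for b in members:
        table[b >> 3] |= 1 << (b & 7)
    return bytes(table)


def in_set(table: bytes, b: int) -> bool:
    """Test membership with the same work regardless of how many members exist."""
    selected = table[b >> 3]               # level 1: high 5 bits pick a byte (the shuffle step)
    return bool((selected >> (b & 7)) & 1)  # level 2: low 3 bits pick a bit (the shift step)


table = build_bitset(b" \t\n\v\f\r")
print([hex(b) for b in range(256) if in_set(table, b)])
```

The lookup cost does not depend on how many bytes are in the query set, which is the "constant in the size of the query set" property described above.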

ncruces•5mo ago
But that's slower for small query sizes.

This describes a few algorithms: http://0x80.pl/notesen/2018-10-18-simd-byte-lookup.html

Both the alternative version by Geoff Langdale, and the special case for small sets, are substantially similar to the algorithms used in Hyperscan (truffle and shufti). https://github.com/intel/hyperscan

Having something hard coded for spaces can be much faster, especially since 5 of the 6 characters are a range: a wrap-around subtraction and an unsigned less-than does the first 5; an equality compare does the other.
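
A scalar Python sketch of that hard-coded check, assuming the six whitespace bytes are 0x09 through 0x0D plus 0x20. The `& 0xFF` emulates the 8-bit wrap-around that a SIMD byte subtraction gives for free; `is_space` is a hypothetical name.

```python
def is_space(b: int) -> bool:
    """Classify one byte using a wrap-around subtract, an unsigned compare, and an equality."""
    # Bytes below 0x09 wrap to large values, so one unsigned less-than covers 0x09..0x0D.
    in_range = ((b - 0x09) & 0xFF) < 5
    # The sixth character, space, needs its own equality compare.
    return in_range or b == 0x20


print([hex(b) for b in range(256) if is_space(b)])
```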

ashvardanian•5mo ago
You can avoid hard-coding the whitespace symbols and have a generic byte-set search kernel via `vpshufb` on AVX512BW-capable CPUs [1] or via `tbl` instructions on NEON-capable CPUs [2].

[1]: https://github.com/ashvardanian/StringZilla/blob/2f4b1386ca2...

[2]: https://github.com/ashvardanian/StringZilla/blob/2f4b1386ca2...

Sesse__•5mo ago
You don't need AVX512BW for shuffle, SSSE3 will do. (Of course, if you want wider registers, you'll need the newer versions such as AVX2 or AVX512, but they don't shuffle cross-lane.)
ncruces•5mo ago
> Perform a bitwise AND with 0xFB and check if the result equals 0x09. Both 0x0D & 0xFB = 0x09 and 0x09 & 0xFB = 0x09.

This explanation was a bit unsatisfying. This works because 0x09 and 0x0D differ by a single bit, and 0xFB masks that bit (and only that bit) out.

If they differed by more than one bit, the fact that they AND to the same value would be necessary but not sufficient.
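
A brute-force Python check of both cases. The pair 0x09/0x0A with mask 0xFC is a made-up counterexample chosen because those two values differ in two bits; `matches` is a hypothetical helper.

```python
def matches(mask: int, target: int) -> list[int]:
    """Return every byte value b for which (b & mask) == target."""
    return [b for b in range(256) if (b & mask) == target]


# 0x09 and 0x0D differ only in bit 2, and 0xFB clears exactly that bit,
# so the test accepts those two bytes and nothing else.
print([hex(b) for b in matches(0xFB, 0x09)])

# 0x09 and 0x0A differ in bits 0 and 1. Clearing both with 0xFC makes them
# agree, but 0x08 and 0x0B now pass the same test as well.
print([hex(b) for b in matches(0xFC, 0x08)])
```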

teo_zero•5mo ago
I'm wondering what the gain would be if we treated every char <= 0x20 as "white space". I know it's changing the rules of the game, but who would want to count the words in a text full of control characters?