frontpage.

Made with ♥ by @iamnishanth

Open Source @Github

My Favorite Bugs: Invalid Surrogate Pairs

https://george.mand.is/2026/05/my-favorite-bugs-invalid-surrogate-pairs/
30•meysamazad•2h ago

Comments

jonhohle•1h ago
Once I ran into this, it became hard to treat strings “normally” in any situation, or, alternatively, I’d force hard encoding requirements in the domain. Regardless, handling grapheme clusters properly is hard and easy to get wrong.
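A small Python sketch (Python is used here only for illustration) of why "length" stops being one number: the flag 🇺🇸 is a single grapheme cluster, yet it is two code points, four UTF-16 code units, and eight UTF-8 bytes. The function name `unit_counts` is made up for this example, and counting grapheme clusters themselves would need a segmenter (such as `Intl.Segmenter` in JavaScript), which this sketch does not attempt.

```python
def unit_counts(s: str) -> dict[str, int]:
    """Measure the same string in three different units (hypothetical helper)."""
    return {
        # Python's len() counts code points.
        "code_points": len(s),
        # JavaScript's .length counts UTF-16 code units, 2 bytes each.
        "utf16_units": len(s.encode("utf-16-le")) // 2,
        # Size when stored or sent as UTF-8.
        "utf8_bytes": len(s.encode("utf-8")),
    }


# One flag: two regional indicator code points, U+1F1FA and U+1F1F8.
flag = "\U0001F1FA\U0001F1F8"
print(unit_counts(flag))  # {'code_points': 2, 'utf16_units': 4, 'utf8_bytes': 8}

# 'e' followed by COMBINING ACUTE ACCENT renders as one character but is two code points.
print(unit_counts("e\u0301"))  # {'code_points': 2, 'utf16_units': 2, 'utf8_bytes': 3}
```

None of the three numbers is "the number of characters" a user sees, which is why slicing by any of them can cut a visible character apart.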

I recently ported a program from Python to Rust, and the original author used string regexes. Input and output document encoding mattered, but the characters that needed to be matched were always lower ASCII. The Python program could have used binary regexes, but instead forced an input encoding (UTF-8) and made the user choose an output encoding. When the input comes from an unknown process or legacy data, however, you don’t always get the luxury of assuming the encoding. Switching to binary regexes and ignoring encoding altogether simplified logic, eliminated classes of errors, and made the program work in scenarios it couldn’t earlier. Getting rid of the last decoding/encoding code gave me so much relief, especially when all of the wacky encoding tests I had already written continued to work.
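The binary-regex approach can be sketched with Python's `re` module, which accepts bytes patterns and matches them against raw bytes without decoding (Rust's `regex` crate offers the same idea as `regex::bytes::Regex`). The `id=` token and the function names are hypothetical stand-ins for whatever ASCII the real program matched. The sketch assumes an ASCII-compatible encoding such as UTF-8 or Latin-1; it would not work on UTF-16 input, where every ASCII character is paired with a zero byte.

```python
import re

# Hypothetical ASCII-only token. A bytes pattern matches raw bytes,
# so the surrounding text is never decoded and its encoding never matters.
TOKEN = re.compile(rb"\bid=(\d+)")


def find_ids(data: bytes) -> list[int]:
    # int() accepts ASCII digits given as bytes.
    return [int(m.group(1)) for m in TOKEN.finditer(data)]


def redact_ids(data: bytes) -> bytes:
    # Non-ASCII bytes (UTF-8, Latin-1, or outright garbage) pass through untouched.
    return TOKEN.sub(rb"id=XXX", data)


latin1 = "café id=42".encode("latin-1")
utf8 = "naïve id=7, id=8".encode("utf-8")
broken = b"\xff\xfe id=5 \xed\xa0\xbd"  # not valid UTF-8 at all

print(find_ids(latin1), find_ids(utf8), find_ids(broken))  # [42] [7, 8] [5]
print(redact_ids(broken))
```

Because nothing is decoded, the invalid bytes in `broken` cannot raise a decoding error; they are simply copied through.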

wupatz•53m ago
It's good to know about surrogate pairs in Unicode. It was new to me too when I helped track down incomplete Unicode flags in the (excellent) phanpy Mastodon client.

Author went for Intl.Segmenter too: https://github.com/cheeaun/phanpy/issues/1491

georgemandis•21m ago
My recollection (which I didn't add to the story): I don't think Intl.Segmenter had great browser support back then (2022). Even if it had, it still wasn't a quick or obvious fix, given where the problem was occurring in our stack. But I do remember looking at it then.
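A Python sketch of the truncation problem, working directly on the UTF-16 code units a JavaScript string would hold (all helper names are made up). Cutting a flag after three code units leaves a dangling high surrogate, which cannot be decoded. Backing off one unit fixes that, but, as with the incomplete flags above, the result is still only half a flag: a lone regional indicator. Cutting on grapheme-cluster boundaries is the extra step a segmenter such as `Intl.Segmenter` provides; this sketch only guards against split surrogate pairs.

```python
def utf16_units(s: str) -> list[int]:
    """The UTF-16 code units a JavaScript string would hold for s."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def from_units(units: list[int]) -> str:
    """Decode strictly; raises UnicodeDecodeError on a lone surrogate."""
    return b"".join(u.to_bytes(2, "little") for u in units).decode("utf-16-le")


def truncate_units(units: list[int], limit: int) -> list[int]:
    """Keep at most `limit` code units without splitting a surrogate pair."""
    cut = min(limit, len(units))
    # If the last kept unit is a high surrogate, its low half is being dropped: back off.
    if 0 < cut < len(units) and 0xD800 <= units[cut - 1] <= 0xDBFF:
        cut -= 1
    return units[:cut]


flag = "\U0001F1FA\U0001F1F8"     # one flag, four code units
units = utf16_units(flag)         # [0xD83C, 0xDDFA, 0xD83C, 0xDDF8]
naive = units[:3]                 # like flag.slice(0, 3): ends in a lone high surrogate
safe = truncate_units(units, 3)   # backs off to two units

print(from_units(safe) == "\U0001F1FA")  # True: decodable, but only half the flag
```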
georgemandis•44m ago
Just noticed this is getting some traffic! It's a little buried in the post, but I made an interactive tool for exploring surrogate pairs as part of this:

- https://george.mand.is/invalid-surrogate-pairs/

I thought it was something that's easier to play with and feel than necessarily just read about.
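For anyone who prefers reading the arithmetic alongside the tool, here is a Python sketch of how UTF-16 builds a surrogate pair (function names are illustrative). A code point from U+10000 to U+10FFFF has 0x10000 subtracted, leaving 20 bits; the top 10 bits go into a high surrogate (U+D800 to U+DBFF) and the bottom 10 into a low surrogate (U+DC00 to U+DFFF). A surrogate that does not appear in that high-then-low order is what makes a pair "invalid".

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into UTF-16 high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} does not need a surrogate pair")
    v = cp - 0x10000              # 20-bit value
    high = 0xD800 + (v >> 10)     # top 10 bits
    low = 0xDC00 + (v & 0x3FF)    # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Combine a high and low surrogate back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


hi, lo = to_surrogate_pair(0x1F600)        # 😀
print(f"{hi:04X} {lo:04X}")                # D83D DE00
print(hex(from_surrogate_pair(hi, lo)))    # 0x1f600
```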

skybrian•32m ago
Writing property tests on functions that work with strings is a good way to find lots of Unicode issues.
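A crude property test in that spirit. A library such as Hypothesis would be the usual choice in Python; to keep this sketch self-contained it uses the standard `random` module with a fixed seed instead. The property: truncating to `limit` UTF-16 units must not raise, must return a prefix of the input, and must fit in the limit. The deliberately buggy `naive_truncate` slices encoded bytes and fails as soon as a cut lands inside a surrogate pair; all names here are hypothetical.

```python
import random


def naive_truncate(s: str, limit: int) -> str:
    # Buggy on purpose: slicing UTF-16 bytes can cut a surrogate pair in half.
    return s.encode("utf-16-le")[: 2 * limit].decode("utf-16-le")


def safe_truncate(s: str, limit: int) -> str:
    # Keep whole code points only; astral characters cost two UTF-16 units.
    out, used = [], 0
    for ch in s:
        width = 2 if ord(ch) > 0xFFFF else 1
        if used + width > limit:
            break
        out.append(ch)
        used += width
    return "".join(out)


def find_counterexample(truncate, trials: int = 2000, seed: int = 0):
    """Return an input that breaks the property, or None if none was found."""
    rng = random.Random(seed)
    # Mix of ASCII, BMP, a combining mark, and astral (surrogate-pair) characters.
    alphabet = ["a", "é", "中", "\u0301", "\U0001F600", "\U0001F1FA"]
    for _ in range(trials):
        s = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 8)))
        limit = rng.randint(0, 2 * len(s) + 1)
        try:
            t = truncate(s, limit)
        except UnicodeError:
            return s, limit
        if not s.startswith(t) or len(t.encode("utf-16-le")) // 2 > limit:
            return s, limit
    return None


print(find_counterexample(naive_truncate))
print(find_counterexample(safe_truncate))  # None
```

Mixing a few astral characters into the generator is what makes the bug show up; a generator that only produced ASCII would pass both functions.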
BobbyTables2•31m ago
Damn, I’ve never really had to deal with Unicode all that much.

Was already bad enough that instead of bytes, we have to worry about code points. Now even that isn’t enough?

It would have been expensive, but all characters should have been fixed-size 64-bit values.
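For comparison, a fixed-size encoding per code point already exists: UTF-32 stores every code point in 32 bits, which is enough because the code space ends at U+10FFFF (21 bits). A Python sketch (the helper name is made up) shows what that buys: no surrogates, so every unit is a whole code point. A user-perceived character can still span several units, though, such as a flag or a letter plus a combining accent.

```python
import unicodedata


def utf32_units(s: str) -> int:
    # UTF-32 stores every code point in exactly 4 bytes ("-le" avoids a BOM).
    return len(s.encode("utf-32-le")) // 4


# The largest code point needs only 21 bits.
print((0x10FFFF).bit_length())  # 21

decomposed = "e\u0301"                                # 'e' + COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)   # U+00E9, a single code point
print(utf32_units(decomposed), utf32_units(composed))  # 2 1
print(utf32_units("\U0001F1FA\U0001F1F8"))             # 2: one flag, two fixed-size units
```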

agus4nas•15m ago
Great write-up. Do most modern languages handle invalid surrogates gracefully, or is it still a "good luck" situation depending on the runtime?
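One data point rather than a survey of runtimes: in Python, a `str` can hold a lone surrogate, but encoding it to UTF-8 raises `UnicodeEncodeError` unless an error handler such as `"surrogatepass"` or `"replace"` is chosen. JavaScript's `String.prototype.toWellFormed()` (ES2024) instead replaces lone surrogates with U+FFFD. The sketch below imitates that in Python; `to_well_formed` is a made-up name, and it also joins a valid high+low pair that happens to be stored as two separate code points.

```python
def to_well_formed(s: str) -> str:
    """Replace lone surrogates with U+FFFD, joining any valid high+low pairs."""
    out, i = [], 0
    while i < len(s):
        cp = ord(s[i])
        nxt = ord(s[i + 1]) if i + 1 < len(s) else 0
        if 0xD800 <= cp <= 0xDBFF and 0xDC00 <= nxt <= 0xDFFF:
            # A valid pair stored as two code points: combine into one character.
            out.append(chr(0x10000 + ((cp - 0xD800) << 10) + (nxt - 0xDC00)))
            i += 2
        elif 0xD800 <= cp <= 0xDFFF:
            out.append("\ufffd")  # lone high or low surrogate
            i += 1
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


lone = "a\ud83db"
try:
    lone.encode("utf-8")
except UnicodeEncodeError:
    print("strict UTF-8 encoding refuses the lone surrogate")
print(lone.encode("utf-8", "surrogatepass"))  # b'a\xed\xa0\xbdb' (not valid UTF-8)
print(lone.encode("utf-8", "replace"))        # b'a?b'
print(to_well_formed(lone).encode("utf-8"))   # b'a\xef\xbf\xbdb'
```

Note that `"surrogatepass"` only gets the data out; the bytes it produces are not valid UTF-8, so the problem resurfaces in whatever reads them next.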

SANA-WM, a 2.6B open-source world model for 1-minute 720p video

https://nvlabs.github.io/Sana/WM/
119•mjgil•3h ago•52 comments

Accelerando (2005)

https://www.antipope.org/charlie/blog-static/fiction/accelerando/accelerando.html
119•eamag•4h ago•55 comments

Accelerate

https://github.com/AccelerateHS/accelerate
24•tosh•1h ago•3 comments

Δ-Mem: Efficient Online Memory for Large Language Models

https://arxiv.org/abs/2605.12357
129•44za12•6h ago•27 comments

Moving away from Tailwind, and learning to structure my CSS

https://jvns.ca/blog/2026/05/15/moving-away-from-tailwind--and-learning-to-structure-my-css-/
140•mpweiher•6h ago•70 comments

Greek Alphabet Cards

https://labs.randomquark.com/alphabet_cards/
33•ricochet11•3h ago•8 comments

How an Australian Teen Team Is Making Radio Astronomy Affordable for Schools

https://mag.openrockets.com/p/how-an-australian-teen-team-is-making-radio-astronomy-affordable-fo...
3•openrockets•29m ago•0 comments

Futhark by Example

https://futhark-lang.org/examples.html
74•tosh•5h ago•20 comments

Project Gutenberg – keeps getting better

https://www.gutenberg.org/
1055•JSeiko•23h ago•225 comments

DeepSeek-V4-Flash means LLM steering is interesting again

https://www.seangoedecke.com/steering-vectors/
15•Brajeshwar•43m ago•0 comments

After 8 years, I rewrote my open-source PyTorch curvature library

https://github.com/noahgolmant/pytorch-hessian-eigenthings
16•noahgolmant•2d ago•1 comments

Kyber (YC W23) Is Hiring a Founding Marketer

https://www.ycombinator.com/companies/kyber/jobs/1rLQAro-founding-marketer-content-community
1•asontha•3h ago

Points are a weird and inconsistent unit of measure

https://buttondown.com/hillelwayne/archive/points-are-a-weird-and-inconsistent-unit-of/
37•danborn26•2d ago•21 comments

Nearly 50 Years Later, WKRP in Cincinnati Becomes a Real Radio Station

https://www.openculture.com/2026/05/nearly-50-years-later-wkrp-in-cincinnati-becomes-a-real-radio...
55•bookofjoe•3d ago•28 comments

I believe there are entire companies right now under AI psychosis

https://twitter.com/mitchellh/status/2055380239711457578
1598•reasonableklout•19h ago•824 comments

Ploopy Bean: a trackpoint for every computer

https://ploopy.co/shop/bean-pointing-stick/
141•jibcage•3d ago•61 comments

Gaining control of every projector and camera on campus

https://www.edna.land/blogs/posts/scanning/
77•ednaordinary•2d ago•23 comments

Tesla reveals two Robotaxi crashes involving teleoperators

https://techcrunch.com/2026/05/15/tesla-reveals-two-robotaxi-crashes-involving-teleoperators/
6•Brajeshwar•19m ago•0 comments

The bird eye was pushed to an evolutionary extreme

https://www.quantamagazine.org/how-the-bird-eye-was-pushed-to-an-evolutionary-extreme-20260513/
173•sohkamyung•2d ago•61 comments

Frontier AI has broken the open CTF format

https://kabir.au/blog/the-ctf-scene-is-dead
244•frays•8h ago•214 comments

Orthrus-Qwen3: up to 7.8× tokens/forward on Qwen3, identical output distribution

https://github.com/chiennv2000/orthrus
160•FranckDernoncou•17h ago•25 comments

Fecal transplants for autism deliver success in clinical trials

https://refractor.io/adhd-autism/fecal-transplants-for-autism-delivers-success-in-clinical-trials/
161•breve•6h ago•116 comments

What Were Ancient Greco-Roman Curse Tablets?

https://www.history.com/articles/what-were-ancient-roman-curse-tablets
3•speckx•3d ago•0 comments

The Physics–and Physicality–Of Extreme Juggling (2018)

https://www.wired.com/story/the-physicsand-physicalityof-extreme-juggling/
14•ColinWright•3d ago•2 comments

Where to buy a non-Apple, non-Google smartphone

https://www.theregister.com/on-prem/2026/05/01/where-to-buy-a-non-apple-non-google-smartphone/521...
143•_____k•7h ago•88 comments

The main thing about P2P meth is that there's so much of it (2021)

https://dynomight.net/p2p-meth/
161•tomjakubowski•16h ago•188 comments

A 0-click exploit chain for the Pixel 10

https://projectzero.google/2026/05/pixel-10-exploit.html
407•happyhardcore•1d ago•221 comments

The sigmoids won't save you

https://www.astralcodexten.com/p/the-sigmoids-wont-save-you
250•Tomte•1d ago•238 comments

Naturally Occurring Quasicrystals

https://johncarlosbaez.wordpress.com/2026/05/14/naturally-occurring-quasicrystals/
118•lukeplato•2d ago•10 comments